#qi-hardware IRC log for Monday, 2011-09-26

wolfspraulif a full nanonote openwrt build on the buildhost takes 30 hours now, how can we determine what the bottleneck is?02:14
wolfspraul1) cpu 2) hdd 3) memory02:14
wolfspraulothers?02:15
larscif you haven't already you might want to consider using ccache02:30
wolfspraulthanks!02:38
wolfspraulI guess we can try that on the existing machine first02:38
mthyes, most builds will be almost the same as the previous one, so ccache should work really well02:38
wolfspraulcan it easily be enabled in openwrt?02:38
mthalternatively, try to not do full builds, but that might be a bigger developer time investment02:39
wolfspraulalso I guess we assume it's02:39
wolfspraulbug-free :-)02:39
wolfspraulwell, one purpose of the "full" builds is to rule out problems from incremental builds02:39
wolfspraulthere's a reason so many devs first erase everything and build from scratch, must be from their experience :-)02:39
mthyes, it's something that is necessary in practice but not in theory02:40
mthI'm wondering if it is feasible to set up a build system where dependency checking is reliable enough to actually trust it02:40
wolfspraulshow me one dev who is doing some "incremental" build magic, and when running into anything "strange" wouldn't first nuke all the temp files and start over? :-)02:41
mthbut that's a very long term approach02:41
wolfspraulok, so ccache. good idea, how can it be enabled?02:41
wolfspraulis it easily supported with openwrt? I'll look into it02:41
wolfspraulin parallel the raw performance of the machine is also something that can be improved02:41
wolfspraulbut I'm trying to find out the bottleneck - cpu/mem/hdd02:41
mthafaik it's done by setting CC and CXX to point to ccache rather than the actual compiler02:42
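A minimal sketch of what mth describes, assuming the compilers are called gcc and g++ and that ccache is on the PATH. The helper name ccache_env is made up for illustration, and whether OpenWrt's build system honours CC and CXX set this way is not confirmed in this log.

```python
import os


def ccache_env(cc="gcc", cxx="g++", base=None):
    """Return a copy of the environment with CC and CXX prefixed by ccache.

    cc and cxx are assumed compiler names; substitute the real
    (cross-)compiler for an actual build.
    """
    env = dict(os.environ if base is None else base)
    env["CC"] = f"ccache {cc}"
    env["CXX"] = f"ccache {cxx}"
    return env


# Hypothetical usage:
#   import subprocess
#   subprocess.run(["make"], env=ccache_env(), check=True)
```

The environment is copied rather than modified in place, so the same script can run a build with and without ccache and compare timings.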
wolfspraulI'm wondering what it is doing all these 30 hours02:42
wolfspraulthe hdd is a raid-0 over 2 disks (no ssd)02:42
mth"time" should tell you whether CPU is the bottleneck: compare real time with user time02:42
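The comparison mth suggests can be sketched as follows. This is an illustration only: the function name cpu_bound_ratio is made up, and the child CPU times reported by os.times() are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def cpu_bound_ratio(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, cpu / wall).

    cpu_seconds is user + system time of the finished child processes,
    i.e. roughly what "time" prints as user and sys.
    """
    before = os.times()
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = os.times()
    cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return wall, cpu, (cpu / wall if wall > 0 else 0.0)


# Hypothetical usage:
#   wall, cpu, ratio = cpu_bound_ratio(["make", "-j1"])
```

For a -j1 build, a ratio close to 1 suggests the CPU is the limit, while a ratio well below 1 suggests the build spends its time waiting on something else. With parallel jobs the ratio can exceed 1.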
wolfspraulwe could increase memory and try with a memory based /tmp or so02:42
mthmem could be checked by monitoring how much is swapped out and how much cache is available02:42
wolfspraulthe cpu is a single-core 64-bit, that could be increased as well02:42
mtha quad core would build about 3-3.5 times as fast as a single core in my experience02:43
mthmaybe a bit less if you have lots of small packages02:43
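As a rough worked example of those numbers: Amdahl's law relates the speedup S on n cores to the parallel fraction p of the work by S = 1 / ((1 - p) + p / n). The sketch below inverts that and applies the quoted speedups to the 30 hour build, under the strong assumption that the NanoNote build scales the way mth's builds do.

```python
def parallel_fraction(speedup, cores):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / cores) to recover p."""
    return (1 - 1 / speedup) / (1 - 1 / cores)


def projected_hours(baseline_hours, speedup):
    """Build time after applying a given speedup factor."""
    return baseline_hours / speedup


for s in (3.0, 3.5):
    p = parallel_fraction(s, 4)
    print(f"speedup {s}: parallel fraction {p:.2f}, "
          f"30 h build becomes {projected_hours(30, s):.1f} h")
```

A 3x to 3.5x speedup on four cores corresponds to roughly 89% to 95% of the work running in parallel, and would bring 30 hours down to somewhere between 8.6 and 10 hours.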
wolfspraulthat's assuming that amount of memory or hdd/ssd speed are not the bottleneck02:43
wolfspraulso you say in your experience it will be the CPU?02:44
mthI can build the OpenDingux rootfs in a quad-core VM on an i7 in 25 minutes02:44
wolfspraulok I think we build a few thousand packages here, in 30 hours02:44
wolfsprauland I'm trying to understand which hardware improvement would help the most02:44
mththat's far less packages than OpenWRT, I guess, but still quite a lot02:44
wolfspraulcpu, memory, hdd/ssd02:44
wolfspraulI don't think the build process will max out mem02:45
wolfsprauland we don't have a ramdisk (maybe we should?)02:45
wolfspraulso yeah, probably the cpu. an SSD would probably also help a lot.02:45
mthwould ramdisk be faster than sufficient memory for caching?02:46
mthat least with caching you don't have to manually manage it02:46
wolfspraulwell I don't know who is using the resources and to which degree02:46
wolfspraulactually I think it is running builds most of the time02:47
wolfspraulchecking...02:47
mthyou'd need some background process gathering vital stats of the system say, once a minute, and log them02:48
mthperhaps existing network monitoring tools already do that?02:48
mththe kind they use to keep track of server farms02:48
mthnagios etc02:48
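A background logger of the kind mth describes could look roughly like this on Linux. It is a sketch, not a replacement for a monitoring tool: the log file name and the selection of fields are assumptions, and it reads /proc/loadavg and /proc/meminfo, which exist only on Linux.

```python
import time

MEM_KEYS = ("MemTotal", "MemFree", "Buffers", "Cached", "SwapTotal", "SwapFree")


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text, keys=MEM_KEYS):
    """Return the selected /proc/meminfo fields as a dict of kB values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in keys:
            values[name] = int(rest.split()[0])
    return values


def sample():
    """Read the current load and memory figures (Linux only)."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return load, mem


def log_forever(path="build-stats.log", interval=60):
    """Append one line of stats to path every interval seconds."""
    while True:
        load, mem = sample()
        swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(path, "a") as out:
            out.write(f"{stamp} load1={load[0]:.2f} "
                      f"free_kb={mem.get('MemFree', 0)} "
                      f"cached_kb={mem.get('Cached', 0)} "
                      f"swap_used_kb={swap_used}\n")
        time.sleep(interval)
```

Calling log_forever() in a separate terminal or under nohup for the duration of one build would give the once-a-minute history discussed here.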
mthone big problem that's hard to get rid of is autoconf02:49
mththat won't utilize multiple cores02:50
mthand it takes a significant amount of time for the build of small packages02:50
mthit's really overdue for replacement, imo02:50
wolfspraulit's so badly designed that it will never be possible to be replaced02:51
wolfspraulsurvival strategy02:51
mthyou could speed it up by caching probe results, but I don't know how reliable that is if you mix different versions and possibly customized rules02:51
wolfspraulno no02:51
wolfspraulI am looking for some easy way to speed up02:51
wolfspraulnot to be stuck with arcane problems for a few years02:51
wolfspraulccache sounds interesting if a) it's easy to enable b) it's bug-free02:52
wolfspraul:-)02:52
mthnothing non-trivial is bug-free, but I think ccache's approach is low-risk02:53
mthsince it uses the preprocessed input to do the lookup in the cache02:53
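The idea of keying a cache on preprocessed input can be illustrated with a toy model. This is not ccache's actual implementation; the names cache_key and ToyCompileCache and the choice of SHA-256 are assumptions made for the sketch.

```python
import hashlib


def cache_key(preprocessed_source, compiler_id, flags):
    """Hash everything that can influence the object file."""
    h = hashlib.sha256()
    for part in (compiler_id, "\0".join(flags), preprocessed_source):
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


class ToyCompileCache:
    """In-memory stand-in for a compiler cache."""

    def __init__(self):
        self._objects = {}

    def compile(self, preprocessed_source, compiler_id, flags, real_compile):
        """Return a cached result, calling real_compile only on a miss."""
        key = cache_key(preprocessed_source, compiler_id, flags)
        if key not in self._objects:
            self._objects[key] = real_compile(preprocessed_source)
        return self._objects[key]
```

Because the key is computed after preprocessing, a changed header or macro changes the key and forces a real compile. Including a compiler identifier in the key is one way a cache can avoid returning stale objects after a compiler update.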
wolfspraulsure I was joking02:53
wolfspraula build is indeed running today02:53
wolfsprauland I think the machine is doing this for weeks02:53
wolfspraulis the kernel or anybody collecting any load statistics that I can easily look at now?02:54
mthyou might have to flush the cache if you update the compiler, I'm not sure about that02:54
mth"top" would be a start02:55
mthit should at least give you an impression of CPU and memory use02:55
kristianpauliotop may help a bit too02:57
wolfspraulok I looked at vmstat 1 for a while. indeed it looks like mostly cpu bound, and/or memory speed03:00
wolfspraulnot amount of memory (1.5gb of 2 used, but lots of buffers, swap very lightly used if at all)03:01
wolfspraulalso not disk speed I think03:01
wolfspraulall seems to be cpu and/or memory speed03:01
mthdisk speed might become a factor once you switch to multiple cores03:01
mthso don't spend all your money at once03:01
wolfspraulsure, something always bubbles up03:01
wolfspraulyou make one piece faster, then one or multiple of the others become relatively bigger :-)03:02
wolfspraulok so: 1) try ccache 2) upgrade cpu, maybe a little more memory just in case03:03
kristianpaulor if you still like visually/fun  debugging try watch --color -d 'ps -x -kpcpu -o pid,pcpu,args'03:03
mthnot just relatively, if you start using multiple cores the access pattern will change as well03:03
mthit will be less localized03:03
kristianpaulvmstat won't tell you about i/o problems, as far as i remember03:03
mthyou can detect I/O problems indirectly: if there is enough memory and the CPUs are not fully utilized, the I/O must be the bottleneck03:04
mthwell, or you're not actually running in parallel (small packages, scripts like configure)03:05
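mth's elimination argument can be written down as a small decision function. The inputs are meant to be read off a tool such as vmstat (us + sy for CPU use, si/so for swap activity). The 90% threshold and all names are assumptions for illustration.

```python
def likely_bottleneck(cpu_busy_pct, swapping, jobs_running, cores,
                      busy_threshold=90):
    """Guess the bottleneck by elimination.

    cpu_busy_pct: user + system CPU percentage averaged over all cores.
    swapping:     True if pages are being swapped in or out regularly.
    jobs_running: number of build jobs actually running in parallel.
    cores:        number of CPU cores.
    """
    if swapping:
        return "memory"
    if cpu_busy_pct >= busy_threshold:
        return "cpu"
    if jobs_running < cores:
        # Small packages or configure scripts: the cores are simply idle.
        return "serial"
    return "io"
```

For the single-core machine in this log, roughly 80% us plus 20% sy with no swapping gives "cpu".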
mthbuildroot will only use multiple jobs within one package, not build two packages at once03:05
mthI don't know if OpenWRT still has that limitation as well or whether it was removed there03:06
kristianpaulanyway if it all takes 30hrs, it's worth installing munin and munin-node, i bet03:07
kristianpaulat least you can get interesting resources utilization stats over a week03:07
kristianpaulnot on the last second :)03:07
wolfspraulnot sure03:08
wolfspraulall I've seen munin create so far is a lot of data that adds a lot of confusion03:08
kristianpauljust check what you need ;)03:08
wolfspraulwhereas I can just login to the running machine and look at the load for a little while with simple commands, and get a good understanding where the bottleneck is03:08
wolfspraulwell, just saying from past experience. that could have well been me.03:09
kristianpaulnot over time though03:09
wolfspraulbut I just see dozens of pretty charts but little conclusive value03:09
kristianpaulindeed, it always depends on what you are looking for03:09
wolfspraulthe pattern is quite stable, if you don't see something over 5 minutes I'd say it's not very relevant to the machine's performance anyway03:09
wolfspraulif you have some backup running once every 24h, that's a special thing and what is happening in those x minutes is not representative either03:10
kristianpaulfor example the process that eats more cpu/mem over a longer period of time, but i didn't get to that yet though :/03:10
kristianpaul(5 minutes) yeah ;)03:11
wolfspraulcpu seems super busy, ca 80% us, ca. 20% sy03:11
wolfspraulcpu upgrade it is03:11
wolfsprauland faster memory03:11
kristianpaulwhats load average?03:11
wolfspraulno need to waste money on an ssd now, I think the raid-0 over two normal hdds is not bad03:11
wolfspraulload average: 1.03, 1.01, 1.15 03:12
kristianpauldoesn't look that bad03:12
kristianpaulis still compiling right?03:13
wolfspraulyes compiling all the time I think :-)03:14
mthload is more-or-less the number of processes waiting for CPU time, correct?03:15
mththen a load of ~1 is what I'd expect on a single core -j1 compile03:15
mthwell, if I/O were a big problem the load would be below 1, so it does point towards the CPU as the bottleneck03:15
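A quick way to put the load average in relation to the number of cores, as a sketch: os.getloadavg() and os.cpu_count() are standard library calls, and the function name load_per_core is made up. On Linux the load average also counts tasks in uninterruptible sleep (typically waiting on disk), so the figure is best read together with the us/sy/wa split from vmstat or top.

```python
import os


def load_per_core(load=None, cores=None):
    """Return the 1-minute load average divided by the core count.

    Both values are read from the running system unless passed in.
    """
    if load is None:
        load = os.getloadavg()[0]
    if cores is None:
        cores = os.cpu_count() or 1
    return load / cores


# wolfspraul's reading on the single-core build host:
print(load_per_core(1.03, 1))
```

A value near 1 per core means the machine has about as much runnable work as it has cores, which matches a -j1 build on one core.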
wpwrakccache is reasonably safe. i once managed to create a pathological case where the difference was deep in one of the more unusual ELF sections (i don't remember the details, but i think it was with umlsim), where the ccache folks just accepted defeat, but if you don't drive it to extremes, it'll serve you well. even compiler upgrades should be okay.04:56
wpwrakah, he already left04:56
qi-botThe build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/compile-log/openwrt-xburst.full_system-09252011-0252/ 07:40
kyakviric: ping16:29
virickyak: pong from thousands of km16:46
kyakviric: nevermind, i just built the offrss and giving it a try :)17:23
kyakhad some problems with my eyes. Somehow i thought that libmrss-0.9 > libmrss-0.19.2 17:24
kyakthat's a mindtrick :)17:24
virichaha17:33
viricsometimes even configure scripts make that error17:33
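The mix-up kyak describes is what happens when dotted versions are compared as text or as decimal numbers instead of component by component. A minimal sketch, assuming purely numeric components (no suffixes such as "rc1"); the name version_key is made up:

```python
def version_key(version):
    """Turn a dotted version such as "0.19.2" into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


# Component-wise comparison: 9 < 19, so 0.9 is the older release.
print(version_key("0.9") < version_key("0.19.2"))
# Plain string comparison gets it wrong, because "9" sorts after "1".
print("0.9" > "0.19.2")
```

Both lines print True: the tuple comparison puts 0.9 before 0.19.2, while the string comparison puts it after.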
virickyak: oth, I feel honored :)17:33
kyakviric: btw, i had to add -I/usr/include/curl in the Makefile and #include <curl.h> in the offrss.c17:34
kyak:)17:34
viricah17:34
viricinteresting17:34
viricI never built offrss on non-nix17:34
kyakdamn the X over network is slow.. even in my home network17:36
kyaki have to keep it locally or use console browser17:37
kyakfor newsbeuter, i sometimes use the "External actions" feature or whatever it is called. It is when the article is passed to some external program; i use it to download things from torrents17:38
kyakviric: hm, it's interesting - when i start it like "WEBBROWSER=links ./offrss -w", it won't work. The links shows up, but can't connect to server17:41
kyakwhen i start as ./offrss -w and then just links http://localhost:8090, it works fine17:41
kyakoh, a segfault in podofo...17:44
viricin podofo?18:06
virickyak: what version of podofo? Have you linked podofo?18:07
viric(any gdb bt?)18:07
kyakviric: podofo 0.7.0, the one supplied with my distro, no gdb bt yet18:23
qi-bot[commit] Werner Almesberger: m1/perf/eval.pl: warn if an instruction reads and writes from the same register (master) http://qi-hw.com/p/wernermisc/5bf9ae0 20:27
qi-bot[commit] Werner Almesberger: m1/perf/sched.c: use calloc instead of malloc plus memset (master) http://qi-hw.com/p/wernermisc/0a7e5b1 20:27
qi-bot[commit] Werner Almesberger: m1/perf/sched.c: return -1 if malloc fails (master) http://qi-hw.com/p/wernermisc/24a9b85 20:27
qi-bot[commit] Werner Almesberger: m1/perf/sched.c: code cleanup (no functional changes) (master) http://qi-hw.com/p/wernermisc/35e9903 20:27
wpwraki like it when qi-bot calls me "master". i always imagine "i dream of jeannie" ;-)20:33
viric:)20:34
larsci don't want to destroy your dreams, but i think it's referring to the branch name ;)20:36
wpwraki'll choose to disregard this opinion of yours :)20:49
--- Tue Sep 27 2011 00:00
