#milkymist IRC log for Wednesday, 2011-08-24

awwpwrak, 0x3c: no critter being reproduced now after powered on, d2/d3 dimly lit. before powered-on, impedance TP36 - pin 34 of NOR: 20 KOhm(not constant) / 125 KOhm01:40
awwpwrak, sorry, correct the above to: no critter being reproduced now after powered on, d2/d3 are fully off. before powered-on, impedance TP36 - pin 34 of NOR: 20 KOhm(not constant) / 125 KOhm01:41
awcontinue to test 'fix2b' boards...01:47
wpwrakhmm, so 0x3c pretends to be good now01:57
wpwrakaw: did you try 0x77 again ? this time the 3.3 V injection on TP3601:57
wpwrakaw: (0x77) monitor TP36 and pin 34. reproduce the nastiness. then, while monitoring, connect TP36 through 100 Ohm to 3V3. see what happens01:58
awwpwrak, (0x77) is not easy to reproduce today though. before I connected 100 Ohm to 3.3V, i saw instability once; after connecting TP36 through 100 ohm to 3V3, I've NOT seen instability so far (maybe 5 minutes have passed)02:43
awwpwrak, not sure if pulling TP36 high through 100 ohm is the cause.02:44
awwpwrak, (after TP36 through 100 Ohm to 3.3V) DQ8 is normally low when rendering. It stays HIGH (3V) in the reconfigure stage, shows normal pulses after pressing the middle button (to boot up), then stays low steadily02:56
kristianpaulhttp://www.linuxfordevices.com/c/a/News/IBM-SyNAPSE-neural-computing-project-demonstrated/ <- Moving beyond von Neumann05:03
xiangfukristianpaul, which toolchain version are you using?05:22
xiangfuI try to compile the latest rtems gcc.05:23
xiangfubut it always stops at "checking whether the target assembler supports thread-local storage..." and hangs there for hours.05:23
GitHub68[scripts] xiangfu pushed 1 new commit to master: http://git.io/kHTZZg06:56
GitHub68[scripts/master] compile-lm32-rtems: update gcc to 4.6.1 - Xiangfu Liu06:56
wpwrakaw: (0x77) can you try to let it run without pulling TP36 up ? wait until the anomaly appears, and only then pull TP36 ? (i.e., when it has already started to act weird)07:01
wpwrakaw: from what you've described so far, pulling seems to prevent it from entering anomalous behaviour. but i'd also be interested to know if pulling makes it exit anomalous behaviour.07:02
awwpwrak, I'll try it but i'm testing other. ;-)07:03
wpwrakaw: ok. and after that, the next test would be similar: monitor TP36 and pin 34, wait until the anomaly happens, then pull DQ8 and see what happens.07:19
wpwrakaw: what i'm trying to find out is whether the synchronization between DQ8 and PROGRAM_B is cause or effect07:20
awwpwrak, ok07:20
wpwrakaw: also, when you saw it boot normally, did DQ8 also give the runts ? runts = the little spikes at t = -900 ns, -500 ns, -200 ns, +200 ns, +550 ns, +900 ns, +1200 ns in http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x77_ch1-TP36_ch2-NOR-pin34-DQ8_500ns.JPG.JPG07:22
awwpwrak, used 10us/div, so didn't see the runts in detail. I'll watch for them.07:25
wpwrakaw: at 10 us/div, you still see them as ~1 V "noise floor": http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x77_ch1-TP36_ch2-NOR-pin34-DQ8.JPG07:29
awwpwrak, i didn't trigger this morning though. ;-)07:33
wpwrakaw: you mean you didn't see the anomaly on 0x77 today ? or that you don't remember seeing the runts today ?07:34
awwpwrak, NO. i saw the anomaly once before TP36 was pulled to 3.3V through 100 Ohm, but I didn't set a trigger. So if it had runts, I don't know. ;-)07:36
wpwrakah, i see07:37
awso 100 Ohm // 10 KOhm is almost 99 Ohm, which seems to prevent the anomalous behaviour, well...will see.07:40
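The "100 Ohm // 10 KOhm" figure is the usual parallel-resistance formula. A quick sketch to check the arithmetic, assuming the 10 KOhm is a resistance already present on the TP36 net (the log does not say which part it is):

```python
def parallel(*resistors_ohm: float) -> float:
    """Equivalent resistance (Ohm) of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistors_ohm)


# 100 Ohm pull-up to 3V3 in parallel with the (assumed) existing 10 kOhm path on TP36
r_eq = parallel(100.0, 10_000.0)
print(f"{r_eq:.2f} Ohm")  # 99.01 Ohm
```

The 100 Ohm term dominates, so with the pull-up attached TP36 is effectively held by the external resistor rather than by whatever else is on the net.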
lekernelxiangfu, gcc 4.6.x doesn't work09:08
xiangfulekernel, oh, then which version I should update to?09:08
lekernel4.5.3 + the latest RTEMS patches09:09
xiangfuok. try to compile now.09:11
xiangfuthe ubuntu repo already have the compiled 4.5.3: http://www.rtems.org/ftp/pub/rtems/linux/4.11/ubuntu/09:12
lekernelyes, but it lacks the divider enabled multilibs09:13
lekernelif you can get pesky ralf to add them, good09:13
xiangfuwill try. then I will add the crc button in flickernoise. next to the "Check Version"09:15
xiangfuI don't want crc command any more :)09:15
lekernelwhere is flickernoise supposed to get the CRCs from?09:16
xiangfulekernel, (4.5.3) I just installed then uninstalled them. I need a self-compiled version. maybe set up all that stuff one day in BUILDHOST09:16
xiangfumaybe not a button. crc needs a length :(09:17
lekerneland what feature does a "CRC" button bring to the user?09:17
lekernelimo this only belongs in RTEMS09:17
lekernelthe GUI shouldn't have system programming functions, only end user stuff09:18
Fallenou11:15 < lekernel> if you can get pesky ralf to add them, good < ahah09:19
xiangfulekernel, ok. got it.09:21
lekernelif you want to be able to use the RTEMS shell without a serial cable you can 1) use telnet 2) write a terminal program in the GUI (activated with a relatively long keyboard shortcut, e.g. ctrl-alt-shift-t)09:22
lekernelfor the terminal program, you could certainly use the "text editor" widget as a starting point09:24
lekernelit shouldn't be hard, if you don't want colors, cursor control sequences, and things like that09:25
xiangfudefinitely without colors etc. :)09:44
kristianpaulxiangfu, gcc versión 4.5.2 (GCC)11:10
kristianpaulold-rtems i guess... i didn't update the toolchain for months..11:11
wolfspraulwpwrak: oh well. bad news (for me :-)) we got the first failure after 4th rendering cycle with fix2b applied11:22
wolfspraul0x4C is the magic number. http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Test_Results11:22
wolfspraulcan't believe it but I guess that's what we found... So - Adam is still wrapping up some work, he will stop by here later to discuss what we can learn from 0x4C11:24
wpwrakwolfspraul: ah well, it had to happen sooner or later. will be interesting to see which symptoms he discovers.11:26
wolfspraulSebastien was wondering whether Adam is using leaded solder, otherwise he suspected whiskers http://en.wikipedia.org/wiki/Whisker_%28metallurgy%2911:26
wolfspraulwhy did it have to happen sooner or later?11:26
wolfspraulyou saw it coming?11:26
lekernelbecause fix2b isn't supposed to do anything, I'd guess .....11:27
wolfspraulI was hopeful that the other boards were contained at some earlier state of testing, but I guess you were right all along.11:27
wpwrakwolfspraul: i suspect we're not done yet with the 0x3c/0x77 cluster. and that one already showed all the promises of a long tail.11:27
wolfspraulI cannot even read the notes of 0x3C/0x77 reasonably, so long are they.11:27
wolfspraulwpwrak: but why does it fail after several rendering cycles? what triggers the failure?11:28
wpwrak(whiskers) hard to tell. i've never seen such things in real life and i don't quite know what you have to do wrong to get them.11:28
wolfspraulit seems Sebastien says if Adam uses leaded solder we can rule whiskers out11:29
wpwrakwolfspraul: maybe temperature. maybe it has a certain trigger condition. etc.11:29
wolfspraultemperature, argh11:29
wolfspraulbut how can we get the design stable enough so this goes away?11:29
wolfspraulwith the 0x4C results as I understand them now (not yet 100% confirmed), my confidence in selling boards has dropped quite low again11:30
wolfspraulhow can we rule out that boards just spontaneously fail?11:30
wpwrakwolfspraul: there are probably symptoms we can detect all the time. just rendering isn't a good test.11:30
wolfspraulany scope measurements we can do to see early warning symptoms?11:31
wpwrakwolfspraul: e.g., 0x77 has a "resistance" between TP36 and pin 34 that's different from the rest of the herd. there may be more such symptoms that can be used for diagnosis.11:31
wolfspraulsure if we can identify a strong pass/fail test that would solve the most critical rc3 problem11:31
wolfspraulactually in all the crazy reworks and testing, the yield is getting quite good, he he. not that there is much to laugh about in this run.11:32
wpwrakwolfspraul: i'd look more for DC things. the scope is often a tricky instrument to use. particularly if you're looking for the absence of an event11:32
wolfspraulwe are up to 60 'good' boards now11:32
wolfspraulin the end we make 80 I think11:32
wolfspraulbut this stuff is painful, so many people are waiting for their m1...11:32
wpwrakyes, the overall yield now looks very promising11:32
wolfsprauloh sure. so measuring the resistance between tp36 and pin34 is a promising test?11:33
wpwrakif you want to push things forward, you can also indicate that there may be this problem that's still being analyzed, and offer a rebate or replacement for rc4 in case this turns out to be bad for those who decide to take the risk11:34
wolfspraulnow all the good work that went into 0x77 and 0x3C comes to help. I apologize for rushing earlier. I was wrong :-)11:34
wpwrakof course, it's trading time vs. risk of future expenses11:34
wolfspraulnah, most people after fully understanding the issue would say "please ship me the m1 when it really works"11:35
wolfspraulif I say "it could spontaneously fail at any time", that's not good11:35
wpwrak(tp36-pin34) it didn't yield anything suspicious on 0x3c. so that still needs more investigation.11:35
wolfspraulif we can find a strong test, that's all we may need11:35
wolfspraulthe rendering cycles test is no good for that11:35
wolfspraulunless we find a nor corruption now on 0x4C and we believe it's caused by the power down/reset ic situation.11:36
wpwrak(spontaneous failure) depends a bit on the use. if it's for development or evaluation, that would be acceptable. maybe even for studio work if a reasonable work-around can be found (such as "let it cool down for 10 minutes")11:36
wolfspraulbut we suspected that many times and so far I believe it hasn't materialized yet11:36
wpwrakof course, unreliable hw is the last thing you want during a live performance :)11:37
wolfspraulyes, no. it's not good.11:37
wolfsprauland if it were as easy as 10 minutes that would be nice.11:37
wolfspraulit could be a day, it could be forever.11:37
wolfspraultest! that's a good approach. we need to do some comparative testing to find early warning signs.11:37
wpwraklekernel: btw, under what conditions is NOR accessed (read or write) after booting. e.g., is the "file system" mirrored in RAM or do reads go to NOR as well ?11:37
wpwrak(test) yes, step one: find a pattern that leads to the underlying defect. then look for things the defect may affect and see if any of them can be tested.11:39
wolfspraulwpwrak: that's pin 34 of which chip? nor chip?11:39
wpwraki just hope it's not just some wild ESD havoc, because that could be fairly unpredictable11:39
wpwrakpin 34 of NOR, yes. it's on a ball next to PROGRAM_B on the FPGA's BGA11:40
wpwrakand the trace to pin 34 is adjacent to our rework zone. that's why i looked for it in the first place.11:40
wpwrak(i was hoping for some soldering bridge that somehow reached the trace. didn't quite expect something that looks like a semi-fried FPGA. but well, you have to take things as they come ;-)11:41
wpwrakif adam continues with the testing, we may also find more boards for the cluster. the more, the merrier as far as analysis is concerned :)11:43
wpwrakafk for a bit. have a quick medical checkup today and then have to be back in time for the fedex man.11:44
wolfspraulI keep thinking about a connection and whether there is any insight to be discovered there.11:59
wpwrakbtw, the jtag test joerg suggested would be a good thing to have. a systematic test would also catch things that don't cause a noticeable upset during regular operation.11:59
wolfspraulso lekernel says "fix2b is doing nothing".11:59
wolfspraulthat neglects the dynamics of the production and testing process of course. in reality fix2b helped us to reduce the number of boards that failed after x rendering cycles a lot.12:00
wolfspraulso let's only think about those now - boards that fail after rendering cycle >= 212:00
wolfspraulwhy did fix2b fix them?12:00
wpwrakfix2b removes potentially troublesome components. potentially as in we've already seen them act up.12:00
wolfsprauland why does there seem to be another case now with 0x4C ?12:01
wpwrakstatistics :)12:01
wolfspraulin other words - whatever risk fix2b removed, there may be more risks along the same lines12:01
wolfspraulwell it could be unrelated phenomena12:01
wolfspraulor just statistical nirvana, yes12:01
wolfspraulbut to me a board that failed after successful render cycles is still special12:02
wolfspraulsomething happened12:02
wolfsprauland that something went away with fix2b12:02
wolfspraulmy thinking may look for the wrong root cause of course, I admit. just trying different logic.12:02
wolfspraulwpwrak: but that doesn't explain why they suddenly fail12:03
wolfspraulmy point is not bad soldering, bad component, etc.12:03
wolfspraulI'm thinking about the event that triggers the failure.12:03
wolfspraulwhich jtag test did joerg suggest? can we implement it easily?12:03
wpwrak(jtag) drive the pins to a set of states, see if they read back correct values, and measure system current all the while. not _easy_ to implement. but worthwhile :)12:27
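The core of the JTAG test wpwrak describes is "drive known states, read back, compare". A minimal sketch of only the compare logic, assuming walking-one/walking-zero vectors (a common interconnect-test choice, not something specified in the log). `apply_and_capture` is a hypothetical callback standing in for the real boundary-scan shift through a JTAG adapter, which is not shown; system current measurement is not modeled. The board here is simulated.

```python
from typing import Callable, Dict, List

Vector = List[int]


def walking_patterns(n_pins: int) -> List[Vector]:
    """Walking-one then walking-zero vectors: each pin toggled alone against the rest."""
    ones = [[1 if i == k else 0 for i in range(n_pins)] for k in range(n_pins)]
    zeros = [[0 if i == k else 1 for i in range(n_pins)] for k in range(n_pins)]
    return ones + zeros


def interconnect_test(n_pins: int,
                      apply_and_capture: Callable[[Vector], Vector]) -> Dict[int, int]:
    """Drive every pattern, compare the readback, return mismatch counts for failing pins."""
    errors = {pin: 0 for pin in range(n_pins)}
    for pattern in walking_patterns(n_pins):
        readback = apply_and_capture(pattern)
        for pin, (want, got) in enumerate(zip(pattern, readback)):
            if want != got:
                errors[pin] += 1
    return {pin: count for pin, count in errors.items() if count}


def fake_board(pattern: Vector) -> Vector:
    """Hypothetical board: pin 2 stuck low, pins 4 and 5 bridged (wired-AND)."""
    out = list(pattern)
    out[2] = 0
    out[4] = out[5] = out[4] & out[5]
    return out


print(interconnect_test(8, fake_board))  # {2: 8, 4: 2, 5: 2}
```

A stuck pin fails many vectors, while a bridge only shows up when the two shorted pins are driven to opposite levels, which is why each pin is toggled on its own.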
wpwrak(sudden failure) statistics can explain all this. make five tests, then do something, make another five tests. probability of failure is 10% in each test. some will fail before the change, some after, some before and after, some never.12:28
wolfspraulyes you are right could be statistical issues12:29
wpwrakif you have 100 such boards, in fact about 35% would pass all ten tests without showing any problem. 3% of them will fail the very next test. and so on :)12:29
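wpwrak's statistics argument can be checked directly: with an independent 10% failure chance per test, five tests before a change and five after, the boards split into four groups even though the change did nothing. A small sketch (the independence of tests is the assumption behind his example):

```python
def before_after_split(p_fail: float, tests_before: int, tests_after: int) -> dict:
    """Probability of each failure pattern when every test fails independently with p_fail."""
    clean_before = (1.0 - p_fail) ** tests_before
    clean_after = (1.0 - p_fail) ** tests_after
    return {
        "never": clean_before * clean_after,
        "before only": (1.0 - clean_before) * clean_after,
        "after only": clean_before * (1.0 - clean_after),
        "before and after": (1.0 - clean_before) * (1.0 - clean_after),
    }


def fail_on_next_test(p_fail: float, tests_passed: int) -> float:
    """Probability of passing tests_passed tests and then failing the next one."""
    return (1.0 - p_fail) ** tests_passed * p_fail


n_boards = 100
split = before_after_split(0.1, 5, 5)
for label, p in split.items():
    print(f"{label:>16}: {n_boards * p:5.1f} boards")
print(f"{'fail on test 11':>16}: {n_boards * fail_on_next_test(0.1, 10):5.1f} boards")
# never: 34.9, before only: 24.2, after only: 24.2, before and after: 16.8, test 11: 3.5
```

About 35 of 100 boards pass all ten tests, and about 3.5 of the 100 fail on the very next one, in line with the ~35% and ~3% quoted.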
wolfspraulso maybe we focus on a strong pass/fail test only12:29
wolfspraulthat must be possible, statistics or not ;-)12:30
wolfspraulok the jtag test doesn't sound like something we can have in a few days, for rc3 in fact12:30
wpwrakyes, we need to get behind the statistics. find something that's not statistical. or if we really can't, characterize the pattern and design tests that have a high probability of producing the problem. that usually means to automate the tests.12:31
wpwrak(jtag) more like weeks12:31
wpwrakmaybe for rc4 ;-)12:31
wolfspraulfirst I want to read back NOR on 0x4C12:32
wolfspraulyou think it will show writes (corruptions)?12:32
wpwrakdunno. we don't understand the NOR corruption we've seen well enough yet.12:33
wpwrakbut i wouldn't be surprised if it did12:33
wolfspraulI hope not12:34
wpwrakalso, we don't know for sure if the data gets corrupted on read, on write, or both ways12:34
wolfspraulbut I think we never saw corruptions after the first rendering, which is still my hope that at that point we have a 'good' board12:34
wpwrakthe CFI may give us some clues. at least there, we have a bit of static information.12:35
wpwrakmay also be statistics ;)12:35
wpwrakmy rule of thumb is to do manual tests until i see something happen at least ~3 times. repeat 2-5 times to see if the frequency stays the same. then calculate the number of test cycles i need for sufficient probability of the thing happening. multiply with 10. then automate and let machines do what machines do best ;-)12:36
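The "calculate the number of test cycles" step of that rule of thumb can be written out: find the smallest n with 1 - (1 - p)^n at or above the desired confidence, then apply the x10 margin. A sketch assuming p is estimated as events/trials from the manual runs and that "sufficient probability" means 95%; the 3-in-30 numbers are illustrative, not from the log.

```python
import math


def cycles_needed(p_event: float, confidence: float, safety_factor: int = 10) -> int:
    """Smallest n with 1 - (1 - p)^n >= confidence, multiplied by a safety factor."""
    if not (0.0 < p_event < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_event and confidence must be strictly between 0 and 1")
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_event))
    return n * safety_factor


# e.g. the event was seen 3 times in 30 manual runs, so p is roughly 0.1
p_est = 3 / 30
print(cycles_needed(p_est, 0.95))  # 290
```

Here 29 cycles give a 95% chance of seeing the event at least once; the x10 margin turns that into 290 automated cycles.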
wpwrakCFI = an information structure in the flash. basically a set of parameters. with factory-defined (and a priori known to us) content.12:37
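Because the CFI table has known content, a read-back can be compared bit by bit against a known-good copy. Per the JEDEC CFI spec, in x16 mode the query is entered by writing 0x98 to word address 0x55, and words 0x10..0x12 hold "QRY". A sketch assuming the dump is already available as a list of 16-bit words; how to obtain it on the M1 is not shown, and the reference below is synthetic, not the real NOR's table.

```python
from typing import List, Tuple

CFI_QRY_OFFSET = 0x10          # word offset of the "QRY" string in CFI query mode
CFI_QRY = (0x51, 0x52, 0x59)   # ASCII 'Q', 'R', 'Y'


def has_qry_signature(dump: List[int]) -> bool:
    """Check the low byte of words 0x10..0x12 for the CFI "QRY" marker."""
    marker = dump[CFI_QRY_OFFSET:CFI_QRY_OFFSET + 3]
    return tuple(word & 0xFF for word in marker) == CFI_QRY


def diff_bits(reference: List[int], readback: List[int]) -> List[Tuple[int, int]]:
    """Return (word offset, data bit) for every bit where readback differs from reference."""
    if len(reference) != len(readback):
        raise ValueError("dumps must have the same length")
    flips = []
    for offset, (ref, got) in enumerate(zip(reference, readback)):
        delta = ref ^ got
        for bit in range(16):  # 16-bit (x16) data bus
            if (delta >> bit) & 1:
                flips.append((offset, bit))
    return flips


# Synthetic reference: only the QRY marker is filled in, everything else zero.
reference = [0x0000] * 0x20
reference[CFI_QRY_OFFSET:CFI_QRY_OFFSET + 3] = list(CFI_QRY)
readback = list(reference)
readback[0x11] |= 1 << 8  # simulate DQ8 reading high on one word

print(has_qry_signature(readback))     # True: the low byte still looks fine
print(diff_bits(reference, readback))  # [(17, 8)]
```

The signature check alone misses the upper-byte error; the full-word diff points straight at DQ8, which is the line under suspicion.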
lekernelwpwrak, reads go to the NOR12:37
wpwraklekernel: do reads happen "all the time" ? particularly in the tests adam does, do reads happen frequently throughout the test ? or maybe only at the beginning ?12:38
lekernelafter flickernoise has booted, they only happen when the system configuration is read and patches are read for compilation12:39
lekernelshould be only at the beginning12:39
lekernelcompiled patches are stored in the SDRAM12:39
wpwrakso this means that "n hours of rendering" wouldn't tell us of any gremlins on the NOR bus12:40
wpwrakpower cycling would, though12:45
wpwrakafk for ~30-60 min12:45
wpwrakkewl. fedex say there's nothing to pay for the M1 :)13:48
wpwrakthey'll deliver it tomorrow13:49
wpwraklekernel: have you ever used jtag for a boundary scan ? that would allow a more systematic examination of things than the current functional testing13:50
GitHub88[scripts] xiangfu force-pushed master from 09211c4 to afda277: http://git.io/DOsw5Q13:52
GitHub88[scripts/master] compile-lm32-rtems: update gcc to 4.5.3 - Xiangfu Liu13:52
xiangfuafter update gcc to 4.5.3 and use newlib 1.19.0 when compile rtems. I still get error "configure: error: missing define CLOCK_PROCESS_CPUTIME_ID" :(14:08
xiangfuI missed the newlib patch :(14:28
Fallenouxiangfu: use 1.21.1 newlib14:30
xiangfuthe newlib upload today: "newlib-1.19.0-rtems4.11-20110724.diff  24-Jul-2011 09:14  204K"14:30
xiangfuyes. trying the new newlib now.14:30
Fallenouoops sorry14:30
Fallenouwas thinking about binutils14:30
wolfspraulcalling it a day, n8 everybody14:31
xiangfugrep just found that CLOCK_PROCESS_CPUTIME_ID is in the newlib patches.14:31
wolfspraullet's not be too worried about 0x4C, soon we have a high quality rc3 result. I can sense it :-)14:31
Fallenoudamn it Ralf14:32
GitHub164[scripts] xiangfu pushed 1 new commit to master: http://git.io/eWPeRQ14:35
GitHub164[scripts/master] compile-lm32-rtems: update newlib patch - Xiangfu Liu14:35
xiangfuneeds another hour to build from scratch. then another hour to build flickernoise from scratch. compiling...14:42
Action: xiangfu's job is to keep the fan at 6000 RPM :)14:42
xiangfuFallenou, you saw the email from Ralf ? :)14:43
Fallenouhe's really pissing me off sometimes14:43
wpwrakFallenou: who is Ralf and what did he do ?14:56
lekernelwe should offer him this t-shirt for his birthday: http://images4.cpcache.com/product/retentive-monk-is+there+a+hyphen+in+anal-retentive%3F/142359664v4_225x225_Front.jpg14:57
FallenouRalf is a RTEMS maintainer/developer/guy14:57
Fallenouand he is just refusing some patches for obscure reasons sometimes14:57
Fallenouit's annoying14:57
kristianpaulwpwrak: http://www.rtems.org/pipermail/rtems-users/2011-August/008850.html14:58
Fallenoulekernel: LOL14:58
wpwraklekernel: ;-))14:59
lekernelxiangfu, Joel proposed that you compile some newlib code with and without the -mdivide-enabled/-mbarrel-shift-enabled and show that there is an optimization made by using those flags14:59
wpwraknice ;-)15:00
lekernelxiangfu, just take some source file in newlib, and compile it with -c and with/without the -m*-enabled flags, then disassemble with objdump15:00
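lekernel's suggested comparison can be scripted: compile one newlib file with and without the hardware-unit flags, disassemble both objects, and count calls to the libgcc software divide/shift helpers. A sketch under these assumptions: the `lm32-rtems4.11-` tool prefix (taken from the install path seen in this log), `-O2`, the helper names listed below, and the expectation that the flags make those calls disappear, which is what the comparison is meant to demonstrate and is not verified here.

```python
import re
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List

CROSS = "lm32-rtems4.11-"  # assumed cross-toolchain prefix
HW_FLAGS = ["-mdivide-enabled", "-mbarrel-shift-enabled"]
# libgcc helpers expected to be called when divider / barrel shifter are disabled
HELPERS = ("__divsi3", "__modsi3", "__udivsi3", "__umodsi3",
           "__ashlsi3", "__ashrsi3", "__lshrsi3")


def helper_calls(disassembly: str) -> Dict[str, int]:
    """Count whole-word mentions of each software helper in objdump -dr output."""
    counts = {}
    for name in HELPERS:
        n = len(re.findall(r"\b" + re.escape(name) + r"\b", disassembly))
        if n:
            counts[name] = n
    return counts


def disassemble(src: str, extra_flags: List[str]) -> str:
    """Compile one source file with -c and return its objdump -dr listing."""
    with tempfile.TemporaryDirectory() as tmp:
        obj = str(Path(tmp) / "out.o")
        subprocess.run([CROSS + "gcc", "-O2", "-c", src, "-o", obj, *extra_flags],
                       check=True)
        result = subprocess.run([CROSS + "objdump", "-dr", obj],
                                check=True, capture_output=True, text=True)
        return result.stdout


def compare(src: str) -> None:
    without = helper_calls(disassemble(src, []))
    with_hw = helper_calls(disassemble(src, HW_FLAGS))
    print(f"{src}: without flags {without}, with flags {with_hw}")


if __name__ == "__main__":
    if shutil.which(CROSS + "gcc") is None:
        print("lm32 cross toolchain not found in PATH")
    else:
        for path in sys.argv[1:]:
            compare(path)
```

The `-r` option makes objdump print relocations, so calls to external helpers show up by name even in an unlinked object file.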
wpwrakmaybe someone could convince him that multiplication is overrated, too :) and a barrel shifter. c'mon ! all this fancy new stuff.15:01
lekernelalso, this patch is upstream GCC now (but unfortunately, right now only in those 4.6 releases that do not work at all)15:01
xiangfulekernel, ok. I will do that. maybe tomorrow. very late today. and I need about 1 hour wait toolchain compile15:02
wpwrakso this ralf was quite right about "short-sighted" ;-)15:02
kristianpaulnew excuses to migrate to linux? :)15:03
wpwrakoh yes :)15:03
Fallenoulekernel: can't you back port the multilib patch to gcc upstream 4.5.3 ?15:05
Fallenoumaybe it would make Ralf stfu15:05
lekernelit's done15:05
lekernelshould be in 4.5.4, but it's not released yet15:05
kristianpauldoes openwrt implement a memtest app somewhere?..17:00
rohdunno. dont think so17:03
rohbut i guess you can use the bootloader for that17:03
rohuboot can do weird stuff sometimes17:03
kristianpaulcool, the memtester found in debian looks pretty portable so far..17:56
kristianpaulhum... /opt/rtems-4.11/libexec/gcc/lm32-rtems4.11/4.5.2/cc1: error while loading shared libraries: libmpc.so.2: cannot open shared object file: No such file or directory18:09
kristianpaulFallenou: you can run ISE on mac os?18:25
kristianpaullarsc: is it possible to use mmap with the current uclibc?18:45
larscshould be18:55
Fallenoukristianpaul: never tried to run ISE on mac sorry20:20
Fallenoudunno if it's even possible20:20
kristianpaulsure not, just wondering :)21:16
kristianpaulhum seems there is no mmap support in rtems either..21:22
kristianpaultoo much to ask, it seems it is arch dependent?21:34
GitHub114[milkymist] sbourdeauducq pushed 2 new commits to master: http://git.io/L9-4Dw21:49
GitHub114[milkymist/master] tools: flterm: add log - Xiangfu Liu21:49
GitHub114[milkymist/master] flterm: cosmetic changes + bump version number - Sebastien Bourdeauducq21:49
--- Thu Aug 25 201100:00
