#milkymist IRC log for Thursday, 2011-09-08

[00:52] <kristianpaul> (brochure) wow !!
[00:52] <kristianpaul> looks fresh
[00:54] <kristianpaul> hum, next brochure could include some small screenshot of flicernoise if there is space, i guess not much ..
[02:21] <wpwrak> standby is still happy (with the NOR locked)
[03:38] <qi-bot> The build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-openwrt.minimal-09082011-0443/
[05:38] <GitHub99> [scripts] xiangfu pushed 1 new commit to master: https://github.com/milkymist/scripts/commit/f721e37f4aba8b858cb605f19046576381f72282
[05:38] <GitHub99> [scripts/master] compile-flickernoise, use full path for MILKYMIST_GIT_DIR - Xiangfu Liu
[11:28] <wpwrak> step one: send the covert black ops ninjas to collect all voodoo dolls of me
[11:29] <wpwrak> step two: make sure to be outside the range of weapons commonly used to kill messengers
[11:29] <wpwrak> step three: bring on the news.
[11:31] <wpwrak> after < 4325 cycles (due to a bug in labsw, not every test cycle resulted in a full on/off cycle), the standby partition still heroically resisted all corruption
[11:32] <wpwrak> however, i found a single-word corruption in the flickernoise partition, causing a CRC error and subsequent boot failure. this seems to have happened in the 3704th cycle
[11:33] <wpwrak> http://pastebin.com/QwmQ1La7
[11:35] <wpwrak> details: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1rc3/norruption/LOG
[11:42] <lekernel> well... worst case we'll also lock this one in the field
[11:45] <lekernel> btw, there's an easy way to rule out software bugs. from JTAG, issue "pld reconfigure" and then the boot commands - instead of a full power cycle
[11:45] <wpwrak> i think what this really means is that nothing in the NOR is safe from corruption
[11:47] <wpwrak> ("soft-boot") ah, good idea. that would rule out power up/down issues.
[11:47] <wpwrak> (that is, if the problem strikes during this testing)
[11:48] <wpwrak> i suspect that it may be a combination of loss of power and NOR accesses
[11:50] <lekernel> that would be ok, because then there's an easy field fix: make sure you shutdown the M1 before cutting power
[11:55] <wpwrak> yes, i hope that works
[11:58] <wolfspraul> that's not a realistic 'fix', but it doesn't matter too much since the issue seems to be rare
[11:58] <wolfspraul> anyway it seems we are collecting very valuable data, great!
[11:59] <wolfspraul> maybe the rc4 reset circuit will fix the root cause? too early to tell now...
[12:01] <wpwrak> i need more data before being able to declare any test of some rc4 circuit a success. that's the problem here - if nothing happens, it could just mean that we were "lucky", but the problem is still there
[12:03] <wpwrak> with a few more data points however, it'll be possible to make a crude statistical model that allows to make statements about the probability of events, and what the absence of events in a certain sample size means
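A crude version of the statistical model described here can be sketched as follows. It assumes, purely for illustration, that every power cycle fails independently with the same small probability p (a binomial model); the log does not say which model was actually used. The sketch estimates p from observed failures, gives the chance of seeing zero failures in n further cycles if p is unchanged, and gives the largest p still consistent with a clean run (roughly 3/n at 95 % confidence, the "rule of three").

```python
import math


def estimate_rate(failures: int, cycles: int) -> float:
    """Maximum-likelihood estimate of the per-cycle failure probability."""
    return failures / cycles


def p_no_failures(p: float, n: int) -> float:
    """Probability of observing zero failures in n independent cycles."""
    return (1.0 - p) ** n


def upper_bound_after_clean_run(n: int, confidence: float = 0.95) -> float:
    """Largest p still consistent, at the given confidence, with n clean cycles.

    Solves (1 - p) ** n = 1 - confidence for p; this is about 3 / n at 95 %.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


def cycles_needed(p: float, confidence: float = 0.95) -> int:
    """Clean cycles needed before 'no failures' is unlikely to be mere luck,
    assuming the old failure rate p still applied."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

Taking the single failure seen around cycle 3704 as an illustrative rate, p_no_failures(1/3704, 1338) comes out near 0.7, so a clean run of 1338 cycles would be quite likely even if nothing had changed, while cycles_needed(1/3704) is roughly 11,000 cycles. That is the "we were just lucky" problem in numbers.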
[12:04] <wpwrak> anyway, for now i'm still in investigation mode - see what failure patterns are possible. only makes sense after that to hammer the M1 with a specific pattern.
[12:05] <GitHub125> [rtems-yaffs2] sebhub pushed 1 new commit to master: https://github.com/milkymist/rtems-yaffs2/commit/01ee204384ad4627189ef673b79086c04f74518d
[12:05] <GitHub125> [rtems-yaffs2/master] Flush during close (similar to yaffs_close()). - Sebastian Huber
[12:09] <lekernel> wolfspraul, many modern operating systems implement this "fix" you find unrealistic
[12:13] <wolfspraul> you can say this is a fix, but you will only embarass yourself
[12:13] <wolfspraul> so better is to not talk about it at all
[12:13] <wolfspraul> it's a very rare thing it seems
[12:13] <wolfspraul> 3700 cycles? :-)
[12:14] <wpwrak> there may be yet unknown parameters that affect the frequency
[12:14] <wolfspraul> yes sure
[12:14] <wolfspraul> first we need to learn more, those are excellent results
[12:14] <wolfspraul> is there a button-press to shutdown?
[12:15] <wpwrak> with enough analysis, i can probably make it happen on each try. but then, i don't think i'm quite persistent enough for that ;-))
[12:27] <wolfspraul> maybe we can just focus on the rc4 fix and proove that fixes it entirely?
[12:29] <lekernel> wolfspraul, yes.... hold middle pushbutton in FN and it shuts down
[12:32] <wpwrak> wolfspraul: i agree with the general direction. but ... first more data is needed to make sure any "proof" we attempt actually proves anything. right now, my statistics are based on a whole three events. this is scarcely more than a proof of existence :)
[12:33] <wpwrak> of course, i'm a little optimistic here - if the rc4 fix doesn't work either, then it may not take all that much preparation to prove that it doesn't work.
[12:33] <wolfspraul> hold middle button, ok. we can start to always talk about that when people ask how to turn off the device.
[12:40] <wpwrak> one thing that's interesting is that all these corruptions seem to affect only data that's actually been written. i haven't seen one change the 0xffff ... unused end of a partition yet
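One way to check this observation systematically is to compare a reference image with a readback dump word by word and record, for each mismatch, which bits went 1 to 0 and which went 0 to 1, and whether the word was supposed to be erased (0xFFFF). On NOR flash a spurious program operation can only clear bits; setting bits back requires a block erase. The sketch below is a hypothetical helper, not part of the scripts referenced in the log, and it assumes big-endian 16-bit words (the core is called norflash16; the byte order is an assumption).

```python
from typing import Iterator, NamedTuple

ERASED = 0xFFFF  # value of an erased 16-bit NOR word


class Corruption(NamedTuple):
    offset: int    # byte offset of the word within the image
    expected: int  # word in the reference image
    actual: int    # word read back from the flash
    cleared: int   # bits that went 1 -> 0 (what a spurious program could do)
    set_bits: int  # bits that went 0 -> 1 (needs an erase, or a read error)

    @property
    def in_erased_area(self) -> bool:
        """True if the corrupted word should have been erased (unused area)."""
        return self.expected == ERASED


def find_corruptions(expected: bytes, actual: bytes) -> Iterator[Corruption]:
    """Yield every 16-bit word that differs between reference and readback."""
    if len(expected) != len(actual):
        raise ValueError("images must have the same length")
    if len(expected) % 2:
        raise ValueError("images must contain whole 16-bit words")
    for off in range(0, len(expected), 2):
        e = int.from_bytes(expected[off:off + 2], "big")
        a = int.from_bytes(actual[off:off + 2], "big")
        if e != a:
            yield Corruption(off, e, a, e & ~a & 0xFFFF, a & ~e & 0xFFFF)
```

For example, comparing bytes.fromhex("12345678ffff") with bytes.fromhex("12305678ffff") reports a single corruption at offset 0 with cleared == 0x0004 and set_bits == 0, outside the erased area. Running it over all partitions after each failure would show whether erased regions ever get hit.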
[12:41] <wpwrak> lekernel: do RTEMS/FN normally read anything from the standby partition ? e.g., some system constants or such ?
[12:41] <lekernel> no
[12:43] <wpwrak> hmm. maybe it's all just coincidence then.
[16:33] <kristianpaul> well,if FN dont need write to NOR why not disable this support in the norflash16 core?
[16:35] <kristianpaul> "/* register only when needed to reduce EMI */" what problems were encountered when developing this core?
[16:42] <qi-bot> The build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-openwrt.minimal-09082011-1746/
[17:37] <wpwrak> kristianpaul: where is that EMI thing ?
[17:38] <wpwrak> kristianpaul: (disable writes) i think it may be a condition where the FPGA doesn't actually intentionally command a write. that may happen only in the confused state in which it ends up when powering down. (well, if the power-down ramp theory is correct)
[17:43] <wpwrak> evidence implicating power cycling mounts: 1338 cycles (with NOR unlocked) and standby is still healthy
[18:06] <kristianpaul> wpwrak: milkymist/cores/norflash16/rtl/norflash16.v line 58
[18:06] <kristianpaul> confused state :)
[18:06] <wpwrak> do we have something like a "NOR poke" in urjtag, BIOS, or RTEMS, that is known to work if the NOR is unlocked and known to fail (or show incorrect readback) if the NOR is locked ?
[18:08] <wpwrak> let's see if i have the prefix somewhere ...
[18:09] <wpwrak> ah, simply https://github.com
[18:09] <wpwrak> nope. 404
[18:10] <wpwrak> ah, that's the source treee
[18:10] <wpwrak> okay, got it
[18:11] <wpwrak> maybe lekernel once suspected the NOR corruption could be caused by EMI ? (if that code is from him)
[18:11] <kristianpaul> or mwalle ? :)
[18:14] <lekernel> no, it's just to avoid unnecessary toggling of external FPGA signals whenever there's system bus activity
[18:14] <wpwrak> or mwalle. or maybe the code is inspired by something else
[18:14] <wpwrak> aaah !
[18:14] <lekernel> especially since the system bus toggles much faster than the flash can handle
[18:15] <wpwrak> what does the "register only when needed to reduce EMI" mean then ?
[18:15] <wpwrak> if sounds like #ifdef HAVE_EMI_TROUBLE
[18:16] <wpwrak> s/if/it/
[18:16] <wpwrak> from your description, it seems that you'd always want to use this glitch avoidance, the contrary of what the comment suggests
[18:18] <wpwrak> about the flyer, page 2: it lists many fancy features but it doesn't actually say that you can also just connect audio "line in" :)
[18:18] <wpwrak> only page 3 has it
[18:19] <lekernel> it's not a glitch problem, the system bus signals remain constant when the flash is being read
[18:20] <lekernel> but when it's not, there would be all sort of sorts on the flash address lines, causing completely unnecessary EMI and power consumption
[18:21] <lekernel> s/sorts/signals
[18:21] <wpwrak> yes, that's what i mean. glitches that don't affect principal functionality but that are undesirable nevertheless
[18:21] <wpwrak> "register only when needed to reduce EMI" sounds as if it was something you normally don't want to enable.
[18:22] <wpwrak> so there seems to be a bit of a contradiction :)
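The point about registering the flash address can be illustrated with a behavioral model that counts bit transitions on the flash address pins, a rough proxy for switching power and EMI. This is a hypothetical Python model, not the actual norflash16.v logic: "unregistered" pins follow the system bus address on every bus cycle, while "registered" pins are latched only when the flash itself is accessed. The 24-bit address width is an assumption.

```python
from typing import Iterable, Tuple

ADDR_MASK = (1 << 24) - 1  # hypothetical 24-bit flash address bus


def pin_toggles(bus: Iterable[Tuple[int, bool]], registered: bool) -> int:
    """Count bit transitions on the flash address pins.

    bus: sequence of (address, is_flash_access) system bus cycles.
    registered=False: pins follow the bus address on every cycle.
    registered=True:  pins are updated only when the flash is accessed.
    """
    pins = 0
    toggles = 0
    for addr, is_flash in bus:
        if is_flash or not registered:
            new = addr & ADDR_MASK
            toggles += bin(pins ^ new).count("1")  # bits that change this cycle
            pins = new
    return toggles


# One flash read followed by a burst of unrelated (e.g. SDRAM) bus traffic.
trace = [(0x000100, True)] + [(0x400000 + 4 * i, False) for i in range(1000)]
print(pin_toggles(trace, registered=False), pin_toggles(trace, registered=True))
```

In this trace the registered variant toggles a single pin once, while the unregistered one toggles pins on every one of the 1000 non-flash cycles, which is the unnecessary activity described above.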
[18:22] <kristianpaul> boards/milkymist-one/rtl/system.v line 233, how this flash reset goes with the ic reset recently added?
[18:22] <wpwrak> regarding the flyer, page 3. maybe put a line break in the "USB ports" box, before "You can even write {...]" ?
[18:25] <wpwrak> "to stimulate your guests" why do i have that fuzzy mental image of someone either handing out little pills or fondling people's genitals ? ;-)
[18:33] <stekern> both sounds like a great party though ;)
[18:37] <wpwrak> (sorry, didn't have time yesterday for a closer look at the flyer. we had a rather interesting owner's meeting of my building last night. they're fighting some dirty mobbing war over the position of administrator. that's been going on for months. at the meeting, it came to blows, leading to its cancellation and postponement of further deliberation. the story is slowly approaching movie-grade levels of interestingness :)
[19:03] <kristianpaul> ah nv, this flash release delays is more for the soc than the flash it self
[21:44] <mwalle> ho
[21:44] <mwalle> mh too much backlog ;)
[21:44] <wpwrak> the curse of returning from vacations :)
[21:45] <mwalle> already worked for almost one week now again ;)
[21:45] <wpwrak> and only now you found the courage to even contemplate the backlog. qed ;-)
[21:46] <mwalle> haha ;)
[21:47] <mwalle> so was that writepld working?
[21:48] <wpwrak> writereg ? only partially. for reconfiguring a specific bitstream, i has to go a  pld load. there, the stuff surrounding the writes is right.
[21:48] <mwalle> err writereg.. ;) write pld is sth from my work
[21:48] <wpwrak> the stuff is here: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/m1/jtag-boot
[21:51] <wpwrak> what i found is that urj_tap_reset_bypass before the urj_tap_chain_flush is essential
[21:51] <mwalle> ah so you are loading a bitstream, which causes a reconfiguration with a specific address?
[21:52] <wpwrak> yup. still need to teach the script a few tricks, such as selecting which bitstream to load (standby / recovery / regular). but that's trivial ;-)
[21:53] <mwalle> nice, sth like pld reconfigure [address] would be handy too ;)
[21:54] <wpwrak> yeah, that would be a nice extrapolation
[21:54] <wpwrak> not sure how general it would be, though
[21:55] <mwalle> mh?
[21:56] <wpwrak> anyway, this thing works quite well. the only anomalies i found were after flashing the NOR (without power-cycling between flashing and trying to reconfigure). not sure if it was really not working or if i was just clumsy.
[21:56] <wpwrak> (how general) i mean that other chips than the x6 may have slightly different sequences
[21:58] <mwalle> yeah
[22:02] <mwalle> at least for xc6s devices, your commands could be used with jtag the same way
[22:03] <mwalle> Hacked 2001 by Werner Almesberger << 2001?
[22:03] <wpwrak> argh
[22:04] <wpwrak> you know that you've been around too long when such things don't even look wrong
[22:04] <mwalle> hehe
[22:08] <wpwrak> fixed :)
[22:35] <wpwrak> btw, is there some kind of peek/poke command (in urtag/BIOS/RTEMS) i could use to test if the NOR is really not locked ?
[22:40] <wpwrak> ah, urtag peek/poke :)
[22:43] <wpwrak> *hmm*
[22:44] <wpwrak> does this make sense ? http://pastebin.com/PtmBqG2Q
[22:45] <wpwrak> power-cycles. let's have another try ...
[22:46] <wpwrak> seems that poke doesn't know about flash
[22:59] <wpwrak> okay. poke 0 0x40 0 <data> works
[23:01] <wpwrak> and Read Array (0xff) would probably restore reads to normality as well
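The poke sequence above can be read as two bus writes to the same address: the word-program setup command 0x40, then the data word. Afterwards the chip returns its status register on reads until Read Array (0xFF) is issued. The toy model below assumes the M1's NOR follows the Intel/Numonyx command set (which the 0x40 and 0xFF commands suggest); the class and its behavior are hypothetical simplifications, not a model of the real chip's timing or of urjtag.

```python
class ToyNor:
    """Hypothetical toy model of a 16-bit NOR block using the Intel command set."""

    def __init__(self, size_words: int, locked: bool = False) -> None:
        self.mem = [0xFFFF] * size_words  # erased words read as 0xFFFF
        self.locked = locked
        self.status = 0x80                # SR.7 set: device ready
        self.mode = "array"               # "array", "status" or "program-setup"

    def write(self, addr: int, value: int) -> None:
        if self.mode == "program-setup":
            # Second cycle of a word program: value is the data word.
            if self.locked:
                self.status |= 0x12       # SR.4 program error, SR.1 block locked
            else:
                self.mem[addr] &= value   # programming can only clear bits
            self.mode = "status"          # reads now return the status register
        elif value == 0x40:
            self.mode = "program-setup"   # word program setup
        elif value == 0xFF:
            self.mode = "array"           # Read Array: back to normal reads
        elif value == 0x50:
            self.status = 0x80            # Clear Status: error bits are sticky until then

    def read(self, addr: int) -> int:
        return self.status if self.mode == "status" else self.mem[addr]


# Equivalent of "poke 0 0x40 0 <data>" followed by Read Array.
nor = ToyNor(16)
nor.write(0, 0x40)
nor.write(0, 0x1234)
print(hex(nor.read(0)))  # status register
nor.write(0, 0xFF)
print(hex(nor.read(0)))  # programmed data
```

Constructing the model with locked=True makes the same sequence leave the array untouched and report status 0x92 instead of 0x80.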
[23:01] <wpwrak> good. verified that the NOR was indeed unlocked.
[23:04] <wpwrak> trying locking ... got status 0x92. all very satisfying.
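The status value can be decoded bit by bit, again assuming the Intel/Numonyx command-set status register (the bit names below follow that convention; the actual chip's datasheet is authoritative). Bit 0 is device-specific and is ignored here.

```python
from typing import List

# Status register bits of the Intel/Numonyx NOR command set (assumed here).
SR_BITS = {
    7: "device ready",
    6: "erase suspended",
    5: "erase error",
    4: "program error",
    3: "VPP low",
    2: "program suspended",
    1: "operation attempted on locked block",
}


def decode_status(sr: int) -> List[str]:
    """Return the names of the set status bits, most significant first."""
    return [name for bit, name in sorted(SR_BITS.items(), reverse=True)
            if sr & (1 << bit)]


print(decode_status(0x92))
```

Under that assumption, 0x92 is SR.7 + SR.4 + SR.1: the device finished, the program failed, and the reason was a locked block. That is the expected outcome of a program attempt after locking, and it matches the conclusion that locking worked.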
[23:26] <wpwrak> hmm, how good is everyone's confidence in M1 surviving brief interruptions of power on/off ?
[00:00] --- Fri Sep 9 2011
