#milkymist IRC log for Wednesday, 2011-08-17

aw0x32: fix2b, stopped @ 'Bitstream length: 1484404' while reflashing...01:33
aw0x32: tp36 - 690mV, tp37 - 793mV01:35
awthe voltage is not at correct Low or High, i am going to power off01:36
awvoltage of tp36, tp37 is the same. if the first flash never succeeded before, it seems to keep stopping at 'length: 1484404'...go for another board.01:39
GitHub120[scripts] xiangfu pushed 1 new commit to master: http://bit.ly/q2vsz701:41
GitHub120[scripts/master] add debug all to jtag - Xiangfu Liu01:41
xiangfuaw, Hi01:41
awxiangfu, hi, any news?01:41
wolfspraulaw: let's look at one more board01:42
wolfspraulwith fix2b01:42
wolfspraulalthough I think we already know it's not the magic solution yet01:42
wolfspraulif wpwrak is here we can look into 0x32 more, otherwise fix other bugs first, don't apply fix2b to a lot of boards until we know more01:43
wolfspraulaw: on 0x32, have you checked D16?01:43
xiangfuaw, I just update the reflash_m1.sh under for-rc3 folder, here: http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh01:43
GitHub21[scripts] xiangfu force-pushed master from aa59a1a to dbd0372: http://bit.ly/nGHAhd01:44
GitHub21[scripts/master] add debug all to jtag - Xiangfu Liu01:44
xiangfuaw, it just enables the 'debug all' option to output more info for us to debug.01:44
awwolfspraul, yes, checked on D16...i found some interesting difference, second...will know soon to compare to the good one (0x39). ;-)01:47
awxiangfu, so with that 'debug all' with default ? , or i need to enable it?01:48
wolfspraulwe don't need that right now [debug all]01:48
awwolfspraul, wait01:48
awxiangfu, if this is just add 'debug all' option with default while I run it, i think that i can use it, why not?01:49
xiangfuaw, http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh  just updated, it is default.01:49
wolfspraulit will not help with our problems01:50
aw0x39: D16 (in-circuit); forwarding voltage - 152mV, reversing voltage - 1545mV01:54
aw0x32: D16 (in-circuit); forwarding voltage - 153mV, reversing voltage - 1114mV01:54
awso I am going to replace D16 with a new one first to see if this is the problem01:55
wolfspraulaw: I just noticed 0x32 is a board that never rendered before02:11
wolfspraulthat could be a different problem...02:11
awwolfspraul, yes02:11
awit seems to be a different category of failure02:12
wolfspraulwhat's your latest results with 0x32 now?02:12
awbtw: a good new diode(off-circuit): forwarding voltage - 153mV, reversing voltage - no voltage can be measured02:13
wolfspraulyou replace D16 on 0x32 with a new one?02:15
awnow 0x32: D16 (in-circuit) reversing voltage is 1114mV, but I put a new diode on the board and it got 886mV, it must be the program_b loop that lets the reversing voltage drop a bit.02:15
wolfsprauldoes flashing work?02:16
awalso the replaced new D16, I tried to take apart and measure its reversing voltage is still good02:16
awwolfspraul, i didn't do reflashing02:16
awtry again...if still can't reflashing ...leave it apart then02:17
wolfspraullet's look at 0x34 now02:17
awmaybe just another failure classification02:18
wolfspraulthat one rendered before02:18
wolfspraulwhat is TP36/TP37 on 0x32 now?02:18
awtp36 - 770mV, tp37 - 838mV , wrong02:19
wolfspraultry 0x3402:20
awyes, of course it can't reflashing and stop at 148440402:20
wolfspraulno need to test flashing with those tp36/tp37 values02:21
awi think this is good evidence. ;-)02:22
wolfspraulmakes no sense. I trust the tp36/tp37 values we measure.02:23
aw0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV02:23
wolfspraulonce we have hard data, let's use it02:23
wolfspraulaw: one by one02:23
wolfspraulcan we measure meaningful data in-circuit or not?02:23
wolfspraulif not, let's stop doing it02:24
awwait wait02:24
wolfspraulif yes, those values mean the diode is damaged and needs to be replaced?02:24
awlet's test more boards and we see how reasonable For. & Rev. voltage they would be.02:24
wolfspraulI don't want all sorts of random data02:25
wolfspraulthat's a bad time waste02:25
wolfspraulaw: is the data meaningful?02:25
awnow 0x34 can reconfigure surely02:25
wolfsprauldid you apply fix2b to 0x34 already?02:25
awyou are like baby-watching though. ;-) no problems02:26
wolfspraulyes, sorry. try to understand the test data ;-)02:26
awit must somewhere let in-circuit voltage gets low (without power on)02:26
GitHub15[scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd02:26
GitHub15[scripts/master] add debug all to jtag - Xiangfu Liu02:26
awso later we consult with Werner, he may provide more details to us maybe. ;-)02:27
wolfspraulaw: you talk about measuring D16 performance in-circuit?02:27
awFor. & Rev. voltage measured before power-ed -on but in-circuit.02:28
wolfspraulalright, back to 0x3402:28
wolfspraulso it is booting now?02:28
wolfspraulI think you should reflash (reflash_m1.sh), and re-run all tests and rendering cycles (10)02:29
awnow 0x32 has worse Rev. voltage (below 1545mV), this means somewhere others influence D16's specification/behavior02:29
awnow to reflashing. ;-)02:29
wolfspraulsee how it goes...02:30
wolfspraulaw: maybe the reset ic on 0x32 has a problem?02:30
awxiangfu, wow..man! your debug log msg is many..let's see...reflashing now...;-)02:31
xiangfuaw, to disable it, just remove the whole "debug all" line, just fyi02:31
wolfspraulI was worried about that. I hope your terminal history is enough. You may have to increase it so that we don't lose data.02:32
wolfspraulxiangfu: maybe it should be disabled by default02:32
wolfspraulwe are currently (as of right now) not aware of any problem that 'debug all' may help us with02:32
wolfspraulso we can enable it when we run into such a problem02:32
awwolfspraul, not enough to show history. ;-)02:32
wolfspraulyeah, well02:32
xiangfuthis commit disables it by default: GitHub15> [scripts] xiangfu force-pushed master from dbd0372 to b9585d9: http://bit.ly/nGHAhd02:33
awno problem, just try first one. ;-)02:33
awthen i remove "debug all" line. :)02:34
awmsg log stops at http://pastebin.com/M1ezi1AG02:38
awbut m1 led still flashs, so it still in flashing i think...let's wait one more minutes02:39
awxiangfu, if its log is wrong, directly tell me.02:40
aw@ full speed, it seems to need much more time...good now led2 and led3 are still flashing...let's see02:41
wolfspraulfirst of all, we turn off debug all02:43
wolfspraulit's a bad idea to turn it on all the time02:43
wolfspraulfull-speed should not matter much but I'm guessing, I can try here to set a baseline if that helps02:43
wolfspraulif your log is screwed up now, you should redo the flashing02:44
wolfspraulnot "wait for some minutes"02:44
wolfspraulthat sounds wrong02:44
awleds are still flashing...;-)02:44
wolfspraulwhat does urjtag do on your notebook now?02:44
wolfspraulstill logging something?02:44
awyes still logging output02:45
awstop it anyway?02:45
wolfspraulthe crazy 'debug all' output?02:45
aw:-) don't know then02:45
wolfspraulyes, stop and redo02:46
wolfspraulwithout 'debug all'02:46
awredo now...02:47
aw0x34 good now...at least not stop at crazy 'length: 1484404' ;-)02:49
awgood news: finished reflashing successfully.02:51
wolfspraulxiangfu: can reflash_m1.sh log stdout and stderr into a file?02:52
wolfspraulAdam could run it with redirection too like > urjtag_0x32.log 2>&102:53
awcrc checked okay02:53
awnow go to rendering for 10 times02:53
wolfspraulaw: wait02:53
wolfspraulone idea for when you run reflash_m1.sh02:53
wolfspraulso right now you just execute "./reflash_m1.sh", right?02:53
awsudo ./reflash_m1.sh 00 3402:54
wolfspraulbut you can run "sudo ./reflash_m1.sh 00 34 >> urjtag_0x32.log 2>&1"02:54
awlistening..and standby02:54
wolfspraulthe >> should append to that log file, so even if you run multiple times it will be added to the end of the log file02:54
wolfsprauland the 2>&1 will redirect error messages into the log file as well02:55
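The redirection described above can be tried without a board attached. This is a minimal sketch, assuming a stand-in function `fake_reflash` (hypothetical, not part of reflash_m1.sh) that writes one line to stdout and one to stderr:

```shell
#!/bin/sh
# Stand-in for reflash_m1.sh (hypothetical): one line on stdout, one on stderr.
fake_reflash() {
    echo "stdout: flashing board $1"
    echo "stderr: warning for board $1" >&2
}

log=$(mktemp)

# '>>' appends to the log; '2>&1' sends stderr to the same place as stdout.
fake_reflash 34 >> "$log" 2>&1
fake_reflash 34 >> "$log" 2>&1   # second run lands at the end, nothing is overwritten

line_count=$(wc -l < "$log" | tr -d ' ')
echo "log now holds $line_count lines"
```

With a single `>` instead of `>>`, the second run would truncate the file and only the last run would survive.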
xiangfuwolfspraul, yes.02:55
wolfspraulxiangfu: does this work? can you try?02:55
wolfspraulsometimes there are issues with redirecting bash scripts and sub-processes...02:55
xiangfuworks just fine.02:55
wolfspraulwell, then we should tell Adam02:55
wolfspraulit helps him02:55
wolfspraulaw: next time you reflash a board, try that02:56
wolfspraulfor example for 0x34, it would be:02:56
wolfspraulsudo ./reflash_m1.sh 00 34 >> urjtag_0x34.log 2>&102:56
awwolfspraul, do you mean that if everytime I run the same commands above, the message will be "added" increasingly to .log file?02:56
wolfspraulyes correct02:56
wolfspraulso you only need to watch the board number02:57
wolfspraulso you don't write into the wrong log file02:57
wolfspraulyou can collect all log files on your disk, and upload to the downloads server later02:57
wolfspraulalso saves time02:57
wolfsprauljust remember two things:02:57
awokay...good so that i dont need to do such stupid work "copy" and "paste" from terminal. ;-)02:57
wolfspraul1. use >> (two characters, not one)02:57
wolfspraul2. always use the same board number in the reflash_m1.sh parameter and the name of the log file02:57
awtry now...second02:58
wolfspraulxiangfu: can you try that this really works?02:58
wolfspraulI don't have my m1 here right now02:58
xiangfuwolfspraul, yes. I am running that now. only the output will be a little confusing since there are ^M when eraseflash. but it's ok02:59
wolfsprauldon't understand02:59
wolfspraulwhere does the ^M come from, and where is it written to?03:00
wolfspraulthe log file?03:00
xiangfuwhen you open the log file with VIM or Emacs it will be a little confusing, but opening it with 'gedit' will be ok.03:02
wolfspraulok so it goes into the log file - good03:02
wolfsprauland where does it come from?03:02
wolfsprauland why?03:02
awafter I types that commands above, the terminal doesn't show msg log, it should write directly into .log file03:02
wolfspraulmaybe remove it?03:02
wolfspraulaw: yes, the terminal will show nothing03:02
wolfspraulthat's a little unfortunate if you run into an error03:03
wolfspraulbut eventually reflash_m1.sh will stop03:03
wolfsprauland then you can look in the log file03:03
awcan it show also msg log in the terminal? so i can see how it goes on..03:03
xiangfuwolfspraul, when eraseflash it output like: (0% Completed) FLASH Block 0 : Unlocking ... Erasing ... Ok.03:03
xiangfuwolfspraul, then ^M not '\n' %1 .. %203:03
wolfspraulalright, don't know whether that's the best/right but no time now :-)03:04
wolfspraulxiangfu: one thing we could do is this:03:04
xiangfuwolfspraul, so when you open with VIM there will be a BIG line. 0% --> 100% but this is ok in gedit03:04
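The long line comes from progress updates that end in a carriage return (^M) instead of a newline. One way to make such a log readable in any editor is to convert the carriage returns afterwards. This is a minimal sketch with simulated progress text, not the real eraseflash output:

```shell
#!/bin/sh
# Simulate progress output that rewrites one line using carriage returns (^M).
raw=$(mktemp)
printf '(0%% Completed)\r(50%% Completed)\r(100%% Completed)\n' > "$raw"

# Turn every carriage return into a newline so each update gets its own line.
readable=$(mktemp)
tr '\r' '\n' < "$raw" > "$readable"

raw_lines=$(wc -l < "$raw" | tr -d ' ')
readable_lines=$(wc -l < "$readable" | tr -d ' ')
echo "raw: $raw_lines line(s), converted: $readable_lines line(s)"
```

The original log stays untouched, so the conversion can be repeated or skipped as needed.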
xiangfuaw, you can use : /reflash_m1_rc3.sh 00 2a 2>&1 | tee >> log03:04
wolfspraulthe reflash_m1.sh makes the stdout/stderr redirection inside the script, into the file, and also shows it on its own stdout/stderr03:04
xiangfuaw, then you will get output both under terminal and log03:04
wolfspraulgood idea03:05
wolfspraulbut let's be more precise please03:05
wolfspraulsudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log03:05
wolfspraulwhy is there a special _rc3.sh btw?03:06
wolfspraulaw: can you try that new line?03:06
wolfsprauljust reflash again with that line: sudo ./reflash_m1.sh 00 34 2>&1 | tee >> urjtag_0x34.log03:07
awxiangfu, is it the newest one: http://milkymist.org/updates/2011-07-13/for-rc3/reflash_m1.sh03:07
wolfspraulman I hope we have only one script03:07
wolfspraulaw: don't change anything with your script now, it worked before03:08
wolfsprauldon't touch it03:08
wolfsprauljust try the new line and add: 2>&1 | tee >> urjtag_0x34.log03:08
awi am asking that script if it's with default settings and newest?03:09
wolfsprauldon't touch your script03:09
awi want to download it again. ;-)03:09
wolfspraulit worked before, it works now03:09
wolfspraulyou can only get new bugs :-)03:09
wolfspraulaw: let's try the new line, and add: 2>&1 | tee >> urjtag_0x34.log03:10
awterminal doesn't show up msg. :(03:13
awnot parallel , so that i don't see anything in time.03:14
wolfspraulmaybe a side-effect of sudo?03:14
wolfspraulof course it's not properly tested before, sorry about that03:14
awdon't know03:15
awforget about this now. ;-)03:15
awjust copy and paste03:15
wolfspraulwait one moment03:15
wolfspraulaw: so you flashed the board?03:16
wolfsprauland there was no output in the terminal?03:16
wolfspraulor all at the end?03:16
awone point: a good command can let me stop anytime and it can still write into .log file and also shows up them in terminal though. I hope . ;-)03:17
wolfspraulyes sure it's easy. just needs to be properly tested and done.03:17
awyes, no msg shows up in terminal with commands above03:17
awnow reflashing is done03:17
awused gedit to open .log file, it's okay03:18
wolfspraullet's try one more random idea03:19
wolfspraulif this doesn't work, then we need to get this right first, then talk to you :-)03:19
wolfspraulbut one more, here it is:03:19
wolfspraulah wait03:20
wolfspraulxiangfu's line was wrong03:21
wolfspraultry this:03:21
wolfspraulsudo ./reflash_m1.sh 00 34 2>&1 | tee -a urjtag_0x34.log03:22
wolfspraulxiangfu: don't you think tee >> log is wrong? Adam needs tee -a log03:22
xiangfuwolfspraul, both are ok. I have tested. with >>03:23
awmm..now terminal shows msg. ;-)03:23
xiangfubut yes. sounds like -a is better03:23
wolfspraulAdam wants to see the output03:24
awbtw. can i use "ctrl + C" while reflashing..if I see reflashing stops03:24
wolfspraulso if he uses >>, then the tee output is gone (no file parameter for tee)03:24
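The difference between the two variants can be checked with a harmless command in place of the flashing script. In this minimal sketch, command substitution stands in for "what the terminal would show":

```shell
#!/bin/sh
log_a=$(mktemp)
log_b=$(mktemp)

# Variant 1: tee has no file argument and its stdout is redirected into the
# log, so nothing is left over for the terminal.
seen_a=$(echo "flashing" | tee >> "$log_a")

# Variant 2: tee -a appends to the log and still copies the text to stdout.
seen_b=$(echo "flashing" | tee -a "$log_b")

echo "variant 1 showed: '$seen_a', log holds: '$(cat "$log_a")'"
echo "variant 2 showed: '$seen_b', log holds: '$(cat "$log_b")'"
```

Both variants fill the log, which is why both looked fine when only the file was checked; only `tee -a` also keeps the live output.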
wolfspraulaw: yes you can use ctrl-c, no hesitation03:24
awand it can still write into log file, which won't be interrupted by my CTRL + C?03:24
wolfspraulsure, it will all interrupt, like before03:25
wolfspraulbut the log is written03:25
wolfspraulthe log is always safe, no worries03:25
wolfspraulyou cannot lose anything in the log03:25
wolfsprauljust remember the syntax of the line03:25
wolfspraul2>&1 | tee -a urjtag_0x34.log03:25
wolfspraulthat will always append to the log, perfect for our use03:25
awyes, i recorded into my file already. ;-)03:26
wolfspraulof course you need to make sure the filename has the correct board number03:26
wolfspraulso whenever you work on a particular board, you add to the log file for that board03:26
wolfspraulthen upload all log files to the downloads server03:26
wolfspraulback to 0x34 :-)03:27
wolfspraulkeep us posted03:27
awyou can go to the server folder to see log file when you back. :)03:27
wolfspraulthe bigger problem is what we saw on 0x3203:27
wolfspraulbut let's finish 0x34 now03:28
awreflashed done again03:28
awlet's test it03:28
wolfspraulafter that is 0x39, also good (rendered before)03:28
wolfspraulbut 0x3A did not render before03:28
wolfspraulanyway one by one03:28
wolfspraulit's tough to mix design uncertainties with production surprises...03:29
wolfspraulbut we get through it03:29
awhow about "flterm --port /dev/ttyUSB0 --kernel boot.bin"?03:31
awcan it be added ">>" to log file too?03:31
awi still use stupid copy/paste method. :)03:32
wolfspraulyou can add 2>&1 | tee -a log_file03:33
wolfspraulI think we should write the urjtag and flterm into the same log file03:34
wolfspraulso let's give it another name03:34
wolfspraulfor example rc3_0x34.log03:34
wolfspraulso that would be:03:34
wolfspraul1. sudo ./reflash_m1.sh 00 34 2>&1 | tee -a rc3_0x34.log03:34
awsince I'll test 10 times, so the log will be longer03:35
wolfspraul2. flterm --port /dev/ttyUSB0 --kernel boot.bin 2>&1 | tee -a rc3_0x34.log03:35
awtry now03:35
wolfsprauleven if you run another script like read_flash_m1.sh, you can append to the same log file03:36
xiangfuflterm is different03:36
wolfspraulread_flash_m1.sh 2>&1 | tee -a rc3_0x34.log03:36
wolfspraulxiangfu: alright, what works?03:36
awhmm...seems 'flterm' doesn't accept other parameters .:(03:37
wolfsprauldo copy/paste for now03:40
awit wrote logs as: http://pastebin.com/R0uFzSN903:40
xiangfuwolfspraul, it needs modify the flterm source code for log03:40
wolfspraulbut you can already use the name rc3_0x34.log when running reflash_m1.sh03:40
wolfspraulit's a better name03:40
awnot fully all msg saved into log file while using 'flterm'03:41
awsure sure03:41
wolfspraulxiangfu: or we need to find a terminal program that supports logging/stdout somehow03:41
wolfsprauladam needs practical solutions now. which is copy/paste for flterm03:41
wolfsprauland the tee thing for reflash_m1.sh03:41
wolfspraulxiangfu: if you can find an easy solution for terminal logging, tell us :-)03:41
xiangfuaw, you can wrap the reflash_m1.sh to another script file like:03:45
xiangfumkdir -p log03:45
xiangfu./reflash_m1.sh $1 $2 2>&1 | tee -a log/urjtag_$2.log03:45
xiangfuthen you will not worry about the log name.03:45
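The wrapper idea can be sketched with an argument check added, so that a missing board number does not produce a log called `urjtag_.log`. The names `reflash_logged`, `REFLASH` and `LOG_DIR` are illustrative and not part of the original scripts:

```shell
#!/bin/sh
# Sketch of a logging wrapper around reflash_m1.sh.
# REFLASH and LOG_DIR are illustrative variables, not part of the original script.
REFLASH=${REFLASH:-./reflash_m1.sh}
LOG_DIR=${LOG_DIR:-log}

reflash_logged() {
    # Expect exactly two arguments, as in "reflash_m1.sh 00 34".
    if [ "$#" -ne 2 ]; then
        echo "usage: reflash_logged <first-arg> <board-number>" >&2
        return 2
    fi
    mkdir -p "$LOG_DIR" || return 1
    # The log name is derived from the board number, so the two cannot disagree.
    "$REFLASH" "$1" "$2" 2>&1 | tee -a "$LOG_DIR/urjtag_$2.log"
}
```

Calling `reflash_logged 00 34` would then append to `log/urjtag_34.log` while still showing the output. Note that the exit status of the pipeline is that of `tee`, not of the flashing script.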
awgood solutions! thanks.03:46
aw0x34 rendering pass04:24
wolfspraul0x39 now?04:35
aw0x39 I wrote reflash log again. will upload04:41
awrework 0x3a now04:42
wolfspraulaw: what happened on 0x39 ?04:46
aw0x39: this was successfully yesterday . ;-)04:46
wolfspraulah ok, all pass04:46
wolfsprauloh, forgot04:46
wolfspraulso 0x3A now, got it04:46
wolfspraulok - 0x3A now, then 0x3C04:47
wolfspraul0x3A did never render before, 0x3C did04:47
wolfspraullet's see...04:47
aw0x3A: D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.04:48
awmm this is not the same 0x32. ;-)04:48
awmeasure tp36, tp37 for records first04:48
aw0x3A histories: never reflashed successfully before04:49
aw0x3A: tp36 - 2.66V, tp37 - 2.91V, no good; it must be reached to rough 3.3V04:52
awtry to reflash now04:52
awmm..yes ...stop at 'Bitstream length: 1484404'04:54
awso once the tp36, tp37 voltage is not high enough, reflashing must be unsuccessful04:55
wolfsprauloh sure04:55
awi leave 0x3A apart now.04:55
awor I replace a new diode . ;-)04:56
awlet's do it. ;-)04:56
wolfspraulyou mean replace D16 ?04:57
wolfspraulno I'm against that04:57
wolfspraulI don't want to make random experiments04:57
wolfspraulwhat is the theory behind that?04:57
wolfspraulthere is none04:57
wolfspraulso - no04:57
wolfspraullet me think for a moment04:57
awsince the tp37, tp36 is directly connected to diode04:58
wolfspraulok but I want to think more, not randomly switch parts04:58
awif diode (in-circuit) is not fully acted as 0x3904:58
awalright..just discuss first04:58
wolfspraulI still don't know whether those numbers are meaningful, when measure in-circuit04:58
wolfspraulso it's just noise04:59
wolfspraullet's see. so far we applied fix2b to 4 boards: 0x32 0x34 0x39 0x3A04:59
wolfspraul0x39 was the first one, and where we built the fix2b theory.05:00
wolfspraul0x34 works05:00
aw0x39, 0x34 with good diode(in-circuit) also tp36, tp37 are all good05:00
wolfspraul0x32 and 0x3A do not work. both never rendered before, and now they show bad tp36/tp37 values05:00
wolfspraulso far all correct?05:00
aw0x32: no good on D16(in-circuit): For.V. = 153mV, Rev.V. = 1114mV05:01
aw0x3A: relatively D16(in circuit) For.V.=153mV, Rev.V.=1120mV, can reconfigure.05:01
aw0x3A: tp36, tp37 voltage is not pull high enough05:02
wolfspraulwhat do you mean with "relatively"?05:02
wolfspraulok - let's measure forward and reverse voltage of D16 on 0x34. what is it there?05:03
aw0x32: tp36 - 900mV, tp37 - 1.1V05:03
aw0x34:  D16(in-circuit), For. V. = 154mV, Rev. V. = 1547mV05:04
wolfspraulI found it. "0x34: D16 (in circuit) forwarding - 154mV , rev. V - 1547 mV"05:04
aw i noticed that for a good diode (in-circuit) the Rev.V needs to be 1545mV05:04
awFor.V is almost ~153mV05:05
wolfsprauland you think the difference between ca. 1120mV and ca. 1545mV is the difference between bad and good?05:05
awif Rev. V is lower. means that could be have few current leakage05:06
awwell..i just noticed but have no theory to prove it05:06
wolfspraulwell since nobody else is awake, just try05:06
wolfspraulrandom is fun ;-)05:07
wolfspraulso you put a new D16 on 0x3A ?05:07
wolfsprauland measure the old one after it's removed...05:07
awso probably only both in-circuit and tp36 tp37 are all correct. then d2/d3 is fully off and can reflash successfully05:08
awlet's try to replace now. ;-)05:08
awmmm...this made me think C23805:09
wolfspraulcheck C23805:10
awsince program_b is one of the terminal of D16, also connected to C238, if my soldering is no good, thus C238 may be also not good a little.05:10
wolfspraulwhat do you do now?05:11
wolfspraulyou put a new D16 on 0x3A?05:11
awmm try now05:11
awD16 I took apart is perfect: For.V = 154mV, Rev.V.=no value.05:14
wolfspraulok, so the problem was elsewhere05:14
wolfspraulbut we still know good values for D16 when measured in-circuit, which seems to be ca. 150mV and ca. 1550 mV05:15
wolfspraulso put the new one on, and measure05:15
wolfspraulmaybe the problem is C238, or somewhere else?05:15
aw0x3A: tp37 - 3.29V perfect, tp36 - 900mV05:16
wolfspraulwhat voltages do you measure on the new D16 (in-circuit) now?05:19
awmm...possible points: 1. C238 is not good quality after soldering 2. reset out05:20
awi haven't soldered new diode on boards05:20
awsomewhere is wrong to let tp36 not pull high enough05:20
wolfspraulreplace C238? replace reset ic? (I'm just guessing)05:21
awi am going to replace a new c238 first05:21
wpwrakaha ! more tests :) lemme catch up ...05:21
aw i need your help. ;-)05:23
wpwrak(fpga) oh, a bit of techno-mumble never hurts :)05:23
wolfsprauljust the right time for the savior, and I have to run to a meeting with Jon...05:23
wolfspraul(in a little bit)05:24
awafter replace a new C23805:31
wpwrakstill catching up .. lots of stuff :)05:31
awd2/d3 is fully OFF. man!05:31
awi hate myself though05:31
wolfspraultp36/tp37 good now?05:32
awtp36 and tp37 go back to good 3.3V05:32
awnow to solder diode back again05:32
wpwrak_let's parallelize this :)05:33
awfrom now on ...soldering back diode I always use a new one. ;-)05:33
wpwrak_lots of bad diodes ?05:34
wolfspraulaw: of course!05:35
wolfspraulcome on we don't need to slow ourselves down for trying to save 20 cent items05:35
wolfspraulevery chance that a diode is unsoldered, of course is a chance to put a new one there05:36
awgood now05:36
wpwrak_(0x32) that's all after removing the INIT_B diode ?05:36
wolfspraulwhere are you reading now?05:36
wolfspraulwe are a bit ahead already05:36
aw0x3A: D16(in-circuit): For.V. = 153mV, Rev.V = 1547mV05:36
wolfspraulyes perfect05:36
aw0x3A: tp36 and tp37 are all 3.29V05:36
wolfspraulwpwrak_: I don't think a lot of diode problems05:37
awso there's big FACTs now:05:37
wpwrak_(replace parts) in general, i would try to discard anything that got unsoldered (unless really really difficult to replace)05:37
wolfspraulcorrect, fully agree05:37
wpwrak_wolfspraul: i'm around "let's look at 0x34 now". just started catching up05:37
wolfspraulwpwrak_: basically we have a reference value for D16 now when measured in-circuit - ca. 150mV forward, 1545mV reverse05:37
wolfspraulwhen we see those numbers, we can assume D16 and C238 to be correct05:38
wpwrak_sounds reasonable. those 1.5 V are some obscure path, but that's the price of measuring in-circuit05:38
wolfspraulwpwrak_: ok, read top to bottom first...05:38
aw1. before I go to test these boards, just go for measure in-circuit voltage of D16, if not right. must be some other area is wrong, typical C238 and diode itself05:38
wolfspraulaw: let's try to fix 0x32 now05:39
wpwrak_ah, C238 acts up too ? interesting :) reading05:39
aw2. measure tp36 tp37 to confirm if 3.3V high enough05:39
awgood now is reflashing.....this won't stop at 1484404 there. ;-)05:40
awnow we have clear direction to fix these kinds of bugs. ;-)05:40
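The two pre-checks Adam lists (D16 in-circuit, then TP36/TP37) can be written down as a small helper. This is a sketch only: the thresholds are assumptions chosen between the good and bad readings quoted in this log, not datasheet limits, and `check_board` is a hypothetical name.

```shell
#!/bin/sh
# Thresholds are assumptions picked between the good and bad readings in this
# log; they are not datasheet limits.
REV_MIN_MV=1500   # good boards read ~1545-1547 mV, bad ones ~1114-1120 mV
TP_MIN_MV=3200    # good boards read ~3290 mV, bad ones 2910 mV or less

# usage: check_board <D16 reverse mV> <TP36 mV> <TP37 mV>
check_board() {
    if [ "$1" -lt "$REV_MIN_MV" ]; then
        echo "suspect D16/C238 area"
    elif [ "$2" -lt "$TP_MIN_MV" ] || [ "$3" -lt "$TP_MIN_MV" ]; then
        echo "TP36/TP37 not pulled high"
    else
        echo "readings ok"
    fi
}

check_board 1547 3290 3290   # 0x3A after the rework
check_board 1120 2660 2910   # 0x3A before the rework
```

Passing readings only mean the board is worth reflashing; they do not guarantee that it boots afterwards.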
awbut bugs belongs to me Adam...;-)05:41
awafter reflash 0x3A, let's go back to 0x32. ;-)05:43
awoah~ no. 0x3A is d2/d3 dimly lit after reflash. :(05:44
wolfspraulno problem05:45
wolfspraulactually that's good05:45
wolfspraulaw: measure TP36/TP3705:45
awtp36, tp37 is still 3.3V. good05:45
wolfspraulD16 forward/reverse (in-circuit)05:45
awneed to power off to measure05:46
wolfsprauld2/d3 is dimly lit right now?05:46
wolfspraulwhat was the process?05:46
wolfspraul1. you ran reflash_m1.sh05:46
wolfspraul2. it succeeded05:46
wolfspraulthen what?05:46
wolfspraulyou power cycled?05:46
wolfspraulor press middle button?05:46
aw1. I ran reflash_m1.sh05:47
wolfspraulwpwrak_: caught up?05:47
aw2. do nothing....until it terminal log shows finished and saw d2/d3 dimly lit05:47
awi did nothing though. ;-)05:48
wolfspraulhuh? did it finish flashing?05:48
awno power off05:48
wolfspraulcan you upload the log?05:48
awyes, this failure was few cases in first round of tests though05:48
wpwrak_not yet. currently at the i/o redirection. maybe consider using "script"05:48
wolfspraulit may be a software problem only05:48
wolfspraulwpwrak_: ok so when you make it here :-)...05:49
wolfspraulbasically fix2b worked well for 0x39 (yesterday) and 0x3405:49
wolfspraulit did not work for 0x32 and 0x3a (values see above)05:49
wolfspraulon 0x3A, it turned out that replacing D16 and C238 made it work (well, not 100% sure yet, see the dimly lit story just unfolding)05:50
wpwrak_btw, does reflashing still use "debug all" ?05:50
wolfspraulI killed that :-)05:50
wpwrak_(debug all killed) good :)05:50
awwhen you saw log, there's stop 1484404 there, after that I replaced C238 and diode. then can reflashed. ;-)05:51
awbut do nothing once reflashed done05:51
wolfspraullooks good05:51
wolfspraulstill dimly lit now?05:51
wolfspraulpress the middle button05:52
awno any flash on leds05:52
awno boot up05:52
wolfspraulnow - power cycle05:53
awnow tp37 tp36 is still good 3.3V05:53
wolfspraulah wait05:53
wpwrak_what's the voltage on INIT_B ?05:53
wolfspraulno power cycle05:53
awcan't reconfigure after power cycle05:53
awwpwrak, bad..05:53
awi powered05:54
wolfspraulbefore we do measurements, I suggest to disconnect/reconnect the jtag-serial board, and flash again (remember to check that you flash in usb full-speed)05:54
wolfspraulthis board was just flashed for the very first time, so it could be related to that05:55
wpwrak_a virgin board. maybe it's a little shy :)05:56
awmoment...the init_b is now at bottom side..phew~05:56
wpwrak_aw: ;-)05:56
wolfspraulaw: I suggest - reseat jtag-serial board, flash again05:56
wolfspraulmaybe there was a problem writing into nor, whatever problem05:56
wpwrak_an item for the shopping list: lab at zero gravity ;-)05:56
wolfsprauland this was the first flashing. so it may be something totally different from our 'permanent reset' issue before.05:57
awwpwrak, init_b = 3.3V while d2/d3 dimly lit05:57
wpwrak_that means that the FPGA is happy05:58
awso now power off and replug jtag board and reflash again?05:58
wpwrak_maybe see if you can load the test program ?05:58
wolfspraulwon't work, no reconfig05:59
awwpwrak, once d2/d3 dimly lit, the middle btn is no action so that can not enter test s/w05:59
wpwrak_ah, i see05:59
wolfspraulaw: disconnect/reconnect jtag-serial, reflash05:59
wpwrak_INIT_B = 3.3 V means either that the FPGA didn't even begin to reconfigure, or that it succeeded05:59
wolfspraulwpwrak_: theoretically a boot path entirely over jtag/fpga/sdram could be written, but a number of pieces are missing now06:00
wolfspraulI think we can load the bitstream over jtag, but then the bios has to come from nor06:00
wolfspraulbut even for that we have no scripts ready now, right now06:00
wpwrak_wolfspraul: you need a devirginator ;-)06:01
wpwrak_(like we had at openmoko)06:01
wolfspraulyes I know06:01
wolfspraulpeople complained to me about inappropriate naming of technology by some rogue staff...06:02
wolfspraulto which I said it's beyond my control :-)06:02
wpwrak_so somebody noticed. i was wondering ;-)06:02
wolfsprauloh sure. this is actually not so pleasant to talk through with Taiwanese staff, female staff, etc.06:03
wolfspraulbut we are all for free speech etc.06:03
wolfspraulin the US you would be in big trouble06:03
wpwrak_yeah. i never expected the name to stay around for long. so i'm quite surprised it did :)06:04
wolfspraulthe problem is they take it serious, look it up in a dictionary etc.06:05
wolfspraulnot so good06:05
wpwrak_oh dear :)06:05
wolfspraulhere you go. devirginator "A person who consistently sleeps with virgins i.e. removes their virginity or pops their cherry. Can be male or female."06:05
wolfspraulwant me to discuss this with Taiwanese staff? no! please not!06:06
wpwrak_i think i got the idea from someone calling fresh-from-the-fab boards "virgin" boards06:06
wpwrak_heh :)06:06
wolfspraulwell. they look it up.06:06
wolfsprauland that's what they find06:06
wpwrak_duly noted. need to find more obscure names06:06
wolfspraulnicely explained in Chinese maybe even06:06
wpwrak_the depravity of us westeners06:07
wolfspraulI should have suggested they schedule it to be added as an 'new words seen in the office' for the weekly English class06:08
wolfspraulmove the problem to that teacher, so they earn their money...06:08
wpwrak_make sure they all use it in daily conversation with other people :)06:10
wpwrak_our current board is 0x32, right ?06:12
wpwrak_to see what's happening, maybe monitor TP35 (DONE) with a scope when power cycling06:14
wpwrak_even better: monitor INIT_B too06:14
wolfspraulno it's 0x3A now06:15
wolfspraulbut same case as 0x32 in that before fix2b, it never flashed or rendered06:16
wolfspraulaw: any update on 0x3A ?06:16
wolfspraulAdam is a little silent :-)06:16
wolfspraulwpwrak_: I vaguely remember one case in the US where a developer did something similar, naming some internal little tool in an 'inappropriate' way06:16
wolfspraulwell, he had a nice little chat with general counsel or CEO or so, and then it got 'cleaned up' :-)06:17
wolfspraulall fine with his job etc. but that kind of stuff will just not be tolerated in the corporate US world06:17
wolfspraulso he ran around frantically trying to erase all traces of his neat little tool :-)06:18
wolfspraulthe pussies are in control06:18
wolfspraulah Adam just told me he got interrupted, back soon. and I'm out to meet Jon. crossing my fingers...06:19
wpwrak_(us) yeah, that was of course part of the fun. knowing that this would never fly over there :)06:19
wpwrak_0x32 is in limbo, too ?06:20
wolfspraulput aside06:20
wolfspraulat that point we wanted to see some more fix2b results first06:20
wolfspraulbecause 0x32 never rendered before06:20
wpwrak_(more fix2b) sounds fair06:21
wolfspraulthen we did 0x34 (which rendered before and fix2b turned it all good)06:21
wolfsprauland then 0x3A (which initially behaved same as 0x32 but then with C238 it got a little further, eventual resolution pending)06:21
wolfspraulthat's where it stands now06:21
wolfspraulAdam thought 0x3A is a done deal, and he wanted to go back to 0x32, but then of course a problem still did show up on 0x3A06:22
wpwrak_in murphy we trust06:22
awi am back06:23
wolfspraulah, but I need to run. l8 and good luck!06:23
awwolfspraul, sure06:23
awwpwrak, so you got all the history of this morning's tests?06:23
awwpwrak, hehe..06:23
wpwrak_still working on the backlog06:24
awi think now i leave 0x3A apart firstly and back to see 0x3206:24
awbut before this, i need to record first06:25
wpwrak_but i think if a board has okay voltages (after fix2b) but still has dim LEDs, the things to look at (with a scope) would be DONE and INIT_B. if INIT_B is inconvenient, use PROGRAM_B instead.06:25
awi see. now mine is  0x3A06:26
wpwrak_DONE = TP35. at least that's easy :)06:26
awso okay...that's scope TP35 to trigger with program_b?06:26
wpwrak_hmm, okay, trigger on PROGRAM_B rising06:27
awman! 0x3A now is dimly lit again after power cycle06:27
awlet's see tp36, tp37 normal voltage first again06:28
wpwrak_let's say 100 ms/div, peak, ~3 div before, ~7 div after the trigger06:28
awtp36 tp37 is still 3.29V good enough06:28
awneed to solder wire on TPs...second06:30
aw0x3a: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch-program_b_ch2-done.JPG06:39
awch1-TP36, ch2-TP3506:39
awwpwrak, i think i need to scope init_b though trigger with program_b .;-)06:40
wpwrak_hmm, never finished configuration06:41
wpwrak_yes, INIT_B would be interesting then06:41
awwpwrak, wait06:41
awnot sure06:41
wpwrak_pity you have only two channels06:41
awfrom the rc2 waveforms I scoped, should i set it to 250 ms/div to see if DONE gets pulled high?06:43
awwpwrak, agreed?06:43
wpwrak_dunno. in rc2, DONE should rise within ~300 ms. here, you have ~700 ms06:43
wpwrak_but you can try. maybe the speed is variable / has gotten slower06:44
awlet me try..hope not miss more important info.06:44
wpwrak_another feature for your next scope: MEMORY :)06:44
awwpwrak, yes, ch2 is over 8 div, and still no pull high...so even fpga didn't enter reconfigure stage06:46
awwpwrak, ha..you can push Wolfspraul though..06:46
awphew~ try init_b now06:46
wpwrak_(push wolfgang: yeah, i have a few ideas what needs to get bought if we should ever come across significant money. better scopes are pretty high on that list ;-)06:47
wpwrak_(alas, good scopes aren't cheap. the ones i have my eyes set on are all in the USD ~10k+ segment)06:48
awyes, i remembered when i at OM, Wolfgang and Ruby tried to gather those info for you. ;-)06:53
awinit_b captured.06:53
wpwrakokay, caught up :)06:53
wpwraklet's see what it shows :)06:54
awfrom rc2 waveform, init_b should stay at 1.2V roughly once program_b goes high, right?06:56
awlemme check, not sure though.06:56
wpwrakseems that it should be around 3.3 V06:57
wpwrakso it appears to hit a CRC error immediately06:57
wpwrakdoes the script that reads the NOR via USB-JTAG work ? (in general)06:58
awlemme try it now06:59
awonce fails on reconfigure, it may not read back...not sure...try this 0x3A first07:00
awif not , I go back to read 0x3407:00
awnew discovery, i triggered again. and see init_b waveform is different though. ;) but d2/d3 still dimly lit.07:03
wpwrakshow me ! :)07:03
awgoes high!~07:05
awtotally different act from previous one. ;-)07:06
wpwrakhmm, that's a weird one. i don't like the drops at t = +75 ms and t = +145 ms07:07
wpwrakbut maybe that's retries07:07
awthis may show init_b can be output also input as an indicator07:07
awcould be?07:08
wpwraksure. but we can't always tell when init_b is an input. input looks the same as output high :)07:09
awwpwrak, sorry that i don't understand you don't like the drops at ....?07:09
wpwraki wonder what they mean. but let's assume they're CRC errors.07:11
awso the init_b indicator should go High once the fpga has finished its internal CRC check, and show High synchronized to PROGRAM_B?07:11
wpwrakso FPGA comes out of reset at t = 0, tries to load from NOR, gets a CRC error at t = +75 ms, tries again, gets another CRC error at t = +145 ms, tries again, ... and then seems to succeed (?)07:12
wpwrakINIT_B should be high while the CRC is okay07:12
awmm...so this needs to consult with lekernel  to confirm?07:12
awyup~reasonable from rc2 waveforms. got it07:13
wpwrak(consult) naw, i think we don't need to bother him with this. yet :)07:13
wpwraknow .. why would the NOR have troubles. hmm.07:14
wpwraknext test: CH2 stays in INIT_B, move CH1 to TP37 (FLASH_RESET_N). then trigger on CH2 rising. move trigger to -200 ms so that we get the same time window as the last time07:15
awhmm..let's trigger other pin of flash chip, to see if flash is in correct assertion?07:15
wpwrakif the reset looks good, then we'd have to test the other NOR pins, yes. this will be fun :)07:16
wpwrakbut i think if the reset is fine, then 0x32 should go to the "try to fix this when you have plenty of time" queue. because that can easily keep you busy for a whole day.07:18
awoah...yup..whole day or directly replace flash chip...but too bad now is out of stock here. :(07:22
wpwrakso ... INIT_B + TP37 ?07:30
awphew~ :(07:31
wpwrakreset is good.07:33
wpwrakregarding the NOR reading script, have you ever used this script successfully ? (with a board that works okay)07:33
wpwrakif yes, maybe you can try it here too07:33
awnever used but now lemme try 0x3907:34
awi doubt the fpga will let jtag access the flash chip after an unsuccessful reconfigure, but it's worth trying since i've never read it back before. ;-)07:36
awreading from 0x39. :)07:38
awxiangfu, reading flash image will be slow?07:38
xiangfuaw, yes.07:39
wpwraki think it will allow jtag access :) before fix2b, failure to reconfigure meant reset trouble, which also blocked NOR access via jtag. now, failure to reconfigure means something else. so NOR access via jtag should work.07:39
wpwrakaw: planning a five-course dinner ? :)07:40
wpwrakah nice. finally got M1rc2_powerOnOff_sequences_manuscript.jpg printed. now i no longer need a screen just for this :)07:40
xiangfuaw, you want read whole 32MB flash? that needs ~4 hours.07:41
wpwrakouch :)07:41
awwpwrak, i'll always be beaten when do this with PHD. :)07:41
wpwrakxiangfu: what does the reading script read by default ? everything ?07:41
xiangfuwpwrak, no. it only read first 640KB.07:41
awxiangfu, what's the image from 640K?07:42
wpwrak(640 kB) hmm, so that's a bit less than half the bitstream ?07:42
awxiangfu, man! i stop reading07:43
wpwrakaw: 640 kB should take about 5 minutes07:43
xiangfuwpwrak, it only read standby.07:43
wpwrakoh, i see07:43
awwpwrak, hmm? so keep reading?07:43
xiangfuwhole standby partition, I mean.07:43
wpwrakaw: yeah, let it finish. should be soon.07:44
xiangfuaw, if you want read soc bitstream. you can modify the script file a little.07:44
awno no...i think we just need standby07:45
awso do i need to modify script file or itself is for standby already?07:45
awxiangfu, so 5 minutes only?07:46
xiangfuaw, no. by default only read standby . yes about 5 minutes07:46
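A quick sanity check of the read times quoted above, as a minimal sketch. It assumes the readback rate is constant, so that time scales linearly with size; the constant and function names are only illustrative and not part of any script mentioned here.

```python
# Back-of-the-envelope check of the readback times quoted in the log.
# Assumption: constant readback rate, so time scales linearly with size.
FULL_FLASH_BYTES = 32 * 1024 * 1024   # "whole 32MB flash"
FULL_FLASH_SECONDS = 4 * 60 * 60      # "~4 hours"
STANDBY_BYTES = 640 * 1024            # "first 640KB" (standby partition)


def estimate_seconds(size_bytes, ref_bytes=FULL_FLASH_BYTES,
                     ref_seconds=FULL_FLASH_SECONDS):
    """Scale a reference transfer time linearly to another size."""
    rate = ref_bytes / ref_seconds  # bytes per second
    return size_bytes / rate


standby_minutes = estimate_seconds(STANDBY_BYTES) / 60
print(f"estimated standby readback: {standby_minutes:.1f} minutes")
```

Under that assumption the two figures agree with each other: 640 kB is 1/51.2 of 32 MB, and 4 hours divided by 51.2 is a bit under 5 minutes.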
awalright read again now. thanks07:46
awwpwrak, so how do we go next?07:47
wpwrakaw: let's label 0x32 with "possible NOR instability" and put it on the pile of boards that need deeper analysis later07:48
wpwrakaw: then, the next would be 0x3a, right ?07:48
awwpwrak, no , swapped them though07:48
wpwrakswapped ?07:49
awso 0x3a is possible NOR instability07:49
awnext is 0x32. ;-)07:49
wpwrakah, you were working on 0x3a ?07:49
wpwraki see07:49
wpwrakokay, let's see what 0x32 can do :)07:49
awnow just reading 0x39 flash chip back. ;-)07:49
awafter this, i go back to read 0x3a since you said it could be read though. ;-)07:50
awxiangfu, so Files is under /home/adam/.qi/milkymist/readback/20110817-154407:51
aw will always be the same name file? or everytime is different07:51
awhm...seems that you used system time. ;-)07:52
xiangfuaw, filename always the same. the folder is changed.07:52
xiangfu20110817-1544 <-- is the data time07:52
awxiangfu, oah ..got it07:52
wpwrakxiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?07:54
awxiangfu, i would like the saved/readback file name is related to mac address, is it possible? ;-)07:58
aw0x39 read back is done07:59
awnow back to try to read 0x3a07:59
wpwrak(name by MAC) mv readback/2011<Tab> 0x39-standby.bit   :-)07:59
wpwrakor, rather  mv readback/2011<Tab> readback/0x39-standby.bit08:00
awwpwrak, oah~ sweety, you know i'm poor at the command line. ;-)08:00
aw0x39: http://pastebin.com/RWrwtPyg08:01
wpwrakmv /home/adam/.qi/milkymist/readback/20110817-1546 /home/adam/.qi/milkymist/readback/0x39-standby.bit          (or wherever you want it)08:03
awwpwrak, yes. you are right. 0x3a is reading now. ;-)08:03
wpwraksee :) we're winning !08:03
awwpwrak, that's because we've seen program_b/init_b/rp# are all correct, so that's why you wanted to me to buy you a dinner! .;-)08:04
awwpwrak, so later how we compare those two bitstream files?08:05
wpwrak(dinner) ah no, i was asking there, whether you were planning to take a long break (e.g., for a lavish dinner) while a very slow download is happening08:06
awwpwrak, oah...misunderstood though...08:06
awwpwrak, can we discover the secrets behind this from those two (0x39 and 0x3a) bitstream files?08:07
wpwrakto compare:  diff -u <(hexdump first-file) <(hexdump second-file)08:07
wpwrak(if you're using the bash shell)08:07
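For anyone without bash process substitution at hand, roughly the same comparison can be sketched in Python. This is only an illustration, not the tool used in the channel; the function names are hypothetical and the file paths are whatever the readback produced.

```python
def diff_offsets(a: bytes, b: bytes):
    """Return the byte offsets at which two dumps differ.

    Only the common length is compared; a size mismatch is reported
    separately by compare_files().
    """
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def compare_files(path_a, path_b, limit=10):
    """Print the first few differing bytes of two dump files."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    offsets = diff_offsets(a, b)
    for off in offsets[:limit]:
        print(f"{off:08x}: {a[off]:02x} -> {b[off]:02x}")
    print(f"{len(offsets)} differing bytes; sizes {len(a)} and {len(b)}")
    return offsets
```

Usage would be along the lines of `compare_files("first-file", "second-file")`, with the two readback files in place of the placeholder names.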
awwait..so we need to go deeply 0x3a or work for 0x32(next board) next after read from 0x3a?08:08
wpwrakonce you've downloaded the standby bitstream from 0x3a, please download it a second time. that way, we can see if it changes (e.g., if there is noise on the bus)08:09
wpwrakxiangfu: in the tests adam is doing, which bitstream does his FPGA load ? the "standby" bitstream or the "regular" bitstream ?08:09
awwpwrak, aha~ good idea.08:09
awseems he's not here. :)08:12
aw0x3a: read again now.08:16
wpwrakyeah, seems that we lost him :-(08:16
wpwrakcan you upload the bitstreams you got so far (0x39 and 0x3a) somewhere ?08:17
xiangfu_my last message is : wpwrak, the test is standby --> soc bitstream --> BIOS --> test bin08:19
Action: lekernel imagines talking to someone in business suit: "- How do we program the boards? - Well you have to take the..., ahem,... the Devirginator"08:19
wpwrakxiangfu_: thanks !08:20
wpwraklekernel: it gets better: at the factory, the girls working there were running ./devirginate from the command line :)08:20
wpwrak(i didn't expect that, though :)08:21
awwpwrak, yes...lemme upload it first08:23
wpwraklekernel: we have some new interesting behaviour: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3a_ch1-FLASH_RESET_N_ch2-INIT_B.JPG08:26
wpwraklekernel: top is FLASH_RESET_N (all is fine there), bottom is INIT_B08:26
wpwraklekernel: looks as if it hits a CRC error at +75 ms, retries, hits another CRC error at +145 ms, and then succeeds08:27
wpwraklekernel: now, i wonder if that CRC error would be in the standby or the "regular" bitstream. does the "regular" bitstream also use a load mechanism involving INIT_B, DONE, etc., as the initial hardwired loader ?08:28
wpwrakafk for ~20 min08:33
awwpwrak, bitstream files under: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/08:34
aw0x3a has two read back files08:35
lekernelif you only powered up, it only reads the standby bitstream08:44
lekernelthe regular bitstream is only read after middle pushbutton is pressed08:44
xiangfu_aw, read the mac address, I can try.08:45
wpwraklekernel: ah, excellent08:47
wpwrakthe two 0x3a bitstreams differ from each other.08:50
wpwrakhere are the first few differences: http://pastebin.com/kLcGuu9a08:51
wpwrakhere's a better section: http://pastebin.com/d9nXTPbY08:52
aw_wow..bad with each other08:53
wpwrakDQ7 or DQ15 seems to have trouble08:53
aw_that was done by 'diff -u <(hexdump first-file) <(hexdump second-file)'?08:53
lekernelcan we read the bitstreams several times?08:54
wpwrakalmost :)08:54
wpwraklekernel: that's already from two successive reads of the same NOR (no reflash in between)08:54
wpwrakaw: this would be the command I used: diff -u <(hexdump -C 0x3a-1.bit) <(hexdump -C 0x3a-2.bit)08:55
wpwrakaw: the -C adds the ASCII column on the right side08:55
aw_wpwrak, okay..thanks, i record cmd first. ;-)08:55
lekernelmaybe that's just a urjtag bug?08:56
aw_wpwrak, okay08:56
lekernelurjtag won't use the same timings as the configuration system08:56
lekernelso if you get intermittent read failures, it doesn't mean much08:56
wpwraklekernel: hmm, could be. would you expect urjtag to always have such issues ? or just with that usb-jtag board ?08:56
wpwrakaw_: when you do your experiments, does each M1 have its own usb-jtag board or do you use the same usb-jtag board for all the M1s ?08:58
lekernelalso, if it's a problem on a data line, why don't we get problems when writing too?08:58
lekerneland why is the software CRC in the test tool passing most of the time?08:58
aw_wpwrak, each board has its own usb-jtag board08:58
lekernelthis simply looks like urjtag bugs to me08:58
wpwrakaw_: can you please read the 0x39 bitstream a second time ?08:59
wpwraklekernel: let's find out :)08:59
aw_wpwrak, okay08:59
wpwraklekernel: this board hasn't booted in its life so far, so we haven't made it to the software CRC yet09:00
lekernelit always failed to boot?09:00
lekernelthen maybe this is the problem09:00
wpwraknow fix2b has been applied and it seems to work a little better. but still not okay09:00
wpwrakmeanwhile, fix2b has "cured" two boards (i think)09:00
wpwrakso this is a new/different problem09:01
lekernelwhat is fix2b? disconnect INIT_B?09:01
lekernelthis should not have any influence09:01
wpwrakand also check D16 and replace if it looks suspicious09:01
lekernelexcept if we use crappy diodes09:01
wpwrakwe do :)09:01
wpwrakadam's current procedure is to disconnect INIT_B on the boards "in the cluster", then check TP36 and TP37 voltages. also measure D16 in-circuit, which seems to work more or less reliably. (he has removed a few good diodes, though)09:03
wpwrakah, and C238 once had an issue too09:03
wpwrakso the whole fix2 rework is a bit fragile09:03
wpwrakthe joy of hardware ;-)09:04
wpwraklekernel: if you think this is bad, you should have seen how things went at openmoko :)09:04
wpwraklekernel: response times measured in days, unexplained departures from the procedure you asked them to perform, quick nonsensical ad hoc fixes thrown into the mix, and so on. pure chaos.09:06
aw_wpwrak, hehe..at least all these are done myself though. not OM everyone could involved then you couldn't find out the root cause. ;-)09:06
wpwraklekernel: it once took me about half a year just to figure out whether they had fixed a missing resistor on the base of a transistor ...09:07
aw_wpwrak, one fact: i have to improve my soldering, but that seems a bit hard. :)09:07
lekernelhere we get intermittent and weird problems that redefine peskiness09:08
lekernelthat compensates09:08
wpwrakaw_: (everyone could play) yeah, the "chain of command" was a little ... strange over there :)09:09
wpwrakit got much better if you were physically present, though. shorten the loop and catch suspicious activities quickly :)09:10
aw_wpwrak, and i try to be as open as i can. ha ;-)09:10
aw_wpwrak, so how's next after 0x39's dump?09:10
wpwraklekernel: some of the component issues are indeed a bit surprising to me09:10
aw_if no err with each other?09:11
wpwrakaw_: upload the dump and then we'll see if 0x39 also changes from dump to dump. if yes, then the dumps are worthless. if the 0x39 dumps are the same, then we can try 0x3a with the usb-jtag board of 0x3909:11
aw_wpwrak, i see09:13
wpwrakhmm, in the dumps, is the first byte DQ0-DQ7 or DQ8-DQ15 ?09:14
wpwraklekernel: ah, here's a way to test whether it's the DQx bus or usb-jtag: if it's always the same bit that changes, then the bus is the likely suspect. else, it's something else.09:15
Action: wpwrak writes a tester09:16
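The "is it always the same bit" test can be sketched as follows. This is a hypothetical stand-in, not the tester written in the channel. It assumes the NOR is read as 16-bit words, and since the mapping of the first byte to DQ0-DQ7 or DQ8-DQ15 is an open question at this point, the byte order is a parameter.

```python
from collections import Counter


def flipped_bits(a: bytes, b: bytes, big_endian: bool = False) -> Counter:
    """Count, per data line index (0..15), how often two dumps disagree.

    Assumption: dumps are sequences of 16-bit words. With big_endian=True
    the first byte of each pair is treated as bits 8..15.
    """
    counts = Counter()
    n = min(len(a), len(b)) & ~1  # whole 16-bit words only
    order = "big" if big_endian else "little"
    for i in range(0, n, 2):
        diff = (int.from_bytes(a[i:i + 2], order)
                ^ int.from_bytes(b[i:i + 2], order))
        bit = 0
        while diff:
            if diff & 1:
                counts[bit] += 1
            diff >>= 1
            bit += 1
    return counts
```

If nearly all counts land on one index, a single data line is the likely suspect; a spread over many indices points elsewhere.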
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby1.bit/09:16
wpwrakidentical to the first 0x39 dump09:17
aw_good..so now use 0x3a with the usb-jtag board of 0x3909:18
wpwrakyup, let's try that09:18
aw_so let's read twice or just one time ?09:20
wpwrakhmm, let's do it twice09:20
wpwraksince it will almost certainly differ from 0x3a-1 and 0x3a-209:21
aw_okay..so i name both as -3 and -409:21
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby3.bit/09:34
aw_wpwrak, if both -3 and -4 are identical, this'll be a trouble for me. :)09:36
wpwrakaw_: you can stop09:36
wpwrak0x3a-3 is identical to 0x3909:36
aw_wpwrak, big trouble now. :(09:36
wpwrakthrow away the usb-jtag that was in 0x39 ;-)09:36
wpwrakokay, now: reflash09:36
wpwrak(reflash 0x3a)09:37
aw_okay. ;-)09:37
lekernelwpwrak: from what we have seen so far, it seems to be bits 7 and 1509:38
lekernelfirst byte is DQ8-1509:38
Action: wpwrak adds big-endian mode to the bit comparer09:41
aw_wpwrak, well...here I have usb-jtag boards in rc1 and rc2 versions. I need to find out if they are different. 0x39 and 0x3a used the same usb-jtag rc1 version.09:42
wpwrakaw_: heh, no idea what the differences are ;-) maybe it was also just a bad connection. we can find out later.09:43
aw_0x3a: reflashed done but d2/d3 dimly lit still there.09:43
aw_sure we find out later.09:43
aw_i didn't power off09:44
aw_so let's quickly measure some TPs.09:44
aw_tp36/tp37 stay well 3.3V09:47
wpwrakthe bit errors aren't uniformly distributed but affect most bits: http://pastebin.com/KfWwu3vb09:48
aw_init_b keeps zero.09:49
wpwrakaw_: now let's read back the bitstream09:49
wpwrakthe board doesn't know it yet, but it *will* boot today ;-)09:50
aw_wow~ i am expecting that boot. :)09:51
wpwrakresistance is futile :)09:51
wpwrakjust on the radio: there's some marihuana plantation burning (somewhere in buenos aires, it seems). and they say "some of the firefighters are affected by the smoke" ;-))09:55
aw_wpwrak, so what's the best and quickest way to tell if a usb-jtag board is bad while testing m1?09:55
aw_via 'diff" cmd to know?09:56
wpwrakfor now, that seems to be the best test, yes. actually, it could also be the M109:57
wpwrakbut we'll find out soon :)09:57
wpwrakbit comparison utility is here: http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/bitcmp/10:00
wpwrakoh, wait. there's a bug :)10:02
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3a-standby4.bit/10:03
wpwrakhmm, it's corrupt10:04
aw_this used the 'good' usb-jtag of 0x39, and the dump is from the just-reflashed board10:04
aw_umm..so 0x3a-4 is not identical to 0x39?10:05
wpwrakcan you download again ?10:05
wpwrak0x3a-4 is very different from 0x3910:06
wpwrakhere's the beginning: http://pastebin.com/TfyV0f7W10:06
wpwrakand then it gets much worse10:06
wpwrakstrange patterns: http://pastebin.com/vdTjuDcy10:10
wpwrakwhat's interesting is that 0x3a-2 was correct10:13
wpwrakso the errors seem to come and go10:14
aw_you suspected a usb transmission speed effect before, could this be related to that? or should we start to think it might be a problem with the flash chip itself?10:14
lekernelwpwrak, have you tried a board that works? that might just be stupid urjtag bugs10:14
lekernellet's not spend any time on those10:14
wpwraklekernel: yes, 0x39 is a "good" board10:14
lekernelok, and if it had so many failures, the software CRC wouldn't work so well10:14
wpwraklekernel: and two dumps from 0x39 were identical10:14
lekernelso the flash really behaves like crap on that 3a board which doesn't work...10:15
wpwrakwhat's odd is that the bit position changes10:16
lekernelmaybe we simply have sourced broken flash chips10:16
lekernelis the pattern reproducible with GDB read memory command?10:17
wpwrakdoesn't gdb need the BIOS ?10:17
lekernelonly you won't be able to access the SDRAM10:17
lekernelyou can simply 'pld load' the SoC and GDB will work no matter what10:18
wpwrakah, great. maybe you can walk adam through gdb use then (i'll be watching, since i don't know the process either)10:18
aw_the flash chip this time i ordered from authorized here Taipei (WPI), it should be okay though i think.10:18
wpwrakaw_: should i ask where the other NORs came from ? ;-)10:18
aw_WPI taipei10:18
wpwrakso the NOR in the rc3 boards is from WPI ?10:19
aw_in one batch of order10:19
aw_no splitted shipments10:20
wpwrakbetter than "Flash soup kitchen" in wolfgang's backyard ;-)10:20
aw_ordered 96pcs in one batch10:21
aw_but not sure if i have stock now10:21
aw_i hope smt sent me all back.10:21
wpwrakyeah, let's hope the NOR is good in general. there could still be the occasional bad chip, of course. either factory-bad or didn't like SMT or whatever10:21
aw_well..but do you have any idea on 0x3a? or let's still name it as 'possible NOR instability'? ;-)10:22
aw_then we back to see 0x32. :)10:23
wpwraklekernel: have you heard of "baking" ? that's also a fun thing: components absorb water. some only very little others more. when you SMT them, the water evaporates and the steam pressure may crack the plastic ... somewhere. lots of fun to debug :)10:23
wpwrakdid you do the 2nd download ?10:24
wpwrakinteresting. it's identical to the previous one10:26
wpwrakso the corruption occurred when writing, not when reading10:27
wpwrakso .. can you please reflash ? :)10:27
aw_wpwrak, yes, the 'baking' thing is a normal process in smt factory. i saw them do this thing.10:27
wpwrakoh, and please upload the file you're using for the flashing, too10:28
aw_wpwrak, you meant the script file or the results of reflashing log?10:28
wpwrakthe binary file that's the input of the script10:29
wpwrakor if you don't know which file this is, the script first10:29
aw_i upload script file first then we ask xiangfu which is the exactly bin as the input under my folder.10:31
aw_two scripts but now i am using 'reflash.sh', so cmd like this: ./reflash.sh 00 3A10:34
wpwrakthe file seems to be   ../standby.fpg   (?)10:35
aw_okay..found it. lemme upload it10:36
wpwrakexcellent. thanks !10:37
wpwrakthe data read back from 0x39 is indeed correct10:38
aw_now reflashed is done: log is here: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/log/urjtag_3A.log10:38
aw_but scrolling down to the bottom. :-)10:39
aw_it will be added into file everytime I reflashed10:39
wpwraki see. okay, let's try to boot. maybe it works ;-)10:40
aw_wait...you said 0x3a-5 is identical original file standby.fpg?10:40
aw_but now d2/d3 are still dimly lit. did i understand correctly?10:41
wpwrakno, 0x3a-3 was identical10:41
wpwrakand 0x3a-210:42
aw_0x3a-5 was not, wasn't it?10:42
wpwrak0x3a-4 = 0x3a-5, but different from 0x3910:42
aw_got it10:43
wpwrakdid you try to boot ?10:43
aw_since d2/d3 is dimly lit now. but let's try to press middle btn first10:43
aw_if not10:43
aw_i go for power off to see if d2/d3 is fully off.10:43
aw_can't boot surely while dimly lit10:44
aw_d2/d3 dimly lit after power - cycle. :(10:45
wpwrakso still no go. hmm.10:45
wpwrakokay, can you please take two more dumps ?10:45
wpwrakand then 0x3a goes back to the queue, "NOR mystery corruption"10:46
aw_hehe :)10:46
wpwrakafter that, i'd like three more dumps from 0x39. to make sure that the reading does indeed work. that way, we can be sure we can use this tool to analyze future NOR issues. (or, if the reads of 0x39 also yield inconsistencies, then we know that we don't have a reliable tool :)10:49
wpwrakbut first the two from 0x3a10:49
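The "reliable tool" check amounts to reading the same NOR several times and confirming that all dumps are identical. A minimal sketch of that comparison, assuming the dumps are already loaded as bytes; the names here are made up for illustration:

```python
import hashlib
from collections import defaultdict


def group_identical(dumps: dict) -> list:
    """Group dump names by content hash.

    dumps maps a label (e.g. a file name) to the dump's bytes.
    Returns a list of groups, largest group first.
    """
    groups = defaultdict(list)
    for name, data in dumps.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(groups.values(), key=len, reverse=True)


def reads_are_consistent(dumps: dict) -> bool:
    """True if every dump has exactly the same content."""
    return len(group_identical(dumps)) <= 1
```

If repeated reads of a known-good board fall into a single group, differences seen on a problem board can be attributed to that board rather than to the readback path.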
aw_hmm...good idea about 'reliable tool' preparation indeed.10:51
aw_so I'll use current usb-jtag (i.e the original good usb-jtag we guessed) on 0x3a to 0x39 too.10:53
wpwrakno, keep them as they are10:53
wpwrakthe original 0x3a usb-jtag is now in M1 0x39 and the original usb-jtag from 0x39 is now in M1 0x3a, correct ?10:54
wpwraksince M1 0x3a is acting weird with both, the usb jtag may be okay. we'll test this implicitly when taking the dumps from 0x3910:55
aw_so later we dump 0x39, will we use the 0x39's original jtag board? is that you wanted?10:55
wpwrakno, M1 0x39 with usb-jtag 0x3a10:56
wpwraki.e., don't swap the usb-jtag. use them as they are now10:56
aw_got it10:56
wpwrakthis is identical to the one you had before reflashing11:01
aw_man..so it's not identical to the latest reflash. :(11:02
aw_i still reading another one...11:03
wpwrakvery very strange ...11:03
wpwrakmaybe DQ15 now simply fails consistently11:09
aw_i hope this is not a radiation problem..but it should become clearer after three dumps from the working 0x39. :-) excited to know the 0x39 results later.11:09
wpwrakthe problem strikes on average every 33.5875 bytes11:10
wpwrakit seems way too reproducible to be just something random11:10
wpwrakthe differences are only in the first third of the NOR. then, suddenly, all is good11:14
aw_hmm...so let's start 0x39 workable board to see secret though. ;-)11:15
wpwrakhah, 0x3a-7 is different ;-)11:15
wpwrakyes, on to 0x39 !11:15
wpwrakmaybe put 0x3a in the fridge :)11:16
aw_i think before i read 0x39 back , to boot up again and to see CRC check ?11:17
aw_or no need though. ;-)11:17
wpwrakah no. i had made a mistake. 0x3a-7 is the same as 0x3a-6. okay, that's what i expected.11:18
wpwrakwell, why not :)11:18
GitHub25[rtems] sbourdeauducq pushed 1 new commit to mmstaging: https://github.com/milkymist/rtems/commit/8d6bc82d5a56faaae02ec9e1b25a2da4a19714b611:18
GitHub25[rtems/mmstaging] Merge branch 'master' into mmstaging - Sebastien Bourdeauducq11:18
aw_wpwrak, well...good that -6 & -7 at least they are reflashed again11:18
wpwrak0x3a is consistently wrong now. at least we've achieved that much ;-)11:19
aw_alright: CRC pass and rendering too11:20
aw_0x39 reading...11:21
wpwrakhere's the error distribution in 0x3a: http://downloads.qi-hardware.com/people/werner/m1/tmp/errors-3A.png11:23
wpwrakalmost all in the first third11:23
wpwrakand yes, 20000 of them. no surprise it doesn't boot :)11:26
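How such a distribution might be tallied, as a rough sketch: compare the readback against the reference image byte by byte, then count errors per equal-sized region and compute the average spacing. This is not the script behind the plot above; the names and the choice of three buckets are assumptions for illustration.

```python
def error_offsets(reference: bytes, readback: bytes):
    """Byte offsets where the readback differs from the reference image."""
    return [i for i, (r, d) in enumerate(zip(reference, readback)) if r != d]


def summarize(reference: bytes, readback: bytes, buckets: int = 3):
    """Return (errors per bucket, average bytes per error).

    The image is split into `buckets` equal regions, e.g. thirds.
    The spacing is None when there are no errors.
    """
    offsets = error_offsets(reference, readback)
    size = min(len(reference), len(readback))
    per_bucket = [0] * buckets
    for off in offsets:
        per_bucket[off * buckets // size] += 1
    spacing = size / len(offsets) if offsets else None
    return per_bucket, spacing
```

A result heavily weighted toward the first bucket would match the "almost all in the first third" observation.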
aw_is there theory about this curve you did distribution?11:27
wpwraki don't see anything revealing there11:28
wpwrakmaybe i'll find something later :)11:28
aw_or is a statistics?11:28
wpwrakfor now it's just "bad" and "weird" :)11:28
wpwraklet's try the fridge approach. or maybe freezer. i guess the board should be okay with that too.11:29
lekernelthis rather looks like a bad flash chip, no?11:29
lekernelthose are _read_ errors, right?11:29
wpwrakif there's anything even remotely temperature-related in the behaviour, the fridge/freezer will uncover it :)11:30
wpwrakno, write errors11:30
wpwrakall on DQ15. it read back several times perfectly11:30
lekernelso read always works reliably now?11:30
wpwrakalso, the exact same errors occurred in two independent writes11:30
wpwrakso it seems11:30
lekernelthen we might blame urjtag too11:31
lekernelbad write timing, maybe11:31
wpwrakbut the exact same pattern ?11:31
lekernelimo the next thing to try is xilinx impact11:31
wpwrak"same" as in "bitwise identical"11:31
lekernelthat's the standby bitstream right?11:32
wpwrak(impact) ah right, we have that too11:32
wpwrakyes, standby11:32
lekernelaw_, do you still have your xilinx jtag cable?11:32
lekernelok i'm preparing a .mcs11:32
aw_lekernel, yes, i have that11:32
lekernelaw_, ok, wake up your ISE installation11:32
lekernelwe will reflash that problem bitstream with impact11:33
wpwrakaw_: please take 0x3a out of the fridge again. we need it in the torture chamber ;-)11:33
wpwraklekernel: that somehow sounds as if it involved a hammer :)11:33
lekernelyes, tough problems need tough solutions11:34
lekernelok, digging out the git history, the mcs generation command is: promgen -w -p mcs -o standby.mcs -s 32768 -u 0x00000000 ../standby/build/standby.bit -bpi_dc parallel -data_width 1611:34
lekerneli'm resynthesizing the .bit atm ...11:35
wpwrakwe have the bit somewhere ...11:35
lekernelit should be quick11:35
lekernelit's a small design11:35
lekernelthat's .fpg11:35
wpwrakah, yet another format ?11:36
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby2.bit/11:36
lekernel.bit is the xilinx "standard" format with header11:36
aw_wpwrak, still need other two, right?11:36
wpwrakah, wait11:36
lekernel.fpg is raw flash content, with the words reversed to meet the idiosyncrasies of the way the fpga reads the flash, i.e. LSB first11:36
lekernelok I have the .mcs, emailing it to aw_11:37
aw_lekernel, okay11:37
wpwraklekernel: this is the "original" bitstream adam used (fpg, though): http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/tool/standby.fpg11:38
wpwrakaw_: (other two) lemme check first ...11:38
aw_wpwrak, it would still be good to know whether 0x39 is reliable or not. pls check it. thanks. ;-)11:39
wpwrakaw_: yes, please keep them coming11:39
wpwrakaw_: 0x39-2 is good11:39
aw_wpwrak, great11:39
aw_keep reading11:39
aw_lekernel, received standby.mcs, tks11:41
aw_wpwrak, after these three dumps, i go for dinner first. ;-) sorry11:41
togii was at ccc last week, but i missed the milkymist talk :/ anyone know if it's available somewhere?11:41
aw_and when I'm back. let's to see using xilinx tool. yup..long time not use it :-)11:43
wpwrakaw_: we should put you on a sushi diet. maki, to be precise. then you can quickly eat a bit each time you have to wait for some up- or download :)11:45
aw_oah~yup...i do really sorry on this.11:46
wpwrakthinking of it, that would work for me too. i have a sushi restaurant just around the corner. and they do delivery :)11:47
wpwrakwell, actually japanese restaurant. but it seems their non-sushi stuff isn't so great.11:49
aw_seems i have to buy more foods in preparation.11:49
lekerneltogi, http://media.ccc.de/browse/conferences/camp2011/cccamp11-4412-latest_developments_around_the_milkymist_system_on_chip-en.html11:49
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby3.bit/11:53
wpwrakalso good11:54
aw_so this means usb-jtag boards are not the problem source at least, right?11:58
lekernelaw_, let's see how it goes with impact11:59
lekernelflash the .mcs and see if the standby bitstream works now (ie. good readback + LEDs go fully off when power is applied)11:59
aw_lekernel, sure but lemme go out for foods first ;-)12:01
lekernelenjoy :)12:01
aw_the third one hasn't been finished though.:)12:01
togilekernel: thanks!12:05
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x39-standby4.bit/12:07
wpwrakaw_: also identical. thanks !12:08
aw_wpwrak, great! so this shows that we currently don't need to worry about the usb-jtag board. i'll be back soon12:08
wpwrakyup, usb-jtag looks good. enjoy your meal !12:09
lekernelwpwrak, thanks so much for your help.12:15
wpwrakno problem. it's fun ;-)12:16
lekernelwpwrak, to sum up now, it may seem that we have a combination of a) unreliable writes b) intermittent reset circuit fuckup that causes boards to fail in the field?12:16
wpwraki think b) is starting to disappear. we may not have found all the critters there, but at least some.12:17
wpwrakwhat causes a) is still a mystery. oh, and we also had changes between reads. so it's all still foggy.12:18
lekernelyou said that reads were working reliably?12:18
wpwraknow they are. on 0x3a, there were successive reads with differences.12:18
lekernellet's always use impact on problem boards now12:18
lekernelit received more testing than urjtag and the usb-jtag board12:19
wpwraklet's first see what impact impact makes :)12:19
scrts2who here coded the ethernet support? I wonder: if the milkymist is connected to a bigger network through a few switches, is there a queue for packets that arrive out of order or are duplicated? e.g. the ip header identification field shows that a later packet has a smaller identification value than the previous packet, which means that this particular packet must be processed before the other13:22
wpwraknow let's find out what impact "impact" has :)13:23
aw_wpwrak, hi yes, i am looking for my rc2's previous script~ phew13:54
aw_lekernel, i tried to reflash 0x39 firstly via xilinx jtag14:01
aw_lekernel, http://pastebin.com/81cix6fb14:02
aw_but i have question: while reflashing standby.mcs file , do the d2/d3 flash? i forgot that if they must be flashed via xilinx tool.14:04
aw_this is i currently use for xilinx jtag to reflash standby.mcs14:07
lekernelaw_, all you should do is 1. load the standby.mcs I sent you using the impact gui (not the script) 2. test if the flash was correctly written14:07
aw_hmm...okay..i go to open impact gui first14:08
lekernelactually that script you pointed might work as well14:08
lekerneljust don't forget the template.cmd file14:09
aw_hmm....no included template.cmd under folder...try again14:10
aw_mm..no template.cmd already there.14:12
aw_i opened the impact gui. so far i've only used it to read device status/device id etc... I've never loaded standby.mcs with iMPACT before, only with the script. :(14:19
aw_how to load standby.mcs via iMPACT gui? i need to set many parameters?14:19
lekernelcreate new project, select "autodetect devices with boundary scan", then when it asks whether you want to program a flash attached to the fpga say yes and select the .mcs14:21
lekernelit's completely trivial14:21
lekerneli cannot give you step by step instructions, I lost the ribbon cable of my xilinx jtag cable14:22
FallenouI confirm, it's trivial14:22
Fallenoueven for non fpga-expert like me :)14:24
aw_man! created a project as autodetect with boundary scan. now "Identify Succeeded", but which item do I go to for selecting my *.mcs?14:32
aw_Fallenou, he..seems not trivial for me. :)14:33
Fallenouwell you want to put the bitstream in the flash ? or just program the fpga ?14:34
aw_load standby.mcs file in fpga14:36
Fallenouright click on the FPGA14:36
Fallenouand there should be a menu element that says "load bitstream" or something like this14:36
aw_oah~ i see it, thanks14:37
Fallenouassign configuration file14:37
aw_i assigned done with 16bit data bus/BPI/Flash chip14:38
aw_and ?14:38
Fallenouwhen you have assigned the configuration file to the proper device14:39
Fallenouthen you can do things like right click, configure or something like that14:39
Fallenou(I don't have impact on my computer, sorry)14:40
wolfspraulwow, need to read the backlog...14:40
aw_oaw~ i see...must right click on "flash" icon then program it. :)14:41
aw_now it's programming...14:41
Fallenouif you right click on the "flash device", then you are programming the flash14:41
Fallenounot the fpga directly14:41
FallenouI don't know what you are trying to do exactly though14:41
aw_oah~ man! but good now my d2/d3 is fully OFF now..14:42
Fallenouoh ok reading backlog I understand14:42
Fallenouyou should be ok14:42
aw_Fallenou, i saw mostly the same messages in the console as with the script though. :)14:43
aw_lekernel, i right clicked on 'Flash' icon not fpga itself, is that right?14:44
Fallenouok good14:44
aw_hm...i lost my self though14:45
lekernelyes, click the flash icon14:45
lekernelwe do not care about what the leds are doing while you are in impact14:45
Fallenouaw_: if you just want to reflash the board, so that the board would be able to boot without being plugged to a computer, then yes14:45
aw_so now 0x39 boot up and rendering well14:46
aw_so now let's go for 0x3a :)14:46
lekernelok, so it simply seems urjtag has some bugs that make writing unreliable particularly at the beginning of the flash14:47
lekernelif you have boards that do not configure at all, give them the impact treatment14:47
aw_just noticed that d2/d3 is fully off after xilinx tool finished programming14:47
lekernelah, hm14:47
lekernelno, power cycle the board14:47
aw_yes, 0x39 i powered cycle . it works well now. :)14:48
aw_boot up and rendering14:48
aw_so let's see 0x3a via xilinx tool next :)14:49
wolfspraulas usual. it's not great but if the xilinx tool is more reliable, we should probably always use the xilinx tool.14:49
wolfspraullekernel: what do you mean with "writing unreliable particularly at the beginning of the flash"?14:50
wolfspraulhow is that possible?14:50
lekernelwolfspraul, just fix the boards that did not pass with impact14:50
lekernelif the CRC check is good, then writing was ok...14:51
lekernelsomeone needs to fix this annoying bug in urjtag, but later...14:51
aw_one question first: will i need to "identify" the fpga every time i am going to program a new board?14:52
wolfspraullekernel: ah yes, of course I agree. But what is this bug?14:54
lekernelsome pesky and mundane time sink14:55
lekernelnothing very interesting I think14:55
aw_hmm...i answered my question, just directly click 'flash' icon to program though. :)14:56
aw_0x3a: http://pastebin.com/mnV0US5W14:57
aw_copied them from xilinx iMPACT's console. :)14:58
aw_recorded first though.15:00
wolfspraullekernel: I see. bug dismissed I guess :-)15:01
wolfspraulwe can definitely use Impact for the rc3 run, but then I will try to find at least a workaround for the bug.15:02
aw_good that xilinx iMPACT has a readback function, but the read failed15:02
wolfspraulI guess what it does is that when we write into nor, what arrives is not what we wrote?15:03
wolfspraulnow that we are on Impact, we can fix Impact issues :-)15:03
wolfspraulI haven't completed the backlog yet, but is it possible that a wire to the nor chip is bad? do you want to try resoldering the pins?15:04
wolfspraulI'm still reading backlog though, do what you think is right...15:05
kristianpauli just received (at work) a network appliance, it said something interesting: Memory test 4hr, System Stress test 1hr15:05
kristianpauldo we have memory test in milkymist?15:06
aw_i programmed 0x3a again, still failed while "Reading device contents..." ~ phew~15:06
aw_wolfspraul, not plan to soldering pins now.15:07
aw_i'd rather tomorrow morning go for other pieces to keep on fix2b rework15:07
aw_now...just back to 0x32 to see if i can fix like we did this morning...it's a long day story though.15:08
aw_0x39 example: good standby.mcs program log - http://pastebin.com/QuCz5fZk15:12
wolfspraulI lost overview with 0x32 0x39 0x3A15:14
wolfspraulthe backlog is scary, I cannot follow all details :-)15:14
wolfspraulI think we should definitely move forward to other boards15:14
wolfspraulnot get stuck15:14
wolfspraulI just need reliability, that we are able to produce 100% stable and tested boards, so we can start selling.15:14
wolfspraulit seems fix2b is good15:15
wolfspraulI mean I find no evidence in today's long work that there is any problem with fix2b.15:15
wolfspraulso I think we should continue with more boards from the 19 and fix2b.15:15
wolfsprauland if there is a problem, just move to the next board.15:15
wolfspraulaw_: do you agree?15:16
wolfspraulif you feel better, always use Xilinx Impact. Impact or reflash_m1.sh - your choice.15:16
wolfspraulbut pick one and stick to it15:16
wolfspraulah, finally finished15:18
aw_wolfspraul, yes, agreed. Werner & me just tried to look into other things we were not sure about, even whether usb-jtag is the problem source, but now this consideration is gone15:18
wolfspraulbut it seems Xilinx Impact did not help :-)15:18
wolfspraulXilinx Impact only showed right away that the read failed15:18
wolfspraulin that case I would continue to use the jtag-serial board and reflash_m1.sh15:19
aw_wolfspraul, no no... for the xilinx tool i only have the standby.mcs file from lekernel. with this only, i can't rely on xilinx for reflashing all other boards15:19
wolfsprauland Xilinx Impact did not improve anything if I understood the backlog correctly15:20
wolfspraulso just use reflash_m1.sh15:20
lekernelwolfspraul, it did fix urjtag write problems with one board15:20
aw_just like lekernel said, if I have some trouble with NOR problems, this xilinx tool with standby.mcs could be helpful.15:20
wolfspraulI'm overwhelmed with the details of the backlog.15:20
wolfspraulI thought no15:20
wolfspraulit just said 'failed' by itself15:21
wolfspraulaw_: I think tomorrow we need to go to full speed mode. not get stuck on a few boards.15:21
wolfsprauljust power through the whole batch of 19...15:21
wolfspraulif anything doesn't work or is unclear, just take a note and move to the next one15:22
wolfspraulI feel pretty good about fix2b now15:22
wpwrakaw_: hmm, but 0x39 worked before. and 0x3a fails with impact as well. so it seems with 0x39 both work and with 0x3a neither.15:22
wolfspraulwpwrak: did we find any evidence for problems with fix2b today? doesn't look like to me...15:23
wpwrakokay, all agree :)15:23
wpwrakwolfspraul: in fix2b we (still) trust :)15:23
aw_agreed though... We only reworked 4 boards this morning, and got 2 failing for unknown reasons. Werner & me tried to figure this out hopefully.. just don't want more boards like this... surely need to speed up... but tough decision though..15:24
lekernel<aw_> yes, 0x39 i powered cycle . it works well now. :)15:24
wpwraki'd suggest putting 0x3a in the fridge. see if temperature changes it. we've had it work better and worse in the course of these experiments. very confusing.15:24
wpwraklekernel: 0x39 worked before :)15:24
lekernelso why the hell did we flash a board with impact that worked before ?!?15:24
wpwraklekernel: adam tried a good board first. only then the problem board.15:24
aw_0x39 : both usb-jtag & iMPACT all works well15:25
wpwrakwolfspraul: we verified that urjtag can read back the NOR quite reliably. so we can use it in the future for verifications, if necessary15:26
wpwrakwolfspraul: what's a bit troubling is that there doesn't seem to be a proper verification of what gets written. at least we once got completely bogus content flashed. maybe the ... "verify skipped" (?) in the logs is a hint :)15:27
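The concern above is that the flashing step does not verify what was written. A minimal sketch of such a check, assuming the expected image and a NOR readback are both available as raw bytes (the function names here are hypothetical and not part of reflash_m1.sh, urjtag or iMPACT):

```python
def find_mismatches(expected: bytes, readback: bytes, limit: int = 10):
    """Return up to `limit` tuples (offset, expected_byte, readback_byte)
    for positions where the readback differs from the expected image."""
    mismatches = []
    for offset, (e, r) in enumerate(zip(expected, readback)):
        if e != r:
            mismatches.append((offset, e, r))
            if len(mismatches) >= limit:
                break
    return mismatches


def verify(expected: bytes, readback: bytes) -> bool:
    """True if the readback starts with the expected image.

    Assumption: the readback may be longer than the image (for example a
    whole flash partition), so only the image-sized prefix is compared.
    """
    if len(readback) < len(expected):
        return False
    return not find_mismatches(expected, readback, limit=1)


if __name__ == "__main__":
    image = bytes([0x00, 0xFF, 0x10, 0x20])
    dump = bytes([0x00, 0xFE, 0x10, 0x20, 0xFF, 0xFF])
    print(verify(image, dump))
    for offset, e, r in find_mismatches(image, dump):
        print(f"offset 0x{offset:06x}: expected 0x{e:02x}, read 0x{r:02x}")
```

Reporting the first few mismatching offsets, rather than only pass or fail, would also show whether errors cluster at the beginning of the flash.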
wpwrakaw_: trying any more boards today ? or entering suspend mode ?15:39
aw_wpwrak, yup.. i gotta enter suspend mode myself to start another day.15:41
lekernelaw_, did we have similar impact flashing problems in run 2?15:41
lekernelthis sounds like a brand new problem, no?15:42
aw_lekernel, in rc2, we finally got 35/40 pcs done15:42
lekernel(and like crappy flash chips, too)15:42
kristianpaulwhy are crappy the flash chips? is that a new discovering on rc3?15:43
lekernelright now there are 51 working boards?15:43
kristianpaulsorry i missed all backlog..15:43
lekernelaw_, did the missing 5 run2 boards have similar flashing problems?15:43
aw_lekernel, and yes, the remaining 4 pcs mostly had d2/d3 dimly lit problems, but at that time we guessed they were damaged by "fast power-cycling"15:43
lekernelthere are no damages by fast power cycling15:44
wpwraklekernel: not sure if it's the NOR. could also be the FPGA. or soldering on either.15:44
aw_and eventually those 4 boards are finally "dead" though... so whether they actually belong to the flash NOR problems is a really good question!15:44
lekernelwhat do you mean, "finally dead"?15:44
wpwraklekernel: what we've seen with board 0x3a were 1) good NOR content but (variable) errors on read and 2a) bad NOR writing with 100% reliable read (of the bad data) or 2b) good NOR writing (and unrelated failure to configure) with 100% reproducible corruption on NOR read15:46
aw_thus can't reconfigure, but at that time we thought it came from an abnormal production process: fast power-cycling on the switch.15:46
wpwraklekernel: so, a bit scary that one. a moving target. but i think we did enough tests to be reasonably sure of these results.15:47
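The failure patterns described for 0x3a can be told apart in part by reading the NOR several times and comparing the dumps. A rough sketch, assuming the dumps are already in memory as bytes (the function and the category names are hypothetical, chosen only for illustration):

```python
from typing import Sequence


def classify_reads(expected: bytes, reads: Sequence[bytes]) -> str:
    """Classify a series of successive NOR readbacks.

    "good"            : every read matches the expected image
    "stable-mismatch" : all reads agree with each other but not with the
                        image (a bad write, or reproducible read corruption;
                        this check alone cannot tell those two apart)
    "unstable-reads"  : successive reads differ from each other
    """
    if not reads:
        raise ValueError("need at least one readback")
    if all(r == expected for r in reads):
        return "good"
    if all(r == reads[0] for r in reads):
        return "stable-mismatch"
    return "unstable-reads"


if __name__ == "__main__":
    image = b"\x12\x34\x56"
    print(classify_reads(image, [image, image]))
    print(classify_reads(image, [b"\x12\x34\x57", b"\x12\x34\x57"]))
    print(classify_reads(image, [image, b"\x12\x35\x56"]))
```

Separating a bad write from reproducible read corruption would need a second read path, for example reading the same board back with a different tool.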
lekerneltry replacing the flash chip15:47
lekernelwolfspraul, can we move forward with the other boards?15:48
aw_well...these failure boards I'll leave them apart firstly15:48
aw_lekernel, tomorrow i go directly for other boards with fix2b circuit15:49
lekernelaw_, what is your next target?15:49
lekernelwhat 'other' boards? the 51 working ones?15:49
aw_lekernel, no, the first 19 pcs (already including today's 4 boards), and see how they go.15:50
wpwraklekernel: there are some more in the fix2b "cluster"15:50
lekernelaw_, you are not touching the 51 working/available ones, right?15:51
aw_so I'll go for the remaining 14 pcs of the cluster first tomorrow15:51
aw_lekernel, right15:51
aw_well...time to go15:52
wpwraklekernel: the 51 "available" ones should at least be checked. some may also need fix2b and are just at the edge of not working. some of the boards in the cluster have worked a little once and then went worse, so the fix2b problem isn't just black and white15:53
aw_I'll work on the remaining 14 boards first.15:53
aw_good night15:53
wpwraklekernel: but it would be good to be able to test them in a non-intrusive way, to avoid more rework15:53
wpwrakaw_: sweet dreams ! :)15:54
lekernelwpwrak, should we apply fix2b on all the working boards?16:02
lekernelthey look nicer after that (no messy cable)16:02
rohhey. how is it going?16:16
wpwraklekernel: i'm slightly in favour of applying it everywhere, yes16:22
wpwraklekernel: seems to be low-risk enough16:22
wpwraklekernel: and yes, we get rid of the cable. all evidence of human fallibility destroyed ;-)16:23
wpwrakroh: today we had one with problems somewhere between FPGA core, NOR, and back. or, rather, it had us.16:24
wpwrakroh: to make things more interesting, the problem pattern shifted. first it looked merely like a usb-jtag problem, but then that part turned out to be quite reliable but NOR reads or writes caused trouble.16:25
rohoh. pcb routing problems?16:26
wpwrakroh: fix2b is still looking good, though. we're getting further than we used to.16:26
wpwrakroh: hard to say. could be bus, could be I/O pad drivers dying, could be a bad NOR bank, ...16:27
rohwpwrak: whats fix2b?16:27
rohi just learnt about burning streets in london etc. 10 days of camping take their toll (there was ip and power in my tent but i was too drunk and met too many interesting people to care)16:28
wpwrakroh: fix2b = remove the diode between INIT_B and PROGRAM_B (and the wire going around the board). also, check that diode D16 is okay. some aren't, and let FLASH_RESET_N get pulled low or into an undefined state16:31
wpwrakroh: fix2b solves: the problems with usb-jtag flashing stopping at "bit stream length = 14xxxxxxx" and failure to (re)configure on some boards16:32
wpwrakroh: success rate about 50% so far on those afflicted by such problems (i.e., 2 out of 4)16:33
wpwrakinteresting detail: when NOR reading on 0x3a was a problem, the bit flips were all 0 -> 1. when the reading stabilized, the bit flips were all 1 -> 016:35
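The direction of the bit flips mentioned above can be counted by XOR-ing the expected image against a readback. A small sketch, assuming both are available as bytes (the helper name is hypothetical):

```python
def count_bit_flips(expected: bytes, readback: bytes):
    """Return (zero_to_one, one_to_zero) bit-flip counts.

    A 0 -> 1 flip is a bit that is 0 in the expected image but 1 in the
    readback; a 1 -> 0 flip is the opposite. Only the overlapping prefix
    of the two buffers is compared.
    """
    zero_to_one = 0
    one_to_zero = 0
    for e, r in zip(expected, readback):
        diff = e ^ r  # bits that differ between image and readback
        zero_to_one += bin(diff & r).count("1")  # differing bits set in readback
        one_to_zero += bin(diff & e).count("1")  # differing bits set in image
    return zero_to_one, one_to_zero


if __name__ == "__main__":
    image = bytes([0xF0, 0xFF])
    dump = bytes([0xF3, 0x7F])
    up, down = count_bit_flips(image, dump)
    print(f"0 -> 1 flips: {up}, 1 -> 0 flips: {down}")
```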
kristianpaullekernel: mm_i2l.pdf, thanks for publishing it !18:24
kristianpaulthe one about plasma looks worth a look, nice :)18:25
kristianpaulif you have more slides about HDL specific and milkymist, please share :)18:27
lekernelhave you tried the demo binary on your board?18:27
kristianpaulmilkymist demo?18:30
kristianpaulno, never. i saw wolfgang use it at cparty, nothing more18:30
lekernelno, the demo bin from masteri2l plasma18:30
kristianpaulno no18:30
kristianpauli'm at work now, and just reading rss now18:31
kristianpaulnice, that tp_files tarball is a hello world at the milkymist style :)18:34
--- Thu Aug 18 201100:00
