#milkymist IRC log for Tuesday, 2011-08-16

wolfspraulwpwrak: those sound like really interesting ideas (remove D16, second reset ic for FLASH_RESET_N), but the problem I see is in how to test the result00:35
wolfspraulI think that's also the reason why this bug has not been fixed yet. We first need to find a way to reproduce some 'bad' thing consistently, then we fix the 'bad' thing, then we verify that it's gone.00:36
wolfspraulbut how is this possible now?00:36
wolfspraulaw: good morning :-) you are early!00:39
awwolfspraul, good morning ;-)00:40
wpwrakwolfspraul: yes, let's first get some statistical data on 0x39. that one seems to be very good at generating the problem.00:51
wolfspraulwpwrak: should we do 0x39 tests now?00:53
wpwrakaw: and maybe you can check the reset circuit rework on 0x39 to see if there are any obvious issues (such as a reversed diode, capacitor not properly soldered, etc.)00:54
wolfspraulaw: can we do some 0x39 work now?00:54
wpwrakwolfspraul: first, the last status of 0x39 was that it didn't reconfigure, correct ? so the next test should be to see if it does now00:55
wpwrakwolfspraul: then repeat the power-cycling with CRC loop until it stops again (which should be soon, probably less than 10 tries, if past results are any indication)00:55
wpwrakwolfspraul: then try to retrieve the NOR content via jtag. check that it's okay. (that is, if we get this far. if the NOR chip is just messed up, e.g., held in reset, then jtag won't work either)00:56
wpwraklekernel: btw, does the FPGA assert the pull-up on FLASH_RESET_N during its built-in load process ?00:58
wolfspraulwpwrak: no we will not be able to read nor then (same as yesterday)01:00
wolfspraulof course we have to try01:00
wpwrakwolfspraul: wasn't the failure yesterday a problem with the script ?01:00
wolfsprauladam may have gone out, maybe pickup roh's second package...01:06
awokay...let's try to power on to see if 0x39 can reconfigure now.01:17
awanswer is NO after powered -on with whole night01:17
awso now going to read_flash_m1.sh01:18
awstill stopped 'Bitstream length: 1484404'01:19
awwhat's next steps will you suggested? I would like to power off it firstly. ;-)01:20
wpwrakmaybe try a few times if you can read the bitstream (power on and off as you see fit :)01:20
wolfspraulaw: can you try to use Xilinx Impact and Xilinx cable? can you use Xilinx Impact to read nor? Or just detect the nor chip?01:21
wolfspraulI'm wondering whether xilinx impact would give us any new clues...01:22
wpwrakah yes, good idea01:22
wpwrak"xilinx cable" = also replaces the usb-jtag board ?01:22
awalthough this way from xilinx tool...but not sure if it can work01:22
awi do only follow the instructions from rc201:22
awdon't know if they are suitable.01:23
wolfspraulyes remove usb-jtag01:23
awwpwrak, yes, use 'xilinx cable' can instead of usb-jtag boards01:24
wolfspraulwpwrak: if we had a spare nor chip, I would suggest switching the nor chip to a new one, just to get another data point01:25
wpwraki'd save that option until the very end :)01:25
wolfspraulbut we don't have one, and taking one from another board would insert more variables into an already questionable fact finding01:25
wolfspraulwell, it could be interesting01:26
wpwrakadd two to the next digi-key/mouser order ?01:26
wolfspraulfor example if it then first worked, but after a few cycles fails again01:26
wolfspraul5 already on the way01:26
wolfspraullet's say it first works, then fails01:26
wpwrakyes, if it works at first ... :)01:27
wpwrakit's kinda major rework :)01:27
wolfsprauldata point01:27
wolfspraulI think it's not too hard, no?01:27
wpwrak56 pins .. depends ...01:27
wpwraki think my maximum is ~28. doable with chip-quick ;-)01:28
wpwrakbut maybe adam has better tools01:28
wolfspraulnah maybe not01:28
wpwrakor techniques :)01:28
wolfspraulbut in the factory they are swapping such chips within 30 seconds or so01:28
wolfspraulwould need to take a video and watch in slow-motion to see how they do it :-)01:28
wolfspraulbut a) we have no spare chip b) Adam may not be able to do the rework that easily01:31
wpwrakin any case, i'd try swapping the nor last. too much can go wrong there. also, it may destroy evidence. e.g., if there's some short01:33
wpwrakor near-short01:33
kristianpaulxray ? :)01:34
wpwrakgamma-ray laser ? :)01:35
kristianpaulhum, why so fancy?01:37
wpwrakkristianpaul: more threatening :)01:37
wolfspraulsorry, got disconnected01:46
wolfspraulaw: any news? trying xilinx impact?01:46
awwolfspraul, this is not easy for me now. but have to try01:47
wolfspraulaw: wait01:50
aw0x39 via sudo jtag: http://pastebin.com/HafSjUGL01:50
wolfspraulif it's not easy, don't do it01:50
awi am going to see the instructions in rc101:50
wolfspraulor describe the steps one by one here, either we can make it work or not01:50
wolfspraulthat sounds like you will disappear for a long time01:50
wolfspraulif the xilinx impact test cannot be done in 5 minutes, don't do it01:51
awmeanwhile should we just send one boards to someone right away?01:51
wolfspraulI am looking at pastebin01:52
wolfspraulwhy did you do 'quit' after 'detect'?01:52
wolfspraulaw: u there?01:54
awI only know these commands though01:55
wolfspraullet's clarify first01:55
wolfspraulyou are currently working on 0x3901:55
wolfspraul0x39 cannot boot, when you plug in the DC cable D2/D3 become dimly lit?01:56
awunless else can tell me that i can directly use other commands to dump into file from standby bitstream. ;-)01:56
wolfspraulis that correct?01:56
awyes, now 0x39, can't reconfiguration when power-ed on01:56
wolfsprauld2/d3 are dimly lit?01:56
awnow it's in this d2/d3 dimly lit still01:56
kristianpaul(pastebin) yes, what happened with detect command ? :)01:57
wolfspraulaw: try this: turn off power01:58
wolfspraulremove jtag-serial daughterboard01:58
wolfspraulpower on again01:58
wolfspraulwhat happens?01:58
awsame d2/d3 dimly lit01:58
awif someone can lead me to use some commands in UrJTAG tool, there's must data can be dump from flash. not sure if xiangfu know this.01:59
kristianpaulyes :)02:00
wolfspraulof course the commands are here https://raw.github.com/milkymist/scripts/master/scripts/read_flash_m1.sh02:00
wolfspraulwe can walk through, but it won't work of course, since the script also doesn't work02:00
awno, not use script, i mean directly use UrJTAG02:01
awit doesn't work I don't know if the script doesn't work somewhere or UrJTAG itself02:01
awbut if i enter UrJTAG, there's more commands can be used.02:02
kristianpauldetect is easy to remember02:02
awgood, but now how i can dump standby bitstream?02:03
awfrom which address which commands?02:03
wolfspraulwaste of time imo02:03
awsorry that I very much poor on this02:03
kristianpaulmay be a voltage verification02:03
wolfspraulah good idea02:03
kristianpaulwhere?, i dont know :)02:04
wolfspraulaw: let's measure some signals :-)02:04
awso that's why i said that should we send one rc3 board to whom can dump it?02:04
wolfspraulwe are not covering up our incompetence by spreading the problem so thin that we can eventually claim it doesn't exist02:04
kristianpaulaw: what i understand problem right now is not related to NOR content, yet :)02:05
wolfspraulwe don't know02:05
kristianpaulwith this dimly lit state the fpga is detected i guess, but the problem arises when loading bitstream?02:05
awwpwrak, do you think that where/or which parts pin's signal I should measure? or I directly rework diode and C238 again?02:06
wolfspraulnot rework02:07
wpwrakmaybe do this: without power-cycling, put scope probe on TP37 (FLASH_RESET_N), scope set to AUTO, check the voltage and look for any noise02:07
wpwrakthen, keep the probe pressed to TP37 and reset or power cycle02:07
wpwraksee if it reconfigures then02:07
kristianpauldon't we have a list somewhere of known voltages expected for TPs (that apply to power supply and such)?02:08
wolfspraulyou can start making one with your rc2 :-)02:08
awwpwrak, stay tuned02:08
kristianpaulgood idea !02:09
wolfspraulactually seriously you could provide some reference measurements for wires into or out of the nor flash as well, if you want to help02:09
kristianpaulyes why not, let me check rc2 datasheet for available testpoints02:10
wolfspraulwe are a bit asymmetric here. Werner has the clearest mind, but no board. I have a board, but no electrical capabilities. Xiangfu has a board but is hard to reach, Sebastien is sleeping. and so on :-)02:10
wolfspraulAdam has a lot of boards but is always worried he will damage something when running this or that software :-)02:10
wpwrak(clearest mind) still with a cold, though :-(02:10
wpwraknext project: an internet-attached alarm clock ;-)02:11
wolfspraulkristianpaul: here's what werner said "maybe set trigger on OE#, then start with RP#, WE#, DQ0, A0, then do the rest of DQx and Ax"02:11
wolfspraulthose are reference measurements around the nor you could do02:12
wpwrak /msg qi-bot wake lekernel    ;-)02:12
wolfspraulit's on rc2, but those datapoints may help02:12
wolfspraulwell, I actually think Sebastien is thinking a lot about what the root cause could be, but has no striking idea.02:12
kristianpaulok as soon i can measure with no soldering cables is okay02:12
wolfspraulthis thing is really difficult because we can't pin it down02:12
wolfspraulcannot really reproduce in a controlled way02:13
wolfspraulproblem appears and disappears without us understanding why it did that02:13
wolfspraulit affects > 20 % of boards, at least. maybe with tougher testing even more. We don't know.02:13
wolfspraulwe don't know whether some boards have 'genes' that will make them never show the problem02:13
wolfsprauland so on02:13
wpwrakfor all we know, it could affect all of them02:13
wolfspraulyes, definitely02:14
awwpwrak, the TP37 while (d2/d3 dimly lit) is 259mV now02:14
wpwrakaw: by the way, did you do the visual inspection of the reset rework ("fix2") on 0x39 ?02:14
wpwrakthat's a reset !02:14
awreset status02:15
wpwrakdoes it constantly stay at ~260 mV ? or does it change02:15
awi need to power-cycle to see it02:15
awbut I bet it will pull high once d2/d3 dimly lit is gone for sure. ;-)02:15
wpwrakdo we know at what point in time urjtag load fjmem.bit ? i.e., is there a specific step in the script ? or does it just do it automatically ?02:16
awwpwrak, TP36 is 120mV now02:16
Action: kristianpaul looks for TP3702:17
wpwrakand TP37 ?02:17
awTP36 is program_b02:17
wpwrakso we just have a permanent reset. interesting.02:17
awwpwrak, http://en.qi-hardware.com/wiki/File:M1rc2_powerOnOff_sequences_manuscript.jpg02:18
awplease be noticed that I knew a fact is:02:19
wpwrak(manuscript) yes, PROGRAM_B_2 should be high, not low02:19
awthe DONE pin will go from low to high to show that the fpga finished reconfiguration.02:19
awwpwrak, wait02:19
wpwrakDONE shouldn't matter. it doesn't connect anywhere near the NOR.02:20
wpwrak(unless we have some interesting shorts :)02:20
awi said xilinx guy told me before and i checked the DONE pin which described the duration is "done" once fpga firstly access with flash. ;-)02:20
wpwrakokay, but DONE is TP3502:21
awwpwrak, wait02:21
awthe INIT_B will start with a short duration of LOW and it acts synchronized with DONE pin reversely.02:22
awcan you see that?02:22
awso my question is:02:22
wolfspraul'permanent reset' may be a much better description of the problem we see on rc302:22
wolfspraulat least it fits with the vast majority of test behavior I can think of right now02:23
wpwrakaye. now on to the "why" ..02:23
wpwrakaw:  can you touch TP37 with a 1-10 kOhm resistor to 3V3 ? and see how the voltage changes ?02:24
awwpwrak, will flash RP# pin acts wrongly while the start situation from reset's IC's output? meanwhile this duration, will standby bitstream acts wrongly if the "start" doesn't access well then corrupted somewhere?02:25
awwpwrak, okay02:25
wpwrakas far as i understand things, PROGRAM_B_2 low should also keep the FPGA in reset. so it shouldn't try to access the NOR at that time02:26
awwpwrak, it's R60 placeholder.02:27
wpwrak(r60) yes :)02:28
awwpwrak, you want me attach a 10K while power is ON02:28
awor solder it after power off02:28
wpwrakjust see how the voltage on TP37 changes02:28
wpwrakwith/without R60 "placeholder"02:28
awwpwrak, TP37 is 318mV, TP36 156mV, d2/d3 dimly lit02:33
awafter attached R60 10K02:34
wpwrakokay, so that's not it. thanks.02:34
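A rough check of why the R60 test says so little, as a sketch only. It assumes a nominal 3.3 V rail, takes the two TP37 readings from the log (259 mV without R60, 318 mV with 10 kOhm fitted), and assumes the node did not drift between the two measurements. The function names are made up for illustration.

```python
# Assumptions (not measured facts): exact 3.3 V rail, no drift between the
# two TP37 readings, and whatever holds the node low behaves linearly.
V_RAIL = 3.3      # volts
R60 = 10_000.0    # ohms, the pull-up fitted on the R60 placeholder


def pullup_current(v_node):
    """Current that R60 injects into the node sitting at v_node volts."""
    return (V_RAIL - v_node) / R60


def incremental_resistance(v_before, v_after):
    """Voltage rise per ampere injected, i.e. how stiff the pull-down is."""
    return (v_after - v_before) / pullup_current(v_after)


if __name__ == "__main__":
    i = pullup_current(0.318)
    r = incremental_resistance(0.259, 0.318)
    print(f"R60 injects about {i * 1e3:.2f} mA")
    print(f"node moves like a pull-down of roughly {r:.0f} ohm")
```

Under these assumptions a 10 kOhm pull-up can only inject a fraction of a milliamp, so it cannot win against something that is actively holding FLASH_RESET_N low.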
wpwrakdid you check that the diodes have the correct orientation ?02:34
awyes, two diodes are correct. this board 0x39 surely had have reconfigured if keep it days long.02:35
awso i don't know if i directly resoldering new parts of them can solve.02:36
wpwrakhmm, tricky02:39
wolfspraulno resoldering02:40
wolfspraulwhat is the sequence the board goes through now from the moment power is applied on the DC jack?02:40
wpwraksome one is pulling FLASH_RESET_N down. but who ? could be INIT_B_2, PROGRAM_B_2, the reset chip, the FPGA, or something that's not visible from the schematics02:40
wolfsprauldoes the fpga ever start its configuration sequence?02:40
wolfspraulor it goes into permanent reset immediately02:40
wolfspraulwho is in control, in which order?02:40
wolfspraulis the fpga in control at some point? or always forced down from outside?02:41
wolfspraulwpwrak: ah yes, you think in the same direction :-)02:41
wolfspraulwho is in control02:41
wolfspraulcan't we just measure backwards?02:42
kristianpaulsorry i don't have all TPs needed to be useful to you now..02:42
wolfspraulflash_reset_n is pulled down02:42
wolfspraulwhich timespan are we talking about between power-on and permanent reset?02:42
wolfsprauljust a few hundred milliseconds?02:42
wolfspraulthen we could scope the voltage of FLASH_RESET_N, INIT_B_2, PROGRAM_B_2 and even more and compare them side by side?02:43
wpwrakwolfspraul: (measure backwards) maybe adam can make a little "power probe" :)02:43
awkristianpaul, yes, the rc2 doesn't have them. sorry that I should have told you first.02:44
awkristianpaul, thanks though. :-)02:44
wolfspraulwpwrak: which timespan are we looking at?02:44
wolfspraulfrom power-on to permanent reset02:44
wpwrakaw: do you have through-hole resistors around 100 Ohm ?02:44
awwpwrak, i don't have but tell me your idea firstly. ;-)02:45
wpwrakwolfspraul: right now, i'm interested in the permanent reset. i think NOR shouldn't be reset in this state. but i'm not 100% sure02:45
awthen I try to get it done02:45
awwpwrak, tell me firstly your idea. ;-)02:45
wpwrakaw: (idea) solder a wire to 3V3. solder the other end to a ~100 Ohm. connect voltmeter (or scope) to the open end of the R02:46
wpwrakaw: then touch things with the open end. this should do two things: 1) pull them relatively strongly to 3V3. 2) show the voltage02:46
wolfsprauloh totally. with permanent reset we are onto something.02:46
wpwrakor, maybe easier:02:46
wolfspraulI just hope the board shows it long enough :-)02:47
awwpwrak, go on02:47
wpwrakconnect multimeter in DC current measuring mode to 3V3. then touch things with the other probe. e.g., check how much current TP37 can sink02:47
wpwrakpoints of interest: TP37 (RP#), TP36 (reset chip out), the INIT_B_2 side of R15702:48
awwpwrak, okay..let's do a surgical operation on 0x39. ;-) stay tuned. :)02:49
wpwrakno surgery. just measurements :)02:49
wolfspraulkristianpaul: can you post your d2 dimly lit situation here?02:51
wolfspraulso you had your rc2 board in a state where d1 was dimly lit and it wasn't detected by jtag?02:52
kristianpaulyes, correct02:52
wolfspraulthere may be multiple bugs in this area, and that one may have already been independently fixed on rc302:53
wolfspraula lot of 'may', sorry02:53
kristianpaulsure np02:53
wolfspraulis your board back to life now?02:53
kristianpaulphew.. :)02:53
wolfspraulok good02:53
wolfspraulno don't worry02:53
wolfspraulif it breaks, you will get a new one. rc3 even :-) but please don't be reckless because of that :-)02:54
kristianpauloh no02:54
wolfspraulbut also please don't worry02:54
wpwrakwolfspraul: you just gave him a lot of reason to be reckless ;-)))02:54
wolfspraulI am the manufacturer, and I support my stuff.02:54
kristianpauli'm still using it as always02:54
wolfspraulthat's why I'm so keen on getting rc3 to a higher level...02:54
kristianpaulactually i just was going to reflash as always and then... omg :)02:54
wolfsprauleven this guy hadez and others, I may just end up giving them new rc3. but one by one, first we need to make rc3 at a controlled quality level...02:55
wpwrakkristianpaul: maybe as a warm-up, unsolder the FPGA, re-ball, then solder again ;-))02:55
kristianpaulwpwrak: nah02:55
kristianpaulwolfspraul: i still trusting my rc2 a LOT for now ;)02:56
wolfspraulyou can02:56
wolfspraulit's a good board and we worked hard even then. of course we also learnt a lot since :-)02:56
wolfspraulwpwrak: your idea with a second reset ic for FLASH_RESET_N - is it the same 4.4V reset ic as we have now?02:57
wolfspraulI'm just asking in case I should get more parts :-)02:57
wolfspraulwith 'have now' I meant 'will have in a few days'02:58
wpwrakwolfspraul: yes, the same02:58
wolfspraulthat second reset ic would replace the need for logic gates?02:58
wolfspraulsounds like we have to do a few more experiments before settling on the reset circuit for rc4...02:59
wpwrakit would replace one. not sure about the other02:59
wolfspraul(which I wouldn't mix in with the permanent reset research on rc3 we are doing now)02:59
wolfspraulok, got it02:59
awwpwrak, take R60 apart, right?03:03
awthen use 100 ohm to pull high on 3V3 then measure dc current.03:04
wpwrakaw: probably doesn't matter03:04
awwpwrak, ?03:04
awstill let R60 (10k) on it.03:04
wpwrakaw: naw, let's do it the simple way: just multimeter in DC current mode03:05
wpwrakaw: measure current between TP36/TP37/INIT_B_2 and 3V303:05
wpwrakR60 probably has no effect03:05
awwpwrak, still need 100 ohm as a limiting resistor though, right?03:06
wpwrakif you can, it would be nice to have03:06
wpwrakelse, just avoid touching GND ;-)03:06
awwpwrak, R60 still 10K there, TP37 (RP#, 16mA), TP36(reset pin out, 19mA), INIT_B_2 (24.7mA)03:16
awwith 100 ohm to 3V3 measured. ;-)03:16
awTP 37 now 18.9mA03:17
wpwrakhmm, they're all very strong03:18
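To put "very strong" into numbers, here is a small sketch of the arithmetic behind the 100 ohm measurements. It assumes an exact 3.3 V rail, ignores the multimeter's burden voltage, and models each pull-down as a plain resistor, which a real output driver is not. The readings are copied from the log (TP37 uses the later 18.9 mA value); the function names are illustrative only.

```python
# Readings are copied from the log; the 3.3 V rail and the purely resistive
# model of the pull-down are assumptions made for illustration.
V_RAIL = 3.3        # volts, nominal 3V3 rail (assumed exact)
R_SERIES = 100.0    # ohms, the limiting resistor in series with the meter

READINGS_MA = {"TP37 (RP#)": 18.9, "TP36 (reset out)": 19.0, "INIT_B_2": 24.7}


def node_voltage(current_a):
    """Voltage left at the probed node after the drop across R_SERIES."""
    return V_RAIL - current_a * R_SERIES


def pulldown_resistance(current_a):
    """Equivalent resistance to ground if the pull-down were a resistor."""
    return node_voltage(current_a) / current_a


if __name__ == "__main__":
    ceiling_ma = V_RAIL / R_SERIES * 1000
    print(f"a dead short to GND would read {ceiling_ma:.1f} mA")
    for name, ma in READINGS_MA.items():
        i = ma / 1000.0
        print(f"{name}: node at {node_voltage(i):.2f} V, "
              f"about {pulldown_resistance(i):.0f} ohm to GND")
```

With a 100 ohm series resistor the meter can never read more than 33 mA, so readings of 19 to 25 mA mean each node is being held down by something equivalent to a few tens of ohms, far stronger than any pull-up on the board.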
wpwrakwhat do you get on the 3V3 pin of the reset IC ?03:19
awvoltage on TP36?03:20
awdon't understand your question.03:20
wpwrakcurrent between 3V3+100R and pin 3 of U24 (pin 3 is the one on the side that has only one pin)03:20
wpwrakexpected value: 0.0000000 A ;-)03:21
awi measured pin 3V3 of reset ic is good @ 3.3V. ;-)03:21
wpwrakthis is weird. could there be any shorts ?03:22
wpwrakhigh current on TP36 should only exist if:03:22
awi got -0.001 mA while attached pin 3v3 of reset ic03:22
awso no leakage current though. ;-)03:23
wpwrak- PROGRAM_B_2 is actively pulling low (which, afaik, it never does)03:23
wpwrak- the reset chip is pulling low (which is has no reason to do)03:23
wpwrak- something is shorted into that net03:23
awyes, TP36 (program_b) is 67mV now03:23
wpwrakand it pulls low with ~20 mA03:24
awbut don't know where surged03:24
awregards to if somewhere is short existed. this is really weird03:26
wpwraknext: connect scope to TP36, acquisition: peak, trigger: auto, slow timebase (maybe 100-200 ms/div)03:26
wpwrakthen power-cycle the board. see if it ever comes out of reset03:26
awi see a pulse from low -> high -> low using rising edge03:31
awtrying to catch it ;-)03:32
wpwrakfor how long does it stay high ?03:32
wpwrak(roughly) picoseconds / milliseconds / days :)03:34
awwpwrak, http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG03:47
kristianpaulhttp://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG ?03:50
wolfspraulI cannot imagine a short, unless it's a short coming from (programmed) inside the fpga03:53
awwpwrak, i may use two channels to compare03:53
wolfspraulthat's because the board worked before, and got into this state without any hardware action (manual hardware action, like soldering)03:53
awwpwrak, maybe scope DONE pin? RP#?03:53
awin channel 2. ;-)03:54
wolfspraulaw: if you think that might be helpful, just do it until Werner is back...03:54
awwpwrak, forget about my last picture, see this new: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_ch1-tp36_ch2-tp37.JPG04:11
awch1-tp36, ch2-tp3704:11
awthey are synced actually.04:11
awi'm going to scope done pin as ch2 to see different04:12
awwpwrak, ch1-tp36, ch2-tp35(done pin): http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_ch1-tp36_ch2-done-tp35.JPG04:23
awthe duration from the first pull high pulse to second pulse is thus ~180ms (which is the reset delay time)04:25
awalright..i think i need to test others after lunch04:27
awleave 0x39 aside temporarily. ;-)04:27
wolfspraulaw: yes, let's wait for Werner's feedback and continue with other boards first04:30
wolfspraul'permanent reset' is an interesting new angle, maybe we are lucky and find something there...04:30
awhopefully get secrets behind it.04:31
kristianpaulxiangfu: you can load bitstream using jtag pload04:46
kristianpaulxiangfu: _but_ your mm1 soc doesn't support boot from anything besides NOR as it is today.. maybe the debug ROM is linked to jtag and you can boot from there.. i don't know up to there...04:47
kristianpauli hope i'm wrong on that and i missed that part of the HDL :)04:48
wolfspraulnow that we know that at least some boards are held in a permanent reset state, some of those earlier ideas lost value (for now)04:49
wolfspraulbecause even if we could load and boot everything without nor on a functioning board, it wouldn't work on 0x3904:49
wolfspraulsame for trying Xilinx Impact (which we skipped)04:50
wolfspraulactually - is there still the chance that even on the 0x39 we have now, the fpga first reads a corrupted bitstream and then ends up forcing itself into permanent reset?04:51
kristianpaulcan we reset fpga from jtag?04:51
wolfspraulprobably not because then the access path via jtag-serial should still work, which it doesn't04:51
wolfspraulgood idea [reset fpga from jtag]04:52
kristianpaulah yes !!04:52
kristianpaulpld reconfigure04:52
wolfspraulbut that would end up doing the same thing, no?04:53
wolfspraulit would read something from nor... ?04:53
kristianpaulah, yes :)04:53
kristianpaulwell bitstream must be loaded from somewhere04:53
wolfspraulcan we tell it to reconfigure from elsewhere?04:53
wolfspraullike from what we supply over jtag04:53
kristianpaulyeah, thinking same..04:54
wolfspraulbut first maybe reset, then reconfigure from jtag04:54
kristianpaulif the problem that fires permanent reset is in the power-cycling04:54
kristianpaula pld reconfigure should succeed04:54
kristianpaulas the board is already powered,04:55
kristianpaulmy guess04:55
wolfspraulsure we can try, but let's assume there is a nor corruption04:55
wolfspraulthen it would hang itself again04:55
wolfspraulif that nor corruption triggers the permanent reset04:55
wolfspraulbut how can we flash the board when nor is still empty?04:55
kristianpaul_if_ what hangs is nor corruption and not some wrong timings with reset IC perhaps?04:56
kristianpaulha, just wipe a board that is known to boot and see what happens04:56
wolfsprauldoes the fpga 'give up' when there's no bitstream in nor, but later it finds a corrupted bitstream and hangs itself?04:56
kristianpaulshould be a similar state as the no corruption, as at the end..04:56
kristianpaulhum no may be it starts okay then errors pop up and it just give up...04:57
wpwrakhmm ...04:57
wolfspraulah :-)04:57
kristianpaulor fpga didn't catch bitstream corruption and locked itself up because of that?04:58
kristianpaulthats even more trickier ;)04:58
wpwrakthe pulse looks scary in http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG05:00
kristianpaulagree :)05:00
kristianpaulmalfunctioning diode or reset ic?05:01
wolfspraulwe can definitely try a pld reconfigure on 0x39 when Adam is back, and measure FLASH_RESET_N etc. then05:01
wolfspraulwpwrak: Adam uploaded 2 more later and said "forget the first one"05:02
wpwrakit's hard to forget what looks like a 6 V+ pulse on a 3.3 V line :)05:08
kristianpaulno no05:08
kristianpaulhorrible :)05:08
wpwrakthe others look better. some amplitudes seem too small, but maybe that's a limitation of the scope05:10
wpwrakwhat is weird is that TP36 (PROGRAM_B_2) comes down. so either there's contamination from INIT_B_2 or FLASH_RESET_N, or the reset chip triggers for some unfathomable reason, or PROGRAM_B_2 becomes an output.05:12
wolfspraulwpwrak: hmm. but what's wrong with the first picture then?05:13
wolfspraulbad measurement?05:13
wolfspraulor there was a 6V pulse on a 3.3V line?05:14
wpwraki'm curious what adam thinks he measured there :)05:14
wpwrakfor now, 0x36 looks more like a case of "multiple organ failure"05:16
wpwrakerr 0x3905:16
wpwrakmaybe put it aside and proceed with the next from the list we made yesterday05:17
wolfspraulno other ideas for 0x39 ?05:21
wolfspraulat least we can try "pld reconfigure" over urjtag05:21
wolfspraulwe can take another board and see whether we find the same permanent reset state05:23
kristianpaulI'm out..05:24
wpwrakfor 0x39, the next thing i would try with the information we currently have is to remove the diode between reset chip and FLASH_RESET_N05:24
wpwrakdecouple the two systems, at the risk of now getting real NOR corruption05:25
wolfspraulsure let's do that then05:25
wolfspraulremove the diode and it may boot again?05:25
wpwrakbut i'd be curious about what lekernel thinks of the PROGRAM_B_2 net going low05:25
wpwrakas i understand things, that can only happen if the reset chip pulls low, which it has no reason to do05:26
wpwrakbut my understanding may be incomplete05:26
wolfspraulI'm curious why Adam said we should 'forget' the first tp36 picture05:26
wpwrakif there's a condition in which PROGRAM_B_2 could become an output and pull low, that would be interesting to know05:27
wpwrakyes, me too :)05:27
wolfspraulwe could replace the reset ic05:27
wolfspraulbut I hesitate to do these kinds of things while we are in analysis mode05:27
wpwrak"do not look at the elephant" ;-)05:27
wolfspraulwell just replace with a new one05:27
wolfspraulbut maybe then our beautiful study object will work again and not tell us any more interesting stories05:28
wpwrakor maybe just leave it out for testing. afaik, the FPGA shouldn't need it05:28
wpwrakthat could happen :)05:28
wolfspraulok so05:29
wolfspraul1) find out why Adam said to ignore the first tp36 scope picture05:29
wolfspraul2) try 'pld reconfigure' from urjtag and see whether it stays in permanent reset05:29
wolfspraul3) remove the diode between reset ic and FLASH_RESET_N, see whether it boots05:30
wolfspraul4) remove the reset IC, see whether it boots05:30
wpwrakbut as a i said, there are several things that look wrong on 0x39. the 6+ V spike is worrying, if it's real05:30
wolfsprauland then maybe, take another board and check whether we find a similar permanent reset condition there05:30
wpwrakwhat does "pld reconfigure" do ? is this a reset ?05:30
wolfspraulseems like05:30
wpwrakagreed on 1)05:30
wpwrak2) also seems reasonable05:31
wolfspraulunfortunately I don't know the exact behavior of reset05:31
wolfspraulwill it automatically load the standby from nor?05:31
wolfspraulis it possible that it loads a corrupted bitstream from nor which then locks itself (the fpga) up in permanent reset?05:31
kristianpaulyes it will wolfspraul05:31
wpwrakbefore 3), i'd like to have lekernel's opinion on PROGRAM_B_2 being driven low at ~20 mA while the board is powered05:31
wolfspraulhow come when we flash a board for the first time (nor empty), it will not load anything from nor (it's impossible because there is nothing there yet)05:32
wpwrakit probably tries to load NOR but fails (or maybe the CRC is correct and it just loads garbage :)05:32
wolfspraulcan it hang itself up?05:33
wolfspraulcan the fpga itself be stuck in a loop that always ends in a permanent reset?05:33
wpwrakcan PROGRAM_B_2 become an output ? :)05:33
wolfspraulok, so no #3 or #4 until we hear from Sebastien05:34
wolfspraulbut in the meantime we can look at another board, with the new permanent reset focus05:34
wpwrakif the same pattern appears on other boards, that would be good to know05:34
wolfspraulok we have 0x3C05:35
wolfspraulbut that one may not yet be in this state05:35
wpwrakfor all we know, 0x39 may have been ESD-fried ;-)05:35
wolfspraulif we would be hunting a rare problem (like 1 out of 100), and always poking around on the same board, then after some time I would agree and say "let's forget it until we have more boards"05:36
wpwrakwe do have photographic evidence of a 6+ V spike in a system that runs at 3.3 V and is supplied from a ~5 V supply. the supernatural is already there, on digital film :)05:36
ThihiDunno if anyone of you saw this, since I pasted this during the night. Anyway: http://kukka.siilo.fi/~kuutio/11-08-13-kissastuskausi.mkv - you guys might be interested in this. A small sample of what I do with a projector and a camera. Music has been ripped off from Boards of Canada.05:36
wolfspraulwell that may be cleared up fast05:36
wolfspraulbut anyway, we have enough boards and a problem cluster now to be sure it's not caused by ESD or other one-off phenomena05:37
wolfspraulthat's why we couldn't effectively dig in on the rc2 run (in addition to making mistakes how to handle it there)05:37
wpwrakwolfspraul: oh, i think the cluster is real. just don't know what's up with 0x3905:39
wpwrak0x39 exhibits at least two phenomena that contradict my understanding of things: 1) the spike, 2) PROGRAM_B_2 being driven low (for more than 200 ms)05:40
wolfspraulwe could take 0x3C, 0x7F, 0x61, 0x4005:42
wolfspraulI think 0x40 is erroneously set to 'available'05:42
wolfspraullet's look at 0x40 first, then we can clear that up as well05:43
wolfspraulif 0x40 is really good now, we can take 0x61 ?05:43
wolfspraulwell we have plenty05:43
wolfspraulwe just try to find a second one to support the permanent reset theory05:43
wolfspraulwpwrak: the spike supports my idea that some 'bad' event is happening that may sometimes cause lasting damage05:44
wolfsprauland for program_b_2 being driven low, I would think we find more instances of that now that we look for it, on 0x61 and others05:44
wolfspraullet's see05:45
wpwrakmy list of boards that look as if they belonged to the cluster: 0x36, 0x3a, 0x55, 0x67, 0x6d, 0x6f, 0x70, 0x77, maybe 0x7a05:45
wolfspraulall different from mine05:46
wolfspraulok let me look at your list...05:46
wpwrakyeah :)05:46
wpwrakwe can pick one from each list ;-)05:46
wolfspraulah ok05:46
wolfspraulI stay away from boards that have never rendered before05:46
wolfspraulsuch as 0x36, 0x3A05:47
wpwraki see05:47
wolfspraulyour whole list :-)05:47
wolfspraulof course it could be the same thing05:47
wolfspraulmaybe on those boards right from the beginning05:47
wolfspraulbut we risk running into one that simply has bad flash soldering or so05:48
wpwrakyes, could be05:48
wolfspraulaw: there you are :-)05:48
wolfspraulwe have plenty of new ideas :-)05:49
wpwrakaw: why did you want us to ignore http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x39_tp36.JPG ?05:49
wpwrakaw: and what do you think was what looks like a >= 6 V spike ?05:49
awroh, back from post office and picked 2nd shipment up, tks.05:50
awwpwrak, that one when you read it, please use divide 10X, i forgot to set the setting to X1. ;-)05:50
awso forget it05:51
wpwrakaah ! :)05:51
awand just use the second picture with two channels. sorry that.05:51
awthat second is exactly correct. ;-)05:51
wpwrakokay, makes a lot more sense then :)05:51
awalright, so how was new ideas?05:52
wpwrakwaiting for lekernel to tell us if PROGRAM_B_2 can be an output05:53
wolfspraulaw: next idea is this:05:53
wolfspraultake 0x39, power it with jtag-serial connected to your computer05:53
wolfspraulusb full-speed as always05:53
wolfspraulthen run 'jtag' manually (not a script)05:53
awwpwrak, aha...if PROGRAM_B_2 has been pulled low while powered-up? just guess, right?05:54
wolfspraulthen "cable milkymist" then "detect" then "pld reconfigure"05:54
awwolfspraul, okay05:54
wolfspraulI don't know about the two 'instruction' lines from the script05:55
wolfspraulmaybe we add those too?05:55
wolfspraulso it's05:55
wolfspraul1. cable milkymist05:55
wolfspraul2. detect05:55
wolfspraul3. instruction CFG_OUT 000100 BYPASS05:55
wolfspraul4. instruction CFG_IN 000101 BYPASS05:55
wolfspraul5. pld reconfigure05:55
wolfspraultype those commands manually05:56
wolfspraulthen check whether the board is still in permanent reset05:56
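The manual session above can be written down as a numbered checklist. This is only a sketch that reproduces the five commands exactly as listed in the chat; it does not call urjtag, and the helper name `checklist` is hypothetical:

```python
# The five commands exactly as listed in the chat; nothing else is assumed
# about how the 'jtag' tool behaves.
MANUAL_RECONFIGURE_STEPS = [
    "cable milkymist",
    "detect",
    "instruction CFG_OUT 000100 BYPASS",
    "instruction CFG_IN 000101 BYPASS",
    "pld reconfigure",
]


def checklist(steps):
    """Return a numbered checklist for typing the commands by hand."""
    return [f"{i}. {cmd}" for i, cmd in enumerate(steps, start=1)]


for line in checklist(MANUAL_RECONFIGURE_STEPS):
    print(line)
```

Typing the steps one at a time, rather than running the batch file, makes it visible which command the session stops at.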
kristianpaultry middle button after just in case :)05:57
wolfspraulno confusion. that's later I think.05:57
wolfspraulyou want to try whether it boots? it first needs to survive reconfiguration...05:58
wolfspraulkristianpaul: are the two 'instruction' lines necessary?05:59
kristianpauldont know..05:59
kristianpauli don't think so, but i can't argue a reason now06:00
wolfspraulwe just leave them in06:01
awafter those cmd, TP37(RP#) is 236mV06:01
awd2/d3 dimly lit surely :)06:02
wolfspraulso 'pld reconfigure' does not do any magic06:03
wolfspraulno problem06:03
wolfspraulaw: I have a question about 0x4006:03
wolfspraulwhy is it set to 'available'? (I look at the wiki test results)06:03
awwolfspraul, mm..good catch. I must have powered it on again and the rendering test passed, so I marked it 'available'; this was before you told me not to mark a board 'available' once it has had d2/d3 dimly lit.06:07
awdelete it now.06:07
wolfspraulso the 'notes' are not complete?06:07
wolfspraulok I got it already, so the board does work now06:08
wolfspraulalright, let's look at 0x3C then06:08
awno. it must have passed all rendering tests, so i marked it available but forgot to fill in some notes.06:08
wolfspraulgot it06:08
wolfspraullet's put 0x39 aside, and look at 0x3C06:08
wolfspraulpower it, see whether it boots...06:09
wolfspraulI want to find another board that stops with d2/d3 dimly lit, or cannot reconfigure, or cannot reflash06:09
aw0x3c: can reconfigure06:11
awso do same manual cmd like above?06:11
aw0x3c: TP37 surely is 3.3V now06:12
wolfspraultry a few power cycles with test software (up to crc check only)06:13
awmm....1st press middle btn then d2/d3 dimly lit now06:16
wolfspraulwell nice06:16
awTP37: the level is now very messy, between 1.2V and 3.3V. messy pulses!06:18
awmm...now it is 3v3 and d2/d3 dimly lit is GONE.06:19
wolfspraulbut I think what you saw initially "messy from 1.2 to 3.3" and then to 3.3 and then dimly lit is gone all confirms our work so far06:20
awtoo bad that I couldn't take TP3 pictures while its pulses were unstable.06:20
wolfspraulno problem, we believe you and it's in the chat :-)06:20
awso what do we need next?06:20
wolfspraullet's see whether wpwrak is still around06:20
awpress middle btn again?06:20
wolfspraulsure, try06:21
wolfspraulI would think it boots06:21
awsame now06:21
awi stop the scope06:22
wolfspraulsame what?06:22
awlet me take pictures06:22
wolfspraulyou pressed the middle button and then?06:22
wolfsprauloh, more 'pulses' maybe :-)06:23
awwhen d2/d3 are dimly lit, TP37 has messy pulses from 1.2V to 3.3V; the level keeps varying within this range06:23
wolfsprauldid you press the middle button? what happened?06:25
wolfspraulif you can keep it in this state of 'messy pulses', can you measure the other test points as well?06:26
wolfspraulaw: did you press the middle button? what happened?06:27
awactually its low level reached down to at least 1.2V06:28
wolfspraulplease describe the sequence of events there, then it's easier to come up with theories06:28
awit goes into d2/d3 dimly lit after I press middle btn again06:28
wolfspraulso it did not boot06:28
wolfspraulinstead, d2/d3 dimly lit again?06:28
awnow it's in dimly lit06:29
awmaybe my probe let TP37 recover and then rise to 3V306:29
wolfspraulwell, we have to wait for more thoughts from Werner or Sebastien. I suggest you continue with regular testing and fixing across the batch.06:30
awso reset on flash chip is asserted then d2/d3 became fully OFF06:30
awthen it goes dimly lit after I press middle btn again.06:30
wolfspraulwith us finding a second board right away that may very well be in a similar or same 'permanent reset' state, I think it confirms what we found on 0x3906:30
wolfspraulright now it's dimly lit?06:31
wolfspraulcan you measure the other test points?06:31
wolfspraulTP36, INIT_B_206:31
wolfspraulI think those, right?06:31
awnow 0x3c: TP37 keeps stable low (209mV, surely dimly lit)06:33
wolfspraulmeasure TP36, INIT_B_206:33
awTP37(flash chip reset pin), TP36 (PROGRAM_B_2)06:33
wolfspraulyes just to compare with 0x39, measure TP36 and INIT_B_206:34
aw0x3c: program_b_2 TP36 is stable 3.3V06:35
awwe need to know also from lekernel : what fpga does after pressing middle btn?06:36
awnow...i am blind though06:36
wolfsprauldid you measure init_b_2 ?06:37
awinit_b_2 is good low06:38
wolfspraulI suggest you continue with regular testing and fixing now06:38
awtp36 and tp37 are now pulsing unstably together!06:38
wolfspraulwe have enough data about 0x3C (I do)06:38
awunstable (something like high impedance from the fpga), maybe, I don't know.06:38
wolfspraulI suggest you go back to the regular testing and fixing06:38
wolfspraulthe more 'other' bugs we can fix across all boards, the less likely we are to later be confused when investigating the 'permanent reset' problem more06:39
wolfspraulkeep the 'notes' column updated with anything unusual or suspicious you see with a board06:39
wolfspraulalso, from now on, I suggest if you run into any of these problems: 1) d2/d3 dimly lit, 2) cannot reconfigure, 3) cannot reflash, you measure TP36 and write the value you see in the notes column06:40
wolfspraulor maybe TP36 and TP37 - both? don't know06:40
wolfspraulmaybe both :-)06:40
awwell...too many06:41
wolfspraulok, then only TP3606:41
aw0x39 and 0x3c is good data now06:41
wolfspraulno I mean for new boards06:41
wolfspraulwhen you test them06:41
wolfsprauland if you run into dimly lit/reconfig/reflash problem06:41
wolfspraulnormally you would stop there06:41
awonce i run into failure, i write notes. ;-)06:41
wolfspraulbut now you measure TP36, and write it into the 'notes' column06:41
wolfspraulaw: I think that's a good idea, no?06:42
awwell...write firstly though06:42
awi am testing manually now...a little afraid I will forget things across many boards, though. not a bad idea...just slow. ;-)06:43
wolfspraulwhy forget. just go through the batch one by one, let's fix everything we know we understand.06:51
wolfspraulstarting from the easiest fixes, to the more difficult ones06:51
wolfspraulin parallel when Werner or Sebastien are back we continue with the permanent reset investigation06:51
wolfspraulbut now back to regular testing...06:52
aw0x3c & 0x39 & 0x40 I updated results06:58
awyes, I go for others now06:58
wolfspraullet's fix all easy and simple things we already know about clearly06:58
awscope pictures linked there.06:59
lekernel"still stopped 'Bitstream length: 1484404'"08:37
lekernelplease do NOT make such reports again. instead, input the urjtag commands manually and do not use the batch file. better, enable some debug output.08:37
lekernelPROGRAM_B is not driven http://www.xilinx.com/support/documentation/user_guides/ug380.pdf08:43
wolfspraullekernel: any idea how it can end up in the permanent reset state as observed earlier?08:48
lekernelso it's confirmed? what we thought was "NOR corruption" on RC3 is just permanent reset?08:48
wolfspraulwell. making the judgment is the hard part.08:49
lekernelhard? wtf08:49
lekernelis the voltage on TP37 high or low when the board fails?08:49
wolfspraulno making the judgment of whether we look at a "nor corruption" or "permanent reset"08:50
wolfspraulbecause we think about a lot of boards and even more testing data. there may be multiple bugs, or different problems on different boards.08:50
lekernelok, whatever08:51
wolfspraulearlier we did some tests on 0x39, did you see that in the backlog?08:51
lekernelon the board you are debugging right now08:51
lekernelwhat is the voltage on TP37 when it fails?08:51
lekerneland I just mean after initial power up , no booting etc.08:51
wolfspraulon 0x3C, we had pulses between 1.2V and 3.3V on TP3708:52
wolfspraulon 0x39, it was around 200mV I think, checking backlog...08:52
lekernelit should never be 1.2V08:53
lekerneland never pulse08:53
wolfspraulTP37 318mV on 0x3908:54
lekernel200mV is not correct either, that would permanently reset the flash08:54
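The TP37 readings discussed here fall into three bands: near 3.3 V (flash out of reset), a few hundred mV (flash held in reset), and anything in between, which lekernel says should never occur. A rough sketch of that classification; the two threshold values are assumptions chosen for illustration, not taken from any datasheet:

```python
def classify_tp37(volts, low_max=0.5, high_min=3.0):
    """Classify a TP37 (flash reset pin) reading.

    low_max and high_min are illustrative thresholds, not datasheet values.
    """
    if volts <= low_max:
        return "low: flash held in reset"
    if volts >= high_min:
        return "high: flash out of reset"
    return "intermediate: should never happen"


# Readings reported in the log: 0x39 at 318 mV, 0x3C pulsing down to 1.2 V.
for reading in (0.318, 1.2, 3.3):
    print(reading, "->", classify_tp37(reading))
```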
wolfspraulare you reading the backlog at all?08:54
lekernelyes but it's not quite clear08:58
lekernelwasn't Adam supposed to measure what drives that TP37 low?08:59
wolfspraulok, seems we are stuck09:16
wolfsprauldid we answer some of Werner's questions?09:16
wolfspraullemme see...09:16
wolfspraul"waiting for lekernel to tell us if PROGRAM_B_2 can be an output"09:16
wolfspraulthe answer seems to be: no09:16
lekernelyes, it is "no"09:17
wolfspraulgood, thanks :-)09:17
wolfspraulnow, there was another one09:17
wolfspraul"high current on TP36 should only exist if: - PROGRAM_B_2 is actively pulling low (which, afaik, it never does) - the reset chip is pulling low (which it has no reason to do) - something is shorted into that net09:19
wolfsprauland it pulls low with ~20 mA09:19
wolfspraulso PROGRAM_B_2 is not the culprit09:20
wolfspraulthat leaves the reset ic or some short09:20
wolfspraulWerner's next idea was to remove diode and reset ic and see what happens09:21
lekernelok, sounds good10:18
zumbilekernel: hello! someone pointed me to talk to you about some question I had10:45
zumbilekernel: basically I was looking into converting bitstream to netlist10:45
zumbilekernel: I found a tool called 'debit' at ulogic.org, but it does not seem to be online anymore10:46
zumbilekernel: do you know where can I find such tool? or any idea on how to reverse engineer bitstream?10:47
zumbilekernel: oh! wow, thanks10:47
kristianpaullekernel: what's the behavior of fpga when never get a bitstream from nor?11:24
kristianpaulof our fpga in rc3 of course11:24
wpwrakbooting ... (sorry for my erratic napping pattern. bloody cold is messing with me :-( )11:26
wolfspraulwpwrak: wow, take good care of yourself11:27
wolfspraulyour relentless support for Milkymist One is amazing anyway, I'm flattered and feel bad that I cannot debug those bloody circuits better myself :-)11:28
wolfspraulwpwrak: Sebastien just confirmed that PROGRAM_B_2 cannot be an output, I guess that means next on 0x39 we remove the diode?11:29
wolfspraulaw: can we do another 0x39 session?11:30
wpwrak(reading backlog ... INIT_B_2 has no test point. but can be measured on R157)11:30
awyes, go on11:30
wolfspraulwpwrak: oh we had some interesting results on 0x3C, but I think they confirm what we saw on 0x3911:30
wolfspraulaw_: go back to 0x39, turn it on, tell us what happens11:32
aw_are you sure 0x39 or 0x3c(this has messy pulses)11:33
wpwrakyeah, let's remove the diode. untangle the knot a bit :)11:36
wolfspraulis it clear which one?11:36
wpwrak(the diode between reset out and FLASH_RESET_N, keep the one between INIT_B_2 and PROGRAM_B_2 for now)11:36
wpwrak(configuration document) ah, a godsent ;-) just wished it was 1/10 the size :)11:37
wolfspraulwpwrak: did you see the 3C results? voltage between 1.2 and 3.3...11:38
wpwrak(0x3c results) yeah, says "something's weird" :)11:38
wolfspraullooks similar to what we see on 0x39, no?11:38
wpwraki wonder if we have some unintended connections between things11:38
wpwrakmaybe the "fix2" rework left some trouble11:39
wolfspraulkeep in mind boards first working and then falling into this state11:39
aw_wpwrak, remove the diode which is between reset out and FLASH_RESET_N?11:39
wolfspraulaw_: wait, first turn 0x39 on and tell us what you see11:39
wpwrakaw: yes11:40
wolfsprauld2/d3 still dimly lit? voltage on tp36 ?11:40
wolfspraulwpwrak: so at least for the part of the problem that we see on a board that first works and then falls into permanent reset, it cannot be caused by some permanent short/connection on the board11:40
wolfspraulmaybe current flows the wrong way somewhere and slowly damages a part?11:41
wpwrakvoltages around 1.2-1.3 V look like some things working against each other11:41
aw_0x39: d2/d3 dimly lit, tp36- 78mV, tp37 - 238mV, init_b - 35mV11:41
wpwraki'm not sure we're seeing actual damage11:41
wolfspraulok perfect11:42
wpwrakaw_: that's after diode removal ?11:42
wolfspraulaw_: now power off, remove the diode11:42
wolfspraulwpwrak: no, before (pretty sure)11:42
wpwrakah :)11:42
aw_still have diode11:42
wpwrakthe executioner's axe don't swing so swiftly ;-)11:42
wolfspraulwpwrak: one difficulty is that the problem is known to have spontaneously disappeared before11:42
wolfspraulso even if the removal of the diode makes it go away, it may not be because of the removal of the diode11:43
wpwrakindeed. but it if it doesn't come back on, say, 0x39, that's an indication11:43
wolfspraulyes sure11:43
wpwrakbesides, we should still observe anomalies11:43
wolfsprauljust saying why I wanted the baseline11:43
wpwrakbut now the anomalies will be in separate analog domains. easier to tame (i hope) :)11:44
aw_so now, remove diode which between reset out and FLASH_RESET_N ?11:46
wolfspraulhave we ruled out some stupid mistake on the affected boards like diodes in wrong polarity? some mistake to the circuit on those particular boards?11:51
wolfspraulsince some steps were done manually (fix2), those would need to be double-checked11:52
wpwraki think so. i've asked adam to do a visual inspection and also to check the diodes11:52
wolfspraulor would a wrong polarity diode lead to different behavior anyway?11:52
wpwrakbut in any case, a systematic search will turn up such things too11:52
wpwraklekernel: is the internal pull-up on P22 (FLASH_RESET_N) asserted during reconfiguration ?11:54
aw_0x39: d2/d3 dimly lit after removed diode(D16), tp36 - 45mV, tp37 - stable 3.3V, init_b  - 26mV11:54
lekernelyes, it should be11:54
lekernelbut if it's not, that might well be the problem11:54
wpwrakaw_: very good. FLASH_RESET_N is off the hook for now11:55
aw_wpwrak, yup11:55
wpwraknow, who's driving PROGRAM_B low ?11:55
wpwrakwe have three candidates: 1) the reset chip, 2) INIT_B, 3) divine intervention11:56
wpwrakaw_: can you please power down and test whether the diode between PROGRAM_B and INIT_B does indeed work like a diode ? (if your multimeter has a diode test, that would be the easiest)11:57
aw_wpwrak, do we need to trigger program_b again to see power on sequence?11:57
aw_wpwrak, ok11:58
wpwrak(trigger PROGRAM_B) heh, that seems to happen even without us doing anything. little gremlins are at work here ;-)11:59
aw_wpwrak, you got it right, it doesn't behave like a diode now11:59
aw_wpwrak, but let me confirm this again12:00
lekernelcrappy diodes breaking down?12:00
wpwraklekernel: maybe bad soldering12:01
wpwrakwolfspraul: or fake diodes ? ;-))12:01
wolfspraula ghost12:02
aw_wpwrak, forward voltage is 11.8mV, reverse voltage is -9mV12:03
wpwrakthe small voltage difference between INIT_B and PROGRAM_B suggests that they don't act much like diodes ...12:03
wpwrakhehe ;-)12:03
aw_now I am going to take apart this diode and measure it again12:03
wpwrakthey're 0R ! ;-))12:03
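Werner's "0R" joke can be made concrete: a working diode should show a clearly non-zero forward drop, whereas Adam's readings are a few millivolts in both directions. A minimal sketch, assuming a 50 mV cut-off for "looks like a short"; that limit and the function name are illustrative, and lekernel's caveat still applies that readings on a mounted diode are unreliable:

```python
def looks_shorted(forward_mv, reverse_mv, short_limit_mv=50.0):
    """Return True if both readings are so small that the part behaves
    like a short (a 0 ohm link) rather than like a diode.

    short_limit_mv is an assumed, illustrative threshold.
    """
    return abs(forward_mv) < short_limit_mv and abs(reverse_mv) < short_limit_mv


# Adam's readings on the PROGRAM_B / INIT_B diode of 0x39.
print(looks_shorted(11.8, -9.0))
```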
lekernelah, you did not take it apart?12:03
wolfspraulis there a chance we are wearing out diodes, or is that impossible?12:04
lekernelyou should not measure mounted diodes, this gives wrong measurements in most of the cases12:04
wpwrakwolfspraul: highly unlikely12:04
wolfspraulok, scratched off12:04
lekernelwolfspraul, with the small currents and voltages they are supposed to handle here, if they do wear out, they're probably counterfeit pieces of crap12:04
wpwrakwolfspraul: and i think these are quite sturdy12:04
wpwrakwolfspraul: you could probably wear them out with thermal abuse, though. should still take quite an effort, though12:05
wolfspraulcan we check the one we took off as well (between reset and FLASH_RESET_N)?12:06
lekernelwe had something weird happening on rc1. Adam reworked two video chips (reconnecting the pixel bus output correctly), which died with an internal power supply short some ~20s after power applied12:06
lekernelexact same problem on the two boards12:07
lekernelit was never explained12:07
lekernelmwalle did the exact same rework, and it went fine on his board12:07
lekernelon rc2 we have applied the change in the pcb layout, and it also went fine12:07
aw_wpwrak, yes, your analysis is right, my diode doesn't act as a diode.12:07
lekernelI don't know what went on there12:07
wpwrakaw_: you removed it and tested it out of circuit ?12:08
wolfspraulaw_: can you check the one you took off as well?12:08
lekernelbut this looks vaguely similar12:08
aw_wpwrak, thanks for catching this. Super! so I'm going to solder a new one. :(12:08
wolfspraullekernel: where is the similarity?12:08
wpwraklekernel: by the way, where did you find that INIT_B needs to be pulled together with PROGRAM_B in order to have an effect ?12:08
wpwrakaw_: (solder new one) wait a minute ... try to boot without diodes first12:09
wolfspraulaw_: wait. shouldn't we check the diode we removed as well?12:09
aw_wolfspraul, I just measured the D16 I took off too, it's okay. ;-)12:09
lekernelthat some semiconductor device shorted itself some time after a rework12:09
lekernelwpwrak, we shouldn't need to pull PROGRAM_B low, but the initial PCB layout has the trace, and it's additional work to cut it12:09
lekernelplus it should not hurt12:09
Fallenoulekernel: are you using gcc 4.5.3 or gcc 4.5.2 ?12:10
wpwraklekernel: i was more thinking of pulling only PROGRAM_B, without INIT_B12:10
wolfspraulaw_: boot without diodes first now (see werner's msg)12:10
Fallenouwill try a diff a out git repo (for rtems) and their cvs head, to try to understand why zlib compiles in their cvs head and not in our git12:10
lekernelwpwrak, the xilinx doc says you should use INIT_B to delay configuration12:10
Fallenou-a out+on our12:10
lekernelbut it's not very clear12:11
wpwraklekernel: they seem to say the same about PROGRAM_B12:11
aw_since I had to solder the diode's two terminals myself, yes, it did not act as a diode. bad...i should have measured its forward voltage.12:11
wpwraklekernel: e.g., page 51 (picking a random one), before the section title12:11
lekernel"Before the Mode pins are sampled, INIT_B is an input that can be held Low to delay configuration. "12:11
wpwrakoh, that's actually the place that talks about NOR. lucky coincidence ;-)12:12
lekernelah, yes, page 51 says both approaches are correct12:12
wpwrakso maybe we can leave INIT_B out of the mess. that would help in general12:13
aw_wpwrak, yes, without the two diodes, it now boots up and renders.12:13
wpwrakthen we only need to coordinate PROGRAM_B and FLASH_RESET_N12:13
Action: wpwrak does a happy little dance12:14
Action: lekernel is sick of those broken components12:14
wpwrakwelcome to the wonderful world of hardware ;-)12:14
lekerneli've never seen anything this bad12:15
wolfspraullekernel: yes. maybe you should do more software and less hardware :-)12:15
wolfspraulaw_: I think next step is to put 2 good diodes back on, and see whether it boots.12:15
wolfspraulwpwrak: agreed?12:15
aw_wpwrak, init_b = 1.2V, tp37 = 3.3V, tp36 = 3.3V12:15
wolfspraullekernel: if you build something one day, Werner and I will volunteer to help you. no worries :-)12:15
lekernelI will admit I have a relatively limited experience with manufacturing, but from what I've heard and seen, this project is by far the one which is hit the hardest by broken/counterfeit components12:15
wpwrakwolfspraul: no. i'd get rid of the diode connecting INIT_B. lekernel: agreed ?12:15
wolfspraullekernel: no it's not.12:16
wolfspraulthe one thing we could do better is to throw more money and people at the problem.12:16
wolfspraulin hardware parallelism works quite well, unlike in software (MMM)12:16
wpwrakaw_: lovely12:16
wolfspraulthat we cannot do, it is beyond my capabilities12:16
aw_wpwrak, you did a happy little dance now? ;-)12:17
aw_but really sorry that still this was my fault on diode soldering. :(12:17
wolfspraullekernel: yes, just wanted to say. why broken components?12:17
wolfspraulmaybe soldering12:17
wolfspraulplus we have over 20 boards12:17
wolfspraullet's see12:17
wolfspraulso what is the solution now? only 1 diode now?12:18
wpwrak(parallelism) indeed. adam is our bottleneck here. and with all the workload, he probably doesn't even have time to think about those problems himself, so we're wasting another analyst12:18
lekernelyeah, and make sure that one diode will not go bad12:18
wpwrakaw_: (dance) well, figuratively. i'm too lazy to get out of my chair :)12:18
wolfspraulok only one diode now12:18
wolfspraulaw_: have you put that one (and working) diode back on12:19
wolfsprauldoes the board boot?12:19
lekernelbut didn't we add INIT_B to fix some intermittent no-configuration problems initially?12:19
wpwrakaw_: was it soldering or is the component (diode) bad ?12:19
aw_wpwrak, yup..good question. I can't tell whether this was shorted by a bad diode or by my soldering.12:20
wolfspraulwait, slow down12:20
wolfspraulI still have my question open12:20
wolfspraulaw_: did you put 1 diode back on? it boots?12:20
wpwraklekernel: as far as i remember, when we ran into the first set of reset troubles, you said that you had discovered in the xilinx docs that PROGRAM_B alone was not enough, and then the change was made. but i don't remember any observation in the real circuit triggering this.12:20
aw_wolfspraul, i haven't put 1 diode back on12:20
wolfsprauland also what lekernel just said - didn't we add INIT_B to fix something?12:20
wolfspraulaw_: ok, put that back on12:21
wolfspraulbring the board into the state that we believe it is perfect12:21
lekernelactually, if INIT_B is not needed, it becomes a very simple rework12:22
lekernelcompared to the initial RC3 schematics, basically install C238 and change two resistors12:22
wpwrakbtw, i only realized yesterday that the "flash length = 14xxxx" or such message meant that the download hadn't even started. all the time, i had somehow imagined that it had stopped somewhere in the middle.12:22
aw_and those two diodes are taken off now and it is rendering12:23
aw_so later, if I need to solder a diode, I can measure whether it's good before soldering and then measure it again after soldering.12:23
aw_from now on, any diode I take off, I won't solder back. will use a new one.12:23
wolfspraulof course12:23
wpwrak(simple rework) after doing the complicated one ;-))12:23
wolfspraulso wait12:23
wolfspraulshould Adam put the 1 diode back on?12:23
wolfspraulwhat is the best design we have in mind now?12:24
wpwrakthe one between PROGRAM_B and FLASH_RESET_N, yes12:24
wolfsprauland now lekernel wants to install C238 and change two resistors?12:24
wpwrakC238 should already be there12:24
wpwrakthe resistors should already have been changed12:24
wolfspraulah ok12:25
wolfspraulsorry too many details I lost track12:25
wpwraknow it's really just a matter of removing the INIT_B diode. and the wire, if you want12:25
wolfspraulso... aw_ one diode back on and let's see whether it boots12:25
wolfspraulwe should approach this carefully, we have over 20 boards with reconfigure/flash/leds dimly lit problems12:26
wolfspraulif they all go down to a non-working diode, well, great12:26
wpwraklekernel: how confident are you about not needing R60 ? (RP# pull-up)12:26
aw_wpwrak, wait you said removing INIT_B diode?12:26
wpwrakaw_: yes. the one that you found to be bad12:26
wpwrakaw_: so instead of two diodes, we now only use one12:27
aw_wpwrak, so keep D1612:27
lekernelthe Xilinx docs says clearly the FPGA has pull up resistors, and we observe them with the dimly lit LEDs12:27
lekernelso I'm pretty confident about that12:27
lekernelas long as there are no glitches, though12:27
aw_wpwrak, just keep D16? how about R60?12:28
wpwraklekernel: i don't question that the FPGA has them. just whether we're sure it uses them all the time :)12:28
aw_wpwrak, the current 0x39 has R60 10K12:28
wpwrakah yes, that's from an experiment earlier tonight12:29
wpwrakaw_: let's wait for lekernel's verdict12:29
aw_so keep R60 or remove it then boot again?12:29
aw_wpwrak, hmm..okay12:29
lekernelit should use them all the time, yes12:30
wpwraklekernel: so away with R60 ?12:30
lekernelyes, that should not be needed12:30
lekerneland it increases the load on the reset IC, which has limited current capability12:31
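The extra load lekernel mentions can be estimated with Ohm's law. A back-of-the-envelope sketch, assuming R60 is the 10K value Adam reports and that it pulls up to a 3.3 V rail while the reset IC holds the net near 0 V; both the rail voltage and the 0 V level are assumptions:

```python
def pullup_current_ma(rail_volts, resistor_ohms, net_volts=0.0):
    """Current (in mA) that a pull-up forces the driver to sink
    while it holds the net at net_volts."""
    return (rail_volts - net_volts) / resistor_ohms * 1000.0


# Assumed values: 3.3 V rail, R60 = 10 kOhm, net held at 0 V.
print(round(pullup_current_ma(3.3, 10_000), 2), "mA")
```

Under these assumptions the pull-up adds a fraction of a milliamp, which only matters because the reset IC's drive capability is described as limited.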
wpwrakaw_: so, no R60 then12:32
aw_so i go for: 1. solder D16 back 2. remove R60 3. remove diode between program_b and init_b12:32
wolfspraulonly on 0x39 for now12:33
wpwrakadam needs a nocturnal twin brother who could then take over power-cycling 0x39 a gazillion times to see if it really survives :)12:34
wpwrakbut i think "we got him"12:35
wpwrakwolfspraul: btw, good cluster analysis connecting reconfig failure with usb-jtag12:36
wolfspraulwell. I'm not so sure about this yet.12:37
wolfspraulall of these problems and troubles just because of bad soldering?12:37
wpwrakcould be. we don't know yet what exactly happened with those diodes12:37
wpwrakit's unusual for diodes to fail this way12:38
wolfspraullet's see12:38
wolfspraullet's assume 0x39 boots now12:38
wolfspraulthen what? we do 20 render cycles (without CRC checks in between, just the 30 second rendering and power cycle)12:39
wolfspraulassume that works well too12:39
wpwrakyeah, do a few cycles, check the crc at the end. do a reflash with usb-jtag, just to confirm this is fine, too12:39
wolfspraulon all boards with flash/dimly lit/reconfig problems, we remove the diode between program_b and init_b? and check the other diode (onboard, as much as that is possible)?12:39
wolfspraulbecause now we say we don't even want the program_b/init_b diode anymore?12:40
wolfspraulso instead of checking it, we just remove it12:40
wolfsprauland if that fixes those boards, or a large number of them, then we conclude this to be a design improvement and remove the diode between program_b and init_b on all 90 boards?12:40
wolfsprauldo I roughly understand this right?12:40
wpwraksounds good to me12:41
wolfsprauland we also check (again, if practical onboard) the correct functioning of the remaining diode on all boards12:41
wolfspraulbefore lekernel said they cannot be checked while mounted12:41
wpwrakif the diode itself has issues, we may also need to check the other one12:41
wolfsprauloh sure12:41
wolfspraulso - can it be checked onboard or not?12:41
wpwrakdepends :)12:41
wolfspraulor can a check at least give some indication?12:42
wpwrakyou can inject a little probe current and see what happens12:42
wolfspraulI somehow doubt that all problems will magically go away by removing and checking diodes.12:42
wolfspraulbut ok, maybe they will :-)12:42
wpwraksometimes the diode is more or less isolated, in which case you can test it in-circuit. sometimes other things in the system will happily act in its stead, and all you get is confusion.12:42
wpwrakthe cluster may go away :)12:43
wolfsprauloh I'm sure some isolated cases will pop up12:43
wpwrakdo we know what specifically caused the high current consumption of some boards ?12:43
wolfspraulbut my main concern is to finally have a known-good design and test I can 100% trust, so that the board won't fail a few tests after I stopped testing12:43
wolfspraulok I'm out for about 30 min, reading backlog when back12:44
wpwrakme too :)12:44
wolfspraultake enough rest there, and thank you so much for all your help!!12:45
wolfspraulaw_: I'm back in 30 minutes12:45
aw_wolfspraul, okay12:46
aw_wpwrak, shall we change back to high speed though?12:57
lekernelaw_, no, stay in full speed12:59
lekernelwe don't want to take care of any USB/JTAG issues right now12:59
aw_now i start to count again13:00
aw_5 already13:00
aw_crc check between rendering 30 seconds13:02
wpwrakwolfspraul: np :) it's fun to finally kill those gremlins :)13:09
aw_i didn't reflash 0x39 again13:10
wpwrakaw_: if the CRC is right, it's good :)13:10
aw_now 10 times already with crc checks between rendering.13:11
wpwrakaw_: at the end, we can do some reflash tests, to confirm that this works, too13:11
aw_yes i watched it13:11
aw_wpwrak, okay13:11
wpwrakone mystery remains: why did the board ever work, with that evil diode misbehaviour ? points to a somewhat scary failure model. but we'll see ...13:14
wpwraklekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ?13:14
aw_yes, i also don't understand why it worked before. :(13:15
aw_wpwrak, gui has a "reboot" btn which can trigger a flash reset. ;-)13:18
aw_wpwrak, i tried to capture the tp37 reset waveform for you before. ;-)13:19
wpwrakhmm, what i'm looking for is a flash reset without system reset. to test whether the diode between FLASH_RESET_N and PROGRAM_B is okay. after all, it's the same component ...13:20
wpwrak(test in boards that seem "okay")13:21
wolfspraulI think Adam can remove the crc check between each render cycle13:29
wolfspraulthat wasn't part of the testing before, and doesn't imitate user behavior either13:30
wolfspraulwe can do the crc test after 100 render cycles13:30
wolfspraulwpwrak: what do you think?13:30
wpwrakit's probably safe to do so. doesn't hurt to have it either, though. you never know what you may find ;-)13:30
wolfspraulit costs time13:30
wpwraknot that i'd expect to find anything13:30
wolfspraulso remove13:30
wpwrakyes, it does :)13:30
wolfspraulI don't believe in the NOR corruption story anyway13:30
wolfspraulso one test after 100 pure render cycles is enough13:31
wpwrakseems that was rc2 only13:31
wolfspraulif that really shows a crc problem I eat my words :-)13:31
wpwrakrc3 has new excitement to offer :)13:31
wolfspraulaw_: you can remove the crc check in between render cycles13:31
wolfsprauldo the render cycles only13:31
wolfsprauland you can do _ONE FINAL_ crc check after the last render cycle13:32
aw_okay...final crc check at last13:32
aw_so now those values: D16 still there, R30 = R157 = 10K, C238 = 220pF, removed R60 and program_b/init_b diode13:35
wpwrakaw_: do you have the link to your fix2 schematics ? (with the component numbers)13:37
aw_wpwrak, http://en.qi-hardware.com/wiki/File:M1_rc3_hw_fix2.png13:38
aw_need to modify this later once we are sure about this13:38
wpwrakthanks !13:39
wolfspraulaw_: wpwrak please let's give this a new name then, like fix2b, or anything, but it must be a new name13:40
wolfspraulnot fix3 either because we had that already13:40
wpwraki think we went up to "fix4" ;-)13:41
wolfspraulI propose fix2b13:41
wolfspraulor fix2a, but somehow I like fix2b better13:41
wpwrak2a, 2b seem to be available13:41
wolfspraulok so the new one is fix2b?13:42
wpwrakfine with me13:43
wolfspraulI think we can already say that 0x39 is fine now (with fix2b applied)13:43
wolfspraulas a next step, I propose that adam applies fix2b to a number of other boards and we look at the results13:43
wolfspraulfor testing, if he can make it to the render cycles, he should do 10 full render cycles, but without crc checks in between (maybe one at the end is not bad)13:44
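The test procedure proposed here (a fixed number of render cycles, no CRC check in between, one CRC check at the end) can be sketched as a step list. The function and its step wording are hypothetical; the 30 second render time comes from the earlier description of a render cycle:

```python
def render_test_plan(cycles=10, crc_between=False):
    """Build the list of manual test steps for one board."""
    steps = []
    for n in range(1, cycles + 1):
        steps.append(f"cycle {n}: power on, render 30 s, power off")
        # Optional intermediate CRC checks (dropped in the new procedure).
        if crc_between and n < cycles:
            steps.append(f"cycle {n}: crc check")
    steps.append("final crc check")
    return steps


for step in render_test_plan():
    print(step)
```

Dropping the intermediate checks removes `cycles - 1` steps per board, which is the time saving wolfspraul is after.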
wolfspraulso I try to select a list of boards now for fix2b, my proposal13:44
wpwrak(fix2b to other board) yes, do the cluster (or part of it)13:44
wpwrakwe still need to have an idea of what went wrong with the diode13:44
aw_mm...i need to note fix2b in .ods now. phew~13:45
Fallenoulekernel: our rtems git repo has a different cpukit/zlib/zconf.h.in than the one in their CVS head, we don't have the definition of z_off64_t but they have it13:46
wpwrakcould be: 1) bad soldering (short is outside the diode), 2) component in a constant bad state (which may itself have variations), 3) component degenerating13:46
wpwrak3) would be a problem, because we then can't be sure D16 won't act up in the field13:47
Fallenoulekernel: I guess we just need to sync our zconf.h.in with theirs, will try to do a patch for that13:47
lekernelFallenou, if it was changed recently, it should just be a matter of git-cvs update then13:47
Fallenoulekernel: well you can try, or we can just cherry pick this file13:48
wpwrak2) has two branches, 2.1) component experienced excessive stresses in rework, 2.2) component arrived at rework in a bad state (fake, production error, box left in the sun, etc.)13:48
wpwraklekernel: when the M1 is fully up and running, is there some easy way to force a flash reset ? (without resetting the whole system)13:49
lekernelnot at the moment13:49
wolfspraul0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x8513:49
wolfspraul19 boards13:50
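The board list above can be checked mechanically. A minimal sketch, assuming the IDs are plain hexadecimal serial numbers separated by spaces; the helper name is illustrative and not part of any Milkymist tooling.

```python
# Board IDs exactly as listed in the log.
BOARD_LIST = (
    "0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C "
    "0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x85"
)


def parse_board_ids(text: str) -> list:
    """Split a whitespace-separated list of hex IDs into integers."""
    return [int(token, 16) for token in text.split()]


boards = parse_board_ids(BOARD_LIST)
print(len(boards), "boards, 0x39 included:", 0x39 in boards)
```

A check like this catches duplicates or a miscount before the list is copied into a tracking sheet.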
wolfspraulthey all have a history of d2/d3 dimly lit, cannot reflash or cannot reconfigure. some of them worked before, some not. none have passed all tests.13:51
wpwraklekernel: P22 (FLASH_RESET_N) is driven high when the M1 is up ? or just pull-up ?13:51
wolfspraulI think we should apply fix2b to those 19 boards, then look at the results13:51
wpwrakwolfspraul: another thing to look for: boards that never had d2/d3 dim but that failed USB JTAG. (if there are any with this combination)13:52
wolfspraulsomehow I still cannot imagine all this going back to 'bad' diodes13:52
wolfspraulwpwrak: what do you mean with "failed usb jtag"?13:52
wpwrakthat flashing the NOR got stuck13:53
wolfspraulyes they are in this group13:53
wolfspraulI threw them together with the ones that worked before and then failed now13:53
aw_wolfspraul, so that's the next step on 19 boards firstly to apply fix2b?13:53
wpwrakok, perfect13:53
wolfspraulaw_: ok let's be precise13:54
wolfspraulfirst we maintain a production focus13:54
lekerneldriven high13:54
wolfspraulaw_: you are testing 0x39 now, if 100% is fine and pass, it goes to 'available' state13:54
wolfspraulto be safe, you can write "avail - fix2b" :-)13:55
wolfspraulso we remember that it has a fix2b applied13:55
wpwraklekernel: hmm, then we either need a way to command a NOR reset. or see if the diode can be tested in-circuit.13:55
wolfspraulaw_: have you finished 0x39 ?13:55
aw_wolfspraul, 0x39 was tested by test image successfully just rendering failed then13:55
wolfspraulrendering is part of the full test program13:55
aw_wolfspraul, yes. so now I fill it as 'avail - fix2b'13:56
wpwraklekernel: that is, if we even care to separate FLASH_RESET_N from PROGRAM_B :)13:56
wolfspraulhow many cycles did you do?13:56
aw_30 times only13:56
wolfspraulah ok13:56
wolfspraulyes that's enough13:56
wolfspraulso yes, I propose to work on those 19 boards I listed13:56
lekernelwe have to separate flash_reset from program_b; the fpga does reset the flash on a software reset13:56
wolfspraulin this way:13:56
wpwrakwait ... 0x39: final crc check and then reflash via USB-JTAG13:57
wpwrakjust to confirm that all is well13:57
wolfspraul1) first, make sure there are no other known bugs on the boards (like usb)13:57
wolfspraul2) apply fix2b, and also check that the remaining diode is good (if possible)13:57
wolfspraul3) reflash and run test software and run 10 render cycles (only 1 crc at the end)13:58
aw_wpwrak, 0x39 30 times crc is okay13:58
wolfspraul4) hopefully set them all to "avail - fix2b"13:58
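Steps 1) to 4) can be sketched as a small per-board procedure. This is a hedged illustration only: the callbacks stand in for manual bench work (inspection, rework, running reflash_m1.sh, watching a render cycle, the CRC check) and are hypothetical, as are all status strings except "avail - fix2b".

```python
from dataclasses import dataclass
from typing import Callable

RENDER_CYCLES = 10  # step 3: 10 render cycles, only 1 crc at the end


@dataclass
class Board:
    board_id: int
    status: str = "untested"


Step = Callable[[Board], bool]  # returns True when the manual step succeeded


def run_fix2b_plan(board: Board, no_known_bugs: Step, apply_fix2b: Step,
                   reflash: Step, render_once: Step, crc_check: Step) -> Board:
    """Walk one board through steps 1-4 and record where it stopped."""
    if not no_known_bugs(board):          # step 1: no other known bugs (like usb)
        board.status = "hold - other bug"
        return board
    if not apply_fix2b(board):            # step 2: apply fix2b, check remaining diode
        board.status = "hold - fix2b/diode"
        return board
    if not reflash(board):                # step 3: reflash and run test software
        board.status = "fail - reflash"
        return board
    for cycle in range(1, RENDER_CYCLES + 1):
        if not render_once(board):
            board.status = f"fail - render cycle {cycle}"
            return board
    if not crc_check(board):              # single crc at the end
        board.status = "fail - crc"
        return board
    board.status = "avail - fix2b"        # step 4
    return board


always_ok = lambda board: True
demo = run_fix2b_plan(Board(0x39), always_ok, always_ok, always_ok,
                      always_ok, always_ok)
print(hex(demo.board_id), demo.status)
```

Recording the step at which a board stopped keeps the different failure modes apart when the results are compared later.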
wpwraklekernel: and it wouldn't be happy if the sw reset also causes a reconfig (?) ... or at least we don't want to tempt fate13:58
wolfspraulaw_: no wait13:58
lekernelno, software reset shouldn't reconfig13:58
wolfspraulWerner also wants to reflash again13:58
wolfspraulso just run reflash_m1.sh13:58
wolfspraulthen boot once to test that it renders, then done13:58
aw_wolfspraul, okay, let's reflash it again. and check crc again and rendering. ;-)13:59
wolfspraulok, perfect13:59
wpwraklekernel: okay. maybe we can just probe sw reset. that should be clear enough evidence about the diode's health.13:59
lekernelsw/flash reset stays asserted when the 3 pushbuttons are held, btw14:00
aw_0x39: reflashing...14:00
wpwraklekernel: if sw reset did cause a reconfig as well, would this be easy to notice from the outside ? (without scope, just looking at the M1)14:00
wolfspraulbtw, do we still want to do the 4.4V reset ic rework?14:00
lekernelyes, it would turn off14:00
wpwraklekernel: ah, excellent14:00
lekernelwolfspraul, if the current solution works, then no because it takes time14:01
wpwrakwolfspraul: i think it makes sense, because the current reset solution does not guarantee that all the rails are good14:01
wolfspraulI think if extensive testing shows that everything is stable, at least for myself I don't need the 4.4V reset ic rework only because that makes the circuit better conform to the datasheet voltages.14:01
wpwrakwolfspraul: although i don't know if you want to rework or just rc4 :)14:02
wolfspraulthose are separate things14:02
wolfspraulfirst I want to make an rc3 of good quality that I can support14:02
lekernelafter those boards are out, you can try. but please, not before.14:02
lekernelenough delays14:02
wpwraki concur14:02
wolfspraulif the only reason is that we noticed an 'out of spec' situation, that's not enough14:02
wolfspraulsorry but the chip has to handle that :-)14:02
wolfspraulsince our testing did not show problems14:03
wolfspraultesting results win here, imho14:03
wpwraki would also prefer not to have it as a rc3 rework, because it introduces the risk of bridging 3V3 and 5V14:03
wpwraks/as a/as a general/14:03
wolfspraulso I am still cautious about this whole fix2b and diode magic14:03
wolfspraulbut we see14:03
lekernelyeah, that too14:04
wolfspraulI need solid test results then we can start selling14:04
wpwrakso my proposal would be to rework one board with 4.4 V at a suitable time, confirm that this doesn't wake any gremlins, and then make it an rc4 feature14:04
wolfsprauloh you bet14:04
wolfspraulwe need to look at the gates and second reset ic anyway, for rc414:04
wolfspraulbut that is separate from finishing rc314:04
wpwrakyes :)14:05
aw_reflashed successfully14:05
wpwrakabout the diodes .. where do they come from ? friends in shenzhen ? :)14:05
wpwrakaw_: champagne time ! ;-)14:05
aw_CRC is okay14:05
wolfspraulask Adam about source, I'm not sure whether they are in the bom/wiki14:05
wolfspraulbut don't always blame the source, you know how many reasons for problems there can be (you listed some above yourself)14:06
wolfsprauland amazingly, hello murphy, it's always the unexpected one that hits you, no?14:06
aw_rendering done...14:06
wolfspraulok, enough, 0x39 is 'avail - fix2b'14:07
wpwrakyes, but it still seems odd. adam's visual inspection didn't show any solder bridges. and he measured after unsoldering, which would further remove any bridges (or make existing ones easier to spot)14:07
aw_wpwrak, this diode is the one that BEN used, BEN was produced in China14:07
wolfspraulaw_: did you see the plan #1 - #4 for the 19 boards I selected?14:07
wpwrakand diodes don't overcook easily. or degrade just like that.14:07
wpwrakheh ;-)14:07
wolfspraulaw_: maybe we get new diodes one of these days ;-)14:08
wpwrakdo we have any diode problems in the ben ? :)14:08
aw_but this part was original designed by Taiwan company though...so i got them while i producing AVT2. ;-)14:08
wpwrakmakes me think of charging problems ...14:08
wolfspraulanyway pinpointing the real root cause is difficult14:08
wolfspraulaw_: plan! :-)14:08
wolfsprauldid you see my steps #1 - #4 above?14:08
wpwraki'm worried about D1614:09
aw_wolfspraul, yes saw #1 - #414:09
wolfspraulI propose this for the 19 boards I selected14:09
wpwrakwell, if all that happens when D16 transmutes into a 0R is that a "sw reset" powers down, that won't be a catastrophic failure. so this could be considered an acceptable risk14:10
wolfspraul0x32 0x34 0x39 0x3A 0x3C 0x40 0x48 0x54 0x55 0x5C 0x61 0x63 0x6B 0x6C 0x77 0x7A 0x7D 0x7F 0x8514:10
wolfsprauland Werner is right - how do we make sure D16 works well?14:10
wolfspraulaw_: any ideas?14:10
wpwrakso all that would need to be done about D16 is to test whether it works now (procedure TBD), and if it does, go ahead. else, replace, etc.14:10
wolfspraulcan we order new diodes locally in Taipei? (i.e. tomorrow)14:10
aw_wpwrak, how about I first measure D16's forward / reverse voltage while I test those 19 pcs14:11
wolfspraulit seems Werner and Sebastien think that is not possible or worthless14:11
wpwrakaw_: can you measure D16 in-circuit ?14:11
aw_wpwrak, yes. i just checked in-circuit with D16 on 0x3914:11
wpwrakokay. then that's probably good enough.14:12
wpwrakwe can do fancier tests, but they also have more moving parts.14:12
aw_wpwrak, but I really don't know why 0x39 passed the power-on sequence, since I measured them before I reflashed after the first reworks14:13
wolfspraulyou measured both diodes earlier?14:13
wpwrakaw_: yes, the whole thing is very strange14:13
aw_wolfspraul, but from your analysis above, i quite probably made the diode behave like a short while soldering14:14
aw_sorry to wpwrak14:14
wpwrakaw_: could the diode have experienced mechanical stress from the wire going around the board ?14:14
wolfspraulI think the next step is fix2b on those 19 boards14:14
wpwrakwill the wire also be removed ? (as part of fix2b)14:15
wolfspraulof course we are not suicidal. if after 3-4-5 we find out it's not right, we pause to think.14:15
wpwrakhehe :)14:15
wpwraki hear a "not yet" :)14:15
wolfspraulwpwrak: wire removed? that's not clear?14:15
wolfspraulaw_: will the wire be removed or not?14:15
aw_wpwrak, so i would think it's a component degenerating by my soldering its two terminals(one is program_b soldering, the other is init_b soldering), so TWICE soldering on diode. ;-)14:16
wolfspraulI thought that would be so clear it's not worth mentioning, now Werner is asking :-)14:16
aw_wolfspraul, I'll remove wire too.14:16
aw_wpwrak, diode is in reel I have on hand now14:17
wpwrak(wire) alright. no FM antenna ;-)14:17
wpwrakaw_: did you solder all those diodes ? or did they do some of them at the SMT fab ?14:17
wolfspraulI think slowly I can start as a daredevil PE, maybe in China.14:18
wolfspraulI wouldn't hesitate to keep that line running...14:18
wolfspraulhe he14:18
wolfsprauland over time I might even find out a bit about all this strange soldering and circuit stuff14:18
aw_wpwrak, D16 was mounted by SMT factory, I soldered all program_b/init_b diode. ;-)14:18
wolfspraulafter some years in China maybe I can upgrade to Taiwan14:18
wpwrakaw_: maybe your soldering iron is running too hot ?14:18
wolfsprauldon't mention that14:18
wolfspraulit's probably on max14:19
aw_wpwrak, set 325 degree14:19
wpwrakpheeew ....14:19
aw_my max can be 42514:19
wpwrakyeah, i sometimes go a lot higher too. 370 C if a component is really acting up14:19
aw_but this you know , soldering in even less than 1 second on diode terminal. ;-)14:19
wolfspraulmy 2 hands cannot count the number of foreigners that came into our Taipei labs and eventually to me complaining that they ruined their boards because they use the irons with 'crazy hot' settings that the locals had flying around there...14:19
wpwrakyeah, if you're quick, then hot should be fine14:20
aw_wpwrak, well...not to explain though...it's really great that you caught this, so tomorrow, i'll still measure its forward/reverse voltage in-circuit after soldering.14:21
wolfspraulaw_: I think we have a solid plan for tomorrow14:21
wolfspraulafter the first few boards, we double-check the results14:22
aw_wolfspraul, we may classify them tomorrow later, you know their failures are different, but we can get results tomorrow. ;-)14:23
wolfspraulyes but I grouped carefully14:23
wolfspraulthose 19 should be interesting14:23
wolfspraulI do not expect all 19 to work14:23
wolfspraulbut I want to see how far this fix2b can take us14:23
wolfspraulsince we are planning to apply it to all 90 boards (!)14:23
wolfspraulaw_: if you see anything wrong with the plan, correct it14:26
wolfspraulyou are much closer to the real problem14:26
wolfspraulfor example if you want to finish the usb fixes first, do it14:26
wolfspraulyou must keep a calm head and overview...14:26
wolfspraulbtw, I find it amazing that we first thought we need this diode (and long wire), but now it seems we don't?14:29
wolfspraulhow is that possible? are we sure we don't need it? :-)14:30
aw_to apply fix2b, I'll run the selected boards through all test items again to make sure my removal of the diode and wire is good14:30
aw_also need to clean after reworks though.14:30
aw_so far I found nothing wrong in your steps14:31
aw_but I would fill in those results again as one whole row. hope these boards are good news tomorrow.14:32
wpwrakwolfspraul: i think the long wire was lekernel getting lost in the twisty little maze of xilinx documentation, and the place in the docs that most specifically refers to this function states reasonably clearly that we don't need to worry about init_b. but let's see if sebastien changes his mind :) if xilinx docs are inconsistent about this issue, we may still need to do something. but for now, it seems that init_b (as of fix2) doesn't need to be connected.14:36
wolfspraulok got it - thanks!14:37
wolfspraulwell then, tomorrow is another interesting day in rc3 history14:37
aw_wiki 0x39 notes updated14:38
aw_wpwrak, thanks for your great helps tonight. ;-)14:38
wolfspraulbtw - ALL PARTS are now in Taipei!14:39
wpwrakaw_: thanks for doing all those experiments ! :)14:39
wpwrakwhee ! :)14:39
wolfspraulall accessories, box, labels, cases, leaflet, stickers, everything14:39
wolfsprauldoesn't make Adam's life easier unfortunately14:39
wpwrakhenceforth, August 16 shall be celebrated as convergence day in the empire of Qi :)14:39
wolfspraulwait wait14:40
wolfspraulI want to see this in the test results14:40
wolfspraulwhat I see there now is still a big mess, and some hope14:40
aw_go on14:41
wolfspraulno that's all14:41
wolfspraul'wait wait' for Werner's celebration14:41
aw_okay. ;-)14:41
aw_thanks again and night!. ;-)14:42
wpwraknaw, celebrate convergence day today, maybe diode day tomorrow :)14:43
Fallenoulekernel: I copied CVS HEAD cpukit/zlib/zconf.h.in to a fresh git clone of milkymist rtems, it builds properly, do I commit & push ?15:14
lekernelno, I will try a proper CVS upgrade before15:14
Fallenouok, should solve the issue15:14
lekernelif it does not I will use your patch15:14
Action: lekernel is done writing a milkymist article for xcell20:54
lekernelsince it seems open source people do not care about/are afraid of fpga's, let's see if fpga people care about open source *g*20:57
rohlekernel: the opensource people care about fpgas but are not motivated to fight against windmills20:57
kristianpaulthey just care about their fancier IDEs and writing in vhdl ;)20:58
kristianpaul(biased comment from my side of course)20:58
lekernelroh, ?20:59
lekernelyou mean the proprietary tools, right?21:00
kristianpaulalso i wonder what they are actually scared of, i mean, i had heard floss related people talking about opencores and opensparc as the path for "fpga freedom"21:00
lekerneljust fucking do them21:00
lekernelGCC, for all its faults, was still great work for its time21:00
rohlekernel: opensource means open toolchains. without that you need open docs to write some. thats why opensource is great on basically all cpus/soc with 'available' (not necessarily fully open) documentation and not on stuff you need to reverse first21:00
lekerneloh, altera published tons of stuff lately21:01
rohsee nvidia drivers. same problem. without open docs/specs supports sucks and isnt anywhere near 'production grade'21:01
kristianpaulroh: but thats a mental barrier you dont need a floss compiler to start coding or doing something21:01
lekerneland for xilinx you have xsl21:01
rohkristianpaul: wrong.21:01
kristianpaulwhy? and in which part21:01
kristianpaullook mm1 soc21:01
kristianpaulyes it uses XST, but you can see testbench uses cver and iverilog21:02
rohkristianpaul: sure its kinda mental, but as somebody who worked with commercial environments and or dependent on parts of it, i can tell you: never ever again. not worth my lifetime.21:02
kristianpaulstep by step21:02
kristianpaulroh: my first words were about IDEs remember? :)21:02
lekernelcan we stop here? if you want free FPGA tools, then write them. period.21:03
rohkristianpaul: so my point is: people WILL and DO opensource in anything which makes them able to solve their problems. using binary toolchains is a showstopper. ide's do not count.21:03
kristianpaulfpga = hardware, that scares more than one for sure :)21:03
rohlekernel: sure. give me all the needed specs and docs and a warranty that i do not need to start over when xilinx has a bad morning and does a new chip with everything different.21:03
kristianpaulno roh , thats just coding practices21:04
rohkristianpaul: no. i think thats false. people are not scared by hardware at all.21:04
kristianpaulpractices, as when you mix your code with dark/proprietary libs21:04
kristianpaulcould be--..21:04
kristianpaulmay be they need a kick start?21:04
kristianpaulmore tutorials, friendly people and such21:05
kristianpaulas Fedora in software side i mean21:05
kristianpaulwell, just another guess..21:05
lekernelroh, http://rapidsmith.sourceforge.net/, altera quip, debit, etc.21:05
rohkristianpaul: my point is: in MY (and probably most other opensource people's) perspective (which comes from experience) its not helping but slowing down development if your tools are either broken, closed, costly or badly documented.21:06
rohlekernel: and thats completely free and production grade? can you build a mm1 with these tools?21:06
kristianpaulwell i always heard people saying bad words about gcc and still a sucess :)21:07
kristianpauls/sucess/been used21:07
lekernelroh, you asked about info about fpga internals. so here they are.21:07
kristianpauli think zumbi too ;)21:08
rohmy point is: opensource people USE opensource tools to create more sw/hw . they do fix bugs in tools here and there but they usually are not motivated enough to do N projects but ONE. means developing a toolchain is not their interest or something which they find interesting.21:08
kristianpaulThere was a small discussion the other day about the difference between bitstreams from different vendors21:08
rohlekernel: i did ask for DOCUMENTATION. not some weird java tool.21:08
kristianpauldocumentation about?21:08
lekernelthen they should stop using x86 CPUs, Intel DRAM controllers and what not :-)21:08
rohdont get me wrong. i try to explain the whys and not the 'could be done's21:09
lekernelwell I already have had that discussion. it's boring. free tools = jfdi.21:09
rohlekernel: x86 is actually quite well documented and understood (compared to the different fpga archs)21:09
rohlekernel: yes. and you need to understand that most people are NOT willing to waste their lifetimes writing compilers and such. thats a VERY small number of people who find that interesting at all.21:10
kristianpauljfdi = just find who can do it ;)21:11
rohits boring technology which is necessary but not something to use your time on for most. its a tool. like a wrench. its there. use it. if you need to buy a complicated one or expensive one you will use a screw which can use the free and well known tool and not the expensive one.21:11
kristianpaulroh: it is really unfair to compare an asic (x86) with a FPGA21:12
rohkristianpaul: its not about fair. its about reality.21:13
rohkristianpaul: if you can solve your problem with an fpga or some soc with gcc support, people WILL choose the latter. even if the soc itself is blackbox.21:13
rohas long as there are interface specs/docs people are fine with that.21:14
kristianpaulyes, of course (solve problems)21:14
lekernelthere are interface specs: the fpga does what the standard verilog code tells it to do :-)21:14
rohlekernel: well.. only with things outside the hw (binary tools)21:14
Action: kristianpaul nocks lekernel 21:15
lekernelthat's not fundamentally different from a CPU scheduler or a DRAM control algorithm21:15
kristianpauli disagree with your last words roh , as for example you can have a basic platform to start with21:15
kristianpaulis my point about comparing fpga with asics21:15
kristianpaulat the end you need a hardware that works21:16
kristianpaullike coming mm1 rc3 it seems :)21:16
rohalso its a question of complexity. an fpga costs you not only a lot of money and extra (complicated) tools and code/thinking it also needs quite some overhead to work.21:16
kristianpauland yes, as wpwrak pointed out before, floss synthesis may blow away lots of barriers21:17
kristianpaulbut i think there is still a lot to do around verilog, test benches, automated soc build scripts?21:17
kristianpauland lots of other fields21:17
lekernelin this project, the fpga costs less than half of the case21:18
kristianpaulroh: lazy! ;)21:18
rohon a typical mcu nowadays you need mostly caps and a powersource. they can even run without crystals and such. have internal flash, easy to use isp/debug possibilities and some decent amount of ram. fpgas still feel like the 8051 times of mcu.21:18
kristianpaulroh: just kiding of course :)21:18
kristianpaulall take time, and yes fpga world have its own learning curve :)21:18
rohkristianpaul: heh.. i am just not interested enough in stuff i find boring details when there are so many more interesting real problems to solve out there.21:18
kristianpaulsure, is a respectfull position21:19
rohlekernel: the chipcost itself is negligible. the cost for development of code and support is so much more than for a mcu that whoever can, WILL avoid using one.21:19
rohnegligible at least for our amounts of sales. can change if you sell 5 digits or more.21:20
kristianpaulworld is already done, who cares to do more, when there is a big sea to navigate :)21:20
wpwrakinteresting discussion. is there a topic/direction or is this just the IRC equivalent of a bar brawl ? ;-)21:20
kristianpauli think last :)21:20
kristianpauland is monday :)21:21
wpwraklekernel: (jfdi) i hope that doesn't mean you've lost interest in continuing with llhdl21:21
rohwpwrak: *g* .. i am not trying to discuss something. i am trying to explain a pov i seem to share with loads of other devels from the mcu and opensource side.21:21
kristianpaulor even interest in continuing the milkymist project?? :-(21:21
lekernelfuck all those developers, that's why I'm stopping hacker conferences now and write for xcell instead21:21
lekernel(among other things)21:22
rohso its not 'the opensource hackers are not interested in fpgas'. they are just annoyed enough by devices with nonfree toolchains by experience to avoid them at ALL cost.21:22
kristianpaulbe friendly and they will come :)21:22
kristianpaulfriendly and skilled is powerfull combination21:22
rohlekernel: maybe you can explain what happend that you are annoyed?21:25
kristianpaulif somebody can say jfdi, it is because they at least know how to do it but dont want to, so you can provide guidelines for others to do it, even if you dont do a single line of code21:26
kristianpaulnot just "fuck" them away... :/21:26
lekernelroh, as I said: many hacker/open source people are afraid with this stuff. they obviously prefer blinky-LED arduino gadgets instead. that's why i'm slightly annoyed. it's all.21:28
wpwraklekernel: so what are your plans with llhdl ? was that an affirmative silence, before, i.e., have you lost interest in it ?21:30
rohlekernel: maybe you need to differentiate between the different levels of development. you do the basics on another level21:30
lekernelwpwrak, no, just try to speak to different people about it.21:31
rohmost want to solve their problem when developing something as cheap and simple as possible. not as free as possible. and in the end it doesnt matter if the chip is from xilinx or whatever (nxp? atmel? whoever builds soc), you WILL buy a 'chip' which is closed.21:31
wpwraklekernel: ah, good :)21:31
rohthen the equation is 'free tools' or 'complex closed tools' atm. and THATS what matters.21:32
wpwrakroh: i think you're barking up the wrong tree21:32
wpwrakroh: you should complain to xilinx, altera, lattice, etc. convince them that they could sell more fpgas if they open their tools21:33
rohwpwrak: i know. i am trying to explain. i also want free fpga tools. but i also do not want to develop one myself if i can solve my problem in MUCH less work.21:33
wpwrakroh: lekernel is already working in the "right" direction21:33
kristianpaulgood point21:34
rohwpwrak: i am not complaining. i am describing what the reasoning of developers is what tools to learn to use and where to use their (often quite limited) time21:34
kristianpaulwell lattice already moved forward a bit, time to push xilinx.. but how?21:34
wpwrakroh: well yes, that much is pretty obvious, isn't it ? :)21:34
rohwpwrak: sure its the right direction. dont get me wrong. i fully support the way we are going.21:34
rohwpwrak: i learnt that every obvious thing which needs a transformation in thinking is already something which needs explaining from time to time.21:35
wpwrakroh: i think you're addressing the wrong audience ;-) do you really think anyone here _likes_ closed tools ?21:36
rohopensource devels see chip vendors just as 'producers of something which needs tools too expensive to buy yourself' and are happy that the numbers sold make them cheap21:36
lekernelkristianpaul, (guidelines) you can see that i'm doing exactly that http://www.ohwr.org/projects/ohr-meta/wiki/OHWorkshop21:37
rohchipvendors sometimes understood that (nxp, atmel) and sometimes not (xilinx)21:37
wpwrakroh: i think FPGA jargon gives a pretty strong hint of how people in that biz think. they don't have "software", they have "intellectual property" ;-)21:38
rohwpwrak: well.. duh. you see their fault? :)21:38
wpwrakroh: the typical FPGA customer wants to be closed. we're the exception. the typical EE is happy with closed tools on windows. and so on21:38
rohwpwrak: thats because fpga live from a 'no other way to do' market not from a broad spectrum of possible users.21:39
wpwrakroh: i wouldn't look at them for open tools. at the moment, there's little motivation for them. and it would probably very difficult for them to open their tools, because they may not even own all the necessary rights to do this.21:39
rohalso commercial devels choose a mcu if possible. simply because the amount of money they need to pay their devs to 'make it work' is much less than for a fpga based project.21:40
wpwrakroh: fpgas target a higher end market, yes. if an MCU will do, you don't need an FPGA.21:41
rohwpwrak: every usecase done by a fpga would be done by a specialized mcu if there would be the number of users to make it worth doing the 'chip' for it. and that also happens from time to time. see high end routers.21:42
kristianpaullekernel: (ohwr) oh, i hadn't noticed it :), looks very interesting, please record your talk for the far away people :)21:42
rohstuff cisco did in fpgas 10 years ago is now done by a 3$ silicon from realtek. cisco still uses fpgas.. for stuff where there are no specialized chips for (e.g. routing engines)21:42
kristianpaulroh: are you planning to start that copyleft layer 3 network switch? :-)21:44
rohkristianpaul: no. just using that example to show why people use fpga and where in commercial projects. it seems to be a 'no soc buyable. last resort to make a product build-able at all' case21:45
lekernelroh, I never said we never will manufacture a milkymist asic. in fact, a large part of the current code should be portable to asics.21:46
wpwrakroh: you're overlooking the possibility of using an FPGA for more than some form of ASIC prototyping. i see great potential in partial reconfiguration, adapt the hw for your code. that's a domain that's still pretty much untouched. once synthesis is out in the open (cf. llhdl), work can start in this direction21:46
lekernel(i mean verilog)21:46
zumbiwpwrak: I was involved in a project like that, reconfigurable FPGA for SDR21:47
rohwpwrak: nobody cares about reconfigurable hw outside of lab equipment or military use to be fair. at least nobody is willing to pay the extra money that 'feature' does cost.21:47
zumbiI wish now I could assist OHR conf21:47
wpwrakzumbi: how far did you get ?21:47
lekerneleverything is possible, that, and free FPGA tools, we just need to get down to it (which also involves generating lots of sales for the asic thing) :p21:48
rohwpwrak: maybe that can change with free tools, yes. i sure hope so.21:48
zumbiwpwrak: there is an open project, let me search the link21:48
zumbiwpwrak: http://flexnets.upc.edu/trac/21:48
wpwrakroh: (nobody cares) well, there's a good amount of basic research that needs doing first ;-)21:48
rohalso its a question of the power budget. correct me if i am wrong, but afaik an fpga doing the exactly same as an asic based on the same design will eat more watt21:48
wpwrakroh: step 1: unlock the secret. step 2: learn. and so on ;-)21:49
zumbiwpwrak: I am looking forward for newer Zynq7000 devices21:49
wpwrakzumbi: (flexnets) so you design "IP blocks" in the traditional way and then connect them to each other ?21:50
zumbiwpwrak: right21:50
wpwrakzumbi: what i have in mind would go a little further: generate code and hardware description from the same source21:51
zumbiwpwrak: it adapts resources to users, lets say you got a dual BTS with 3G/WiMAX, depending they users you got, you reconfigure the BTS to allocate more resources to the network with more users21:52
wpwrakzumbi: e.g., you could write a - maybe C - program that implements some feature at a very low level, bit-banging and so on. then the "compiler" would identify functions that can be synthesized in hardware.21:52
wpwrakokay, but it's still at the level of modules21:53
wpwrakof course, because you need the heavy proprietary synthesis software to make your bitstreams :)21:53
zumbisure, while free tools sounds attractive, isn't there a free HDL synthesizer done by one of the fellows here21:55
wpwrakzumbi: maybe you mean lekernel's llhdl ?21:55
wpwraki think llhdl is a great start. even if it will be relatively primitive, once the whole process is implemented with free tools, it will be much easier to improve the tools.21:56
kristianpaulnice, Makefile-driven HDL flow (Pawel Szostek).21:57
zumbiwpwrak: were you trying to hint a compiler/synthesizer?21:57
wpwrakthe pioneering work is always the hardest.21:57
wpwrakzumbi: "hint' ?21:57
zumbiwpwrak: does such tool exist?21:57
wpwrakkristianpaul: death to all IDEs ! ;-)21:57
zumbiI have tried Makefile-driven HDL but failed :/21:58
zumbionce they upgrade IDE21:58
wpwrakzumbi: i don't know. at least nothing widely known. maybe some research projects under NDA, etc. but such secret things usually don't go very far21:58
wpwrakwe saw this in operating system research. before the Free unices, there were some projects that implemented kernel changes as binary modules for SunOS. Sun were "nice" to academia and let them have the sources, under NDA, of course. and they allowed them to distribute their binaries.22:00
wpwrakbut that was still difficult to use, and the sources were still closed. so such things weren't really useful.22:00
wpwraknow, fast-forward a few years. no kernel research would have much credibility in the days of open source unix if it didn't come with a patch.22:01
wpwrakand every once in a while, good work does find its way from academia into real life rather quickly22:02
wpwrake.g., things like RCU and various TCP and scheduling improvements were integrated into Linux fairly quickly. and they're substantial improvements of the art.22:02
wpwrakof course, not every linux patch that tweaks the scheduler or TCP is worth a PhD, but i think it's safe to say that research that seeks applicability is in a considerably better shape today than in the dark age of only closed source operating systems (omitting "research" operating systems that had very little scope)22:04
wpwraki hope very much to see the same happen when it comes to FPGAs22:06
wpwrakanyway, past 7pm, high time for breakfast :)22:19
mw|mobilehas anyone received my mail to the ml?23:06
mw|mobiletwo times.. ok gn823:14
wolfspraul[reading the backlog] I just checked the m1 box whether it still says 'fpga' outside, and yes - it does. Sebastien told me a few weeks ago that he thought we can remove it, but for some reason I didn't even though now I thought that we had... :-)23:30
wolfspraulSebastien was totally right I think now, we should have removed it. Next batch...23:31
wolfspraulfpga is a divisive term, too many people attach too many different experiences and feelings to it. Has nothing to do on the outside of a video synthesizer box.23:32
--- Wed Aug 17 201100:00
