#milkymist IRC log for Monday, 2011-08-22

wpwrakhmm, regarding the M1 extension connectors, are they keyed (i.e., is it possible/easy to reverse the polarity of a plug ?)00:24
wpwrakalso, how's the mechanical firmness of the board around them when in the case ? does it flex when inserting/removing a plug ?00:25
wpwrakalso, regarding keying, since they're both 9x2, can they be told apart except by position ? e.g., all other things being equal, color-coding would accomplish this00:26
wpwrakJ21 (the one with 3V3) has a nasty failure scenario: short any of pins 1 and 2 with 3 and 4, and the whole 3V3 rail becomes 5 V. i wonder if the M1 survives this :)00:28
wolfspraulall very good points. Mechanically, the two 9x2 headers are quite close to each other, I'm wondering whether it's possible to connect an expansion board into each one at the same time.00:43
wolfspraulmaybe we should define a size for them, so when people start to build something they know it in advance. then later if things are moved around on the board, we can keep that guaranteed size in mind so people can continue to use their older expansion boards in newer m1 boards...00:44
wpwrakexpansion boards would probably also want some form of additional attachment. not nice to have a board hang off a header, acting as lever00:50
wpwrakthe closeness of the connectors also eliminates a simple "bridge" structure, e.g., used by arduino00:51
wpwrakaw_: heya ! well rested from the weekend, and ready for battle ? what's the plan for today ?01:14
aw_wpwrak, go all rest boards to be fix2b version. ;-) meanwhile you can interrupt me if need. so long big rework again this week.01:17
wpwrakah, so no further analysis planned on the ones with incorrect TP36/TP37 voltage ? or the ones with NOR/bus issues ?01:17
aw_wpwrak, the more rework the more new problems i encounter.01:18
wpwrakheh ;-)01:18
aw_wpwrak, no, we can still plan analysis01:18
aw_even now01:18
wpwrakfor those with incorrect TP36/TP37 voltage, I would suggest to try the 3.3 V injection again (on TP36/37)01:19
wpwraki.e., connect 3.3 V through 100 Ohm and via an amperemeter, then see how much current flows into TP36/3701:19
aw_so later if I see any not a constant measured by in-circuit for D16, it has much probability is that D16 or C238 bad from soldering, i can just replace them as new ones.01:20
aw_i see01:20
aw_let me set up a little then we check them01:20
wpwrakyes, C238 would be a likely candidate there. the measurements should guide us :)01:21
wolfspraulaw_: let me explain the 'big picture' plan as I see it this week, only high level.01:33
wolfspraulfirst you continue with a mix of applying fix2b to all 90 boards, and analysis of boards with problems01:34
wolfspraulafter we have 30 boards that are 100% pass with fix2b applied, you pause that fixing and testing work, and spend a day or two to assemble (case) and package 30 full retail units of M101:35
wolfspraulonce those 30 are ready for shipping, I will start catching up with some people that are waiting to buy, launch, shop, etc. you need to reserve maybe 2h / day or so for shipping out stuff. No worries you will get all paperwork prepared (invoices, HS code, 1040 form for US, etc)01:36
wolfspraulafter the 30 are ready for shipping, you go back to step 1, that is applying fix2b to all 90 together with analyzing boards with problems01:36
wolfspraulthat's how I see it :-) Let me know if it sounds wrong somewhere...01:37
wpwrakwolfspraul: ("bridge" connector) here's a more professional variant, the USRP: http://gnuradio.org/redmine/projects/gnuradio/wiki/USRP01:40
wpwrakwolfspraul: note that it also includes holes for spacers, for very good mechanical support. here's what it looks like with boards on it: http://gempillar.com/blog/2009/01/23/gnu-radio-rfid-reader/01:41
wpwrakwolfspraul: (spacers) the extra mechanical support is also needed because you can have boards that are TX/RX only, so they don't form a "bridge"01:42
aw_wolfspraul, since current "available" boards is more than 40pcs, this could be the first round of fix2b rework at one time job in order to get 30 full retail units.01:43
wolfspraulaw_: I don't think you should only go through the current 'available' boards. That's a little risky because we may still overlook a bigger problem somewhere.01:45
aw_or you just wanted an exactly 30 full retail units of M1, thus is 30pcs main boards 'available' enough, then immediately pause?01:45
wolfspraulso I think you can mix in some interesting or failure boards, if only to see that our fix2b and testing process is now strong and can always identify well between pass and failure.01:45
wolfspraulno not really, I just explain my thinking01:46
wolfspraulwhich is that after we have ca. 30 'avail - fix2b' boards, you need to pause that work for 1-2 days to do assembly and packing01:46
wolfspraulthen continue with fixing and testing01:46
wolfspraulbut right now, it's still important to analyze some failed boards, as planned. so that we are sure everything is under control.01:47
wolfspraulhere is a simpler version: :-)01:47
wolfspraul1. you continue exactly as you did last week. fixing and testing, analyzing some failure boards.01:48
wolfspraul2. at some point I will interrupt you, when I see enough boards (ca. 30) that I believe we can sell01:48
wolfspraulthat's all :-)01:48
wpwrakwolfspraul: i think what adam is saying that he already has 30 "available" boards, so your stopping condition is already fulfilled01:48
wolfspraulthey don't have fix2b applied01:48
wpwrakaah, i see01:48
kristianpaulthey lack fix2b01:48
kristianpaulah yes :)01:48
wolfspraulI just want to prepare Adam that there will be an interruption at some point, which is when we have ca. 30 fix2b 100% pass boards, and we are confident in our design, fix2b, and testing process.01:49
wolfspraulthen there's an assembly and packaging interruption, then back to fix2b/testing/fixing for all 90 boards01:49
wolfspraulaw_: sorry now I wrote so much :-) but just repeat the same thing 3 times. did you understand / agree with the process?01:50
aw_wolfspraul, l agreed , but here that I do here:01:51
aw_1. there's already more than 40 pcs put in "available" stage, not include passed "avail - fix2b"01:52
aw_2. mix good and bad boards, from the wiki results; the facts: the failure board are currently big "impedance" and few usb/midi  failure boards that I haven't fixed as "available" , Which of them are useless to approve fix2b design now.01:55
aw_3. from last first cluster with fix2b, we encountered a new branch of likely 0x32/0x3c/0x77 failure boards which I could say we they are bad board so far now. Those are we can keep to analysis.01:56
aw_4. so idea to use those more than 40 pcs "available" boards to meet/accumulate 30pcs "Avail -fix2b", how do you think?01:57
aw_5, once we reach 30 pcs "avai -fix2b" then we pause.01:57
wpwrakaw_: what is "big impedance" ?01:58
aw_wpwrak, before m1 to be powered on, I firstly measured their impedance on TP1 ~ TP4, TP33. They had have 'short' condition, which surely no need to apply 'fix2b' circuit.01:59
aw_wolfspraul, is that reasonable i replied?02:00
wpwrakah, so it's "low impedance" :)02:01
aw_for me, preparation is likely to be done as material to One-time job. so rework should be done from those 40 pcs "available" firstly02:05
wolfspraulok wait, reading :-)02:06
aw_later when everyday i tested, the avail-fix2b will be accumulate rising to 30pcs then we pause02:06
wolfspraulaw_: yes, all reasonable. _BUT_ I think you should definitely do some analysis in parallel, today, tomorrow. 0x32/0x3C/0x77/0x85/etc.02:08
wolfspraulnot 100%, but in parallel with finishing more fix2b boards02:08
wolfspraulmaybe 20% analysis, 80% finish fix2b boards :-)02:09
aw_wolfspraul, yes...just in parallel to feedback info from bad boards02:09
wolfspraulwe have a good plan :-)02:10
aw_well..sometimes let's see how amperemeter measured firstly....these would always be as ping pong status, hard to define a day that belongs to design validation day or productive day. ;-)02:12
aw_well...no more chats now...only work or logical analysis from now. ;-)02:13
aw_0x32: d2/d3 is fully off, tp36/tp37 is stable 3.3V, tp36 - 0.08mA, tp37 - 0.014mA02:34
aw_0x32: d2/d3 is fully off after power on, tp36/tp37 is stable 3.3V, tp36 - 0.08mA, tp37 - 0.014mA02:34
wpwrakdoes it boot ?02:35
aw_i didn't put it boot stage, so NO,  it's not.02:36
aw_just standby stage for boot02:36
wpwrakso what you mean is that the NOR isn't fully programmed yet ?02:37
wpwrakfrom last week, i see that 0x32 had some garbled NOR content (in the standby bitstream)02:38
aw_not exactly to say that. this have to be measured/triggered tp36 with tp35(DONE pin), to know if it's been finished reconfiguration stage.02:40
wpwrakmaybe it just needs a reflash02:40
aw_wpwrak, yes, i dumped 0x32 last week. good that you checked dump file already02:40
wpwrakthere's something strange with this board, though: sometimes, TP36/TP37 voltages are good, sometimes they're not. maybe give the reset circuit a good visual inspection. look for broken solder joints or things that could short a component.02:41
aw_no, before assert a nex reflash, do we miss some info or need to scope somewhere?02:42
wpwrakif you want, you can take another dump to check whether the NOR content is the same (or if something "magically" changes it)02:42
aw_wpwrak, yes, exactly sometimes it indeed, this morning I 've seen 0x32 auto boot again (d2 is ON) after powered on.02:43
aw_when it auto boot, i didn't touch it though02:43
aw_wpwrak, alright, try to if can dump it...02:44
wpwrakhmm, i'll understand all these LEDs much better once i get to play a bit with the M1 that's currently in ... memphis ;-)02:44
aw_wpwrak, yeah..you bet will.02:45
aw_fact on 0x32: i 've soldered pins on flash chip...it could be worse than hide the problem from my soldering02:48
wpwrakwhen did you do this ? recently or long ago ?02:48
aw_wpwrak, long day ago, before applied for fix2b from histories,02:49
aw_so i would just dump one time then we don't spend much time on 0x32, then back to 0x3c to see02:50
wpwrakah, you replaced the NOR chip. okay, that could indeed cause a lot of fun :)02:50
aw_no, not replaced chip02:50
wolfspraulmaybe the focus on 0x32 should also be to get it to work, rather than to analyze its current state?02:50
aw_resoldered pins of NOR02:50
wpwrakaw_: why did you resolder the pins ? did anything look wrong with them ?02:51
wpwrakwolfspraul: well, at the end of the day, that's the objective of the analysis ;-)02:51
aw_wpwrak, i was thought it had have soldering problem. ;-) long days ago. at that time you've not jumped here. ;-)02:52
wpwrakwolfspraul: i suspect we may end up with an uncertain status for that board, though: it may work but with a history of failures where nothing has been done to correct them02:52
wpwrakaw_: so the soldering problem was just a theory, but you didn't actually see or measure anything wrong ?02:53
wolfspraulI stay back. I just hope we are focused. Either we learn something, or we produce a result (make it sellable). But not get stuck on 0x32...02:53
aw_wpwrak, agreed. so let's just dump once, yes, no see actually voltage/or measure02:54
aw_wpwrak, i would be later we back to see 0x32 after later I replace a new chip. not now.02:55
wpwrakaw_: the next step for 0x32 would probably be to reflash. it may then boot and render without further rework. but as i said, we may not be able to trust it.02:57
aw_wpwrak, agreed. then we stop 0x3202:59
wpwrakaw_: but the dump first :) and afterwards maybe reflash and see if it comes up. then we can forget it. let's not have a gazillion almost finished boards around. that just causes confusion.03:00
wpwrakwolfspraul: with 0x32, i just want to make sure it doesn't tell us anything new about the NOR. if it behaves, which it currently seems to be inclined to do, then it'll be a "kinda works but prefer not to sell it to customers" board. in a few minutes :)03:02
wpwrakfunny. 0x32-2 differs from 0x32-1.03:03
aw_how different?03:04
wpwrak0x32-1: ten 0->1 changes on DQ703:04
wpwrak0x32-2: three 0->1 changes on DQ503:04
wpwrakalso on different locations than 0x32-1.03:04
wpwrakvery interesting pattern.03:07
wpwrakokay, so ... don't reflash. this one is for the "unstable NOR/bus" pile then. company for 0x3a ;-)03:07
wpwrakjust to be sure: there was no reflash between dump 0x32-1 and 0x32-2, correct ?03:09
aw_yes, no reflash it again03:10
wpwrakokay, perfect. then we have two different dumps from the same content. just like 0x3a.03:11
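The comparison described above (ten 0->1 changes on DQ7 in one dump, three on DQ5 in the other, at different locations) can be made repeatable by checking each dump bit by bit against a known-good image. What follows is only a minimal Python sketch under stated assumptions: the function name `bit_flips` is hypothetical, both inputs are assumed to be equally long byte strings, and the bit index is counted within each byte, so mapping it to a DQ line depends on the bus width and byte order of the real dump, which the log does not state.

```python
from collections import Counter


def bit_flips(reference: bytes, dump: bytes):
    """Count 0->1 and 1->0 flips per bit position between two images.

    Hypothetical helper: 'reference' is the image that was programmed,
    'dump' is what was read back. Bit positions are per byte (0..7).
    """
    if len(reference) != len(dump):
        raise ValueError("images differ in length")
    rising = Counter()   # bit was 0 in reference, 1 in dump
    falling = Counter()  # bit was 1 in reference, 0 in dump
    offsets = []         # byte offsets where any bit differs
    for offset, (ref, got) in enumerate(zip(reference, dump)):
        diff = ref ^ got
        if not diff:
            continue
        offsets.append(offset)
        for bit in range(8):
            mask = 1 << bit
            if diff & mask:
                if got & mask:
                    rising[bit] += 1
                else:
                    falling[bit] += 1
    return rising, falling, offsets
```

Running this on two successive dumps of the same board against the same reference would show directly whether the flips stay on one data line and at the same offsets, or move around between reads as they did here.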
aw_so at least a good consistent on 0x3a and 0x3c03:12
wpwrakokay, who's next ? 0x3c ?03:12
aw_0x3c, yes03:12
wpwrakaccording to the last dump, 0x3c's NOR content is perfect03:12
wpwrakso it should be able to boot. now, does it ? :)03:13
aw_some sort of pieces of your words I need to record in notes ;-)03:14
wpwrakkristianpaul: 121 is EREMOTEIO. that's a funny one, never saw it03:15
aw_wpwrak, 0x3c: you are right, good boot to rendering.03:16
wpwrakkristianpaul: (Remote I/O error)03:16
wpwrakaw_: heh ;-)03:16
wpwrakaw_: does it take a lot of time to run the other tests on 0x3c ? (the various tests you normally run, CRC, USB, MIDI, etc.)03:17
kristianpaulI'll upgrade next week, when get a new laptop to play..., i wasted too much time on this.. but may be some Fedora 15 user around want to give it a try to the package in  F-16 :-)03:18
aw_wpwrak, http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/3C-results03:18
wpwrakkristianpaul: ah, your bug has a response. good ...03:18
aw_you can scroll down a bit, there's three notes I marked03:19
wolfspraulbah. I would reflash and resolder and replace NOR on 0x32 and 0x3A until they work. what is the value of NOR paranoia that just leads us to some bad soldering in the end...03:19
wolfspraulbut sure, move them aside now and fix later. but not study forever - no value.03:19
wolfsprauljust fix them03:19
kristianpaulwpwrak: yeah. but i'm back to trusty debian now ;)03:19
aw_wpwrak, so yes surely I tested 0x3c by test program, but no more tests log I recorded after applied fix2b circuit.03:20
wpwrakwolfspraul: that's just hiding the problem :)03:22
aw_wpwrak, 0x3c is weird , why it did have messy pulsing on tp36/tp37 before, then then dump shows correctly then it works?03:24
wpwrakwolfspraul: maybe it's bad soldering. maybe not. the board also has strange things happening on the reset circuit. just making random changes until it works tweaks the statistics against you - you're actually decreasing the coverage of your production test.03:25
aw_guessed that prober's capacitance influence 0x3c's tp36 when i probered it.03:25
wpwrakaw_: measure again ? if it's really only a probing problem, that would also shed some new light on the issue03:26
wpwrakaw_: but i somehow don't think it's so easy :)03:27
aw_wpwrak, 0x3c notes: 1. No VGA screen, replaced a new u19 then pass also as well as video input shows normally 2. rendering @ 2nd  then can't reconfigure 3. replaced new u7/u19/u20 4. d2/d3 dimly lit while TP37 and TP36 is unstable level, range 1.2V to 3.3V: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3c_ch1-tp37.JPG 5. when I attached prober on tp37, few seconds the messy pulse disappeared and stays 3V3 steadily and d2/d3 is fully off. Messy pulses again after pressing middle btn.03:27
aw_6. applied fix2b 7. D16(in-circuit): For.V.=165mV, Rev.V = 1549mV 8. reflashed successfully 9. d2/d3 dimly lit after powered-cycle(tp36/tp37 pull high well) 10. d2/d3 dimly lit(tp36/tp37 is messy signal level 1.2V ~ 3.3V) after power-cycle, used prober touched TP36 can intermittently let board d2/d3 is fully OFF and can boot up after pressing middle btn.03:27
aw_11. dump after power on, d2/d3 is fully off: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x3c-standby1.bit/ 0x3c's NOR content is perfect, good boot to rendering,03:27
wpwrakaw_: regarding your notes, so the board tested okay (after some rework) but then it started to develop the tp36/37 instability ?03:28
wpwrakaw_: yes yes, i'm looking at http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Test_Results ;-)03:28
aw_yes...from notes, you can say that...so it's weird, that prober's capacitance can influence tp36 surely on 0x3c03:29
wpwrakaw_: hmm, i don't think the probe should be able to upset TP36/TP37 so much. something doesn't seem right.03:31
wpwrakaw_: but maybe just probe them again (with the scope). see if they're still unstable.03:31
aw_wpwrak, 0x3c today firstly powered on, d2 is fully off well, then good boot to rendering03:31
wpwrakaw_: ah, when you determine the TP36/37 voltage, is this with the scope or a voltmeter ?03:32
aw_wpwrak, moment...i scope it for one minute to see,03:32
aw_i used scope03:32
wpwrakokay, then please try again with the scope03:32
aw_bad that d2/d3 dimly lit after powered cycle...let's see if prober can let d2 is fully off03:34
wpwrakkristianpaul: (debian) so you won't be able to try the package hans de goede suggested ?03:34
aw_or you want to dump firstly?03:34
wpwrakaw_: let's look at TP36/37 first03:35
aw_wpwrak, not just let my prober to let d2 off?03:35
aw_tp36 is stable 3.3V about 5 couple seconds then messy pulsing, d2 is still dimly lit03:36
wpwrakaha ! can you take a picture of TP36 and TP37 that shows the stable and the unstable part ?03:37
kristianpaulwpwrak: not until  next week03:37
aw_wpwrak, tp36 is still some sort likely of http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x3c_ch1-tp37.JPG03:39
aw_if the scope I press "stop" will like the waveform as above, or even messy one.03:40
aw_wpwrak, can you imagine the waveform's scenario? you know the pulsing is super messy03:41
wpwrakaw_: let's try the current measurement then. 3.3 V through 100 Ohm into TP36 and TP3703:41
aw_i got ready03:42
aw_no current uA i can measured on tp36/tp3703:44
wpwrakand you still get the "messy" signal ?03:44
aw_no messy now, it's stable 3.3V so far03:45
wpwrakthat's consistent with no current. at least something :)03:45
aw_need monitor long time...03:45
wpwrakhmm. tricky.03:46
wpwrakdoes C238 look good (visually) ?03:46
aw_yes, lemme trying to both probering and current03:46
aw_C238 looks good03:47
aw_from last note, the Voltage on D16 is constant...03:47
wpwrakand i suppose U24 looks good too .. (it's a big component, hard to get wrong :)03:47
aw_I would to measure D16 again to see if pulsing means D16/C238 has been draftly varies after pulsing. ;-)03:49
wpwrakwell, why not03:49
wpwraki have to admit that i'm a bit puzzled. some of the effect may actually be a digital interaction. but it's all very strange.03:50
aw_fedex picker is coming...strange..wait03:53
wpwraklemme research reset chips a bit more. there are some parameters AiT don't mention ...03:53
aw_D16(in-circuit, power off): For.V. = 157mV, Rev.V. = 1546mV03:56
wpwraksounds good. can you check the 3.3 V supply for variations/glitches ?03:57
aw_impedance to gnd: tp36 - 10.25 k ohm, tp37 - 18.45 k ohm....good to compare with good board.03:58
aw_wpwrak, ha...good reminder..you mean how's the ripple on 3.3V, right?03:59
wpwrakyou never know :) it would explain things03:59
wpwrakmaybe there's a short somewhere else that gets triggered from time to time03:59
aw_yup...this is a good direction way...03:59
wpwrakmaybe set up the scope as follows: CH1 on TP36, CH2 on 3V3. 2 ms/div. then trigger on one of those pulses on TP36.04:03
wpwraki just wish your scope had more memory. that way, you could get the "big picture" but also zoom into the details.04:04
aw_hmm....messy pulsing is not listening to me...not happened now. :(04:16
aw_thinking how to reproducible it. :(04:16
wpwrakhow many times did you try ?04:20
aw_couples seconds only last time04:21
aw_i'm monitoring scope04:22
wpwraki mean, how many times did you power cycle, trying to reproduce the instability ?04:24
wpwrakor does it come and go without power cycling ?04:24
aw_without power cycling....just wait couple seconds after prober touch TP3604:25
aw_but now even I used prober to TP36, it has not happened more..:(04:26
wpwraka heisenbug :-) http://en.wikipedia.org/wiki/Heisenbug#Heisenbug04:27
wpwraka bug that disappears when you try to analyze it04:31
aw_tp36 - 4mA, tp37 - 004:32
wpwrakwhoopie !04:32
aw_tp36 - 0.004mA, tp37 - 004:33
wpwrakah. boring :)04:33
aw_typed too fast...sorry to confuse. :)04:33
wpwrakstill  ... 4 uA may be significant. lemme calculate ...04:33
wpwrakhmm, even with a voltage difference of 1.3 V, that would be ~200 kOhm. weaker even than the pull-up.04:35
wpwrakbut .. what was the voltage ?04:36
wpwrakah okay, then it may just be resistance along 3V304:36
aw_read from scope04:37
wpwrak0.4 mV difference between the two ends of your 100R+meter setup. that's quite reasonable04:37
wpwrakmaybe power cycle a few times to see if the instability comes back04:38
wpwrakaw_: when you saw the instability happen before, was that just during an attempt to power on ? or did you have to do something in addition to this ? e.g., press the middle button ?04:43
aw_wpwrak, no need to press middle button, just used prober to touch tp36(sometime kept 3.3V stable, then instable level/pulsing happen.04:44
wpwrakokay. let's see if more power cycling causes it to appear again04:45
aw_i just finished 5 times powered - cycle with two probers to wait 10 seconds, no instability happen (also d2 is fully off) , seems that heisenbug like me.04:46
wpwrakdo you have an estimate of how many power cycles you did after fix2b and how many times the instability appeared ?04:46
aw_the instability once happened I immediately recorded into note. but no a system way to count how many or times I had have met totally04:48
aw_but I just felt one condition:04:48
wpwrakokay. let's just try a few more times.04:48
wpwraklet's say up to 20, so 15 more04:48
aw_1. will this bug related to temperature-oriented? seems now I can't reproduce it..04:48
aw_2. this morning when I firstly powered on, and probered tp36, it can be easier to see messy pulsing..04:49
wpwrakif that doesn't make it happen, then try letting it boot a little more (i.e., press the middle button), then power cycle04:49
wpwraktemperature could be a factor, yes04:49
wpwrakresidual charges stored in caps may be another factor04:50
wpwrakbut let's vary one parameter at a time. if simple power cycling doesn't help, try maybe 5-10 times with booting the system (middle button)04:53
wpwrakif it still doesn't happen. try cycling with longer off periods. e.g., leave it off for ~1 min between tries. also 5-10 times. (you could combine this with lunch or some other fix2b rework :)04:55
aw_yup...i just accumulated 5 times of powered -cycle then still d2 is fully off good, no messy scope happened04:55
wpwrakif still nothing happens, i'd let the board cool down and discharge itself until tomorrow morning.04:55
aw_trys boot to rendering and power-cycle now04:58
aw_while this, i keep an eye on watch scope.05:01
aw_5 times of boot to rendering with power cycle05:05
aw_all worked well, no unnormal condition05:05
wpwrakthis bug is a slippery one05:05
wpwraklet's increase the power-off time then to ~1 min05:05
wpwrakif still nothing happens, 0x3c gets a rest until tomorrow morning. maybe we can then give 0x77 a quick try.05:09
wpwrakfor the measurements, you're soldering wires to the test points ?05:09
aw_yes, for preparations(soldering) on 0x77's tp36/tp3705:10
wolfspraulhave we tried fixing 0x32 by not putting the focus on learning/analyzing, but by simply replacing parts that could potentially be the source?05:11
wpwrakokay. so you're adding the wires to 0x77 already. good. that way, we can measure when it powers up the very first time.05:12
wolfspraulc238, d16, reset ic, nor chip, etc.05:12
wolfspraulthat'd be my approach05:12
wolfspraulif the problem is not even reproducible now, don't spend more time on 0x32, just put aside (like you are doing)05:12
wpwrakcargo cult engineering ;-)05:12
wpwrak0x32 is already back on the pile05:13
wpwrakwe're at 0x3c now05:13
wolfspraulyes I read it, just thought I throw my 2c in for 0x3205:13
wpwrak0x32 has some strange NOR data path problem. but it's not clear where it is. the reset circuit is most likely not to blame for this.05:14
wpwrakwhat's odd about 0x32 is that data that is read from the NOR seems to change. so that could be: failing NOR cells, bad I/O buffers in the NOR, some disturbance of the data or address bus (interference ?), bad I/O buffers on the FPGA, some obscure problem on the usb-jtag side.05:17
wpwrakah, also badly programmed NOR cells could be a cause (i.e., a "soft" error)05:17
wolfspraulreplace nor chip05:19
wpwrakso my next steps would be to read the NOR once or twice more tomorrow, see how the pattern behaves. try to program it again. see if it boots. if not, read back and see if there's corruption.05:19
wolfspraulI would replace the nor chip right away05:19
wolfsprauland not even now, because nothing to learn for fix2b now, so we can do that later05:20
wpwraknaw, that's way too drastic. if it's a soft problem, you don't need to replace the chip05:20
wpwrakyes, it's unrelated to fix2b05:20
wolfspraulthen move forward05:20
wpwrakwe're already at 0x3c :)05:20
wolfspraulyes I know, good05:20
wpwrakand possibly soon at 0x7705:20
aw_wpwrak, finished 5 times with ~ 1 min power-off time, it all goes well05:21
aw_i stop 0x3c now, we check this tomorrow morning again05:21
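How much a run of clean power cycles says about an intermittent fault can be estimated with a short calculation. The Python sketch below is an illustration only, not something either speaker computed in the log: it assumes each trial is independent with the same failure probability (which the temperature and residual-charge theories above would violate), and the function name `max_failure_rate` is hypothetical. It returns the largest per-trial failure rate that is still compatible, at the given confidence, with seeing no failure in `clean_trials` attempts.

```python
def max_failure_rate(clean_trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-trial failure rate after only clean trials.

    Assumes independent, identically distributed trials (a simplification).
    Solves (1 - p) ** clean_trials = 1 - confidence for p.
    """
    if clean_trials <= 0:
        raise ValueError("need at least one trial")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


if __name__ == "__main__":
    # 5 plain power cycles, 5 boot-to-rendering cycles and 5 cycles with
    # ~1 min off time were reported clean for 0x3c in the log above.
    for n in (5, 15, 20):
        print(f"{n} clean trials: rate could still be up to {max_failure_rate(n):.0%}")
```

Under these assumptions, 15 clean cycles still leave room for a fault that shows up in roughly one power-up out of six, which fits the decision to rest the board and try again from cold the next morning rather than declare it good.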
aw_let's at 0x77 firstly:05:21
wpwrakthe problem with "changing chips until it works" is that you may never solve the problem. so in the next run, you'll just get N times the boards that need arbitrary changes. worse, if the problem persists, you'll just rework the board to death and haven't learned anything.05:22
wpwrakaw_: yes05:22
wpwrakwolfspraul: also, replacing the NOR chip is high risk. you need to heat up a relatively large area, pull the chip without force, clean up all the pads, maybe clean the board from flux (optional, but things get messy quickly if you don't), then solder the new part (this is the easiest bit)05:28
wpwrakwolfspraul: so there's plenty of potential for damaging pads05:28
wolfspraulI have a different perspective. Not everything needs to be understood, it's about economics.05:29
wolfspraulonce we have the feeling that there is nothing that we will learn from 0x32 (for example) that applies to any other board, the value of 0x32 drops dramatically.05:29
wolfspraulin fact at that moment it's probably not worth even 5 minutes of the time of someone like Werner05:29
wolfspraulit's difficult to make the decision about 'can we still learn something that applies to other boards?' though05:30
wolfspraulI'm close to saying: no, we cannot05:30
wpwrakwolfspraul: but that's for the production phase. there, based on prior analysis/experience, you just have standard set of attempts at fixes, which may include replacing NOR chips. but so far, we don't even have any evidence that there is anything wrong with the chip. maybe it's cross-talk on the bus.05:30
wpwrakwolfspraul: i'm not convinced yet that it's just a freak board. we already have two with similar issues. it's a cluster in the making ;-)05:31
wpwrakwolfspraul: so for now, i'd just examine boards that show good D16 and TP36/37 result but still don't boot for NOR corruption and add any that exhibit variations to that cluster05:33
wpwrakwolfspraul: so for now, we seem to have two trouble areas: at least two boards that exhibit variations when reading the NOR (good old 0x3a and now 0x32), and those that "out of the blue" get instability on TP36/37 (0x3c, 0x77, i think there's at least one more)05:37
aw_0x77: d2/d3 is fully off, tp36/tp37 is stable 3.3V now with two probers touchs05:37
wpwrakwolfspraul: from the symptoms, it appears that the TP36/37 instability may not be the cause but the effect of some other problem05:37
wpwrakaw_: 0x77 is a bastard !05:37
aw_no messy happened yet05:38
wpwrakaw_: so you can boot etc. ?05:38
aw_tp36 - 0.214mA, tp37 - 0.039mA05:39
wpwrak214 uA ? that's quite a bit05:39
aw_wait...watch for a while. ;-)05:39
wpwrakwhat's the voltage ?05:39
aw_still 3.3V @ tp36, yup ..weird current05:39
aw_alright...time to press middle btn05:40
wpwrak21 mV at 100 Ohm. hmm.05:40
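The figures quoted for the injection setup (3.3 V through 100 Ohm and an amperemeter into the test point) follow from Ohm's law. A minimal Python sketch, using only currents reported in this log; the function name `series_drop_mv` is hypothetical and the meter's own burden resistance is ignored, which is an assumption:

```python
def series_drop_mv(current_ma: float, series_ohms: float = 100.0) -> float:
    """Voltage drop across the series resistor, in millivolts.

    mA times Ohm gives mV directly. The ammeter's internal resistance
    is not modelled (assumption).
    """
    return current_ma * series_ohms


if __name__ == "__main__":
    readings_ma = {
        "0x3c TP36": 0.004,  # reported earlier as 4 uA
        "0x77 TP36": 0.214,
        "0x77 TP37": 0.039,
    }
    for name, current in readings_ma.items():
        print(f"{name}: {current * 1000:.0f} uA -> {series_drop_mv(current):.1f} mV across 100 Ohm")
```

This reproduces the 0.4 mV mentioned for 0x3c and the roughly 21 mV for 0x77's TP36.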
aw_no boot....d2/d3 dimly lit05:41
wpwrakhah !05:41
aw_messy now05:41
wpwrakexcellent ! ;-)05:41
aw_hope i can capture05:41
aw_stay tuned05:41
wpwrakyeah :)05:41
aw_got it..but seems not related to 3.3V...05:42
aw_TP1 is 3V3 net05:47
aw_trigger at falling edge05:47
aw_the instable pulsing now still keeps...05:48
wpwrakvery good. let's keep it pulsating :)05:48
wpwrak(scope picture) fascinating05:49
aw_but it acts not always like messy pulsing...sometimes back to pull high good..then drop to messy pulsing05:49
wpwrakcan you please move CH2 to TP37 and see if the mess is there, too ?05:49
wpwrak(sometimes stable/sometimes messy) yes, like t < 0 vs. t > 0 on the screen shot05:50
aw_too bad...05:51
aw_now all stable @ 3.3V, d2/d3 keeps dimly off05:51
aw_dimly lit , sorry05:51
wpwrakwhile watching TP36, can you gently push against the terminals of C238 ?05:52
aw_wpwrak,  gently push against?05:53
wpwrak(with something non-conductive, fingernail, toothpick, etc.)05:53
aw_but now tp36 is pull high good enough05:53
wpwrakapply a bit of pressure on the terminals from various sides05:53
wpwraksee if there's a bad solder joint or something else05:54
aw_i see05:54
wpwrakif C238 is okay, repeat for the reset chip05:54
aw_no, i think their soldering is quite good, i can use microscope to catch them though. ;-)05:56
aw_after I put more pressures on c238 and reset ic is the same, no changes now...05:57
wpwrakif this still doesn't yield anything. then try pushing on the PCB such that it bends a little. around the reset chip, C238, D16, and then anywhere on the board (maybe in a grid pattern with ~1-2 cm spacing)05:57
wpwrakmaybe there's a hairline crack somewhere05:57
wpwrakcould also be inside a chip05:57
wpwrakif nothing happens, maybe try power-cycling followed by the middle button 1-2 times and see if the problem comes back05:58
aw_hmm..no come back...start to power cycle06:01
wpwrakif all looks good, try to boot06:02
aw_i saw it06:02
wpwrakit came back ? great ! :)06:02
aw_tp37 synchronized to tp36 at 2nd power cycle06:03
wpwrakcan you take a picture showing just two peaks ?06:03
wpwrakcan you "zoom" in until you have just 2-3 peaks on the screen ?06:07
wpwrakand after that, try the 3.3V -> 100 Ohm -> amperemeter experiment again. first to TP37. check the current and also see if this ends the instability ?06:09
wpwrakoh, and you may want to set the voltage offset of CH2 to exactly -4.00 V :-)06:12
wpwrakwow, looks like a data transmission ;-)06:13
aw_wait...you wanted me set voltage offset of CH2 to exactly -4.00V now, or no need?06:14
wpwrakyes, please set it to -4.00 V. that way, it's easier to compare voltages. (just for the future)06:15
aw_when the TP37 starts a messy pulsing...the current goes up to roughly 1mA06:16
aw_pull high...surely no current on TP3706:16
wpwrakokay, now TP3606:16
aw_yes, TP36 is 1mA when stays pulsing too.  pull high is 3uA06:18
aw_now...the instability is rare to happen though06:18
aw_it seems to be warm up then goes to disappear a bit06:19
aw_but need to monitor more06:19
wpwrakcan you try to catch it again at ~10 us/div ?06:19
wpwrakand then, remove CH2 from TP37 and check the voltage rails. TP2 (2V5), TP3 (1V8), and TP4 (1V2)06:22
wpwrakah, and also TP33 (5V) and TP26 (4V3)06:24
wpwrak(did i catch them all ? there are so many :)06:24
aw_yes, not big discoveries. ;-)06:25
aw_i have to power off and soldering wires06:25
wpwrakvery crazy noise06:25
wpwrakcan't you just touch the TPs with the probe of CH2 ?06:25
aw_three pads under usb-jtag board06:25
wpwrakmaybe just remove the board06:26
aw_I'll firstly unplug usb-jtag06:26
aw_then just touch them sure06:26
aw_hope still reproduce it06:27
wpwrakoh, and how is your M1 supplied ? from a regular power supply or from a lab power supply ? in the latter case, you may want to have a look at the current consumption in the unstable state06:27
aw_from regular power supply06:29
aw_good news at least tp36 pulsing ie not related to TP1~4, TP33, TP2606:29
wpwrakokay, let's keep checking the total system current for later06:29
aw_what else we think TP36 came from ?06:30
aw_okay...sorry that what's "grrr"?06:30
wpwrak"grrr" = i was hoping for an unstable supply rail06:31
wpwrakand it's unstable also without the jtag board ?06:31
wpwrakis there anything else connected ? ethernet, audio, ... ?06:32
aw_no connections on 0x77 at all06:32
wpwrakthat would have been too easy, i guess :)06:32
aw_unless we just removed usb-jtag board and went through all of the TPs related to TP36. ;-)06:33
aw_yup..so your guessing on regular supply and lab power supply is reasonable06:34
wpwrakhmm. it would seem that PROGRAM_B becomes an output. or that something else connects into the PROGRAM_B/TP36/reset out net06:34
aw_switching to lab power supply now...06:34
aw_hmm...set limited 1A at lab power supply, 0x77 TP36 still got pulsing06:37
aw_a total lab power current shows 0.55A06:38
aw_alright...seems 0x77 is easier to reproduce messy pulsing06:39
wpwrakgood :)06:39
aw_wpwrak, i think we stop now analysis today06:39
wpwrakwait a minute. two more ideas.06:40
wpwrakbut i need to look up something first. i'd be curious about how adjacent traces behave in relation to TP3606:40
wpwrakthere are two candidates: the trace "north" of D16, roughly under the white bar that marks the polarity06:41
aw_not very clear on this , can you slowly describe it06:41
wpwrakand the one coming out next to R3006:41
aw_go on06:41
wpwraki need to find places where you can actually measure them06:42
wolfspraulit just gets interesting with 0x77 :-)06:42
aw_wpwrak, get synchronized to DRAM routes?06:43
aw_which other partly circuit you want to scope, i can check here. ;-)06:44
wpwrakaw_: dunno. doesn't look like DRAM. but lemme find the package definitions ...06:44
wpwraksigh, if all this was done in kicad, i could just click on the pad and know what it does ...06:44
wpwraklekernel_: you aren't awake by any chance ? :)06:45
wpwrakone would be ball AA2 ... now, what is this ...06:47
wpwrakah, AA2 = FLASH_D8 = DQ8 = U9 pin 3406:52
aw_just let me know your idea, i can open my windows tool to see surrounding signals or ball under fpga06:52
wpwrakso that's one potential correlation to check: TP36 noise vs. pin 34 of the NOR06:52
aw_got it06:52
wpwrakthe other would be AA4 = BTN2 = ... naw06:55
wpwrakanother candidate would be FLASH_CE_N, pin 14 of U9 (2 "up" from pin 16 of FLASH_RESET_N)06:58
wpwrakbut FLASH_D8 is much more likely. FLASH_CE_N is also on the "wrong" side of D1607:00
wpwrakaw_: to get to the instability, is it sufficient to just connect power ? or did you also have to press the middle button ?07:06
aw_wpwrak, no need to press middle, just power cycle and you can see d2/d3 is either dimly lit or fully off. 0x77 has both messy pulsing.07:08
wolfspraulit must be caused by some part behaving differently from other (functioning) boards. but which part can cause this?07:09
wpwrakok. are you checking for correlation now ? TP36 vs. pin 34 of the NOR07:09
wolfspraulI would just make a priority list from most likely to least likely, and then replace them one by one with new ones.07:09
aw_wpwrak, yes..the pins are close.. so need to touch very carefully07:10
wpwraksorry, pin 3207:10
wpwrakerr .. pin 34 was correct. getting tired :)07:11
aw_yes, i knew,07:11
aw_so you go sleep first, i07:11
wpwrakaw_: naw, i'll wait ;-)07:11
wpwrakwolfspraul: for the 0x77 problem ? good question :)07:11
aw_I'll scope p34 and FLASH_CE_N07:12
wpwrakwolfspraul: right now, it looks plain impossible. we're seeing signals on TP36 that have no business to be there07:12
wpwrakwolfspraul: plus, they also look wrong. like two outputs working against each other07:13
wpwrakwolfspraul: while all we really should have there is an input07:14
wpwrakwolfspraul: so if the correlation check doesn't yield anything, the next step would be to see if lekernel recognizes something familiar in the scope screenshots. or if he knows of a condition where PROGRAM_B can become a weird output.07:15
wpwrakwolfspraul: if he doesn't have any magic rabbit in his hat, i'd start simplifying the circuit. maybe start with pulling U2407:16
wpwrakwolfspraul: if this doesn't do the trick, remove C238. if the instability is still there, D16.07:17
wpwrakwolfspraul: one problem is that i'm not sure we have enough data to be able to tell when the instability has gone for good. so if it can't be observed at a given point in time, it may be necessary to let the board rest overnight.07:18
aw_wpwrak, yes, you seem to be right, tp36 seems synchronized to pin 34 of NOR07:18
aw_moment...i still need to catch a very firm waveform. ;-)07:18
wpwrakwheee ! now that's nice news ;-)07:18
aw_caught, yes!07:20
wpwrakthe next two steps: resistance between TP36 and pin 34 of the NOR (for both polarities). compare with the same resistance of two known to be good boards.07:21
wpwrakmaybe start with the good boards first to give 0x77 some time to discharge07:21
wpwrakbut let's first wait for the evidence ;-)07:22
wpwrak(i.e., the picture :)07:22
wpwrakwolfspraul: if the impedance is unusually low, you'll like the next step: visual inspection of the underside of the FPGA :)07:23
wpwraknice ! the voltage levels are a little odd, though07:25
wpwrakbut let's see about the impedance now07:25
wpwrakor wait07:26
wpwrakmaybe take another scope shot at 500 ns/div07:27
wpwrakhmm. thinking a bit more about it. the image does not suggest a simple short. otherwise, DQ8 would have to be at ~1.6 V too (i think the noisy "floor" of DQ8 is Z)07:31
aw_a high impedance of Z as we knew, should pull high chip inside or outside resistor, this waveform still can not say that a possible 'short' under between them07:34
aw_even if it's short, the waveform should not be like that level.07:35
wpwraklet's forget about the impedance for now. this is more mysterious.07:35
aw_do you think that that could be an interconnection inside fpga? as related to program_b?07:35
wpwraknext try: TP36 and NOR pin 54 (OE#/FLASH_OE_N)07:36
wpwrakit could be the FPGA just acting crazy, yes. but then, that's a bit too convenient an assumption :)07:36
wpwraki'd leave all the "crazy FPGA" theories to sebastien. he's probably read about a good number of FPGA madness issues, so something may sound familiar. if not, we probably have something else.07:37
aw_wpwrak, no pulse on pin 54 of NOR07:38
wolfspraulis there a chance that this board (this particular one, 0x77), could pass our test program and 10 render cycles?07:39
wolfspraulor is it behaving badly enough from what we see so far that that is impossible?07:39
wolfspraulnot theoretically, but practically this one, 0x7707:39
aw_wpwrak, no relations between TP36 and pin 54 of NOR07:39
wpwrakwolfspraul: this one may be badly off enough. but the very similar 0x32 went for a while without showing problems.07:40
wpwrakaw_: is OE# low all the time ?07:40
aw_yes, you got it, always at low07:40
wpwrakthe results would correspond to having about 5 kOhm between AA1 (PROGRAM_B) and AA2 (FLASH_DQ8)07:44
wpwrakthat way, we'd just end up at around 1.6 V, with the kind of charge/discharge pattern on PROGRAM_B we see (C238)07:45
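A rough sketch of the divider arithmetic behind the ~1.6 V estimate. It assumes (not confirmed in the log) that PROGRAM_B is held at the 3.3 V rail through a single effective pull-up, and that DQ8, while driven low, pulls the node toward ground through the suspected ~5 kOhm path. The function names and the single-pull-up model are illustrative only.

```python
def divider_voltage(v_rail, r_pullup, r_leak):
    """Voltage at the node between a pull-up to v_rail and a leak to ground."""
    return v_rail * r_leak / (r_pullup + r_leak)


def pullup_for_target(v_rail, v_node, r_leak):
    """Pull-up resistance that would put the node at v_node for a given leak."""
    return r_leak * (v_rail - v_node) / v_node


V_RAIL = 3.3      # volts, 3V3 rail
R_LEAK = 5_000.0  # ohms, suspected AA1-AA2 path (estimate quoted in the log)
V_NODE = 1.6      # volts, approximate low level seen on TP36

# Assumption: one lumped pull-up stands in for whatever really holds PROGRAM_B high.
r_pullup = pullup_for_target(V_RAIL, V_NODE, R_LEAK)
print(f"implied effective pull-up: {r_pullup:.0f} Ohm")
print(f"check: node sits at {divider_voltage(V_RAIL, r_pullup, R_LEAK):.2f} V")
```

Under these assumptions the leak and the pull-up come out about equal, which is why the node would land near half the rail.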
wpwrakwhere it gets a little mysterious is what would provide these 5 kOhm. flux residues ? cooked I/O driver ?07:46
aw_so next steps to measure impedance on bad and good board.07:46
wpwrakmaybe let's measure the impedance test now. TP36 vs. pin 34. both ways, i.e., swap probe +/-.07:47
aw_regards to if flux residues, this 0x77 i see it that is clear,07:47
aw_need to power off now07:48
wpwrakaw_: (0x77 and flux) no chance of something trapped under the FPGA ? :)07:49
wpwraki've never seen flux do as little as 5 kOhm. but then, there's a first time for everything ;-)07:50
aw_wpwrak, it could still have chance under FPGA, yes, but you were right, me too on that a flux can take a 5 k ohm?07:51
wpwrakhow big is C238 now ?07:52
wpwrakmaybe your flux has extra-powerful ions ;-)07:54
aw_10.6 K ohm  from TP36 to pin 54, 118K ohm reversely07:56
aw_sorry that it's pin3407:57
wpwrakhow big is C238 ?07:57
aw_bad that i don't have equipment can measure capacitor now.07:58
wpwrakyou could build an oscillator with a 555 ;-)07:59
wpwrakthen you could _hear_ the capacitance :)07:59
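For what the 555 idea would amount to, a small sketch using the standard astable relation f = 1 / (ln 2 * (R1 + 2*R2) * C). The 1 MOhm resistor values are assumptions picked only to bring a capacitor of a few hundred pF into the audible range; nothing like this was actually built in the log.

```python
import math


def astable_555_frequency(r1_ohm, r2_ohm, c_farad):
    """Oscillation frequency of a 555 in the classic astable configuration."""
    return 1.0 / (math.log(2) * (r1_ohm + 2.0 * r2_ohm) * c_farad)


# Hypothetical timing resistors, chosen for an audible tone with pF-range capacitors.
R1 = 1e6
R2 = 1e6

for c_pf in (100, 220, 470):
    f = astable_555_frequency(R1, R2, c_pf * 1e-12)
    print(f"{c_pf:4d} pF -> {f:7.0f} Hz")
```

The tone drops as the capacitance rises, so an unknown part could be compared by ear against known ones. With values this small, stray capacitance of the wiring would shift the result noticeably.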
aw_i saw an arduino with few parts to do. ;-)07:59
wpwrak10/120 kOhm is on 0x77 ?08:01
wpwraknow, two known to be good boards for comparison08:01
aw_59 / 118 k ohm on 0x4008:04
aw_58 / 129 kohm on 0x7a08:06
wpwraksomething's a little off :)08:08
wpwrakmaybe measure 0x32 too08:08
aw_0x3c: 63 / 131 k ohm08:08
wpwrakso, "normal" = ~60 / ~120-130 kOhm. ~11 / ~120 kOhm of 0x77 is bad.08:09
aw_0x32: 53 / 129 k ohm08:09
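The comparison across boards can be sketched as a simple outlier check against the median. The readings are the ones reported in the log (in kOhm, in the order given); the 50 % tolerance and the function name are arbitrary assumptions for illustration, and the two readings per board are assumed to have been taken with the same probe polarity order.

```python
from statistics import median

# (first reading, reversed-probe reading) in kOhm, TP36 vs. NOR pin 34, as reported.
readings_kohm = {
    "0x77": (10.6, 118.0),
    "0x40": (59.0, 118.0),
    "0x7a": (58.0, 129.0),
    "0x3c": (63.0, 131.0),
    "0x32": (53.0, 129.0),
}


def flag_outliers(readings, tolerance=0.5):
    """Return boards whose reading deviates from the median by more than tolerance."""
    flagged = []
    for column in (0, 1):
        typical = median(pair[column] for pair in readings.values())
        for board, pair in readings.items():
            off_by = abs(pair[column] - typical)
            if off_by > tolerance * typical and board not in flagged:
                flagged.append(board)
    return flagged


print(flag_outliers(readings_kohm))
```

Using the median rather than the mean keeps the single low reading from dragging the "typical" value down.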
wpwrakaw_: do you remember the value of C238 ?08:09
aw_so yes, at least 0x77 is bad08:09
aw_C238 is 220 pF08:10
wpwrakthanks !08:11
aw_also thanks to you !08:12
wpwraksimulation says this: http://downloads.qi-hardware.com/people/werner/m1/tmp/pin34.ps08:13
wpwraka bit over-simplified of course08:13
aw_4KOhm is an equivalent resistor inside that pin34?08:14
wpwrakhmm, 0x32 is about normal. so either it has a separate problem with just the same symptoms or we haven't found the real cause just yet08:15
wpwrak(4 kOhm) yes08:15
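A back-of-the-envelope companion to the simulation, not a reproduction of it: the RC time constant of a 4 kOhm equivalent resistance working against C238 (220 pF). Treating the net as a single first-order RC is an assumption, and the helper name is hypothetical.

```python
import math

R_EQUIV = 4_000.0  # ohms, equivalent resistance used in the simulation
C238 = 220e-12     # farads, value of C238 as stated in the log

tau = R_EQUIV * C238  # first-order RC time constant in seconds


def settle_time(tau_s, fraction_remaining):
    """Time for an RC step response to come within fraction_remaining of its final value."""
    return tau_s * math.log(1.0 / fraction_remaining)


print(f"tau            = {tau * 1e6:.2f} us")
print(f"settle to 1 %  = {settle_time(tau, 0.01) * 1e6:.2f} us")
```

A time constant just under a microsecond is in the range where charge/discharge edges would be visible at 500 ns/div.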
aw_agreed >>> so either it has a separate problem with just the same symptoms or we haven't found the real cause just yet08:16
wpwrakdo you have any cleaning process that has some hope of removing flux or dirt from under the fpga ?08:16
wpwrak(preferably without just moving the dirt/particle to another area of the fpga)08:16
wpwrakwell, but then that probably doesn't make sense08:19
wpwrakhmm. thinking ...08:19
aw_hmm..this really needs more thought, whether there is flux or dirt under the fpga... I've never dealt with this topic.08:19
aw_as you really knew that it's still possible a flux to be as likely huge ( 60 - 10 )= 50 KOhm...to bring resistance down, no big surprising; But if it indeed is. How few boards got similar problem like this. and Won't any other balls under FPGA surrounding Program_B be influenced too? and just only Program_B?08:24
wpwrakwell, this is the corner in which the rework was done08:24
wpwrakAA1 and AA2 are in the second row of balls, very close08:25
aw_That's too weird, 0x3c and 0x77 has been tested successfully on all I/Os though..08:25
aw_yeah...it's close enough to cause this08:26
wpwrakbut ... if it was just flux, it should have the same conductance in both directions08:26
aw_yeah...so that's too weird to say it's a flux problem now. ;-)08:26
wpwrakmaybe heat damage in the FPGA from the rework ?08:26
wpwrakbut then, i'm still not entirely sure whether we're seeing cause or effect here08:27
aw_hmm....don't know exactly . but i knew factory used heat air to blow C238 and R3008:27
aw_so only I go to Xray to find secrets on 0x3c/0x7708:28
wpwrakdo you have an xray session planned ?08:28
aw_yes, no cause surely known now08:28
aw_yes, sure08:28
aw_but I hope do X-ray later08:29
aw_so i think that I go for next reworks on fix2b to accumulate 30pcs of 'avail-fix2b' done08:29
aw_then I think from them, we may get more boards like 0x77 similar too.08:30
wpwrakheh :)08:30
aw_at the end, we go for X-ray to see if any consistence existed inside of this weird problem.08:31
wpwrakah, one more test please: 0x77, correlation of TP36 and pin 17 (A11)08:31
wpwrakthat would be a neighbour of FLASH_RESET_N08:31
wpwrak(the other one is VPEN, which is tied to 3V3)08:32
aw_no instability more.. :(08:33
wpwrakhi murphy ! good to see that you're watching :)08:34
aw_good ...reproduced now08:34
wpwrakah, nice ;-)08:34
wpwraki'm beginning to like 0x77. that's a good board. fails whenever we ask it to. not like 0x3c ;-)08:34
aw_not A11, no correlation08:40
wolfspraulis there anything in 0x77 that we believe we can learn that impacts other boards?08:41
wolfspraulif no - put 0x77 aside. if yes - continue studying it.08:41
wolfspraulwriting off 0x77 is no scary thought to me08:41
wolfsprauldelaying rc3 sales by one day (for example) is a scary thought08:41
wolfspraulso we need to balance between those two...08:41
wolfspraulI'm not following the electrical analysis and logic in detail today, so can just repeat the obvious high-level thinking...08:42
wpwrakaw_: (no correlation) good, thanks08:42
wpwrakwolfspraul: 0x77 is tricky. what's worrying is that 0x77 and 0x3c both show intermittent instability. on 0x77 it's fairly easy to reproduce, on 0x3c not so easy.08:44
wpwrakwolfspraul: so far, we don't have a good explanation of what's going on. 0x77 exhibits one anomaly that could be causally linked, but 0x3c doesn't show this anomaly.08:45
wolfspraullike I said. the key question is "can we learn _anything_ that applies to other boards?"08:45
wolfspraulI understand that question is not easy to answer, but that's what it's all about.08:46
wolfspraulif you are the first who can say "no" to that question, and you are right, that's great value08:46
wolfspraulof course not if you were wrong :-)08:46
wolfspraulxray pile...08:46
wolfspraulglancing over the result today doesn't make me worried about fix2b and our ability to produce 100% pass boards08:47
wpwrakwolfspraul: yes, xray pile sounds best for 0x77 and 0x3c for now08:47
wolfspraulso let's move forward08:47
wpwrakmaybe we'll have new ideas in a while as well. i don't have anything else i'd want to try on 0x77 at the moment. there are a few "destructive" tests (repairable) one could do to 0x77, but i'd save them for as late as possible08:49
aw_wpwrak, so I'll go for fix2b reworks firstly...but if you just think out any possible cause reason/or idea, you ping me.08:50
wpwrakalso because they may just make the problem disappear for the wrong reason. e.g., there may be a feedback loop. if you break it, the instability may vanish, but once you restore the normal functionality, it'll be right back.08:51
wpwrakaw_: yes, sounds good. thanks for all the testing ! we made some good progress into understanding the behaviour of that critter :)08:52
aw_wpwrak, hmm...good reminder that this could be possible as transfer functions: positive feedback or negative one. maybe08:52
aw_so i go reworks firstly. ;-)08:52
wpwrakaw_: the feedback loop i have in mind would be unknown -> PROGRAM_B -> FLASH_RESET_N -> unknown -> ...08:53
wpwrakaw_: we could break the loop by removing D16, but that wouldn't remove the unknown -> PROGRAM_B path. without the feedback, maybe the board will just reset a few times and then boot, so you never notice that something is wrong. but of course, when you bring the flash reset back, also the feedback loop returns.08:54
wolfsprauljust replace 0x77 with 0x78 :-) (joking, joking)08:55
aw_as you knew that system transferring is open and close types. I hope this weird problem is not existed as close type, so you remove D16, will let it acted as open type.08:56
wpwrakwolfspraul: like the airlines do when one of their flights crashes ? ;-)08:56
wpwrakaw_: it would be a bit like treating a broken bone with painkillers ;-)08:58
wpwrakvery efficient symptom treatment, but ... :)08:58
aw_wpwrak, alright. we do such as later. I am poor adam to do reworks firstly though...keep this surely good idea to get some approach later08:58
wpwrakaw_: and maybe get that xray trip scheduled :)08:59
aw_wpwrak, yup...may more later. but will.09:00
wpwrakwolfspraul: worst-case outcome: FPGA shows extensive heat damage on 0x77 (and similar) but also on "good" boards examined for comparison. would be hard to decide what to do in this case.09:01
kristianpauloh, wpwrak have a mm1 as well, nice !11:12
wpwraknot yet ! at the moment, it may be in memphis or maybe a few hours south-southeast of memphis11:25
wpwrakwolfspraul: phrew. just caught up with some really old stuff on #qi-hardware. the RTC thread was scary. pages of supercaps before finally the CR2032 was mentioned ;-)11:45
wpwrakaw_: how's the fix2b on the "good" boards going ? any boards that have gone from "good" to "bad" ?12:19
aw_wpwrak, tonight, wont test just rework. ;-)12:47
wpwrakaw_: a wise decision ;-)12:57
wpwrakaw_: ah, maybe tomorrow morning, you could try board 0x3c. first, see if you can reproduce the problem. and if yes, show TP36 and pin 34 at 10 us/div and at 500 ns/div. that way, we can see if the signal shape is the same or if it's different. also, measure the impedance between TP36 and pin 34 again.12:59
wpwrakaw_: (measure again) e.g., if thermal expansion is part of the equation, the impedance may be "normal" when warm but decrease when cold. i very much hope this isn't the case, but let's be safe and check.13:00
wpwraki'll be afk for a bit13:04
aw_wpwrak, okay...good reminder on thermal equation thing, thanks.13:04
wpwrakalready back :)13:15
wpwraki missed that today is a holiday ... the bastards introduced it just some 3 months ago, so it doesn't show up in any printed calendar13:17
wpwrakaw_: oh, and when you measured the resistance, which side had the "high" voltage ?13:40
aw_resistance between TP36 and pin 34?13:43
aw_it's measured both ways( prober +/-) after powered-off13:46
aw_so i wont be known which side was the high voltage. ;-)13:46
wpwrakwhich side was the red wire ? ;-)13:47
aw_your question about this was strange, hope that i was misunderstood your meaning. ;-)13:47
wpwrakif you have two multimeters, you could even check that the red wire is really the high voltage :)13:48
wpwrakwhat i mean is this: when you do a resistance measurement, the multimeter injects a current. one of the two sides must be high and the other low :)13:48
aw_the 10 / 120 was:13:48
aw_10 KOhm measured was red (high) on TP36, so 120 KOhm was red on pin 3413:50
aw_phew~ just understood your question though. he ;-)13:51
wpwrakkewl, thanks ! that would even make sense13:51
aw_oh, yup13:53
rohwpwrak: wouldnt it be helpful if you would be in taipei now?14:14
wpwraknot sure. if i had my lab with me as well, yes :)14:15
rohwpwrak: heh.. i see.. so we don't have a lab in tpe?15:54
wpwrakroh: well, adam's home lab. TDS1012, etc.17:02
Fallenoulekernel_: I've seen you merged a few commits from rtems cvs head into mmstaging 5 days ago, but you didn't merge the changes to cpukit/zlib/zconf.h.in , is it on purpose that you stay with the v1.1 instead of the v1.1.1.2 ?19:22
lekernel_maybe that's some git-cvs bug19:23
lekernel_that would explain the problems19:24
Fallenouyes maybe19:24
wpwraklekernel_: hey, good to see you here ! :) look what new tricks your M1 has learned:19:34
wpwraklekernel_: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x77_ch1-TP36_ch2-TP1.JPG19:35
wpwrakCH1 is TP36 / PROGRAM_B, CH2 is the 3.3 V supply (we were looking for power supply glitches - without finding any)19:35
lekernel_yeah I saw that thing already19:36
wpwrakthat's when the board should be loading its standby bitstream. instead, it's having fun with "interesting" signals on PROGRAM_B19:36
wpwrakah, all of it ?19:36
wpwrakalso this ? http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x77_ch1-TP36_ch2-NOR-pin34-DQ8_500ns.JPG.JPG19:36
lekernel_well, that one scope picture19:36
lekernel_no, not the second19:36
lekernel_what's that?19:36
wpwrakthat's CH1 on TP36, as before, CH2 on NOR DQ819:37
wpwrakhere a bit zoomed out: http://downloads.qi-hardware.com/people/adam/m1/pic/rc3_0x77_ch1-TP36_ch2-NOR-pin34-DQ8.JPG19:37
lekernel_I don't see anything worrisome about DQ8 on those pictures, but TP36 is pure crap19:38
wpwraknow, you may wonder what on earth PROGRAM_B and NOR DQ8 have in common. well, PROGRAM_B is on ball AA1, and NOR DQ8 is on ball AA2.19:38
lekernel_solder bridge?19:38
wpwrakthe 500 ns pic shows that DQ8 seems to push PROGRAM_B19:38
wpwrakadam measured and found ~10 kOhm in one direction, ~120 kOhm in the other19:39
wpwrakso there's a diode somewhere in the mix19:39
wpwrakjoerg thinks it's a fried FPGA. ESD or such.19:40
wpwraki was wondering if you had any other interpretation of this mess, or any ideas what to try19:40
lekernel_you could be simply measuring some CMOS protection diode to VCC through some pull-up resistor19:41
wpwrakto have another board with similar symptoms but DQ8-PROGRAM_B "resistance" normal (about 60/120 kOhm, like on several "good" boards)19:41
wpwrakyes, we must be measuring something like this. what's remarkable is that all the other boards are around 60/120 kOhm, while this one is about 10/120 kOhm19:42
wpwrakthe measurement itself is very dirty, because it's not anything ohmic we're measuring there. so the value also depends on adam's instrument, etc.19:43
lekernel_maybe you could show that to a xilinx fae?19:43
wpwrakbut it seems the measurement is repeatable among several "good" boards. and the one with that rather interesting correlation between PROGRAM_B and DQ8 happens to be different.19:43
lekernel_'interesting'... hm :-)19:44
wpwrakmaybe you can try that. i have no idea about their support :)19:44
lekernel_I would rather call it pesky, annoying, time-wasting, and other such adjectives :-)19:45
lekernel_well, debugging those things is basically their job19:45
lekernel_they do that all the time19:45
lekernel_and they're often good at it19:45
wpwraki'm actually less worried about "saving" this board, but about finding a reliable test that tells us if something is amiss there. because we have another one with similar symptoms on TP36, but where DQ8 measures normally. we don't know yet if DQ8 and TP36 appear connected there, too, or if it's maybe another pair of pins that join forces19:46
lekernel_could it be some problem in the PCB substrate? or regular capacitive/inductive crosstalk?19:46
lekernel_seems unlikely because you're seeing a diode...19:46
wpwrakplus, we have some more boards with strange effects on the NOR signals. of course, if the FPGA's I/O pads in that area are damaged, that could explain a lot of strange effects. but i wouldn't jump to conclusions just yet. maybe it's a completely unrelated problem.19:47
lekernel_unless Murphy created some FR4-based semiconductor ofc :-)19:47
wpwrakat first, i suspected flux. then we found the diode ;-)19:47
wpwrakDQ8 was also more of a lucky discovery. we found it by examining traces adjacent to PROGRAM_B or FLASH_RESET_N (the latter is of course affected as well, so we may even see a feedback loop. of course, if we were to break the feedback by removing D16, we wouldn't solve the underlying problem.)19:52
wpwrakthat is, if it's really DQ8 affecting PROGRAM_B. the signal shape would agree with this theory. there could of course be another signal that just looks the same, and DQ8 simply happens to have the same pattern.19:53
wpwrakalso, DQ8 isn't all that pretty. there are some interesting little runts in the 500 ns picture. note that, according to adam, OE# is held low throughout all this. so DQ8 should be driven all the time.19:55
wpwrakah, and to make it all more interesting: the boards affected by this don't always show this "noise" on PROGRAM_B. 0x77 does it almost always, but 0x3c is much less eager. it seems that, with increasing temperature, the probability of this occurring drops in 0x3c.19:57
wpwrakas in: in the morning. adam just had it happen quite often, but later on, he booted into standby and sometimes even further about 15 times, without a single such anomaly19:58
wpwraklekernel_: anyway, so you haven't come across any xilinx errata saying that PROGRAM_B can become an output with weird signals ? or anything else that would explain the madness ? (besides the hypothesis that the chip is indeed damaged)21:10
rohwpwrak: are there more than 1 board with that behaviour?21:56
wpwrakroh: two known cases so far with the same weird pattern on TP36/PROGRAM_B22:11
wpwrakroh: we haven't analyzed the 2nd one far enough to know if there's also a correlation between NOR.DQ8 and FPGA.PROGRAM_B, though. one problem with this board is also that it is a lot more reluctant to exhibit the problem.22:12
lekernel_no, this is completely unexpected22:14
lekernel_are you sure this comes from the fpga? it might be the reset IC too...22:14
wpwrakroh: it's probably upset that we caught it. presumably, it was hoping for the opportunity to embarrass the VJ at some great festival22:15
lekernel_also, maybe it works this way: PROGRAM_B is (wrongly) pulsed, FPGA deconfigures itself, and reads some memory address that happens to drive DQ8 high22:15
wpwraklekernel_: i've refrained from rework so far. but ... it seems odd for the reset chip. 1) the input voltage is perfectly stable. 2) the voltage jumps between ~1.7 V and 3.3 V. 3) the correlation with DQ8 would make even less sense then. so for now, i don't suspect the reset ic. but when we enter the "remove components" phase, then this would be the first component to go.22:17
wpwraklekernel_: (PROGRAM_B -> DQ8) not impossible, but the timing seems to fit a bit too well. e.g., why would PROGRAM_B drop just when DQ8 drops ?22:19
wpwraklekernel_: btw, can you tell me what signals connect to the balls around AA1 ? i.e., AB1-2, we already know AA2, and Z1-2 ? faster if you just click on them in altium than me visually searching the PDF :)23:39
wpwrakerr, make that Y1-2. there's no Z row, sorry23:41
wpwraki found AB2 = FLASH_D9, Y2 = SDRAM_DQ0, and Y1 = FPGA_VREF23:51
wpwrakah, and AB1 is ground23:53
wpwrakso potential candidates would be NOR.DQ9 = pin 36 and U14/U15 DQ0 = pin 223:56
wpwrakif the supposed damage spreads wider, then we would have USBB on AB3, Y3. there's at least one board with an unexplained USB-B failure. but this is a little thin evidence this far.23:59
--- Tue Aug 23 201100:00
