#milkymist IRC log for Thursday, 2011-07-28

GitHub9[openwrt-milkymist] larsclausen pushed 4 new commits to master: https://github.com/milkymist/openwrt-milkymist/compare/b0ed756...5a0510500:22
GitHub9[openwrt-milkymist/master] Revert "Clone git repositories with '--depth 1'" - Lars-Peter Clausen00:22
GitHub9[openwrt-milkymist/master] [package] busybox: Properly pass CFLAGS etc. - Lars-Peter Clausen00:22
GitHub9[openwrt-milkymist/master] lm32: Select ext2 by default instead of ramdisk - Lars-Peter Clausen00:22
larscmwalle: not the most recent version, but it did work before your device tree patch00:34
larsckristianpaul: i'm afraid i don't have it anymore00:34
wpwrakdeleting config files is like burning books ;-)00:38
kristianpaulhehe i knew this reply was coming, and i must confess i did it on purpose ;-)01:40
qi-botkristianpaul speaking too soon is good :)02:34
wolfspraulkristianpaul: who is writing from the qi-bot console?02:37
wolfspraulah ok. spooky :-)02:38
kristianpaulhehe i was scared for a moment :-)02:38
wolfspraulxiangfu: yes don't do that too much, it may confuse people...02:38
xiangfuwolfspraul, sure. ok.02:39
kristianpaulor identify first ;)02:39
xiangfuthe auto build will start in the next 10 hours, after the nanonote build has finished. then we will see if we got some images.02:41
xiangfuthere are some folder names that still need to be confirmed (like bin/milkymist/***). for now I just guess them. after the first build I will have the correct names.02:42
GitHub95[autotest-m1] xiangfu pushed 2 new commits to master: https://github.com/milkymist/autotest-m1/compare/7822c25...f2c518203:40
GitHub95[autotest-m1/master] add empty tests_images.c - Xiangfu Liu03:40
GitHub95[autotest-m1/master] cleanup the Makefile - Xiangfu Liu03:40
wpwrak(m1 rework) seems that R157 rework is a bit unreliable06:22
wpwrakaw: did you change those R157 etc. because things didn't work with 4.7 k ? or simply because you measured they were wrong ?06:23
awI made sure all R157 now is 10K. ;-)06:25
awsome of them didn't replace 10K in factory, but now they are 10k yes after my first "Impedance" step.06:26
awfactory missed some R157 though. bad!06:26
wpwrakthese R157 they had missed, did they cause tests to fail ?06:27
awwpwrak, no, so far now. I caught R157 without replaced 10K while my step of "Impedance" stage.06:33
wpwrakgood. it would have been worrisome if that relatively small change had already caused malfunctions06:34
wpwrak(i.e., we never experimentally established the range of permissible values. so any sign of parameter instability would be bad.)06:35
awcurrent those R157 missed boards are independently to failed obviously. that's what i saw data now. ;-)06:35
wpwrakvery good :)06:35
awbut i did have some boards with d2/d3 dimly lit after the reflash finished.06:36
awwell...i keep testing firstly ...no more chats now. ;-)06:37
awsurely any new idea from you, let me know. ;-)06:37
wpwrak(testing) good luck ! may the yield be with you ! :)06:38
GitHub95[extras-m1] yizhangsh pushed 1 new commit to master: https://github.com/milkymist/extras-m1/commit/1b6e25f5fc69f85be1ec54dbc2d7ab5615882b1d06:55
GitHub95[extras-m1/master] removed white background in graphics - Yi Zhang06:55
GitHub128[extras-m1] yizhangsh pushed 1 new commit to master: https://github.com/milkymist/extras-m1/commit/59e8a8208d42beada5ae036c1fea8e23abb6699107:48
GitHub128[extras-m1/master] modified m1 graphic - Yi Zhang07:48
GitHub135[linux-milkymist] larsclausen pushed 2 new commits to master: https://github.com/milkymist/linux-milkymist/compare/e94ece6...28b907d08:36
GitHub135[linux-milkymist/master] lm32: Fix led gpio pin numbers - Lars-Peter Clausen08:36
GitHub135[linux-milkymist/master] Merge remote branch 'lm32/master' into lm32 - Lars-Peter Clausen08:36
awwolfspraul, the file I can upload it as: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/m1_rc3_test_report.ods10:06
awwolfspraul, but now my Firefox in our wiki can't show the newest file i uploaded, after I restart Firefox, still the same, what else could cause this?10:07
wolfspraulI'll check10:08
Action: kristianpaul noticed the "No vga screen" comment, kinda often10:21
wolfspraulyes, we will track it down10:21
awnote that 0x7A is interesting: 1. d2/d3 OFF after a successful reflash, but after power up d2/d3 is dimly lit and then it can't be reflashed10:31
aw2. http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/7A-reflash-results10:31
aw3. after a couple of minutes it can power on/boot up/render (only D3 is ON, no VGA screen)10:32
wolfspraulhmm. I think focus on fully testing all boards first.10:32
awwell..i keep testing10:32
wolfspraulthen we need to start fixing, hopefully a lot more boards can be turned to 100% pass status then10:33
wolfspraulseems something is not right with vga on quite a few boards...10:33
AlarmWhen I run "lm32-rtems4.11-objcopy -Obinary hello hello.bin"10:34
AlarmI have the following error:lm32-rtems4.11-objcopy:hello: File format not recognized10:34
lekernelAlarm, your "hello" file is most probably not OK10:38
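[Editor's sketch] One way to check whether a "hello" file is plausibly OK before handing it to objcopy is to look at its ELF header. The Python snippet below is only an illustration and not part of the Milkymist or RTEMS tooling. The function name `describe_elf` is hypothetical. It inspects the magic bytes, the byte-order field, and the `e_machine` field of the first 20 bytes of a file. A file that passes this check can still be rejected by objcopy for other reasons.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def describe_elf(header: bytes) -> str:
    """Return a short diagnosis for the first bytes of a file (hypothetical helper)."""
    if len(header) < 20:
        return "too short to be an ELF file"
    if header[:4] != ELF_MAGIC:
        return "not an ELF file (bad magic)"
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = {1: "<", 2: ">"}.get(header[5])
    if endian is None:
        return "ELF magic present but invalid byte-order field"
    # e_machine is a 16-bit field at byte offset 18 of the ELF header.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return f"ELF file, e_machine={machine}"


if __name__ == "__main__":
    import sys

    # Usage: python check_elf.py hello
    with open(sys.argv[1], "rb") as f:
        print(describe_elf(f.read(20)))
```

If the file turns out not to be an ELF file at all, the build step that produced it is the first place to look.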
lekernelwolfspraul, I'd check the SOT23 gates which are used for buffering the sync signals (and cause signal detection pass/fail on the monitor). there was a failed one already on the MIDI of one board.10:39
lekerneleach batch, new broken components. after IR sensors and beads, now gates...10:40
wolfspraulthat's why you do testing10:40
wolfspraulwhat leaves Taipei is 100% working10:40
wolfspraulor the test routine is not good enough yet :-)10:41
wolfspraulanyway, one by one. first test all boards. looks messy now, but it will clear up eventually :-)10:42
wolfspraulit has to...10:42
wolfspraullekernel: btw, you cannot just say "broken components", the reality is in many cases you don't know what happened.10:42
wolfspraulbut it doesn't matter as long as our testing is rock solid10:42
lekernelIR and beads definitely were broken10:42
wolfspraulyou mean the 1.40 USD beads?10:43
lekernelyes, and the 6 IR sensors on the run1 boards. none of them worked.10:43
wolfspraulok probably we mean different things with 'broken components'10:43
wolfspraulthose beads were most likely not 'broken'10:44
lekernelfor me, more than one order of magnitude out of specs qualifies as "broken"10:44
wolfsprauland if all 6 IR sensors on the first run of 6 boards don't work, that's a quite strong indication that they are not all six 'broken' either (in my use of the word)10:44
wolfspraulthey are the wrong ones maybe10:45
wolfspraulit's just different meanings of 'broken'10:45
wolfspraulso far we have 9 or 10 100% pass boards, it's going up :-)10:45
lekerneleither way, those boards were assembled with components that did not perform as specified10:46
lekernelomg there are 1557 performances at this belarusian festival10:49
lekernelok, we should print tons of brochures :)10:54
GitHub142[milkymist] sbourdeauducq created eack (+1 new commit): https://github.com/milkymist/milkymist/compare/e6356d1^...e6356d112:10
GitHub142[milkymist/eack] TMU: early ack - Sebastien Bourdeauducq12:10
AlarmHere it is my "Hello world" appears on the console on my PC but not on the screen connected to the M113:10
lekernelthere's no screen console with rtems13:10
Alarmso this is normal always goes well!13:13
awi tried to use BEN's original usb cable (the shorter one) instead of the current longer Fukang upward 90 degree USB cable (for jtag/serial, 1.5M) to reflash the failed boards that had d2/d3 dimly lit after reflashing. THEN NO d2/d3 dimly lit.13:23
wolfspraulinteresting discovery!13:24
wolfspraulall sorts of things you see when you work with a lot of boards in sequence, no? :-)13:24
wolfspraulaw: in your milkymist reflash script, there is a line "frequency 6000000" somewhere13:25
wolfsprauldo you see that?13:25
awyes, i saw it. moment13:26
wolfspraulcan you try to reduce that to a lower value, and still use the longer 1.5m cable?13:26
wolfspraulwhich value should we try?13:26
awsee http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/48-reflash-results13:26
wolfspraulmaybe try 3000000 first13:26
wolfspraulis the problem very reproducible with the longer USB cable (on those boards)?13:26
awwait...let me see script13:27
wolfspraulI would prefer if you don't switch to the shorter cable now, at least not yet.13:27
wolfspraulthe reason is that we include the 1.5m cable in the box, and we are just hiding the problem from our eyes, and pushing it to our users.13:27
wolfspraullet's try lower values and see what happens13:28
awalright...let me change the script to 3000000 and still use longer 1.5M cable.13:29
wolfspraulcan you reproduce the problem well?13:29
awnot sure...but i can only take those failures ones to reflash them again. let's me try 0x48 again. then try the old failure one. ;-)13:32
wolfspraulyou can also go lower, 1000000, even less, 50000013:32
wolfspraulit may become very slow though ;-)13:32
wolfspraulbut we need to find out what is a safe value with the cable we include in the box13:33
wolfspraulotherwise our users will run into this and suffer much worse than we suffer now in finding a safe value13:33
wolfspraulI'm not even sure this is the right idea with the frequency setting, but maybe it is...13:34
aw0x48 became d2/d3 is fully OFF when power on, well..i am now going to reflash @3000000 to see it if will reproduce it again. maybe not, don't know.13:35
lekerneldon't bother with that... those JTAG adapters will and should remain mostly unused anyway13:36
lekernelthey're just a _developer_ thing, and developers should be able to handle JTAG frequency problems13:36
wolfspraulno not good. I like to understand what I sell, and to know that it works and how it works.13:37
lekerneljust get those boards flashed and working in as little time as possible13:37
wolfspraulmaybe we should default to a lower frequency value in the reflash script we publish, and then developers who understand things well can increase that value13:38
lekernelno because it's additional delays on us to determine that frequency13:38
wolfspraulargh :-)13:39
wolfspraulyou are quite insisting sometimes to cause you a big headache later :-) I don't want to support devs who run into this type of problem first time they fiddle with jtag...13:39
wolfspraultook me 2 hours already and some additional grey hair to narrowly avoid that rejon had no usable m1 at all for his talk...13:40
wolfspraulso if Adam tells me he has a _workaround_ for himself, that's not good enough for me as a manufacturer13:40
lekernelthis should not happen with the current software13:40
wolfspraulyou mean the web update?13:40
lekernelthere is 1) web update for the main images 2) rescue mode in case of problems13:41
wolfspraulso - I always see the positive side. I think Adam's discovery is great, very good observation!13:41
lekernelall the rest is developer (1/1000 users) and unsupported13:41
wolfspraulat least we have a bar, like a test, developers have to pass :-)13:42
awdone...seems 0x48 is good by 1.5M with 3000000 Hz, btw, from now on, let's use this to reflash rest to see if easily happens d2/d3 dimly lit after reflash. ;-)13:42
wolfspraulaw: ok, let's do this13:42
lekernelthe JTAG cable is only for _FPGA_ development. you can use netboot for all software.13:42
wolfspraulplease continue to use the cable we will include13:42
lekernelif you can't fix a JTAG connection, you probably can't program FPGAs either13:42
wolfspraullower the value to 3000000 for now and let's see whether you run into more cases13:42
wolfspraulif this looks more stable to you, we will change the value in the published script13:43
wolfspraulI rather err on the side of robustness, especially out-of-the-box.13:43
lekernelalso, this lower value makes flashing boards slower on our side. if they can be flashed correctly at 6MHz with another USB cable, just do it.13:44
wolfspraulbut that's a separate reason.13:45
lekernelright now, the major blocker in this project is run3 delays (followed shortly by lack of publicity). it'd rather make sense to optimize those rather than track down a rare and developer-only JTAG problem.13:51
wpwrakaw, wolfspraul: (cable testing methodology) first, it would be good to do, say, 10 tests with the long cable and the original value. otherwise you don't even know what failure probability to look for.13:54
wpwrakwolfspraul: (grey hair) have you started to let it grow ? :)13:55
awwpwrak, good reminds, I test now to see results 10 times individually. ;-)13:56
wpwraklekernel: (1557) i was wondering what leet "ISST" was supposed to mean ;-) btw, they're at 1558 now. are you going there ?14:01
wolfspraullekernel: come on. we have to lower the barriers of entry. seriously there is no 'delay' because of this. the biggest delay is that Adam ran into this in the first place because we have been sloppy about taking reflashing issues seriously _before_ the run.14:02
wpwrakroh: on some pictures the S looks a bit strange. does it also look odd in real life ?14:02
rohnot really14:03
wolfspraulit's a minor hiccup, and a good discovery from Adam14:03
wolfspraullet's get back to professional and fast work now, no worries14:03
wolfspraulI do not want Adam to silently switch to a lab-only workaround, and put a cable into the box that he saw problems with but didn't tell anybody.14:03
wolfspraulour users/developers/whoever _WILL_ run into these issues14:03
rohthe first 2 pix have the protective foil on the squares not removed14:04
wolfspraullekernel: http://yamato.hyte.de/tmp/logotest/14:04
wpwrakwolfspraul: yeah, shipping known to be broken stuff is generally not a good idea. the least you can do is find out if lowering the frequency is a suitable workaround. say, if you have 50% failure at 6 MHz and 0% at 3 MHz, that's a good indication.14:05
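[Editor's sketch] The point about knowing "what failure probability to look for" can be made concrete with a little arithmetic. If each reflash fails independently with probability p, the chance that n attempts in a row all pass is (1 - p)^n. The Python snippet below is an illustration under that independence assumption. The function names are hypothetical and no numbers in it come from the actual test data.

```python
import math


def prob_all_pass(failure_rate: float, trials: int) -> float:
    """Probability that `trials` independent attempts all pass."""
    return (1.0 - failure_rate) ** trials


def trials_for_confidence(failure_rate: float, confidence: float) -> int:
    """Smallest number of attempts that shows at least one failure
    with probability >= `confidence`, given the per-attempt failure rate."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # With a 50% failure rate, ten clean runs in a row would be very unlikely.
    print(prob_all_pass(0.5, 10))
    # A rarer failure needs many more attempts before a clean run means much.
    print(trials_for_confidence(0.1, 0.95))
```

This is why a baseline run at the original setting matters: ten passes at 3 MHz only say something if the 6 MHz failure rate is known to be high.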
wpwrakroh: aah ! that's why :)14:05
wolfspraulwpwrak: in a perfect world, the software would make more checks and automatically fall back to the highest 'safe' frequency. but meanwhile I do need to ship with a robust baseline.14:05
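[Editor's sketch] The "fall back to the highest safe frequency" idea could look roughly like the Python below. This is not how the reflash script or urjtag works. `flash_ok` is a hypothetical callback that would run one reflash-and-verify pass at the given JTAG frequency and report success. The candidate list and the number of repeats are assumptions as well.

```python
from typing import Callable, Iterable, Optional


def highest_safe_frequency(
    candidates_hz: Iterable[int],
    flash_ok: Callable[[int], bool],
    repeats: int = 3,
) -> Optional[int]:
    """Try candidate frequencies from highest to lowest and return the first
    one at which `repeats` consecutive attempts all succeed, or None."""
    for freq in sorted(set(candidates_hz), reverse=True):
        # all() stops at the first failed attempt for this frequency.
        if all(flash_ok(freq) for _ in range(repeats)):
            return freq
    return None


if __name__ == "__main__":
    # Stand-in for real hardware: pretend anything above 3 MHz fails.
    def fake_flash(freq: int) -> bool:
        return freq <= 3_000_000

    print(highest_safe_frequency([6_000_000, 3_000_000, 1_000_000, 500_000], fake_flash))
```

A small number of repeats only guards against frequent failures, so a rare fault could still slip through such a check.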
wpwrakroh: it still sticks out a little on SANY0028.jpg, but not as much as on the previous ones14:06
lekernelsince I was against shipping this JTAG stuff in each box, I'm a bit annoyed that it incurs delays now. but well...14:06
wolfspraulroh: looks nice to me. what about size and location?14:06
lekernelexcellent logo though :)14:07
rohwolfspraul: location: i would simply center it14:07
wolfspraullekernel: are you ok with this logo, centered? which size?14:07
lekernelhow is this done? http://yamato.hyte.de/tmp/logotest/SANY0021.jpg14:07
wolfspraullekernel: no you need to look at the later ones14:07
lekernelthe insides are de-polished?14:07
wolfspraulthat one is misleading14:07
wolfspraulno it has the film on it still :-)14:07
lekernelah, that's just the film14:07
wolfspraulif I understand things correctly14:07
wolfspraullook at the last 214:08
lekernelyeah it's perfect :)14:08
wolfspraulroh wants to skip a full surface scan this time14:08
wolfspraulroh: there you go, that's the green-light :-)14:08
wolfspraulthanks a lot, this is great!!!14:08
wolfspraulwonderful that we got it this far...14:08
awhmm...seems it is hard to recover once that failure has happened. the reflash just stopped at the 5th attempt with 1.5M & 3MHz.14:08
wolfspraulmaybe you will also see it with the shorter cable, if you try often enough14:09
awwhen it stops, it stays at "Bitstream length: 1484404"14:09
lekernelaw, this looks like the libusb problem that Jon and I had14:09
wolfspraulI'm a little worried that we don't have full CRC all the time, as per my last understanding at least.14:09
wpwrakaw: maybe it's not the FTDI data speed but a USB signal integrity issue14:09
wpwraklekernel: or software. always good if you have software to blame ;-)14:10
awhmm..i felt using 0x48 to test 10 times is not good idea though14:10
wolfspraulwait so we all settled on the logo, right?14:10
wolfspraulseems yes :-)14:10
lekernelyeah, logo is perfect14:10
awi should use a good board to test cable14:10
lekernelgo ahead14:10
rohlekernel: needed to rework the stuff i got from jon.. somehow it wasnt squares etc14:10
rohwolfspraul: ok. will hack up a centering rig now ;)14:11
wolfspraulaw: wait, let's not stray away too far now.14:12
wpwrakroh: on SANY0029.jpg, are there still remains of the film in the grooves ? or why do they look rough ?14:12
rohwpwrak: i guess so.14:12
wolfspraulaw: don't do tests with many different frequencies and cables.14:12
wolfspraulnot worth it14:12
aware you sure?14:13
wolfspraulyes. there are too many combinations and it will create little value. we've been there before, and haven't implemented anything more robust yet.14:13
kristianpaulnice logo !!14:13
wolfspraulif you try with 90 boards it will add more harm than good.14:13
wolfspraulaw: before you tried to reflash 0x48 with the shorter cable, how many times did you try with the 1.5m cable?14:14
awtwo times with 1.5m & 6MHz, then i just used the shorter usb one and there was no d2/d3 dimly lit. is the difference clear enough?14:15
awthe shorter usb one I still used 6MHz14:16
wolfspraulI reluctantly force myself to agree with lekernel :-)14:17
wolfspraulaw: that means: 1) use the shorter one @6mhz for all reflashing now14:18
wolfspraul2) we still include the 1.5m cable in the box and hope that we can later fix this issue in software14:18
wolfspraulwhat does everyone think?14:19
wolfspraulcheap Chinese crap manufacturer cutting corners? :-)14:19
kristianpaulyeah, i guess if a developer had issues will join here, and we tell the history :-)14:19
wpwrakwolfspraul: how about doing a proper test but postponing it until a less busy time ? (hoping such a time will come :)14:19
lekernelwpwrak, +114:19
wolfspraulthe problem is that there may be too many actual root causes now, and Adam is in a tough spot with 90 boards around him and he is focusing on manufacturing yield, i.e. producing as many 100% pass boards as possible, in the least amount of time14:20
lekerneland yes, include the cable14:20
kristianpaulwpwrak: after 1.5m usb cables shipped?14:20
wolfspraulwpwrak: yes correct. same idea different wording.14:20
kristianpaulwell as soon as orders come14:20
wolfspraulAdam is not in the right position now with so many boards around him and yield pressure.14:20
wolfspraulI don't want him to get lost in an ocean of cable length & frequency test data now...14:21
kristianpaulyeah, thats messy14:21
wolfspraulaw: did you understand? we all agree now :-)14:21
wolfspraulit's easy: use the short cable for reflashing now, and include the long one in the box later :-)14:21
awwolfspraul, i am reading your all discussions now and thinking.14:21
wpwrakwolfspraul: in general, if you find this sort of issue, you want to understand them. otherwise, you're quickly juggling too many unknowns. but if you have a procedure that always works, even if very different from the regular procedure, then you can defer solving the issue14:21
wpwrakkristianpaul: (after cable shipped) preferably not ;)14:22
wolfspraulwell you've been with rejon, you've seen the issue before...14:22
wpwraki'm not sure what i saw ;-)14:22
wolfspraulsomeone just needs to sit down and spend serious time on it, with the many priorities we have that's not going to happen easily14:22
kristianpaulwhen adam has a little time later, providing info about the libusb version would be nice14:22
wolfspraulso someone has to test with different cables, different frequencies, find the root cause, make the software more robust probably in multiple ways, etc. etc.14:23
wolfspraulbut that's not a good thing for Adam to take on now14:23
wolfspraulnot at all14:23
wpwrakat the moment, it seems that we have three theories: 1) it's data frequency dependent, 2) it's USB signal integrity, 3) it's libusb14:23
kristianpauland you miss the hardware!14:24
kristianpaulwell, at least usb cable it self is OK14:24
kristianpaulnow i think i understand lekernel's love for USB ;)14:24
wpwrakassuming 3) is a clear bug (and not a case of "uh, this random number seems to be luckier than the previous one"), then 3) should be checked first. then, try the long cable at 6 MHz. if the problem persists, try a lower frequency, 3 MHz or maybe even 1 MHz (assuming there are no known timing constraints on the lower end)14:25
lekernel3 is a clear bug14:26
lekernelI always failed to reflash the board correctly with the new libusb, like rejon did14:26
lekernela complete reflash always failed14:26
wpwrakif the long cable still fails at 1 MHz, then it could be either the cable, the PC, or the JTAG board. if the long cable works perfectly at 1 MHz, then you still don't know what exactly is the problem, but you have a very promising work-around.14:27
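[Editor's sketch] The test order described here (libusb first, then the long cable at 6 MHz, then a lower frequency) forms a small decision tree. The Python below only restates that order. The function name and the use of `None` for "not tested yet" are illustrative assumptions.

```python
from typing import Optional


def next_step(
    libusb_ok: Optional[bool],
    long_cable_6mhz_ok: Optional[bool],
    long_cable_1mhz_ok: Optional[bool],
) -> str:
    """Return the next action or conclusion; None means 'not tested yet'."""
    if libusb_ok is None:
        return "check the libusb version first"
    if not libusb_ok:
        return "fix libusb, then retest"
    if long_cable_6mhz_ok is None:
        return "test the long cable at 6 MHz"
    if long_cable_6mhz_ok:
        return "problem not reproduced with the long cable at 6 MHz"
    if long_cable_1mhz_ok is None:
        return "lower the frequency (3 MHz, then 1 MHz) and retest"
    if long_cable_1mhz_ok:
        return "root cause unknown, but lower frequency is a promising work-around"
    return "suspect the cable, the PC, or the JTAG board"


if __name__ == "__main__":
    print(next_step(True, False, None))
```

Each branch narrows the set of suspects, which is the point of splitting the tests this way instead of trying every cable and frequency combination.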
wpwraklekernel: oh, so it's a regression. that's bad.14:27
kristianpaulor may be the bug is in urjtag..14:28
kristianpaulfor not following last libusb changes :-)14:28
wpwraklekernel: is that libusb 0.1 vs. 1.0 ? or something within each line ?14:28
wolfspraulI don't think it's a cable issue14:28
wolfspraulguess of course14:28
wpwrakwolfspraul: think or fervently hope ? ;-)14:28
lekernelI don't know14:28
wolfspraulso for me Adam can bypass it now, get the boards reflashed with any cable that works, and still throw the 1.5m one into the box...14:29
lekernelI just downgraded both. that problem had used enough of my time already.14:29
wolfsprauljust guess14:29
wolfspraulat some point I agree with lekernel about the importance of focus, so... bypass, throw cable into box, move forward, hope that things will get better over time :-)14:29
wolfspraulalso we need to keep in mind that the USB cable itself comes from a very respected vendor, has already undergone testing by that vendor, etc.14:30
wolfspraulit's not a 'cheap crap' cable sourced at a street corner in Shenzhen14:30
kristianpauland force users to downgrade libs :-)14:30
wpwrakkristianpaul: not a good idea :)14:31
kristianpaulwpwrak: sure not :-)14:31
wpwrakwolfspraul: could be just an issue on the JTAG side. bad impedance match or such. the thing is high-speed, not just full-speed, isn't it ?14:31
wolfspraulhigh-speed yes14:32
wpwrak(JTAG side) i mean the board14:32
wpwrakthen i can offer a 4th parameter: downgrade to full-speed ;-)14:32
wpwrakif you have poor but not hopeless signal integrity at high-speed, going to full-speed is pretty much guaranteed to solve this ;-)14:34
kristianpauloh, dear..14:35
wpwrak(not sure how you'd accomplish the downgrade, though. change a bit in the FTDI's EEPROM ?)14:35
wpwrakkristianpaul: USB is great fun :)14:35
awwell...i continue to test with shorter usb cable & 6MHz. :)14:36
kristianpaulwpwrak: not just USB too many variables here, as why in some boards it worked well and other dont..14:38
wpwrakkristianpaul: and you wouldn't believe what correct USB signals look like when you measure them along the path. USB is designed to take in account reflections to compensate for other transmission effects.14:39
kristianpaulwpwrak: also that dimly lit sounds like leaking power issue for me still14:40
kristianpaulwpwrak: (compensate), smart way to avoid bugs :-) and create more fun as you said :)14:40
wpwrak(too many variables) oh, that's why you make a tree :) think of potential causes, then split your tests such that they tell you something useful. branch at each test.14:41
wpwrak(compensate) oh, the electrical side is perfectly sound. it's just extremely confusing until you understand what's going on :)14:41
wpwrak(usb signal) lemme see if i still have my simulation from the happy ghost chase in HXD8 ...14:42
wpwrak(dimly lit) yeah, don't know what that means. only that adam doesn't seem to like it :)14:42
kristianpaulno body :-|14:43
wpwrakkristianpaul: http://downloads.qi-hardware.com/people/werner/tmp/usb-signal-sim.ps14:51
wpwrakkristianpaul: in real life it actually looks worse14:51
wpwrakkristianpaul: the signal travels from the right to the left. you start with a clean square. at the end you have a bit of overshoot but still good edges. in the middle, you have something a lot scarier ...14:52
wpwrakkristianpaul: in HXD8, we ran into USB stability issues. well, rather, they had already been an old issue in HXD8 when i ran into that project. the hardware folks were quite convinced they had done everything right. so this was presented to me as a software problem.14:54
wpwrakkristianpaul: so i spent a few days sifting through the kernel. i found a couple of small things, but nothing that really looked as if it had enough potential to cause trouble. (the trouble was that ethernet-over-usb would stall after some time, often around 10-30 minutes)14:56
wpwrakkristianpaul: then we thought of examining signal integrity. the problem: where to find the equipment to do this ? well, at FIC, there was one lab where they had a big scope with the USB test software. that was so exclusive that you had to ask for turns. so we got our turn the next day and walked down with our troubled board.14:58
wpwrakkristianpaul: the expert then hooked the board up and showed us the eye diagram (that's a setting where you trigger on both edges of the signal, so you see a pattern that looks like a hexagon)15:00
wpwrakkristianpaul: the eye diagram looked HORRIBLE. not at all like a hexagon. instead, we saw the signal crawl up to a plateau at about half the level, stay there for a bit, then go up some more, etc. basically what you see in the middle of the simulation.15:01
wpwrakkristianpaul: so we said our thanks and went to work on that signal integrity. countless reworks later, we had something like 100 pF of extra capacitance scattered all over the board, the signals looked a bit "cleaner" on the scope ... and the problems were just as bad as before15:03
wpwrakkristianpaul: while the hw team was doing reworks, i went to my office and made this simulation. i was a bit surprised that it also showed the "bad" signal. even though it was supposed to be "perfect". eventually, i realized that we (and the USB expert) had been looking at the wrong end of the cable.15:04
wpwrakkristianpaul: as a little detail, one night, i needed to check something on the scope. alas, i didn't have a good enough instrument at hand. but i remembered we had some really fancy 1 GHz or more beast stored in some forgotten corner. i didn't know which group it belonged to, but hey, who's there to complain at 1 am ? ;-)15:06
wpwrakkristianpaul: so i dragged the thing over and did my things. while playing around, i found that it also had some USB test software installed. turned out that we could have done all the fancy testing at our leisure with that scope, without having to rely on the other lab.15:07
wpwrakkristianpaul: fun fact #2: eventually, our head of EE did a little investigation and found out that this scope (at the value of a decent car) actually belonged to our group ;-)15:09
wpwrakkristianpaul: well, the story continues. i then suggested that we may have a clock instability that may originate from poor power routing or other power contamination. (power went around the CPU in a rather peculiar pattern)15:10
wpwrakkristianpaul: one theory was that some other component may contaminate power. e.g., the GSM modem on the same board. so we tried to remove all other chips, one by one, to see if the problem would stop. that rework was actually amazing. the EE folks removed one BGA after the other, without damaging the board.15:13
wpwrakkristianpaul: alas, by the time when there was little left besides the CPU itself, the USB bug was still alive.15:13
aw_hi i am going to sleep now. let's continue tomorrow. :)15:14
wpwrakkristianpaul: we also added beads all across the power tree, to contain possible sources of contamination, to no avail.15:14
aw_the newest file: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/m1_rc3_test_report.ods15:14
wpwrakkristianpaul: finally, we started to run out of time. so we put in all our best guesses and hoped for the best, without really being convinced that we had nailed the problem.15:15
kristianpaulwait wait, just reading i was away :)15:15
aw_i made a column to note a shorter usb cable from now on (marked "V" at the most right column)15:15
wpwrakkristianpaul: more by accident, i then wandered into the final review meeting of EE. i thought that could be interesting, also because i knew there were some other changes i didn't like, and i was hoping for a chance to kill them.15:16
wpwrakkristianpaul: well, at some point, they discussed some of the power changes and showed that region of the PCB. that was the first time i had a good look at the layout. (they used PADS, so access to all those things was difficult)15:17
wpwrakkristianpaul: there, i noticed something rather strange. four large pads from which two traces meandered towards the CPU, crossing a large set of parallel signals, to vanish in some vias, and supposedly to continue from there.15:19
wpwrakkristianpaul: when i asked what that was, they told me it was the crystal. when i asked where those signals would come out again, they pointed to the opposite side of the chip.15:19
wpwrakkristianpaul: the parallel signals i had seen were data and address lines for the RAM.15:20
wpwrakkristianpaul: so the traces between CPU and crystal went from the crystal, right underneath the RAM lines, then tunneled underneath the CPU to its opposite corner, burrowing all their way to the other side of that 6 (i think) layer board and back again, until they finally reached the CPU.15:22
wpwrakkristianpaul: needless to say, there wasn't much ground around these traces either, not even at the same layer15:22
wpwrakkristianpaul: that was the great moment of revelation ;-) it took a bit of discussion until i had the hw team convinced that we could indeed improve this even without having to do a complete re-layout (which, understandably, everyone was afraid of). but then they went at it with gusto. when the revised board was finally made, the USB instability was gone for good :)15:26
wolfspraulkristianpaul: wpwrak [dimly lit] in conjunction with the reflashing problems that sounds like we write corrupted data and then the s-6 hangs on reconfiguration or shortly thereafter15:29
wolfsprauljust a guess of course but if the problem goes away with better flashing, I'd say that point away from power problems15:29
wpwrakmay just be a separate problem. one being bad flash, the other something with power15:30
kristianpaulwpwrak: remove BGA, nice to watch :)15:31
kristianpaulwpwrak: so how you improved?15:33
wpwrak(bga) that was totally amazing. i expected that a board wouldn't survive more than 1 maybe 2 such changes. also because we didn't have optimal equipment for all this. yet they did this with disdainful ease. one chip after the other, maybe ten of them in total, several of them BGAs. and the board just kept on working.15:34
wpwrak(improved) oh, we moved the crystal traces away from the RAM traces. that was probably the main issue. of course, the whole design regarding the crystal was deeply flawed.15:36
kristianpaulah, i thought moving the traces was going to be another big problem, hopefully not then :-)15:37
wpwrak(improved) then we also shortened the traces a little, made sure they had some shielding above and below them, didn't cross any other high-interference signals, put ground around them and around the crystal.15:37
kristianpaulwolfspraul: (hang) yeah that could made sense15:38
wpwrak(improved) so we basically went from three mortal sins (as far as crystal design is concerned) to only one :)15:38
wpwrak(sins) 1) keep traces short. 2) surround them with ground. 3) keep them away from high-speed signals.15:40
wpwrakof course, running them straight under the RAM signals, which are the fastest and busiest in the whole design, was just golden. that's bordering on sabotage :)15:41
wpwrakoh, and i should mention that the layout had been outsourced. so our hw team didn't commit all those sins themselves. but of course, they should have spotted such things on their own.15:42
kristianpaulsabotage, including the forgotten fancy scope :-)15:43
wpwraki should also mention that this was long before adam joined :)15:43
kristianpaul(outsourced), yeah blame the third party! ;)15:43
wpwrak(fancy scope) of course, we had only one probe. the others have somehow "wandered off". the fun thing is that FIC was very strict about inventories, even assigning people personal responsibility for purchases they had handled. (so our secretary was personally responsible for some 100+ kUSD of equipment)15:48
wpwrakso it's rather odd that such a valuable item would just completely fall through the cracks15:49
kristianpaul(spot), well, maybe a lack of common sense for this kind of design, which would also explain the outsourcing itself15:49
larschmpf, stupid me, remove mmap support and wonder why nothing works anymore...15:52
wpwrak(spot) c'mon. probably all of them went to university and studied EE (actually, i don't know their biography. that's something wolfgang would know.)15:52
wpwraklarsc: did you replace it with something that fails silently, in a seemingly plausible way ? :)15:53
larscwpwrak: -ENOSYS15:53
kristianpaullarsc: you're talking of milkymist related stuff? :)15:53
wpwraklarsc: hmm, bad. better return, say malloc(1234);15:53
kristianpaulbtw i noticed you derived milkymist openwrt from some *linaro stuff, isn't it?15:53
larscwpwrak: it will work once i recompile userspace15:54
larsclibc will use mmap if it is available otherwise mmap215:54
larscand since our mmap is just a wrapper around mmap2 we can drop it15:54
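A minimal sketch of the "mmap is just a wrapper around mmap2" idea larsc describes. This is not the lm32 kernel code: the function names are stand-ins, and it assumes the usual mmap2 convention that the file offset is passed in 4096-byte units rather than bytes.

```python
import errno

MMAP2_SHIFT = 12  # assumption: mmap2 counts file offsets in 4096-byte units


def sys_mmap2(addr, length, prot, flags, fd, pgoff):
    """Stand-in for the real syscall: only reports what it was asked to map."""
    return {"addr": addr, "length": length, "fd": fd,
            "byte_offset": pgoff << MMAP2_SHIFT}


def sys_mmap(addr, length, prot, flags, fd, offset):
    """Legacy entry point written as a thin wrapper around sys_mmap2."""
    if offset & ((1 << MMAP2_SHIFT) - 1):
        # A byte offset that is not a multiple of the unit cannot be expressed.
        return -errno.EINVAL
    return sys_mmap2(addr, length, prot, flags, fd, offset >> MMAP2_SHIFT)
```

Because the wrapper adds nothing beyond the offset conversion, a libc that can call mmap2 directly loses nothing when the wrapper is dropped, which matches the discussion here.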
wpwrakah, so it's a migration, not a total removal. now i get it :)15:54
larscremoval of sys_mmap15:56
GitHub86[linux-milkymist] larsclausen pushed 3 new commits to master: https://github.com/milkymist/linux-milkymist/compare/28b907d...889c21415:58
GitHub86[linux-milkymist/master] lm32: Drop sys_mmap support - Lars-Peter Clausen15:58
GitHub86[linux-milkymist/master] lm32: Cleanup show_regs a bit - Lars-Peter Clausen15:58
GitHub86[linux-milkymist/master] lm32: Cleanup signal handling - Lars-Peter Clausen15:58
larscit would be interesting to see if we could squeeze lm32 support into less than 1 kloc16:04
wpwraklarsc: well, what's the maximum line length gcc can handle ? :)16:08
larscif we strip all the gpl headers we are actually not far from it16:09
GitHub109[linux-milkymist] larsclausen pushed 3 new commits to master: https://github.com/milkymist/linux-milkymist/compare/889c214...2b719af16:25
GitHub109[linux-milkymist/master] lm32: Do not set USER_DS in flush_thread - Lars-Peter Clausen16:25
GitHub109[linux-milkymist/master] modules: add default loader hook implementations - Jonas Bonn16:25
GitHub109[linux-milkymist/master] lm32: Cleanup module loading - Lars-Peter Clausen16:25
larscand another 100 lines gone16:27
wpwraklet's see how long until you have a two-liner :)16:29
wpwrakor maybe even a one-liner, if you can find a convenient spot in some makefile16:30
kristianpaulhum i wasn't aware lekernel used twitter to frequently post mm1 related progress17:14
kristianpaulso often17:15
kristianpaulwhat? there is no rss support in twitter anymore?.. :(17:16
lekernelthere is, but they hid it17:20
lekernelcheck my blog/mailing list17:20
kristianpaulheh, spartan3 faster than s6? just because of hold/setup time17:24
kristianpaulnow i wonder, an s3 milkymist one?17:32
kristianpaulhum price close to s617:34
kristianpaulwhat? XC3S2000-4FGG456I 40600 LE is 48.7USD and  XC6SLX45-2FGG484C still 39USD17:38
wpwrakat what quantity ?17:40
kristianpaulah, good point17:41
wpwrakbesides, the XC3 seems to have a bit more logic while the XC6 seems to have a bit more RAM. so it's not trivial to compare them. dunno about speed grades.17:42
lekernels3 is slower and smaller17:42
lekerneland older, more expensive and obsolete sooner17:42
lekernelif we ever change the fpga it will be a 7 or altera17:43
kristianpaulsure, i wasn't pushing you to do it, just intellectual curiosity17:44
rohyay. lasering done.17:46
larscnah, i'll start moving code to the generic section of the kernel ;)18:00
mwallemh either my rework is not working or usb/mouse support is not working in the latest snapshot18:21
mwallemh test tool works18:26
mwallelekernel: btw was the phy changed? i get unexpected phy id 0045 with the test tool18:27
mwallewolfspraul: were there any mac addresses assigned to the rc1 boards?18:34
mwallewolfspraul: found it :)18:36
mwallecool everything is working :)18:44
mwallelekernel: thx for the wolfson codec :)18:44
mwallewpwrak: so i have the second working rework of the ac97 codec :)18:44
wpwrakmwalle: whee ! congratulations !18:47
kristianpaulkudos indeed, mwalle !18:47
kristianpaulsome additional comments for those with an rc2 whose ac97 is still not fixed and who may want to do it some day?18:48
mwallewell just remove it with hot air and solder a new one :)18:51
mwallei'll take a picture later18:51
kristianpaulseems i definitely need a hot air station..18:52
lekernelmwalle, the mdio bit-banging code has bugs at times; 0045 = (0022 << 1) | 1 ....20:38
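The arithmetic lekernel quotes can be reproduced with a small simulation. This is only a sketch of one plausible cause, namely sampling the 16-bit read one clock late. It assumes the MDIO line idles high after the data bits; the helper names are made up for illustration and are not taken from the milkymist test tool.

```python
def to_bits(value, width=16):
    """MSB-first list of bits, the order in which MDIO shifts data out."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def read_bits(line, start, width=16):
    """Shift in `width` bits from the sampled line, beginning at index `start`."""
    value = 0
    for bit in line[start:start + width]:
        value = (value << 1) | bit
    return value


# 16 data bits from the PHY, then the line returns to idle (assumed high).
line = to_bits(0x0022) + [1]

on_time = read_bits(line, 0)   # sampling aligned with the data
one_late = read_bits(line, 1)  # first data bit missed, idle bit shifted in

print(f"{on_time:04x} {one_late:04x}")  # 0022 0045
```

Under these assumptions the late read drops the top bit and picks up a trailing 1, which is exactly (0022 << 1) | 1.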
wpwraknice ;-)20:58
mwallelekernel: oh ok, and btw is there sth wrong with bios and vga out? i only get some picture when flickernoise is started21:08
lekernelnope, the BIOS disables video out unless you press the power button long enough or ESC/F8 on the keyboard21:09
mwallehttp://walle.cc/mmone/IMG_1265.JPG http://walle.cc/mmone/IMG_1270.JPG21:09
lekernel(or if it can't boot)21:09
mwallelekernel: ah ok :) so no more splash screen?21:09
lekerneltechnically yes, but it's not that useful21:09
mwallebtw dunno the voltage rating for the capacitors for the codec, mine were rated 6V321:11
lekernelfor the USB resistors, you should be able to stack them21:11
lekernel(ie mount them on top of the existing varistors)21:11
mwallesee second picture ;)21:11
lekernelyup. but you mounted them close to the varistors, not on top21:12
lekernelseems easier to mount them on top, for me at least :)21:12
mwallepushed them together with tweezers21:13
mwallenext thing will be a working ir receiver ;)21:14
mwallebtw i noticed a lot of freezes right after flickernoise has started (and started a video in patch)21:15
lekernelhave you shorted L19?21:16
mwalleshould i?21:17
mwallelekernel: but will a non working video input freeze the whole board?21:22
lekernelit should not, but in practice I have seen such things. it could be that the video chip sends some broken data to the video input core, which then DMA's crap all over the address space and crashes the board.21:26
lekernelin a perfect world, the video input core should be robust enough not to do that, but ...21:27
mwallei'll short it tomorrow ;)21:27
lekernelit's the big ferrite bead close to the video in chip, it's easy to short except that the ground plane sucks a lot of heat from the iron21:28
lekerneldo you have spare IR receivers?21:28
mwallewould it make sense to suppress automatic switch to video patches when no valid input signal is detected?21:28
mwalle(ir) nope21:28
lekernelyeah, that's something that should be done21:29
lekernelalong with caching the compiled patches21:29
lekernelmaybe for flickernoise 1.1 :)21:29
mwallelarsc: cool more generic stuff (modules) :)21:29
larscmwalle: came with the openrisc linux port22:30
GitHub96[linux-milkymist] mwalle pushed 2 new commits to master: https://github.com/milkymist/linux-milkymist/compare/2b719af...8c38f7222:31
GitHub96[linux-milkymist/master] lm32: syntax fixes - Michael Walle22:31
GitHub96[linux-milkymist/master] lm32: redefine sys_mmap to prevent undef reference - Michael Walle22:31
mwallelarsc: please review these two commits22:31
larsclooks good22:33
mwallewhy do we undef NR_mmap but not NR_vfork?22:37
larscbecause we define our own vfork function in uclibc22:40
larscbut the generic mmap will use NR_mmap if defined otherwise NR_mmap222:40
larschm, i guess my module cleanup was a bit too ambitious. missed that one function was using Elf32_Rel and the other Elf32_Rela22:42
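For readers unfamiliar with the distinction: an Elf32_Rel entry holds r_offset and r_info (8 bytes), while Elf32_Rela adds a signed r_addend (12 bytes), so code written for one cannot walk a table of the other. The sketch below illustrates this with Python's struct module. It assumes big-endian encoding, and the offsets, symbol indices and relocation type are arbitrary illustrative values, not taken from the lm32 module loader.

```python
import struct

ELF32_REL = struct.Struct(">II")    # r_offset, r_info
ELF32_RELA = struct.Struct(">IIi")  # r_offset, r_info, r_addend


def r_sym(info):
    """Symbol table index stored in the upper 24 bits of r_info."""
    return info >> 8


def r_type(info):
    """Relocation type stored in the low 8 bits of r_info."""
    return info & 0xFF


# Two made-up Rela entries (symbol 5 and 7, type 2, addends -4 and 16).
section = (ELF32_RELA.pack(0x100, (5 << 8) | 2, -4)
           + ELF32_RELA.pack(0x104, (7 << 8) | 2, 16))

as_rela = list(ELF32_RELA.iter_unpack(section))  # 2 correct entries
as_rel = list(ELF32_REL.iter_unpack(section))    # 3 misaligned entries
```

Reading the same 24 bytes with the Rel layout yields three entries instead of two, and from the second entry on the addend of one record is mistaken for the offset of the next.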
larsci've been wondering whether we should treat scall like a normal function call and not save/restore r0-r10. Since for most functions it will be a tail call they won't use the restored regs anyway22:49
--- Fri Jul 29 201100:00
