#milkymist IRC log for Friday, 2011-08-26

wpwrakwolfspraul: good news: M1 arrived ! and it seems to behave :)00:41
wolfspraulof course it behaves00:44
wolfspraulor you want to claim that we ship out untested goods? :-)00:45
wpwrakhehe :)00:46
wpwrakhmm, those bottoms ... i feel a strange urge to just mount a block of aluminium and mill a monolithic one00:47
wolfsprauloh buttons?00:48
wpwraker yes, buttons. wakeup not quite complete yet :)00:49
wolfspraulwakeup? wow00:49
wolfspraulmy morning coffee just ready, one sec... (picking up from stove) :-)00:49
wolfspraulhow can it be both my morning and your morning at the same time? strange :-)00:50
wolfspraulabout test results: Adam finished 37 boards now, all fine00:50
wolfspraulonly 0x4D stepped out of the line00:50
wolfspraulanother 9 to go00:50
wpwrakwolfspraul: with fedex syncing me to day time and some nasty toothache the last two nights keeping me from sleeping (fixed today - dentists are amazingly efficient nowadays), my pattern is even crazier than ever :)00:53
wpwrak(boards) nice !00:54
wpwrakroh: basic button shape ~3 mm + 0.7 mm shaft followed by ~1.1 mm disc, thickness about 4.8 mm in total ?00:56
wpwrakroh: shaft diameter ... duh .. 300 mil ? disc diameter 1.2 mm ?00:57
Action: wpwrak looks at a 100 x 150 x 5 mm Al plate from some misguided experiments at thermal distribution01:02
wpwrakwell, fun for later. for now, buttons aren't convenient to have anyway01:24
wpwrakgrmbl. X crash.01:30
wpwraknow .. a 12 V wall wart for the camera. hmm ...01:39
kristianpaulsame from wrt should work..01:45
wpwrakhmm, no wrt supply at hand01:46
wpwrakmust be hiding01:47
wpwraki'll just try 9 V01:47
wpwrakhmm, video brightness seems to be quite hard to set02:15
wpwrakat least at night. maybe it's better with daylight02:16
kristianpaulalready tried to increase the brightness with flickernoise?02:21
wpwrakyeah. but the range in which i get useful images is incredibly narrow02:22
kristianpauland thats ccd, case cmos was not that good02:22
kristianpaulzoom range?02:22
wpwrakzoom ?02:23
wpwrakit's the standard M1 cam. ain't no zoom :)02:23
kristianpauli mean you can focus..02:23
kristianpaulah yeah, i forgot :)02:23
wpwrak(focus) hmm, i can unscrew the lens. but that doesn't look like focus02:25
kristianpaulpersonally i felt confident at no more than 3 meters far from camera02:26
rohwpwrak: it should be 8mm diameter button caps02:30
wpwrak"explosive minds" may look even cooler with the direction reversed. now it's more like imploding :)02:31
rohthe spacer (about 7.9mm diam) should be 0.5mm thick and the inner end cap (12mm diam) should be 1mm thick02:31
wpwrakkristianpaul: oh, i'm about 30-50 cm from the cam :)02:31
rohthe button cap has (should have) the same thickness as the sidewall02:31
wpwrakso the spacer should have a smaller diameter than the cap ?02:32
kristianpaulwpwrak: wow, too near02:32
kristianpaulwpwrak: about to make video chat with lekernel ;)?02:33
wpwrakspace constraints :)02:33
wpwrakshort cable, limited length of arms, and unwillingness to rise to touch anything :)02:35
kristianpaul(short cable) oh, well i bought a 3m cable ;-D02:37
kristianpaulalso a black cloth :)02:37
rohwpwrak: that difference is only to make it easier to glue without it standing over and hindering it from sliding in completely02:39
wpwrakroh: aah, i see. nice.02:42
wpwrakkristianpaul: (cloth) to hide behind ? :)02:42
kristianpaulno, i wanted to see if quality around video in effects improved02:43
wpwrakand, did it ?02:52
kristianpaulnot to my own liking03:00
kristianpaulbut i hold some comments to avoid recall the topic of this channel :)03:01
wpwraklekernel: idea for future improvement: if no local display of some sort is added, maybe have a LED next to each input. turn it on if the patch is using that channel (video, audio, etc.). blink it if the patch is using the channel but the signal doesn't look right (e.g., no sync, too much black / too much white, etc.)03:03
wpwraklekernel: regarding recompiling patches, does it actually need to do this for each setup change ? i.e., do the patches depend on setup items ? if not, you could just have a flag in RAM which patches you've already compiled. should be much easier to implement than a persistent cache that survives power cycling.03:05
wpwrakoh, and if you go multi-core, you could just compile patches in the background, while rendering ;-))03:05
stekernwpwrak: the problem with the flag approach is; how do you know that the patch hasn't changed?03:33
wpwrakstekern: clear the flag when you overwrite/edit a patch03:35
stekernof course the 'flag' could be some crc/hash of the source, that might solve it03:35
wpwrakyes, or do a hash if you want to get fancy :)03:36
stekernwpwrak: yes, but what if the patch has been modified externally03:36
stekern(admittedly, I am not too familiar with how things work, is it possible that it would be externally modified?)03:37
wpwraki don't think without you noticing. i.e., you'd still have to transfer it.03:39
stekernwell, in that case, the flag approach might work03:44
stekernif cpu time needed for calculating hash vs compiling patch is about the same, then there's no point with that03:47
wpwrakyeah. no idea how they compare03:48
stekernme neither ;)03:48
wpwrakjust noticed that the M1 spends quite a bit of time compiling patches, even if all i do is go to the camera settings03:49
stekernI'm in larval stage, at the point where I've got the toolchain compiled and tested to run flickernoise in qemu03:49
kristianpaulthats good !04:05
kristianpaulsince yday i've been trying to port the debian memtester package, i got it to compile, but only after dirty commenting-out of mmu related code..04:05
kristianpaulalso some posix functions that rtems disliked (mlock and related..)04:06
kristianpauli didn't test it yet, i still need to hardcode some memory lengths..04:07
kristianpaulmaybe you can take a look at the code, it's really hackish now... but it compiles ! ;)04:08
wpwrakkristianpaul: hmm, porting a memtester that tries to defeat virtual memory to a MMU-less system, and having to defeat the VM-dependent feature the program uses, somehow sounds wrong to me ;-)04:11
kristianpaulwpwrak: what do you suggest for a memtest/stress test?04:12
rohsomething simple and small which runs completely from sram and tests the complete dram?04:13
rohoutput/input via serial04:13
kristianpaulalso according to changelog mmap was for adding the feature of testing specific physical regions of memory04:13
kristianpaulgood point roh04:13
kristianpaulrun from sram04:14
kristianpaulthis is the code i found http://pyropus.ca/software/memtester/04:14
kristianpaulmy main concern about dram in M1 is possible corruption, but it is just a *guess* as i never understood well the DMA problem with first minimac core04:17
stekernkristianpaul: as roh said, start out with something simple, like just writing all '0's and all '1's and see if they read back ok, do walking '0'/'1's and see if they read back ok04:37
stekernif those simple tests pass, then you can start looking into more complex algorithms04:41
stekernif they don't pass, you might have saved yourself some trouble :)04:41
kristianpaulgood plan ;)04:41
stekern(but perhaps got yourself into the trouble figuring out why they don't pass)04:42
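The staged approach roh and stekern describe (solid patterns first, then walking bits) can be sketched as follows. This is a host-side Python illustration only: the `FaultyRam` class, its stuck-at-0 fault model, and all function names are hypothetical. A real test for the M1 would be written in C, run from SRAM, and report over serial as roh suggests.

```python
"""Sketch of a staged memory test: solid patterns first, then walking bits."""

WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1


class FaultyRam:
    """Hypothetical stand-in for DRAM: a word array with optional stuck-at-0 bits."""

    def __init__(self, words, stuck_low=None):
        self.cells = [0] * words
        # Maps address -> mask of bits that always read back as 0.
        self.stuck_low = stuck_low or {}

    def write(self, addr, value):
        self.cells[addr] = value & WORD_MASK

    def read(self, addr):
        return self.cells[addr] & ~self.stuck_low.get(addr, 0) & WORD_MASK


def check_pattern(ram, words, pattern):
    """Fill the whole range with one pattern, then read everything back."""
    for addr in range(words):
        ram.write(addr, pattern)
    errors = []
    for addr in range(words):
        actual = ram.read(addr)
        if actual != pattern:
            errors.append((addr, pattern, actual))
    return errors


def memtest(ram, words):
    """Return a list of (address, expected, actual) mismatches."""
    errors = []
    # Stage 1: all zeros and all ones.
    for pattern in (0x00000000, WORD_MASK):
        errors += check_pattern(ram, words, pattern)
    # Stage 2: walking one and walking zero across every bit position.
    for bit in range(WORD_BITS):
        one = 1 << bit
        errors += check_pattern(ram, words, one)
        errors += check_pattern(ram, words, ~one & WORD_MASK)
    return errors


if __name__ == "__main__":
    good = FaultyRam(64)
    bad = FaultyRam(64, stuck_low={5: 0x00000100})
    print("good RAM errors:", len(memtest(good, 64)))
    print("bad RAM first error:", memtest(bad, 64)[0])
```

Running the cheap solid patterns before the walking bits matches the advice above: a gross fault shows up in the first pass, before any time is spent on more elaborate algorithms.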
lekernelI think hashing a patch will be much faster than compiling it09:02
lekerneland yes patches can be modified externally, via FTP, shell, file manager, etc.09:03
lekernelafaik there's no "file modified" notification API in RTEMS like there is in Linux, and given how badly the RTEMS filesystem is designed I'd rather not touch it09:03
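The flag-versus-hash trade-off discussed above can be illustrated with a small cache that recompiles a patch only when the hash of its source changes, which also catches files modified externally via FTP or the shell. This is a sketch in Python, not Flickernoise code: `PatchCache`, `compile_patch`, and the choice of SHA-1 are assumptions made for illustration. The cache lives in RAM only, as wpwrak proposed.

```python
import hashlib


def compile_patch(source):
    """Hypothetical placeholder for the expensive patch compiler."""
    return "compiled:" + source.upper()


class PatchCache:
    """In-RAM cache of compiled patches, keyed by a hash of the source text."""

    def __init__(self, compiler=compile_patch):
        self.compiler = compiler
        self.entries = {}       # patch name -> (source digest, compiled result)
        self.compile_count = 0  # how many times the compiler actually ran

    def get(self, name, source):
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
        cached = self.entries.get(name)
        if cached is not None and cached[0] == digest:
            return cached[1]    # source unchanged: reuse the compiled patch
        compiled = self.compiler(source)
        self.compile_count += 1
        self.entries[name] = (digest, compiled)
        return compiled


if __name__ == "__main__":
    cache = PatchCache()
    cache.get("demo_patch", "per_frame=1")
    cache.get("demo_patch", "per_frame=1")  # cache hit, no recompilation
    cache.get("demo_patch", "per_frame=2")  # source changed: recompiled
    print("compiler runs:", cache.compile_count)
```

Whether this pays off depends on the point stekern raised: hashing has to be clearly cheaper than compiling for the cache to be worth it.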
lekernelkristianpaul, I have done tons of SDRAM tests, check the archives09:05
wolfspraulwpwrak: you up?09:12
lekernelwolfspraul, hi09:13
lekernelany prospect regarding when the first boards are shipped?09:13
lekernelyes, there seems to be fully working ones. but how about packaging them and selling them?09:14
lekernelsorry to be insistent, but so many things are depending on that...09:17
wolfspraulit doesn't worry you that 2 boards that worked perfectly stopped working after a little bit of rendering?09:17
wolfspraulit's your brand. you think we can ship products that are known to fail after a few times rendering?09:18
wolfspraulhere's the plan: Adam is currently dumping the nor of those two09:19
wolfsprauloverall the test results look really good now09:19
wolfspraulbut I would like to have at least a theory for what happened on those 2 boards09:19
wolfspraulcan you rule out that the bad reset ic we chose causes nor corruption on power down?09:19
lekernelmaybe it's just the same thing that happened to the video chips on the RC1 boards Adam reworked and sent me09:20
wolfspraulI have an idea09:21
wolfspraulwhy don't we just erase all trace of those two boards, 0x4C and 0x7C, from the production and testing plans, and sell the rest as if everything was always perfect?09:21
wolfspraulwhen Adam is here we ask him about the solder he used09:22
wolfspraulhow would that explain a board that first works and then fails?09:22
wolfspraula whisker - where? which chip?09:22
wolfsprauland it shows up after a few render cycles?09:22
wolfspraulare you trying to find a theory that can explain what we find, or are you trying to find a theory that will allow you to sell the remaining boards with a straight face?09:23
wolfspraulso the best would be if we can come up with a quick test to identify boards that will later fail09:23
wolfspraulthe worst would be if we find that the wrong reset ic we have causes nor corruptions09:23
wolfspraulwe can also close our eyes really hard and just sell the stuff even though we cannot produce it at a consistent quality09:24
wolfspraulI think that's suicidal for the Milkymist brand in the long run though.09:24
wolfspraulaw: hey Adam :-)09:24
wolfspraulcongratulations on finishing the reworks of another 47 boards!09:24
lekernelperfectionism is suicidal too, because you can't get anything done in the end09:25
wolfspraulwe have a question for you: which solder are you using for the reworks?09:25
awwpwrak, have you settled down on your board? ;-)09:25
wolfspraullekernel: oh that's why I'm asking you, it's your brand. Please think about the test results carefully.09:25
wolfspraulI am every bit aware that perfect is the enemy of good.09:25
wolfspraulbut boards that fail after successful rendering worry me, that's all.09:25
wolfspraulin the companies I've worked so far (all Western brand companies), something like this would not ship.09:26
wolfspraula Chinese company would long have started shipping, of course09:26
awlead soldering to be used while reworks09:26
wolfspraullekernel: does that settle your whisker theory?09:26
lekernelaw, and what solder did you use for the two video chips you reworked on rc1?09:27
wolfspraulaw: have you dumped some nor partitions from 0x4C and 0x7C ?09:27
lekernelthe ones that failed09:27
awlekernel, the same lead soldering of currently one i used, it's reel. same as while in rc109:28
aw0x4c: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x4c-standby1.bit/09:28
wolfspraulaw: seems that Werner is sleeping09:28
rejonyahyah, last night at sharism presents beijing, we projected the milkymist entire time, froze at least 3 times09:28
rejonneeded full reboots09:29
aw0x7c: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/bitstream/0x7c-standby1.bit/09:29
rejonkind of embarrassing, hope we can make them better09:29
lekernelrejon, do you have L19 shorted?09:29
rejonit was wolfgangs09:29
rejonme wolfgang and xiangfu there09:30
wolfspraulnot shorted I think (forgot)09:30
awxiangfu, can we start to update urjtag now?09:30
xiangfuaw, yes. sure.09:30
awxiangfu, lead me in instructions, tks. ;-)09:30
awrejon, hi ;-)09:31
xiangfuwolfspraul, the m1 do connect one camera. maybe that is the reason (L19 shorted)09:32
xiangfuaw, goto your urjtag.git folder and run 'git pull'09:32
lekernel4c/7c: there is some corruption in those two bitstreams09:33
lekernelare the flash partitions locked?09:33
xiangfulekernel, no. we are updating aw's urjtag now.09:33
wolfspraulah I'm just downloading. corruptions, hmm.09:34
awi've never used 'lockflash' while rc3 until now.09:34
wolfspraulok, so xiangfu and adam will try to work out how to lock the rescue partitions09:34
wolfspraullekernel: what is your theory on where those corruptions come from?09:35
wolfspraulone word zeroed09:36
lekernelmaybe wrong power-down ramps, as Werner suggested09:36
wolfspraulwhat is the chance in your opinion that the power-down ramps cause this zero word?09:37
xiangfu4c: is only one bit. but 7c is more.09:37
lekernelif that's the case, locking certainly would help restrict the incidence of the problem to the unlocked partitions, which means the board would always be able to boot in rescue mode09:37
wolfspraulfrom 7 to 0 is 3 bits, no?09:37
wolfspraulit may even make it go away entirely if this mostly affects small addresses (IF)09:38
xiangfuoh. sorry. yes 3bits. :(09:38
lekerneland it seems the standby bitstream is affected more often (since when the board fails, it's usually no reconfiguration at all and not other issues) so there's some chance locking would make the problem disappear entirely09:38
wolfspraulI'm willing to accept all sorts of theories and support the product, but I need to do it with a straight face, i.e. after doing my best to understand and mitigate the problem.09:39
wolfspraulok, so let's get the locking done first09:39
wolfspraulcould the writing of zeroes also come from a software bug?09:40
lekernelyes, maybe09:41
wolfspraulthat'd be best for me :-)09:41
lekernelactually the whole flash corruption could come from software bugs09:41
wolfsprauluntil Werner is back and requests some tests, Adam & Xiangfu will get the locking setup09:42
wolfspraulthen we lock the rescue partitions on some boards (let's say 10), and do some render cycling09:42
wolfspraulxiangfu: are you on this with Adam?09:42
wolfspraulalso make a short script that will let adam lock the partitions of an existing board without reflashing...09:43
xiangfuwolfspraul, yes. we are talking09:43
wolfspraulgreat, thanks09:43
xiangfuwolfspraul, (lock-only.sh) sound good.09:43
wolfspraullekernel: I think so far all zero words I've seen are at low addresses09:44
wolfsprauleven within the 640 KB standby bitstream09:44
wolfspraulbut I haven't paid close attention to all cases we had, Werner knows them all09:44
wolfspraul0x7C is not an entire word. offset 0x1EC: from 44 0C -> 40 0009:47
wolfspraulone bit remaining :-)09:47
wolfspraula low offset again09:47
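The bit counting in this exchange can be checked mechanically. The sketch below compares the expected and observed values at offset 0x1EC on 0x7C (`44 0C` versus `40 00`, taken from the log) and lists which bits were cleared, which were set, and which survived. The function name and the treatment of the two bytes as a single 16-bit value are assumptions for illustration. NOR flash programming can only change bits from 1 to 0, so a corruption that only clears bits would be consistent with a stray program command.

```python
def bit_changes(expected, observed, width=16):
    """Compare two words; return (cleared, newly_set, surviving) bit positions."""
    cleared, newly_set, surviving = [], [], []
    for bit in range(width):
        was = (expected >> bit) & 1
        now = (observed >> bit) & 1
        if was and not now:
            cleared.append(bit)      # 1 -> 0 transition
        elif now and not was:
            newly_set.append(bit)    # 0 -> 1 transition
        elif was and now:
            surviving.append(bit)    # bit was 1 and still is
    return cleared, newly_set, surviving


if __name__ == "__main__":
    cleared, newly_set, surviving = bit_changes(0x440C, 0x4000)
    print("cleared bits:  ", cleared)
    print("newly set bits:", newly_set)
    print("surviving bits:", surviving)
```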
wolfspraulif it's a power ramp down problem, why would only small addresses be affected?09:48
wolfspraulis it easier for a wire to be 0 than to be 1?09:49
lekernel it's also interesting to notice that the two corruption events occurred at different addresses but with very similar content09:53
lekernel hmm, no, actually the whole beginning of the bitstream contains an almost periodic pattern09:53
wolfspraulin the second case one bit remains09:56
wolfspraul7c: offset 0x1EC from 44 0C -> 40 0009:56
wolfspraulwell, all I've seen was at low addresses09:56
wolfspraulso if we are lucky, a locking of the standby bitstream will make the problem go away entirely09:56
wolfspraulalthough if it's power ramp-down cause, who knows maybe the locking will not work? :-)09:56
wolfspraullekernel: if it's a ramp-down problem, is there a theory that suggests that low addresses are more likely to get hit than higher ones?09:57
wolfspraulis it more likely that an address line is 0 than 1?09:57
lekernelthe power ramp down theory is that underpowering the FPGA while the flash is still running causes the FPGA to put out incorrect signals that are interpreted as valid writes by the flash09:57
wolfspraulit sounds very far fetched to me09:58
lekernellocking makes the accepted write sequence a lot more complex09:58
wolfspraulbut I know too little about the signals between fpga and nor and how likely this is to happen09:58
wolfspraulwell ok. we definitely try locking.09:58
wolfsprauleventually luck has to be with us09:58
lekernelthat flash does receive write commands09:59
wolfspraulif it's a software bug, that's also ok09:59
wolfsprauleventually we'll hunt it down, or at least defuse it first with locking etc.09:59
lekernelthe 3.3V supply is correct, so if the flash gets written, then it has received a proper write command09:59
lekernelunless the flash chips are counterfeit/crappy, but you do not think this is true09:59
wolfspraulsounds pretty unlikely to me in an uncontrolled ramp-down09:59
wolfspraulno no09:59
wolfspraulget your mind off of that, that's a mental trap10:00
lekernelwhat can cause write commands are:10:00
lekernel* incorrect signals during power up/down - the reset IC was supposed to prevent that by holding the reset during those events. it does it during power up, but the power down case is less clear as Werner pointed out10:01
lekernel* software bugs10:01
lekernel* FPGA configuration system going mad10:01
lekernelby making the accepted write sequence way more complex, locking would probably rid us of the symptoms of any of those problems10:03
wolfspraulif in addition for whatever reason this happens only on low addresses, we are all set10:03
wolfspraulour users will never experience the downside of the bandaid we use to keep the product working -> perfect solution10:04
wolfspraulif it also happens on higher addresses, we may still decide to ship, because the event is rare and will 'only' trigger the need for a web update10:04
wolfspraul(assuming the rescue path and web update actually work, which I assume now)10:05
wolfspraulbasically in 470 render cycles (30 seconds each), we had this happen twice10:05
wolfspraulthe numbers are a little low, but it seems to be in about 1 out of 200 render cycles10:06
wolfspraul[numbers low] I mean our statistical data is limited to really say 1/20010:06
wolfspraulbut something like that10:06
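To make the caveat about limited data concrete, the observed numbers (2 failures in 470 render cycles) can be turned into a rough per-cycle failure rate, and from there into the chance of seeing zero failures in a follow-up run even if nothing has been fixed. This sketch assumes every render cycle fails independently with the same probability, which the log does not establish; the function names are hypothetical.

```python
import math


def failure_rate(failures, cycles):
    """Point estimate of the per-cycle failure probability."""
    return failures / cycles


def prob_no_failure(rate, cycles):
    """Chance of a completely clean run of `cycles` if the rate is unchanged."""
    return (1.0 - rate) ** cycles


def cycles_for_confidence(rate, confidence):
    """Clean cycles needed before a failure-free run contradicts `rate`
    at the given confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


if __name__ == "__main__":
    rate = failure_rate(2, 470)
    print("about 1 failure per %.0f cycles" % (1 / rate))
    for n in (100, 200):
        print("P(no failure in %d cycles) = %.2f" % (n, prob_no_failure(rate, n)))
    print("clean cycles for 95%% confidence: %d" % cycles_for_confidence(rate, 0.95))
```

Under that independence assumption, a clean run of 200 cycles would still be expected more than 40% of the time with no fix at all, so a clean result is encouraging rather than conclusive.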
wolfspraulxiangfu: can you also update Adam's flterm to the latest version?10:08
wolfspraullet's just get both flterm and urjtag updated10:08
xiangfuwolfspraul, yes. already done that.10:08
awxiangfu, thanks for your instructions, now my jtag is new10:08
wolfspraulxiangfu: everything updated?10:08
xiangfuwolfspraul, we just done update. now I finish the small lock_only.sh10:08
wolfspraulwow, great10:08
wolfspraulok good10:08
wolfspraulaw: here is what I propose10:08
wolfspraul1. xiangfu writes a little lock_only.sh script that you can use to lock the partitions of already flashed good boards10:09
wolfspraul2. I think we can reflash 0x4C and 0x7C and see whether they boot again10:09
wolfspraul3. we pick 10 boards, 0x4C and 0x7C and 8 others, and run lock_only.sh on them10:09
wolfspraul4. then we do 10 render cycles on those 10 boards10:10
GitHub173[milkymist] sbourdeauducq pushed 1 new commit to master: http://git.io/2_lHRQ10:10
GitHub173[milkymist/master] flterm: add check if c is 0x00 - Xiangfu Liu10:10
xiangfuthanks lekernel10:10
wolfspraulwell, that's only 100 render cycles, so maybe not enough10:10
wolfspraulaw: do you think we should reflash 0x4C and 0x7C ?10:10
wolfsprauluntil Werner is back, I have no reason for any measurements now. just want to reflash them (including locking)10:11
wolfspraulshould we do that?10:11
awwolfspraul, yes, i think before lock flash, we can reflash 0x4c and 0x7c firstly10:11
wolfspraulyes, let's reflash both and see whether they boot to render10:11
wolfspraulfirst step10:11
awBUT, we're doing a no-bigger data base even 10-times power-cycle. my question is:10:12
wolfspraulmaybe we should buy a programmable power supply :-)10:13
wolfspraulthen we still have a problem how to press the middle button automatically10:13
wolfspraulwe don't have this now10:13
awif after this 10 boards with 10 times through lock flash function, say NO err happens, but can we trust us and say this step is safe?10:13
wolfspraulgood question10:13
wolfspraulfrom your tests, it seems we need about 200 cycles for 1 failure10:13
wolfspraulbut let's do step by step, not speculate too much10:14
wolfspraullet's reflash 4C and 7C and see whether they boot10:14
xiangfuaw, let me test first. ..10:14
wolfspraulthen we lock10:14
awi quite don't think that we should pick 10 boards firstly10:14
wolfspraulthen we think :-)10:14
wolfspraulfirst step: reflash 4C and 7C, see whether they boot10:14
awhow about we just use 0x4c and 0x7c to do individually 100-times tests after reflash and lock?10:15
wolfspraulyes, why not. good idea.10:15
awthat's total 200 times10:15
wolfspraulbut let's reflash first and see whether they boot :-)10:15
wolfspraulI have seen too many surprises, don't want to speculate too much.10:15
GitHub130[scripts] xiangfu pushed 2 new commits to master: http://git.io/V6b2WA10:15
GitHub130[scripts/master] compile-lm32-rtems: add clean-rtems for easy re-build rtems - Xiangfu Liu10:15
GitHub130[scripts/master] scripts: lockflash only script file - Xiangfu Liu10:15
awsorry that we do this firstly even if Werner say later we were wrong10:15
wolfspraulthen we just speculate speculate, and then the test results don't come out as expected -> time wasted speculating :-)10:15
wolfspraulwe should move forward10:16
wolfspraulit cannot be so totally wrong :-)10:16
wolfspraulbtw, I am online for about 1h, then I need to go to some club opening to demo m110:16
wolfspraulso if I'm offline later, just fyi10:16
awi meant that missed some good chance to find...well10:16
wolfspraulno I don't think so10:17
wolfspraulreally - no worries10:17
xiangfuaw, BTW: you can put this file 'http://downloads.qi-hardware.com/people/xiangfu/tmp/72-qi-hardware.rules' under your '/etc/udev/rules.d' and change the GROUP to 'adam' then you don't need 'sudo' on nanonote and milkymist one.10:17
awokay...once xiangfu send me that. I'll do it10:17
wolfspraulI think you can reflash already, 4C and 7C10:17
xiangfuaw, 'wget https://raw.github.com/milkymist/scripts/master/scripts/lockflash_only_m1_rc3.sh'10:17
wolfspraulxiangfu: from now on, Adam should always automatically lock after flashing10:17
wolfspraulso Adam's reflash_m1.sh should have the locking commands enabled by default10:18
xiangfuput this file under your '2011-07-13/for-rc3' same folder of 'reflash_m1.sh'10:18
wolfspraulxiangfu: does Adam's reflash_m1.sh always lock by default now?10:19
GitHub86[scripts] xiangfu pushed 1 new commit to master: http://git.io/Ey0mog10:19
GitHub86[scripts/master] scripts: reflash_m1_rc3.sh bump version and enable lockflash - Xiangfu Liu10:19
xiangfuwolfspraul, not yet.10:19
wolfspraulplease let's enable locking by default10:20
wolfspraulwe move full power to locking now, always lock10:20
xiangfuaw, after you download lockflash_only_m1_rc3.sh, you can update your reflash_m1.sh by download this file: https://raw.github.com/milkymist/scripts/master/scripts/reflash_m1_rc3.sh10:21
xiangfuaw, and overwrite your local version10:21
wolfspraulxiangfu: I even think the reflash_m1.sh original should enable locking by default10:21
wolfspraulit almost becomes part of the m1 design/architecture :-)10:22
wolfspraullocking only the standby and rescue partitions, but that should be enabled by default10:22
awxiangfu, okay10:22
xiangfuwolfspraul, yes. agree.10:22
awxiangfu, the difference between 'lockflash_only_m1_rc3.sh' and 'reflash_m1_rc3.sh' is just one for lock the other is for reflash too?10:25
xiangfuaw, have you update your local version reflash_m1.sh?10:26
awnot yet...change now...my one line cmd is that with log function you gave me before. ;-)10:27
awxiangfu, i.e.: ./reflash_m1_rc3.sh $1 $2 2>&1 | tee -a log/urjtag_$2.log10:28
xiangfuaw, ok10:29
xiangfuyou'd better delete the old reflash_m1.sh, to avoid confusion.10:29
wolfspraulxiangfu: why do we have a separate reflash_m1_rc3.sh ? can we have just one m1 reflash script?10:30
xiangfuwolfspraul, no. it's just the name in my repo.10:30
awwolfspraul, no need though10:30
awwolfspraul, sometimes is managed on my site i think...10:31
awxiangfu, btw, i rename log file name as: ./reflash_m1_rc3.sh $1 $2 2>&1 | tee -a log/urjtag_lock_$2.log10:31
awalright..now to reflash/lock those two.10:33
wolfspraulyes good10:34
wolfspraulxiangfu: name? don't understand. well. the name says _rc3 and that is hopefully temporary. there should be only one m1 reflash script.10:35
wolfspraulif we need multiple variants, there should be options (command line parameters)10:35
wolfspraulI didn't even look inside the script, just saying from the name - this will cause confusion, guaranteed.10:36
xiangfuwolfspraul, yes. I know. just don't have time merge them. we have 'snapshots' 'updates' different URL and different way to generate bios.bin file.10:36
wolfspraulso there should be only 1 script10:37
wolfspraulthe script should have a version number right at the beginning in some variable, maybe just the date it was last edited10:37
wolfspraulso when someone has the script locally, they can quickly check whether they have the latest version10:37
xiangfumaybe I can do that this weekend :)10:37
xiangfuwolfspraul, (version) yes. should be already in adam's log file10:37
wpwrakgood morning ! :) catching up and replenishing my caffeine store10:38
wolfspraulwell I'm sure there are reasons for the different scripts, it's all work.10:39
wolfsprauljust remember to fix it at some point (merge) - this will GUARANTEED create confusions10:39
wolfsprauleven among ourselves :-)10:39
wolfspraulyou will see :-)10:39
wolfspraulso if we don't merge them, we pay the price in a different way10:39
wolfspraulbut sort it in with your other priorities, you have overview...10:39
wolfspraulI'm already with the first evening beer :-)10:39
wolfspraulgotta get ready for the club opening...10:39
wolfspraulwpwrak: have you seen any nor corruptions at higher addresses?10:40
wolfspraul(after you caught up...)10:41
aw0x4c reflash and lock okay, 0x7c is not...wait..upload log...10:45
Action: xiangfu after cleanup the reflash_m1.sh will send email to list. I am already lazy on this task :)10:45
wolfspraul"7c is not" - bah10:46
wolfspraulwpwrak: what's your take on the new 4C and 7C findings?10:50
wolfspraulcurious about the log update and why 7C did not reflash...10:50
wolfspraulwe are hoping that locking will safely eliminate this problem10:50
aw0x4c: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/log/urjtag_lock_4C.log10:51
aw0x7c: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/log/urjtag_lock_7C.log10:51
wolfsprauloh, unknown stepping10:53
aw0x7c: i was thought my usb cable not contacted well , so i reflash twice. :) but first time seems already stands for.10:53
wolfspraulno no10:53
awunknown stepping again?10:53
awoah..yes, just saw it10:54
awso i need to edit again?: Added "0011<tab>xc6slx45<tab>3" in (/usr/local/share/urjtag/xilinx/xc6slx45/STEPPINGS) file.10:55
wolfspraulit's just the stepping, from the urjtag update probably10:56
wolfspraulwe need to update the 0011 stepping into one file, remember?10:56
wolfspraulxiangfu: can you get that patch sent upstream?10:56
wolfspraulwhich file was it again? (searching mmlogs...)10:56
wolfspraulprobably overwritten from the update10:56
wolfspraulyou have to edit /usr/local/share/urjtag/xilinx/xc6slx45/STEPPINGS10:56
wolfspraulxiangfu: we need to get this sent upstream10:56
wolfspraulyes I think so10:56
wolfsprauledit it, then try to flash/lock again10:56
awhm...i see10:56
xiangfuwolfspraul, (patch upstream) ok.10:57
awgood...reflashing... ;-)10:58
wolfspraulotherwise we need to remember to edit the file, which you see we can easily forget...10:59
wolfspraulso the time is better spent getting this 1-line patch upstream10:59
xiangfulekernel, what is the 'Added "0011<tab>xc6slx45<tab>3" ' do exactly. I am writing a commit log11:03
wolfsprauladded another Spartan-6 stepping11:03
lekerneljust make the latest xilinx silicon stepping recognized11:03
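Since a urjtag update can overwrite the STEPPINGS file, the manual edit could be made repeatable with a small idempotent helper. The sketch below appends the entry quoted above (`0011<tab>xc6slx45<tab>3`) only if it is not already present. The path and the entry come from the log; the function name is hypothetical, and the script is an illustration, not part of the Milkymist scripts repository.

```python
import os
import tempfile

# Path and entry as quoted in the log.
STEPPINGS_PATH = "/usr/local/share/urjtag/xilinx/xc6slx45/STEPPINGS"
STEPPING_ENTRY = "0011\txc6slx45\t3"


def ensure_stepping(path=STEPPINGS_PATH, entry=STEPPING_ENTRY):
    """Append `entry` to the file at `path` unless an identical line exists.

    Returns True if the file was modified, False if the entry was already there.
    """
    content = ""
    if os.path.exists(path):
        with open(path, "r") as f:
            content = f.read()
    if entry in content.splitlines():
        return False
    with open(path, "a") as f:
        # Start a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(entry + "\n")
    return True


if __name__ == "__main__":
    # Demonstrate on a scratch file; editing the real one needs write access.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "STEPPINGS")
        print(ensure_stepping(demo))  # first call adds the entry
        print(ensure_stepping(demo))  # second call finds it and does nothing
```

Calling `ensure_stepping()` with no arguments would target the installed file. A helper like this is only a stopgap until the one-line patch is accepted upstream.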
awdone...let's test 100 times of boot to rendering or just reconfiguration only to verify if 'lockflash' really work?11:04
wolfspraulso now both 4C and 7C are reflashed, and they both can render?11:05
wolfspraullet's first boot to render once, so we know they work11:05
wolfspraulif that's the case, yes, I agree. let's do 100 cycles with each.11:05
wolfspraulactually 100 may not be enough, our data suggests more like 200 each. phew.11:05
wpwrak0x4c: http://pastebin.com/9QDs7B4D11:06
wolfspraulsorry about that, we have no automation now!11:06
wolfspraulhow long does this take?11:06
aw7C rendered11:06
wolfspraul90 seconds each test11:06
wolfspraul45 seconds boot, 30 seconds render, some more for the cycling11:06
wpwrak0x7c: http://pastebin.com/w4mCvbRT11:06
wolfspraul90*200=18,000 seconds = 5 hours11:06
wpwrakboth have a single-word corruption. so a reflash should fix them.11:07
xiangfuUrJtag 0011<tab>xc6slx45<tab>3 sent out11:07
wolfspraulaw: let's do 100 each first11:07
wolfspraulsorry that we don't have this better automated right now11:07
aw4C rendered11:07
wolfspraulwpwrak: do you have any other ideas? do you agree with the approach to reflash 4c/7C (already done), and then 100 thirty-second render cycles on each?11:08
xiangfuaw, you have to unplug the power cable for reboot right?11:10
awxiangfu, yes11:10
wolfspraulxiangfu: some automation thoughts11:11
xiangfuaw, ok. there is a command can reboot m1 in 'flterm' but anyway we can not use that command in our case11:11
wolfspraulfirst - we are not sure which exact sequence triggers the problem11:11
wolfspraulfor example whether a soft-reboot is enough11:11
wolfspraulso to be safe, we do a cold power cycling right now (unplug dc jack)11:11
wolfspraulsimply because that's how we always tested so far11:11
wpwraka bias towards small numbers is common in real life. so that may not mean much. particularly if it's a sw bug :)11:11
xiangfuwolfspraul, yes.11:12
wolfspraulwe don't really have any comparison data for cutting power at the mains, or for soft reboots11:12
awi think that no way that I have to simulate a real power on and off action. ;-)11:12
wolfspraulaw:  we know too little now11:12
wolfsprauland we just want to start selling :-)11:12
wolfspraulso it's difficult11:12
wolfspraulwe need your help in manual testing, because that's how we tested so far11:12
wolfsprauland we cannot get a better automation understood and setup fast11:12
wolfspraulxiangfu: the next problem is the middle button, which needs to be pressed11:12
wolfspraulin the future we would use programmable power supplies, but they can only simulate certain types of power cycling11:13
wolfspraulthey cannot simulate the user unplugging the DC jack with his hands (potentially even causing effects simply from touching the metal...)11:13
awwolfspraul, ha...sorry that you would misunderstand my last sentence. sorry. i meant that I have to manually power on and off to simulate. ;-)11:13
wolfsprauland we will always run into the middle button press as well11:13
wolfspraulwell, we will try to improve some of those things, but it will take time11:14
awno complain at all. ;-)11:14
wolfspraulwpwrak: ok, so you are good with the 2*100 cycles test?11:14
wpwrak(making the sequence more complex) the unlocking would also be an uncommon code path. so if it's a sw bug of just using the wrong address somewhere, you'd never hit this11:14
wolfspraullet's just see what we get11:16
wolfspraulthen we move from there11:16
wolfspraulI gotta run to the club...11:16
wolfspraulaw: see you tomorrow or Monday. I think we are close :-)11:16
wolfspraulthanks for all the hard work!11:16
wolfspraull8... will read the backlog...11:16
wolfspraulgood luck!11:16
awso agree to test 200 times?11:16
wolfspraulI do11:16
awwpwrak, agreed?11:16
wolfspraulthen you just follow what Werner agrees with too :-)11:17
awhe...okay ;-)  you go firstly. ;-)11:17
wolfspraulplus you will probably need dinner at some time first :-) and it's Friday evening!11:17
wolfspraulwe are close I think11:17
wolfspraulmaybe the locking is the final nail11:17
wolfspraulI certainly hope so11:17
wpwrakreflashing 0x4c and 0x7c sounds good to me. the single-word corruption we've already seen a few times doesn't look related to what happens in 0x3c/0x77. which is good news. it means that no new boards have joined the "something very very wrong but we don't quite know what" cluster.11:17
wolfspraulbut we have to see the real data, what can we do11:17
wolfspraulwpwrak: yes11:18
wolfsprauland the addresses are all small, even in the 640 kb block we look at11:18
wolfspraulso there's a good chance wherever this comes from, it will never hit anything past the standby bitstream11:18
wolfspraulall wishful thinking of course...11:18
awalright...so after dinner. I'll go for test 200 times.11:18
wolfsprauland I will read the backlog later :-)11:18
wolfspraulaw: THANK YOU!11:18
wolfspraulthanks so much for the great energy and passion11:19
wolfspraulalmost there!11:19
awalright...no problems, i ought to.11:19
wpwrakthe single NOR word corruption cluster may be: 1) fixed 100% by locking (unlikely, imho); 2) fixable in the field; 3) not fixable in the field but with a not too hard recovery path, so people can work around the issue; 4) point to a NOR defect (unlikely, imho)11:19
wolfspraulI think locking stands a good chance11:20
wolfspraulunless the problem just bypasses locking entirely11:20
wpwrak(programmable power supply, middle button) i'm on it .. :) http://projects.qi-hardware.com/index.php/p/wernermisc/source/tree/master/labsw/11:20
wolfspraulwhat do you mean with "fixable in the field"?11:20
wolfspraulanyway gotta run11:21
lekernelwpwrak, how can locking not fix the standby bitstream problem?11:21
wpwrakif it's a bad NOR cell (and not a rogue write), it may still lose data later11:27
wpwrakalso, we may hit other addresses, which could still render the M1 unusable (that is, without human intervention)11:27
wpwraki'm thinking of the VJ at club scenario: you plug it in and it doesn't start flickernoise, or comes up with a friendly message telling you to fix your bitstream or whatever. the crowd cheers, the VJ gets nervous :)11:28
lekernelrendering is possible in rescue mode11:29
wpwraknow, if we can properly protect standby and recovery, which i hope and expect we can, it's not insanely difficult to bring the system back to life after such a mishap11:29
wpwrakyou could still lack new FN features, or your patches themselves may get corrupted11:30
wpwrakbut yes, we can make recovery from NOR trouble relatively benign, even if it's not possible to prevent it from occurring in the first place11:31
wpwrakalso, the users could simply be instructed to plan to have a few minutes before the show to deal with any potential NOR problem. plus, don't power cycle during the show.11:32
wpwraknot nice, but it would reduce the impact of the issue further11:32
wpwraknow, for testing what's really going on. i'd suggest to do the current power cycling test at least 1000 times and until the corruption has happened at least 10 times, i.e., whichever comes last.11:33
lekernelcan you name a single technology device these days that has none of such problems?11:34
wpwrakeach time a corruption is found, record the location of the corruption and fix, then continue11:34
lekerneleven those overrated apple macbooks suffer display problems because of poor BGA soldering11:34
lekerneleven with all the money and resources apple has, they failed to fix it in the first place11:35
scrts2power cycling at least 1000 times... :D11:35
wpwrakafter this, do the same test, but with a soft reset. that avoids the power drop. if the corruptions magically go away, we know it's a power up/down issue. if they don't, it's software, FPGA logic, NOR itself, EMI, etc.11:35
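The test plan described above (at least 1000 cycles and at least 10 corruptions, whichever comes last, recording and fixing each corruption, then repeating with soft resets) could be sketched roughly as follows. This is only an illustration: `cycle`, `check` and `repair` are hypothetical callbacks standing in for whatever rig actually cycles the board, reads back the NOR and reflashes it, and `max_cycles` is an added safety cap that is not part of the plan discussed in the channel.

```python
def run_cycle_test(cycle, check, repair,
                   min_cycles=1000, min_corruptions=10, max_cycles=None):
    """Cycle until BOTH thresholds are reached ("whichever comes last").

    cycle()      -- hypothetical: power-cycle or soft-reset the board once
    check()      -- hypothetical: return a list of corrupted addresses
    repair(bad)  -- hypothetical: restore the corrupted words
    """
    corruptions = []  # (cycle number, list of corrupted addresses)
    cycles = 0
    while cycles < min_cycles or len(corruptions) < min_corruptions:
        if max_cycles is not None and cycles >= max_cycles:
            break  # safety cap so a clean board does not loop forever
        cycle()
        cycles += 1
        bad = check()
        if bad:
            corruptions.append((cycles, bad))  # record the location ...
            repair(bad)                        # ... fix, then continue
    return cycles, corruptions


def interpret(power_cycle_corruptions, soft_reset_corruptions):
    """Rough reading of the two runs, following the reasoning above."""
    if power_cycle_corruptions and not soft_reset_corruptions:
        return "power up/down issue"
    if soft_reset_corruptions:
        return "software, FPGA logic, NOR itself, EMI, etc."
    return "no corruption observed"
```

The same function would be run twice, once with a power-cycling `cycle` and once with a soft-reset `cycle`, and the two corruption lists compared with `interpret`.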
scrts2I wonder who wouldn't bother doing this11:35
lekernelso i'm more than willing to accept a little incidence of NOR corruption in unlocked partitions here11:35
wpwrakscrts: you're saying aw will run screaming to the other end of taiwan when he reads this ? ;-)11:36
lekernelof course we should fix it, but we should balance it against the massive delays a perfect solution would cause11:37
wpwraklekernel: some products do in fact much worse, e.g., recently, it was in the news that Intel SSDs are losing data quite predictably. they fixed one path via a firmware upgrade and are still guessing about another one. that much about the power of big corps :)11:37
wpwrakmy hope is that it won't take all that long11:38
wpwrakif it's a general problem, each of us should be able to reproduce it11:38
wpwrakso the question is simply who manages to automate the test first :)11:39
wpwrakbtw, any magic key combination to switch rendering to 1024x768 ?11:39
wpwrakbtw2, it may be cool to have some patch that has a camera reaction in an augmented reality way. e.g., show the camera input; overlay it with white blocks in some area; sample the camera image "behind" these white blocks; if there's a sudden brightness/color change of a large number of pixels, let the block "explode"11:45
wpwrakthat may motivate people to experiment with interactive effects, which i think could be very cool. alas, if you don't show the way in a simple example as the one i've described, it will take much much longer before someone gets motivated enough to try.11:46
scrts2I did not read the problem, but I suppose the device hangs up?11:46
lekernelnah, rendering in 1024x768 is only supported on git head with the demo firmware (not FN)11:57
lekernelit's slow too (~7-12 fps)11:58
lekerneland buggy11:59
wpwraklekernel: (1024x768) :-( any hope to be able to get it to work ? your earlier experiments sounded encouraging12:13
lekernelmaybe by doubling the SDRAM frequency12:14
wpwrakah, and can midi control adjust audio sensitivity and maybe camera brightness ? these two often seem to need some tweaking12:15
wpwrak(double sdram) sounds scary :)12:15
lekernelthere are already Fx keys to adjust camera brightness and contrast12:15
wpwrakoh, cool12:15
lekernel(sdram) yeah, i'll probably feel motivated to do that if/when this project becomes popular12:16
wpwrak(sdram) nice :)12:16
wpwrakbtw, i think a tutorial mode would be nice. the current default of going as quickly as possible into "show" mode doesn't really seem to fit what most people will expect. e.g., first you want to explore, getting all the feedback and guidance you can. only once you're familiar with the system, you'd turn off those things. of course, someone would have to program this ...12:19
wpwrak(at least it's not scary verilog ;-)12:19
wpwrakaw: how did the 100 cycles go ? ;)12:38
wpwrakor was it 200 ? :)12:38
awwpwrak, hi sorry, i just started .;-)12:39
kristianpaullekernel: cross talk?12:40
adamw_0x4c: 10th power-cycle pass12:50
wpwrak"Shift" by Geiss is really cool12:50
adamw_wpwrak, i bought a relay card with Christopher in om to do tons of tests via auto tests with programmable power supply and multimeters (GPIB)12:52
adamw_with that way can verify many things. ;-)12:53
wpwrakadamw_: you still have them ?12:59
wpwrakadamw_: oh, and what multimeter do you have ?13:00
wpwrakheh, conduirebourre ;-) best camera effect, i think13:04
adamw_at that time we used Keithley 2303 and Agilent 34401A, 16 channels relay card through GPIB13:04
wpwrakadamw_: and what do you have now ?13:05
adamw_wpwrak,  now i have 34401A13:05
adamw_no programmable power supply. :(13:05
wpwrakah, okay. do you have GPIB to the PC ?13:05
adamw_need buy one. ;-)13:06
adamw_so you want me to capture NOR corruption as it happens while auto measure current. ;-)13:06
adamw_well...hope we don't do this. then solve, but as a lab site with auto equipments is good. ;-)13:08
wpwraknaw, just thinking ahead13:08
wpwrakyes, automation is good. very good :)13:08
adamw_we probably will go for this auto... ;-)13:08
adamw_i even do think 100 times is not enough. ;-) you know that we can't give up any reasons caused especially that it's not a probability distribution.13:11
wpwrakyeah, my guess would be more like 100013:12
adamw_the single NOR word corruption cluster may be: 1) fixed 100% by locking (unlikely, imho); 2) fixable in the field; 3) not fixable in the field but with a not too hard recovery path, so people can work around the issue; 4) point to a NOR defect (unlikely, imho)13:12
adamw_you just posted those four candidates. ;-)13:12
wpwrakyup. by the way, do you run the CRC check or just see if standby loads ?13:15
adamw_process of boot to rendering with power-cycle13:15
adamw_NO CRC check13:15
adamw_that'd be long period...;-)13:16
wpwrakheh ;-)13:16
adamw_wpwrak, btw, how do you think that boards were failed in CRC test?13:17
adamw_wpwrak, since one board I caught it and re-performed CRC test without power off then just pass, how to explain this?13:18
adamw_that was 0x85: got "flickernoise.fbi(rescue)(CRC)CRC failed(expected aa12a56a, got b0c6b06d)" and "splash.raw CRC failed(expected 978f860c, got 33d3152a)" while using test program 10. keep performing CRC test again, then pass without power-cycle. 11. rendering and CRC test pass13:19
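For tallying failures like the two quoted above across many boards and runs, a small parser may help. This is a sketch that assumes the test program always prints failures in the shape shown in those two messages (name, optional parenthesised tags, then "CRC failed(expected ..., got ...)"); the function name and the returned field names are made up for illustration.

```python
import re

# Matches e.g. 'splash.raw CRC failed(expected 978f860c, got 33d3152a)'
# and 'flickernoise.fbi(rescue)(CRC)CRC failed(expected ..., got ...)'.
CRC_FAIL = re.compile(
    r"([\w.]+)((?:\([^)]*\))*)\s*"
    r"CRC failed\(expected ([0-9a-f]{8}), got ([0-9a-f]{8})\)"
)


def parse_crc_failures(text):
    """Return one dict per CRC failure message found in text."""
    failures = []
    for name, tags, expected, got in CRC_FAIL.findall(text):
        failures.append({
            "image": name,
            "tags": re.findall(r"\(([^)]*)\)", tags),
            "expected": int(expected, 16),
            "got": int(got, 16),
        })
    return failures
```

A failure that disappears when the check is repeated without a power cycle, as reported here for 0x85, would show up as an entry in one run and no entry in the next, which helps separate read problems from data that is actually wrong in the NOR.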
wpwrakhmm, 0x85 sounds like one of those NOR bus problem boards then13:37
wpwrakmay be similar to 0x3c and 0x77. or maybe the NOR bus problems (without the "pulses" on PROGRAM_B) are something else13:37
wpwrakthis is a tough one13:43
wpwraklekernel: your USB stack can't be all that bad - it managed to find the first device (the keyboard) in this little mess: http://pastebin.com/p1ymfXL713:51
lekernelwhat is sad here is you need to go through all that crap just to receive stupid keystrokes13:52
lekerneldie USB, die13:52
wpwraklekernel: alas, it didn't find the mouse. otherwise, this little gem would work 100%: http://blog.brightpointuk.co.uk/riitek-rii-mini-wireless-keyboard-mouse-laser-pointer-combo13:53
lekerneland it wouldn't even tell you the keyboard layout13:53
wpwraklekernel: it'll outlive both of us ;-)13:53
wpwraktelling the keyboard layout would spoil the sense of mystery and adventure ;-)13:53
adamw_0x4c: 100th boot to rendering done: http://downloads.qi-hardware.com/hardware/milkymist_one/production/rc3/test_results/4C-lock-results15:10
adamw_i gotta go and 0x7c will be the next one. cool. ;-)15:11
wpwrakgrr, vanished15:16
wpwrakwould have been nice to get a CRC check at the end15:17
wolfspraulwpwrak: ok, 100 tests on 0x4C succeeded - good sign15:57
wolfsprauluntil I see evidence against it, I am assuming/hoping the locking fixes the bug ;-)15:57
lekernelit rather fixes the symptom, but that's good enough for now16:10
wpwrakah, that was with locking ?17:12
lekerneler... hopefully17:25
wpwrakyeah ;-)17:26
wpwrakhmm, there seems to be another issue with external connections. connected line in to my stereo (had used the battery-powered kaossilator before). then it stopped responding to audio. even when i connected back to the kaos.19:24
lekernelthis totally sucks19:25
wpwrakpower-cycled. everything okay again. connected stereo again. M1 froze (wouldn't get to the desktop with a mouse click)19:27
lekernelthere's another FB between analog and digital ground, maybe that's the same problem as on the video in19:27
wpwrakpower-cycled. still no reaction to the stereo.19:27
wpwrakhehe ;-)19:27
lekernelFYI, audio chip failure when rendering would freeze the software19:27
wpwrakwent back to kaossilator. audio dead. power-cycling ...19:28
lekernelthose run3 boards are the worst disaster that ever happened in this project19:29
wpwrakone issue that quite clearly exists in M1 is that it combines a lot of different grounds. and you can't quite know at what potential they are.19:29
wpwrakwell, i think it's also seeing more intensive testing now. so it's normal that more critters come out. we turn more stones ;-)19:30
wpwrakaudio back to normal after power-cycling19:30
wpwrakat least it seems i can paralyze audio quite reliably :)19:31
lekerneltry shorting L3 ...19:31
wpwraki'm kinda curious what exactly my stereo sends out there19:32
lekernelthe wm9707 datasheet says avss/dvss voltage should be max +/- 0.3V19:34
lekernelit could easily be exceeded by transients across L3 ...19:34
lekernelyay, smells like even more rework delays19:34
lekernel(and of course, the problem never manifested itself with the lm4550 nor on my wm9707 test board ...)19:35
wpwrakmaybe your signal sources have better/different grounding19:36
wpwraki'm also a little suspicious about DMX. those expensive USB-DMX dongles all seem to have galvanic isolation. that's probably not just because it sounds cool ...19:38
wpwrakand DMX seems a particularly good candidate for potential differences because the devices will be far away from the DJ desk, probably connecting to very different points in the mains wiring19:40
wpwrak(well, that's my layman's suspicion. i didn't know DMX even existed before i saw it in the M1 schematics, so maybe i'm all wrong :)19:40
wpwrakanyway, let's see what's up with the audio19:41
lekernelI haven't had any DMX issue so far, but it seems to be a persistent and inconvenient pattern that all problems happen on other people's boards19:43
lekernelotoh they make expensive DMX isolation devices http://www.fullcompass.com/product/303310.html19:53
lekernelwhich suggests there are also non isolated devices out there19:53
wpwrakgrmbl. all i see of my audio signals is some 100 Hz noise. very weird.19:55
wpwrak(also from the kaossilator. something's clearly wrong with my measurement ... let's try a different cable)19:58
lekernelI'd bet this is the same weird problem there was with the video input...19:58
lekernelit makes sense that it doesn't happen from the battery powered device but happens from the mains powered one19:59
wpwrakdifferent cable seems to work better (or maybe it was setting the probe to X1 - dunno how that output driver works)20:02
wpwrakkaossilator is indeed ~+/- 0.3 V20:02
wpwrakabout 1.3 Vpp on "tape out" on my stereo. and it's not active when playing from line in. that much about the pass through i wanted to try.20:09
wpwrakwhat does the audio chip spec say about 1.3 Vpp ? deadly ? or just clipping ?20:09
wpwrakhmm, well beyond the absolute maximum ratings20:11
wpwrakah, but you have 1:2 divider20:13
lekernelyou have DC?20:13
lekerneli'm talking about digital to analog ground potentials20:13
lekernelacross L320:13
wpwraknaw, shouldn't have DC. in any case, you're blocking DC.20:13
lekernelnot the voltage between the ground of your cable and its signal20:14
wpwrakat the moment i'm checking the audio signal that comes out of my system. if it's acceptable for the M1.20:14
lekernelit's also the voltage between the ground of your cable and the digital ground of the M120:14
lekernelit can develop across L320:14
wpwraklooks good. so ground is the next step.20:14
lekerneland the maximum voltage is +/- 0.3V20:14
wpwrakyes yes, i see that it looks quite like the video20:14
wpwrak1.3 Vpp measured is probably okay. that's with a few mV of noise, and you have a 1:2 divider. so it'll be around 0.6 Vpp20:15
Action: wpwrak wonders what "normal" line in/out levels really are20:17
wpwraki'm running a scope calibration to get rid of the little DC offset it shows20:19
wpwrakafk for a bit20:19
wpwrakand ~30-60 min more afk fun, and then i'll be back to the M121:22
kristianpaulwhy rc3 worst run? more hardware more issues pop, that is not a hidden secret i guess21:31
wpwraklekernel: you'll like this: the ~1.2 Vpp you have on line in may be insufficient: http://en.wikipedia.org/wiki/Line_level22:15
lekerneljust crank up the volume, this is a totally trivial issue22:17
lekernelkristianpaul, easy to say for you22:18
wpwraklekernel: even more so if we consider that the "normal" level wolfson considers seems to be around only +/- 100 mV, so 0.2 Vpp22:18
wpwraklekernel: no, i mean the voltage the M1 input is designed to handle22:18
wpwraklekernel: the codec does up to 0.6 Vpp (absolute maximum ratings), you have a 1:2 divider, so you get 1.2 Vpp for the input signal22:19
wpwraklekernel: (probably already with distortions, etc., but that may not matter so much)22:19
wpwraklekernel: but it seems that "LINE" levels you may encounter can go up to about 1.8-2.2 Vpp, particularly with "professional" equipment22:21
wpwrakmy sony, with ~1.3 Vpp would be high for a consumer electronics device, but still well below "professional equipment"22:22
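The headroom arithmetic in this exchange can be checked with a few lines. The 0.6 Vpp codec limit and the 1:2 divider are the figures quoted in the discussion; the dBV and dBu conversions use the standard reference levels (0 dBV = 1 Vrms, 0 dBu = sqrt(0.6) Vrms, about 0.775 V) and assume a sine wave, so the results are nominal values and real equipment will vary. The function names are made up for this sketch.

```python
import math

CODEC_MAX_VPP = 0.6   # codec limit as quoted in the discussion
DIVIDER_RATIO = 2     # 1:2 input divider as quoted in the discussion


def max_input_vpp(codec_max=CODEC_MAX_VPP, ratio=DIVIDER_RATIO):
    """Largest peak-to-peak input before the codec limit is reached."""
    return codec_max * ratio


def vpp_from_vrms(vrms):
    """Peak-to-peak voltage of a sine wave with the given RMS voltage."""
    return 2 * math.sqrt(2) * vrms


def vrms_from_dbv(dbv):
    """dBV is referenced to 1 Vrms."""
    return 10 ** (dbv / 20)


def vrms_from_dbu(dbu):
    """dBu is referenced to sqrt(0.6) Vrms (1 mW into 600 ohm)."""
    return math.sqrt(0.6) * 10 ** (dbu / 20)


def fits(input_vpp):
    """True if a signal of this level stays within the input limit."""
    return input_vpp <= max_input_vpp()


consumer_vpp = vpp_from_vrms(vrms_from_dbv(-10))  # nominal consumer level
pro_vpp = vpp_from_vrms(vrms_from_dbu(4))         # nominal professional level
```

With these numbers a nominal -10 dBV consumer source fits under the 1.2 Vpp limit, while the ~1.3 Vpp measured on the stereo and a nominal +4 dBu source do not.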
wpwrakhmm, new hypothesis: the data sheet is simply wrong ;-)22:42
wpwrakand the absolute maximum rating is in truth AVss-0.3 V to AVdd+0.3 V22:42
wpwrakin which case everything is nice and well22:43
wpwrakadam will be disappointed that we failed to create yet another rework item for him ;-) L3 is still on, though. let's see about it ...22:44
lekernelI was talking about "Difference DVSS to AVSS"22:57
lekernelwhich is also +/- 0.3V22:57
wpwrakah, i see. yes, that doesn't agree with L3.22:58
wpwrakheh, i see L19 also has a history of being eliminated ;-) (huge solder blob)23:01
wpwrakreworked. works like a charm23:23
wpwrakdoing a few unplug/plug cycles23:25
wpwraksolid as a rock23:26
wpwrakwhat's funny is that the stereo makes noise when i connect the stereo:line-out to m1:line-in. some interesting things must be passing over that ground.23:26
kristianpaulokay never mind my easy comments i regret now23:50
--- Sat Aug 27 201100:00
