#milkymist IRC log for Wednesday, 2011-11-02

wpwrakhmm .. "oggenc" ... let's see ...00:02
wpwrakah, but that's audio only :-(00:04
Thihi.ogm is at least the filetype for an ogg movie file, I think00:07
ThihiBut I have no idea about encoders00:07
wpwrakah well, everyone has mplayer to see it anyway :)00:08
wpwrakpatch sent. upload is still running00:14
wpwrak... sync-before.mov is done, sync-after.mov at 71% ...00:17
wpwraklekernel: did you see this ?00:19
wpwrakChain_Control looks a bit suspicious. there's one definition in cpukit/score/include/rtems/score/chain.h and another one in doc/tools/bmenu/chain.h00:19
wpwrakgdb tells me flickernoise uses the latter definition. i find the path name rather scary ...00:20
wpwrakvideo upload complete00:28
errordeveloperkristianpaul: ?03:28
wolfspraulyeah, I also wondered what that URL meant :-)03:29
aw_xiangfu, one question: i took out audio codec chip and used test program to test vga, and it showed well even video source. but when I tried to boot up m1. there's no screen shows on monitor. http://dpaste.com/645906/06:02
aw_xiangfu, what possible reason to let no screens after booted up? Does this reasonable in boot procedure if my audio codec unmounted on m1?06:04
xiangfuaw_, probably because the audio chip not mount. the system stopped when check the audio chip.06:06
aw_xiangfu, after "Unable to open audio mixer: No such device" msg, the d2/d3 is fully ON, i think it's in rendering mode, isn't it?06:06
xiangfuaw_, since there is error while boot. I can not make sure if it already boot to rendering mode.06:10
wpwrakwhen we ran into the audio grounding problem (Lxxx), lekernel mentioned - if i understood this correctly - that the codec provided a clock that was vital for the system06:11
xiangfuaw_, it should be boot to rendering mode.06:11
wpwrakso perhaps you're just hitting a condition where the system depends on it06:11
xiangfuwpwrak, 'that the codec provided a clock that was vital for the system' oh.06:12
wpwraki'm not entirely sure i understood this correctly. but he basically suggested that, if the codec somehow crashed, the whole system would hang (which is what i had indeed observed)06:13
wpwrakand of course, an absent codec can only be worse ;-)06:13
aw_xiangfu wpwrak so that clock is designed from AC97_SOUT then feeding into fpga to identify and depend?06:14
wpwrakno idea which clock it is and what exactly depend on it. but lekernel should wake up soonish ;-)06:15
aw_wpwrak, hmm...this may really explain my condition now. since the codec was mounted before and worked well for couples minutes only last time then a very huge noise occurred then my screen went to frozen and never showed screen on monitor more.06:16
aw_i doubted my reworks on codec previously. now I took out the audio codec and used test program to test vga/video source etc. all pass except audio section.06:18
aw_so seems now i have no choice to go. just go remount codec again... I hope that codec itself is still good. ;-)06:21
xiangfuwpwrak, those 'clock' stuff is under bitstream code right?06:26
wpwrakaw_: heroic rework ;-)06:27
wpwrakxiangfu: i would think so, yes. but then, i don't really know what it is or even whether i understood this correctly06:28
xiangfuaw_, we trust your soldering skill, but don't trust the chip :)06:29
aw_xiangfu, ha...no...  consolation is not bad. but fact is I reworked two m1, reward: 1 done 1 failed, so poor skill though. ;-)06:32
wpwrakprobably not a bad result - you're moving from relatively simple and well-understood changes into increasingly difficult territory06:34
wpwrakalso, the board you're working on may have had other rework in the past. so the potential errors add up ...06:35
wpwrakhmm. setting a conditional breakpoint on _CORE_message_queue_Seize seems to be a bit too much work for this poor system06:37
aw_wpwrak, yes. taking risk to potential err added up already.06:37
wpwrakit's funny. the system does seem to advance. but very very slowly. the queue grows by about ten messages per minute :)06:43
wpwrakhmm, or less :) amazingly slow. but it keeps on growing. just a question of time until it hits 64. and then ....06:53
aw_xiangfu, it acts as rendering mode while detected successfully an audio codec. Your guess was right. Hope this second board can still rendering well for more couple hours after temperature goes up.08:19
wolfspraulaw_: cool, so everything works *right now* on the second upgraded rc2 board as well?08:21
aw_wolfspraul, needs run rendering for more hours, yes they works well after test program for all items. Don't know what exactly happened in my previous work. It was frozen and never showed up in the past. ;-)08:24
wolfspraulsure, I understand08:25
wolfspraulbut that's a good first step08:25
wolfspraulof course let's do more testing now, let it run for 24hours, then wait a day, then again for a few hours, etc.08:25
wolfspraulwe are in no rush with this08:25
aw_I'll append this board into here for records: http://en.qi-hardware.com/wiki/Milkymist_One_run_3_schedule#Upgrade_h.2Fw_RC2_to_RC308:27
wolfspraulok, good08:28
wolfspraulhow are the rc3 reworks going?08:29
xiangfuaw_, great. also thanks to Werner. :-)08:30
aw_still have 14 remaining in rc3. meanwhile I started to gather boards which will go for x-ray.08:31
aw_14 boards: 1) midi 0x46 2) nor 0x55 / 0x67 / 0x6d / 0x6f 3) no boot up 0x32 / 0x70 4) video i2c 0x4d 5) dimly lit 0x3a08:32
aw_xiangfu, oh..yes thanks to Werner too. ;-)08:33
aw_6) short 0x57 / 0x59 / 0x5d / 0x62 / 0x7008:34
aw_fixed one short board with C104/0805 which surrounding D16/R30 area. that must be caused by carelessness while first replaced R30 in factory.08:37
aw_I keep checking short boards now.08:38
wolfspraulok, so 5 completely fixed so far?08:39
wolfspraulremaining boards down to 14?08:39
aw_yes. down to 1408:39
aw_(x-ray candidates) 0x32, 0x3a, 0x46, 0x4d, 0x70.....will gather more I think.08:40
aw_0x32 / 0x70 are the BTN2 (bga ball AA4) with keeping high voltage after power on which must be 0. As a rough guess: the AA4 is nearby the area of D16 and R30. (i.e. at the corner of fpga), so this may completely damage by first heat air in factory already to replace R30.08:45
wolfspraulok, good news still08:51
wolfspraulso the yield is 76/90 now, 14 to analyze08:51
wolfsprauland those 14 are very important as preparation for rc408:51
aw_0x46, midi_rx (ball AB21). 0x4d, videoin_sda (ball AB17) those are abnormal level.08:54
aw_0x32 / 0x70 may also be involved my several reworks fix2/fix2b (potential errors added up in the past)08:58
GitHub100[flickernoise] sbourdeauducq pushed 3 new commits to master: http://git.io/SjK5zQ10:54
GitHub100[flickernoise/master] input.c: synchronize with MIDI status and ignore real-time messages - Werner Almesberger10:54
GitHub100[flickernoise/master] input: remove MIDI timeout - Sebastien Bourdeauducq10:54
GitHub100[flickernoise/master] New X2 patch from Werner - Sebastien Bourdeauducq10:54
GitHub17[flickernoise] sbourdeauducq pushed 2 new commits to stable_1.0: http://git.io/7OYdsQ10:54
GitHub17[flickernoise/stable_1.0] input.c: synchronize with MIDI status and ignore real-time messages - Werner Almesberger10:54
GitHub17[flickernoise/stable_1.0] input: remove MIDI timeout - Sebastien Bourdeauducq10:54
kristianpaulwolfspraul, errordeveloper, nah just vaporware it seems, until i see a dev kit with zynq chip11:08
lekernelI don't think it's really "vaporware"... Xilinx often ships experimental stuff to a few lab-rat companies before it is generally available11:23
wpwrak(midi timeout) ah, interesting ... how long is a "tick" ?12:26
lekernel10ms (iirc)12:33
lekernelwpwrak, btw, if you think you are getting lost bytes because of the interrupts not being serviced fast enough, mwalle has made a new UART interface design that should be a lot friendlier to implement a small hardware FIFO12:33
lekernelit's in soc git head, but there's no RTEMS driver for it yet12:34
wpwrak(timeout) then it probably would only have worked if there's no clock. the clock ticks at 24*bpm, so something like 30-50 Hz. i may actually have observed some slight changes when playing with the clock. interesting :)12:39
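The arithmetic behind the 30-50 Hz estimate can be sketched as follows. MIDI timing clock is sent 24 times per quarter note, so the message rate in Hz is 24*bpm/60. The function names are hypothetical, and the 10 ms tick is lekernel's "iirc" figure from above, not a value checked against the flickernoise or RTEMS sources.

```python
MIDI_CLOCKS_PER_QUARTER_NOTE = 24  # MIDI timing clock messages per beat


def midi_clock_hz(bpm):
    """Rate of MIDI timing clock messages for a given tempo."""
    return MIDI_CLOCKS_PER_QUARTER_NOTE * bpm / 60.0


def clock_period_in_ticks(bpm, tick_ms=10.0):
    """Gap between two clock messages, in system ticks (tick_ms is assumed)."""
    return 1000.0 / midi_clock_hz(bpm) / tick_ms


if __name__ == "__main__":
    for bpm in (75, 100, 125):
        print(bpm, "bpm:", midi_clock_hz(bpm), "Hz,",
              round(clock_period_in_ticks(bpm), 2), "ticks between messages")
```

With a clock running at these tempos a byte arrives every two to three and a bit ticks, so any timeout longer than that would keep being reset, which fits the guess that it only took effect on keyboards that send no clock.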
wpwrak(UART) great ! that's definitely something worth considering. at the moment, i seem to get very few losses, maybe even none. but a lot more of those hangs :-(12:41
lekernelah, there is no clock with my MIDI keyboard12:41
lekernelmaybe that explains why you got bugs and not me12:41
wpwrakheh, yes, that might be just the trigger12:43
wpwraklekernel: any ideas about the hang ? i've now set a conditional breakpoint on _CORE_message_queue_Seize (for the_message_queue->number_of_pending_messages == 64) and i can watch it crawl towards its doom, but i still don't have any smoking gun14:22
wpwrakit appears that disaster doesn't necessarily strike the very first time the queue fills up14:23
wpwrakalso, with the conditional breakpoint in place, I didn't get to stop in memcpy. instead, the first evidence of trouble I see is the_message_queue->Pending_messages.last = 0x014:24
kristianpaullekernel: i wrote them just out of curiosity and they pointed me to use qemu instead of a board :)14:31
kristianpaulbut yes, they may have a real board for sure i guess14:31
kristianpauloh http://digilentinc.com/Products/Detail.cfm?NavPath=2,400,836&Prod=ATLYS15:21
kristianpaulha, supported by petalinux ;-)15:24
kristianpaulhum it uses serial flash instead15:43
lekernelwpwrak, not off the top of my head, sorry15:57
lekernelwhat sets the_message_queue->Pending_messages.last to 0?15:57
lekerneliirc you can also use watchpoints15:57
wpwrakwatchpoints ? hmm, let's see. the conditional breakpoints are glacially slow. takes something like 10-30 seconds per queue size increment16:49
lekernelhmm... I don't know how they work. if they result in a lot of traffic exchanged between the PC and the M1 every time the code is executed, that may explain it16:50
lekernelthe serial link is not fast16:50
lekernelbtw - the FT2232H might support 30Mbps there as well. and with a redesign of the FPGA UART, the SoC could support similar speeds too.16:51
kristianpaulor start by moving the uart core from csr to wishbone?16:51
wpwrakwatchpoints would only work usefully if there's hardware support for them16:52
lekernelthere should be hardware support for them16:52
lekernelI have not tested it, but maybe mwalle did16:52
wpwrakperfect. let's put it to good use then :)16:52
wpwrakoh, btw, did you implement any NULL pointer dereferencing trap ?16:53
lekernelthis would happily land in the flash16:53
wpwrakthat would be a worthwhile feature for catching bugs16:54
lekernelin theory, you could easily generate a bus error on such a condition with something like16:54
wpwrakoh, even in the flash ? wow ;-)16:54
wpwrakyes, a bus error is what i had in mind16:54
wpwrakif it's even NOR address space, then the CPU has no business accessing the beginning of that range anyway (standby bitstream)16:55
lekernelassign wb_err_i = wb_adr_o[31:lower_bit] == <# of bits>'d0 & wb_stb_o & wb_cyc_o;16:55
lekernelright on the CPU buses16:55
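A small Python model of the address decode in lekernel's one-liner may help show which accesses it would flag. It is only a sketch: `lower_bit=12` (a 4 KiB trap window at address 0) is an assumed value, since the log leaves it open, and the function name is hypothetical. In Verilog `==` binds tighter than `&`, so the expression reads as (upper address bits are all zero) AND strobe AND cycle.

```python
def wb_err(adr, stb, cyc, lower_bit=12):
    """Model of: wb_err_i = wb_adr_o[31:lower_bit] == 0 & wb_stb_o & wb_cyc_o.

    lower_bit is an assumption; it sets the size of the trapped window
    (2**lower_bit bytes starting at address 0).
    """
    upper_bits = (adr & 0xFFFFFFFF) >> lower_bit
    return upper_bits == 0 and bool(stb) and bool(cyc)


if __name__ == "__main__":
    # 0x10 is the "popular NULL pointer" offset mentioned later in the log.
    for adr in (0x00000000, 0x00000010, 0x00001000, 0x40000000):
        print(hex(adr), wb_err(adr, stb=1, cyc=1))
```

Only accesses inside the window during an active bus cycle raise the error; with no cycle in progress the signal stays low.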
lekernelI have never tested bus errors with LM32 though16:55
lekernelI don't know how the current debugger handles them (they are never asserted with the current design)16:56
wpwrakanyone here who got time ? :)16:56
wpwrakhmm. watchpoints kinda pseudo-work :-(17:08
wpwrakthe watchpoint per se seems fine17:08
wpwrakbut the conditional part is weird17:08
wpwrakthe "Backtrace stopped: previous frame inner to this frame (corrupt stack?)" in the backtrace seems to be "normal". at least i get it from very early on. hmm.17:10
wpwrakamazing. there are no less than three instances of chain.inl in RTEMS. two of them overlap in what they define. the third is a set of wrappers for (which ?) one of the others. if i was looking for a design that made broad allowances for letting subtle but nasty errors creep in, that approach would be a good candidate.20:51
lekernelI know you dislike this system and I will easily admit it's far from perfect. but... seriously try running FN under Linux, and you'll see it's a lesser evil :)20:53
wpwrakoh, i hope very much to meet these evils ;-)20:54
wpwrakwhat's irritating with these lists/chains is that they're such a fundamental thing and there are at least two potentially dangerous things in how they're done. of course, i keep telling myself that, given that they're so fundamental, everything must pan out in the end. but still, ...20:56
wpwrakof course, the code says "1989-2006". not that lists would be particularly new, but, say, the considerably more elegant solution linux uses for the same problem (not just lists but some internal properties of them as well) may not have been common knowledge back then. (not that i'd expect the solution in linux to originate from linux, of course)20:59
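One reading of the "more elegant solution linux uses" is the circular doubly linked list from the kernel's list.h, where the list head is itself a node and no pointer is ever NULL. That this is what wpwrak meant is an assumption. The sketch below is a Python model of the idea with names borrowed from the kernel API, not kernel code.

```python
class ListHead:
    """A node that points to itself when not linked into any list."""

    def __init__(self):
        self.next = self
        self.prev = self


def list_add_tail(new, head):
    """Insert new just before head, i.e. at the end of the list."""
    last = head.prev
    new.prev = last
    new.next = head
    last.next = new
    head.prev = new


def list_del(entry):
    """Unlink entry; no special case for first or last element."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev
    entry.next = entry
    entry.prev = entry


def list_empty(head):
    return head.next is head


if __name__ == "__main__":
    head, a, b = ListHead(), ListHead(), ListHead()
    list_add_tail(a, head)
    list_add_tail(b, head)
    print(head.next is a, head.prev is b, list_empty(head))
```

Because the head closes the ring, append and delete need no empty-list branch and there is no terminating NULL field for a stray write to clobber.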
mwallewpwrak: lekernel: yeah conditional watchpoints/breakpoints are handled by gdb (not by the gdbstub)21:43
mwalleand watchpoints are hardware watchpoints, but i dont know if they are switched on the MM1, i remember the comparators were within the critical path and we wont meet timing21:46
wpwrakthe watchpoints seem to work. but the conditional part isn't handled correctly.21:54
wpwrak(or so it seems)21:54
wpwrakinterestingly, i get conditional breakpoints to work just fine21:55
wpwraklike this: http://pastebin.com/t6fbcSqa21:56
wpwrakalso tried  watch ...  with  condition ...  which should be equivalent to  break/watch ... if ...   but got the same result21:57
wpwrakit traps all the time in _Chain_Append_unprotected, which is indeed where "last" changes21:57
wpwrakregarding the mixed-up types, at least gdb is confused: http://pastebin.com/Tg3Xqyvk22:02
wpwrakthe struct with first/permanent_null/last is from doc/tools/bmenu/chain.h while gdb locates the sources for the rest from the more plausible cpukit/score/inline/rtems/score/ universe22:03
mwallebtw iirc watchpoints are always two instructions behind22:03
wpwrakat least these structures should be compatible (both by intention and by the way they were compiled), but such things don't exactly inspire confidence ...22:04
mwallewatch or awatch? (or are these cmds equivalent?)22:06
wpwrakwatch is for writes, says the manual :)22:06
wpwrakawatch for read/write22:06
mwallelm32 only supports access (read and write)22:06
wpwraki sense potential for some improvement ;-)22:07
mwallewpwrak: forget it, should be fixed within the latest gdbstub, it supports write and read and access22:08
wpwrakwheee ! :)22:09
mwallewpwrak: but have a look at $pc, i guess its two instructions behind the actual sw or lw instruction22:09
mwallei don't know if this has some influence to gdb's conditional logic22:10
wpwrakhm, shouldn't ... after all, i'm giving it a constant address22:14
mwallewpwrak: so you made sure $pc is a sw or lw instruction?22:16
mwallegdb does some weird single stepping after a watchpoint22:17
mwalleiirc ;)22:17
mwallei guessed that some archs break before the actual store/load instruction and some after the instruction was executed22:18
mwallebut gdb is always singlestepping one instruction22:19
mwalleyou may turn on gdbstub debugging to see whats actually going on22:20
mwalleset debug remote on22:20
wpwrakit's a sw22:20
mwallemh ;)22:20
wpwrakone of many. so it may very well be a little off22:21
wpwrakin fact, it probably is22:21
mwallewhats $r1 + 72 ?22:22
mwalleyour watchpoint? :)22:22
wpwrakhmm. i'm not entirely sure about those offsets. the difference seems too large to make sense22:22
wpwraknaw, nowhere in right :)22:23
wpwrakoh, wait. typo22:23
wpwrak$r1+88 is my watchpoint22:23
wpwrakso $pc is correct22:23
mwallei should probably update my gdb ;)22:24
wpwrakin any case, the calculation should be affected by where $pc is only in as much as what instructions have executed since the trap22:24
wpwrakthe calculation of the value does not depend on any local context (except for the symbol table)22:25
mwallewatch if cond.. should set the hw watchpoint, and then gdb checks cond on every exception. to find out whats broken, assuming you want to use conditional watchpoints, a little test binary, which triggers the bug, would be helpful :)22:33
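The division of labour mwalle describes can be modelled in a few lines: the hardware traps on every store to the watched location, and the debugger decides after each trap whether to stop or silently continue. This is a toy simulation under that assumption, with hypothetical names. It does not talk to gdb or the gdbstub.

```python
def continue_until_condition(stores, condition):
    """Simulate debugger-side filtering of hardware watchpoint traps.

    stores: sequence of values written to the watched location.
    condition: predicate the debugger evaluates after each trap.
    Returns (value_that_stopped_us_or_None, number_of_traps_taken).
    """
    traps = 0
    for value in stores:
        traps += 1              # every store traps, whatever the condition
        if condition(value):    # debugger reads memory and checks the condition
            return value, traps
        # condition false: the debugger resumes the target without stopping
    return None, traps


if __name__ == "__main__":
    # Five ordinary pointer updates followed by the bad store of 0.
    stores = [0x408DA700 + 4 * i for i in range(5)] + [0]
    value, traps = continue_until_condition(stores, lambda v: v == 0)
    print("stopped on", value, "after", traps, "traps")
```

Each trap costs a round trip over the serial link, which would explain why conditional stops are slow even when they work.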
wpwrakwatch <var> if <cond>  is what i tried. it breaks all the time, no matter what the condition evaluates to :-(22:35
mwallewpwrak: so try to enable remote debug and see whats going on22:37
mwallethe packets are described here: http://sourceware.org/gdb/onlinedocs/gdb/Packets.html#Packets22:38
mwalleyou should see the set hw watchpoint packet, then continue, then a signal packet, when the watchpoint has hit and after that there should be some memory read commands where the reply should be interesting22:40
mwallesorry but i have to go to bed now, my alarm wakes me up very early ;)22:42
wpwrakoh dear22:43
mwalleyeah and some register info packets ;)22:44
wpwrakand http://pastebin.com/5pk8ph6d22:45
mwalleso gdb doesn't read the memory at all?!22:51
wpwrakmaybe this is it ? $m408dfe68,4#06 -> 7fffffff22:52
wpwrakbut i'm not so sure what it thinks it's reading :)22:52
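The packet quoted above can be decoded by hand. In the GDB remote serial protocol a packet is `$payload#xx`, where `xx` is the sum of the payload bytes modulo 256 in hex, and `m addr,length` is a memory read whose reply is hex-encoded bytes. The helper names below are hypothetical, and reading the reply as a big-endian word is an assumption about the target's byte order.

```python
def rsp_checksum(payload):
    """Modulo-256 sum of the payload characters, as used after '#'."""
    return sum(payload.encode("ascii")) % 256


def parse_packet(packet):
    """Strip '$' and '#xx', verifying the checksum. Returns the payload."""
    if not packet.startswith("$"):
        raise ValueError("packet must start with '$'")
    payload, _, checksum = packet[1:].rpartition("#")
    if int(checksum, 16) != rsp_checksum(payload):
        raise ValueError("bad checksum")
    return payload


def decode_memory_read(payload):
    """Decode 'm<addr>,<length>' into (address, length)."""
    if not payload.startswith("m"):
        raise ValueError("not a memory read packet")
    addr, length = payload[1:].split(",")
    return int(addr, 16), int(length, 16)


if __name__ == "__main__":
    addr, length = decode_memory_read(parse_packet("$m408dfe68,4#06"))
    reply = bytes.fromhex("7fffffff")
    print(hex(addr), length, hex(int.from_bytes(reply, "big")))
```

So the exchange is a 4-byte read at 0x408dfe68 that returned the bytes 7f ff ff ff. Which variable lives at that address still has to come from the symbol table.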
mwallegdb disables the watchpoint and reads 401365dc, 401365e0 and 401365e422:53
mwalledo you have some print statements on break enabled?22:53
mwallemaybe you should try raw memory addresses :)22:55
wpwraklet's see ..22:55
mwallegn8 :)22:55
wpwraknot even .. if *(uint32_t *) 0x408da714 == 0  does the trick :-(22:57
wpwrakkewl. now i killed it so hard gdb doesn't get through anymore23:41
lekernelhmm... I wonder if this could be because the CPU tries to access unmapped bus areas that never get acked23:43
lekernelgenerating a bus error in those cases (I'm not sure if they exist) would solve the problem23:44
wpwraki certainly get a very hung CPU. i suppose with some jtag magic, i could also find out where exactly it hangs :)23:45
wpwraknow, i set a watchpoint on 0x10, since this seems to be a popular "NULL" pointer. and it tripped in rtems_message_queue_send: http://pastebin.com/t1zHBWwM23:48
--- Thu Nov 3 201100:00
