#milkymist IRC log for Monday, 2012-07-30

lekernelhttps://www.coursera.org/course/vlsicad08:55
lekernelA modern VLSI chip has a zillion parts -- logic, control, memory, interconnect, etc.  How do we design these complex chips?  Answer: CAD software tools.  Learn how to build these tools in this class.08:56
lekernelwolfspraul :)09:00
wpwrakthinking of how to overcome the LM32's slowness ... if we had several lm32 cores, complete with cache and tlb, and ignoring cache coherence for a moment, how many such cores would there be room for in M1 ?14:15
lekernelsince most software is single-threaded, you won't overcome slowness this way14:18
lekerneland if you have to rewrite software to make it parallel, then you're better off designing proper hardware accelerators instead of introducing the CPU overhead14:19
wpwraklekernel: think concurrent but loosely related programs. or different layers working concurrently. there, you can get a speedup.14:24
wpwrakso, how many cores would fit ? 2 ? 4 ? 10 ?14:25
lekernelmaybe 8 or so14:27
wpwrakwow, great.14:27
lekernelperhaps even more, but I'm not sure about the block RAM for the caches14:27
wpwraki think something around 4 may be interesting for a general-purpose workload. one for the kernel, one for the main application, one for background tasks, and one for whatever else comes along.14:29
lekerneldoesn't sound too good... imo the real way out of the CPU slowness is ASIC14:31
wpwrakmay need a bit of kernel tuning because the kernel tries to keep related things on the same cpu, assuming cycles are cheap but memory accesses (i.e., moving data accessed by one core to another) aren't. in our case, it's almost the opposite.14:31
wpwrakif you have the money ... ;-)14:31
lekernelwell, aerospace institutes do. they're paying a lot of money for eg LEON chips.14:32
wpwraknot that i'd disagree with the technical merit of having the core in a dedicated asic ...14:32
wpwrakdo you have any that would finance such work ?14:33
lekerneland maybe those parts that don't pass space qualification could still be used elsewhere14:33
wpwrakit's not only what they'd pay for the chip, but also what they'd pay to have it developed ...14:33
lekernel(or don't need, since a lot of the radiation hardening stuff is in the package, not the silicon)14:33
lekernelsounds much easier to me to get aerospace funding than anything else for this purpose14:35
wpwrakwell, if you have the contacts ...14:35
lekernelyou only need a few. not 11k.14:36
lekernelotherwise you're running rat races like http://www.kickstarter.com/projects/joylabs/makey-makey-an-invention-kit-for-everyone vs. http://www.kickstarter.com/projects/1091976372/open-source-5-axis-cnc-router-and-plasma-machine-p14:36
wpwrakthe numbers for makey makey look rather happy14:38
lekernel...which is my point14:38
wpwrakthe monster cnc machine .. well, consider how many people would even have the room for such a monster :)14:39
lekernelyes, about the same number of people who'd buy a free CPU instead of a $35 raspberry pi or similar piece of crap14:39
wpwrakwell, but with your aerospace contacts, you'd probably not go to kickstarter anyway14:40
wpwrakand the rpi will be a victim of its own success anyway. i wouldn't worry too much about them.14:41
kristianpaullekernel: cparty, are you giving a talk about overclocking fpgas? :-)14:59
kristianpaulwhat additional hw besides adding the other lm32 cores to the SoC is required to get SMP?15:06
Fallenouadapt wishbone code maybe to have one more master15:07
kristianpaulah well conbus said up to 8 for both masters and slaves...15:09
wpwrakyou also need to consider cache coherency. that can be done in hw or in sw, though.15:12
wpwrakof course, doing it in sw can make things slow. and limits the type of tasks you can use it for.15:13
Fallenouand for now lm32 caches are not doing any kind of bus snooping :(15:25
lekernelbe happy that since they are write-through, you only need bus snooping and not relatively complicated protocols like MSI or its variants15:27
wpwrakyeah :)15:27
Fallenouhehe sure15:30
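The point about write-through caches can be illustrated with a small model. Because every store also goes to memory, memory is always current, so another core only has to drop its own copy when it sees the write on the bus. This is a minimal sketch, not LM32 code: the `WriteThroughCache` and `Bus` classes, the one-word lines, and the direct-mapped layout are all assumptions made for illustration.

```python
class WriteThroughCache:
    """Toy direct-mapped, write-through cache with one word per line (hypothetical)."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory              # shared backing store (a list of words)
        self.num_lines = num_lines
        self.tags = [None] * num_lines    # full address used as the tag
        self.data = [0] * num_lines

    def read(self, addr):
        idx = addr % self.num_lines
        if self.tags[idx] != addr:        # miss: refill the line from memory
            self.tags[idx] = addr
            self.data[idx] = self.memory[addr]
        return self.data[idx]

    def write(self, addr, value, bus):
        idx = addr % self.num_lines
        if self.tags[idx] == addr:        # keep the local copy up to date on a hit
            self.data[idx] = value
        self.memory[addr] = value         # write-through: memory is always current
        bus.broadcast_write(self, addr)   # let the other caches snoop the write

    def snoop(self, addr):
        idx = addr % self.num_lines
        if self.tags[idx] == addr:
            self.tags[idx] = None         # invalidate the stale line


class Bus:
    """Toy shared bus that shows every write to all other caches (hypothetical)."""

    def __init__(self):
        self.caches = []

    def broadcast_write(self, source, addr):
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr)
```

With two caches attached to the bus, a read on one core after a write on the other misses and refetches the new value. If the second cache is left off the bus, the same read returns the stale copy. No ownership or modified state is tracked, which is what protocols such as MSI add for write-back caches.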
mwallelekernel: i bought an rpi for me, so i'm banned now in this channel? ;)20:06
mwallebut actually, i didnt use it yet, just installed an xbmc distro, took ages and didnt work in the end..20:07
mwallelm32/milkymist/qemu is better to hack on ;)20:08
kristianpaulbetter and still a long/lot to :-)20:34
larscmwalle: I have some issues with the latest qemu. Whenever I send a multi-character key (like the arrow keys) all further keypresses get delayed by the number of extra characters in a multi-character key sequence20:37
larsce.g. press left and then type "Hello" the H will appear when you press the e, the e will appear when you press the l and so on20:38
larscif i press left twice, the H appears when I press the l and so on20:38
Fallenoumwalle: I bought one as well20:47
mwallelarsc: yeah i noticed that too20:48
Fallenouas I told wolfspra1l , I turned it on for 5 minutes, it booted on my TV, and then back in the box ^^20:48
Fallenou"cool it boots", boxed20:48
mwallelarsc: once the input buffer overflows, the ringbuffer will always be N characters 'behind'..20:49
mwallelarsc: bug fixed in my qemu repository22:07
Fallenou:)22:12
mwalleFallenou: how hard is it to add a cache inhibit bit to the tlb?22:31
Fallenoua cache inhibit bit ?22:32
mwallenon-cacheable22:33
mwalleeg, set this bit to bypass the d/icache22:33
Fallenoumaybe a simple way would be to trigger a cache miss when this is is set ?22:34
Fallenouso that it fetches from main memory anyway22:34
Fallenouwhen this bit is set*22:34
wpwrakthat would also make sure the cache is kept in sync22:35
Fallenouyes22:36
Fallenoubut if this happens too often the cache is like useless22:36
Fallenouvery suboptimal22:36
mumptaiand you don't need an ugly bypass22:36
mwallemh but it will replace other entries, yes..22:37
Fallenouand it would just be a single ( && tlb_lookup_bypass) addition to the assign miss = line22:37
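The forced-miss idea can be sketched in a toy model. Here a hypothetical `inhibit` flag, standing in for the proposed TLB bit, is OR-ed into the miss condition so that the load always refetches from memory. The `DCache` class, its one-word lines, and the `refills` counter are assumptions for illustration and do not mirror the LM32 RTL.

```python
class DCache:
    """Toy direct-mapped data cache with a per-access inhibit flag (hypothetical)."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines
        self.data = [0] * num_lines
        self.refills = 0                  # counts how often a line is (re)filled

    def load(self, addr, inhibit=False):
        idx = addr % self.num_lines
        # A set inhibit flag forces a miss even when the tag matches.
        miss = self.tags[idx] != addr or inhibit
        if miss:
            self.tags[idx] = addr         # the refill replaces whatever was in the line
            self.data[idx] = self.memory[addr]
            self.refills += 1
        return self.data[idx]
```

In this model an inhibited load always returns the current memory contents and leaves the cache in sync, but it still occupies a line. An inhibited load of address 5 evicts a cached address 1 when both map to the same index, which is the cost raised in the discussion.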
mwalleFallenou: there should be already some logic to bypass the cache, theres some define22:38
mwalleCFG_ICACHE_LIMIT22:38
mwalleCFG_DCACHE_LIMIT22:38
mwallehave to go22:42
mwallegn822:42
mumptaigute nacht22:42
Fallenouah yes you're right22:50
Fallenouit's even easier22:50
Fallenouhttps://github.com/milkymist/milkymist/blob/master/cores/lm32/rtl/lm32_load_store_unit.v#L41323:03
Fallenouwishbone is selected if dcache is not, if address < base or address > upper_limit23:03
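The selection rule described here amounts to a simple address-range check. The sketch below restates it in Python. The constants `DCACHE_BASE` and `DCACHE_LIMIT` are hypothetical values chosen for illustration, not the actual Milkymist configuration, and the function names are not taken from the RTL.

```python
# Hypothetical cacheable window; the real values come from the SoC configuration.
DCACHE_BASE = 0x40000000
DCACHE_LIMIT = 0x7FFFFFFF


def dcache_selected(addr, base=DCACHE_BASE, limit=DCACHE_LIMIT):
    """An access goes through the data cache only inside [base, limit]."""
    return base <= addr <= limit


def wishbone_selected(addr, base=DCACHE_BASE, limit=DCACHE_LIMIT):
    """Everything outside the cacheable window goes straight to the bus."""
    return addr < base or addr > limit
```

Under these assumed values, an access to a memory-mapped register at `0xE0000000` bypasses the cache, while an access at `0x40001000` is cached. That is why a separate non-cacheable bit may be unnecessary for device registers placed outside the window.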
wpwrakso would we even need a special bit ?23:06
FallenouI don't exactly know why mwalle asked that23:07
Fallenouwhat does he want to do ?23:07
Fallenouwhat did he have in mind ? :)23:07
Action: Fallenou cannot locate the similar trick for icache though, wonders if icache has limit/base stuff working23:08
Fallenoudatasheet seems to say "yes"23:08
Fallenoubut cannot locate it in the code23:08
wpwrakyou don't need this for icache. i think he's after data access to memory-mapped devices23:08
Fallenouoh, right23:08
Fallenouthen yes it's limit/base stuff to have memory mapped regions non cachable23:09
FallenouIf an instruction cache is used, attempts to fetch instructions from outside of the range of cacheable addresses result in undefined behavior, so only one cached region is supported.23:12
Fallenouhum ok23:12
Fallenouso basically the BASE/LIMIT stuff does not work for icache :)23:12
Fallenougute Nacht :)23:13
wpwrakyou don't need that for icache anyway23:16
Fallenouwell you could want to fetch from wishbone directly in case of DMA containing code :p23:18
Fallenoubut then maybe it's best to just invalidate Icache23:19
--- Tue Jul 31 201200:00
