#milkymist IRC log for Monday, 2012-07-09

wolfspraulgood morning01:18
wolfspraulthe other day lekernel said fpgas are mostly designed/optimized for synchronous designs01:19
wolfspraulI'm wondering what the underlying technical reasons are? What specifically makes them geared towards synchronous vs. asynchronous designs?01:19
wpwrakperhaps the structure of the clock distribution ?03:16
wpwrake.g., perhaps many things connect to the same clock ? (instead of letting you just use any random signal as clock)03:17
wpwrakFallenou: wow. 1 k TLB entries. that's **!!!*HUGE*!!!**. 1k+1k may be the largest TLB in existence for a uniprocessor design :)03:34
Fallenou(huge TLB) : yes but we don't have hardware page table walker, so tlb miss will be quite expensive (exception raised, then software lookup, TLB refill, and return from exception)06:42
Fallenouso we want to avoid spending all the cpu resources on TLB refilling :)06:42
Fallenouand we have BlockRAM resources, better use them ! ;)06:43
wpwrakwell, if you have block RAM to burn ... ;-)09:09
wpwrakthe TLB size probably doesn't help all that much to improve performance. but let's see ...09:13
qi-botThe MMU firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-mmu-20120709-0910/09:13
Fallenouwpwrak: anyway it's not hard coded, you can easily change the TLB size in the code09:17
Fallenouand all the rest of the code will adapt to it09:17
Fallenouindex width, bit position etc09:17
Fallenouwpwrak: that could be a marketing bullshit pitch line "we have a f*cking huge TLB"09:23
Fallenoubigger than Cortex A1509:23
Fallenoutoo bad our value is not in marketing BS :p09:24
wpwrakuse of the lowest bit of a phys/virt address as TLB selector seems a bit hackish. but perhaps you often need to mix other bits in there anyway ?09:24
wpwrak"The M in M1 is for Monster" ;)09:25
Fallenou(but perhaps you often need to mix other bits in there anyway ?) <= what do you mean ?09:26
Action: Fallenou didn't get that09:26
wpwrakthings like permission bits, when writing a TLB entry09:27
wpwraki suppose they would also be encoded in the address, wouldn't they ?09:28
Fallenouin the page offset09:32
Fallenouyou have 12 free bits09:32
Fallenouerr 11 free bits09:32
Fallenoubecause lowest bit is used to choose the TLB :)09:32
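A minimal sketch of the address layout described above, assuming 32-bit addresses and 4 KiB pages (the 12-bit page offset mentioned in the log). All names are hypothetical, and the use of bits [21:12] as the index of a 1k-entry TLB is an assumption based on the "[21:12]" remark later in this log, not on the actual Verilog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; 4 KiB pages and a 1k-entry TLB are assumed. */
#define PAGE_SHIFT     12u
#define PAGE_MASK      0xFFFFF000u
#define TLB_SEL_DTLB   0x1u   /* bit 0: 0 = ITLB, 1 = DTLB */
#define TLB_INDEX_BITS 10u    /* 1k entries, assumed to be bits [21:12] */

/* Build the value written to a TLB address register: page-aligned
 * address plus the TLB selector in bit 0. Bits [11:1] stay free. */
static uint32_t tlb_encode(uint32_t addr, int is_dtlb)
{
    assert(is_dtlb == 0 || is_dtlb == 1);
    uint32_t value = addr & PAGE_MASK;
    if (is_dtlb)
        value |= TLB_SEL_DTLB;
    return value;
}

/* TLB line selected by an address (assumed direct-mapped). */
static uint32_t tlb_index(uint32_t addr)
{
    return (addr >> PAGE_SHIFT) & ((1u << TLB_INDEX_BITS) - 1u);
}

/* The 11 bits left over in the page offset once bit 0 is taken. */
static uint32_t tlb_free_bits(uint32_t value)
{
    return (value >> 1) & 0x7FFu;
}
```

The 11 free bits are where something like permission flags could live, as wpwrak suggests, though the log does not say how they are actually used.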
wpwrakso these registers are write-only ?09:33
wpwrakor do you get anything meaningful if you read them ?09:33
wpwraki should actually ask this on the list :)09:34
Fallenouhehe that would allow someone else to benefit from the information09:36
Fallenouyou cannot read back what you have written in TLBCTRL, TLBPADDR and TLBVADDR09:36
Fallenouthe rcsr (read) gives you another kind of information09:37
Fallenoureading tlbpaddr and tlbvaddr gives you address of latest tlb miss09:37
Fallenouin my sample code, I read it in the miss handler09:38
Fallenouasm volatile("rcsr %0, dtlbma" : "=r"(vaddr) :: );09:41
Fallenoudtlbma is an alias for tlbpaddr or tlbvaddr I don't remember which one09:41
Fallenouasm volatile("rcsr %0, itlbma" : "=r"(vaddr) :: );09:41
Fallenousame thing here09:41
Fallenouitlbma is an alias09:42
Fallenoutlbctrl does not have anything bound to rcsr yet, maybe we will need to get some information one day, it could be used to retrieve some data09:47
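A hedged sketch of how a miss handler might read the faulting address with the `rcsr` instruction quoted above. The `dtlbma` CSR name comes from the log. The wrapper functions, the `__lm32__` guard and the host-side stand-in variable are assumptions added so the snippet also compiles off-target.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __lm32__
/* Host-only stand-in for the CSR (hypothetical, for testing off-target). */
static uint32_t sim_dtlbma;
#endif

/* Read the address of the latest DTLB miss. */
static inline uint32_t read_dtlb_miss_address(void)
{
#ifdef __lm32__
    uint32_t vaddr;
    /* Needs a toolchain that knows the dtlbma CSR name. */
    __asm__ __volatile__("rcsr %0, dtlbma" : "=r"(vaddr));
    return vaddr;
#else
    return sim_dtlbma;
#endif
}

/* Page base of the faulting access, assuming 4 KiB pages. */
static uint32_t dtlb_miss_page(void)
{
    uint32_t page = read_dtlb_miss_address() & 0xFFFFF000u;
    assert((page & 0xFFFu) == 0u);
    return page;
}
```

An ITLB variant would look the same with `itlbma` in place of `dtlbma`.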
Action: Fallenou will put the README.txt documentation in his github Wiki to update it10:02
Action: Fallenou going to eat10:08
Fallenouwpwrak: thanks for your feedback on the ML :)11:54
FallenouI will detail a little bit later11:54
Fallenoubtw I meant other bits than ([21:12] and 0)11:55
Fallenouand not "other bits than [21:12], and [0]"11:55
Fallenoubut you got the idea :)11:55
Fallenouwpwrak : updated documentation is there : https://github.com/fallen/milkymist-mmu/wiki12:04
wpwrakah yes, parse error on my end ;-)13:02
wpwrakby the way, page 17 on www.latticesemi.com/documents/doc20890x45.pdf says "DataBusError exceptions are imprecise"13:03
wpwrakcould this perhaps be connected to the problems you're experiencing ?13:03
Fallenouhum it's something else13:04
Fallenouit's about when wishbone says "error !"13:04
FallenouI've seen a comment in the code about that13:04
Fallenoulet me find it13:04
wpwrakah, good. so you're not using that mechanism13:05
Fallenouno I'm not13:06
Fallenouit's for unaligned access or something like that13:07
wpwrakhmm. i wonder if this could cause trouble for linux. hopefully only in theory13:09
Fallenouwell, we cannot forbid the user to try to do unaligned access13:14
Fallenouso we have to handle this correctly in the exception handler13:15
Fallenoumaybe another upcoming surprise :)13:15
Fallenouwpwrak: for now I only added two exception vectors : DTLB_MISS and ITLB_MISS13:16
wpwraki guess it all depends on just how much of a mess LM32 can leave behind in such a case13:17
FallenouI don't know yet if I will share those too with the "page fault" (read/write/execute protection stuff)13:17
Fallenouor if I will add exception vector specific to protection fault13:17
wpwrakhow would permission checks work ? e.g., if there's a write to an address that isn't in the TLB. would the TLB fault handler add the entry, return, and then, if there's a permission issue, you'd get another fault ?13:19
GitHub169[migen] sbourdeauducq pushed 1 new commit to master: https://github.com/milkymist/migen/commit/ed27783a5363cd80ad9409aa8298d40bcf8ed41213:19
GitHub169[migen/master] fhdl: arrays (TODO: use correct BV for intermediate signals) - Sebastien Bourdeauducq13:19
wpwrakand i think it would help to keep code paths short if you separate TLB fault and permission exceptions13:20
Fallenouhttp://gattis.github.com/milkshake/ < ahah nice WebGL milkdrop renderer13:20
wpwrakyou also get a pseudo-exception, which is the page fault (i.e., if a page isn't present). that would basically be a continuation of the TLB fault13:21
wpwrakin case we add a page table walker later, it would become an exception on its own and the TLB faults would disappear13:21
Fallenouwpwrak: I would say : 1°) TLB miss exception, TLB is refilled, access is replayed 2°) protection fault because the type of access violates the page right13:21
Fallenouso two exceptions13:21
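The two-exception sequence can be sketched as a small host-side simulation. This is not the LM32 implementation: the direct-mapped TLB model, the single `writable` permission bit and every name here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 1024u
#define PAGE_SHIFT  12u

enum access_result { ACCESS_OK, EXC_TLB_MISS, EXC_PROTECTION };

struct tlb_entry { bool valid; uint32_t vpn; bool writable; };

static struct tlb_entry dtlb[TLB_ENTRIES];

/* What the hardware would do on one access attempt. */
static enum access_result try_access(uint32_t vaddr, bool is_write)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    const struct tlb_entry *e = &dtlb[vpn % TLB_ENTRIES];
    if (!e->valid || e->vpn != vpn)
        return EXC_TLB_MISS;      /* first exception: refill needed */
    if (is_write && !e->writable)
        return EXC_PROTECTION;    /* second exception: access violates rights */
    return ACCESS_OK;
}

/* What the software miss handler would do: install the mapping. */
static void refill(uint32_t vaddr, bool writable)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &dtlb[vpn % TLB_ENTRIES];
    e->valid = true;
    e->vpn = vpn;
    e->writable = writable;
}

/* Attempt a write, refill on miss, replay; returns exceptions taken. */
static int write_with_replay(uint32_t vaddr, bool page_writable,
                             enum access_result *final)
{
    assert(final != 0);
    int exceptions = 0;
    enum access_result r = try_access(vaddr, true);
    if (r == EXC_TLB_MISS) {
        exceptions++;
        refill(vaddr, page_writable);
        r = try_access(vaddr, true);   /* access is replayed */
    }
    if (r == EXC_PROTECTION)
        exceptions++;
    *final = r;
    return exceptions;
}
```

A write to an unmapped read-only page costs two exceptions in this model. Keeping the permission check out of the miss handler keeps the refill path short, as wpwrak notes.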
Fallenouwpwrak: page table walker is really a mess to implement13:22
Fallenouit will touch a much broader part of the lm32 source code13:22
wpwraktwo exceptions sounds good. checking permissions in software may be messy13:22
FallenouI am trying to touch as little as possible the source code13:22
wpwrak(page table walker) hmm, dunno. we'll see.13:23
wpwrakbtw, have you thought about the case of having multiple faults in the pipeline ?13:23
Fallenounot yet, sorry :(13:27
wpwrakwell, it's also something that can hopefully wait a bit :)13:35
Fallenouthis week I will have very little time to spend on MMU13:42
Fallenoubut starting next week I will have a looooot more time than since january13:43
Fallenou(no I'm not getting fired :p)13:43
wpwrakhave you convinced your boss to let you work on it at work ? :)13:45
Fallenouhehe no13:46
Fallenouand I don't think it will ever happen :p13:46
Fallenouunless I convince them to integrate lm32 in their next ASIC13:46
wpwrakheh :) now you have an objective13:47
Fallenouwhich they won't since they are using more powerful ARM cores13:47
Fallenouthey are prototyping an asic with a Cortex A9 inside13:52
FallenouI think they won't switch to lm32 ^^13:52
wpwrakhmm yes, may be hard to sell them that idea13:55
FallenouI could convince them, they only have 128 TLB entries ;)13:56
Fallenou"with lm32 you can have 1k and more !"13:56
sh4rm4hmm is an arm cortex that much faster than lm32 on an asic ? afaik you can only get ~500 mhz anyway14:04
Fallenouarm cortex -> 800 MHz14:08
Fallenouand you can get it up to 2 GHz14:08
lekernelclock frequency is easy, just add more pipeline stages14:13
lekerneldoesn't mean software runs faster though ;)14:13
Fallenouhehe sure14:14
wpwrakfor stages -> infinity: fCLK -> infinity && work_done -> 0  ;-)14:15
Fallenoufor stages -> infinity: time_for_instruction_completion -> infinity14:33
wpwrakyup :) also makes debugging much easier. you'll never get an incorrect result :)14:37
Fallenoulekernel: you canceled your presentation at RMLL ?14:39
kristianpaulwolfspraul: basically for me, well, because there are "lots" of flip flops, basically that's basic for RTL, adding that to wpwrak's answer15:00
wolfspraulwhy does "lots of ff" favor sync over async?15:04
kristianpaulbecause of the clock distribution around them15:06
kristianpauli haven't seen in detail but must flip flops on a s6 LUT have a clock signal right?15:06
wpwraki don't think the FFs per se make a difference15:08
wpwrakbut if you assume that all of your FFs will use the same clock, it's easier to have a good clock distribution15:09
kristianpaulokay, so there is a clock distribution in the s6 switch matrix that allows having, like said, "main" clocks or such easily15:09
wolfspraulwe are all just guessing :-)15:10
wpwrakwolfspraul: you're the expert on the internal structures, so you should know :)15:10
kristianpaulyes ! ;-)15:10
wolfspraulit's not just the structures, it's what they mean15:10
lekernelthere are lots of FFs (which are synchronous elements, already) and a limited clock routing15:10
wpwraki.e., do large groups of FFs generally share the same clock ?15:10
wolfspraulwhy are the ff synchronous elements?15:11
lekernelif you have too many clocks, they will use the local interconnect, which will cause a lot of skew15:11
lekernelbecause they have a clock15:11
wolfsprauleach slice has a sync/async flag, what does that mean?15:11
lekernelas opposed to e.g. latches15:11
lekernelthough you can also use some FFs in latch mode15:11
wpwrak(skew) ah, so you could actually go async if you insist. but at a cost.15:11
wolfspraulhalf of them, but I think you then lose the other half :-) (nice punishment)15:12
lekernelin fact, if you can deal with the skew problems, you can probably have nice async circuits15:12
lekernelbut the xilinx toolchain won't do it15:12
lekernelyou need a completely new toolchain if you want to do async15:12
wpwrakslight complication :)15:12
wolfspraulwhat does the sync/async flag in each slice mean?15:12
wpwrakwolfspraul: so that's a feature for version 2 then ;-)15:13
lekernelverilog/vhdl aren't even nice languages for async either15:13
lekernelalso, no one uses async these days... except in very limited portions of designs...15:13
wpwraklost on a maze of little always blocks :)15:13
lekernelyes, exactly15:14
wolfspraulok I got it now - thanks a lot!15:14
lekernelwhat's the per-slice sync/async flag?15:15
lekernelFallenou: yes, can't go for multiple reasons15:16
Fallenoulekernel: too bad :(15:18
qi-botThe firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1520/16:50
Fallenoumwalle: activating and deactivating both TLBs is what I called "going into kernel/user mode"20:29
Fallenouthe name is badly chosen I agree20:30
FallenouI will change this :)20:30
Fallenoudocumentation is now available there : https://github.com/fallen/milkymist-mmu/wiki/Documentation-of-milkymist-mmu20:31
Fallenouso switching off I/D TLB is using the command "switch to kernel mode" : number 5'h820:32
Fallenoubut since lowest bit is for choosing ITLB or DTLB20:32
Fallenouyou multiply it by two (shift left)20:32
Fallenouso you write 0x10 to TLBCTRL20:32
Fallenoulowest bit 0 => acts on ITLB20:32
Fallenouwriting 0x11 to TLBCTRL => lowest bit 1 => acts on DTLB20:33
Fallenouerr all of this is switching ON the TLBs ... sorry I was wrong in my last email20:34
Fallenouok I'm definitely tired ...20:35
FallenouI repeat correctly, switching OFF I/D TLB is using the command "switch to kernel mode" : number 5'h420:36
Fallenouso you shift left the command ID20:37
Fallenouit becomes 0x820:37
Fallenouso you write 0x8 to TLBCTRL to switch OFF ITLB, and 0x9 (lowest bit set) to switch OFF DTLB.20:37
Fallenouthat's my last word !20:37
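The corrected TLBCTRL encoding works out as below. This is a sketch of the arithmetic only: the command numbers (5'h4 for off, 5'h8 for on) and the "bit 0 selects ITLB or DTLB" rule are taken from the messages above, while the macro, enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of the written value selects the TLB. */
enum tlb_target { TLB_ITLB = 0, TLB_DTLB = 1 };

/* 5-bit command IDs as quoted in the log; the macro names are made up. */
#define TLB_CMD_SWITCH_OFF 0x4u  /* "switch to kernel mode" */
#define TLB_CMD_SWITCH_ON  0x8u  /* the command that switches the TLB on */

/* Shift the command ID left by one and put the TLB selector in bit 0. */
static uint32_t tlbctrl_value(uint32_t cmd, enum tlb_target target)
{
    assert(cmd < 32u);  /* command IDs are 5 bits wide */
    return (cmd << 1) | (uint32_t)target;
}
```

With these values, switching off gives 0x8 (ITLB) and 0x9 (DTLB), and switching on gives 0x10 and 0x11, matching the numbers in the conversation.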
qi-botThe firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1910/20:46
--- Tue Jul 10 201200:00
