| wolfspraul | good morning | 01:18 |
|---|---|---|
| wolfspraul | the other day lekernel said fpgas are mostly designed/optimized for synchronous designs | 01:19 |
| wolfspraul | I'm wondering what the underlying technical reasons are? What specifically makes them geared towards synchronous vs. asynchronous designs? | 01:19 |
| wpwrak | perhaps the structure of the clock distribution ? | 03:16 |
| wpwrak | e.g., perhaps many things connect to the same clock ? (instead of letting you just use any random signal as clock) | 03:17 |
| wpwrak | Fallenou: wow. 1 k TLB entries. that's **!!!*HUGE*!!!**. 1k+1k may be the largest TLB in existence for a uniprocessor design :) | 03:34 |
| Fallenou | (huge TLB) : yes but we don't have hardware page table walker, so tlb miss will be quite expensive (exception raised, then software lookup, TLB refill, and return from exception) | 06:42 |
| Fallenou | so we want to avoid spending all the cpu resources on TLB refilling :) | 06:42 |
| Fallenou | and we have BlockRAM resources, better use them ! ;) | 06:43 |
| wpwrak | well, if you have block RAM to burn ... ;-) | 09:09 |
| wpwrak | the TLB size probably doesn't help all that much to improve performance. but let's see ... | 09:13 |
| qi-bot | The MMU firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-mmu-20120709-0910/ | 09:13 |
| Fallenou | wpwrak: anyway it's not hard coded, you can easily change the TLB size in the code | 09:17 |
| Fallenou | and all the rest of the code will adapt to it | 09:17 |
| Fallenou | index width, bit position etc | 09:17 |
| wpwrak | great | 09:23 |
| Fallenou | wpwrak: that could be a marketing bullshit pitch line "we have a f*cking huge TLB" | 09:23 |
| Fallenou | bigger than Cortex A15 | 09:23 |
| Fallenou | too bad our value is not in marketing BS :p | 09:24 |
| wpwrak | use of the lowest bit of a phys/virt address as TLB selector seems a bit hackish. but perhaps you often need to mix other bits in there anyway ? | 09:24 |
| wpwrak | "The M in M1 is for Monster" ;) | 09:25 |
| Fallenou | ahah | 09:25 |
| Fallenou | (but perhaps you often need to mix other bits in there anyway ?) <= what do you mean ? | 09:26 |
| Action: Fallenou didn't get that | 09:26 | |
| wpwrak | things like permission bits, when writing a TLB entry | 09:27 |
| wpwrak | i suppose they would also be encoded in the address, wouldn't they ? | 09:28 |
| Fallenou | yes | 09:32 |
| Fallenou | in the page offset | 09:32 |
| Fallenou | you have 12 free bits | 09:32 |
| Fallenou | err 11 free bits | 09:32 |
| Fallenou | because lowest bit is used to choose the TLB :) | 09:32 |
| wpwrak | so these registers are write-only ? | 09:33 |
| wpwrak | or do you get anything meaningful if you read them ? | 09:33 |
| wpwrak | i should actually ask this on the list :) | 09:34 |
| Fallenou | hehe that would allow someone else to benefit from the information | 09:36 |
| Fallenou | you cannot read back what you have written in TLBCTRL, TLBPADDR and TLBVADDR | 09:36 |
| Fallenou | the rcsr (read) gives you another kind of information | 09:37 |
| Fallenou | reading tlbpaddr and tlbvaddr gives you address of latest tlb miss | 09:37 |
| Fallenou | in my sample code, I read it in the miss handler | 09:38 |
| Fallenou | https://github.com/fallen/milkymist-mmu/blob/mmu-bios/software/mmu-bios/tlb_miss_handler.c | 09:41 |
| Fallenou | asm volatile("rcsr %0, dtlbma" : "=r"(vaddr) :: ); | 09:41 |
| Fallenou | dtlbma is an alias for tlbpaddr or tlbvaddr I don't remember which one | 09:41 |
| Fallenou | asm volatile("rcsr %0, itlbma" : "=r"(vaddr) :: ); | 09:41 |
| Fallenou | same thing here | 09:41 |
| Fallenou | itlbma is an alias | 09:42 |
| Fallenou | tlbctrl does not have anything bound to rcsr yet, maybe we will need to get some information one day, it could be used to retrieve some data | 09:47 |
| Action: Fallenou will put the README.txt documentation in his github Wiki to update it | 10:02 | |
| Action: Fallenou going to eat | 10:08 | |
| Fallenou | wpwrak: thanks for your feed back on the ML :) | 11:54 |
| Fallenou | I will detail a little bit later | 11:54 |
| Fallenou | btw I meant other bits than ([21:12] and 0) | 11:55 |
| Fallenou | and not "other bits than [21:12], and [0]" | 11:55 |
| Fallenou | but you got the idea :) | 11:55 |
| Fallenou | wpwrak : updated documentation is there : https://github.com/fallen/milkymist-mmu/wiki | 12:04 |
| wpwrak | ah yes, parse error on my end ;-) | 13:02 |
| wpwrak | by the way, page 17 on www.latticesemi.com/documents/doc20890x45.pdf says "DataBusError exceptions are imprecise" | 13:03 |
| wpwrak | could this perhaps be connected to the problems you're experiencing ? | 13:03 |
| Fallenou | hum it's something else | 13:04 |
| Fallenou | it's about when wishbone says "error !" | 13:04 |
| Fallenou | I've seen a comment in the code about that | 13:04 |
| Fallenou | let me find it | 13:04 |
| wpwrak | ah, good. so you're not using that mechanism | 13:05 |
| Fallenou | no I'm not | 13:06 |
| Fallenou | it's for unaligned access or something like that | 13:07 |
| wpwrak | hmm. i wonder if this could cause trouble for linux. hopefully only in theory | 13:09 |
| Fallenou | well, we cannot forbid the user to try to do unaligned access | 13:14 |
| Fallenou | so we have to handle this correctly in the exception handler | 13:15 |
| Fallenou | unfortunatel | 13:15 |
| Fallenou | +y | 13:15 |
| Fallenou | maybe another upcoming surprise :) | 13:15 |
| Fallenou | wpwrak: for now I only added two exception vectors : DTLB_MISS and ITLB_MISS | 13:16 |
| wpwrak | i guess it all depends on just how much of a mess LM32 can leave behind in such a case | 13:17 |
| Fallenou | I don't know yet if I will share those too with the "page fault" (read/write/execute protection stuff) | 13:17 |
| Fallenou | or if I will add exception vector specific to protection fault | 13:17 |
| wpwrak | how would permission checks work ? e.g., if there's a write to an address that isn't in the TLB. would the TLB fault handler add the entry, return, and then, if there's a permission issue, you'd get another fault ? | 13:19 |
| GitHub169 | [migen] sbourdeauducq pushed 1 new commit to master: https://github.com/milkymist/migen/commit/ed27783a5363cd80ad9409aa8298d40bcf8ed412 | 13:19 |
| GitHub169 | [migen/master] fhdl: arrays (TODO: use correct BV for intermediate signals) - Sebastien Bourdeauducq | 13:19 |
| wpwrak | and i think it would help to keep code paths short if you separate TLB fault and permission exceptions | 13:20 |
| Fallenou | http://gattis.github.com/milkshake/ < ahah nice WebGL milkdrop renderer | 13:20 |
| wpwrak | you also get a pseudo-exception, which is the page fault (i.e., if a page isn't present). that would basically be a continuation of the TLB fault | 13:21 |
| wpwrak | in case we add a page table walker later, it would become an exception on its own and the TLB faults would disappear | 13:21 |
| Fallenou | wpwrak: I would say : 1°) TLB miss exception, TLB is refilled, access is replayed 2°) protection fault because the type of access violates the page's access rights | 13:21 |
| Fallenou | so two exceptions | 13:21 |
| Fallenou | wpwrak: page table walker is really a mess to implement | 13:22 |
| Fallenou | it will touch a much broader part of the lm32 source code | 13:22 |
| wpwrak | two exceptions sounds good. checking permissions in software may be messy | 13:22 |
| Fallenou | I am trying to touch the source code as little as possible | 13:22 |
| wpwrak | (page table walker) hmm, dunno. we'll see. | 13:23 |
| wpwrak | btw, have you thought about the case of having multiple faults in the pipeline ? | 13:23 |
| Fallenou | not yet, sorry :( | 13:27 |
| wpwrak | well, it's also something that can hopefully wait a bit :) | 13:35 |
| Fallenou | this week I will have very little time to spend on MMU | 13:42 |
| Fallenou | but starting next week I will have a looooot more time than since january | 13:43 |
| Fallenou | (no I'm not getting fired :p) | 13:43 |
| wpwrak | have you convinced your boss to let you work on it at work ? :) | 13:45 |
| Fallenou | hehe no | 13:46 |
| Fallenou | and I don't think it will ever happen :p | 13:46 |
| Fallenou | unless I convince them to integrate lm32 in their next ASIC | 13:46 |
| wpwrak | heh :) now you have an objective | 13:47 |
| Fallenou | which they won't since they are using more powerful ARM cores | 13:47 |
| Fallenou | they are prototyping an asic with a Cortex A9 inside | 13:52 |
| Fallenou | I think they won't switch to lm32 ^^ | 13:52 |
| wpwrak | hmm yes, may be hard to sell them that idea | 13:55 |
| Fallenou | I could convince them, they only have 128 TLB entries ;) | 13:56 |
| Fallenou | "with lm32 you can have 1k and more !" | 13:56 |
| sh4rm4 | hmm is an arm cortex that much faster than lm32 on an asic ? afaik you can only get ~500 mhz anyway | 14:04 |
| Fallenou | arm cortex -> 800 MHz | 14:08 |
| Fallenou | and you can get it up to 2 GHz | 14:08 |
| lekernel | clock frequency is easy, just add more pipeline stages | 14:13 |
| lekernel | doesn't mean software runs faster though ;) | 14:13 |
| Fallenou | hehe sure | 14:14 |
| wpwrak | for stages -> infinity: fCLK -> infinity && work_done -> 0 ;-) | 14:15 |
| Fallenou | for stages -> infinity: time_for_instruction_completion -> infinity | 14:33 |
| Fallenou | :p | 14:33 |
| wpwrak | yup :) also makes debugging much easier. you'll never get an incorrect result :) | 14:37 |
| Fallenou | lekernel: you canceled your presentation at RMLL ? | 14:39 |
| kristianpaul | wolfspraul: for me it's basically because there are "lots" of flip-flops, which is the basis of RTL, adding that to wpwrak's answer | 15:00 |
| wolfspraul | why does "lots of ff" favors sync over async? | 15:04 |
| wolfspraul | favor | 15:05 |
| kristianpaul | because of the clock distribution around them | 15:06 |
| kristianpaul | i haven't looked in detail, but flip-flops on an s6 LUT must have a clock signal, right? | 15:06 |
| wolfspraul | yes | 15:07 |
| wpwrak | i don't think the FFs per se make a difference | 15:08 |
| wpwrak | but if you assume that all of your FFs will use the same clock, it's easier to have a good clock distribution | 15:09 |
| kristianpaul | okay, so there is a clock distribution in the s6 switch matrix that easily allows having, like you said, "main" clocks or such | 15:09 |
| wolfspraul | we are all just guessing :-) | 15:10 |
| kristianpaul | ;) | 15:10 |
| wpwrak | wolfspraul: you're the expert on the internal structures, so you should know :) | 15:10 |
| wolfspraul | nah | 15:10 |
| kristianpaul | yes ! ;-) | 15:10 |
| wolfspraul | it's not just the structures, it's what they mean | 15:10 |
| lekernel | there are lots of FFs (which are synchronous elements, already) and a limited clock routing | 15:10 |
| wpwrak | i.e., do large groups of FFs generally share the same clock ? | 15:10 |
| wolfspraul | why are the ff synchronous elements? | 15:11 |
| lekernel | if you have too many clocks, they will use the local interconnect, which will cause a lot of skew | 15:11 |
| lekernel | because they have a clock | 15:11 |
| kristianpaul | yup | 15:11 |
| wolfspraul | each slice has a sync/async flag, what does that mean? | 15:11 |
| lekernel | as opposed to e.g. latches | 15:11 |
| lekernel | though you can also use some FFs in latch mode | 15:11 |
| wpwrak | (skew) ah, so you could actually go async if you insist. but at a cost. | 15:11 |
| wolfspraul | half of them, but I think you then lose the other half :-) (nice punishment) | 15:12 |
| lekernel | in fact, if you can deal with the skew problems, you can probably have nice async circuits | 15:12 |
| lekernel | but the xilinx toolchain won't do it | 15:12 |
| lekernel | you need a completely new toolchain if you want to do async | 15:12 |
| wpwrak | slight complication :) | 15:12 |
| wolfspraul | what does the sync/async flag in each slice mean? | 15:12 |
| wpwrak | wolfspraul: so that's a feature for version 2 then ;-) | 15:13 |
| lekernel | verilog/vhdl aren't even nice languages for async either | 15:13 |
| lekernel | also, no one uses async these days... except in very limited portions of designs... | 15:13 |
| wpwrak | lost on a maze of little always blocks :) | 15:13 |
| lekernel | yes, exactly | 15:14 |
| wpwrak | s/on/in/ | 15:14 |
| wolfspraul | ok I got it now - thanks a lot! | 15:14 |
| lekernel | what's the per-slice sync/async flag? | 15:15 |
| lekernel | Fallenou: yes, can't go for multiple reasons | 15:16 |
| Fallenou | lekernel: too bad :( | 15:18 |
| qi-bot | The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1520/ | 16:50 |
| Fallenou | mwalle: activating and deactivating both TLBs is what I called "going into kernel/user mode" | 20:29 |
| Fallenou | the name is badly chosen I agree | 20:30 |
| Fallenou | I will change this :) | 20:30 |
| Fallenou | documentation is now available there : https://github.com/fallen/milkymist-mmu/wiki/Documentation-of-milkymist-mmu | 20:31 |
| Fallenou | so switching off I/D TLB is using the command "switch to kernel mode" : number 5'h8 | 20:32 |
| Fallenou | but since lowest bit is for choosing ITLB or DTLB | 20:32 |
| Fallenou | you multiply it by two (shift left) | 20:32 |
| Fallenou | so you write 0x10 to TLBCTRL | 20:32 |
| Fallenou | lowest bit 0 => acts on ITLB | 20:32 |
| Fallenou | writing 0x11 to TLBCTRL => lowest bit 1 => acts on DTLB | 20:33 |
| Fallenou | err all of this is switching ON the TLBs ... sorry I was wrong in my last email | 20:34 |
| Fallenou | ok I'm definitely tired ... | 20:35 |
| Fallenou | I repeat correctly, switching OFF I/D TLB is using the command "switch to kernel mode" : number 5'h4 | 20:36 |
| Fallenou | so you shift left the command ID | 20:37 |
| Fallenou | it becomes 0x8 | 20:37 |
| Fallenou | so you write 0x8 to TLBCTRL to switch OFF ITLB, and 0x9 (lowest bit set) to switch OFF DTLB. | 20:37 |
| Fallenou | that's my last word ! | 20:37 |
| qi-bot | The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1910/ | 20:46 |
| --- Tue Jul 10 2012 | 00:00 | |
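
The TLBCTRL encoding that Fallenou settles on between 20:32 and 20:37 (command ID shifted left by one, lowest bit selecting ITLB or DTLB) can be sketched in C as below. This is only an illustration: the enum and function names are hypothetical and are not taken from the milkymist-mmu sources. Only the numeric values (5'h4, 5'h8, 0x8, 0x9, 0x10, 0x11) come from the log, and the reading of 5'h8 as "switch ON" follows the 20:34 correction.

```c
#include <assert.h>
#include <stdint.h>

/* Command IDs as quoted in the log; the names are hypothetical. */
enum tlb_cmd {
    TLB_CMD_OFF = 0x4, /* "switch to kernel mode", i.e. TLB switched off */
    TLB_CMD_ON  = 0x8  /* per the 20:34 correction, this one switches the TLB on */
};

/* The lowest bit of the written value selects which TLB the command acts on. */
enum tlb_sel {
    TLB_SEL_ITLB = 0,
    TLB_SEL_DTLB = 1
};

/* Build the value to write to TLBCTRL: command ID shifted left by one,
 * ORed with the ITLB/DTLB selector bit. */
static inline uint32_t tlbctrl_value(enum tlb_cmd cmd, enum tlb_sel sel)
{
    return ((uint32_t)cmd << 1) | ((uint32_t)sel & 1u);
}

/* Checks the helper against the four values spelled out in the log. */
void tlbctrl_selfcheck(void)
{
    assert(tlbctrl_value(TLB_CMD_OFF, TLB_SEL_ITLB) == 0x08u);
    assert(tlbctrl_value(TLB_CMD_OFF, TLB_SEL_DTLB) == 0x09u);
    assert(tlbctrl_value(TLB_CMD_ON,  TLB_SEL_ITLB) == 0x10u);
    assert(tlbctrl_value(TLB_CMD_ON,  TLB_SEL_DTLB) == 0x11u);
}
```

Actually writing the value to the CSR is left out here, since the log does not show the write instruction or the CSR name the assembler expects.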
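
The address layout discussed at 09:24-09:32 and 11:55 (lowest bit selects the TLB, the rest of the page offset is free for things like permission bits, bits [21:12] are mentioned separately) can be illustrated with a few C helpers. This is a sketch under assumptions: 4 KB pages (the "12 free bits" of page offset), a 1024-entry TLB, and bits [21:12] acting as the TLB index, which is inferred from the log rather than stated in it. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u    /* assumed 4 KB pages: 12-bit page offset */
#define TLB_ENTRIES 1024u  /* the "1 k TLB entries" from the log */

/* Bits [21:12]: assumed to index the 1024-entry TLB. */
static inline uint32_t tlb_index(uint32_t addr)
{
    return (addr >> PAGE_SHIFT) & (TLB_ENTRIES - 1u);
}

/* Bit 0: selects which TLB the write targets (0 = ITLB, 1 = DTLB). */
static inline uint32_t tlb_select(uint32_t addr)
{
    return addr & 1u;
}

/* Bits [11:1]: the 11 remaining page-offset bits, free for flags. */
static inline uint32_t tlb_flag_bits(uint32_t addr)
{
    return (addr >> 1) & 0x7FFu;
}

/* Page-aligned base address with the offset bits cleared. */
static inline uint32_t tlb_page_base(uint32_t addr)
{
    return addr & ~((1u << PAGE_SHIFT) - 1u);
}

/* Worked example: 0x00401003 has page base 0x00401000, index 1,
 * selector bit set (DTLB) and flag bits equal to 1. */
void tlb_fields_selfcheck(void)
{
    assert(tlb_page_base(0x00401003u) == 0x00401000u);
    assert(tlb_index(0x00401003u) == 0x001u);
    assert(tlb_select(0x00401003u) == 1u);
    assert(tlb_flag_bits(0x00401003u) == 0x001u);
}
```

With 10 index bits and a 12-bit offset, every address bit from 0 to 21 has a role when a TLB entry is written, which matches the "11 free bits" count given at 09:32.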