| Nick | Message | Time |
|---|---|---|
| stekern | I've been playing with running openrisc (mor1kx) on M1 again, this time using milkymist-ng. I've made the stuff available here: https://github.com/skristiansson/milkymist-ng-mor1kx/ if someone might be interested. | 07:03 |
| stekern | it was actually pleasantly painless to drop it in to milkymist-ng, most changes are of sw nature | 07:05 |
| lekernel | stekern, cool | 08:39 |
| lekernel | still able to run at 83MHz? | 08:39 |
| lekernel | seems so... :) | 08:40 |
| lekernel | Number of Slice LUTs: 6,766 out of 27,288 24% | 08:43 |
| stekern | yup, it's not very small (yet) | 08:43 |
| stekern | a couple of features can be omitted (like the internal timer, overflow exceptions, add with carry etc) | 08:44 |
| lekernel | that's for the whole SoC - doesn't look bad... | 08:45 |
| lekernel | let me get the precise number for the lm32 version | 08:45 |
| stekern | I got ~1700 slices for mor1kx and ~700 for lm32 I think | 08:45 |
| lekernel | Number of Slice LUTs: 4,700 out of 27,288 17% | 08:47 |
| lekernel | so, 2K LUTs to go ... :p | 08:53 |
| stekern | yep ;) | 08:53 |
| stekern | I'll update the configuration so it's closer to the lm32 one, but I think lm32 still beats it | 08:54 |
| lekernel | you can try this too | 08:55 |
| lekernel | http://www.eecs.umich.edu/mibench/ | 08:55 |
| stekern | the or1k architecture has a lot of special registers that need quite some space | 08:56 |
| lekernel | can't push them into BRAM? | 08:56 |
| lekernel | if they are seldom used, multi-cycle access may be acceptable | 08:57 |
| stekern | I've been meaning to run those benchmarks; up until now I've only run coremark and dhrystone | 08:57 |
| lekernel | btw one thing I want to do with migen is automatic virtual BRAM ports using multiplied + phase aligned clocks | 08:58 |
| stekern | yeah, pushing them into bram is something that could be done for some of them, but the whole address space is a bit annoying (basically they are divided into groups) | 08:58 |
| lekernel | BRAM is fast, several hundred MHz, while the rest of the fabric is the slowness pig we know | 08:58 |
| lekernel | so you could easily have a 4-port BRAM out of a 2-port BRAM with 2x clock multiplication for many designs | 08:59 |
| stekern | hmm, aren't bram outputs usually slower than register outputs? | 09:27 |
| lekernel | there's some clock-to-output delay, yes | 09:29 |
| lekernel | on slowtan6 it's 2.10ns, or 1.75ns if you enable the output register (i.e. reads take 2 cycles, pipelined) | 09:31 |
| lekernel | setup/hold are all under 1ns | 09:31 |
| lekernel | and you can clock at 280MHz max | 09:32 |
| lekernel | most designs aren't that fast | 09:32 |
| lekernel | this output register seems pretty useless, if all you get is 0.35ns of extra time at the output at the expense of one more cycle of latency ... | 09:33 |
| lekernel | you certainly get a better deal by registering outside the BRAM | 09:37 |
| lekernel | another crippled S6 feature it seems... | 09:39 |
| stekern | it's only for marketing, "with built-in registers" ;) | 09:40 |
| stekern | of course, if you need the result registered and the output delay isn't a problem, you'd benefit from them, but the use cases sound a bit restricted, yes | 09:43 |
| Fallenou_ | lekernel: I don't remember the reason why Milkymist(-ng or not) SoC uses a lm32 core configured with 512 B of I and D cache, knowing that caches can go up to 32 kB on lm32 | 13:57 |
| Fallenou_ | keeping the caches below or equal to 4 kB is indeed helping for the MMU part (no cache alias problem) | 14:00 |
| Fallenou_ | in some of your slides there is a graphic about cache hit probability, you seemed to have chosen 32 kB at that time in order to get a cache hit 95% of the time | 14:04 |
| Fallenou_ | but I remember you had synthesis issues with big caches as well ... | 14:05 |
| Fallenou_ | on the other hand 512 B seems small, when you know you can go up to 4 kB without risking any cache aliasing (caused by VIPT cache) | 14:05 |
| lekernel | it's not 512 bytes, it's 256*16 bytes | 14:54 |
| lekernel | so 4K | 14:54 |
| lekernel | big caches cause timing problems on slowtan6 | 14:54 |
| Fallenou | right, I mixed up things, I did 256*2 instead of *16 ... (habit of converting 16 bits to 2 bytes ...) | 14:57 |
| Fallenou | ok so 4K, perfect :) | 14:57 |
| lekernel | I might want larger caches when moving to a less slow FPGA... | 14:59 |
| Fallenou | it would be cool to allow software to read the cache size then | 14:59 |
| Fallenou | for the OS to adapt and handle cache aliasing issues when they are possible | 15:00 |
| lekernel | send a patch :) | 15:00 |
| Fallenou | the SH4 CPU allows reading the cache size, for instance | 15:00 |
| Fallenou | I won't hesitate ;) | 15:00 |
| Fallenou | it may end up in CFG2 or maybe in a CFG3 | 15:00 |
| Fallenou | for now, I will only handle the current cache configuration hard coded in NetBSD kernel | 15:01 |
| Fallenou | first things first :) | 15:01 |
| GitHub97 | [NetBSD] fallen pushed 2 new commits to master: http://git.io/rq36JA | 15:04 |
| GitHub97 | NetBSD/master 7be2287 Yann Sionneau: Update TODO | 15:04 |
| GitHub97 | NetBSD/master ed27bd4 Yann Sionneau: Move TLB helpers into cpu.h | 15:04 |
| GitHub46 | [NetBSD] fallen pushed 1 new commit to master: http://git.io/ZPUoqA | 15:06 |
| GitHub46 | NetBSD/master 81a01e8 Yann Sionneau: Add implementation of pmap.9 MD functions... | 15:06 |
| lekernel | the figure you are talking about is about the TMU cache, not the CPU cache | 15:09 |
| Fallenou | oh, ok | 15:10 |
| Fallenou | 95% seemed a bit high :) | 15:10 |
| stekern | that's interesting, I would have expected the opposite: larger cache, fewer tag bits to compare against, fewer timing problems. | 15:35 |
| stekern | Fallenou: on or1k, you can read the cache size out of an spr, but then we are 2000 luts larger than lm32 too ;) | 15:37 |
| lekernel | you need more BRAMs too | 15:38 |
| lekernel | so they spread on more area on the chip, and then the particularly slow S6 routing does the rest ... | 15:39 |
| Fallenou | stekern: that's convenient :) | 15:42 |
| stekern | lekernel: ah, yeah, that of course makes sense, several brams might slow things down | 15:46 |
| stekern | (aliasing) that's another thing that's slightly annoying in or1k, you max out at 16kb with a 2-way cache if you don't want to worry about it | 15:50 |
| stekern | what page size did you decide on in the end? | 15:51 |
| Fallenou | for now I'm going for 4 kB pages | 15:53 |
| Fallenou | it seems to be the size used almost everywhere (except "big pages" options and such) | 15:53 |
| stekern | oh, so you're actually worse off in that regard ;) | 16:06 |
| Fallenou | when you say 16 kB, is it total cache size ? taking into account the associativity ? | 16:08 |
| stekern | what are the benefits of having a smaller cache size? for me it was already predefined to 8kb when I did the mmus for mor1kx, so I haven't given it much thought | 16:08 |
| stekern | err, smaller page size | 16:09 |
| Fallenou | you get more fine grain management of your virtual memory | 16:09 |
| Fallenou | so less fragmentation I would say | 16:09 |
| Fallenou | I mean, the kernel in a few places allocates in multiples of the page size | 16:10 |
| Fallenou | but it does not need that much memory usually (8 kB or 16 kB) | 16:10 |
| Fallenou | But personally I didn't give the page size a big thought | 16:11 |
| stekern | yeah, that's obvious of course, but is there something else? perhaps that's reason enough though | 16:12 |
| Fallenou | I took 4 kB as granted because it's almost everywhere in the literature | 16:12 |
| Fallenou | for instance, on recent linux kernel for x86, do they use bigger pages? (for Ubuntu, debian etc) | 16:12 |
| Fallenou | I know there is an option for that, but I don't know if it's checked or not | 16:13 |
| stekern | (16kb) yes, total cache, 2*8kb | 16:13 |
| GitHub177 | [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/b65uyw | 16:13 |
| GitHub177 | milkymist-ng/master 8e76c96 Sebastien Bourdeauducq: timer, uart: EventSourceLevel -> EventSourceProcess | 16:13 |
| GitHub42 | [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/f9HVcQ | 16:14 |
| GitHub42 | migen/master b9b6df6 Sebastien Bourdeauducq: bank/eventmanager: refactor, rename EventSourceLevel -> EventSourceProcess, add fully externally controlled event source | 16:14 |
| stekern | you could of course do more ways, but then the replacement logic becomes a lot more complicated | 16:14 |
| Fallenou | well not that much, if you use round robin for instance | 16:15 |
| stekern | at least if you use lru | 16:15 |
| Fallenou | but it's still the same issue of routing more block rams etc | 16:15 |
| Fallenou | I wonder if lru has a big impact on performance | 16:16 |
| Fallenou | when you have 2 ways for instance | 16:16 |
| stekern | lru for 2-way is dead simple | 16:18 |
| stekern | just 1 bit to check against | 16:19 |
| stekern | but, I agree, round robin for 4-way wouldn't be that complex | 16:20 |
| Fallenou | I mean, is it really better than rr ? | 16:20 |
| stekern | maybe I should do that, keep lru for 2-ways, and do rr for 4-ways | 16:21 |
| stekern | it probably depends on the application, I haven't done any comparisons | 16:22 |
| stekern | but I've got the impression that it would be better | 16:23 |
| Fallenou | I would think that when you increase the number of ways, indeed lru can start to get interesting, because you really have a "bigger" choice to make | 16:26 |
| Fallenou | 1 among 4 (or 8 or more) | 16:27 |
| Fallenou | but 1 among 2 seems a poor choice anyway | 16:27 |
| Fallenou | using rr or lru | 16:27 |
| Fallenou | I think only a very precise software benchmark could really give better performance with lru than rr when the associativity is 2 | 16:28 |
| Fallenou | but it's just feeling, I don't really know :) | 16:28 |
| stekern | yeah, it probably doesn't make a big difference | 16:31 |
| GitHub17 | [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/5GmfyA | 17:00 |
| GitHub17 | migen/master 10212e8 Sebastien Bourdeauducq: dma_asmi: cleanup | 17:00 |
| GitHub59 | [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/-OfjOA | 19:00 |
| GitHub59 | milkymist-ng/master 89dbc37 Sebastien Bourdeauducq: cif: do not generate write function for CSRStatus | 19:00 |
| GitHub126 | [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/8HoOow | 19:01 |
| GitHub126 | migen/master c82b53f Sebastien Bourdeauducq: bank/description/AutoCSR: add autocsr_exclude | 19:01 |
| GitHub48 | [milkymist-ng] sbourdeauducq pushed 3 new commits to master: http://git.io/T96SmA | 20:33 |
| GitHub48 | milkymist-ng/master 29efa85 Sebastien Bourdeauducq: dvisampler: new DMA engine (buggy) | 20:33 |
| GitHub48 | milkymist-ng/master b3d87e1 Sebastien Bourdeauducq: software/videomixer: use new DMA engine | 20:33 |
| GitHub48 | milkymist-ng/master 66b4bae Sebastien Bourdeauducq: top: connect dvisampler DMA IRQs | 20:33 |
| GitHub82 | [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/lTVgFQ | 20:52 |
| GitHub82 | milkymist-ng/master d685ed2 Sebastien Bourdeauducq: dvisampler/dma: bugfixes | 20:52 |
| lekernel | yay! all works now. | 21:00 |
| lekernel | there is some noise on the picture that I suspect is due to poor signal integrity (SI) | 21:00 |
| wpwrak | you get clean frames ? | 21:00 |
| lekernel | yes | 21:00 |
| wpwrak | congratulations ! | 21:00 |
| lekernel | on the VGA framebuffer, and in color :) | 21:00 |
| wpwrak | kewl | 21:00 |
| lekernel | with just a couple random pixels | 21:01 |
| lekernel | probably SI, it gets worse when the pixel clock increases | 21:01 |
| lekernel | and 800x600 is pure noise (even sync fails) | 21:04 |
| lekernel | well I hope the direct TMDS board will fix this issue | 21:04 |
| lekernel | I think I can consider myself lucky that at least 640x480 works :) it's pretty much on the brink of failure, and debugging SI would waste days... | 21:05 |
| wpwrak | yeah, the expansion header isn't a great place for high-speed signals | 21:05 |
| wpwrak | indeed. and now you can also do the mixing and fading :) | 21:06 |
| lekernel | if I can get my chB to work... parts for assembling two extra boards are with fedex atm... | 21:07 |
| Fallenou | congratz :) | 21:09 |
| wpwrak | always order a generous number of spares :) | 21:10 |
| Fallenou | is there no error-correction code in DVI to fix SI-caused errors? | 21:10 |
| larsc | nope | 21:15 |
| larsc | HDMI has BCH for data islands though | 21:16 |
| larsc | DVI is really just VGA in digital | 21:16 |
| Fallenou | ok | 21:17 |
| --- Thu May 9 2013 | 00:00 | |
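
The LUT figures quoted between 08:43 and 08:53 reduce to a short calculation. A minimal sketch using only the whole-SoC slice LUT counts from the log; the helper name `utilization` is just for illustration:

```python
# Whole-SoC slice LUT counts quoted in the log (08:43 and 08:47).
TOTAL_LUTS = 27_288      # slice LUTs available on the M1's FPGA
MOR1KX_SOC = 6_766       # milkymist-ng with mor1kx
LM32_SOC = 4_700         # milkymist-ng with lm32


def utilization(used: int, total: int = TOTAL_LUTS) -> float:
    """Return LUT utilization as a percentage of the device."""
    return 100.0 * used / total


difference = MOR1KX_SOC - LM32_SOC
print(f"mor1kx SoC: {utilization(MOR1KX_SOC):.1f}% of LUTs")
print(f"lm32 SoC:   {utilization(LM32_SOC):.1f}% of LUTs")
print(f"gap:        {difference} LUTs")
```

This gives roughly 24.8% and 17.2%, so the 24% and 17% in the map report are truncated values, and the gap of 2,066 LUTs is the "2K LUTs to go" from 08:53.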
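
lekernel's idea from 08:58 to 08:59 (clock a 2-port BRAM at twice the system clock to get 4 usable ports) can be modelled behaviorally. The sketch below is only a scheduling model under stated assumptions: `TimeMultiplexedRam` and `max_system_clock_mhz` are hypothetical names, not a migen API, and the model ignores the clock generation, phase alignment and clock-domain crossing a real implementation needs. The 280 MHz figure is the Spartan-6 BRAM maximum quoted at 09:32.

```python
class TimeMultiplexedRam:
    """Hypothetical behavioral model: a RAM with `physical_ports` ports, clocked
    `factor` times faster than the system clock, presenting
    physical_ports * factor logical ports per system-clock cycle."""

    def __init__(self, depth: int, physical_ports: int = 2, factor: int = 2):
        self.mem = [0] * depth
        self.physical_ports = physical_ports
        self.factor = factor

    @property
    def logical_ports(self) -> int:
        return self.physical_ports * self.factor

    def system_cycle(self, requests):
        """Serve one system-clock cycle.

        `requests` holds one entry per logical port: None (idle),
        ("r", addr) or ("w", addr, data). Returns read data per logical
        port (None for idle ports and writes).
        """
        if len(requests) != self.logical_ports:
            raise ValueError("need one request per logical port")
        results = [None] * self.logical_ports
        # Each fast-clock phase lets every physical port serve one logical port,
        # so logical port k is handled in phase k // physical_ports. A write in an
        # earlier phase is visible to reads in later phases of the same cycle;
        # same-address collisions within one phase are not modelled.
        for phase in range(self.factor):
            for port in range(self.physical_ports):
                k = phase * self.physical_ports + port
                req = requests[k]
                if req is None:
                    continue
                if req[0] == "w":
                    _, addr, data = req
                    self.mem[addr] = data
                else:
                    _, addr = req
                    results[k] = self.mem[addr]
        return results


def max_system_clock_mhz(ram_fmax_mhz: float, factor: int) -> float:
    """The RAM runs at factor x the system clock, which caps the system clock."""
    return ram_fmax_mhz / factor


ram = TimeMultiplexedRam(depth=16)
print(ram.system_cycle([("w", 3, 42), None, ("r", 3), None]))  # [None, None, 42, None]
print(max_system_clock_mhz(280, 2))  # 140.0
```

With 2x multiplication the system clock is capped at 140 MHz, still well above the 83 MHz the SoC runs at, which is why lekernel expects this to work "for many designs".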
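
The arithmetic behind "0.35ns of extra time at the expense of one more cycle of latency" (09:31 to 09:37) can be made explicit by expressing the saving as a fraction of the clock period. A minimal sketch using only the clock-to-output numbers from the log; 83 MHz is the SoC clock mentioned at 08:39, and 140 MHz is simply half the 280 MHz BRAM maximum:

```python
# Spartan-6 BRAM clock-to-output figures quoted at 09:31 (ns).
CLK_TO_OUT_NS = 2.10        # output register disabled, 1-cycle read
CLK_TO_OUT_REG_NS = 1.75    # output register enabled, 2-cycle pipelined read


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def register_gain(freq_mhz: float) -> float:
    """Fraction of the clock period freed up by enabling the BRAM output register."""
    return (CLK_TO_OUT_NS - CLK_TO_OUT_REG_NS) / period_ns(freq_mhz)


for f in (83.0, 140.0, 280.0):
    print(f"{f:5.0f} MHz: period {period_ns(f):5.2f} ns, "
          f"register frees {100 * register_gain(f):4.1f}% of it for +1 cycle latency")
```

The gain is about 3% of the period at 83 MHz and stays under 10% even at the BRAM maximum, which supports the point that a flip-flop placed outside the BRAM is usually the better deal.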
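
The aliasing constraint discussed from 14:00 onwards (VIPT caches, 4 kB vs 8 kB pages) boils down to one rule: each cache way must fit within one page, so the index never uses virtual page number bits. A minimal sketch of that rule; the function names are hypothetical, and treating the lm32 cache as direct-mapped is an assumption (the log only gives 256 lines of 16 bytes). Direct-mapped is the worst case, and a 4 kB cache is alias-free with 4 kB pages at any associativity.

```python
def max_alias_free_cache(page_size: int, ways: int) -> int:
    """Largest VIPT cache (bytes) whose set index fits inside the page offset.

    The index and line-offset bits of one way must not reach into the virtual
    page number, so each way can be at most one page: total <= page_size * ways.
    """
    return page_size * ways


def alias_bits(cache_size: int, ways: int, page_size: int) -> int:
    """Number of index bits taken from the virtual page number (0 = no aliasing).

    Assumes power-of-two sizes, so the bit_length() difference equals the log2 difference.
    """
    way_size = cache_size // ways
    return max(0, way_size.bit_length() - page_size.bit_length())


KB = 1024
# lm32 in Milkymist: 256 lines of 16 bytes (14:54); direct-mapped assumed.
lm32_cache = 256 * 16
print(lm32_cache, alias_bits(lm32_cache, ways=1, page_size=4 * KB))  # 4096 0
# mor1kx: 8 kB pages, 2-way (15:50, 16:13).
print(max_alias_free_cache(8 * KB, ways=2))  # 16384
# The same 2-way organization with 4 kB pages ("worse off", 16:06).
print(max_alias_free_cache(4 * KB, ways=2))  # 8192
```

For comparison, a 32 kB direct-mapped cache (the lm32 maximum mentioned at 13:57) with 4 kB pages gives `alias_bits(32 * KB, 1, 4 * KB) == 3`, i.e. a physical line could land in up to 8 different cache locations, which the OS would then have to manage.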
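
The replacement-policy exchange (16:14 to 16:31) can be illustrated with a toy model: for 2 ways, LRU needs a single bit per set, while round robin needs a per-set counter that only advances on replacement. This is a sketch under stated assumptions: the class names are hypothetical, round robin is modelled with a per-set counter (some designs use one global counter instead), only a single set is simulated, and one hand-picked trace says nothing about real workloads, in line with "it probably depends on the application".

```python
class TwoWayLRU:
    """2-way LRU: one bit per set naming the least recently used way."""

    def __init__(self, num_sets: int):
        self.lru = [0] * num_sets

    def touch(self, set_idx: int, way: int) -> None:
        # After a hit or fill on `way`, the other way is the least recently used.
        self.lru[set_idx] = 1 - way

    def victim(self, set_idx: int) -> int:
        return self.lru[set_idx]


class RoundRobin:
    """Round robin: a per-set counter, advanced only on replacement."""

    def __init__(self, num_sets: int, ways: int):
        self.ways = ways
        self.counter = [0] * num_sets

    def touch(self, set_idx: int, way: int) -> None:
        pass  # hits do not change the replacement order

    def victim(self, set_idx: int) -> int:
        way = self.counter[set_idx]
        self.counter[set_idx] = (way + 1) % self.ways
        return way


def count_misses(trace, policy, ways: int) -> int:
    """Run a tag trace through a single cache set and count misses."""
    tags = [None] * ways
    misses = 0
    for tag in trace:
        if tag in tags:
            policy.touch(0, tags.index(tag))
        else:
            misses += 1
            way = policy.victim(0)
            tags[way] = tag
            policy.touch(0, way)
    return misses


trace = ["A", "B", "A", "C", "A"]
print(count_misses(trace, TwoWayLRU(1), ways=2))           # 3
print(count_misses(trace, RoundRobin(1, ways=2), ways=2))  # 4
```

Here LRU keeps the frequently reused `A` resident while round robin evicts it, so the difference only shows up for particular access patterns. For 4 ways, true LRU has to encode an ordering of the ways (at least 5 bits per set, since there are 24 orderings), which is where round robin becomes attractive.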