#milkymist IRC log for Tuesday, 2011-11-29

GitHub187[scripts] xiangfu pushed 2 new commits to master: http://git.io/vmKSDw06:19
GitHub187[scripts/master] reflash_m1.sh snapshot: don't flash data by default - Xiangfu Liu06:19
GitHub187[scripts/master] update the power-on message - Xiangfu Liu06:19
qi-botThe Firmware build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-11292011-0735/08:19
GitHub134[flickernoise] sbourdeauducq pushed 1 new commit to master: http://git.io/JI_FKw09:11
GitHub134[flickernoise/master] performance: fix unmapped key handling - Sebastien Bourdeauducq09:11
azonenberglekernel_: Any experience working with multiprocessor softcores?09:37
azonenbergI'm particularly interested in the interconnect09:38
azonenbergin terms of cache coherency and how multiple processors share the bus09:39
azonenbergi'm working on a triple-core SoC from scratch09:39
azonenbergand am designing the interconnect fabric now09:39
azonenbergi'm using a shared bus (only one core can talk at a time, but it's full duplex)09:39
lekernel_what's at the other end of the shared bus? DRAM?09:40
lekernel_also, softcores are slow. why not use dedicated accelerators?09:40
azonenbergA fixed-mapping MMU09:40
azonenbergThat splits the address bus between memory mapped IO and DDR209:40
azonenbergthe DDR2 has an L2 cache in front of it09:40
lekernel_MMU? you mean address decoder?09:40
azonenbergeach core has its own dedicated L109:40
azonenbergBasically, yes09:41
azonenberghardwired mapping09:41
azonenbergThe L1 is going to be structured in such a way as to be a passthrough for the IO address range and cache DRAM and flash addresses09:41
azonenbergthen DRAM and flash will have their own SoC-wide L2 caches09:41
azonenbergI know i'm reinventing the wheel a bit, it's mostly an educational exercise09:42
Action: azonenberg is writing a dissertation on computer architecture soon and wants to sharpen his skills first09:42
azonenbergBut it's actually going to be quite fast09:42
azonenbergon spartan6 -2 speed i am shooting for 200 MHz09:42
azonenberg* 2-way superscalar09:43
azonenberg= 800 mflops for 2 cores09:43
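The peak figure quoted above is clock rate times issue width times core count. A minimal sketch of that arithmetic, assuming (the log does not say so) that every issue slot retires one floating-point operation per cycle; the function name `peak_mflops` is hypothetical:

```python
def peak_mflops(clock_mhz, issue_width, cores):
    # Upper bound only: assumes one floating-point op per issue slot per cycle,
    # which is an assumption made for this sketch, not a claim from the log.
    return clock_mhz * issue_width * cores

# 200 MHz target, 2-way superscalar, 2 cores
print(peak_mflops(200, 2, 2))
```

Real code would land below this bound whenever an issue slot holds a non-floating-point instruction or a thread stalls.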
azonenbergI had to pipeline the heck out of it, but it's looking feasible09:44
lekernel_until you get timing paths into the bus arbiter? :)09:44
azonenbergActually, the bus arbiter is looking just fine09:46
azonenbergi just did a standalone test of it at 200 mhz and it works just fine09:46
azonenbergon hardware09:46
azonenbergMy solution to this thing is, pipeline it like crazy09:47
azonenbergit's a barrel processor09:47
lekernel_with all the cores and memory controller connected to it?09:47
azonenbergso a 16 stage pipeline means zero latency09:47
azonenbergand 32 stages means one stall09:47
azonenbergi run 16 threads and context switch every clock09:47
lekernel_ah, i see09:47
azonenbergRight now it's looking like when running out of L1 cache with a 16 stage pipeline i will have no stalls09:47
azonenbergdespite not having any forwarding whatsoever09:47
azonenbergan L1 cache miss that hits in L2 will most likely stall one instruction09:47
azonenbergif i can fit the L1=>L2 and back in 16 clocks09:48
azonenbergor 2 instructions if it takes me 3209:48
azonenbergas long as i can keep the entire bus structure pipelined09:48
azonenbergthis is a very GPU-esque architecture09:48
azonenberghiding latency by multithreading09:48
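The stall counts mentioned above follow from round-robin issue: with 16 threads, each thread gets one issue slot every 16 clocks, so any result that is ready within 16 clocks costs nothing. A minimal sketch of that reasoning, assuming strict round-robin scheduling and no forwarding; the function name `stalled_slots` and the way an L2 round trip is added to the pipeline depth are modelling assumptions, not details from the log:

```python
def stalled_slots(latency_cycles, threads=16):
    """Issue slots one thread loses while waiting for its previous instruction.

    The thread's next slot arrives `threads` cycles after the previous one,
    so it only stalls when the result takes longer than that.
    """
    slots_needed = (latency_cycles + threads - 1) // threads  # integer ceiling
    return max(0, slots_needed - 1)

PIPELINE_DEPTH = 16  # stages, as in the 16-stage case discussed above

cases = [
    ("16-stage pipeline, L1 hit", PIPELINE_DEPTH),
    ("32-stage pipeline, L1 hit", 32),
    ("L1 miss, L2 round trip of 16 clocks", PIPELINE_DEPTH + 16),
    ("L1 miss, L2 round trip of 32 clocks", PIPELINE_DEPTH + 32),
]
for label, latency in cases:
    print(f"{label}: {stalled_slots(latency)} stalled slot(s)")
```

The other 15 threads keep issuing during a stalled slot, so the core as a whole stays busy; only the waiting thread loses throughput.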
lekernel_what about cache miss rates when you have 16 threads switching so fast?09:48
azonenbergI envision it being something like CUDA, each thread executing mostly the same instructions09:48
azonenbergBut they can branch as they see fit09:48
azonenbergThe entire architecture is mostly an experiment09:49
lekernel_you should compile dedicated hardware accelerators ...09:49
azonenbergyou mean, ASIC level?09:49
lekernel_adding layers over layers makes things slow09:49
azonenbergSure, go get me $30K and i'll get it fabbed in MOSIS :p09:49
lekernel_yes, generate VHDL from CUDA directly09:49
azonenbergand no, this is mostly an educational exercise09:49
lekernel_no, I mean use the FPGA fabric directly09:49
azonenbergThe goal is to see how many flops i can pull out of a softcore CPU09:49
azonenbergrunning real code09:50
lekernel_softcores are only good to run housekeeping or legacy software09:50
azonenbergalso i have a project in mind that will involve me working with non-hardware people09:50
azonenbergI have dedicated accelerators for stuff like JPEG encoding that i'm working on09:50
azonenbergBut the flight control code has to be in C09:50
azonenbergor C++09:50
azonenbergor assembly09:50
azonenbergsince i am working with CS people who don't know hardware09:51
azonenbergSo i want to design a nice powerful architecture for them to run it on09:51
azonenbergthe other motivation as i said is just cutting my teeth on computer architecture09:51
azonenbergthis is not something i envision being a softcore forever, but custom ASICs are not cheap09:51
azonenbergif things go well and it works as planned i might try sending it out to mosis eventually09:52
azonenbergi would love to have a laptop running a CPU i designed09:52
azonenbergin 180nm TSMC or something09:52
azonenbergBut i'm not that advanced yet :p09:52
azonenbergI read your post about the latticemico32 synthesis lol09:53
azonenbergand i think my processor will be faster09:53
azonenbergBut i'd have to reimplement some of the xilinx hard IP cores like the memory controller09:54
azonenbergand their soft FPU09:54
azonenbergI'm pretty sure i can write a better FPU but i havent gotten around to it yet, and as long as it's interface-compatible with theirs it'd be a drop-in replacement09:54
lekernel_their soft fpu? what's that?09:56
lekernel_you're using coregen for a fpu?09:56
azonenbergYes, for now09:56
azonenbergi wanted to focus on the datapath and interconnect first09:56
azonenbergthen go and write myself an FPU when i had all of the surrounding stuff done09:56
azonenbergin the meantime i have theirs because it tells me an FPU of that size and speed is possible09:56
azonenbergiow, setting a lower bound09:56
azonenbergthen i can try and outperform it with an open one09:56
azonenbergCoregen lets you generate floating point add/sub, multiply, divide, and sqrt units separately09:57
azonenbergSo i'll replace them with my own one by one09:57
azonenbergBut again the focus for now is on the datapath and microarchitecture more than implementation09:58
lekernel_you can use the milkymist pfpu pipelines btw ...09:59
azonenbergThe goal here is to practice efficient pipelined architecture10:00
azonenbergSo i want to use as little premade code as possible10:00
azonenberglike i said i'm doing a thesis on computer architecture soon and i want practice10:00
lekernel_but you reused the coregen pipelines already :-)10:01
azonenbergTemporarily, so i could build the other stuff around them10:01
azonenbergit's not expected to stay10:01
azonenbergif i had used a free one i'd have less incentive to replace it :p10:01
lekernel_so that's what I get for developing free hardware ...10:04
azonenbergproduction project? Sure10:04
azonenbergBut for educational value sometimes it's better to reimplement10:04
azonenbergOnce i build mine, i'll compare it to yours and any other open ones i find10:05
azonenbergand use the best one in real projects10:05
qi-botThe Firmware build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-11292011-1026/11:28
GitHub122[flickernoise] sbourdeauducq pushed 5 new commits to master: http://git.io/va42-g12:38
GitHub122[flickernoise/master] Do not create ramdisk folder - Sebastien Bourdeauducq12:38
GitHub122[flickernoise/master] filedialog: lock in ssd - Sebastien Bourdeauducq12:38
GitHub122[flickernoise/master] filedialog: prevent slash in filenames - Sebastien Bourdeauducq12:38
GitHub168[flickernoise] sbourdeauducq pushed 1 new commit to master: http://git.io/TnE0CQ13:52
GitHub168[flickernoise/master] shutdown: rename button - Sebastien Bourdeauducq13:52
GitHub120[flickernoise] sbourdeauducq pushed 1 new commit to master: http://git.io/qJvhUw13:59
GitHub120[flickernoise/master] png: enable loading of RGBA images - Sebastien Bourdeauducq13:59
qi-botThe Firmware build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-11292011-1343/14:25
GitHub36[flickernoise] sbourdeauducq pushed 1 new commit to master: http://git.io/ND-mFA14:30
GitHub36[flickernoise/master] New patch - Sebastien Bourdeauducq14:30
xiangfuwhat is the difference between MicroBlaze and LM32?16:12
xiangfuare they the same kind of thing in an SoC system? LM32 is open, but what about MicroBlaze?16:12
wpwrakkinda like MIPS vs. ARM. same purpose, different origin, different style, etc.16:15
xiangfuwpwrak, got it.16:16
qi-botThe Firmware build was successfull, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-11292011-1535/16:18
lekernel_new screenshots16:22
kristianpaulMMM...  :-)16:33
kristianpaultoo much zoomed effects i think16:34
wpwrakbah. the end of the year is nearing. fireworks !! :)16:38
kristianpaulyeah , fireworks are nice16:38
lekernel_kristianpaul: if you design new patches that look better, there's no reason I would refuse them...16:39
Action: wpwrak is amazed by how well USB can work even though he completely misunderstood the handshake between fpga and navre ...16:40
wpwraklet's see if anything still works after fixing that16:40
wpwraki thought the SYNC would also set rx_pending ...16:41
lekernel_no, rx_pending is only set after the first byte is completely received16:41
wpwrak(but i never tried to retrieve it. sometimes, two wrongs make an almost right :)16:41
lekernel_but it doesn't make much change, does it?16:41
lekernel_(I mean the first byte of "payload" after the sync, ofc)16:42
wpwrakyeah, just means that my loop was a little late16:42
wpwrakand unnecessarily complicated, too16:43
kristianpaullekernel: i didn't mean that, it was just a comment (about what i like), no rush :-)16:44
kristianpauland no, i don't imagine designing patches soon16:45
AlarmWhat is the best way to load the latest binary onto the M1?17:38
lekernelAlarm: as I said, web update17:38
Alarmnot with the jtag ?17:41
lekernelno, JTAG is for developers17:41
lekerneland generally slower and harder to use than the web update if you just want a release upgrade17:42
lekernelhttp://www.linux-kvm.org/wiki/images/1/1f/2011-forum-usb.pdf "Remove funky (ab-)use of the usb devices in bluetooth and milkymist." wtf?17:43
kristianpaullekernel: nice !!!)17:48
kristianpaulOnce an application for custom ASIC cores, this demanding computer graphics process is now the province of low-cost FPGAs.17:48
AlarmThe problem is downloading the latest version. I'm using wget, but it's not great for a set of files17:49
lekernelthe M1 downloads the latest version itself17:50
lekerneljust connect it to your internet router ...17:50
wpwraklekernel: (ab-use) what on earth is that presentation about anyway ?18:00
lekernelUSB in QEMU it seems18:01
lekernelbut I asked myself the same question for a while ;)18:01
AlarmI want to do the update via JTAG for pedagogic reasons. The "WebUpdate" method is of no interest to me18:01
Alarmmy problem is basic. I am looking for a simple command to download the binaries18:03
Alarm"wget -r" pulls in all the files18:04
Action: lekernel is giving orcc a try. of course, hundreds of MB of java bloat to install ...18:51
kristianpaulsome comments from a friend "you can get a video switch for 8usd, but a mixer.. at minimum it should do fading from one picture to another"21:03
kristianpauland please dont be angry with me for posting this, i'm just replying comments21:04
lekernelthe M1 isn't a video switch or mixer. the switch functionality is just a little add-on. you can also get an arduino led blinker for $25 which can do the same as the front panel LEDs on the M1... same kind of stupid comparison21:05
wpwrakmixer may be tricky: you need two codecs for that21:05
wpwrakand i'm not sure if the chip we use has multiple codecs inside21:05
lekernelit does not21:06
lekernelM1 was never intended as a video mixer21:06
kristianpauli'm very excited to bug other friends about M1/FN new features and also bring back some feedback21:06
kristianpaulsure not21:06
lekernelthe main feature of this software update is image support - and stress that it can be used with MIDI controllers. the rest is secondary.21:08
kristianpaulsure sure21:11
kristianpauland for your happiness, he really likes the pacman video from wpwrak21:12
wpwrakand one more device enumerates :)21:12
wpwrakhehe ;-)21:13
wpwrakwe need a few more images per patch. then we can have real games :)21:13
kristianpaulwee :)21:14
wpwrakC64 retro style :)21:14
wpwrakof course, the LV3 is still mute. that one's a tough cookie21:14
wpwrakstekern: the latest patch set may also fix the low-speed regression you experienced.21:45
wpwrakstekern: at least it removes quite a bit of confusion i had added before :)21:46
stekernwpwrak: cool, do you keep those patches in a git repo somewhere?21:58
wpwrakonly locally21:59
stekernok, well, lekernel seems to be quite quick to apply them anyways22:00
stekernI need to sign up on the ML22:01
wpwrakyeah. he probably has his alarm clock connected to "grep PATCH" :)22:01
mwallelekernel: (usb abuse) thats qemu and it used the hid layer in a strange way22:50
mwallegerd and i fixed that some time ago ;)22:50
--- Wed Nov 30 201100:00
