#milkymist IRC log for Saturday, 2013-07-06

--- Sat Jul 6 201300:00
azonenbergwoot http://pastebin.com/raw.php?i=ZaYb9mvY06:05
azonenbergmy xc2c32a bitstream decompiler is coming along nicely06:06
azonenbergflipflop and clock config is the only thing i have left at the core conceptual level06:06
azonenbergthen i need to script decoding of the other 3/4 of the global routing matrix, i only have one quadrant done so far06:06
azonenberglekernel: http://pastebin.com/raw.php?i=ZaYb9mvY08:18
larscazonenberg: pretty neat08:58
azonenberglarsc: it's getting there, at this point i'd say only a few more days (of actual work, depends on my schedule when i can get to it) to have the whole chip pretty much decoded08:59
azonenbergAt which point i can stop the reverse engineering and start the forward engineering08:59
azonenbergas well as some refactoring to make it easier to scale the code to larger CR-II devices08:59
azonenbergright now i only support the 32a08:59
azonenbergthe 64a will be a fairly easy upgrade as the architectures are almost identical08:59
azonenbergthe 128 and larger are a different architecture which would require some code changes09:00
larschow does the architecture influence the bitstream format? Is it mostly different, or same structure but different data?09:01
azonenbergthe low end devices each have one input-only pin, i have not yet decoded the bits for using it09:02
azonenbergthe higher-end do not09:02
azonenberglow end are two I/O banks, larger is more (but thats an easy change too)09:02
azonenbergthe PLA is identical09:02
azonenbergthe global routing is different for each device but i have not yet seen any reason to believe the big are any more different from each other than the small09:03
azonenbergthe big difference is the low-end devices have one macrocell format09:03
azonenbergthe high-end have two, and it's not the same as that of the low end09:03
larschm09:03
azonenbergthey introduce a new class "buried macrocells" which are internal logic only, not broken out to pads09:03
azonenbergthese are basically a subset of the low-end macrocell09:04
azonenbergminus the I/O config09:04
azonenbergbut the non-buried mcells in the high-end devices have a lot of new options09:04
azonenbergfor example "datagate"09:04
azonenbergand SSTL capability on the inputs09:04
azonenbergthat will take more work09:04
azonenbergfirst priority is a full F/OSS toolchain for the 2c32a09:04
larscit's nice seeing you getting there09:05
azonenbergI think in another week i'll be able to go from bitstream to RTL source09:05
azonenbergthen use the xilinx tools to re-synthesize09:05
azonenbergand get a bit-for-bit identical output09:05
azonenberg(for the 32a)09:05
azonenbergor at least, logically equivalent09:06
azonenbergthere's a few spots where its hard to control what the optimizer does09:06
azonenbergwhich makes generating certain structures to study tricky :p09:06
larscif your tool were e.g. a compiler and the CPLDs were ISAs, would you say the difference between high-end and low-end is more like the difference between MIPS I and MIPS II or more like between MIPS and ARM?09:06
azonenbergI'd say it's closer to x86 and x86-64 lol09:07
larscok09:07
azonenbergLow end macrocell config (xilinx-generated comment in the bitstream)09:07
azonenbergN Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 InReg St XorIn:2 RegCom Oe:4 Tm Slw Pu*09:07
azonenbergsorry the N shouldnt be there09:07
azonenbergthats the comment marker09:08
azonenbergBuried macrocells in high-end09:08
azonenbergAclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:209:08
azonenbergI/O macrocells in high end09:08
azonenbergAclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:209:08
azonenbergthe DG bit enables/disables datagate but i dont know which is which yet09:08
azonenbergthe schmitt trigger (ST) bit was replaced with a two-bit "InMod" field09:09
azonenbergthis is to handle the fact that an IOB now has three encodings09:09
azonenbergnormal, schmitt trigger, SSTL comparator09:09
azonenbergvs just two09:09
azonenbergi dont know if the fourth value is used for anything09:09
azonenbergthen the PLA is the same09:09
azonenbergthe global routing differs from device to device09:09
azonenbergglobal clock stuff i have not decoded for the 32a yet so i dont know how that differs09:10
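
The macrocell field lists quoted above use a simple convention: a bare name is a one-bit field and `Name:N` is an N-bit field. A minimal Python sketch of reading such a list follows. The function names (`parse_spec`, `total_bits`, `decode`) are hypothetical, and it is an assumption that fields are packed back to back in the listed order; the chat does not confirm the physical bit order in the bitstream.

```python
from typing import Dict, List, Tuple

# Field lists as quoted in the chat. The leading "N" comment marker and the
# trailing "*" on the low-end line are dropped.
LOW_END = ("Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 "
           "InReg St XorIn:2 RegCom Oe:4 Tm Slw Pu")
HIGH_END_BURIED = "Aclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:2"
HIGH_END_IO = ("Aclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 "
               "P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:2")


def parse_spec(spec: str) -> List[Tuple[str, int]]:
    """Turn 'Aclk Clk:2 ...' into [('Aclk', 1), ('Clk', 2), ...]."""
    fields = []
    for token in spec.split():
        name, _, width = token.partition(":")
        fields.append((name, int(width) if width else 1))
    return fields


def total_bits(spec: str) -> int:
    """Number of configuration bits described by a field list."""
    return sum(width for _, width in parse_spec(spec))


def decode(spec: str, bits: str) -> Dict[str, str]:
    """Slice a string of '0'/'1' characters into named fields.

    ASSUMPTION: fields appear in the listed order with no padding.
    """
    fields = parse_spec(spec)
    expected = sum(width for _, width in fields)
    if len(bits) != expected or set(bits) - {"0", "1"}:
        raise ValueError(f"expected {expected} binary digits")
    out, pos = {}, 0
    for name, width in fields:
        out[name] = bits[pos:pos + width]
        pos += width
    return out


def field_names(spec: str) -> set:
    return {name for name, _ in parse_spec(spec)}
```

Under this reading the three lists describe 27, 16, and 29 bits respectively. Comparing `field_names` across them matches what is said in the chat: the buried list is a subset of the low-end list, and the high-end I/O list adds `DG` and `InMod` while dropping `St`.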
larscbut that's something that can be described using data (e.g. a big table), or do you have to write different code for each device?09:10
azonenbergMost code is going to be the same for all devices09:11
larscok09:11
azonenbergThere are several spots i have device-specific stuff though09:11
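
One way to picture the "big table" approach from the question above is a per-device record that shared code consults, with device-specific code only where a record cannot express the difference. The sketch below is illustrative only: `DeviceInfo`, `DEVICES`, and `needs_buried_support` are hypothetical names, not taken from the actual tool, and the values are limited to what is stated in this conversation (anything not stated is left as `None`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DeviceInfo:
    name: str
    has_input_only_pin: bool          # low-end parts have one, larger do not
    io_banks: Optional[int]           # None = count not stated in the chat
    macrocell_formats: Tuple[str, ...]


# Hypothetical table; only facts mentioned in the chat are filled in.
DEVICES = {
    "xc2c32a": DeviceInfo("xc2c32a", True, 2, ("low_end",)),
    "xc2c64a": DeviceInfo("xc2c64a", True, 2, ("low_end",)),
    "xc2c128": DeviceInfo("xc2c128", False, None, ("buried", "io")),
}


def lookup(name: str) -> DeviceInfo:
    """Fetch a device record, ignoring case."""
    try:
        return DEVICES[name.lower()]
    except KeyError:
        raise ValueError(f"unsupported device: {name}") from None


def needs_buried_support(dev: DeviceInfo) -> bool:
    """True if shared code must handle buried (internal-only) macrocells."""
    return "buried" in dev.macrocell_formats
```

With a table like this, adding a device that differs only in data is a one-line change, while a new macrocell format still needs new decoding code.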
azonenbergMy tools are on track to be *massively* faster than the xilinx ones lol09:11
larschehe09:11
azonenbergas a minimum, i'll take muuuuch less time to go from an in-memory model of the device to a bitstream09:13
azonenbergtheir tool takes like a second and a half09:13
azonenberglol09:13
azonenbergmine took 260ms to load a bitstream, parse it, decompile it, then reserialize :p09:13
azonenberg... and most of that time was spent in printf calls in the decompiler09:13
azonenbergthis is also debug -O0 builds09:14
azonenbergvs the xilinx shipping release build lol09:15
azonenbergWhen, in the future, I write any FPGA stuff09:15
azonenbergi will take great pains to design for scalability from the ground up09:15
azonenbergincluding parallel algorithms from day one09:15
larscyea09:15
azonenbergBut for CPLDs i think optimized serial code is fast enough09:16
larsccores per machine won't be getting less09:16
azonenbergLol09:16
azonenbergthat, and i have a rack of stuff in the living room09:16
azonenbergi dont want it gathering dust while i build with one core :p09:16
larscI guess another interesting thing is to actually split the task over multiple physical machines in a network09:17
azonenbergThat is the goal09:17
larscnice09:17
azonenbergI plan to design my FPGA tools to be extremely scalable09:17
azonenbergexactly what algorithms i use are TBD09:17
azonenbergBut i was reading some papers that got like 100x speedups on simulated annealing09:19
azonenbergbefore i do that i'd want to look at more future-looking algorithms that are deterministic though09:19
azonenbergso i dont end up like xilinx's tools :p09:19
azonenbergeven if i do twice as much work as the random algorithm09:19
azonenbergon 16 cores i can afford that :p09:20
azonenbergAnyway that's a LONG way out09:20
azonenbergCPLDs first09:20
larscdoesn't vivado use some kind of multivariable function solver to do P&R?09:23
azonenbergSo they claim, yes09:23
azonenbergi want to explore such algorithms09:24
azonenbergthey seem to give better results09:24
azonenbergWhich is why i dont want to implement SA09:24
larschm09:26
azonenbergRight now i'm actually tuning some cluster settings to make my research codebase build / test faster09:27
azonenbergi have lots of different dev boards and many test cases could run on any of several09:28
azonenbergso i'm trying to load-balance so that each board is used about the same amount09:28
azonenbergrather than having long queues on a few09:28
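
The load-balancing idea described above (each test can run on any of several boards, and the goal is roughly equal use per board) can be sketched as a greedy assignment. This is not the cluster's actual scheduler, which is not shown in the chat; `assign_tests`, the test names, and the board names are all hypothetical, and the per-test cost is an assumed estimate of run time.

```python
from typing import Dict, List, Sequence, Tuple

Test = Tuple[str, float, Sequence[str]]  # (name, estimated cost, eligible boards)


def assign_tests(tests: List[Test]) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Greedy least-loaded assignment.

    Tests with the fewest eligible boards are placed first (they have the
    least freedom), larger tests before smaller ones. Each test goes to the
    eligible board with the lowest accumulated cost so far.
    Returns (test -> board, board -> total cost).
    """
    loads: Dict[str, float] = {}
    for _, _, eligible in tests:
        for board in eligible:
            loads.setdefault(board, 0.0)

    assignment: Dict[str, str] = {}
    ordered = sorted(tests, key=lambda t: (len(t[2]), -t[1]))
    for name, cost, eligible in ordered:
        if not eligible:
            raise ValueError(f"test {name!r} has no eligible board")
        # Ties are broken by board name so the result is deterministic.
        board = min(eligible, key=lambda b: (loads[b], b))
        assignment[name] = board
        loads[board] += cost
    return assignment, loads
```

For example, with four unit-cost tests where only one is restricted to a single board, the restricted test is placed first and the remaining three are spread so that both boards end up with two tests each. A greedy pass like this does not guarantee an optimal balance, but it avoids the long queues on a few boards that the chat mentions.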
lekernelnice... AMOLED interface standards also use the I¬¬¬ model14:39
lekernelhttp://www.mipi.org/specifications/display-interface14:39
lekerneloooh but it *is* I¬¬¬14:44
lekernelhttp://www.ieee-isto.org/member-programs/mipi-alliance14:44
GitHub138[linux-milkymist] larsclausen pushed 4 new commits to master: http://git.io/mmYQZA14:49
GitHub138linux-milkymist/master f6c70cb Lars-Peter Clausen: lm32: Use free_reserved_area helpers...14:49
GitHub138linux-milkymist/master ac44f7f Lars-Peter Clausen: lm32: Put signal trampoline in static code...14:49
GitHub138linux-milkymist/master d9b73e7 Lars-Peter Clausen: lm32: Directly link against libgcc...14:49
larscysionnea1: https://github.com/milkymist/linux-milkymist/commit/d9b73e7c50bb795ea70962980b183db6675f8049 much easier than copying the files from libgcc15:23
lekernelmeanwhile... http://www.raspberrypi.org/phpBB3/viewtopic.php?f=7&t=2061#p39441 :)15:41
lekernelthe rpi way: "let's use a members-only standard implemented with a proprietary chip plus an obscure blob, and go conference-hopping in OSHW events"15:44
ysionnea1larsc: oh, very nice trick :)16:56
ysionnea1thanks16:57
ysionnea1lekernel: :/ sad16:59
ysionnea1larsc: ./obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc -print-libgcc-file-name17:09
ysionnea1libgcc.a17:09
ysionnea1that gives me only the filename, not the path17:09
larscgives me the full path here17:11
larsc${CROSS_COMPILE}gcc -print-libgcc-file-name17:12
larsc/opt/rtems-4.11/lib/gcc/lm32-rtems4.11/4.5.2/libgcc.a17:12
ysionnea1hum weird I'll have a look at my gcc source tree17:14
larschttps://lists.yoctoproject.org/pipermail/poky/2012-March/007675.html17:22
ysionneaumacbookprodeyannsionneau:NetBSD fallen$ $PWD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc --sysroot=$HOME/dev/NetBSD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/ -print-libgcc-file-name17:27
ysionneaulibgcc.a17:27
larscprobably your macbook or something ;)17:30
ysionneauyep, or maybe the lm32--netbsd toolchain with a wrong configuration17:33
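
The difference seen above (a full path on one machine, a bare `libgcc.a` on the other) is easy to detect in a build script. As far as I know, gcc prints the bare file name when it cannot locate the library in its search directories, so a non-absolute result is a sign of a misconfigured or incomplete toolchain. The sketch below assumes that behaviour; `libgcc_path_from_output` and `find_libgcc` are hypothetical helper names, and only the `-print-libgcc-file-name` option comes from the chat.

```python
import os
import subprocess
from typing import Optional


def libgcc_path_from_output(output: str) -> Optional[str]:
    """Return the libgcc path if gcc printed an absolute path, else None.

    A bare 'libgcc.a' is treated as "not found" (assumed gcc behaviour).
    """
    path = output.strip()
    return path if os.path.isabs(path) else None


def find_libgcc(gcc: str = "gcc") -> Optional[str]:
    """Ask the given compiler driver where its libgcc lives."""
    result = subprocess.run(
        [gcc, "-print-libgcc-file-name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return libgcc_path_from_output(result.stdout)
```

A build could call `find_libgcc(cross_prefix + "gcc")` and stop with a clear error when it returns `None`, rather than passing a bare `libgcc.a` to the linker.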
GitHub196[linux-milkymist] larsclausen pushed 1 new commit to master: http://git.io/cEnpSw18:24
GitHub196linux-milkymist/master 064001a Lars-Peter Clausen: lm32: Fix idle function...18:24
larscone fallout from the v3.10 merge, idle didn't work anymore18:25
wpwrak_azonenberg: (My tools are on track to be *massively* faster than the xilinx ones) what's next on your list ? competitive swimming against a pile of rocks ? ;-)18:36
azonenbergwpwrak_: lol19:44
azonenbergI was actually thinking a 100m sprint vs a tree sloth19:44
Action: wpwrak_ wonders how the sloth would perform if adding hornets to the equation19:46
larsca sloth is quite power efficient though19:52
wpwrak_optimally fine-tuned sleep states19:53
larscby years and years of natural selection, he who sleeps the most survives ;)19:55
--- Sun Jul 7 201300:00
