#qi-hardware IRC log for Sunday, 2012-09-02

DocScrutinizer05and sometimes you hear in coding style lessons to use for ( stream = SNDRV_PCM_STREAM_LAST; stream >(=)0; stream-- ) for optimizing reasons (compare to 0 faster&shorter than compare to const) 03:37
DocScrutinizer05anyway this form has the benefit of being to the point about how to interprete pragma-origin-0 and upper-limit03:40
DocScrutinizer05for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; stream-- )03:41
DocScrutinizer05for ( stream = --SNDRV_PCM_STREAM_AFTER_LAST; stream >(=)0; --stream )03:42
DocScrutinizer05actually I learnt about pre- vs post-in/decrement as well, one is better than the other, alas I already forgot which is which03:43
DocScrutinizer05I guess pre-*crement is better since it doesn't require creating a temporary variable03:44
DocScrutinizer05unless your compiler is slightly smarter than any assembler and recognizes that there's no need for any such var03:54
DocScrutinizer05jr@halebop:~> time for (( i=0; i<300000; i++ )); do :; done      #->  real    0m1.807s        ###  jr@halebop:~> time for (( i=300000; i>0; i-- )); do :; done     #->  real    0m1.694s04:01
DocScrutinizer05sure, it's shellscript04:02
DocScrutinizer05and then there's been that one CPU that didn't know decrements ;-P04:05
DocScrutinizer05for (( i=-300000; i<0; i++ ))04:06
wpwraksince the invention of SSA, you can pretty much stop worrying about temporary variables :)09:44
lindi-static single assignment09:45
wpwrakyup :)09:45
viricwhat's that like?09:45
lindi-viric: you only assign each variable once in the IR09:46
wpwrakyour compiler basically creates a new temporary variable for each basic operation09:46
viricand that's any good?09:47
wpwrakafterwards, it clears that up and removes everything it doesn't actually need09:47
wpwrakyes, it gives you very nice semantics for optimization09:47
larscDocScrutinizer05: your coding style lessons are from the 90ies ;)10:17
DocScrutinizer05though held 6  months ago10:17
larsce.g. there is performance wise no difference between i++; and ++i; with a modern compiler10:19
mththere can be a performance difference if you're using C++ iterators, but not for integers10:19
DocScrutinizer05I think I already mentioned that?10:19
DocScrutinizer05ooh yes, I forgot to recall to mention it is of course *very* depending on type of variable, and esp c vs c++10:20
DocScrutinizer05c++ you err compiler might even have to instanciate a new object10:21
DocScrutinizer05maybe not nowadays anymore, or more seldom nowadays10:22
larscmth: yes, looks like a bug10:22
DocScrutinizer05but this ENEA guy told us that as well10:22
DocScrutinizer05and it took him like 2 sentences resp 10s to handle the whole topic10:25
wpwraklarsc: early 90es ;-)10:26
DocScrutinizer05c'mon guys, cpu still is faster to check register for 0 than against a constant10:27
viric++ pre and post operators can be implemented completely different, for objects.10:27
larscDocScrutinizer05: depends on your arch10:28
DocScrutinizer05and that's sth a compiler usually can't auto-optimize10:28
mthlarsc: will you fix it or shall I fix it?10:29
wpwrakdepends. most compiler don't see beyond function calls, that's true. so they don't know whether there are side effects lurking.10:29
DocScrutinizer05I don't know any arch that wouldn't set flags on arithmetic operations like inc or whatever10:29
wpwrakbut if there are no calls or all the functions are inline (or maybe static), then there's a good chance the compiler can rearrange that check if it thinks it makes a difference10:30
DocScrutinizer05so comparing against 0 is basically for free10:30
larscmth: I'm lazy today10:30
mthok, I'll make a patch then; can I include your name in a header?10:31
wpwraki'm lazy too. so i won't look up one that doesn't set flags on inc/dec ;-)10:31
larscmth: sure10:31
DocScrutinizer05that for sure was too much for a lazy weekend10:31
wpwrakbut the point is that the compiler should be perfectly capable of changing the loop itself, provided the context isn't spread out too widely (in which case the optimization is irrelevant anyway)10:32
DocScrutinizer05compiler can't optimize a inc loop to a dec loop, since that's way beyond what compiler understands about code semantics10:33
larscthere are archs where you don't jump on flags, but the jmp or branch instruction does a compare10:33
larscso it does not really matter whether inc/dec sets a flag or not10:33
DocScrutinizer05dbnz 10:33
DocScrutinizer05yes, I know10:34
DocScrutinizer05I don't know a ibnz instruction10:34
DocScrutinizer05was it 8080 or z80 that invented decrement-and-branch-if-non-zero instruction?10:35
DocScrutinizer05this training lesson was targeted at embedded. The CPUs there are amazingly stupic little things sometimes. And still today you can find those10:37
DocScrutinizer05I strongly suspect a compiler will rather create a second variable it increments, inside a djnz loop, rather than comparing this variable against a constant for loop termination10:41
DocScrutinizer05of course when you're using a dec loop terminating at zero, the compiler can use the djnz register10:42
DocScrutinizer05wpwrak: and I really wanna see the compiler smart enough to usually detect this case and convert for (i=0, i<10,i++) to for(i=9, i>=0, i--)10:44
DocScrutinizer05it needs thorough understanding of all that's done inside the loop with i, and the side effects10:44
larscDocScrutinizer05: if i for example is not used inside the loop the compiler will do it10:45
larscif it actually is faster10:45
DocScrutinizer05yep, then it's safe10:45
DocScrutinizer05and that's the only safe case the compiler is 'smart' enough to detect10:46
DocScrutinizer05I'd say10:46
DocScrutinizer05anyway even bashscript is faster on comparing to 0 ;-D10:48
larscif it can be sure that the loop is side effect free, I'd say I can also optimize other cases. E.g. summing up a array10:48
DocScrutinizer05and while I've been aware of that since 1980, I tend to forget it in daily coding10:48
DocScrutinizer05securing side effect absence isn't trivial10:49
DocScrutinizer05it's possible 95% of cases, as long as you got 'clean' architecture10:50
DocScrutinizer05means no mem-mapped IO that the compiler doesn't know about, etc10:50
DocScrutinizer05even then in a multitasking environment there always might be differences10:51
larscDocScrutinizer05: e.g. foo and bar generate the same code: http://pastie.org/465024810:51
wpwrakand you didn't even enable optimization ;-)10:53
DocScrutinizer05larsc: sure, but in one of the cases your results may be bogus10:53
wpwrakDocScrutinizer05: how so ?10:54
DocScrutinizer05well, not in this very example10:54
wpwrakqed :)10:54
DocScrutinizer05 for (i = 0; i < 10; i++);  if (i=0 || i=1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] };    return tmp;10:57
DocScrutinizer05 for (i = 0; i < 10; i++);  if (i==0 || i==1) {while x[i]==0 {usleep 1000} } else { tmp += x[i] };    return tmp;10:58
wpwrakluckily, the loop optimization should be very simple here ;-)10:59
DocScrutinizer05sure, you shouldn't do that inside the loop anyway11:00
wpwrakdo what ? ;-)11:00
wpwrakyour loop is empty11:00
wpwrakwell, the for loop11:01
wpwrakand the while loop won't compile, so .. :)11:01
DocScrutinizer05I can spam the chan with another 5 to 10 typo fixed verions11:01
DocScrutinizer05though it's a pita to edit in this IRC client11:02
wpwrakthe main problem in your example is the usleep anyway, because there's no good way to tell the compiler that the side effect in question is a delay11:03
DocScrutinizer05err wut?11:04
wpwrakah, it's an endless loop. i see.11:04
DocScrutinizer05the side effect in question is another process initializing the array with values and only _then_ set the semaphore in x[0] x[1]11:04
wpwrakthere are a few "volatile" missing already for correctness ;-)11:05
DocScrutinizer05good point11:06
wpwrakand obviously, if you tell the compiler explicitly to avoid any optimizations, you shouldn't be too disappointed if it indeed doesn't optimize :)11:06
DocScrutinizer05though I'm not sure you need volatile for storage that can get changed by CPU only11:06
wpwrakyou're confused :)11:07
DocScrutinizer05I'm just saying it mustn't optimize in that case11:07
wpwrakif you add the right volatiles, it won't optimize11:09
DocScrutinizer05since (it seems I should explain my code) when the loop reads the array from x[9] down to x[0], then the test for semaphore set and usleep will be done last, not first11:10
DocScrutinizer05hell, I'm not setting volatile for all global stroage that might get accessed by other processes11:11
wpwrak"volatile" tells the compiler that the access sequence must not be changed11:11
wpwrakthen your code is broken :)11:11
wpwrakof course, since function calls are usually a barrier, too, you can get away with that in many cases. but the sooner or later, you'll run into trouble.11:12
DocScrutinizer05any static var already implies that there may be (possibly concurrent) access to the storage from other location in code, no?11:14
DocScrutinizer05I mean, that's one of the reasons you get static var11:14
wpwrakconcurrency is never assumed unless you explicitly indicate it with "volatile"11:15
wpwrakwhat ?11:15
DocScrutinizer05err, yes I'm confused. Please wait until my coffe had time to kick in11:15
wpwrakin your coding class, they shouldn't waste time with prehistoric optimization techniques and instead teach some basics ;-)11:15
DocScrutinizer05a *global* var (scope beyond local function) usually is defined that way because other functions may want to access it as well11:18
larscyes, but C assumes that no two functions run at the same time11:18
DocScrutinizer05in the example *x is "form outside"11:19
wpwraklarsc: ... unless one calls the other (possibly via intermediates)11:19
DocScrutinizer05larsc: C assumes what? No taskswitching allowed during execution of int foo(int *x) ?11:20
larscDocScrutinizer05: exactly11:20
wpwrakanyway, gotta get things ready for today's barbecue. 24 C predicted for this chilly mid-winter day :)11:20
DocScrutinizer05now that sounds absolutely BS. How's preemptive multitasking going to work with that?11:20
larscDocScrutinizer05: Well if you run two processes, your two processes run in different contexts and they'll never see each other11:21
larscbut if you have two processes which share memory you need to explicitly protect that memory against concurrent access11:22
DocScrutinizer05ummpf, what if I got a IRQ service callback function in same module like int foo(int *x)11:22
larscif that IRQ service routine also accesses your x you need to explicitly write your code to handle that case11:23
DocScrutinizer05I did, by defining x[0] x[1] as semaphore11:23
larscwell, see as no native semaphores11:24
DocScrutinizer05and I assumed compiler better not comverts this loop to a decrement loop for optimization11:24
larscit won't. but that's because it does not no what happens inside of usleep11:25
larscs/not no/not know/11:25
larscwhat the fuck is wrong with my typing today?11:25
lindi-yeah usleep does a syscall so it is quite hard to optimize11:27
DocScrutinizer05I *might* kick out the usleep (though that's for sure rogue)11:28
lindi-for library calls some optimizations are possible though, like converting printf with constant arguments to puts11:29
lindi-or printf("%c", c) to putchar(c)11:29
DocScrutinizer05replace that by a simple return(-ETOOFAST) when x[0]==011:29
DocScrutinizer05still a pretty nice race when foo() sums up bogus values for x[2..9] and then finds meanwhile semaphore in x[0] got set so everything seems to be fine11:30
larscgcc has extensions like __attribute__(("pure")) which tells the compiler that the function has no sideeffects11:31
lindi-larsc: llvm IR has readonly and readnone attributes that are similar11:32
DocScrutinizer05the whole point of my reasoning been that compiler has a hard time to decide if there are side effects in a for-loop that might bite you when compiler optimizes loops from increment to decrement11:33
larsclindi-: I think llvm, well clang, should also support __attribute__(("pure"))11:35
DocScrutinizer05take a simple memmove shifting each byte down one position. Do you think compiler is smart enough to see it will fail miserably when you change sequence? like x[8]=x[9]; x[7]=x[8]...11:35
lindi-larsc: yeah, I was talking about how it is represented in the IR11:36
larscDocScrutinizer05: it must see this 11:36
DocScrutinizer05no sarcasm, really wondering if compilers might be smart enough for that11:36
lindi-DocScrutinizer05: just paste it to http://llvm.org/demo/ 11:38
DocScrutinizer05o.O ??11:38
mthlarsc: I'm testing the pwm changes and there is one annoying effect: the numbering changes12:03
mthJZ_GPIO_PWM2 becomes pwm 0 etc12:03
mthshould we just accept that? request that the pwm core can handle mappings that don't start at 0? or register JZ_GPIO_PWM0 and JZ_GPIO_PWM1 and return EBUSY if someone tries to use them?12:04
mththe hardware does have 8 PWMs, it's just that timer 0 and 1 are in use by the kernel and therefore not available as PWMs12:06
larscmth: e.g. for jz4760 timer 0 and 1 are not used by the kernel, so I think it should be fine12:10
larscI don't really get patch 1 of that series12:12
larscbut I might be missing someting12:12
larscBtw. I met with Thierry last week, seems to be a nice guy12:13
mthpatch 1 wants to get rid of a header named "irq.h" in the current directory (for 4740 arch) since it gets included instead of something from the header search path12:14
mthI don't know what that something is12:14
mthmaybe the MIPS version of irq.h?12:14
larscbut we include irq.h with "" instead of <>12:15
mthhmm, so it's the one that is supposed to be included12:16
mththe MIPS version is called asm/irq.h, so unless the "asm" path is in a search path as well, it cannot be masked by just "irq.h"12:17
mthmaybe it's shadowing arch/mips/include/asm/mach-generic/irq.h?12:19
mththat's the one that defines NR_IRQS12:19
larscbut we never do #include <irq.h> anyway12:21
mthI don't see the rest of the patches include <irq.h> either12:24
mthso I'm out of ideas12:24
mthI guess you should just ask Thierry12:24
larscI think the pwm_{enable,disable} in jz4740_pwm_config should be jz4740_pwm_{enable,disable}12:27
mthI get the pwm backlight driver to init now using the legacy api, but the screen goes dark12:28
mththe difference in the enable/disable is this: test_and_clear_bit(PWMF_ENABLED, &pwm->flags)12:30
larscah, I misunderstood you earlier, the new id's start with PWM2 = 012:30
larscthat's indeed not so nice12:30
mthI think the numbering thing is even causing bugs12:36
mthpwm->hwpwm is 5 for PWM7, so it will conifgure timer 512:37
larscyea, just saw this as well12:37
mthbtw, I think the is_enabled check before pwm_disable in jz4740_pwm_config is redundant, since the pwm core checks this as well12:39
mthand the core checks it atomically, so that's a bit safer as well12:39
mthor jz4740_pwm_config is called with a lock held anyway?12:40
larscI think we should set NUM_PWM to 8 and return -EBUSY for 0 and 1 for in request12:40
mthyes, I think that makes it much clearer since it corresponds better with the hardware12:41
mthok, with that change it works via the legacy API12:45
mthnow I have to figure out how the new API works12:45
larscthe same, I think12:46
mthit seems some kind of registration to work12:46
mthsee pwm_add_table()12:47
larscah, probably like the clk api 12:48
mththere is nothing arch/ that uses 'struct pwm_lookup' though12:48
mthso it seems the new API is so new that it's not actually being used yet12:49
mthshall I send a diff to Thierry?12:51
larscI think all you need for the PWM_LOOKUP("jz4740-pwm", 4, 0, 0)12:52
larscor similar12:52
larscall you need for the pwm backlight12:52
larscI'd just comment on the patch12:54
mthcode is sometimes easier to understand than a description of code...12:58
mthPWM_LOOKUP("jz4740-pwm", 7, 0, 0) does not work12:59
mthpwm-backlight pwm-backlight: unable to request PWM, trying legacy API12:59
larschm, ok the first one seems to be the name of the device which requests the pwm 13:01
larscso pwm-backlight13:01
mthstrange to call it "provider" then...13:02
larscthe 3rd argument is the requesting device13:02
larscbut it should also work if it is NULL13:03
mthPWM_LOOKUP("jz4740-pwm", 7, "pwm-backlight", 0) does seems to work13:05
larscabout patch 2, I'd like to keep the timer functions as inline functions, do you agree?13:18
mthit would reduce the number of exported symbols significantly13:25
mthperformance wise, I don't think it really matters, since we don't call those functions very often13:26
larscwell, they are used in the system clock code13:27
mthI sent my comments13:30
mthdoes the system clock change the timer configuration often though?13:33
mthmaybe once adaptive tickless is in it will?13:33
larscwell it reads the counter quite often13:34
mthok, then inline is better indeed13:35
DocScrutinizer05((<larsc> I think we should set NUM_PWM to 8 and return -EBUSY for 0 and 1 for in request)) indeed13:59
larscmth: I found out that the udc core used by the jz4740 ist the musb udc core and so we already have a driver for it in mainline14:59
larscwe just need to add some boilerplate code to hook e.g. clocks up14:59
qi-botThe build was successful: http://fidelio.qi-hardware.com/~xiangfu/build-nanonote/openwrt-xburst.minimal-20120901-1311 15:44
mthlarsc: this is the glue code we're using for musb on the JZ4770: http://www.treewalker.org/temp/musb-jz4770.diff16:37
mthI migrated Ingenic's code from 2.6.31 to 3.516:38
mththat was an awful lot of work since they backported a lot of future mainline commits to 2.6.31 and published it as part of a 500,000 line diff16:39
larscwe shouldn't need as much for jz474016:48
larscalso there is a generic gpio driver for cable detection, iirc16:49
mthwe probably don't need that much on 4770 either, but I haven't tried to cut it down further yet as it works in its current state16:55
Aylalarsc: hi21:40
Aylado you remember what the '+ 2' is for?21:41
mthAyla: if it's jz_clk_pll_get_rate you're talking about, I don't think larsc is responsible21:43
mthin the version I have here, there is no +2 at line 221-22221:44
Aylathere is, on the jz4770 tree21:45
mthah, I just changed the names of those variables, not the expression21:47
mthI think the data sheet just says that the register value is 2 less than the actual value21:48
--- Mon Sep 3 201200:00

Generated by irclog2html.py 2.9.2 by Marius Gedminas - find it at mg.pov.lt!