TIP Acovea
From Gentoo Linux Wiki
This article was taken from the Gentoo on Acer TM803 HOWTO, as it can be useful to a wider audience. Please help turn it into a more general description of acovea's usage.
Acovea
- For the more advanced gentoonians: I used a tool called acovea to test my CPU (a first-generation Pentium M) against several CFLAGS. It uses an evolutionary algorithm to test different combinations of GCC options against different CPU types. Here is the output of the acovea tool:
Layout Pentium-M
optimistic options:
-fno-delayed-branch (1.056)
-fcaller-saves (1.577)
-freorder-blocks (1.198)
-freorder-functions (1.056)
-falign-jumps (1.34)
-finline-functions (1.435)
-frename-registers (1.198)
-fweb (2.428)
-fomit-frame-pointer (1.529)
-fno-trapping-math (1.198)
pessimistic options:
-fno-guess-branch-probability (-2.209)
-fno-if-conversion (-2.304)
-fno-if-conversion2 (-1.121)
-fcse-skip-blocks (-1.452)
-fregmove (-1.215)
-funroll-loops (-1.83)
-fbranch-target-load-optimize2 (-1.405)
-mfpmath=387 (-1.499)
-mfpmath=sse (-1.688)
-mfpmath=sse,387 (-1.878)
-momit-leaf-frame-pointer (-1.972)
- Conclusion: The fastest code would be produced by these CFLAGS:
CFLAGS="-march=pentium-m -pipe -fno-delayed-branch -fcaller-saves -freorder-blocks
-freorder-functions -falign-jumps -finline-functions -frename-registers
-fweb -fomit-frame-pointer -fno-trapping-math -falign-functions=64"
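The step from the report to the CFLAGS line can be sketched in a few lines of Python. This is only an illustration, not part of acovea: the helper names `parse_report` and `build_cflags` are made up for this example, and the input is assumed to be in the `flag (score)` form quoted in this article. Note that the CFLAGS line above also carries -falign-functions=64, which does not appear in the quoted report, so the sketch leaves it out.

```python
import re

# Optimistic options for the Pentium-M layout, copied from the report above.
PENTIUM_M_OPTIMISTIC = """\
-fno-delayed-branch (1.056)
-fcaller-saves (1.577)
-freorder-blocks (1.198)
-freorder-functions (1.056)
-falign-jumps (1.34)
-finline-functions (1.435)
-frename-registers (1.198)
-fweb (2.428)
-fomit-frame-pointer (1.529)
-fno-trapping-math (1.198)
"""

# One report line as quoted here: "<flag> (<score>)".
LINE_RE = re.compile(r"^(\S+)\s+\((-?\d+(?:\.\d+)?)\)$")


def parse_report(text):
    """Return a list of (flag, score) tuples from quoted report lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match.group(1), float(match.group(2))))
    return entries


def build_cflags(march, entries):
    """Join -march, -pipe and every flag with a positive score."""
    flags = [flag for flag, score in entries if score > 0]
    return " ".join(["-march=" + march, "-pipe"] + flags)


cflags = build_cflags("pentium-m", parse_report(PENTIUM_M_OPTIMISTIC))
print('CFLAGS="%s"' % cflags)
```

Filtering on a positive score means that a pessimistic list could be passed through the same function without any of its flags ending up in the result.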
Layout Pentium-4
- I've also tested acovea against the Pentium-4 layout, so here are the results:
optimistic options:
-fno-if-conversion2 (1.291)
-foptimize-sibling-calls (1.08)
-fcse-follow-jumps (1.417)
-fgcse (2.261)
-frerun-cse-after-loop (1.46)
-fschedule-insns (1.164)
-fstrict-aliasing (1.333)
-freorder-functions (1.08)
-frename-registers (1.417)
-mno-align-stringops (1.164)
-minline-all-stringops (1.544)
pessimistic options:
-fno-if-conversion (-1.619)
-fstrength-reduce (-1.071)
-fpeephole2 (-1.534)
-fschedule-insns2 (-1.197)
-falign-labels (-1.113)
-funroll-loops (-1.703)
-funroll-all-loops (-1.703)
-mfpmath=sse (-1.956)
-mfpmath=sse,387 (-1.914)
-fomit-frame-pointer (-1.619)
-momit-leaf-frame-pointer (-1.534)
-funsafe-math-optimizations (-1.028)
- Conclusion: On this layout the fastest would be:
CFLAGS="-march=pentium4 -pipe -fno-if-conversion2 -foptimize-sibling-calls -fcse-follow-jumps
-fgcse -frerun-cse-after-loop -fschedule-insns -fstrict-aliasing -freorder-functions
-frename-registers -mno-align-stringops -minline-all-stringops"
Layout Pentium-3
- The test against the Pentium-3 produced the following results:
optimistic options:
-fforce-mem (2.476)
-fdelete-null-pointer-checks (1.419)
-fnew-ra (2.188)
-mieee-fp (1.034)
-maccumulate-outgoing-args (1.082)
-minline-all-stringops (1.13)
-fomit-frame-pointer (2.236)
pessimistic options:
-fno-guess-branch-probability (-1.946)
-fno-if-conversion (-2.138)
-fno-if-conversion2 (-1.081)
-fgcse (-1.129)
-fstrength-reduce (-1.321)
-fcaller-saves (-1.081)
-fpeephole2 (-1.706)
-fschedule-insns (-1.754)
-funroll-loops (-1.129)
-mfpmath=387 (-1.946)
-mfpmath=sse (-1.225)
-momit-leaf-frame-pointer (-1.994)
- Reading this, the fastest CFLAGS for the Pentium-3 would be:
CFLAGS="-march=pentium3 -pipe -fforce-mem -fdelete-null-pointer-checks -fnew-ra -mieee-fp
-maccumulate-outgoing-args -minline-all-stringops -fomit-frame-pointer"
Analysis
- All original texts are taken from the GCC homepage for demonstrative purposes only.
pessimistic: -funroll-loops
- Original text: Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies both -fstrength-reduce and -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
- Comment: What is really astounding about this analysis is that -funroll-loops is a pessimistic option on all three layouts. Nearly all sites that cover GCC present this flag as one that increases speed. Only the original GCC homepage warns that it may not make the code run faster. This is very interesting, as the flag not only seems to fail to help, it also seems to slow down the other flags it is combined with.
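The observation about -funroll-loops can be checked mechanically by intersecting the three pessimistic lists. The following Python snippet is a minimal sketch that uses only the flags quoted in this article; the dictionary `PESSIMISTIC` and the function `pessimistic_everywhere` are names invented for this example.

```python
# Pessimistic options per layout, copied from the reports above (scores dropped).
PESSIMISTIC = {
    "pentium-m": {
        "-fno-guess-branch-probability", "-fno-if-conversion",
        "-fno-if-conversion2", "-fcse-skip-blocks", "-fregmove",
        "-funroll-loops", "-fbranch-target-load-optimize2",
        "-mfpmath=387", "-mfpmath=sse", "-mfpmath=sse,387",
        "-momit-leaf-frame-pointer",
    },
    "pentium4": {
        "-fno-if-conversion", "-fstrength-reduce", "-fpeephole2",
        "-fschedule-insns2", "-falign-labels", "-funroll-loops",
        "-funroll-all-loops", "-mfpmath=sse", "-mfpmath=sse,387",
        "-fomit-frame-pointer", "-momit-leaf-frame-pointer",
        "-funsafe-math-optimizations",
    },
    "pentium3": {
        "-fno-guess-branch-probability", "-fno-if-conversion",
        "-fno-if-conversion2", "-fgcse", "-fstrength-reduce",
        "-fcaller-saves", "-fpeephole2", "-fschedule-insns",
        "-funroll-loops", "-mfpmath=387", "-mfpmath=sse",
        "-momit-leaf-frame-pointer",
    },
}


def pessimistic_everywhere(by_layout):
    """Return the flags that are pessimistic on every layout, sorted by name."""
    return sorted(set.intersection(*by_layout.values()))


common = pessimistic_everywhere(PESSIMISTIC)
for flag in common:
    print(flag)
```

For the lists quoted here, -funroll-loops is not the only flag that all three layouts agree on: -fno-if-conversion, -mfpmath=sse and -momit-leaf-frame-pointer show up in every pessimistic list as well.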
