Optimize Options using the GNU Compiler Collection - Android Software/Hacking General [Developers Only]

I haven't found a good source of discussion on this topic, but has anyone experimented with GCC optimization flags when compiling a kernel or a ROM? I dug around and found only sparse information on the subject, and Android is rarely mentioned specifically. Just curious if anyone has two cents on the subject.
I've been trying to figure out how to use these flags in kernel compilation, but I'm not sure whether you're supposed to put them in your Makefile (both the main one and the arch/arm/... Makefile?).
Is -ffast-math preferred over some of the flags from the Linaro blog post ("We've switched from a base of -O2 -fno-strict-aliasing to -O3 -fmodulo-sched -fmodulo-sched-allow-regmoves.")?
I've tried sifting through the Linaro git repos to see whether anyone is actually using the flags they mention, but I have yet to find any, even though they say somewhere that they do.
GCC Optimization Flags Reference http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Optimize-Options.html
Some background information from the Linaro blog about the -O3 flag:
Compiler flags used to speed up Linaro Android 2011.10, and future optimizations
People trying out the latest Linaro Android builds may notice they’re faster than the older versions. One of the reasons for this is that we’re using a new set of compiler flags for this build.
We’ve switched from a base of -O2 -fno-strict-aliasing to -O3 -fmodulo-sched -fmodulo-sched-allow-regmoves.
To optimize application startup, we’ve also switched the linker hash style to GNU (ld --hash-style=gnu), and patched the Android dynamic linker to deal with those hashes.
Getting rid of -fno-strict-aliasing enables rather efficient additional optimizations – but requires that the code being compiled follows some rules traditionally not enforced by compilers.
Given the traditional lack of enforcement, there’s lots of code out there (including, unfortunately, in the AOSP codebase) that violates those rules.
Fortunately, gcc can help us detect those violations: With -fstrict-aliasing -Werror=strict-aliasing, most aliasing violations can be turned into build-time errors instead of random crashes at runtime. This allowed us to detect many aliasing violations in the code, and fix them (where doable with reasonable effort), or work around them by enabling -fno-strict-aliasing just for a particular subdirectory containing code not ready for dropping it.
-O3 enables further optimizations not yet enabled at -O2, including -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize and -fipa-cp-clone.
While some of them are probably not ideal (e.g. -finline-functions tends to increase code size, thereby also increasing memory usage and, in a bad case, reducing cache efficiency), overall -O3 has been shown to increase performance.
-fmodulo-sched and -fmodulo-sched-allow-regmoves are fairly new optimizations not currently automatically enabled at any -O level. These options improve loop scheduling – more information can be found here.
We optimized further by adding link-time optimizations in some relevant libraries, and by using -ffast-math in selected parts of the code. -ffast-math is a bit dangerous because it can cause math functions to return incorrect values by ignoring parts of the ISO and IEEE rules for math functions (parts that make optimizations harder, or that simply require additional checks that slow down things considerably), so it should usually not be used for an entire build – but enabling it for parts of the code verified to not rely on exact ISO/IEEE rules can produce quite a speedup.
We also added the option to specify extra optimizations in a board specific config – so now our Panda builds are optimized specifically for Cortex-A9 CPUs while the iMX53 builds optimize for Cortex-A8 instead of relying on the least common denominator.
We also experimented with Graphite-related optimizations, such as -fgraphite-identity, -floop-block, -floop-interchange, -floop-strip-mine, -ftree-loop-distribution and -ftree-loop-linear – those optimizations can rearrange loops to allow further optimizations. We've turned them back off for the release because they introduced some stability problems, and the benefit seemed minimal.
However, chances are they will come back in a future build. They can help make more efficient use of multi-core CPUs (with the addition of -ftree-parallelize-loops) once the compiler knows how to do threading (currently, Android is built using a generic arm-eabi targeted compiler that has no knowledge of the underlying OS).
Other possible future improvements – though probably not as efficient as the ones already made, or the switch to -ftree-parallelize-loops for multi-core boards – include seeing what effect a switch between ARM and Thumb2 instructions has on performance (enabling the right mode in the right modules), identifying places where the increased code size of -O3 actually hurts and adding the likes of -fno-inline-functions there, identifying further performance-critical parts that can handle -ffast-math, fixing the aliasing violations we've worked around this time, and – of course – switching to gcc 4.7 when it becomes usable.
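For reference, on an AOSP-based tree the board-specific tuning mentioned in the post usually ends up as extra flags in the device's BoardConfig.mk; a minimal sketch, assuming the classic TARGET_GLOBAL_CFLAGS/CPPFLAGS variables and reusing the Cortex-A9 example from the post:
Code:
# BoardConfig.mk - board-specific tuning (sketch; values mirror the Cortex-A9 example above)
TARGET_GLOBAL_CFLAGS   += -mcpu=cortex-a9 -mfpu=neon
TARGET_GLOBAL_CPPFLAGS += -mcpu=cortex-a9 -mfpu=neon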
​

make -o3 -o -fmodulo-sched -o -fmodulo-sched-allow-regmoves -Wl,--hash-style=gnu -Werror=strict-aliasing ARCH=arm CROSS_COMPILE=~/(toolchain path)/bin/arm-linux-gnueabihf- zImage modules -j 4
Is this how you'd do a make according to the instructions on the Linaro blog? It's the only way I could get it to actually run anything, but it doesn't tell me whether it's passing these flags.
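For what it's worth, as far as I can tell make treats a lowercase -o as its own "old-file" option (and -W as "what-if"), so in that command none of the flags ever reach GCC, which would explain why nothing reports them. The usual ways to feed extra flags into a kernel build are the KCFLAGS variable or editing KBUILD_CFLAGS in the top-level Makefile, and V=1 prints the full gcc command lines so you can confirm what is actually passed. A rough sketch (the toolchain path is a placeholder):
Code:
# extra compiler flags via KCFLAGS (kbuild appends them after its own flags, so -O3 overrides the default -O2)
make -j4 ARCH=arm CROSS_COMPILE=~/<toolchain path>/bin/arm-linux-gnueabihf- \
     KCFLAGS="-O3 -fmodulo-sched -fmodulo-sched-allow-regmoves" zImage modules
# verbose build - prints every gcc invocation, so you can see whether the flags are really there
make V=1 ARCH=arm CROSS_COMPILE=~/<toolchain path>/bin/arm-linux-gnueabihf- zImage | grep modulo-sched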

Related

Toolchain / Wakeup Lag

Various developers have noticed differences in the amount of lag between different toolchains. Dkcldark reported having a toolchain that nearly eliminates this problem, and was able to share with me how he built it using ct-ng. In essence, the versions and options selected in the arm-iphone sample config for ct-ng produce a toolchain that seems to give much better results. A binary package for linux-x86 is available here. I find that my device is typically ready to use after sleep in about one second, running a kernel built with this toolchain and my fast-stepup and OC modifications, and this is similar to the worst-case behavior for stock kernels. Hopefully this will be helpful to others who want to build kernels - I'm going to try other gcc versions with ct-ng without changing any other options and see what results I have with them.
Results (linked to a toolchain download when a kernel has been built successfully):
binutils 2.19.1, gcc 4.3.4, newlib: about 1s lag on wakeup
binutils 2.19.1, gcc 4.4.3, newlib: kernel panic at UI startup
Good work Unhelpful! Glad you decided to share this with everyone rather than keeping it to yourself and ignoring people.
Excellent work! Thank you SO much for sharing . I know JAC is already making a kernel using this toolchain for the Vibrant. Your work is MUCH appreciated
Thanks..
Honestly, friend... Thank you. I wanted to start looking at kernel sources and trying to build them or at least follow the changes.. I really appreciate your openness with what you've found.
Good find.

[WARNING] Compiler optimizations!

I just noticed this again and it worries me a lot! Devs here usually post their changes as being different from everyone else's, especially when it comes to compiler optimizations, and yet most of you make huge mistakes.
Personally I've been running Gentoo Linux for about ten years now, and what I learned from running different setups with space limitations is that one should NEVER use -Os and -O3 together, as this will not only take much more time to compile, it also GROWS executables.
-Os is basically the same as -O2, but it disables the optimizations that tend to increase executable size, trading a little speed for smaller binaries and lower memory usage.
In general, executables compiled with just -Os are nearly as optimized as with -O2/-O3 while ending up smaller, whereas -O3 spends extra memory for extra speed. Using both together results in larger executables, more memory usage AND much longer compile times than just -O2.
So devs, can you all please remove -Os from your "optimizations" and leave just -O2? Bragging about "more compiler optimizations" really fails if you don't understand why the -Ox optimization levels were added in the first place... which is to pick one, NOT to use them together!
Also, -O3 produces larger binaries that need more I/O to load, which might have a negative effect on our snappies.
The choice: Summarised
-O2 is the default optimization level. It is ideal for desktop systems because of the small binary size, which results in faster loading from HDD, lower RAM usage and fewer cache misses on modern CPUs. Servers should also use -O2 since it is considered to be the highest reliable optimization level.
-O3 only degrades application performance as it produces huge binaries which use a large amount of HDD, RAM and CPU cache space. Only specific applications such as python and sqlite gain improved performance. This optimization level greatly increases compilation time.
-Os should be used on systems with limited CPU cache (old CPUs, Atom...). Large executables such as Firefox may become more responsive using -Os.
From: http://en.gentoo-wiki.com/wiki/CFLAGS
Quote from the wiki article you provided:
"Note that only one of the above flags may be chosen. If you choose more than one, the last one specified takes effect."
tincmulc said:
Quote from the wiki article you provided:
"Note that only one of the above flags may be chosen. If you choose more than one, the last one specified takes effect."
My point is that in general people here seem to use "-Os -O3", which is really dumb as it has no benefit. -O2 should be better performance-wise on our phones.
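If anyone wants to verify this themselves: GCC honours only the last -O level it is given, and you can dump which optimization passes actually end up enabled for a given set of flags. A quick check (any reasonably recent GCC):
Code:
# GCC applies only the last -O level on the command line; compare what actually gets enabled
gcc -Q -Os -O3 --help=optimizers > os_o3.txt
gcc -Q -O3     --help=optimizers > o3.txt
diff os_o3.txt o3.txt   # should show no differences: "-Os -O3" behaves like plain -O3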
great post, thank you man!
Now silently pray for our devs to change their -O3 -Os to -O2.
Or I might need to do my own branch of "optimized" nightlies.
For example, what you declare as -march=A -mcpu=B will in fact end up as -march=A -mtune=B.
edit: I think the gcc compiler even throws a warning about l2p etc.
Wouldn't the best choice be just -Os? I would have thought that optimising for systems with limited cache would be appropriate for portable devices.
Pipe Merchant said:
Wouldn't the best choice be just -Os? I would have thought that optimising for systems with limited cache would be appropriate for portable devices.
I highly doubt executable sizes on our phones are big enough to benefit from -Os.
In the case of my router at home -Os doesn't give any benefit at all, in fact packet inspection is slightly slower vs -O2.

Benchmarks, suggestions and research for kernel optimization

Benchmarks, suggestions and research for kernel optimization
Hello guys,
these days I have run several comparative benchmarks of what can be achieved, at least in terms of performance, by applying the existing kernel optimization options, particular compiler flags, and the different main kernel memory allocators.
This, I hope, is only the beginning...
I'd like this thread to become a place where developers, experienced or not, can share their benchmark results, knowledge, suggestions on possible improvements, patches, etc. aimed at improving the performance and/or stability of the kernel.
In the next posts you can find the results of the benchmarks I have run.
Thanks for your attention
PS. Sorry for my bad english...
Kernel Optimization Benchmarks​
These are the results of comparative benchmarks of the performance that can be achieved using the different kernel optimization options (-Os, -O2, -O3), CPU and floating-point tuning (e.g. -march=armv7-a -mtune=cortex-a8 -mfpu=neon), and other additional compilation flags.
For these tests I've used different kernels based on Arco's 3.0.60 kernel sources, all built using the same toolchain (optimized for generic Cortex-A CPUs):
- pure kernel built selecting -Os optimization
- pure kernel built selecting -O2 optimization
- slightly modified kernel built selecting -O3 optimization (only two simple changes to add the -O3 optimization option inside the Makefile and init/Kconfig)
- slightly modified kernel built selecting -O3 optimization and with the following additional compilation flags inside the Makefile
Code:
...
OPTIMIZATION_FLAGS = -march=armv7-a -mtune=cortex-a8 -mfpu=neon \
-ffast-math -fsingle-precision-constant \
-fgcse-lm -fgcse-sm -fsched-spec-load -fforce-addr
CFLAGS_MODULE = $(OPTIMIZATION_FLAGS)
AFLAGS_MODULE = $(OPTIMIZATION_FLAGS)
LDFLAGS_MODULE =
CFLAGS_KERNEL = $(OPTIMIZATION_FLAGS)
AFLAGS_KERNEL = $(OPTIMIZATION_FLAGS)
...
and
Code:
...
ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os
endif
ifdef CONFIG_CC_OPTIMIZE_DEFAULT
KBUILD_CFLAGS += -O2
endif
ifdef CONFIG_CC_OPTIMIZE_MORE
KBUILD_CFLAGS += -O3 -fmodulo-sched -fmodulo-sched-allow-regmoves -fno-tree-vectorize
endif
ifdef CONFIG_CC_OPTIMIZE_FAST
KBUILD_CFLAGS += -Ofast
endif
...
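For completeness, the Makefile ifdef block above needs matching symbols in init/Kconfig; a rough sketch of what such a choice block could look like (the wording is mine, not taken from Arco's sources - only CC_OPTIMIZE_FOR_SIZE exists upstream):
Code:
# init/Kconfig (sketch) - choice entries matching the CONFIG_CC_OPTIMIZE_* symbols above
choice
	prompt "Compiler optimization level"
	default CC_OPTIMIZE_FOR_SIZE

config CC_OPTIMIZE_FOR_SIZE
	bool "Optimize for size (-Os)"

config CC_OPTIMIZE_DEFAULT
	bool "Default optimization (-O2)"

config CC_OPTIMIZE_MORE
	bool "Optimize more (-O3)"

config CC_OPTIMIZE_FAST
	bool "Optimize for fastest code (-Ofast)"

endchoice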
The kernel has not been overclocked; I used the default min and max CPU frequencies (245 MHz - 1401 MHz) with the performance CPU governor and the noop I/O scheduler.
To perform each kernel benchmark, I made at least 3 test runs for each of the following tools:
- CF-Bench developed by XDA Developer Chainfire
- Antutu Benchmark
Benchmark final results
(the image is uploaded on Media Fire; if you can't see it, it could be due to your proxy configuration)
You can find more detailed results (ods, xlsx and pdf format) and screenshots inside the following Media Fire folder: Results Folder
Conclusion:
As you can see, when compared with each other, -O2 and -O3 with the additional flags give the best results in terms of performance, especially in CF-Bench, but the differences are relatively small.
Thus, we can conclude that the optimization flags above do not work miracles, at least in terms of performance...
Memory Allocators Benchmarks​
These are the results of comparative benchmarks of the performance that can be achieved by selecting one of the different kernel memory allocators (SLUB, SLAB, SLOB and SLQB).
For these tests I've used different builds of the kernel described above (the slightly modified kernel built with -O3 and the additional compilation flags):
- a build configured selecting the default SLUB memory allocator (the unqueued slab allocator V6)
- a build configured selecting the SLAB memory allocator (old and deprecated)
- a build configured selecting the SLOB memory allocator (a simpler memory allocator suitable for embedded devices with low memory)
- a build configured selecting the SLQB memory allocator (I modified the kernel sources to add it)
The kernel has not been overclocked; I used the default min and max CPU frequencies (245 MHz - 1401 MHz) with the performance CPU governor and the noop I/O scheduler.
To perform each kernel benchmark, I made at least 3 test runs for each of the following tools:
- CF-Bench developed by XDA Developer Chainfire
- Antutu Benchmark
Benchmark final results
(the image is uploaded on Media Fire; if you can't see it, it could be due to your proxy configuration)
You can find more detailed results (ods, xlsx and pdf format) and screenshots inside the following Media Fire folder:
Results Folder
Conclusion:
In this case the SLUB memory allocator (which is the default one) and the SLQB memory allocator give the best results in terms of performance. The differences between the two are relatively small.
Patch:
If you want to add the SLQB memory allocator inside your JB kernel sources, you can use the following patch
SLQB_memory_allocator.patch​
After downloading it inside your kernel sources folder:
Code:
Show the changes inside the patch
git apply --stat SLQB_memory_allocator.patch
Check if the patch could cause conflicts or other problems
git apply --check SLQB_memory_allocator.patch
If all is ok, apply the patch
git am --signoff SLQB_memory_allocator.patch
Additional I/O Scheduler Benchmarks
These are the results of comparative benchmarks of the performance that can be achieved by selecting some of the additional I/O schedulers found in many custom kernels.
I8150 - Samsung Galaxy W - ROW and SIO I/O schedulers comparison (Made by Hadidjapri)
These benchmarks were made on the I8150 by Hadidjapri.
For each I/O scheduler, he made 4 test runs using CF-Bench with the default min and max CPU frequencies (245 MHz - 1024 MHz) and the default SLUB memory allocator.
Benchmark final results
(the image is uploaded on Media Fire; if you can't see it, it could be due to your proxy configuration)
I9001 - Samsung Galaxy S Plus - ROW, SIO and V(R) I/O schedulers comparison
For each I/O scheduler, I made 4 test runs using CF-Bench with the performance CPU governor, the default min and max CPU frequencies (245 MHz - 1401 MHz) and the default SLUB memory allocator.
Benchmark final results
(the images are uploaded on Media Fire; if you can't see them, it could be due to your proxy configuration)
You can find more detailed results (ods, png and pdf format) inside the following Media Fire folder:
Results Folder
Conclusion:
For both devices, the best scheduler for I/O operations is SIO.
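For anyone who wants to repeat this kind of comparison without rebuilding, the active I/O scheduler can usually be inspected and switched at runtime through sysfs (the block device name varies per phone; mmcblk0 below is just the typical internal eMMC):
Code:
# the scheduler shown in [brackets] is the active one
cat /sys/block/mmcblk0/queue/scheduler
# switch to SIO for the next run (needs root, and the kernel must have SIO built in)
echo sio > /sys/block/mmcblk0/queue/scheduler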
Re: Benchmarks, suggestions and research for kernel optimization
This is huge! Thank you so much!!
Sent from my GT-I9001 using xda app-developers app
WOW. This is extremely interesting! Thank you very much! These are awesome spreadsheets!
I'm going to publish these on my blog if it's okay with you; if it's not, I'll just add a link to this XDA topic in Android Articles.
AWESOME. PERIOD
(I always secretly wanted to make spreadsheets like these regarding optimization, but I didn't know how :silly
From what I see in the -O3 results, I guess the additional build flags are a must, otherwise it's only fancy stuff that doesn't give you anything.
I have 2 questions regarding the benchmark:
1. which one is more reliable, CF-Bench or AnTuTu?
2. do you get large variance when testing database IO? or does this happen only to me?
broodplank1337 said:
WOW. This is extremely interesting! Thank you very much! These are awesome spreadsheets!
I'm going to publish these on my blog if it's okay with you; if it's not, I'll just add a link to this XDA topic in Android Articles.
AWESOME. PERIOD
(I always secretly wanted to make spreadsheets like these regarding optimization, but I didn't know how :silly
I'll be really happy if you publish these results on your blog.
You can also use those spreadsheets as a template, you're welcome!
Thank you!
hadidjapri said:
From what I see in the -O3 results, I guess the additional build flags are a must, otherwise it's only fancy stuff that doesn't give you anything.
I have 2 questions regarding the benchmark:
1. which one is more reliable, CF-Bench or AnTuTu?
2. do you get large variance when testing database IO? or does this happen only to me?
I can say that no benchmarking tool is really reliable. I've tried many, and I don't think it's actually possible to create or find a tool that always gives similar results on a system that keeps performing other operations outside our control...
AnTuTu Benchmark is the tool every one of us knows and uses most, so it could not be left out, though it is not the best in terms of reliability of the results. In particular, the first run should always be discarded because it differs greatly from the following ones, so each time I ran 4-5 tests and considered only the last 3 similar ones. In addition, I confirm that the database and SD card tests often vary greatly.
CF-Bench is certainly more reliable and allowed me to make detailed tests on memory allocation, but it doesn't benchmark the database or 2D and 3D graphics.
Even as a non-dev it's interesting. Nice work
Nice job. Even though I don't own this phone anymore, I'm still interested in benchmarks. I saw one in AnTuTu where the highest score for your phones is about 7000+. The guy (I forgot who) said that he edited something in the build.prop to achieve that high score. The only thing I remember is that he posted the line in the Q&A thread.
Juhan Jufri said:
Nice job. Even though I don't own this phone anymore, I'm still interested in benchmarks. I saw one in AnTuTu where the highest score for your phones is about 7000+. The guy (I forgot who) said that he edited something in the build.prop to achieve that high score. The only thing I remember is that he posted the line in the Q&A thread.
debug.gr.swapinterval=0
which disables the 2D FPS cap. It performs well in benchmarks but sucks in reality, overheating your GPU for nothing, and all animations are bugged
Christopher, where did you get the custom Linaro toolchain optimized for Cortex-A?
Re: Benchmarks, suggestions and research for kernel optimization
android1234567 said:
Christopher, where did you get the custom Linaro toolchain optimized for Cortex-A?
I built it myself, take a look here:
http://forum.xda-developers.com/showthread.php?t=2098133
Sent from my GT-I9001 using xda premium
i would like to benchmark ROW and SIO scheduler.
using CF Bench is enough right?
Re: Benchmarks, suggestions and research for kernel optimization
hadidjapri said:
i would like to benchmark ROW and SIO scheduler.
using CF Bench is enough right?
Yes, you don't need to test 2d and 3d graphics, the results shouldn't change. So CF Bench is enough. Let me know your results, if you want...
Sent from my GT-I9001 using xda premium
Christopher83 said:
Yes, you don't need to test 2d and 3d graphics, the results shouldn't change. So CF Bench is enough. Let me know your results, if you want...
Sent from my GT-I9001 using xda premium
okay, i'll test it after i finish my class today. maybe 3-4hours later
hmm here's what i got
benchmark procedure and static variable used
1. test run at 1024MHz
2. both scheduler are tested for 4 times/batch
3. test is using CF Bench tool
4. memory allocator used is SLUB
link to pastebin
http://pastebin.com/SPrv1HVG
hadidjapri said:
hmm here's what i got
benchmark procedure and static variable used
1. test run at 1024MHz
2. both scheduler are tested for 4 times/batch
3. test is using CF Bench tool
4. memory allocator used is SLUB
link to pastebin
http://pastebin.com/SPrv1HVG
Thank you very much Hadidjapri!
I'm adding your benchmark results inside the first page...
Then I'll add my benchmark results too...
Christopher83 said:
Thank you very much Hadidjapri!
I'm adding your benchmark results inside the first page...
Then I'll add my benchmark results too...
Ok, first part completed.
Now, I'll collect my results and prepare the spreadsheet...
Christopher83 said:
Ok, first part completed.
Now, I'll collect my results and prepare the spreadsheet...
You mistyped the clock, sir. It's 245-1024, not 1401.
Sent from my GT-I8150

[COMMIT] [AOSP] JustArchi's ArchiDroid Optimizations V4.1 - Unleash the power!

Hello dear developers.
I'd like to share with you the effect of nearly 300 hours spent trying to optimize Android and push it to its limits.
In general, you should already be experienced in setting up your buildbox, using git, building AOSP/CyanogenMod/OmniROM from source and cherry-picking things from review/gerrit. Being able to solve git conflicts would also be nice. If you don't know how to build your own ROM from source, this is not something you can apply to your ROM. Also, as you probably noticed, this is not something you can apply to an already prebuilt ROM (stock), as these optimizations are applied during compilation, so only AOSP ROMs, self-compiled from source, can use this masterpiece.
So, what is this about? As we know, Android contains a bunch of low-level C/C++ code, which is compiled and acts as a backend for our Java frontend and Android apps. Unfortunately, Google didn't put much focus on optimization, so as a result we're using the same old flags set back in 2006 for Android Donut or whatever existed back then. As you can guess, in 2006 we didn't have devices as powerful as today's, so performance had to be sacrificed for smaller code size, to fit on our little devices and run well with very little memory. However, this is no longer the case, and by using the newest compilers and properly setting flags, we can achieve something great.
You may have heard of some developers claiming to use "O3 flags" in their ROMs. Well, while this may be true, those flags apply only to low-level ARM code, mostly used during kernel compilation. Additionally they overwrite the O2 flag, which is already fast, so as you may guess, this is more likely a placebo effect and disappears right after you change the kernel. Take a look at the most cherry-picked "O3 flags commit". You see the big "-Os" in "TARGET_thumb_CFLAGS"? This is what I'm talking about.
However, the commit I'm about to present to you is not a placebo effect, as it applies flags to everything that is compiled, and most importantly to target THUMB - about 90% of Android.
Now I'll tell you some facts. We have three interesting optimization levels: Os, O2, O3. O2 enables all optimizations that do not involve a space-speed tradeoff. Os is similar to O2, but it disables the flags that increase code size and performs further optimizations to reduce code size to the minimum. O3 enables all O2 optimizations plus some extra optimizations that tend to increase code size and may, or may not, increase performance. If you're wondering whether there is something beyond that like O4, there is - Ofast - however it breaks the IEEE standard and doesn't work with Android, as e.g. sqlite3 is not compatible with Ofast's -ffast-math flag. So it's a no-go for us.
Now here comes the fun part. Android by default is compiled with the O2 flag for target ARM (about 10% of Android, mostly low-level parts) and the Os flag for target THUMB (about 90% of Android, nearly everything apart from the low-level parts). Some people think that Os is better than O2 or O3 because smaller code is faster thanks to fitting better in the CPU cache. Unfortunately, I've proven that this is a myth. Os was good back in 2006, as only with this flag was Google able to compile Dalvik and its virtual machine while keeping a good amount of free memory and space on eMMC cards. As of now, we have plenty of space, plenty of RAM, plenty of CPU power - and still the good old Os flag for 90% of Android.
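For reference, the ARM/THUMB defaults mentioned above live in the ROM's build/core combo makefiles, and the change boils down to roughly the following (a simplified sketch - paths and exact default flags vary between Android versions, so check the actual commit rather than copying this):
Code:
# build/core/combo/TARGET_linux-arm.mk (sketch - see the actual commit for the real diff)
# stock AOSP behaviour, roughly: ARM code at -O2, THUMB code at -Os
TARGET_arm_CFLAGS   := -O2 -fomit-frame-pointer -fstrict-aliasing
TARGET_thumb_CFLAGS := -mthumb -O3 -fomit-frame-pointer -fno-strict-aliasing
#                              ^^^ stock value is -Os; raising it to -O3 is the core of the change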
I've made countless tests to find out what is the most efficient in terms of GCC optimization; I'm about to present two selected tests right now.
(Screenshots of the whetstone benchmark results for -Os, -O2 and -O3.)
As you may have noticed, I compiled the whetstone.c benchmark using three different optimization flags: Os, O2 and O3. I set the CPU governor to performance at maximum frequency, and I repeated each test an additional two times, just to make sure that Android wasn't lying to me. The source code of this test is available here; you may download it, compile it for our beloved Android and try it yourself. As you can see, O3 > O2 >> Os: Os performs about 2.5 times worse than O2, and about 3.0 times worse than O3.
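If you want to repeat the comparison yourself, something along these lines should work with a standalone arm-linux-androideabi toolchain (the compiler name, extra flags and push path are assumptions - adjust them to your setup and to whichever whetstone.c variant you grab):
Code:
# cross-compile the benchmark at the three levels and run it on the device
for opt in Os O2 O3; do
    arm-linux-androideabi-gcc -$opt -static whetstone.c -lm -o whetstone_$opt
    adb push whetstone_$opt /data/local/tmp/
done
adb shell 'cd /data/local/tmp && chmod 755 whetstone_* && ./whetstone_Os && ./whetstone_O2 && ./whetstone_O3'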
But, of course, Android is not a freaking benchmark, it's an operating system. We can't tell if things are getting better or worse from a simple benchmark alone. I kept that in mind and provided the community with JustArchi's Mysterious Builds for testing. I gave out both mysterious builds and didn't tell my users what the mysterious change was. Both builds were compiled with the same toolchain, same version, same commits. The one and only mysterious change was that every component compiled as target THUMB (the major portion of Android) was optimized for speed (O3) in build #1 (experimental), and optimized for size (Os) in build #2 (normal behaviour). Check the poll yourself: 9 votes for build 1 in terms of performance, and 1 vote for build 2. I decided that this plus the benchmark is enough to tell that O2/O3 for target THUMB is something that we want.
Now it doesn't matter that much whether you wish to use O2 or O3, but here is some comparison:
1. A kernel compiled with O2 is 4902 KB, with O3 it is 4944 KB, so O3 is 42 KB bigger.
2. A ROM compiled with O3 is 3 MB larger than with O2 after zip compression. Quick overview: 97 binaries in /system/bin and 2 binaries in /system/xbin, plus 283 libraries in /system/lib and other files, about 400 files in total. 3 MB / 400 = roughly 7.5 KB of size increase per file.
3. It's unlikely that code working properly at the O2 level breaks at the O3 level; most issues are on the Os <-> O2 side.
4. If it doesn't cause any issues, and speeds up a binary by a little bit, why not use it?
5. The only real reason not to use O3 is potentially higher memory usage due to oversized binaries.
In general, I doubt that this extra chunk of code causes any significant memory usage or slower performance. I suggest using O3 if it doesn't cause any issues for you compared to O2, but older devices may stay on O2 purely to save on code size, much like Google did back in 2006 with the Os flag.
Now let's get down to business.
Here is a list of important improvements:
- Optimized all instructions - both ARM and THUMB - for speed (-O3)
- Also optimized for speed the parts which are compiled with Clang (-O3)
- Turned off all debugging code (lack of -g)
- Eliminated redundant loads that come after stores to the same memory location, both partial and full redundancies (-fgcse-las)
- Ran a store motion pass after global common subexpression elimination. This pass attempts to move stores out of loops (-fgcse-sm)
- Enabled the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. We can then check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator ISL, like index splitting and dead code elimination in loops (-fgraphite -fgraphite-identity)
- Performed interprocedural pointer analysis and interprocedural modification and reference analysis (-fipa-pta)
- Performed induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees (-fivopts)
- Didn't keep the frame pointer in a register for functions that don't need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions (-fomit-frame-pointer)
- Attempted to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers (-frename-registers)
- Tried to reduce the number of symbolic address calculations by using shared “anchor” symbols to address nearby objects. This transformation can help to reduce the number of GOT entries and GOT accesses on some targets (-fsection-anchors)
- Performed tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job (-ftracer)
- Performed loop invariant motion on trees. It also moved operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion (-ftree-loop-im)
- Created a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily (-ftree-loop-ivcanon)
- Assumed that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations even if the loop optimizer itself cannot prove that these assumptions are valid (-funsafe-loop-optimizations)
- Moved branches with loop invariant conditions out of the loop (-funswitch-loops)
- Constructed webs as commonly used for register allocation purposes and assigned each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover (-fweb)
- Sorted the common symbols by alignment in descending order. This is to prevent gaps between symbols due to alignment constraints (-Wl,--sort-common)
Sounds cool, doesn't it? Head over to my ArchiDroid project and see for yourself how people react after switching to my ROM. Take a look at just one small example, or another one. No bullsh*t guys, this is not a placebo.
However, please read my commit carefully before you decide to cherry-pick it. You must understand that Google's flags haven't been touched in 7 years and nobody can assure you that they will work properly for your ROM and your device. You may experiment with them a bit to find out whether they're causing conflicts or other issues.
I can assure you that my ArchiDroid based on CM compiles fine with the suggested steps written in the commit itself. Just don't forget to clean ccache (rm -rf /home/youruser/.ccache or rm -rf /root/.ccache) and make clean/clobber.
You can use, modify and share my commit anyway you want, just please keep proper credits in changelogs and in the repo itself. If you feel generous, you may also buy me a coke for massive amount of hours put into those experiments.
Now go ahead and show your users how things should be done .
Cherry-picking time!
Android "Lollipop" (5.1.1 & 5.0.2 tested)
JustArchi's ArchiDroid Optimizations V4.1 for CyanogenMod (latest)
A set of commits you may want to pick to fix O3-related issues:
external_bluetooth_bluedroid | hardware_qcom_display | libcore | frameworks_av #1 | frameworks_av #2
Older entries are provided for reference only. I suggest using only latest commit above.
Android "Lollipop" (5.1.1 & 5.0.2 tested)
JustArchi's ArchiDroid Optimizations V4 for CyanogenMod
Android "Kitkat" 4.4.4:
JustArchi's ArchiDroid Optimizations V3 for CyanogenMod
JustArchi's ArchiDroid Optimizations V3 for OmniROM
JustArchi's ArchiDroid Optimizations V2
JustArchi's ArchiDroid Optimizations V1
AFTER applying above commit and AFTER EVERY CHANGE regarding flags, ALWAYS make clean/clobber AND empty ccache (rm -rf ~/.ccache)
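In practice that clean rebuild cycle looks roughly like this (the lunch target is only an example - use your own device):
Code:
rm -rf ~/.ccache            # drop cached objects built with the old flags
. build/envsetup.sh
lunch cm_i9300-userdebug    # example target - use your own device
make clobber                # full clean of out/
make -j8 bacon              # or otapackage / whatever target you normally build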
Q: How to properly change toolchains used in local manifest?
Open .repo/local_manifests/roomservice.xml from your source root dir (or create it). Here is a sample manifest that replaces the default 4.8 toolchain (both eabi and androideabi) with 4.8 SaberMod and 4.9 ArchiToolchain:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
<remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" />
<project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-eabi-4.8" remote="github" revision="architoolchain-5.2-arm-linux-gnueabihf" />
<remove-project name="platform/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" />
<project name="ArchiDroid/Toolchain" path="prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8" remote="github" revision="uber-4.9-arm-linux-androideabi" />
</manifest>
This is only an example, you should use the toolchains that suit you best. My ArchiDroid/Toolchain github repo is a good start to test various different toolchains and decide which one you like the most, or which one causes the least problems for you. I do not suggest any other magic tricks to include custom toolchains, putting your selected one in proper path is enough, avoid magic android_build modifications.
Troubleshooting
Q: Compiler error:
Code:
(...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libcloog-isl.so.4: cannot open shared object file: No such file or directory
This error can be fixed by installing the missing library. libcloog-isl.so.4 is provided by the libcloog-isl4 package, so on Debian-like OSes you should be able to fix it with:
Code:
apt-get install libcloog-isl4
Q: Compiler error:
Code:
(...)/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.8/bin/../libexec/gcc/arm-linux-androideabi/4.8.x-sabermod/cc1: error while loading shared libraries: libisl.so.13: cannot open shared object file: No such file or directory
This error is very similar to the one above, but concerns a different shared library. libisl.so.13 is provided by the libisl13 package. The problem is that this package is in testing/sid, so we'll need to install it from there.
Add the following entries to your /etc/apt/sources.list:
Code:
deb http://ftp.debian.org/debian testing main contrib non-free
deb-src http://ftp.debian.org/debian testing main contrib non-free
Then apt-get update && apt-get install libisl13.
Issues below are for older commits and should be used for reference only
Kitkat THUMB O2+ errors?
These are the most common issues.
* Change -O3 flag from TARGET_thumb_CFLAGS back to -Os, make clean/clobber, empty ccache and try again. This fixes most of the issues.
* RIL problems for the Exynos 4210 family? Add -fno-tree-vectorize to TARGET_thumb_CFLAGS.
* Broken exFAT -> https://github.com/JustArchi/android_external_fuse/commit/78ebbc4404de260862dca5f0454bffccee650e0d
Errors caused by toolchain?
1. Try Google's GCC 4.8 if you used Linaro 4.8 or SaberMod 4.8
2. Fallback to Google's GCC 4.7 if above didn't help (change TARGET_GCC_VERSION back to 4.7)
Errors caused by GCC 4.8+?
* ART Fix (bootloop) -> https://github.com/JustArchi/android_art/commit/71a0ca3057cc3865bd8e41dcb94443998d028407
* Not booting kernel -> https://github.com/JustArchi/androi...mmit/9262707f4ea471acf40baa43ffe4bfb3cff64de9 and https://github.com/JustArchi/androi...mmit/41a70abcdad746d9415f3ee40f90528feb0c9bdd
Errors caused by GCC 4.9+?
* Graphical glitches in PlayStore -> https://github.com/JustArchi/android_external_webp/commit/36c6201fbb108d6b757f860e2cd57f3982191662
Errors caused by Linaro?
* error: unknown CPU architecture -> https://github.com/JustArchi/androi...mmit/5e5e158a7c147725beae1eeb6785174baacecb03 (Keep in mind that this is a sample fix for smdk4412 kernel, you may need to use similar solution in your own case. Also, this error happens only with Linaro toolchain, doesn't happen with Google's GCC)
Other errors?
* error: undefined reference to 'memmove' -> https://github.com/XperiaSTE/androi...mmit/679a4e571ef77f47892a785e852d8219c1e6807a
Credits
@IAmTheOneTheyCallNeo - For inspiration and first steps
@metalspring - For some nice commits
@sparksco - For SaberMod, some nice commits and support for the optimization idea
Bravo! I see flags in here that I am yet to try. Thanks for all your work and dedication to this effort.
Going to do a comparison this week against my "neofy" initiative
Thanks!
-Neo
Forum Moderator
This is amazing man!
Sent from my Moto G using Tapatalk
Damn... Wish I had an international gs3. I'd be on SlimKat, find your thread and be like:
I'm gonna be 20 years old in about an hour... Should I ask?
Great work :highfive:
I saw you had already a custom 4.10 linaro. If I use it, do I need to change anything on your commit?
Thanks
Sent from my SAMSUNG-SGH-I747 using Tapatalk
_MarcoMarinho_ said:
Great work :highfive:
I saw you had already a custom 4.10 linaro. If I use it, do I need to change anything on your commit?
It won't boot, stl port has segfaults when using GCC 4.9+. If you want to use it, sure, change TARGET_GCC_VERSION and include proper toolchain.
JustArchi said:
It won't boot, stl port has segfaults when using GCC 4.9+. If you want to use it, sure, change TARGET_GCC_VERSION and include proper toolchain.
Is this the best toolchain [stability/performance] to compile a ROM?
https://github.com/JustArchi/Linaro/tree/4.8-androideabi
_MarcoMarinho_ said:
Is this the best toolchain [stability/performance] to compile a ROM?
https://github.com/JustArchi/Linaro/tree/4.8-androideabi
Yes.
JustArchi said:
Yes.
Just one last question. Do I need to configure envsetup.sh to the custom toolchain directory?
_MarcoMarinho_ said:
Just one last question. Do I need to configure envsetup.sh to the custom toolchain directory?
No, but you need to add Linaro entries to your local_manifest.
https://github.com/JustArchi/ArchiDroid/blob/2.X-EXPERIMENTAL/__dont_include/roomservice.xml#L3
Thank you @JustArchi
i make a build for Slimkat (nexus 5)
Everything is fine but just a noobish question,
I have to enable "TARGET_USE_O3=true" in BoardConfig.mk ?
micr0g said:
Thank you @JustArchi
i make a build for Slimkat (nexus 5)
Everything is fine but just a noobish question,
I have to enable "TARGET_USE_O3=true" in BoardConfig.mk ?
No, you shouldn't have any commits apart from mine. It already specifies O3 for everything.
Can I use a 4.7 toolchain with the ArchiDroid optimizations? I've tried compiling my kernel with different 4.8 toolchains before, and it's always resulted in boot loops.
Codename13 said:
Can I use a 4.7 toolchain with the ArchiDroid optimizations? I've tried compiling my kernel with different 4.8 toolchains before, and it's always resulted in boot loops.
Actually you can. Just change TARGET_GCC_VERSION back to 4.7.
Well done mate, I'll try contributing from my side to this project
great work archi !! also after porting a fully working 4.4.2 touchwiz to i9300 is it possible to make aosp for i9300 more stable now?
@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too low-end for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles.
dragonnn said:
@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too low-end for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles.
Did someone mention Poles here?
Great work @JustArchi :highfive:
dragonnn said:
@JustArchi which flags could be used for kernel compiling? And where should I put them in the Makefile? I don't compile the complete ROM because my machine is too low-end for that, but I am developing a custom kernel for my Xperia Z. BTW, nice to run into other Poles.
If you have the whole ROM tree, then cherry-pick this commit, lunch your target and make bootimage. That is enough.
If you have a standalone kernel, take a look at the main Makefile.

[WIP] Compile Android (5.x) with Qualcomm LLVM 3.7.1 for Qualcomm CPU

Use Qualcomm Snapdragon LLVM Compiler 3.7.1 (https://developer.qualcomm.com/mobi...-performance/snapdragon-llvm-compiler-android) for Android Lollipop 5.x for devices with Qualcomm CPU
As far as I know nobody has done that yet. Except me
This changes the toolchain for CLANG from the default one which comes with Android (aCLANG) to the Qualcomm Snapdragon LLVM Compiler 3.7.1 (qCLANG) mentioned above. The toolchain will be used in almost all places where the default Android CLANG (aCLANG) is used for compiling TARGET MODULES, BUT ONLY if "USE_CLANG_QCOM := true" is set. Otherwise it uses aCLANG. So you can merge this commit and the build will be the same as before unless you specifically define "USE_CLANG_QCOM := true"!
WARNING, READ CAREFULLY: It is possible that using this toolchain, especially with the optimization flags, will introduce BUGS!
DONT SPAM QUALCOMM FORUM PLEASE!
I would suggest: IF YOU USE IT DONT MAKE BUG REPORTS FOR YOUR ROM OR KERNEL OR ANY APPS. Make reports in this development thread. The ROM compiles, boots and starts, even when Bionic and other modules are compiled with the Qualcomm Snapdragon LLVM Compiler 3.7.1 (qCLANG).
Androbench, AnTuTu, Basemark 2.0, Geekbench 3 and Vellamo all complete their benchmark runs. This is a work in progress. There is confirmation that it works on the Sony Z Ultra and Samsung Galaxy Note 4, and after some months of testing everything seems good on my personal device.
At the moment this is only tested with the Qualcomm Snapdragon 800, but in theory it should also work with any Krait and even Scorpion CPU. According to the documentation there is also support for armv8/aarch64 (Snapdragon 810), where you should gain a real performance bump, but that support is not yet implemented! If you have a Snapdragon 810 device and want to try this out, please contact me.
Install instructions:
You have to make an account in order to download the toolchain!
You need to download the toolchain and extract it to {your android dir}/prebuilts/clang/linux-x86/host/*
Merge/Cherry-Pick this commit: https://github.com/mustermaxmueller/compiler-rt/commit/774cabb50dced5b7acf62af331480397d3a0d12f
Merge/Cherry-Pick this commit: https://github.com/mustermaxmueller/android_build/commit/894f0793d998395e0c15b9ad7a39b277e3aca006
Merge/Cherry-Pick these commits:
https://github.com/mustermaxmueller...mmit/05e2a338b2b66f30d86687c3de4e48d30a2db8d4
https://github.com/mustermaxmueller...mmit/5f4eadfe1bbb5184d650cc9eef8f2306fefc7cdf
You need the Qualcomm Snapdragon LLVM Compiler 3.5 for this to work!!!
OR
Make changes to frameworks/rs/driver/runtime/build_bc_lib_internal.mk like stated here: http://forum.xda-developers.com/showpost.php?p=59626153&postcount=47
Merge/Cherry-Pick this commit: https://github.com/jgcaaprom/android_prebuilt_ndk/commit/98100b2cbeb681698be6093ef964bd1f4414a71f
Set "USE_CLANG_QCOM := true" e.g. in Boardconfig.mk
OPTIONAL (set these options in your Boardconfig.mk):
USE_CLANG_QCOM_VERBOSE := true (If you want to be certain that qclang is used. You will see extra verbose output while compiling.)
USE_CLANG_QCOM_LTO := true (use LTO; not working at the moment)
CLANG_QCOM_COMPILE_ART := true (compile art with qclang)
CLANG_QCOM_COMPILE_BIONIC := true (compile bionic with qclang)
CLANG_QCOM_COMPILE_MIXED := true (compile defined modules with qclang)
CLANG_QCOM_USER_MODULES += (Here you can overwrite which modules should be compiled with qCLANG instead of GCC)
CLANG_QCOM_FORCE_COMPILE_ACLANG_MODULES += (Here you can overwrite which modules should be compiled with aCLANG instead of qCLANG)
Make a clean compile! Also delete ccache!
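Put together, the board-level switches could end up looking roughly like this in a device BoardConfig.mk (the module names in the override lists are purely illustrative):
Code:
# BoardConfig.mk - Snapdragon LLVM switches (sketch)
USE_CLANG_QCOM := true
USE_CLANG_QCOM_VERBOSE := true
CLANG_QCOM_COMPILE_BIONIC := true
CLANG_QCOM_COMPILE_MIXED := true
# example module overrides - replace with the modules you actually want to move between compilers
CLANG_QCOM_USER_MODULES += libskia
CLANG_QCOM_FORCE_COMPILE_ACLANG_MODULES += libRS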
Annotation:
The first boot, and every boot after cleaning the cache, will take up to 15 min! Just be patient even if it seems that the device is stuck at optimizing apps.
Should also work with Android 6.0 but not tested yet!
You should only get measurable performance benefits if you compile Bionic with CLANG!
Documentation for the toolchain: {your android dir}/prebuilts/clang/linux-x86/host/llvm-Snapdragon_LLVM_for_Android_3.7/prebuilt/linux-x86_64Snapdragon_LLVM_ARM_37_Compiler_User_Guide.pdf
The Flags "-muse-optlibc" and "-ffuse-loops" are not Documented. (http://pastebin.com/PZtk6WHB)
The implementation is not beautiful but it works
HOST is compiled with aCLANG because qCLANG does not understand x86
You can make performance and size comparisons by removing/commenting "USE_CLANG_QCOM := true"
I could have made the installation easier by uploading the toolchain to Github but I do not know if I am allowed to. And I am no lawyer so...
TODO:
-Make more performance comparisons to aCLANG and GCC.
-Compile Android with qCLANG wherever possible. http://www.linuxplumbersconf.org/20...ations/1131/original/LP2013-Android-clang.pdf and https://events.linuxfoundation.org/sites/events/files/slides/2014-ABS-LLVMLinux.pdf
Development Thread: http://forum.xda-developers.com/and...-compile-android-5-0-2-qualcomm-llvm-t3035162
Credits:
[email protected] for looking at the first commit
[email protected] people whom stuff i used
[email protected] where i learned alot
[email protected] for finding out that you need libllvm3.6 (obsolete)
XDA:DevDB Information
Compile Android with Qualcomm LLVM 3.7.1 for devices with Qualcomm CPU, Tool/Utility for all devices (see above for details)
Contributors
MusterMaxMueller
Version Information
Status: Testing
Created 2015-02-19
Last Updated 2015-11-22
reserved
another reserved
Let me be the first to say CONGRATULATIONS FOR THIS, u rock man =) lets try these things, and thank you for sharing!
I've disabled clang for art and bionic entirely, so it just compiles with GCC. I wonder which would be better, since compiling with GCC allows for more optimizations such as graphite.
frap129 said:
I've disabled clang for art and bionic entirely, so it just compiles with GCC. I wonder which would be better, since compiling with GCC allows for more optimizations such as graphite.
Yes, we have to measure and compare
I hope together we can get the manpower to verify/test what is better. That is why i started this thread.
According to linaro, graphite doesnt make that big difference. too lazy to search for the link right now.
I suck at c and the compiler stuff but in my understanding the LTO implementation of CLANG also sounds very interesting.
will update OP.
MusterMaxMueller said:
Yes, we have to measure and compare
I hope together we can get the manpower to verify/test what is better. That is why i started this thread.
According to linaro, graphite doesnt make that big difference. too lazy to search for the link right now.
I suck at c and the compiler stuff but in my understanding the LTO implementation of CLANG also sounds very interesting.
will update OP.
Well, Graphite does somewhat improve memory (RAM) use, and using the latest ISL and CLooG (as seen in SaberMod) has only increased this. There are other options, for example Polly (the Clang/LLVM equivalent of Graphite) could be used for better optimization. And I didn't know that Clang implemented LTO differently, I'll look into that.
Interesting at least, will check it out...
fixing stuff
fixed some stuff and may have a solution for compiler-rt and libc++
if everything works well i will update in 3h.
MusterMaxMueller said:
fixed some stuff and may have a solution for compiler-rt and libc++
if everything works well i will update in 3h.
I haven't gotten around to trying the Snapdragon LLVM, but since Qualcomm didn't release their source code, I compiled my own LLVM using stock LLVM sources with AOSP patches and some extra features enabled such as Polly and libffi. https://github.com/frap129/host-llvm_3.5-prebuilt.git There's the link if anyone wants to test.
Got around to implementing this into my builds, and it seems that Qcom LLVM has all the same problems as my LLVM that I compiled (host not working, compiler_rt problems).
EDIT: Also, testing Link Time Optimizations (LTO)
frap129 said:
Got around to implementing this into my builds, and it seems that Qcom LLVM has all the same problems as my LLVM that I compiled (host not working, compiler_rt problems).
EDIT: Also, testing Link Time Optimizations (LTO)
All fixed well except for HOST which wont work with qCLANG. look at my new commit. im sure will help you with your problems.
There were also some derps...
will update OP. already uploaded to github. took longer then i thought. my pc needs more ram.
also did some performance and stability testing. all seems good so far
will upload proof-of-concept rom with good performance. but building takes some time. need more ram
if you have llvmgold.so compiled would you mind sharing?
MusterMaxMueller said:
All fixed well except for HOST which wont work with qCLANG. look at my new commit. im sure will help you with your problems.
There were also some derps...
will update OP. already uploaded to github. took longer then i thought. my pc needs more ram.
also did some performance and stability testing. all seems good so far
will upload proof-of-concept rom with good performance. but building takes some time. need more ram
if you have llvmgold.so compiled would you mind sharing?
Nope, but I can compile it if you want. Also, for the LTO implementation, add the -flto flag as well, and make sure you add it to the linker (LD) flags
I have just compiled CM12 with QCOM llvm.. Build is booting and running fine, however the android progress circle indicator has some occasional animation issues that were not present before.
Might check the commit later.. Maybe an Ofast flag is the reason for this?
WhiteNeo said:
I have just compiled CM12 with QCOM llvm.. Build is booting and running fine, however the android progress circle indicator has some occasional animation issues that were not present before.
Might check the commit later.. Maybe an Ofast flag is the reason for this?
Thanks for trying and reporting back.
I have the same issues with animation circle. I think it might be fvectorize-loops but could be Ofast and i have no idea which module is causing this. Maybe libpng.
in the past i had the same problem with one of archidroids optimizations but cant remember which one it was
still having performance issues with art if compiled with qCLANG. made some changes and will test tonight or tomorrow.
frap129 said:
Nope, but I can compile it if you want. Also, for the LTO implementation, add the -flto flag as well, and make sure you add it to the linker (LD) flags
thx,
if you wouldnt mind compiling i would appreciate. i hope its compatible.
MusterMaxMueller said:
thx,
if you wouldnt mind compiling i would appreciate. i hope its compatible.
Im getting some random errors with GCC when im building LLVM now for some reason. I'll resync the source and try again later
MusterMaxMueller said:
Thanks for trying and reporting back.
I have the same issues with animation circle. I think it might be fvectorize-loops but could be Ofast and i have no idea which module is causing this. Maybe libpng.
in the past i had the same problem with one of archidroids optimizations but cant remember which one it was
still having performance issues with art if compiled with qCLANG. made some changes and will test tonight or tomorrow.
Got the animation issue fixed by removing all math related flags like -ffast-math and -funsafe-math-optimizations..
Also, I've replaced all -Ofast flags with -O3 (because Ofast includes the ffast-math flag, which is the issue here)
Will check later which flags may be reapplied without causing the issue again.
The build has been running smooth and stable for more than 24 hours. However, I can't report an extremely big difference in ui responsiveness. But the rom zip size has increased by 2mb, so the changes have indeed been applied. :good:
WhiteNeo said:
Got the animation issue fixed by removing all math related flags like -ffast-math and -funsafe-math-optimizations..
Also, I've replaced all -Ofast flags with -O3 (because Ofast includes the ffast-math flag, which is the issue here)
Will check later which flags may be reapplied without causing the issue again.
The build has been running smooth and stable for more than 24 hours. However, I can't report an extremely big difference in ui responsiveness. But the rom zip size has increased by 2mb, so the changes have indeed been applied. :good:
thanks for testing!
could you maybe test which module is causing the problems? so we can only disable fast_math where its needed. i guess it might be libpng?
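If it does turn out to be a single module, the usual way to exempt just that one is a per-module override in its Android.mk - a rough sketch (libpng is only the suspect named above, not confirmed):
Code:
# e.g. external/libpng/Android.mk (sketch) - keep the global setup,
# but turn the unsafe math optimizations back off for this one module
LOCAL_CFLAGS += -fno-fast-math -fno-unsafe-math-optimizations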
regarding perfomance:
did you use my second commit? this should have some minor improvements in some cases. but actually decrease in some others... i need to look up how to make better and automated benchmarking. the way i do it at the moment is to time consuming.
anyone knows something i could use for automated benchmarking? for example just a zip i could copy/flash and then run via terminal? dont want to have to write my own
or something else, any suggestions?
i made some changes and am now compiling with the qcom libs which comes with the toolchain and finally got the special optimized copy function working. see "-arm-opt-memcpy" in the doc.
i hope this will give some performance improvements. wish some luck
at first it gave some problem because everytime i use the qcom libs i have to make sure libgcc isnt used.
i try to think about a cleaner solution because at the moment the way i do it is a dirty hack
then i will upload to github
MusterMaxMueller said:
thanks for testing!
could you maybe test which module is causing the problems? so we can only disable fast_math where its needed. i guess it might be libpng?
regarding perfomance:
did you use my second commit? this should have some minor improvements in some cases. but actually decrease in some others... i need to look up how to make better and automated benchmarking. the way i do it at the moment is to time consuming.
anyone knows something i could use for automated benchmarking? for example just a zip i could copy/flash and then run via terminal? dont want to have to write my own
or something else, any suggestions?
i made some changes and am now compiling with the qcom libs which comes with the toolchain and finally got the special optimized copy function working. see "http://forum.xda-developers.com/showthread.php?t=3038250" in the doc.
i hope this will give some performance improvements. wish some luck
at first it gave some problem because everytime i use the qcom libs i have to make sure libgcc isnt used.
i try to think about a cleaner solution because at the moment the way i do it is a dirty hack
then i will upload to github
I used the two commits from the OP.
libpng can't be the reason for this, as many Lollipop drawables, including the animation we're talking about, are actually vector drawables, not PNG ones.
So the module that these vector drawables refer to returns broken numbers that make the animation look weird.
I think the safest way would be to skip ffast-math and Ofast completely, because the animation might just be THE ONE bug we actually see and notice. I don't want to imagine what else might happen in certain background processes, including the risk of broken data and random bugs.
By the way, looking forward to testing the qcom libs, along with LTO.
How do I add LTO to the build?
