[app port] [1/23] dosbox with thumb2 dynrec - Windows RT Development and Hacking

hi all,
I've updated DOSBOX's dynrec core to take advantage of the THUMB2 ARMv7 instructions that must be supported on Windows RT. Among other things, this includes being able to move immediates to registers without using a PC relative load, 32-bit instructions that allow access to all registers, etc. I also added a PLI instruction to preload a cacheblock into the instruction cache before it is executed, and PLD instructions to prefetch data for a minor speedup.
Testing on doom timedemo, I get 3721 realtics, which is a modest 5% speedup from the standard dynrec core. Probably not having directX is a significant bottleneck on video drawing, this can get fixed once I get a wrapper around directinput working. Hopefully these changes can also help out mamaich's x86 pe loader out a well.
-- old post below--
I managed to get dosbox compiled with mamaich's new updates (see below), so we now have dosbox with the dynamic recompiling core. I also included network support which seems to work.
The dynamic recompiling core does give better performance for newer apps. 3982 realtics vs 18088 realtics on doom timedemo, so about a 5x gain
The next step would be to get a better graphics core. There is an opengl wrapper for DX11 called gl2dx that I'm looking into.
I'm also looking at putting at optimizing dynrec for thumb2, for example, it seems that CBZ might be useful (except it only lets you branch forward!!!)

Hmm, I have to unfortunately note that DOS4GW doesn't seem to work in this build. I'm working on it!
In the mean time, enjoy commander keen 1 -6?...

no2chem said:
Hmm, I have to unfortunately note that DOS4GW doesn't seem to work in this build. I'm working on it!
In the mean time, enjoy commander keen 1 -6?...
Click to expand...
Click to collapse
So in the end, I think the dynamic recompiler has some issues.
I know that the VirtualAlloc function is correctly setting execute permissions on memory pages, but I don't know if FlushInstructionCache is actually flushing the instruction cache. It's also possible that the dynamic recompiler is emitting bad assembly, but inspecting the code... doesn't look like it!

no2chem said:
So in the end, I think the dynamic recompiler has some issues.
Click to expand...
Click to collapse
The same issue for me (but I'm using a bit obsolete DosBox sources, need to update). May be this is due to EABI changes (for example registers are not preserved), as DosBox dynamic core targets Linux.

Regarding DosBox. Look here - http://vogons.zetafleet.com/viewtopic.php?t=31787
This is the ARMv7 optimized dynamic core for DosBox by M-HT. It is not yet integrated into the official DosBox branch.

If you're concerned about FlushInstructionCache not being implemented correctly, let me know. As part of my work to port Chromium, I've learned how to do cache management in ARM a little closer to the metal (moving the relevant values to the coprocessor registers directly) and have both more control (for example, clearing the data cache lines as well) and more precise control over exactly what is done to the cache lines.
However, bear in mind that I haven't been able to properly test this yet, as too much of the Chromium core still doesn't want to compile for me... *sigh*.

GoodDayToDie said:
If you're concerned about FlushInstructionCache not being implemented correctly, let me know.
Click to expand...
Click to collapse
As far as I see - FlushInstructionCache works as expected (flushes icache). DosBox crashes due to some other reason, maybe due to the ARM-THUMB interworking problems (it is just a guess).

Good to know. It would be annoying if the official API were wrong...

I've found whu DosBox dynamic core crashes at random.
All existing dynamic cores use ARM code, even the thumb files are using the ARM prologue. Though our CPUs can run the ARM code, there is a bug (?) in RT that switches ARM code to be executed as a THUMB at random. For example you call an ARM procedure 100000 times, but at 100001 something happens and the code starts to execute as THUMB. Maybe a task switch occurs, or maybe some interrupt happens, and when OS returns to your code - it sets T bit in CPSR, so the code is interpreted as THUMB.
I was able to modify the prologue generator in risc_armv4le-thumb-iw.h so that it generates only THUMB code, and now everything started to work in my project. Attached. Code is ugly and unfinished and uses some dirty C hacks, I'll fix them later as it is deep night now. But you may look on the changes and implement similar things in a DosBox build.

mamaich said:
I've found whu DosBox dynamic core crashes at random.
All existing dynamic cores use ARM code, even the thumb files are using the ARM prologue. Though our CPUs can run the ARM code, there is a bug (?) in RT that switches ARM code to be executed as a THUMB at random. For example you call an ARM procedure 100000 times, but at 100001 something happens and the code starts to execute as THUMB. Maybe a task switch occurs, or maybe some interrupt happens, and when OS returns to your code - it sets T bit in CPSR, so the code is interpreted as THUMB.
I was able to modify the prologue generator in risc_armv4le-thumb-iw.h so that it generates only THUMB code, and now everything started to work in my project. Attached. Code is ugly and unfinished and uses some dirty C hacks, I'll fix them later as it is deep night now. But you may look on the changes and implement similar things in a DosBox build.
Click to expand...
Click to collapse
I'll take a look at getting that implemented in my build.
Edit: Haven't looked into it much, but DosBox crashes whenever I open any 32-bit stuff.

mamaich said:
I've found whu DosBox dynamic core crashes at random.
All existing dynamic cores use ARM code, even the thumb files are using the ARM prologue. Though our CPUs can run the ARM code, there is a bug (?) in RT that switches ARM code to be executed as a THUMB at random. For example you call an ARM procedure 100000 times, but at 100001 something happens and the code starts to execute as THUMB. Maybe a task switch occurs, or maybe some interrupt happens, and when OS returns to your code - it sets T bit in CPSR, so the code is interpreted as THUMB.
I was able to modify the prologue generator in risc_armv4le-thumb-iw.h so that it generates only THUMB code, and now everything started to work in my project. Attached. Code is ugly and unfinished and uses some dirty C hacks, I'll fix them later as it is deep night now. But you may look on the changes and implement similar things in a DosBox build.
Click to expand...
Click to collapse
thanks for figuring that out! its a bit odd though, don't you think - wouldn't normal ARM apps crash if this was the case? I'm sure most stuff is compiled with interworking, maybe ms interworking convention is different?
Anyway, I was thinking, doesn't homm3 use directx? Is your emulation layer also doing conversion of that code? It seems like it would be useful to have native wrappers for old versions of DX so they can take direct advantage of DX11 native. Alas, that's for another day.... Also, mfc would be nice too... (its there... MS just didn't provide the .lib, ordinals are [noname]'d out, and .pdb from debug server seems to be missing some exports....)

no2chem said:
hi all,
I managed to get dosbox compiled with mamaich's new updates (see below), so we now have dosbox with the dynamic recompiling core. I also included network support which seems to work.
The dynamic recompiling core does give better performance for newer apps.
The next step would be to get a better graphics core. There is an opengl wrapper for DX11 called gl2dx that I'm looking into.
I'm also looking at putting at optimizing dynrec for thumb2, for example, it seems that CBZ might be useful (except it only lets you branch forward!!!)
Click to expand...
Click to collapse
How big performance gains are you talking about?

bartekxyz said:
How big performance gains are you talking about?
Click to expand...
Click to collapse
Eh, 3982 realtics vs 18088 realtics on doom timedemo, so about a 5x gain

Nice! That should (eventually) allow quite a few legacy apps, if they're old enough and/or not to performance-sensitive (i.e. not realtime) to run acceptably.

no2chem said:
doesn't homm3 use directx? Is your emulation layer also doing conversion of that code?
Click to expand...
Click to collapse
No conversion, I just pass DX to the native OS libraries with some minimal wrappers. RT provides even DirectX 1.0 interfaces
The only problem is that RT prohibits non 32-bit color modes, though video driver supports them (at least 16 bit color is supported by tegra driver).

no2chem said:
I managed to get dosbox compiled with mamaich's new updates
Click to expand...
Click to collapse
I'm using ARM/Tumb dynamic code created by M-HT: http://vogons.zetafleet.com/viewtopic.php?t=31787
I've changed just one function.
I'm also looking at putting at optimizing dynrec for thumb2, for example, it seems that CBZ might be useful (except it only lets you branch forward!!!)
Click to expand...
Click to collapse
Please post your code changes somewhere, so that I could use them in my project too
don't you think - wouldn't normal ARM apps crash if this was the case? I'm sure most stuff is compiled with interworking, maybe ms interworking convention is different?
Click to expand...
Click to collapse
NTARM kernel is a THUM2-only kernel. There is no interworking it it. So it is possible that MS code intentionally sets the T bit to THUMB when returning from interrupt or a task-switch. Unfortunately this means that we can't use ARM code in our programs (I have not seen any ARM code in MS progs, only THUMB2). Anyway, THUMB2 code is very efficient now, its speed can be the same as of ARM code, and size is much smaller.

Generally, THUMB2 should outperform ARM, yes. Also, the state of Thumb vs. Arm is actually set in a kind of weird way: rather than manually setting a processor bit, it's controlled by the jump address. Since even THUMB2 instructions are always 16-bit (2-byte) aligned, the least significant bit of an instruction address is never 1. Therefore, that bit is used (on any jump-type instruction) as a flag to indicate whether the processor should use ARM or THUMB2 mode. In theory, this should mean that the OS doesn't care about the mode being used... but I've never written an interrupt handler for ARM (a couple other platforms, but not ARM) and there might be something weird going on with them.

GoodDayToDie said:
rather than manually setting a processor bit, it's controlled by the jump address
Click to expand...
Click to collapse
This is true for normal jumps. And that is why we have to set lowest bit to 1 manually on indirect branches inside autogenerated THUMB code. But for CPU itself the mode is kept in T bit in CPSR. And "something" sets it to 1 "sometimes". I have not done much testing, but I think that this should be an interrupt, as I've seen that bit changed in the middle of a generated ARM code.
Anyway the THUMB2 code is better then ARM for us - it is more compact, so more code may be kept in cache (both in CPU and DosBox dynamic recompilator's).

updated with a thumb2 dynrec.

Touch input for mouse
For anyone that wants to use their touchscreen for mouse input, navigate to their %AppData%\Local\DosBox folder and modify the dosbox-0.74.conf file so that "Autolock=true" is "Autolock=false"
It works pretty well with the touchscreen and an on-screen keyboard for gaming and such

Related

C#, VB, Ruby, F#, and Python on Android (via Mono and DLR)

This all works on stock/nonroot phones
I got Mono running on Android.
http://www.koushikdutta.com/2008/12/mono-on-android-success-at-last.html
Started working on Java/C# interop and found out that DLR works on Mono:
http://www.koushikdutta.com/2009/01/microsoft-dlr-and-mono-bring-python-and.html
As a result, you can write applications in Python and Ruby on Android too now.
Anyhow, if anyone else is interested in working on this project with me, please let me know! I've already gotten all the relevent source hosted at Google Code: http://code.google.com/p/androidmono. Basically the next bit of work involves implemeting a Java interop using the DLR.
Nifty.
As for Dalvik & JIT, I think dexopt already replaces some heavy usage calls with inline native code. Hopefully dalvik vm will get full JIT in the future?
This makes me very happy!!!
Thank you for your work!!
jashsu said:
Nifty.
As for Dalvik & JIT, I think dexopt already replaces some heavy usage calls with inline native code. Hopefully dalvik vm will get full JIT in the future?
Click to expand...
Click to collapse
It doesn't do any inline native replacements: dexopt optimizes the dex file. Which includes inline dex byte code replacements. There is no JIT at all, but Google said it would be something definitely on the horizon. Personally I think DEX is a pretty stupid move on Google's part; they could have just gone with CIL-- and write a Java compiler for that instead. The Mono JIT compiler works on many platforms; so V1 of Android could have been running JIT compiled native code with that route... which is an order of magnitude better in performance.
I'm going to be doing some performance comparisons of Mono vs Dalvik; Mono will obviously win, but it will be interesting to see the margin. I'll also experiment with binding Mono to the Android runtime to create Android applications in C#.
Optimization
Virtual machine interpreters typically perform certain optimizations the first time a piece of code is used. Constant pool references are replaced with pointers to internal data structures, operations that always succeed or always work a certain way are replaced with simpler forms. Some of these require information only available at runtime, others can be inferred statically when certain assumptions are made.
The Dalvik optimizer does the following:
For virtual method calls, replace the method index with a vtable index.
For instance field get/put, replace the field index with a byte offset. Also, merge the boolean / byte / char / short variants into a single 32-bit form (less code in the interpreter means more room in the CPU I-cache).
Replace a handful of high-volume calls, like String.length(), with "inline" replacements. This skips the usual method call overhead, directly switching from the interpreter to a native implementation.
Prune empty methods. The simplest example is Object.<init>, which does nothing, but must be called whenever any object is allocated. The instruction is replaced with a new version that acts as a no-op unless a debugger is attached.
Append pre-computed data. For example, the VM wants to have a hash table for lookups on class name. Instead of computing this when the DEX file is loaded, we can compute it now, saving heap space and computation time in every VM where the DEX is loaded.
All of the instruction modifications involve replacing the opcode with one not defined by the Dalvik specification. This allows us to freely mix optimized and unoptimized instructions. The set of optimized instructions, and their exact representation, is tied closely to the VM version.
Most of the optimizations are obvious "wins". The use of raw indices and offsets not only allows us to execute more quickly, we can also skip the initial symbolic resolution. Pre-computation eats up disk space, and so must be done in moderation.
There are a couple of potential sources of trouble with these optimizations. First, vtable indices and byte offsets are subject to change if the VM is updated. Second, if a superclass is in a different DEX, and that other DEX is updated, we need to ensure that our optimized indices and offsets are updated as well. A similar but more subtle problem emerges when user-defined class loaders are employed: the class we actually call may not be the one we expected to call.
These problems are addressed with dependency lists and some limitations on what can be optimized.
Click to expand...
Click to collapse
Koush said:
Personally I think DEX is a pretty stupid move on Google's part; they could have just gone with CIL-- and write a Java compiler for that instead. The Mono JIT compiler works on many platforms; so V1 of Android could have been running JIT compiled native code with that route... which is an order of magnitude better in performance.
I'm going to be doing some performance comparisons of Mono vs Dalvik; Mono will obviously win, but it will be interesting to see the margin. I'll also experiment with binding Mono to the Android runtime to create Android applications in C#.
Click to expand...
Click to collapse
Google had different objectives, they didn't go after maximum performance. Remember, that handsets have different constrains than desktops and laptops. So they went after minimizing RAM usage (byte code interpreter => maximum possible sharing of read-only memory pages among processes) and battery life. Performance had to be acceptable, not priority.
You would not be able to fit everything into RAM if you used Mono and you would get the patent problems with Net/Mono/etc as a bonus.
lu_tze said:
Google had different objectives, they didn't go after maximum performance. Remember, that handsets have different constrains than desktops and laptops. So they went after minimizing RAM usage (byte code interpreter => maximum possible sharing of read-only memory pages among processes) and battery life. Performance had to be acceptable, not priority.
You would not be able to fit everything into RAM if you used Mono and you would get the patent problems with Net/Mono/etc as a bonus.
Click to expand...
Click to collapse
.NET, C#, IL, et al are all ECMA standards. Mono is LGPL/GPL. There are no patent or licensing issues with it that is unfamiliar to the OHA. They reuse plenty of open source projects.
An interpreter is not power efficient OR performant, simply due to the fact is it doing the 10 times as much work to do the same thing as native code. In addition, Mono features an Ahead Of Time compiler (AOT) that would let you compile everything to native code before it even hits the phone (or just once, and cache it). Most of Android's power and memory optimizations currently comes from Google's application life cycle (activities can be killed and resumed at the system's whim) -- that has nothing to do with Dalvik. I'm not criticizing the API or the implementation, just the runtime.
They could have spent their time making the Mono runtime play nicely with the shared memory subsystem.
I'm rebuilding mono with a minimal configuration to check out the disk and memory footprint.
Koush said:
Not that most of you will care, but I got Mono running on Android.
Click to expand...
Click to collapse
I'm a noob.... how can I install this? Very good Job Kush!
pic.micro23 said:
I'm a noob.... how can I install this? Very good Job Kush!
Click to expand...
Click to collapse
I haven't released anything yet. I'm trying to figure out how to statically link all it's dependencies, minimize the size, bind to the Android runtime, convert DEX to CIL and then CIL to ARM, and all sorts of other goodness. Basically a lot of experimenting to do before anything is "released". It's just in proof of concept phase right now.
Dalvik sucks
http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html
Koush said:
Dalvik sucks
http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html
Click to expand...
Click to collapse
Lol. nice article Koush. It's surprising mono is that much faster
Koush said:
I haven't released anything yet. I'm trying to figure out how to statically link all it's dependencies, minimize the size, bind to the Android runtime, convert DEX to CIL and then CIL to ARM, and all sorts of other goodness. Basically a lot of experimenting to do before anything is "released". It's just in proof of concept phase right now.
Click to expand...
Click to collapse
Just read the dalvik vs mono article. It's certainly interesting work. I agree that by ignoring JIT they're certainly not going after the most beneficial optimizations. That said, I don't think it's something they've completely excluded from future implementation.
I think you should consider trying to get Mono added as an external project. If nothing, having unofficial support for a vm which supports C#/CIL could bring in a significant amount of developer interest from the WinMo dev community. The coretech team would be the folks to set up a new project.
I've now gotten mono working on all G1s. You don't need Debian OR root. Still a couple kinks to work out, but I have it on the market for anyone interested in playing with it. More information at the link below.
http://www.koushikdutta.com/2009/01/mono-for-android-now-available-on.html
I thought this was only a platform for development but its made my g1 much faster and reduced memory from steel and stock browsers as well. My market is still 12 mb and mono is about 11 mb. Is this normal?
Also everytime, I run mono, does it do the same thing it does the first time it was installed and opened?
Sorry, i know nothing about mono but i can tell its definitely optimizing the performance on the g1 though.
great work koushe,
hbguy
This awesome, I have been waiting to see how this all turned out after reading your first post about it a few days ago.
hbguy said:
I thought this was only a platform for development but its made my g1 much faster and reduced memory from steel and stock browsers as well. My market is still 12 mb and mono is about 11 mb. Is this normal?
Also everytime, I run mono, does it do the same thing it does the first time it was installed and opened?
Sorry, i know nothing about mono but i can tell its definitely optimizing the performance on the g1 though.
great work koushe,
hbguy
Click to expand...
Click to collapse
Koush would know more than I would, but installing mono shouldn't affect everything else on Android. It's not like everything is suddenly using mono instead of dalvik. I suspect you have a strong case of the placebo effect
hbguy said:
I thought this was only a platform for development but its made my g1 much faster and reduced memory from steel and stock browsers as well. My market is still 12 mb and mono is about 11 mb. Is this normal?
Also everytime, I run mono, does it do the same thing it does the first time it was installed and opened?
Sorry, i know nothing about mono but i can tell its definitely optimizing the performance on the g1 though.
great work koushe,
hbguy
Click to expand...
Click to collapse
Yeah, you're just imagining things I haven't even attempted like DEX->CIL yet.
The first APK of Mono is quite large though. I've updated it with a number of bug fixes and also am making it use eglib now. This trimmed the size by a few MB. Getting Mono to work with Bionic might not be possible... (that would trim off another 2MB).
Once again, the APK is just a developers release... something to play with and test.
I have been messing with mono on my G1.
Is it safe to say it will only work with command line apps?
I got my own hello world and a few other things running, but if I try and run any sort of gui I get errors.
Yes, only command line stuff will work. WinForms will not work on Android.
However, you should be able to get OpenGL ES working via PInvoke. I haven't tried it, but it should work just fine.
Koush said:
Yes, only command line stuff will work. WinForms will not work on Android.
Click to expand...
Click to collapse
That is what I thought, I think PInvoke is just a bit out of my skill set.
Thanks for the work none the less.
I got mono building in the Android build environment, and the Mono team accepted a patch to make it work on Android. There's also some changes external to Mono which can be found at the androidmono google code repository:
http://code.google.com/p/androidmono/

Graphics APIs for WM6

Hello,
I've spent the last month trying desperately to find a free 2D (or 3D, but not required) graphics API I can use for high performance games on Windows Mobile. I initially set about trying to find a managed API to use, but now I've broadened my search to include any API (that I can call from C++ or .NET), and I'm still struggling to find anything.
The options seem to be:
- GDI: not nearly fast enough for high performance games
- DirectDraw: probably OK, but doesn't seem possible to use this on my HTC Touch Pro 2 due to memory problems (see http://blogs.msdn.com/windowsmobile/archive/2009/04/17/twisted-pixels-3-memory-mysteries.aspx -- I've got the same problem and have not yet found any way to work around this)
- Direct3D: no hardware driver on my Touch Pro 2, this renders about 0.2 frames per second in the samples, which is not good enough
- OpenGL: I've tried and tried, and can't get any samples working for this. The closest I've found is the tutorials here: http://www.zeuscmd.com/tutorials/opengles/index.php, but these all fail with an error, "Unable to create OpenGL|ES context" as soon as I run them (or alternatively using the "Ug" version, I get no window appearing at all).
Does anyone have any suggestions as to how I can progress from here? I really want to write some Windows Mobile games, but I can't even get started. :-(
Thanks,
Adam.
For 'better' graphics performance have a look at the 2D/3D Driver Development for MSM720X devices!
After installation of this driver pack try my OpenGL test app (source also available) found in my sig! Installing this driver pack will increase d3d performance, too!
Hi heliosdev,
Many thanks for posting the sourcecode for your OpenGL test -- I have that compiling very nicely here and producing very promising results too. I think this may finally be the answer I've been looking for.
Do you have any idea how much of a performance hit using PInvoke to interact with OGLES is likely to be? I don't know whether PInvoke is slow to use or not, but it strikes me that it may be slower use it hundreds of times per frame compared to coding directly in C++ and not needing to PInvoke at all..?
Thanks again,
Adam.
For comparison NuShrike implemented torus test in C++. As you can see the difference is 'minimal' even NuShrike optimized it using vertex buffer objects (I'm using standard vertex arrays and just triangles, i.e. no triangle strips).

[WARNING] Compiler optimizations!

I just noticed this again and it worries me alot! Devs here are usually posting their changes as being different than others, more even about compiler optimizations and yet most of you make huge mistakes.
Personally I've been running Gentoo linux for about ten years now and what I learned from running different setups with space limitations is that one should NEVER use -Os and -O3 together as this will not only take much more time to compile it also GROWS executables.
-Os is basically the same as -O3, but it adds some flags that won't benefit executable size vs memory usage.
In general executables compiled with only -Os are nearly as optimized as -O3/2 and resulting in smaller executable. While compiled with -O3 uses more memory allocation for extra speed. While using both together will result in larger executables, more memory usage AND much longer compile times vs just -O2.
So devs can you all please remove -Os from your "optimizations" and leave just -O2? Bragging about "more compiler optimizations" really fails if you fail to understand why the -Ox optimizations levels have been added in the first place... Which is to NOT use them together!
Also -O3 might introduce more IO operations, which might have a negative effect on our snappies.
The choice: Summarised
-O2 is the default optimization level. It is ideal for desktop systems because of small binary size which results in faster load from HDD, lower RAM usage and less cache misses on modern CPUs. Servers should also use -O2 since it is considered to be the highest reliable optimization level.
-O3 only degrades application performance as it produces huge binaries which use high amount of HDD, RAM and CPU cache space. Only specific applications such as python and sqlite gain improved performance. This optimization level greatly increases compilation time.
-Os should be used on systems with limited CPU cache (old CPUs, Atom ..). Large executables such as firefox may become more responsive using -Os.
Click to expand...
Click to collapse
From: http://en.gentoo-wiki.com/wiki/CFLAGS
Quote from the wiki articale you provided:
"Note that only one of the above flags may be chosen. If you choose more than one, the last one specified will have the effect."
tincmulc said:
Quote from the wiki articale you provided:
"Note that only one of the above flags may be chosen. If you choose more than one, the last one specified will have the effect."
Click to expand...
Click to collapse
My point is that in general people here seem to use "-Os -O3", which is dumb really as it had no benefits. -O2 should be better performance wise on our phones.
great post, thank you man!
Now silently pray for our devs to change their -O3 -Os to -O2.
Or I might need to do my own branch of "optimized" nightlies.
For example what you declare with -march=A -mcpu=B will intact be -march=A -mtune=B.
edit: I think the gcc compiler even throws a warning about l2p etc.
Wouldn't the best choice be just -Os? I would have thought that optimising for systems with limited cache would be appropriate for portable devices.
Pipe Merchant said:
Wouldn't the best choice be just -Os? I would have thought that optimising for systems with limited cache would be appropriate for portable devices.
Click to expand...
Click to collapse
I highly doubt executable sizes on our phones are big enough to benefit from -Os.
In the case of my router at home -Os doesn't give any benefit at all, in fact packet inspection is slightly slower vs -O2.

Understanding Android platform in a nutshell (in layman's terms)

System.out.println("Hello peoples");
==>The purpose of this guide is to help people who don't know anything about programming,aren't modders,guys with knowledge about technology.
==>Initially I loved computers and their capabilities and have a little knowledge on the C and Java languages and just how computers (think).
You have to understand that computing tries to emulate human behaviors on how to solve problems.This is where programming kicks in.
So what is Android?
===================
1)Android in general is an operating system that was meant to run on mobile devices e.g. cell phones
and has expanded to tablets and notebooks.Android is divided into three language groups:
a)the system's framework and apps are written in java.
b)the Android's core [kernel] is pure C -language.
c)the Android's libraries are written in C++.
App libraries are called by apps that need more functionality that java can't provide.These are usually plugins like
decoders e.g. ffmpeg libraries that are used in video decoding,flash-players e.t.c. Here,java native methods are used and the android NDK platform
is used to enable the java apps call these libraries during execution.(Please read further on Android SDK,NDK and Jni).
So what happens when you have just flashed that new Rom.
====================================================================
First you have to understand that cell phones have their own embedded firmware not including the recovery and you will see why.
a)the recovery partition can be flashed to install aftermarket recovery roms.So even if you mess your recovery,you can still install again.(This varies with different phones).
b)their is that system chip which you cannot touch and there is a reason for it.Think of this partition like a PC BIOS.If you mess with your bios
your system is toast aka Bricking the system.Since phones are classed as embedded systems,manufactures don't want people messing with it as it would result into
cryptic errors and system vulnerability.
when the on button is pressed something simple yet complicated happens:
1==>The kernel which is compressed to save space usually in a (zImage) format is deflated or expanded.Since your NAND chips are partitioned,the kernel is given a very
special chunk in which is protected from user data.
2==>The kernel finds which base address it needs to start executing e.g.(0x00200000) and mostly when you put a wrong kernel base
address your phone enters into a boot-loop because arguments are being passed to invalid locations.It is important to know where your kernel base address
starts.You can try looking it up in your kernel sources(try searching for the mach msm folder and into the makefile) or just goggle it up or use Xda-Kitchen.
3==>with the correct kernel base address,execution starts.Usually the (init.rc) file in the (ramdisk folder) gives symlinks and creates structures i.e. folders that will house
modules and sets the correct paths to android files and framework.
4==>After arguments have successfully been passed,the handles are now passed to Android.Basically Android checks for (init.d) scripts that are available
this is true to GingerBread and Cyanogenmod 7.After that audio checks are done followed by camera services and then arguments are passed to (core.jar),a
critical framework file which is huge around 50Mb in size in CyanogenMod 7.
5==>Here the DalvikVM (Dalvik virtual machine) is called and the process of optimizing your system files starts.The framework get optimized first as this contains critical
code needed to run your device.Then your flingers and renderers and called.This are engines used by android e.g. (pixelflnger) which is used in touchscreen.
Your system's sensors are usually started around this point (your compass,light sensors e.t.c)your phone apps e.g. contacts,calendar get optimized around this point
and depending on the number of apps your manufacturer installed,it will take some time.
6==>your network get's activated around this point and is probably the time the capacitive buttons and lights on your gadget light up.This is usually a good indication
that your system has loaded.When your bootanimation ends the handles are passed to activate your home launcher.There is usually still a lot of activity going
on to fully ready your system and this is why if you try to use it immediately especially on low-end phones,your system lags or get (not responding) errors.
common misconceptions
=====================
1==>There is are reason why goggle gave minimum specifications needed to run Android because this is a full operating system(OS) unlike the past relic
phones that ran on 50MHz processors.
2==>please don't complain that some of your Ram is not the same as specified on the phones catalog(e.g. you have 256Mb of Ram but in the task manager it indicates you have 179Mb).
There is a reason to it.There are core processes that eat a chunk of your Ram and are hidden so that you don't try silly stuff like trying to kill them in thinking that your are trying to get more memory.
Think of it like this,it's like trying to kill (services.exe or svchost.exe) processes in Windows.You will just be trying to get system hangs,bluescreens
or just a system crash.
Will Add more info later.Please feel free to correct anything i might have not addressed properly or share your views.
Happy modding.
Kernel topic
============
In simple terms,the kernel is the core of any operating system e.g. windows, Mac,or any Linux distro like Ubuntu and Android.Android kernels come mostly
in the form of a compressed kernel (zImage).The kernel is written in pure C-language,which gives it direct access to memory and registers unlike java
which has to pass through the java VM(virtual machine).This makes code written in C-language to be very fast and robust but also dangerous.
==>Many Androids in the eclair regime ran on kernel 2.6.29. This was not a complete kernel and as by my experience there was alot of code missing from it.
2.6.29
======
==>a lot of androids did not have adb functionality due to the framework being embedded to allow USB mounting to PC.This was a very rigid method
of doing it(also a very old method).
==>In the case of other devices, when viewing the internal task manager,many processes were viewed as (0.00) byte files.In essence you could not determine
the amount of RAM your app was taking.This is true in the case of huawei u8120.
==>In the case of shutting down the phone,even in some cases under load,it did so very fast.It killed threads and handles mercilessly. Many people misunderstood
this concept and thought their phones ran faster as compared to 2.6.32 kernel.
2.6.32.9
========
Here there was a ton of improvements as developers and modders became more aware of support and tweaks.
==>Fixed issues like the internal task managers.It was now possible to accurately know how much RAM your apps were taking.
==>Resolved how the android system shutdown.Instead of merciless killing of handles and threads still running,it killed them appropriately.This is why
when shutting down your system takes a while.You can use adb to see these events.
==>Usb mounting to pc was also made somewhat generic and flexible across many devices.
==>It did change some methods on how the camera is being accessed mostly in eclair,donout and earlier versions of android phones.This issue made
cameras not function.
==>wifi methods were also changed and developers and modders had to re-write their code to allow compatibility.
==>This was also the year of many froyo phones and overclocking was a common thing.
extra notes
===========
During eclair era,many developers were in such a hurry to produce new android phones every few months that they never even thought of long-term support
to newer versions of android that would come much later.This is where i have to praise Iphones for standardisation of its OS across all of its devices.
==>If you try to copy paste code from 2.6.32.9 and paste it into 2.6.29 and expect it to work then expect tons of errors when compiling.The (g_android)
module was properly coded into 2.6.32.9 so if you try enabling it an older kernel,expect errors.
==>If you are a modder and looking to buy a new phone,please see if it has a fan base of coders or support.Avoid buying phones where you will not get
any support from the manufacturers or other devs e.g Huawei Technologies.This company sucks alot.After they produce a phone they forget about you the customer
so you will have to handle the upgrades all by yourself.
Do not use this command (make -i)when compiling kernels.It will skip errors and you may smile after it's finished but the end is just a tragedy.your kernel is bound
not to function properly or even function at all.
Happy modding.

Porting Chromium to Windows RT

So, I've been at this for about 48 hours now (not continuously, but closer than you might think) and I figured I should take a break from modifying project files and puzzling over alignment issues to discuss the project, share some of the problems I've been having and ask if anybody can help, and so on.
The general idea is "Chromium build for Windows (on x86/x64) and build on ARM (for Linux), so there must be a way to build it for Windows on ARM". For the most part, that even looks like it's true. Probably at least 80% of 654 Visual Studio projects (no, that's not a joke) either build just fine with only minor amounts of work, or are things that we don't actually need (I'll try building the test suites... once everything else builds!!)
Areas that have given me problems (caution: some chance of brief rants ahead):
v8. Less than you might think, though. Setting the flags for Arm seems to have been enough.
Sandbox. There's a fair bit of thunking coded in assembly going on in the sandbox for x86. Not sure what's up with it (I don't know exactly how the Chromium sandbox works) but it'll have to come out or be replaced. The Linux (including ARM) sandbox seems to be SELinux-based, which doesn't help at all.
Native Client (NaCl). I think all the assembly is in test code, though, so I may just boldly #ifdef if all away.
libjpg-turbo (libjpg). Piles of carefully optimized assembly... for x86 and x64. There is a set of ARM assembly (for Linux) that Visual Studio won't compile, but something else might... or I may tweak until it works. Of course, I could also just accept the speed hit and use the version of libjpg implemented in nice, portable C.
Anything where the developers tried to use some SSE to speed things up. I may be able to replace it with NEON code, or I may just remove it and hope **** doesn't break. We'll see.
Inline assembly in general. Even when it's ARM assembly, Visual Studio / CL.exe don't want anything to do with it (__asm is apparently now an invalid keyword). I suspect I'll have to just pull the assembly out into stand-alone functions in their own files, then compile them to object files and link them back in later. If I can figure out the best way to do this (for example, I'll want to inline the asm functions) then it shouldn't impact performance. Seriously though, I kind of hate inline assembly. I can read assembly just fine, but I'm usually staring at it in a debugger or disassembly tool, not in the middle of source code I'm trying to build...
Everywhere that the current state of the CPU is cared about (exception and crash handlers, in particular) because the CONTEXT structure is, of course, CPU-specific. They're pretty easy to get past, though.
Low-level functions, like MemoryBarrier. Fortunately, it's implemented in ntdll.h... but as a macro, which breaks at least half the places it's referenced. Solution: where it breaks things, undefine the macro and just have it be an inline function that does what the macro did.
Running out of memory. Not even joking... well, OK, a little bit. I've got 32GB; I won't actually run out. Both Visual Studio and cl.exe do at times, though!. Task Manager says VS is currently using 1,928 MB, and before I restarted it, it broke 2.5GB private working set. Pretty good for a program that for some reason is still 32-bit...
Goddamn compiler flags. Seriously, every single project (I mentioned there are over 600, right?) has its LIBPATHs hardcoded to point at x86. Several projects have /D:_X86_ or similar (that's supposed to be set by the build tools, not the user, you idiots...) which plays merry hell with the #ifdef guards. Everything has /SAFESEH specified, not in the actual property table where the IDE could have removed it (unneeded and invlaid on ARM) but in the "extra stuff we'll pass on the build command line" field, which means every single .EXE/.DLL project must be modified or the linker will fail.
My current biggest goal is the JPG library; nobody wants to use a browser without it. After that, I'll tackle the sandbox, leaving NaCl for last... well, last before whatever else crops up.
Anyhow, thoughts/comments/advice are welcome... in the mean time, I'm going to go eat something (for the first time in ~22 hours) and then get some sleep.
Kudos for having the patience to look though this monster.
It's my understanding that NaCl is still a pretty niche thing at the moment. Is it possible to easily either disable it or completely hack it out, or do other more critical parts of Chromium now depend on it?
I don't think anything truly depends on it. I'll look in the VS dependency hierarchy and see how many things list it, and how awful it would be to remove them.. after I get the other stuff working. I may pass on the sandbox as well, if possible; it makes the security guy in me cringe something awful, but as they say, shipping is a feature..
great
Please make that happen !
Working on it! I've gotten over half of the projects to build and link, but some other stuff is adamantly refusing to work. I'm beginning to suspect I'll need to work from the other direction - rather than starting at the bottom and building all the dependencies, then combining them into browser components, and then eventually combining all the components into a complete piece of software, I may have to work from the top, removing components until the whole thing builds (at which point it will likely be useless, or all-but) and then seeing what I can add back in. I thought it would be faster to just assume everything can be made to work and only exclude something if it proved intractable, but at this point I've got a ton of very small components and almost no ability to combine them.
It would also help if VS was better at managing such truly immense tasks. For example, I have no simple graph of what all is and is not building, so I'm being forced to manually map that onto the VS dependency tree and see what is blocking a given component from building successfully, and how much is dependent upon it, one erroring project at a time (and there are a *lot* of erroring projects - my last attempt to build any substantial part of the system saw 50 of 400 projects fail).
GoodDayToDie said:
Working on it! I've gotten over half of the projects to build and link, but some other stuff is adamantly refusing to work. I'm beginning to suspect I'll need to work from the other direction - rather than starting at the bottom and building all the dependencies, then combining them into browser components, and then eventually combining all the components into a complete piece of software, I may have to work from the top, removing components until the whole thing builds (at which point it will likely be useless, or all-but) and then seeing what I can add back in. I thought it would be faster to just assume everything can be made to work and only exclude something if it proved intractable, but at this point I've got a ton of very small components and almost no ability to combine them.
It would also help if VS was better at managing such truly immense tasks. For example, I have no simple graph of what all is and is not building, so I'm being forced to manually map that onto the VS dependency tree and see what is blocking a given component from building successfully, and how much is dependent upon it, one erroring project at a time (and there are a *lot* of erroring projects - my last attempt to build any substantial part of the system saw 50 of 400 projects fail).
Click to expand...
Click to collapse
I thinkt tht is a mutch better taktic and mutch less frustrading.
I would love to see just a minimal version of it. After that all the small componens can follow.
50 of 400 is pretty good i think. Better then i expected
Bear in mind that the entire thing is 650 projects. If 50 fail at that level, many of the higher-level ones (dependent upon the lower-level) will fail too. I'll see what I can do. I may or may not be able to get v8 actually working (without it, the JS speed will be very bad, think IE8 at best) and I may have to fall back to the legacy libjpeg (which will cut JPEG render speeds by at least a factor of 2). Skia (2D drawing library used by Chrome) has a bunch of assembly optimizations that I need to get it to use the Arm version of instead. There's a couple of total hacks with the library files I've had to pull, which may or may not result in a working final build. We'll see.
GoodDayToDie said:
Bear in mind that the entire thing is 650 projects. If 50 fail at that level, many of the higher-level ones (dependent upon the lower-level) will fail too. I'll see what I can do. I may or may not be able to get v8 actually working (without it, the JS speed will be very bad, think IE8 at best) and I may have to fall back to the legacy libjpeg (which will cut JPEG render speeds by at least a factor of 2). Skia (2D drawing library used by Chrome) has a bunch of assembly optimizations that I need to get it to use the Arm version of instead. There's a couple of total hacks with the library files I've had to pull, which may or may not result in a working final build. We'll see.
Click to expand...
Click to collapse
the v8 engine ( used in nodejs ) has been ported to ARM :
I still can't link : htt p://ww w.it-wars.com/article305/compiler-node-js-pour-arm-v5
perhaps it will help you
Edit : oups, I just see that another great user of this forum made the port of nodejs to RT
Yep... but they did it without v8. That's not an encouraging result, but I feel like I'm so close...
Is there a GitHub repo so we can help or track the progress of the project ?
Sorry, not at present. There probably should be. The sheer size of the codebase is incredible (about 2.4GB) and having some way to share it practically would be good.
Also, I suspect this would go a lot faster if I don't have to repeat the work of others. I know that there's a working Webkit DLL out there, for example (though with several features, including the V8 JS engine, missing) and if I could get my hands on that it would drastically reduce the number of additional components I need to build. Currently I'm working on the sandbux, but expect that I will need to rip the whole thing out and basically have the browser run as though it was always passed the --no-sandbox parameter, at least for the first build. Too damn much assembly.
http://www.engadget.com/2013/01/22/google-chrome-native-client-arm-support/
This wouldn't have any impact on this project, would it?
Sent from my SCH-I535 using xda-developers app, complete with annoying signatures.
It probably means that NaCl on Windows RT will be possible in the future. At present, I'm cutting it out of the build - too much x86-specific stuff there to port it over myself, and it owuldn't be able to run x86-compiled NaCl code anyhow.
You might have bit off more than you could chew. It'd better if you put your current progress under version control on some public site so that other people may be able to help you.
It's a big and complex project. You are taking a lot of time, and understandably so. But just open up to other people and you could get this done faster.
Yeah, this is probably true. My life also got unexpectedly *busy* in the last week; a couple weeks ago I had many times as much free time as I do now, and so porting has slowed down.
My upload speed would take ages (literally probably at least a day of solid activity; it's embarassingly slow) to push the full source anywhere, but I may make the effort anyhow. I'll have to post it somewhere for GPL compliance in any case...
You may upload only the diff files, they'll probably be smaller then the whole distribution.
Not to pour cold water on you however, IE10 is already faster than the latest Chrome build in Windows Phone, Windows 8.
I don't see the point of this.
I have personally jumped from IE8 > FF > Chrome and finally back to IE10 over the years depends on its usability, smoothness, speed, etc
Speed isn't the only reason to use a browser. I actually prefer IE myself, but there are some things that other browsers do better than it (in the case of Chrome, parts of HTML5, the syncing across Google services, etc.) Also, Chrome gets updated far more often than IE; IE9 was equal with Chrome on speed at its release, and was far behind by the time IE10 came out.
The reason for this project, though, is a mixture of interest in what it takes, and a desire to benefit the community. Microsoft has deeped that only software which they have blessed may run on the Windows RT desktop. I disagree, and have chosen (among several other things) to port a web browser because I feel that it's important for users to have choice.
LastBattle said:
Not to pour cold water on you however, IE10 is already faster than the latest Chrome build in Windows Phone, Windows 8.
I don't see the point of this.
Click to expand...
Click to collapse
Some websites do not get along with the trident rendering engine. Some webdevs are so "Oh f*** IE I don't care" and block access to features just because it is IE. I have experienced this first hand on IE10 on my surface where it tells me to come back when I have a decent browser, only to not have the choice to do that.
This really isn't the webdevs fault either, for years IE was the scum of the internet, only recently has IE caught up to the rest of the browsers (and in my opinion exceeded some) but the years of IE being bad have left a lot of disjointed webdevs who won't even consider giving the latest IE a chance.

Categories

Resources