LZX Compression - Wing, P4350 ROM development

I made a kitchen that creates ROMs with either XPR or LZX compression.
I'm also trying to port it to the Artemis, the Trinity, and the Hermes.
The way the kitchen works is:
Run "RunMe.bat"
Choose compression algorithm. (XPR or LZX)
Follow Bepe's normal kitchen process.
Wait as the kitchen creates the ROM (like Bepe's kitchen, but with whichever compression you chose).
The kitchen will automatically open imgfs.bin in a hex editor and adjust it for the chosen compression before it builds the ROM.
It automatically inserts the proper XIP drivers.
It will automatically set the pagepool to 4 MB but give you the option to change it to something else as it does.
It then automatically creates the NBH and finally launches whatever flasher (CustomRUU, FlashCenter, or whatever your device uses) to flash the ROM.
For those who don't know what LZX compression is:
It's a compression algorithm that, although slower (by 1-4% in real-life use), gives a good amount of free storage space. In some cases (like on the Herald) it makes the ROM so small that it has to be flashed through an SD card due to the Herald's flashing size requirements. On an average 50 MB ROM, it takes off about 10 MB. The actual cooking does take a LOT more CPU and RAM on your PC, though, especially the RAM. (It's because the tools that actually do the compression weren't really optimized for the job.)
Anyhow, let me know if you want it.

I would love to use it if you're willing to share. I really want to get into cooking. I just don't have time these days.
Here is a bit that one of my buddies wrote on compression, for those interested. He's actually discussing LZMA (which is what 7-Zip uses by default), which uses a dictionary scheme similar to the LZ1 (LZ77) dictionary used in LZX. This should help people understand what settings to use depending on the situation.
Hehe. I've long been fascinated by compression algorithms ever since studying the original Lempel-Ziv algorithm some years ago.
The key here is the dictionary, which, in a simplified nutshell, stores patterns that have been encountered. You encounter pattern A, compress/encode it, and then the next time you see pattern A, since it's already in the dictionary, you can just use that. With a small dictionary, what happens is that you'll encounter pattern A, then pattern B, C, D, etc., and by the time you encounter pattern A again, it's been pushed out of the dictionary so that you can't re-use it, and thus you take a hit on your size. All compression algorithms basically work on eliminating repetition and patterns, so being able to recognize them is vital, and for dictionary-based algorithms (which comprise most mainstream general-purpose algorithms), a small dictionary forces you to forget what you saw earlier, thus hurting your ability to recognize those patterns and repetitions; small dictionary == amnesia.
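To make the sliding-window idea concrete, here's a minimal LZ77-style sketch in plain JavaScript (illustrative only, not the actual LZX/LZMA code; the function and variable names are made up for the example):
// Minimal LZ77-style encoder: the "dictionary" is just the last windowSize
// characters. Emits (distance, length, nextChar) triples.
function lz77Compress(input, windowSize) {
    var output = [];
    var pos = 0;
    while (pos < input.length) {
        var bestLen = 0, bestDist = 0;
        var start = Math.max(0, pos - windowSize);   // anything older has been "forgotten"
        for (var i = start; i < pos; i++) {
            var len = 0;
            // keep matches non-overlapping to keep the sketch simple
            while (pos + len < input.length && i + len < pos && input[i + len] === input[pos + len]) {
                len++;
            }
            if (len > bestLen) { bestLen = len; bestDist = pos - i; }
        }
        output.push([bestDist, bestLen, input[pos + bestLen] || ""]);
        pos += bestLen + 1;
    }
    return output;
}
// The second "abcabc" can only be encoded as a back-reference if the first one
// is still inside the window:
console.log(lz77Compress("abcabcXXXXXXabcabc", 4).length);   // 9 triples: the earlier pattern was forgotten
console.log(lz77Compress("abcabcXXXXXXabcabc", 64).length);  // 7 triples: the earlier pattern was reused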
Solid compression is also important because without it, you are using a separate dictionary for every file. Compress file 1, reset dictionary, compress file 2, reset dictionary, etc. In other words, regular non-solid compression == amnesia. With solid compression, you treat all the files as one big file and never reset your dictionary. The downside is that it's harder to get to a file in the middle or end of the archive, as it means that you must decompress everything before it to reconstruct the dictionary, and it also means that any damage to the archive will affect every file beyond the point of damage. The first problem is not an issue if you are extracting the whole archive anyway, and the second issue is really only a problem if you are using it for archival backup and can be mitigated with Reed-Solomon (WinRAR's recovery data, or something external, like PAR2 files). Solid archiving is very, very important if you have lots of files that are similar, as is the case for DirectX runtimes (since you have a dozen or so versions of what is basically the same DLL). For example, when I was compressing a few versions of a DVD drive's firmware some years ago, the first file in the archive was compressed to about 40% or so of the original size, but every subsequent file was compressed to less than 0.1% of their original size, since they were virtually duplicates of the first file (with only some minor differences). Of course, with a bunch of diverse files, solid archiving won't help as much; the more similar the files are, the more important solid archiving becomes.
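As a concrete illustration of the solid-mode difference (hypothetical filenames; -ms is 7-Zip's solid-mode switch):
7z a -ms=on solid.7z fw_v1.bin fw_v2.bin fw_v3.bin
7z a -ms=off nonsolid.7z fw_v1.bin fw_v2.bin fw_v3.bin
With near-duplicate inputs like the firmware example above, the solid archive comes out dramatically smaller.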
The dictionary is what really makes 7-Zip's LZMA so powerful. DEFLATE (used for Zip) uses a 32KB dictionary, and coupled with the lack of solid archiving, Zip has the compression equivalent of Alzheimer's. WinRAR's maximum dictionary size is 4MB. LZMA's maximum dictionary size is 4GB (though in practice, anything beyond 128MB is pretty unwieldy and 64MB is the most you can select from the GUI; plus, it doesn't make sense to use a dictionary larger than the size of your uncompressed data, and 7-Zip will automatically adjust the dictionary size down if the uncompressed data isn't that big). Take away the dictionary size and the solid compression, and LZMA loses a lot of its edge.
In the case of the DirectX runtimes, because of their repetitive nature and the large size of the uncompressed data, a large dictionary and solid compression really shine here, much more so than they would in other scenarios. My command line parameters set the main stream dictionary to 128MB, and the other, much smaller streams to 16MB. For the 32-bit runtimes, where the total uncompressed size is less than 64MB, this isn't any different than the Ultra setting in the GUI, and 7-Zip will even reduce the dictionary down to 64MB. For the 64-bit runtimes, the extra dictionary size has only a modest effect, because 64MB is already pretty big and also because the data with the most similarity is already placed close together (by the ordering of the files). The other parameters (fast bytes and search cycles) basically make the CPU work harder and look more closely when searching for patterns, but their effect is also somewhat limited. In all, these particular parameters are only slightly better than Ultra from the GUI, but I'd much rather run a script than wade through a GUI (just as I prefer unattended installs over wading through installs manually).
Oh, and another caveat: Ultra from the GUI will apply the BCJ2 filter to anything that it thinks is an executable. This command line will apply it to everything, which means that it should not be used if you are building an archive with lots of non-executable content (in this case, the only non-executable is a tiny INF file, so it's okay). It also means that this command line will perform much better with executable files that don't get auto-detected (for example, *.ax files are executable, but are not recognized as such by 7-Zip, so an archive with a bunch of .ax files will do noticeably better with these parameters than with GUI-Ultra). If you want to use the 7z CLI for unattended compression but would prefer that 7-Zip auto-select BCJ2 for appropriate files, use -m0=LZMA:d27:fb=128:mc=256 (which is also much shorter than that big long line; for files that do get BCJ2'ed, it will just use the default settings for the minor streams).
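For reference, a full command along those lines might look like this (hypothetical archive and folder names; d27 means a 2^27 = 128 MB dictionary, fb = fast bytes, mc = match-finder search cycles, -ms=on enables solid mode):
7z a -t7z -ms=on -m0=LZMA:d27:fb=128:mc=256 directx_runtimes.7z .\dxredist\*
This is the variant where 7-Zip still auto-selects BCJ2 for the files it recognizes as executables.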

dumpydooby said:
You have no idea how much the geek in me loved this post. ^_^ I have a thing about compression, too, but I'm just not very well versed in it. I only know the basic details of compression.

This answered two questions I'd been trying to figure out the other day. Thank you, guys. I learn so much here.

Related

TUTORIAL: Periodically & automatically backing up an important file to a memcard

I’ve received the following question in the Smartphone & Pocket PC Magazine forums:
“I have a HX4700 with a Garmin CF620 Compact Flash GPSr unit.
This unit uses the Garmin QUE software for navigational purposes.
The active track log is saved in a single file in the installation directory (Program Files), and on two occasions I've lost the track log and almost got myself into an emergency situation due to either a battery issue or the necessity to do a hard reset.
My question is whether you perhaps know of 3rd-party software that I can use to mirror / sync the file, either in real time or at given intervals, to the iPAQ File Store or external memory.
I'll even be happy if it's a program that I have to run to copy the file to another location; then I can at least assign it to a button and press it every so often.”
The answer to this question is a big YES, you can do this, without having to use a full system backup. What is more, you can do this with a free (!) and fully automatic tool – I’ve custom-written an nScriptm script which does exactly what you want.
I’ve already elaborated a lot on the possible usages of the excellent nScriptm; please see for example this and this article, along with the links. (Search for “nScriptm” with Ctrl-F if you don’t want to read the entire article. I, however, as usual, recommend reading these articles in their entirety if you want to know how Pocket PC screenshots can be taken and how calls can be (automatically) recorded.)
To solve my reader's problem, as I've already stated, I've written an nScriptm script. It's available here for download. After you edit it to point to the source and target files (they are \Program Files\source.txt and \SD-MMCard\backup.txt by default; in general, you can leave the latter filename intact and only change the storage card path), you MUST put it in the \Program Files\ns\ directory (which you must create first) so that the executable link file, which is available here, can find it. You must do the same with the executable file (ns.exe) of nScriptm, available here; that is, put it in \Program Files\ns\.
Note that, along with the source and, possibly, the target filenames, you can also modify the interval of the backup. It’s 120 seconds by default. If you want to set it to another value, just modify the parameter in sleep(120).
You can, of course, put the executable link file, PeriodicallyBackupAFile.lnk, in \Windows\Start Menu\Programs for easy access.
Now, just start the backup tool by executing the latter executable link file and minimize nScriptm. It’ll continue running in the background and backing up your file.
Other alternatives
Note that you can also do the same with the excellent SKScheMa, another product of the excellent S-K folks, who also wrote (more precisely, ported) nScriptm and a lot of other high-quality tools like SKTools. With it, you can, for example, back up your stuff every hour, say. The advantage of the SKScheMa-based solution is that it doesn't need to run in the background all the time; that way, you can lower the CPU / memory usage.
Also, if you know how to manually add timed, recurring events with, say, SKTools, you can execute a simple file copy without the periodicity (that is, modify the script to the following:
function main() {CopyFile("\\Program Files\\source.txt","\\SD-MMCard\\backup.txt");}
and just configure your event queue to execute the link file, say, every hour.)
For geeks
For programmers, or anyone who would like to know how it works and how easy nScriptm is to use, the script is as follows:
function main()
{
    while (1 < 2)
    {
        CopyFile("\\Program Files\\source.txt", "\\SD-MMCard\\backup.txt");
        sleep(120);
    }
}
(Note that there is no “true” symbolic constant in nScriptm, which is why I couldn't use while(true); also, you must escape backslash characters, as in all C-like languages / regexps, which is why the backslashes are "doubled".)

Question for Chefs (or Geeks)

Okay, for the sake of this question:
ROM = Read Only Memory
RAM = Random-Access Memory
When embedding programs, such as Mobile Notes, into an image, the ROM size of the program is decreased due to digital compression. However, does it also decrease the RAM usage for that program?
Off the cuff, I would think so. Kinda like how an assembly-language program can run faster on older machines than an optimized C-language program compiled for the same machine. The tighter the binary conversion, the faster the program can run.
Not sure this is true, though. I am trying to decide if it would be better to cook for myself and add the 3rd-party apps that I use, or to use something cooked by someone else and then add the apps I want via CAB files.
Bump.
Still wondering...

[Q] SQL CE max length

I use SQL CE for my DB. I fill the database from files, but one column, "Description", is 8000 characters long. I divided it into two columns, but SubmitChanges throws an error: max row size is 8000 bytes.
I'm trying to make a separate table with the description and link it to the main table by key. Will that work?
Much better option: store a filename in the database, and write the 16 kilobytes of sequential text to a storage system intended to hold such things. You'll lose a little access speed (though random access on the phone is very fast anyhow; it's all solid-state) but it'll actually work.
Yes, it will make your program logic a little more complex... but although databases *can* support "TEXT" fields (not all do, and this particular one appears not to), they really aren't intended as a way to store large bodies of text.

[DEV][M10] Decompiling M10 (Sense) images

The problem
Since HTC introduced Sense 3.5, themers faced a huge problem. The previously used software "M10Tools" wouldn't work with the new version of Sense.
Flemmard and I spent countless hours trying to decode the new image format, but without any success. The new image format is totally different from anything previously seen.
I made this thread to search for help from all the awesome devs on XDA, hoping that we might find one who can help.
The history
Let me start this with some introduction to the m10 format itself.
The images I am talking about are parts of one big file - the m10 file. We usually have multiple images per m10 file, but the number doesn't really matter.
Together with the raw image data we get a set of meta information. We are not exactly sure what the values mean, but we can guess the meaning from the history of the old, decodable images.
We used to have information like width, height, payload of the image, and an integer indicating what kind of image type we have. We know the actual image type for a few of these integers, but with Sense 3.5, 3.6 and 4.0, HTC added at least two new types.
The facts
We don't have any hard facts for these image types, but looking at the "old" image types, we can guess a few things:
The images are in a format the GPU can render directly (like S3TC, ATC, QTC, etc.). (At least this used to be the case; it might have changed.)
Images are most likely compressed. The ratio between assumed size (based on meta data) and the actual data size indicates some heavy compression. The data itself obviously looks compressed too.
There are no headers or any other help. It is just raw data.
We don't know exactly what the decoded images look like, so we can't say what they display. However, due to recent achievements, we "might" know this for images from Sense 3.5 and 3.6 if needed.
The handling code is all in a few native libs and NOT in smali / Java, so we can't look for clues there. However, we do have the libs, so if someone is a pro with assembler, he might find something out.
I will provide a download which contains several chunks of image data and the corresponding metadata.
If you consider working on this, please do not refrain from thinking about super simple solutions; we have worked on this for so long that we might be totally confused.
One thing, though: this might sound arrogant, but this really is only for people who have decent knowledge of file formats, image compression, or OpenGL.
The image types
Here is a list of the image types we already know (remember, we don't know where the numbers come from; it might be some enum in native code or so):
Type 4: Raw RGB
Type 6: Raw RGBA (still used rather often)
Type 8: ATC RGB (doesn't seem to be used at all anymore)
Type 9: ATC RGBA Explicit (doesn't seem to be used at all anymore)
As you can see we got types WITH and WITHOUT alpha encoding.
Here is the list of UNKNOWN formats:
Type 13 (used way less than type 14, so maybe no alpha?)
Type 14 (this is the most used type, so I assume this one supports alpha encoding)
When thinking about what the data might be, don't throw away crazy ideas like "The data is S3TC / ATC / whatever, but compressed again by some 'normal' compression algorithm". Maybe they just replaced types 8 and 9 with an additional compression on top of those types.
The meta data
Okay, so now let's talk about the metadata we get together with the actual data:
We get 4 more or less known chunks of information per image (plus a few unknown things):
Image type (described earlier) (Example: 6)
Image width (Example: 98)
Image height (Example: 78)
A more complex value containing multiple values at once.
Example: "98:78:0:30576"
We used to know the meaning of three of these values. However, we are not sure about the new images. Let's explain the old meaning first:
98: Width, same value as the value above
78: Height, same value as the value above
0: It's always 0, we have no idea what it means, but since it's static we didn't care
30576: this used to be the data size. This image has a resolution of 98*78 = 7644 pixels. With a data size of 30576, that means we got 4 bytes per pixel.
Let's take a look at the new images now. We still get the same information; however, the meaning seems to have changed a bit:
Image type (described earlier) (Example: 14)
Image width (Example: 997)
Image height (Example: 235)
A more complex value containing multiple values at once.
Example: "1000:236:0:118000"
This is the assumed new meaning:
1000: Width, but rounded up to a multiple of 4
236: Height, but rounded up to a multiple of 4
0: It's always 0, we have no idea what it means, but since it's static we didn't care
118000: this value is now exactly half of the rounded resolution (1000 * 236 / 2 = 118000)
This would mean only half a byte per pixel. One big problem here: the actual data size does not match this value at all!
The data is way smaller than this value, which indicates that it got compressed a second time.
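To make the arithmetic easy to reproduce, here is a tiny sketch (plain JavaScript; the field layout is simply what the examples above suggest, and analyzeMeta is a made-up helper name) that parses the complex meta value and reports bytes per pixel:
// Parse the "paddedWidth:paddedHeight:0:declaredSize" value and compute
// how many bytes per pixel the declared size corresponds to.
function analyzeMeta(meta) {
    var p = meta.split(":").map(Number);
    return { pixels: p[0] * p[1], bytesPerPixel: p[3] / (p[0] * p[1]) };
}
console.log(analyzeMeta("98:78:0:30576"));     // old type 6: 4 bytes per pixel (raw RGBA)
console.log(analyzeMeta("1000:236:0:118000")); // new type 14: 0.5 bytes per pixel declared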
Now let's talk about a very important piece of information: HTC uses the SAME image formats on BOTH a Tegra 3 and a Qualcomm Snapdragon S4.
This obviously means that both Tegra and Snapdragon need to be able to handle this. However, also keep in mind that HTC bought S3 Graphics and therefore might have some advantages here.
You can find statistics on the formats used in the download; it's an Excel sheet with two diagrams showing the usage.
Now, this was a long post; I hope someone is still reading and might have some ideas about what's going on here.
Feel free to ask any questions concerning this.
I am also available in #virtuousrom on Freenode, per PM here or via email: diamondback [at] virtuousrom [dot] com
Download:
The download contains a bunch of unknown images of types 13 and 14, together with their metadata (as explained above).
Download image pack
Solution
After some digging I figured out that the data is compressed with FastLZ [0]. Also, if you decompress it, you get exactly Width*Height bytes of data. I don't know what format this data is in, but I guess it's the same format the uncompressed data (type 8 or 9 or so?) used. Maybe someone could check up on that.
[0] http://fastlz.org/
onlyolli said:
You are indeed right. We actually found the same thing a few hours ago. What a weird coincidence... :victory:
Types 4 and 6 have changed; they are now compressed too, which actually breaks backwards compatibility with older Sense versions...
Inside the compressed data are ETC images, which also explains how they can use the same format on both the S4 and Tegra 3.
Type 14 actually contains TWO images, both ETC. Since ETC doesn't support alpha, one is the image and one is an alpha mask...
Funny trick HTC!

Trident Encoder : Encryption for Windows RT

I implemented a browser-based encryption solution which runs on Windows RT (and many other Windows computers). All I wrote was the HTML page; I am leveraging the CryptoJS JavaScript library for the encryption algorithms, and I am using the HTML5 File API implementation which Microsoft provides for reading and writing files.
I make no claims about this, but it seems to work well for me. Feel free to send feedback if you have any suggestions. The crypto.js library supports many different algorithms and configurations, so feel free to modify it for your own purposes.
You can download the zip file to your surface, extract it and load the TridentEncode.htm file into Internet Explorer.
If you want to save to a custom directory, you probably need to load it from desktop IE instead of Metro IE (to get the file save dialog). I usually drag and drop the file onto desktop IE, and from there I can make a favorite. This should work in all IE 11 and probably IE 10 browsers... if you use other browsers, you may need to copy and paste into the fields, since the File API implementation seems rather browser-specific. Running the HTML page from the local filesystem means that there is no man in the middle, which helps eliminate some of the vulnerabilities of using a JavaScript crypto implementation. You could also copy the attached zip file to your SkyDrive to decrypt your files from other computers.
SkyDrive files are in theory secure (unless they are shared publicly), so this might be useful for adding another layer of protection to certain info.
Again, use at your own risk, but feel free to play around and test it, and offer any suggestions or critiques of its soundness, or just use it as a template for your own apps.
Ok... this is really cool! Nice idea, and a good first implementation.
With that said, I have a few comments (from a security perspective). As an aside, minified JS is the devil and should be annihilated with extreme prejudice (where not actually being used in a bandwidth-sensitive context). Reviewing this thing took way too long...
1) Your random number generation is extremely weak. Math.random() in JS (or any other language I'm aware of, for that matter) is not suitable for use in cryptographic operations. I recommend reading http://stackoverflow.com/questions/4083204/secure-random-numbers-in-javascript for suggestions. The answer by user ZeroG (bottom one, with three votes, as of this writing) gets my recommendation. Unfortunately, the only really good options require IE11 (or a recent, non-IE browser) so RT8.0 users are SOL.
NOTE: For the particular case in question here (where the only place I can see that random numbers are needed is the salt for the key derivation), a weak PRNG is not a critical failing so long as the attacker does not know, before the attack, what time the function is called at. If they do know, they can pre-compute the likely keys and possibly succeed in a dictionary attack faster than if they were able to generate every key only after accessing the encrypted file.
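For RT 8.1 / IE11, a sketch of what that might look like (window.msCrypto is IE11's prefixed name for the Web Crypto RNG; getRandomBytes is a made-up helper, and the else branch is exactly the weak fallback being warned about):
// Prefer the cryptographically strong generator where available (IE11: msCrypto).
function getRandomBytes(length) {
    var cryptoObj = window.crypto || window.msCrypto;
    var bytes = new Uint8Array(length);
    if (cryptoObj && cryptoObj.getRandomValues) {
        cryptoObj.getRandomValues(bytes);                 // CSPRNG
    } else {
        for (var i = 0; i < length; i++) {
            bytes[i] = Math.floor(Math.random() * 256);   // weak fallback (see point 1)
        }
    }
    return bytes;
}
var salt = getRandomBytes(16);   // e.g. a 128-bit salt for key derivation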
2) Similarly, I really recommend not using a third-party crypto lib, if possible; window.crypto (or window.msCrypto, for IE11) will provide operations that are both faster and *much* better reviewed. In theory, using a JS library means anybody who wants to can review the code; in practice, the vast majority of people are unqualified to either write or review crypto implementations, and it's very easy for weaknesses to creep in through subtle errors.
3) The default key derivation function (as used for CryptoJS.AES.encrypt({string}, {string})) is a single iteration of MD5 with a 64-bit salt. This is very fast, but that is actually a downside here; an attacker can extremely quickly derive different keys to attempt a dictionary attack (a type of brute-force attack where commonly used passwords are attempted; in practice, people choose fairly predictable passwords so such attacks often succeed quickly). Dictionary attacks can be made vastly more difficult if the key derivation process is made more computationally expensive. While this may not matter so much for large files (where the time to perform the decryption will dominate the total time required for the attack), it could matter very much for small ones. The typical approach here is to use a function such as PBKDF2 (Password-Based Key Derivation Function) with a large number of iterations (in native code, values of 20000-50000 are not uncommon; tune this value to avoid an undesirably long delay) although other "slow" KDFs exist.
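A sketch of what a slower key derivation could look like with the library already in use (CryptoJS's PBKDF2; password and plaintext stand in for the page's input fields, and the iteration count is only an illustrative value):
// Derive a 256-bit AES key with PBKDF2 instead of the default single-pass
// MD5 derivation. keySize is given in 32-bit words.
var salt = CryptoJS.lib.WordArray.random(16);        // see point 1 about RNG quality
var key = CryptoJS.PBKDF2(password, salt, {
    keySize: 256 / 32,
    iterations: 20000                                // higher = slower dictionary attacks
});
// Passing a WordArray key (rather than a passphrase string) makes CryptoJS skip
// its fast built-in KDF; an explicit IV is then required.
var iv = CryptoJS.lib.WordArray.random(16);
var encrypted = CryptoJS.AES.encrypt(plaintext, key, { iv: iv });
// salt and iv are not secret, but must be stored alongside the ciphertext.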
4) There's no mechanism in place to determine whether or not the file was tampered with. It is often possible to modify encrypted data, without knowing the exact contents, in such a way that the data decrypts "successfully" but to the wrong output. In some cases, an attacker can even control enough of the output to achieve some goal, such as compromising a program that parses the file. While the use of PKCS7 padding usually makes naïve tampering detectable (because the padding bytes will be incorrect), it is not a safe guarantee. For example, a message of 7 bytes (or 15 or 23 or 31 or any other multiple of 8 + 7) will have only 1 byte of padding; thus there is about a 0.4% (1 / 256) chance that even a random change to the ciphertext will produce a valid padding. To combat this, use an HMAC (Hash-based Message Authentication Code) and verify it before attempting decryption. Without knowing the key, the attacker will be unable to correct the HMAC after modifying the ciphertext. See http://en.wikipedia.org/wiki/HMAC
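A sketch of the encrypt-then-MAC idea with the same library (CryptoJS.HmacSHA256; encrypted is the object from the point-3 sketch, and macKey stands for a second key derived the same way):
// Compute an HMAC over the ciphertext and store it with the file.
var mac = CryptoJS.HmacSHA256(encrypted.ciphertext, macKey).toString();

// Before decrypting, recompute the HMAC and refuse to decrypt on mismatch.
function verifyMac(ciphertext, macKey, storedMac) {
    return CryptoJS.HmacSHA256(ciphertext, macKey).toString() === storedMac;
}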
5) The same problem as 4, but from a different angle: there's no way to be sure that the correct key was entered. In the case of an incorrect key, the plaintext will almost certainly be wrong... but it is possible that the padding byte(s) will be correct anyhow. With a binary file, it may not be possible to distinguish a correct decryption from an incorrect one. The solution (an HMAC) is the same, as the odds of an HMAC collision (especially if a good hash function is used) are infinitesimal.
6) Passwords are relatively weak and often easily guessed. Keyfiles (binary keys generated from cryptographically strong random number generators and stored in a file - possibly on a flashdrive - rather than in your head) are more secure, assuming you can generate them. It is even possible to encrypt the keyfile itself with a password, which is a form of two-factor authentication: to decrypt the data that an attacker wants to get at, they need the keyfile (a thing you have) and its password (a thing you know). Adding support for loading and using keyfiles, and possibly generating them too, would be a good feature.
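A keyfile is essentially just strong random bytes; a minimal sketch, reusing getRandomBytes from the point-1 sketch and leaving the actual saving to the page's existing file-save path:
// Generate a 256-bit keyfile as hex text; store it on a flash drive, not with the data.
var keyfileBytes = getRandomBytes(32);
var keyfileHex = "";
for (var i = 0; i < keyfileBytes.length; i++) {
    keyfileHex += ("0" + keyfileBytes[i].toString(16)).slice(-2);
}
// keyfileHex can be written out with the page's save mechanism and later
// parsed back into key material with CryptoJS.enc.Hex.parse(keyfileHex).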
The solutions to 3-5 will break backward compatibility, and will also break compatibility with the default parameters for openssl's "enc" operation. This is not a bad thing; backward compatibility can be maintained by either keeping the old version around or adding a decrypt-version selector, and openssl's defaults for many things are bad (it is possible, and wise, to override the defaults with more secure options). For forward compatibility, some version metadata could be prepended to the ciphertext (or appended to the file name, perhaps as an additional extension) to allow you to make changes in the future, and allow the encryption software to select the correct algorithms and parameters for a given file automatically.
Wow, thanks GDTD, that's great feedback.
Not sure about the minified sources; the unminified aes.js in components is smaller than the minified version (which I am using) in rollups. I'll have to look into what his process for 'rollup' is, to see if I can derive a functional set of non-minified script includes. If I can do that, it would be easier to replace (what I would guess is) his reliance on Math.random.
His source here mirrors the unminified files in components folder : https://code.google.com/p/crypto-js/source/browse/tags/3.1.2/src
msCrypto: that would be great, I had no idea that was in there. I found a few (Microsoft) samples, so I will have to test them out and see if I can completely substitute it for crypto.js. It would be more in keeping with the name I came up with.
Currently this version only works for text files; I am using the File API method reader.readAsText(). I have been trying to devise a solution for binary files using reader.readAsArrayBuffer, but so far I haven't been able to convert or pass the result to crypto.js. I will need to experiment more with Base64 or other interim buffer formats (which crypto.js or msCrypto can work with) until I get a better understanding of it.
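For reference, one possible way to bridge that gap (a sketch assuming crypto-js 3.x: it packs the bytes from readAsArrayBuffer into big-endian 32-bit words; arrayBufferToWordArray is a made-up helper and file stands for the File object being read):
// Convert the ArrayBuffer from FileReader.readAsArrayBuffer() into a CryptoJS WordArray.
function arrayBufferToWordArray(buffer) {
    var u8 = new Uint8Array(buffer);
    var words = [];
    for (var i = 0; i < u8.length; i++) {
        words[i >>> 2] |= u8[i] << (24 - (i % 4) * 8);   // 4 bytes per 32-bit word
    }
    return CryptoJS.lib.WordArray.create(words, u8.length);
}

var reader = new FileReader();
reader.onload = function () {
    var wordArray = arrayBufferToWordArray(reader.result);
    // wordArray can now be passed to CryptoJS.AES.encrypt(...) like any parsed message
};
reader.readAsArrayBuffer(file);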
Metadata is a great idea; maybe I can accommodate that with a hex-encoded interim format.
You seem extremely knowledgeable in the area of encryption; hopefully I can refine the approach to address some of the issues you raised by setting up a proper key, salt, and IV configuration... I'm sure I will understand more of your post as I progress (and after reading it about 20 times more as a reference).
Too bad we don't have a web server for RT; that would at least open up localStorage for JSON serialization (mostly for other apps I had in mind). I guess they might not allow that in the app store, though. Could probably run one off a developer's license though (renewed every 1-2 months)?
nazoraios said:
I can't comment too much on the encryption; GoodDayToDie has covered anything I could contribute and more. But there is a functioning web server on RT: Apache 2.0 was ported: http://forum.xda-developers.com/showthread.php?t=2408106 I don't know if everything is working on it; I don't own an RT device, and last time I tried I couldn't get Apache to run on 64-bit Windows 8 anyway (I needed it at uni, spent hours going through troubleshooting guides, and it never worked on my laptop; I gave up and ran it under Linux in VirtualBox, where it took 2 minutes to get it functioning the way I needed it to).
Curious about the performance. Speaking of encryption, 7-Zip has it built in, and from the discussion on StackExchange, it seems pretty good.
One of the neat things about this thing (local web app? Pseudo-HTA (HTML Application)? Not sure if there's a proper name for such things) is that it runs just fine even on non-jailbroken devices. That's a significant advantage, at least for now.
Running a web server should be easy enough. I wrote one for WP8 (which has a subset of the allowed APIs for WinRT), and while the app *I* use it in won't be allowed in the store, other developers have taken the HTTP server component (I open-sourced it) and packaged it in other apps which have been allowed just fine. With that said, there are of course already file crypto utilities in the store... but they're "Modern" apps, so you might want to develop such a server anyhow so you can use it from a desktop web browser instead.
Web cryptography (window.crypto / window.msCrypto) is brand new; it's not even close to standardization yet. I'm actually kind of shocked MS implemented it already, even if they put it under a different name. It's pretty great, though; for a long time, things like secure random numbers have required plugins (Flash/Java/Silverlight/whatever). Still, bear in mind that (as it's still far from standardized) the API might change over time.
Yep, I think of them as Trident apps, since Trident is what Microsoft calls its IE rendering engine, but I guess they are sort of offline web apps (which come from the null domain). Being from the null domain, you are not allowed to use localStorage, which is domain-specific. You also are not allowed to make AJAX requests. You just have the File API and JSON object serialization to make do for I/O.
Another app I am working on is a kind of Fiddler app similar to http://jsfiddle.net/ where you can sandbox some simple script programs.
Kind of turning an RT device into a modern/retro version of a Commodore 64 or other on-device development environments. Instead of a BASIC interpreter, you've got your HTML markup and script.
I have attached a demo version which makes the jQuery, jQuery UI, and alertify JavaScript libraries available in a sandbox environment, with programs that you can save as .prg files.
I put a few sample programs in the samples subfolder. Some of the animation samples (like the solar system) set up timers which may persist even after being cleared, so you might need to reload the page to clear those.
It takes a while to extract (lots of little files for all the libraries), but once it extracts you can run the HTML page, and I included a sample program, 'Demo Fiddle.prg', that you can load and run to get an idea.
I added syntax-highlighting editors (EditArea), which seem to work OK and let you zoom each editor full screen.
The idea would be to take the best third-party JavaScript libraries, make them available, and even add shortcuts or a minimal API to make them easier to use: a common global variable, global helper methods, IDE manipulation. I'd like to include jqPlot for charting graphs, maybe for mathematical programs, and provide an API for the user to do their own I/O within the environment.
These are just rough initial demos, and obviously open source, so if anyone wants to take the ideas and run with them, I'd be interested in seeing what others do. Otherwise I will slowly evolve the demos and release updates when there are significant changes.
