Post subject: How to heavily compress files?
GabCM
He/Him
Joined: 5/5/2009
Posts: 901
Location: QC, Canada
One day, Ilari gave me a compressed 7z file. It was around 50 MB, and once uncompressed, it was around 20 GB! I couldn't believe it! There are times when I'd like to compress files that heavily. I know such compression isn't possible for every type of file. All I'd like to know is the best way to compress files so that the result is as small as possible. What software should I use, and which parameters should I set? I'm using Windows 7 x64.
Publisher
Joined: 4/23/2009
Posts: 1283
Here you go: http://www.maximumcompression.com/index.html Mind you, the best compressor takes a long time to run...
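To see the speed-versus-size trade-off on your own files, a quick benchmark can help. This is a minimal sketch using only the three codecs in Python's standard library (zlib, bzip2, and xz/LZMA, the same LZMA family 7-Zip uses but in the .xz container rather than .7z). These are not the compressors ranked on that site, and the function names `benchmark` and `report` are hypothetical. Note that xz at preset 9 with the extreme flag needs several hundred megabytes of RAM.

```python
import bz2
import lzma
import time
import zlib


def benchmark(data: bytes) -> list:
    """Compress `data` with each stdlib codec at its strongest setting.

    Returns a list of (codec name, compressed size in bytes, seconds taken).
    """
    codecs = {
        "zlib -9": lambda d: zlib.compress(d, 9),
        "bzip2 -9": lambda d: bz2.compress(d, compresslevel=9),
        # preset 9 plus the "extreme" flag is xz's slowest, strongest mode
        "xz -9e": lambda d: lzma.compress(d, preset=9 | lzma.PRESET_EXTREME),
    }
    results = []
    for name, compress in codecs.items():
        start = time.perf_counter()
        size = len(compress(data))
        results.append((name, size, time.perf_counter() - start))
    return results


def report(path: str) -> None:
    """Print one line per codec for the file at `path` (read fully into RAM)."""
    with open(path, "rb") as f:
        data = f.read()
    print(f"original  {len(data):>12,} bytes")
    for name, size, seconds in benchmark(data):
        print(f"{name:9} {size:>12,} bytes  {seconds:8.2f} s")
```

Calling something like `report("game.iso")` (a hypothetical file name) prints the size and time for each codec, which makes it easy to judge whether the slower settings are worth it for a given kind of data.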
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
7-Zip on ultra/maximum is generally pretty good. It all depends on the files, though. If you want to see something really crazy, you can compress a 1 GB file of all zeros down to less than a kilobyte with bzip2. But that's just a special case.
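The all-zeros special case is easy to reproduce with Python's bz2 module, alongside random data for contrast. This sketch uses 4 MiB instead of 1 GB so it runs quickly; exact byte counts vary with the bzip2 version, so none are claimed here, and `ratio` is a hypothetical helper name.

```python
import bz2
import os


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(bz2.compress(data, compresslevel=9)) / len(data)


SIZE = 4 * 1024 * 1024      # 4 MiB instead of 1 GB, to keep it quick
zeros = bytes(SIZE)         # every byte is 0x00: almost no information
noise = os.urandom(SIZE)    # random bytes: essentially incompressible

print(f"all zeros: {ratio(zeros):.6f}")
print(f"random:    {ratio(noise):.6f}")
```

The random case typically comes out slightly above 1, i.e. the "compressed" file grows, which is the other side of the same coin.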
Warning: Opinions expressed by Nach or others in this post do not necessarily reflect the views, opinions, or position of Nach himself on the matter(s) being discussed therein.
Editor
Joined: 3/10/2010
Posts: 899
Location: Sweden
Rule number one: the more content-specific the encoding, the better. You wouldn't get good results if you just compressed a JPEG or an MP3 file as is. But if you unpack the compressed data and recompress it with a better algorithm, you should be able to get much better results. You can even skip irrelevant parts of the file completely. For example, the MP3 frame header is largely useless except where its values change (which is unlikely). Similarly, error correction data in disc images and shared decompression tables can be eliminated. You can also throw away data that will never be used, such as big metadata chunks (the same album picture embedded in every MP3 file?) and palette entries in pictures that are never used. Overall, though, the size of the compression table matters: pick one that's too small and things won't work out; pick one that's too big and you might be carrying unused data.
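The frame-header idea can be sketched generically: store a header only at the frames where it differs from the previous one, and rebuild the full list on decompression. This is a hypothetical illustration; the `headers` list stands in for 4-byte MP3 frame headers, and real MP3 parsing (sync words, frame lengths, CRCs) is left out.

```python
def drop_repeated_headers(headers: list) -> list:
    """Keep (frame index, header) pairs only where the header changes."""
    kept = []
    previous = None
    for index, header in enumerate(headers):
        if header != previous:
            kept.append((index, header))
            previous = header
    return kept


def restore_headers(kept: list, frame_count: int) -> list:
    """Expand the change list back into one header per frame."""
    changes = dict(kept)
    headers = []
    current = None
    for index in range(frame_count):
        current = changes.get(index, current)  # reuse the last header seen
        headers.append(current)
    return headers
```

The frame count has to be stored too, since the change list alone doesn't say how many frames follow the last change. For a file where every header is identical, thousands of headers shrink to one entry.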
GabCM
He/Him
Joined: 5/5/2009
Posts: 901
Location: QC, Canada
Thank you, Aktan, for that site; it looks useful! I'll keep that link. But what about disc image files, such as ISO? I can't find any information about compressing that kind of file.
Emulator Coder
Joined: 3/9/2004
Posts: 4588
Location: In his lab studying psychology to find new ways to torture TASers and forumers
Mister Epic wrote:
But what about disc image files, such as ISO? I can't find any information about compressing that kind of file.
http://www.neillcorlett.com/ecm/
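Roughly speaking, ECM helps because much of a raw CD sector can be recomputed rather than stored. Below is a simplified sketch for Mode 1 sectors (2352 bytes: a 12-byte sync pattern, a 4-byte header, 2048 bytes of user data, a 4-byte EDC checksum, 8 zero bytes, and 276 bytes of ECC). It only drops the sync, EDC and zero fields and keeps the ECC as-is, whereas the real tool also regenerates the ECC. The function names are hypothetical, and the EDC polynomial is taken from ECM-style code, so treat it as an assumption.

```python
from typing import Optional

SECTOR_SIZE = 2352
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"  # fixed 12-byte sync pattern


def _make_edc_table() -> list:
    # Lookup table for the CD-ROM EDC, a 32-bit CRC; the reflected
    # polynomial 0xD8018001 is an assumption taken from ECM-style code.
    table = []
    for i in range(256):
        value = i
        for _ in range(8):
            value = (value >> 1) ^ (0xD8018001 if value & 1 else 0)
        table.append(value)
    return table


EDC_TABLE = _make_edc_table()


def edc(data: bytes) -> bytes:
    """EDC checksum of `data`, as the 4 little-endian bytes stored on disc."""
    value = 0
    for byte in data:
        value = (value >> 8) ^ EDC_TABLE[(value ^ byte) & 0xFF]
    return value.to_bytes(4, "little")


def strip_mode1(sector: bytes) -> Optional[bytes]:
    """Drop the sync, EDC and zero fields from a Mode 1 sector.

    Returns header + user data + ECC (2328 bytes), or None when the sector
    doesn't match the expected layout and must be stored verbatim instead.
    """
    if len(sector) != SECTOR_SIZE or sector[:12] != SYNC or sector[15] != 1:
        return None
    if sector[0x810:0x814] != edc(sector[:0x810]):
        return None  # checksum doesn't match, so it can't be recomputed
    if sector[0x814:0x81C] != bytes(8):
        return None
    return sector[12:0x810] + sector[0x81C:]


def rebuild_mode1(stripped: bytes) -> bytes:
    """Inverse of strip_mode1: recompute the sync, EDC and zero fields."""
    head = SYNC + stripped[:2052]  # sync + 4-byte header + 2048 data bytes
    return head + edc(head) + bytes(8) + stripped[2052:]
```

This sketch saves only 24 bytes per sector. Most of ECM's benefit likely comes from the 276 ECC bytes, which look random to a general-purpose compressor and so can't be squeezed any other way; a real tool would also need to handle Mode 2 sectors.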
GabCM
He/Him
Joined: 5/5/2009
Posts: 901
Location: QC, Canada
Nach wrote:
Mister Epic wrote:
But what about disc image files, such as ISO? I can't find any information about compressing that kind of file.
http://www.neillcorlett.com/ecm/
Nice! I'll try that! Thanks!
Joined: 7/2/2007
Posts: 3960
There is a limit to how well you can compress data. Eventually every bit becomes significant, and thus no bit can be removed; at that point, the file is maximally compressed. The file Ilari sent you might have been 20 GB when uncompressed, but very few of the bits in that 20 GB were actually significant, which is why compression was able to achieve such good results. Once something has been compressed, you are unlikely to get good results by compressing it again. For example, if you made a .zip file of your MP3s, you would make only marginal gains, because the MP3s are already close to their compression limit. Zip compression can make a few minor gains by exploiting patterns that MP3 compression doesn't recognize (and by compressing MP3 metadata and the like), but that accounts for only a small amount of the data.
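Both points can be checked with a small experiment: an order-0 entropy estimate (it only looks at byte frequencies, so it is a rough indicator rather than the true limit) and a second compression pass over data that is already compressed. The sample text and the `order0_entropy` name are made up for illustration.

```python
import lzma
import math
import random
import zlib
from collections import Counter


def order0_entropy(data: bytes) -> float:
    """Average information per byte, in bits, if bytes were independent."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical sample: 200,000 words drawn from a tiny vocabulary
rng = random.Random(42)
words = ["save", "state", "frame", "input", "movie", "emulator"]
text = " ".join(rng.choice(words) for _ in range(200_000)).encode()

once = lzma.compress(text)
twice = zlib.compress(once, 9)  # compress the already-compressed output again

print(f"original:        {len(text):>9,} bytes, {order0_entropy(text):.2f} bits/byte")
print(f"lzma once:       {len(once):>9,} bytes, {order0_entropy(once):.2f} bits/byte")
print(f"then zlib again: {len(twice):>9,} bytes")
```

The first pass shrinks the text a lot, and its output's byte distribution ends up close to 8 bits per byte; the second pass has essentially nothing left to remove.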
Pyrel - an open-source rewrite of the Angband roguelike game in Python.
Banned User, Former player
Joined: 3/10/2004
Posts: 7698
Location: Finland
Yeah, I vote for 7-Zip as well. Of all the commonly used archivers, it's the one that gets the best results in most cases.
Senior Moderator
Joined: 8/4/2005
Posts: 5770
Location: Away
I recently started using NanoZip (with the self-extracting module) for large archives, my algorithm of choice being "optimum 2" with 1024 MB of RAM. Amazingly, it compresses better (in 100% of cases in my experience) and faster than 7-Zip's LZMA. Decompression is a bit on the slow side, but then again, so is LZMA's. One of the most useful things for heavy-duty compression is a large dictionary. The dictionary size basically indicates how far back the compressor can look for a repeated sequence of bytes. WinRAR, which is my general-purpose archiver of choice (it's largely on par with 7-Zip in compression ratio and speed, yet it still has a better interface), limits its window to 4 MB, which means repeated sequences more than 4 MB apart won't be recognized; in exchange, it needs less RAM to compress. Legend says this is meant to avoid problems for non-tech-savvy users working in corporate environments on computers with small amounts of RAM. I don't know whether this is true, but it is a major reason WinRAR falls short of 7-Zip in compression ratio. It is better suited for compressing media files, though (PCM files in particular, akin to FLAC).
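The effect of the dictionary (window) size is easy to demonstrate: take a chunk of random data, repeat it once, and compress with a window smaller and then larger than the distance between the two copies. This sketch uses Python's lzma module with an explicit LZMA2 filter; the sizes are arbitrary illustrations, not WinRAR's or NanoZip's actual settings, and `xz_size` is a hypothetical helper.

```python
import lzma
import os

MiB = 1024 * 1024


def xz_size(data: bytes, dict_size: int) -> int:
    """Compressed size of `data` using LZMA2 with the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


chunk = os.urandom(1 * MiB)   # incompressible on its own
data = chunk + chunk          # the second copy starts 1 MiB after the first

small = xz_size(data, dict_size=MiB // 2)  # copy is out of reach: roughly 2 MiB out
large = xz_size(data, dict_size=2 * MiB)   # copy is found: roughly 1 MiB out
print(f"512 KiB dictionary: {small:,} bytes")
print(f"  2 MiB dictionary: {large:,} bytes")
```

With the smaller window the repeat is invisible and nothing is saved; with the larger one the second megabyte costs almost nothing. The price is memory, which grows with the dictionary size.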
Warp wrote:
Edit: I think I understand now: It's my avatar, isn't it? It makes me look angry.
Joined: 2/19/2010
Posts: 248
RAR has the problem of being a proprietary format, unlike 7-Zip's. Given the marginal difference in performance between the two formats, I prefer a free one.
Senior Moderator
Joined: 8/4/2005
Posts: 5770
Location: Away
Can we please not drag the free vs. not free debate into this? I fail to see how it has any relevance to the topic, and all the fears surrounding proprietary formats are pretty much always based on hypothetical "what if" scenarios that have about as much likelihood (or credibility) to them as your next conspiracy theory.
Joined: 2/19/2010
Posts: 248
moozooh wrote:
Can we please not drag the free vs. not free debate into this? I fail to see how it has any relevance to the topic,
The topic is file compression. Free vs proprietary is a factor in file compression choice. For some people, it's an important one, for others, it's not. It's of moderate importance to me; but if RAR was significantly better than 7z, I daresay I would consider using RAR. I don't see how any of this is either offtopic or controversial. I understand your reaction, given the prevalence of overzealous Stallman-types. Rest assured that I am not one of them, in two important ways: 1. I accept and use some proprietary software and data formats; 2. I accept that other people's choices of which proprietary products are worth using will be different from my own. Some people are willing to pay €29.95 for WinRAR, and think that this represents value for money. They can make their own decisions. I will not be telling them that they are wrong and evil for doing so.
and all the fears surrounding proprietary formats are pretty much always based on hypothetical "what if" scenarios that have about as much likelihood (or credibility) to them as your next conspiracy theory.
I won't continue this part of the discussion here (since it definitely is offtopic and will spiral out of control if I do answer it), but I can answer in another thread or PM if you wish to hear my views on this. If not, we can let this subthread die which will be no bad thing.
Senior Moderator
Joined: 8/4/2005
Posts: 5770
Location: Away
rhebus wrote:
The topic is file compression. Free vs proprietary is a factor in file compression choice. For some people, it's an important one, for others, it's not.
I meant that being free or proprietary doesn't in any way indicate how well does a compressor perform in terms of compression ratio, speed, or efficiency, and I don't remember the OP being concerned with the "problem" of commercialware*. Either way, I used WinRAR to illustrate my post as an example of software I use and know about, not as something I particularly recommend. * For the record, the UnRAR executable and libraries are free (which is why other free applications can decompress .rar files), and WinRAR itself merely has a nag screen, remaining fully functional after the initial 30-day period, so it has never been a problem for anybody, and especially not for pirates. Of course, if you choose to use it that way, that undercuts my earlier point about it having a better interface. Yet it remains one of the most feature-complete and ergonomically pleasing archivers/compressors, that much is a fact, and that is what the money is asked for here, not being marginally more efficient than 7-Zip. This isn't the first time I've had to defend RAR here (one of my first posts back in 2005 was about that!), and pretty much the only reason it happens is that people are eager to dismiss it for being commercialware alone, ignoring its good sides as if paying for something useful were some kind of blasphemy. If you wish to discuss this further in private, I don't mind, but it's really not as important (or at all relevant) to this topic as you might think.
Publisher
Joined: 4/23/2009
Posts: 1283
You can't get off topic in the off topic forum! Everything is off topic! *note the sarcasm
Banned User, Former player
Joined: 3/10/2004
Posts: 7698
Location: Finland
moozooh wrote:
I meant that being free or proprietary doesn't in any way indicate how well does a compressor perform in terms of compression ratio, speed, or efficiency, and I don't remember the OP being concerned with the "problem" of commercialware*.
I think that the idea was that if the two programs compress about equally well on average, the free program/format combo is preferable to the proprietary one.