
OK, I know: BitComp is nothing more than an implementation of pseudo-Huffman coding in PHP, and LZ encapsulation is nothing new either. I am keeping this project on SF anyway, because I still think BitComp shows how powerful PHP is. Besides, some people might still find something useful to do with this piece of code.

Since the rise of the information era, humankind (especially Slashdot nerds ;) ) has always looked for ways to compress data of all kinds. Even though storage space is not the issue it once was, compression is still needed to optimize disk usage.

Various compression utilities exist for almost any purpose, ranging from simple Lempel-Ziv implementations, suited to repetitive content like text, to complex fractal-based image compression.

None of these, however, has ever addressed the fact that many data formats, including text files, are stored in a way that makes them use 14%-33% more space than they should! (In some rare cases, it might even be 60% more than needed!)

How does that happen?

Simple. For convenience, each and every char in a file is represented by a combination of 8 bits, which gives us 256 (0-255) different representations. But what if, in a WHOLE file, no more than 128 different values are used? Then each char needs one bit less, which could save 12.5% of the space. In reality, however, we have two problems:

  1. While in theory we need one bit less, representing a specific char might still require the extra bit because its code lies in high ASCII (>127)
  2. Even if we solve the first problem, we still need to find a way to use only seven bits per char in the file itself (since every char MUST be stored as 8 bits)
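
The arithmetic behind these percentages can be sketched in a few lines. This is a Python illustration only, not BitComp's PHP code, and the function names are made up for the example:

```python
def bits_needed(distinct_values: int) -> int:
    """Smallest fixed width (in bits) that can label every distinct value."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 2; use at least 1 bit.
    return max(1, (distinct_values - 1).bit_length())


def saving_percent(distinct_values: int) -> float:
    """Space saved relative to the plain 8-bits-per-char file."""
    return (1 - bits_needed(distinct_values) / 8) * 100


def overhead_percent(distinct_values: int) -> float:
    """How much bigger the 8-bit file is than the tightly packed one."""
    return (8 / bits_needed(distinct_values) - 1) * 100


for n in (128, 64, 32):
    print(n, bits_needed(n), round(overhead_percent(n), 1))
```

For 128, 64 and 32 distinct values this gives 7, 6 and 5 bits per char, so the plain 8-bit file is roughly 14.3%, 33.3% and 60% larger than it needs to be.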

BitComp addresses both issues.

First, it creates a custom "char map" for the file it compresses, so that no char needs more bits than are actually required, and converts the file through that map.
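
A char map of this kind could look roughly like the following. It is a minimal Python sketch, not the PHP source; the names build_char_map and remap are hypothetical, and assigning codes in sorted byte order is an assumption made for the example:

```python
def build_char_map(data: bytes):
    """Give every byte value that occurs in `data` a small consecutive code."""
    used = sorted(set(data))  # distinct byte values actually present
    encode = {byte: code for code, byte in enumerate(used)}
    return used, encode       # `used` doubles as the decode table


def remap(data: bytes, encode: dict) -> list:
    """Replace every byte with its small code from the char map."""
    return [encode[b] for b in data]


# 0xE9 is in high ASCII, yet only four distinct values occur in the input.
used, encode = build_char_map(b"h\xe9llo")
codes = remap(b"h\xe9llo", encode)
print(used, codes)
```

After remapping, the largest code is 3, so two bits per char are enough even though one of the original bytes was above 127.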

Second, it converts every byte in the file to binary format and truncates every "virtual byte" to the number of bits actually needed. Then it repacks the result by putting every 8 bits into a byte, so the file now looks like random garbage.
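
The truncate-and-repack step can be sketched like this, assuming the bytes have already been turned into small codes by a char map. It is Python for illustration only; pack_codes is a hypothetical name, and padding the last byte with zero bits is an assumption, since the page does not say how BitComp handles leftovers:

```python
def pack_codes(codes, width: int) -> bytes:
    """Write each code with exactly `width` bits, then regroup into bytes."""
    # Build one long string of '0'/'1' characters, `width` bits per code.
    bits = "".join(format(code, f"0{width}b") for code in codes)
    # Assumption: pad the tail with zero bits up to a whole byte.
    bits += "0" * (-len(bits) % 8)
    # Every 8 bits become one output byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


packed = pack_codes([0, 3, 1, 1, 2], 2)  # five 2-bit codes
print(packed.hex())
```

Five chars that used to take five bytes now fit into two, and the packed bytes no longer resemble the original text.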

Decompression is done in much the same way, but in reverse order: it first converts the file to bit format, expands every truncated group (6 bits, for example) back to 8 bits, repacks the result, and then converts it back using the file's charmap.
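
The reverse direction might be sketched as follows. Again this is Python for illustration, not the PHP code. It assumes the bit width, the char count and the decode table are available to the decompressor; how BitComp actually stores them is not described here, and the function names are hypothetical:

```python
def unpack_codes(packed: bytes, width: int, count: int) -> list:
    """Split the packed bit stream back into `count` codes of `width` bits."""
    bits = "".join(format(byte, "08b") for byte in packed)
    # Padding bits after the last code are simply never read.
    return [int(bits[i * width:(i + 1) * width], 2) for i in range(count)]


def decompress(packed: bytes, width: int, count: int, used: list) -> bytes:
    """Expand the codes and map them back to the original byte values."""
    return bytes(used[code] for code in unpack_codes(packed, width, count))


# Two packed bytes, 2 bits per char, 5 chars, plus a four-entry decode table.
table = [0x68, 0x6C, 0x6F, 0xE9]
print(decompress(bytes([0x35, 0x80]), 2, 5, table))
```

Passing the char count explicitly is one simple way to keep the zero padding in the last byte from being decoded as extra chars.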

Currently (as explained in more detail on the info page), BitComp, while being a new compression method, is easily beaten even by the Unix "compress" utility. This will change once LZ is encapsulated inside BitComp (simply piping one compressor into the other doesn't give good results).

