What is a cruncher?

| No Comments | No TrackBacks
HISTORY
Cruncher is a term coined by coders just before data compression came of age, ie. when users used crunchers on a daily basis, and coders released software in compressed packages. Before this happened and, slowly, compression schemes became more and more standardized towards the ZIP, RAR, 7-Zip, and LHA of today, there were hundreds if not thousands of compression software alternatives. All with the aim to make the most of disks (and tapes), some made compressed executables to decompress on the fly, some were archivers, and some had custom uses, such as in game loaders. Apart from the slower archivers that were DOS commands ported from some ancient source code of the original inventor, the coders that made compressors to fit their own needs called their creations crunchers.


THEORY
Crunching is reducing in size. To reduce in size involves two major categories, that might be possible to use together:

1) encoding more efficiently. 

Efficient encoding is based on assumptions of the data chunk being processed. (A sound sample, for example, is digitized with a certain number of bits per time unit, and the frequency of the digitized sound waves is usually well below 44100 Hz. Also, the composite waves, at least in a non-chaotic sound, can usually be described with a transform based on sine waves of varying frequencies.)

As a simple example, lossless crunching of sound samples can be done with n delta transforms and bitchunk encoding, where each datum in the chunks is stored with a lesser number of bits than that of each original sample.


2) find redundant data; data that can be stored with fewer bits by any of 3 types of transform from known data.

The known data can be in an external dictionary or within the file, or it can be assumed, ie. included in the decrunch code.

One example of the former is a normal dictionary for text crunching, which itself can be crunched.

One example of the latter is constants, such as for pattern lengths or sine wave frequencies.

2a) Domain transform
The transform can be valid in any space or domain, so long as transcription to and from this domain are coded and adequate for the application. One example is going into the quaternion domain for fractal crunching. 

If, for simplicity, the crunching is kept in the normal data bits domain, there are two types of transforms:

2b) an aliased arithmetics transform, based on assumptions of the type and form or regularity of the data. If a precision higher than the bits present is required, one could provide floating-point functions and go into a (less aliased) arithmetics domain. Delta conversion is one form of aliased arithmetics transforms.

2c) a bitwise logic transform, also based on assumptions about the data. If, for example, an RGB-encoded picture or animation is stored as multiple bitmaps with 1 bpp (bits per pixel) in each, one such transform could be an exclusive-or transform from one pixel line to the next, from one bitmap to the next, or from one picture to the same bitmap in the next.

Examples of bitwise logic transforms are shifts, exclusive-or, and, or, and inversion. They may be unary, binary, or have any number of sources to produce the output bit or bits, ie. regarded from input to output, they may be serial-to-serial, serial-to-parallel, parallel-to-serial, or parallel-to-parallel.


The short text above concisely describes the complete theory of data compression, or crunching. As for all abstract theories, it's clear in it's purest generic and nondescript form. And as with all theories, they are proven with experiments and inventions ;) So go to it!


Full paper here.

Hi, I created the cruncher blog

| No Comments | No TrackBacks

My name is Henrik Erlandsson, and I am a coder in miscellaneous languages on a myriad of platforms, mainly 68000, ARM, and whatever they put in PCs. ;)

For those "...just scanning...", this blog is about data compression. Some programmers are bitten by the "kill a few more bytes" bug. I have been infected for nigh-on 3 decades, so if you are one of us zombies, I hope you will find some interesting ideas and inspiration here!

Find recent content on the main index or look in the archives to find all content.

Categories

Pages

Powered by Movable Type 4.21-en