This is the front page of The Cruncher Blog. To go beyond the basics, read my further articles and papers.

What is a cruncher?

HISTORY

Cruncher is a term coined by coders just before data compression came of age, i.e. just before users ran crunchers on a daily basis and coders released their software in compressed packages. Before that happened, and before compression schemes slowly standardized into the ZIP, RAR, 7-Zip, and LHA of today, there were hundreds if not thousands of such compression programs.

All aimed to make disks store more or tapes load faster: some compressed executables so they would decompress on the fly, some were archivers, and many served special applications such as game loaders or quick animation. Apart from the slower archivers (DOS commands ported from the original inventor's ancient source code), the coders who wrote compressors to fit their own needs called their creations crunchers.

THEORY

To crunch is to reduce in size. Reducing in size involves two methods that might or might not combine well:

  1. Encode more efficiently.

    Efficient encoding is based on assumptions about the data chunk being processed. (A sound sample, for example, is digitized with a certain number of bits per time unit, and the major harmonics of the digitized sound waves are usually well below 44100 Hz. Also, the composite waves, at least in a non-chaotic sound, can usually be described with a transform based on sine waves of varying frequencies.)

    As a simple example, lossless crunching of sound samples can be done with n delta transforms and bit-chunk encoding, where each datum in the chunks is stored with fewer bits than each original sample (see the first sketch after this list).

    Encoding can be done within an assumed alphabet or unit size (such as 8-bit bytes for ANSI text), with a different alphabet, or with no alphabet at all.

  2. Detect redundant data.

    That is, data that can be stored with fewer bits by any combination of the three types of transforms from known data listed below.

    The known data can be in an external dictionary or within the file, or it can be assumed, i.e. included in the decrunch code. One example of the first is a normal dictionary for text crunching, which itself can be crunched (see the dictionary sketch after this list). One example of the last is numeric constants, such as pattern lengths or sine wave frequencies.

    1. Domain transform

      The transform can be valid in any space or domain, so long as the conversions to and from this domain are implemented and adequate for the application. One example is going into the quaternion domain for fractal crunching.

      If, for simplicity or speed, the crunching is kept in the normal data bits domain, there are two types of transforms:

    2. Aliased arithmetics transform

      That is, a transform based on assumptions about the type, form, or regularity of the data. If a precision higher than the bits present is required, one could provide floating-point functions and go into a (less aliased) arithmetic domain. Delta conversion is one form of aliased arithmetic transform.

    3. Bitwise logic transform

      A transform that is also based on assumptions, as mentioned above. If, for example, an RGB-encoded picture or animation is stored as multiple bitmaps with 1 bpp (bits per pixel) each, one such transform could be an exclusive-or from one pixel line to the next, from one bitmap to the next, or from one picture to the same bitmap in the next (see the XOR sketch after this list).

      Examples of bitwise logic transforms are shifts, exclusive-or, AND, OR, and inversion. They may be unary, binary, or have any number of sources producing the output bit or bits, i.e. regarded from input to output they may be serial-to-serial, serial-to-parallel, parallel-to-serial, or parallel-to-parallel.
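
Here is a minimal sketch, in Python, of the sample-crunching idea from point 1: a delta transform followed by bit-chunk encoding, where each chunk of deltas is stored with only as many bits per datum as its largest delta needs. The chunk size, the plain list standing in for a real bit stream, and all names are illustrative choices of mine, not a fixed format.

    CHUNK = 16  # samples per chunk; an arbitrary choice for this sketch

    def delta_encode(samples):
        # Store the first sample as-is, then only the difference to the
        # previous sample; slowly varying waves give small deltas.
        out, prev = [], 0
        for s in samples:
            out.append(s - prev)
            prev = s
        return out

    def delta_decode(deltas):
        out, prev = [], 0
        for d in deltas:
            prev += d
            out.append(prev)
        return out

    def bits_needed(value):
        # Smallest signed (two's-complement) width that can hold the value.
        n = 1
        while not -(1 << (n - 1)) <= value < (1 << (n - 1)):
            n += 1
        return n

    def crunch(samples):
        deltas = delta_encode(samples)
        stream = []  # a list of small integers stands in for a bit stream
        for i in range(0, len(deltas), CHUNK):
            chunk = deltas[i:i + CHUNK]
            width = max(bits_needed(d) for d in chunk)
            stream.append(width)  # chunk header: bits per datum
            for d in chunk:
                stream.append(d & ((1 << width) - 1))  # width-bit datum
        return len(deltas), stream

    def decrunch(count, stream):
        deltas, pos = [], 0
        while len(deltas) < count:
            width = stream[pos]; pos += 1
            for _ in range(min(CHUNK, count - len(deltas))):
                raw = stream[pos]; pos += 1
                if raw >= 1 << (width - 1):  # undo the two's-complement wrap
                    raw -= 1 << width
                deltas.append(raw)
        return delta_decode(deltas)

    samples = [0, 3, 5, 6, 6, 5, 3, 0, -3, -5, -6, -6, -5, -3, 0, 3]
    count, packed = crunch(samples)
    assert decrunch(count, packed) == samples  # the crunch is lossless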
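
Point 2's simplest case, crunching against known data, can be sketched with a tiny word dictionary that is assumed, i.e. shipped with the decrunch code rather than stored in the file. The word list, the token bytes, and the names below are arbitrary illustrations.

    DICTIONARY = ["crunch", "data", "the", "and"]  # known to both ends
    TOKENS = {word: bytes([0x80 + i]) for i, word in enumerate(DICTIONARY)}

    def crunch_text(text):
        # Replace each dictionary word with a one-byte token >= 0x80;
        # everything else passes through as plain ASCII.
        out = []
        for word in text.split(" "):
            out.append(TOKENS.get(word, word.encode("ascii")))
        return b" ".join(out)

    def decrunch_text(blob):
        words = []
        for chunk in blob.split(b" "):
            if len(chunk) == 1 and chunk[0] >= 0x80:  # a dictionary token
                words.append(DICTIONARY[chunk[0] - 0x80])
            else:
                words.append(chunk.decode("ascii"))
        return " ".join(words)

    text = "crunch the data and crunch the rest"
    packed = crunch_text(text)
    assert decrunch_text(packed) == text
    assert len(packed) < len(text)  # seven words shrink to sixteen bytes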
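
Finally, a sketch of a bitwise logic transform (transform type 3 above): each pixel line of a 1 bpp bitmap is XORed with the line above it, so vertically unchanged areas turn into zero bytes that a later encoding stage can store cheaply. The row-of-bytes layout is again my own illustrative choice.

    def xor_rows(rows):
        # XOR each row with the previous one; the imaginary row above
        # the top of the bitmap is all zeros, so row 0 passes through.
        prev, out = [0] * len(rows[0]), []
        for row in rows:
            out.append([a ^ b for a, b in zip(row, prev)])
            prev = row
        return out

    def unxor_rows(rows):
        prev, out = [0] * len(rows[0]), []
        for row in rows:
            prev = [a ^ b for a, b in zip(row, prev)]
            out.append(prev)
        return out

    bitmap = [
        [0b00111100, 0b00000000],
        [0b00111100, 0b00000000],  # identical to the row above ...
        [0b00111110, 0b00000000],  # ... and one pixel changed here
    ]
    transformed = xor_rows(bitmap)
    assert transformed[1] == [0, 0]           # the unchanged row becomes zeros
    assert unxor_rows(transformed) == bitmap  # the transform is lossless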

The short text above describes concisely the complete theory of data compression, or crunching. As with all theories, it's clear in its purest generic and nondescript form ;) Theories are tested with experiments and inventions. So go to it!

Full paper

Read or comment on the blog post

Data Compression on Wikipedia