Introduction

It is common to store digitised waveforms on computers, and the resulting files can often consume significant amounts of storage space. General-purpose compression algorithms do not perform very well on these files because they fail to take into account the structure of the data and the nature of the signal contained therein. Typically a waveform file will consist of signed 16-bit numbers and there will be significant sample-to-sample correlation. A compression utility for these files must be reasonably fast and portable, accept data in the most popular formats, and give significant compression. This report describes ``shorten'', a program for the UNIX and DOS environments which aims to meet these requirements.
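
To make the remark about sample-to-sample correlation concrete, the following minimal sketch compares the average magnitude of a waveform with that of its first difference. It is an illustration only, not a description of shorten's own predictor: the test signal (a 440 Hz tone sampled at 16 kHz) and the helper names first_difference, reconstruct and mean_abs are assumptions introduced here.

```python
import math


def first_difference(samples):
    """Residual of a trivial predictor that guesses each sample equals the previous one."""
    prev = 0
    residual = []
    for s in samples:
        residual.append(s - prev)
        prev = s
    return residual


def reconstruct(residual):
    """Invert first_difference exactly by accumulating the residual."""
    prev = 0
    out = []
    for r in residual:
        prev += r
        out.append(prev)
    return out


def mean_abs(values):
    """Average absolute value, a rough proxy for the number of bits needed per sample."""
    return sum(abs(v) for v in values) / len(values)


# Hypothetical test signal: a 440 Hz tone at 16 kHz, within the signed 16-bit range.
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1600)]
residual = first_difference(samples)

print("mean |sample|   =", mean_abs(samples))
print("mean |residual| =", mean_abs(residual))
```

For a smooth signal such as this one the residual is much smaller in magnitude than the samples themselves, and because the differencing is exactly invertible no information is lost. A general-purpose compressor that treats the file as an unstructured byte stream has no direct way to exploit this.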

A significant application of this program is the compression of speech files for distribution on CDROM. This report starts with a description of this domain, then discusses the two main problems associated with general waveform compression, namely predictive modelling and residual coding. This framework is then extended to lossy coding. Finally, the shorten implementation is described and an appendix details the command line options.

Tony Robinson: ajr4@cam.ac.uk