Appendix: The shorten man page (version 1.22)




SHORTEN(1)               USER COMMANDS                 SHORTEN(1)


NAME
     shorten - fast compression for waveform files

SYNOPSIS
     shorten [-hl] [-a #bytes] [-b #samples] [-c  #channels]  [-d
     #bytes]  [-m  #blocks]  [-n  #dB] [-p #order] [-q #bits] [-r
     #bits]   [-t   filetype]   [-v   #version]    [waveform-file
     [shortened-file]]

     shorten -x [-hl] [ -a #bytes] [-d  #bytes]   [shortened-file
     [waveform-file]]

DESCRIPTION
     shorten reduces the size of waveform files (such  as  audio)
     using  Huffman  coding  of prediction residuals and optional
     additional quantisation.  In lossless  mode  the  amount  of
     compression  obtained depends on the nature of the waveform.
     Waveforms composed of low frequencies and low amplitudes give
     the best compression, which may be 2:1 or better.  Lossy
     compression is selected by specifying a minimum acceptable
     segmental signal to noise ratio or a maximum bit rate.  It
     operates by zeroing the low order bits of the waveform, so
     retaining waveform shape.

     If both file names are specified then these are used as  the
     input and output files.  The first file name can be replaced
     by "-" to read from standard input and likewise  the  second
     filename can be replaced by "-" to write to standard output.
     Under UNIX, if only one file name is  specified,  then  that
     name is used for input and the output file name is generated
     by adding the suffix ".shn" on compression and removing  the
     ".shn"  suffix  on  decompression.  In these cases the input
     file is removed on completion.  The use  of  automatic  file
     name generation is not currently supported under DOS.  If no
     file names are specified, shorten reads from standard  input
     and  writes to standard output.  Whenever possible, the out-
     put file inherits the permissions, owner, group, access  and
     modification times of the input file.
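
     The automatic file naming rule above can be sketched as follows.
     This is an illustration only: the function name is hypothetical,
     and the behaviour for a name without the ".shn" suffix on
     decompression is an assumption, since the text does not specify
     it.

```python
def default_output_name(input_name, extract=False):
    """Derive the output name when only one file name is given (UNIX rule)."""
    suffix = ".shn"
    if not extract:
        # Compression: the suffix is added to the input name.
        return input_name + suffix
    if not input_name.endswith(suffix) or len(input_name) == len(suffix):
        # Assumption: the man page does not say what happens in this case.
        raise ValueError("input name has no '.shn' suffix to remove")
    # Decompression: the suffix is removed.
    return input_name[: -len(suffix)]
```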

OPTIONS
     -a align bytes
          Specify the number  of  bytes  to  be  copied  verbatim
          before  compression begins.  This option can be used to
          preserve fixed length ASCII headers on waveform  files,
          and  may  be  necessary  if the header length is an odd
          number of bytes.

     -b block size
          Specify the number of samples  to  be  grouped  into  a
          block  for  processing.  Within a block the signal ele-
          ments are expected to have the  same  spectral  charac-
          teristics.   The  default option works well for a large
          range of audio files.

     -c channels
          Specify the number of independent interwoven  channels.
          For two signals, a(t) and b(t), the original data format
          is assumed to be a(0),b(0),a(1),b(1)...

     -d discard bytes
          Specify the number of  bytes  to  be  discarded  before
          compression  or  decompression.   This  may  be used to
          delete header information from a file.  Refer to the -a
          option  for  storing  the  header  information  in  the
          compressed file.

     -h   Give a short message specifying usage options.

     -l   Print the software license specifying  the  conditions
          for the distribution and usage of this software.

     -m blocks
          Specify the number of past blocks to be used  to  esti-
          mate  the  mean  and power of the signal.  The value of
          zero disables this prediction and the mean  is  assumed
          to  lie in the middle of the range of the relevant data
          type  (i.e.  at  zero  for  signed  quantities).    The
          default  value  is non-zero for format versions 2.0 and
          above.

     -n noise level
          Specify the  minimum  acceptable  segmental  signal  to
          noise  ratio  in  dB.  The signal power is taken as the
          variance of the samples  in  the  current  block.   The
          noise  power is the quantisation noise incurred by coding
          the current block, assuming that samples are uniformly
          distributed over the quantisation interval.  The
          bit rate is dynamically changed to maintain the desired
          signal  to  noise  ratio.  The default value represents
          lossless coding.
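
          A minimal sketch of how the number of discarded bits could
          be chosen per block for a target segmental signal to noise
          ratio.  The noise model (a quantisation step of 2^q gives a
          noise power of step^2/12, the usual result for uniformly
          distributed error) and the function names are assumptions
          made for illustration; shorten's own code may differ in
          detail.

```python
import math

def block_snr_db(block, q):
    """Estimated SNR in dB if the q low order bits of each sample are zeroed."""
    n = len(block)
    mean = sum(block) / n
    # Signal power is the variance of the samples in the block.
    signal_power = sum((x - mean) ** 2 for x in block) / n
    if signal_power == 0:
        return float("-inf")
    # Assumed noise model: error uniform over an interval of width 2**q.
    step = 1 << q
    noise_power = step * step / 12.0
    return 10.0 * math.log10(signal_power / noise_power)

def max_discardable_bits(block, min_snr_db, max_bits=15):
    """Largest q whose estimated SNR still meets the requested minimum."""
    q = 0
    while q < max_bits and block_snr_db(block, q + 1) >= min_snr_db:
        q += 1
    return q
```

          Because the choice is made block by block, quiet blocks keep
          more of their low order bits than loud ones, which is what
          makes the bit rate vary with the signal.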

     -p prediction order
          Specify the maximum  order  of  the  linear  predictive
          filter.   The default value of zero disables the use of
          linear prediction and a polynomial interpolation method
          is  used  instead.   The  use  of the linear predictive
          filter generally results  in  a  small  improvement  in
          compression  ratio  at  the  expense of execution time.
          This is the only option to use a significant amount  of
          floating    point    processing   during   compression.
          Decompression still uses a minimal number  of  floating
          point operations.

          With linear prediction, decompression time is normally
          about twice that of the default polynomial interpolation.
          For versions 0 and 1,
          compression time is linear  in  the  specified  maximum
          order as all lower values are searched for the greatest
          expected compression (the number of  bits  required  to
          transmit   the  prediction  residual  is  monotonically
          decreasing with prediction order, but transmitting each
          filter  coefficient  requires about 7 bits).   For ver-
          sion 2 and above, the search is started at  zero  order
          and terminated when the last two prediction orders give
          a larger expected bit rate than the  minimum  found  to
          date.  This is a reasonable strategy for many real world
          signals; to check that it works for your signal type, you
          may revert to the exhaustive algorithm by setting -v1.
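
          The two search strategies described above can be compared
          with the following sketch.  The expected_bits callable is a
          hypothetical stand-in for the coder's estimate of the bit
          rate at a given order, and treating a tie as "not better" is
          an assumption.

```python
def search_order(expected_bits, max_order):
    """Early-terminating search: stop after two consecutive non-improving orders."""
    best_order = 0
    best_bits = expected_bits(0)
    worse_in_a_row = 0
    for order in range(1, max_order + 1):
        bits = expected_bits(order)
        if bits < best_bits:
            best_order, best_bits = order, bits
            worse_in_a_row = 0
        else:
            worse_in_a_row += 1
            if worse_in_a_row == 2:
                break
    return best_order

def exhaustive_order(expected_bits, max_order):
    """Exhaustive search: try every order up to the maximum."""
    return min(range(max_order + 1), key=expected_bits)
```

          For a cost curve with a late dip, such as 100, 90, 95, 96,
          50 bits for orders 0 to 4, the early-terminating search
          settles on order 1 while the exhaustive search finds order
          4.  This is the kind of signal for which the -v1 check is
          worthwhile.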

     -q quantisation level
          Specify the number of low order  bits  in  each  sample
          which  can  be discarded (set to zero).  This is useful
          if these bits carry no information,  for  example  when
          the signal is corrupted by noise.
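
          As a sketch, zeroing the low order bits of a sample is a
          single mask operation.  The function name is hypothetical,
          and two's complement behaviour is assumed for negative
          samples.

```python
def discard_low_bits(sample, q):
    """Set the q low order bits of an integer sample to zero."""
    mask = ~((1 << q) - 1)
    # Python integers behave as two's complement under bitwise AND,
    # so negative samples move toward minus infinity.
    return sample & mask
```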

     -r bit rate
          Specify the expected maximum number of bits per sample.
          The  upper bound on the bit rate is achieved by setting
          the low order bits of the sample to  zero,  hence  max-
          imising the segmental signal to noise ratio.

     -t file type
          Specify the type of the sound sample file as one of
          {ulaw,s8,u8,s16,u16,s16x,u16x,s16hl,u16hl,s16lh,u16lh}.
          ulaw is the natural file type  of  ulaw  encoded  files
          (such  as  the  default sun .au files).   All the other
          types have initial s or u for signed or unsigned  data,
          followed  by  8 or 16 as the number of bits per sample.
          No further extension means the data is in  the  natural
          byte  order,  a trailing x specifies byte swapped data,
          hl explicitly states the byte order as high  byte  fol-
          lowed  by low byte and lh the converse.  The default is
          s16, meaning signed 16 bit integers in the natural byte
          order.

          Specific optimisations are applied to ulaw files.    If
          lossless  compression is specified then a check is made
          that the whole dynamic range is used (useful for  files
          recorded  on  a  SparcStation  with  the volume set too
          high).   If lossy compression  is  specified  then  the
          data  is  internally  converted  to linear.   The lossy
          option "-r4" has been observed to give little  degrada-
          tion.
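
          The naming scheme for the linear (non-ulaw) file types can
          be illustrated by a small decoder.  This is a sketch under
          stated assumptions: the function and table names are
          hypothetical, ulaw is not handled, and trailing bytes that
          do not form a whole sample are silently dropped.

```python
import sys

LINEAR_TYPES = {"s8", "u8", "s16", "u16", "s16x", "u16x",
                "s16hl", "u16hl", "s16lh", "u16lh"}

def decode_samples(data, filetype):
    """Decode raw bytes into integers according to a -t type name."""
    if filetype not in LINEAR_TYPES:
        raise ValueError("unsupported file type: " + filetype)
    signed = filetype[0] == "s"              # initial s or u
    width = 1 if filetype[1] == "8" else 2   # 8 or 16 bits per sample
    tag = filetype[1:].lstrip("0123456789")  # "", "x", "hl" or "lh"
    natural = sys.byteorder
    swapped = "big" if natural == "little" else "little"
    order = {"": natural, "x": swapped, "hl": "big", "lh": "little"}[tag]
    usable = len(data) - len(data) % width   # assumption: drop a partial sample
    return [int.from_bytes(data[i:i + width], order, signed=signed)
            for i in range(0, usable, width)]
```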

     -v version
          Specify the binary format version number of  compressed
          files.    Legal  values  are 0, 1 and 2, higher numbers
          generally  giving  better  compression.   The   current
          release  can  write  all format versions, although con-
          tinuation of this support is not  guaranteed.   Support
          for  decompression  of  all  earlier format versions is
          guaranteed.

     -x extract
          Reconstruct the original file.  All other command  line
          options except -a and -d are ignored.


METHODOLOGY
     shorten works by blocking the signal, making a model of each
     block  in  order to remove temporal redundancy, then Huffman
     coding the quantised prediction residual.


  Blocking
     The signal is read in blocks of about 128 or  256  samples,
     and  converted  to  integers  with  expected  mean  of zero.
     Sample-wise-interleaved data is converted to separate  chan-
     nels, which are assumed independent.
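
     A minimal sketch of the blocking step, assuming the samples have
     already been converted to integers.  The function names and the
     default block size of 256 are illustrative choices, not taken
     from the shorten source.

```python
def deinterleave(samples, channels):
    """Split a(0),b(0),a(1),b(1),... into one list per channel."""
    return [samples[c::channels] for c in range(channels)]

def split_blocks(channel, block_size=256):
    """Cut one channel into consecutive blocks; the last may be short."""
    return [channel[i:i + block_size]
            for i in range(0, len(channel), block_size)]
```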


  Decorrelation
     Four functions are computed, corresponding to the signal
     itself and its first, second and third order differences.  The
     one with the lowest variance  is  coded.   The  variance  is
     measured  by  summing absolute values for speed and to avoid
     overflow.
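
     The selection described above can be sketched as follows.  The
     function names are hypothetical, and treating the samples before
     the block as zero is a simplifying assumption; a real coder would
     carry history over from the previous block.

```python
def differences(block, order):
    """Apply the first-difference operator `order` times to a block."""
    out = list(block)
    for _ in range(order):
        prev = 0  # assumption: history before the block is zero
        nxt = []
        for x in out:
            nxt.append(x - prev)
            prev = x
        out = nxt
    return out

def pick_predictor(block):
    """Return (order, residual) with the smallest sum of absolute values."""
    candidates = [differences(block, k) for k in range(4)]
    # Summing absolute values stands in for the variance.
    costs = [sum(abs(r) for r in c) for c in candidates]
    best = min(range(4), key=costs.__getitem__)
    return best, candidates[best]
```

     A steadily rising ramp is best served by the second order
     difference, while a rapidly alternating block is cheapest to code
     as the signal itself.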


  Compression
     It is assumed the signal has the Laplacian probability density
     function of exp(-abs(x)).  There is a computationally
     efficient way of mapping this density to Huffman codes.  The
     code is in two parts: a run of zeros terminated by a bounding
     one, and a mantissa of a fixed number of bits.  The number of
     leading zeros gives the offset from zero in units of 2^n for
     an n bit mantissa.  Signed numbers are stored by calling the
     function for unsigned numbers with the sign in the lowest bit.
     Some examples for a 2 bit mantissa:

          Code       Value
          100        0
          101        1
          110        2
          111        3
          0100       4
          0111       7
          00100      8
          0000100    16
     This Huffman code was first used by Robert  Rice.   For  more
     details   see   the  technical  report  CUED/F-INFENG/TR.156
     included with the shorten distribution  as  files  tr154.tex
     and tr154.ps.
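
     The code can be illustrated with a short sketch that reproduces
     the 2 bit mantissa examples.  Bit strings are used for clarity
     rather than packed bits, the function names are hypothetical,
     and the exact mapping of negative numbers in fold_sign is an
     assumption consistent with "the sign in the lowest bit".

```python
def encode_unsigned(value, mantissa_bits):
    """Encode a non-negative integer as zeros, a bounding one, then the mantissa."""
    if value < 0:
        raise ValueError("value must be non-negative")
    high = value >> mantissa_bits                 # number of leading zeros
    low = value & ((1 << mantissa_bits) - 1)      # fixed-width mantissa
    mantissa = format(low, "b").zfill(mantissa_bits) if mantissa_bits else ""
    return "0" * high + "1" + mantissa

def decode_unsigned(bits, mantissa_bits):
    """Inverse of encode_unsigned for a single code word."""
    high = bits.index("1")
    start = high + 1
    low = int(bits[start:start + mantissa_bits], 2) if mantissa_bits else 0
    return (high << mantissa_bits) | low

def fold_sign(value):
    """Map a signed integer to an unsigned one with the sign in bit 0 (assumed mapping)."""
    return (value << 1) if value >= 0 else ((~value) << 1) | 1
```

     Small residuals receive short codes, so the better the predictor
     and the better the mantissa width matches the spread of the
     residual, the fewer bits are spent per sample.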


SEE ALSO
     compress(1), pack(1).

DIAGNOSTICS
     Exit status is normally 0.  A warning is issued if the  file
     is  not  properly  aligned,  i.e.  a whole number of records
     could not be read at the end of the file.

BUGS
     There are no known bugs.  An easy way to  test  shorten  for
     your system is to use "make test".  If this fails, for
     whatever reason, please report it.

     No check  is  made  for  increasing  file  size,  but  valid
     waveform  files  generally  achieve  some compression.  Even
     compressing a file of random  bytes  (which  represents  the
     worst  case  waveform file) only results in a small increase
     in the file length (about 6% for 8 bit data and  3%  for  16
     bit data).

     There is no provision for different channels containing dif-
     ferent data types.  Normally, this is not a restriction, but
     it does mean that if lossy coding is selected for  the  ulaw
     type, then all channels use lossy coding.

     It would be possible for all options to be channel  specific
     as  in  the  -r  option.    I  could do this if anyone has a
     really good need for it.

     See also the files Change.log and README.dos for what  might
     also be called bugs, past and present.

     Please mail me immediately at the address below  if  you  do
     find a bug.


AVAILABILITY
     The latest version can be obtained  by  anonymous  FTP  from
     svr-ftp.eng.cam.ac.uk,   in  directory  comp.speech/sources.
     The UNIX version is called shorten-?.??.tar.Z  and  the  DOS
     version is called short???.zip (where ? represents a digit).


AUTHOR
     Copyright (C) 1992-1994 by Tony Robinson (ajr4@cam.ac.uk)

     Shorten is available for  non-commercial  use  without  fee.
     See  the  LICENSE file for the formal copying and usage res-
     trictions.


