CrLZH/LZHREL.DOC

From Just Solve the File Format Problem
Jump to: navigation, search

A document LZHREL.DOC including some information on the CrLZH format (buried in information about interfacing with a library file -- scroll down), extracted as LZHREL.DYC from CRLZH20.LBR (itself compressed with CrLZH):

                     UNLZH.REL and CRLZH.REL Documentation
              for Version 2.0 (with RUNTIME buffer allocation)
                                  July  1991

                                   Abstract
                                  ----------

  UNLZH.REL and CRLZH.REL contain assembly language routines for the decoding 
  and generation of LZH-encoded files, respectively.  The routines need only 
  be supplied with a pointer to a large scracth area and linkages to character 
  input and character output routines to be used.  There are a few easily met 
  functional requirements for the calling routine and I/O routine.

  No Z80 opcodes are used, so these routines may be used on 8080/8085/V20/Z80 
  based machines.

  The coding contained within UNLZH.REL and CRLZH.REL are Copyright (c) 1989, 
  1991 by Roger Warren and may not be used or reproduced on a for-profit 
  basis.  Non-profit use is freely permitted.  The author will not be 
  responsible for any damage, loss, etc.  caused by the use of these programs.

                       Version History and Compatibility
                       ---------------------------------

  Version 1.1 was released in Sept. '89 and was the first public offering of 
  LZH encoding for CP/M.

  Version 2.0 (July '91) introduces several improvements/changes:
      More efficient encoding
      Greater speed
      More compact object code

  There are NO interface changes from the 1.x version.

  Of greatest importance is the encoding improvement.  This change, while 
  generating even smaller output files, means that files compressed with 
  version 2.x programs cannot be decoded by old 1.x programs.

  However, the version 2.x UNLZH module DYNAMICALLY ADJUSTS for 1.x encoded 
  input files.  Thus, version 2.x of UNLZH can be used on all LZH-encoded 
  files regardless of which algorithm version was used to encode the files.

  By extensive rewriting for size and speed, a 20% improvement in performance 
  was achieved.  Since the LZH algorithm is intrinsically slow, this will be 
  of great interest to many.  The recoding project allowed the incorporation 
  of version 2.x extensions to the original algorithm while not appreciably 
  affecting the code module size.

                               Care and Feeding
                              ------------------

  The infomation that follows documents both CRLZH.REL and UNLZH.REL.  One or 
  both may be in the library this file is in, depending upon the nature of 
  the program(s) it's bundled with...so ignore the superfluous information 
  (if any).

  UNLZH.REL performs LZH decoding.  It's progamming interface is similar to 
  the UNCR.REL it was made to mimic.  The programmer must provide the program 
  with 8k of buffer space.  If RUNTIME buffer allocation is selected (it IS 
  selected in the version supplied with LT, FCRLZH, CRLZH, and UCRLZH), a 
  pointer to the buffer must be supplied in the H/L register pair when the 
  routine is invoked.  If RUNTIME buffer allocation is not selected, the user 
  must supply a PUBLIC symbol, UTABLE, which is the base of the provided 
  buffer area.

  Once invoked, the routine allocates its own stack and 'stays in control' 
  until the de-compression is completed (or an error is encountered).  The 
  programmer must supply two routines GLZHUN and PLZHUN, via which UNLZH 
  'GETS' bytes from the input stream and 'PUTS' bytes to the output stream, 
  respectively.


  UNLZH *DOES NOT* compute/process checksums, etc. on the  input file.  Any 
  support of such features must be handled externally.

  GLZHUN and PLZHUN should save all registers except the A register and 
  flags. GLZHUN must return the next character from the input stream in the A 
  register.  GLZHUN should return with the CARRY flag RESET for a valid 
  character, or with the CARRY flag SET when the end of the input stream is 
  encountered (the content of the A reg should be zero in that case).

  Upon exit (return to the caller), UNLZH returns the following information:

             Carry reset (or A reg = 0) - Success
             Carry set,  A reg = 1      - Newer version required
             Carry set,  A reg = 2      - File not LZH endocoded
             Carry set,  A reg = 3      - Bad or corrupt file
             Carry set,  A reg = 4      - Insufficient memory

  UNLZH has 2 entry points, to be used as the programmer needs:

    UNLZH is the 'normal' entry point which expects the file to be completely 
    REWOUND.  At this entry point, the entire file is processed - the 
    standard header is examined, but not reported or acted upon.  By 
    examining the return code, the programmer can discern if the file was, 
    indeed, an LZH-encoded file and act accordingly.

    UNL is a secondary entry point which can be used when the programmer 
    needs to process the standard header information (file name and stamp) 
    and cannot (or doesn't want to) rewind the file.  When this entry point 
    is invoked, the header (down to and including the stamps/comment 
    terminating zero) must have been processed (so the next byte in the input 
    stream will be the revision level).
     
  The revision level of UNLZH.REL performs is at the byte at UNLZH-1.  A hex 
  value of 11 indicates version 1.1, etc.

  CRLZH.REL performs LZH encoding.  It's progamming interface is similar to 
  the CRUNCH.REL it was made to mimic.	The programmer must provide the 
  program with 20k of buffer space.  If RUNTIME buffer allocation is selected 
  (it IS selected in the version supplied with LT, FCRLZH, CRLZH, and 
  UCRLZH),a pointer to the buffer must be supplied in the H/L register pair 
  when the routine is invoked.  If RUNTIME buffer allocation is not selected, 
  the user must supply a PUBLIC symbol, CTABLE, which is the base of the 
  provided buffer area.

  In addition, at invocation time the A register must contain a value for 
  CRLZH to install in the 'CHECKSUM FLAG' portion of the file header (see 
  below).  This byte, to be semi-compatible with C.B. Falconer's version of 
  CRN for the 8080, is a subset of CRN's strategy byte:

                     value (hex)   meaning
                     ----------------------------------------------------
                          00       Standard modulo 65536 checksum is used
                          10       CRC16 is used
                          20,30    Unassigned

  SUPPORT FOR CHECK INFORMATION MUST BE EXTERNALLY PROVIDED IN THE 
  USER-SUPPLIED I/O ROUTINES (see below).  THIS IS ALSO TRUE OF CRN...BUT WAS 
  NOT EMPHASIZED!  CRLZH merely provides the support for posting the value in 
  the output stream since it happens to 'follow' some of the information 
  posted by CRLZH (see the header description, below).

  CRLZH supports no other features of the CRN's strategy byte, all other bits 
  are ignored.

  Once invoked, the routine allocates its own stack and 'stays in control' 
  until the de-compression is completed (or an error is encountered).  The 
  programmer must supply two routines GLZHEN and PLZHEN, via which CRLZH 
  'GETS' bytes from the input stream and 'PUTS' bytes to the output stream, 
  respectively.

  CRLZH *DOES NOT* compute/process checksums, etc. on the  input file.  Any 
  support of such features must be handled externally.  Specifically, the 
  GLZHEN routine must provide for the accumulation of check information and 
  the caller must write that check information to the output stream when 
  CRLZH returns to the caller.

  GLZHEN and PLZHEN should save all registers except the A register and 
  flags. GLZHEN must return the next character from the input stream in the A 
  register.  GLZHEN should return with the CARRY flag RESET for a valid 
  character, or with the CARRY flag SET when the end of the input stream is 
  encountered (the content of the A reg should be zero in that case).  As a 
  service to the user's output processor, every 256th call to PLZHEN is made 
  with the Z flag set (for monitoring).  All other times the Z flag is reset.

  Upon exit (return to the caller), CRLZH returns the following information:

             Carry reset (or A reg = 0) - Success
             Carry set,  A reg = 1      - File already LZH-Encoded,CRUNCHed or
                                          SQueezed
             Carry set,  A reg = 2      - File empty
             Carry set,  A reg = 3      - Insufficient memory

  CRLZH has a single entry point at the label CRLZH.  The user must have 
  placed the standard header information in the output stream and must have 
  the input stream REWOUND prior to invoking CRLZH.

  The revision level of CRLZH.REL performs is at the byte at CRLZH-1.  A hex 
  value of 11 indicates version 1.1, etc.

  Since CRLZH and UNLZH allocate their own stacks, the user is reminded not 
  to make too large a use of that stack in the user-supplied I/O routines.  
  In addition, if the user-supplied I/O routines decide to abort the CRLZH or 
  UNLZH operation (due to operator keystrokes, for example), the user must 
  take steps to restore his own stack.  Upon a normal (or error) return from 
  CRLZH or UNLZH the user's stack is properly restored.

                          STANDARD HEADER information
                         -----------------------------

  LZH encoding follows Steve Greenberg's CRUNCH file format.  The header 
  contains information identifying compression format, original file name, 
  etc:
    field  size     value    Purpose
  -----------------------------------------------------------------------  
      1   1 byte    076h     Signifies compressed form
      2   1 byte    0FDh     Signifies LZH encoding (0ff is for squeezed and
                             0feh is for CRUNCHED)
      3   variable  User     Original file name in the form name.ext Trailing
                  supplied   blanks on the name portion should be suppressed,
                             but a full 3 characters following the '.' should
                             be used for the extension (i.e. no blank
                             suppression).
      4   variable  User     OPTIONAL.  Used for file comment/stamp.  If used
                             the convention is that the comment is placed in
                             square brackets [Like this].  Other information
                             may be placed here (e.g., date stamp).  The
                             logical restriction is that a binary zero must
                             not be part of the comment and/or other informa-
                             tion.
      5   1 byte     00h     Signifies end of STANDARD HEADER
  
  For use of CRLZH, the user must supply all of the information above.

  For UNLZH, use of the UNLZH entry point causes UNLZH to expect to process 
  the above information.  It will discard the file name and optional 
  comment/stamp, but will examine the general form (first 2 fields for a 
  match and general form of the rest of the header).  If the user chooses to 
  use the UNL entry point, UNL will expect to process the first byte 
  following the end of the standard header.

  What follows is the following:

    field  size     value    Purpose
  -----------------------------------------------------------------------  
      6   1 byte  variable   Identifies generating program revision level.
                             (11H signifies program generated by version 1.1
      7   1 byte  variable   Significant revision level.  Indicates major
                             revision level of algorithm for decoding program
                             compatability. (10h indicates significant
                             revision 1.0)
      8   1 byte  variable   Check type.  0=checksum, 1=CRC16, others
                             currently undefined.
      9   1 byte     05h     Currently a SPARE, set to 05H by convention.

  Following this is the compressed file, itself.

                What LZH compression does and how it compares
               -----------------------------------------------

  FIRST - It's SLOW.  Much slower than CRUNCH.  About even with the old 
  SQueeze.  It's the nature of the algorithm, but the current implementation 
  contributes somewhat (more on that later).

  The most impressive aspect of the algorithm is that it compresses further 
  than CRUNCH.  The nature of material being compressed is important - prose 
  and high level language code will compress further.  Since the algorithm 
  depends, in part, on patterns within the file being compressed, I was 
  somewhat surprised to discover that it does a better job (in general) on 
  .COM files than CRUNCH.  Personally, I was surprised to discover that LZH 
  compression of CRUNCHed files is possible (but I've disabled that ability 
  in this release)!

  Examples:
   CRUNCH of SLR180.COM   106% ratio   (actually made a larger file)
   CRLZH  of SLR180.COM    84% ratio
   CRUNCH of TYPELZ22.Z80  45% ratio
   CRLZH  of TYPELZ22.Z80  40% ratio
   CRUNCH of 'C' source    45% ratio   (typical 'C' src selected at random)
   CRLZH  of 'C' source    33% ratio   (same file as above)

                               A small history
                              -----------------

  I am NOT the originator of the LZH encoding.  The program that started my 
  whole involvement in the introduction of this method of compression to the 
  8-bit world bears the following opening comments:
     /*
     * LZHUF.C English version 1.0
     * Based on Japanese version 29-NOV-1988
     * LZSS coded by Haruhiko OKUMURA
     * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
     * Edited and translated to English by Kenji RIKITAKE
     */

  This 'C' program implemented the compression algorithm of the LHARC program 
  which arrived on the US scene in the spring of '89.

  Being of a curious nature, I figured I'd play with the algorithm just to 
  understand it (the internal comments were, indeed, sparse - leaving MUCH to 
  the reader's contemplation/reverse engineering) while 'better minds' than I 
  tackled it in earnest.

  Months passed.  I found that I was 'mastering' the algorithm (read that as 
  demonstrating to myself that I understood it) by converting it piece-wise 
  to assembly language.  After a while, I was left with a 'C' language main 
  program, run time library, and I/O with the business end of the compression 
  and decompression implemented entirely in assembly language.  Since the 
  expected event of one of the 'heavies' in the PD and/or compression world 
  releasing a CP/M version of the compression algorithm hadn't come to pass, 
  I set about making a version myself.

  The natural choice was to prepare an analog to the CRUNCH.REL and UNCR.REL 
  of Mr. Steven Greenberg and Mr. C.B. Falconer and append to/substitute in 
  the existing, widely known programs for handling SQueezed and/or CRUNCHed 
  files.

  I saw no reason to tamper with the format CRUNCH uses on the output file.  
  Therefore, with the exception of taking the 'next' file type in sequence 
  (SQueezed files begin with a 76h,FFh sequence; Crunched files with 76h,FEh; 
  so LZH encoded files begin with 076h,FDh) and setting the revision levels 
  in the header to appropriate values , there's no difference in the output 
  file format.  So, you can probably coax your time/date stamping into 
  operating on LZH encoded files.

R. Warren
Sysop, The Elephant's Graveyard (Z-Node#9)
619-270-3148 (PCP area CASDI)
Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox