Board Logo
« LZ77 Compression »

Welcome Guest. Please Login or Register.
Dec 20th, 2014, 11:14am


Conforums Terms of Service | Membership Rules | Home | Search | Recent Posts | Notification | Format Your Message | Installation FAQ

Please use the forums Search feature before asking.
Please post code using the code box described in Format Your Messages.
This will keep indentation, separate it better form the message and prevent gibberish.
If the code is too long for one post or additional files are needed, upload a ZIP archive to the Just BASIC Files Archive Site.

« Previous Topic | Next Topic »
Pages: 1  Notify Send Topic Print
 thread  Author  Topic: LZ77 Compression  (Read 1001 times)
AltBas
Full Member
ImageImageImageImage


member is offline

Avatar

JB 1.01 8086 4.77 MHz 640K Win 1.0 quad 160K drives


PM


Posts: 399
xx LZ77 Compression
« Thread started on: May 17th, 2011, 09:09am »

This is my take on LZ77 compression. LZ77 coding looks in a buffer (sliding window) of previously read characters for duplicates of the current search characters, replacing the duplicate characters with an offset to the characters and the number of characters to take. Technically, the offset should be from the end of the buffer, but I used the beginning of the buffer to cut out a step of converting to/from LEN(Buffer$) - Offset. This is a simple encoding scheme that is "tuneable". I chose 12 bits of offset (a sliding window of 4096 bytes) and 3 bits length (8 characters) biased by 3 (because we don't replace less than 3 characters) because this seems to give the best general compression. Other values that fit in the bit coding size are 11 bits Offset and 4 bits Length (2048 byte buffer / 16+3 length bytes copy) and 10 bit Offset and 5 bits Length (1024 byte buffer / 32+3 byte Length). The more redundant the data source, the smaller the buffer you can use, and larger Copy Length. The first part is an "8 Bit Lite" version that remaps ASCII char > 128 into unused characters. The second post of this thread is a full 8 bit compression, but it doesn't compress as much as the "Lite" version. It also shows how to encode a multiple choice run Length. Aren't bits wonderful?

Code Part I - 7 Bit LZ77 Compression
Code:
'*** LZ_77.Jb - An LZ77 Implementation ***
  '* Code is in the public domain *
  '* Marker of File is chr(22) - "L" xor "Z" = 22 *
  '* Followed by File Type byte *
  '* More info at http://en.wikipedia.org/wiki/LZ77_and_LZ78 *

GLOBAL k4K
dim Pwr(31), PwrM1(31), Cnt(255), FList$(1,5), Runs(4)

  k4K = 4096  '* Sliding window constant *

  for a = 0 to 31  '* Powers of 2 and bitmasks *
    Pwr(a) = 2^a : PwrM1(a) = Pwr(a) - 1
  next a

  k4K = 4096  '* Sliding window constant *

  EnCode = 1
  Src$ = "contest.txt"

  open Src$ for input as #1
  Buf$ = input$(#1, LOF(#1))
  CLOSE #1

  if EnCode = 1 then
    call CompressLZ Src$, Dst$
  else
    call ExpandLZ Src$, Dst$
  end if
END

'======================================================================
'*** Determine if a File Exists ***
'======================================================================
FUNCTION FileExist(Mask$)
  FILES "", Mask$, FList$()
  FileExist = val(FList$(0,0))
END FUNCTION

'======================================================================
'*** Determine which Procedure to Compress File with ***
'======================================================================
SUB CompressLZ Src$, Dst$

  if FileExist(Src$) = 0 then
    print "File ";Src$; " not found." : exit sub
  end if

  open Src$ for input as #1  '*** File ID Section ***
  Buf$ = input$(#1,LOF(#1)) : LB = len(Buf$)
  close #1

  for a = 1 to LB  '* Determine character distribution ***
    Tmp = asc(mid$(Buf$,a,1))
    Cnt(Tmp) = Cnt(Tmp) + 1
  next a
  for a = 0 to 127
    if Cnt(a) > 0 then LoCnt = LoCnt + 1
  next a
  for a = 128 to 255
    if Cnt(a) > 0 then HiCnt = HiCnt + 1
  next a

  if HiCnt = 0 then
    FID = 127  '* 7 bits, w/ 0 bit marker *
  else
  '* Can remap many char if HiCnt + LoCnt < 128 *
    if HiCnt + LoCnt < 128 then
      FID = 128  '* 7 bit re-map *
      Tmp = 1
      for a = 128 to 255  '* Re-map HiChars around LoChar *
        if Cnt(a) > 0 then
          DO while Cnt(Tmp) > 0
            Cnt(Tmp) = Tmp  '* "Re-map" LoChar to self Look-Up *
            Tmp = Tmp + 1  '* Find unused "Char" *
          LOOP
          Cnt(Tmp) = a : Cnt(a) = Tmp : Tmp = Tmp + 1
        end if
      next a
      ReMapCnt = Tmp - 1
    else  '*** 8 Bit File ***
      FID = 255
    end if
  end if

  select case FID
    case 127 : Size = EncodeLZ7( Dst$, FID, Buf$, 0)
    case 128 : Size = EncodeLZ7( Dst$, FID, Buf$, ReMapCnt)
    case 255 : Size = EncodeLZ8( Dst$, FID, Buf$)
  end select

END SUB

'======================================================================
'*** Determine which Procedure to Expand File with ***
'======================================================================
SUB ExpandLZ Src$, Dst$

  if FileExist(Src$) = 0 then
    print "File ";Src$; " not found." : exit sub
  end if
  redim Cnt(255)

  open Src$ for input as #1
  Buf$ = input$(#1,LOF(#1)) : LB = len(Buf$)
  close #1

  if asc(left$(Buf$,1)) = 22 then
    select case asc(mid$(Buf$, 2, 1))
      case 127 : Buf$ = mid$(Buf$,3)  '* Only LoAsc char *
        Expand = DecodeLZ7( Dst$, Buf$, 0)

      case 128 : ReMapCnt = asc(mid$(Buf$,3,1))  '* Re-mapped HiAsc *
        for a = 1 to ReMapCnt
          Cnt(a) = asc(mid$(Buf$,a+3,1))
        next a
        Buf$ = mid$(Buf$,a+3)
        Expand = DecodeLZ7( Dst$, Buf$, ReMapCnt)

      case 255 : Buf$ = mid$(Buf$,3)  '* Binary file *
        Expand = DecodeLZ8( OutFile$, Src$)
    end select
  end if

END SUB

'======================================================================
'*** Encode 7 Bit LZ, possibly with Character Re-Map ***
'======================================================================
FUNCTION EncodeLZ7( OutFile$, FID, BYREF Buf$, ReMapCnt)

  LB = len(Buf$)
  open OutFile$ for output as #1
  Hdr$ = chr$(22); chr$(FID)
  if ReMapCnt > 0 then
    Hdr$ = Hdr$; chr$(ReMapCnt)
    for a = 1 to ReMapCnt : Hdr$ = Hdr$; chr$(Cnt(a)) : next a
  end if
  print #1, Hdr$;  '* Suppress CR/LF *

  ptr = 4
  Out$ = mid$(Buf$, 1, 3)  '* Initial un-compressible string *
  Wndw$ = Out$
  DO
    C$ = mid$(Buf$, ptr, 3)  '* Get 3 chars to scan for *
    if len(C$) = 3 then
      optr = ptr
      ptr = ptr + 2
      pWndw = 1
      for a = 0 to 7  '* Look for string matching C$ in Wndw$ *
        p = instr(Wndw$, C$, pWndw)
        if p = 0 then
          if a > 0 then  '* Prev match, but not this char *
            LZ = 32768 + (pWndw * 8) + (len(C$) - 4)
            '* UnFound C$ has extra char, don't count it *
            ptr = ptr - 1  '* Back up ptr from miss *
          end if
          exit for
        end if
        pWndw = p
        if ptr < len(Buf$) AND len(C$) < 10 then
          '* Only replacing 3 bytes or more, so bias length info *
          '* 000 is 3 char; 111 is 10 char *
          ptr = ptr + 1
          C$ = C$ + mid$(Buf$, ptr, 1)
        else
          LZ = 32768 + (pWndw * 8) + (len(C$) - 3)  '* Maxed out *
          exit for
        end if
      next a
      if LZ > 0 then
        Tmp = (LZ AND 7) + 3
        Out$ = Out$ + chr$(int(LZ/256)) + chr$(LZ AND 255)
        Wndw$ = Wndw$ + mid$(Buf$, optr, Tmp)
        LZ = 0
      else
        if asc(C$) > 127 then
          Out$ = Out$ + chr$(Cnt(asc(C$)))
        else
          Out$ = Out$ + left$(C$,1)
        end if
        Wndw$ = Wndw$ + left$(C$,1)
        ptr = optr
      end if
      Wndw$ = right$(Wndw$, k4K)  '* Shift window right *
      if len(Out$) > k4K then print #1, Out$; : Out$ = ""
    else
      Out$ = Out$ + C$
      exit do  '* EOF - Exit *
    end if
    ptr = ptr + 1
  LOOP until ptr > LB

  print #1, Out$;  '* Suppress CR/LF *
  EncodeLZ7 = LOF(#1)
  CLOSE #1
END FUNCTION

'======================================================================
'*** Decode 7 Bit LZ, possibly with Character Re-Map ***
'======================================================================
FUNCTION DecodeLZ7( OutFile$, BYREF Src$, ReMapCnt)
  '* FID & ReMap$ read prior to entry, not in Src$ buffer *

  open OutFile$ for output as #1
  Buf$ = left$(Src$, 3)  '* 1st 3 char not compressed *
  LB = len(Src$)
  Wndw$ = Buf$
  ptr = 4
  DO
    C$ = mid$(Src$, ptr, 1)
    if asc(C$) > 127 then  '* If HiBit set, then LZ code *
      ptr = ptr + 1
      LZ = (asc(C$) * 256) + asc(mid$(Src$, ptr, 1))
      Offset = (int(LZ / 8) AND (k4K-1))  '* Loc to copy from *
      Cnt = (LZ AND 7) + 3  '* # Char to copy *
      C$ = mid$(Wndw$, Offset, Cnt)
    else
      if asc(C$) <= ReMapCnt then C$ = chr$(Cnt(asc(C$)))
        '* Look-up real HiAsc char *
    end if
    Buf$ = Buf$ + C$
    Wndw$ = right$(Buf$, k4K)  '* Shift window (if able) *
    if len(Buf$) > 8192 then
      print #1, left$(Buf$, k4K);
      Buf$ = mid$(Buf$, k4K+1)
    end if
    ptr = ptr + 1
  LOOP until ptr > LB

  print #1, Buf$;
  DecodeLZ7 = LOF(#1)
  CLOSE #1

END FUNCTION 

- AltBas
« Last Edit: May 18th, 2011, 1:41pm by AltBas » User IP Logged

AltBas
Full Member
ImageImageImageImage


member is offline

Avatar

JB 1.01 8086 4.77 MHz 640K Win 1.0 quad 160K drives


PM


Posts: 399
xx Re: LZ77 Compression
« Reply #1 on: May 17th, 2011, 09:11am »

Code Part II - 8 Bit LZ77 Compression
Put this together with Part I Code for a simple LZ77 encoder. The multiple choice run Length encoding was inspired by reading about LZMA compression on Wikipedia. There are many variations in the LZ family.
Code:
  '* Code is in the public domain *
'======================================================================
'*** Encode 8 Bit LZ ***
'======================================================================
FUNCTION EncodeLZ8( OutFile$, FID, BYREF Buf$)

  LB = len(Buf$)
  open OutFile$ for output as #1
'*  Hdr$ = chr$(22); chr$(FID)
  print #1, chr$(22); chr$(FID);  '* Suppress CR/LF *

  ptr = 4
  Out$ = mid$(Buf$, 1, 3)  '* Initial un-compressible string *
  Wndw$ = Out$
  DO
    C$ = mid$(Buf$, ptr, 3)  '* Get 3 chars to scan for *
    if len(C$) = 3 then
      optr = ptr
      ptr = ptr + 2
      pWndw = 1

      for a = 0 to 71  '* Look for string matches *
        p = instr(Wndw$, C$, pWndw)
        if p = 0 then
          if a > 0 then  '* Prev match, but not this char *
            C$ = left$(C$, len(C$)-1)  '* UnFound C$ has extra char *
            ptr = ptr - 1  '* Back up ptr *
            gosub [SetCode]
          end if
          exit for
        end if
        pWndw = p
        if ptr < LB AND a < 71 then
          ptr = ptr + 1
          C$ = C$ + mid$(Buf$, ptr, 1)  '* Get another char *
        else
          gosub [SetCode]  '* Max run length or EOB, exit loop *
          exit for
        end if
      next a

      if LZ > 0 then
        if Bits = 18 then
          Tmp = (LZ AND 7) + 3  '* Adjust buffer ptr *
        else
          Tmp = (LZ AND 63) + 11
        end if
        Wndw$ = Wndw$ + mid$(Buf$, optr, Tmp)
      else
        if asc(C$) > 127 then
          Bits = 9
          LZ = asc(C$) + 256  '* Set Bit 8 for HiAsc *
        else
          Bits = 8
          LZ = asc(left$(C$,1))
        end if
        Wndw$ = Wndw$ + left$(C$,1)
        ptr = optr
      end if

      call PushStack LZ, Bits, Stk, TOS
      LZ = 0
      DO while TOS > 7
        Out$ = Out$ + chr$(PopStack(8, Stk, TOS))
      LOOP

      Wndw$ = right$(Wndw$, k4K)  '* Shift window right *
      if len(Out$) > k4K then print #1, Out$; : Out$ = ""
    else
      '* EOF - Exit *
      for a = 1 to len(C$)  '* Len is 1 or 2, so no run *
        call PushStack asc(mid$(C$,a,1)), 8, Stk, TOS
        Out$ = Out$ + chr$(PopStack(8, Stk, TOS))
      next a
      DO while TOS > 7
        Out$ = Out$ + chr$(PopStack(8, Stk, TOS))
      LOOP
      exit do
    end if
    ptr = ptr + 1
  LOOP until ptr > LB-2

  if Stk > 0 or TOS > 0 then  '* Must ALWAYS 0 TOS when encoding *
    call PushStack 0, 8-TOS, Stk, TOS
    Out$ = Out$ + chr$(PopStack(8, Stk, TOS))
  end if

  print #1, Out$;  '* Suppress CR/LF *
  EncodeLZ8 = LOF(#1)
  CLOSE #1
EXIT FUNCTION

[SetCode]
  if a < 9 then  '* C$ is already 3 char when a = 0 *
    Bits = 18
    LZ = 2^17 + (pWndw * 16) + 0 + (len(C$) - 3)
    '* 10 dddddddddddd 0 lll = 18 bits *
  else
    Bits = 21
    LZ = 2^20 + (pWndw * 128) + 64 + (len(C$) - 11)
    '* 10 dddddddddddd 1 llllll = 21 bits *
  end if
  RETURN
END FUNCTION

'======================================================================
'*** Decode 8 Bit LZ ***
'======================================================================
FUNCTION DecodeLZ8( OutFile$, BYREF Src$)

  open OutFile$ for output as #1
  Buf$ = left$(Src$, 3)  '* 1st 3 char not compressed *
  LB = len(Src$)
  Wndw$ = Buf$
  ptr = 4
  DO
    DO while (ptr <= LB) and (TOS < 21)
      call PushStack asc(mid$(Src$, ptr, 1)), 8, Stk, TOS
      ptr = ptr + 1
    LOOP
    gosub [PopBytes]
    Wndw$ = right$(Buf$, k4K)  '* Shift window (if able) *
    if len(Buf$) > 8192 then
      print #1, left$(Buf$, k4K);
      Buf$ = mid$(Buf$, k4K+1)
    end if
  LOOP until ptr > LB
  if Stk > 0 or TOS > 7 then
    if TOS < 8 then call PushStack 0, 8-TOS, Stk, TOS
    gosub [PopBytes]
  end if
  print #1, Buf$;
  DecodeLZ8 = LOF(#1)
  CLOSE #1
EXIT FUNCTION

[PopBytes]
  if PopStack( 1, Stk, TOS) = 0 then
    C$ = chr$(PopStack( 7, Stk, TOS))  '* 7 bit char 0bbb bbbb *
  else
    if PopStack( 1, Stk, TOS) = 0 then  '* LZ code *
      Offset = PopStack( 12, Stk, TOS)
      if PopStack( 1, Stk, TOS) = 0 then
        Cnt = PopStack( 3, Stk, TOS) + 3
      else
        Cnt = PopStack( 6, Stk, TOS) + 11
      end if
      C$ = mid$(Wndw$, Offset, Cnt)
    else  '* 8 bit char *
      C$ = chr$(PopStack( 7, Stk, TOS) + 128)  '* 8 bit 1 1bbb bbbb *
    end if
  end if
  Buf$ = Buf$ + C$
  RETURN
END FUNCTION

'======================================================================
' Put a value on the Bit Stack
'======================================================================
SUB PushStack Char, NumBits, BYREF Stk, BYREF TOS
  '* Char is value to put on stack *
  '* NumBits is the bit size *
  '* Stk = Stack variable *
  '* TOS = Top of stack pointer *
  if Char<Pwr(NumBits) then
    Stk=(Stk * Pwr(NumBits))  '* Push up stack *
    Stk=Stk OR Char  '* Add Char to the stack *
    TOS=TOS+NumBits
  end if
END SUB

'======================================================================
' Get a value from the Bit Stack
'======================================================================
FUNCTION PopStack(NumBits, BYREF Stk, BYREF TOS)
  '* NumBits is the bit size *
  '* Stk = Stack variable *
  '* TOS = Top of stack pointer *
  if NumBits<=TOS then
    TOS=TOS-NumBits
    PopStack=int(Stk / Pwr(TOS))
    Stk=Stk AND PwrM1(TOS)
  else
    PopStack=-1  '* Error ! *
  end if
END FUNCTION
 

- AltBas
« Last Edit: May 17th, 2011, 09:20am by AltBas » User IP Logged

tomc
Senior Member
ImageImageImageImageImage


member is offline

Avatar




PM


Posts: 572
xx Re: LZ77 Compression
« Reply #2 on: May 17th, 2011, 10:50am »

I find every compression system to be fascinating !
User IP Logged

tomc
Senior Member
ImageImageImageImageImage


member is offline

Avatar




PM


Posts: 572
xx Re: LZ77 Compression
« Reply #3 on: May 17th, 2011, 4:06pm »

I have wrote the following LZW routine ===> http://www.dspguide.com/ch27/5.htm and run out of table entries around record 10,000. Another article informed me that LZW 'proper' does not address the table overflow problen. I know there are other articles on how to either flush the table (starting anew) or dropping 'bad' entries in favor of others (which is probably better).
.
Can anyone point me to a better link?
.
For now I am not interested in speed enhancements, only in compression and table modification.

Thanks !
User IP Logged

Pages: 1  Notify Send Topic Print
« Previous Topic | Next Topic »

Conforums Terms of Service | Membership Rules | Home | Search | Recent Posts | Notification | Format Your Message | Installation FAQ

Donate $6.99 for 50,000 Ad-Free Pageviews!

| |

This forum powered for FREE by Conforums ©
Sign up for your own Free Message Board today!
Terms of Service | Privacy Policy | Conforums Support | Parental Controls