Assignment Requirements
Due Date
Sunday, March 18th @ 11:59 PM
Submission Method
This assignment uses Codegrade for submission and grading. Every git push triggers basic functionality tests.
Code Structure Requirements
Main Function Restriction
This restriction exists because the assignment uses the Criterion testing library, which requires this specific structure.
File Organization
You may:
- Add as many .c files in the src directory as needed
- Declare as many headers as needed
Some header and .c files contain comments stating they should not be modified. DO NOT modify these files; they will be replaced with original versions during grading.
Library and Function Restrictions
Allowed Libraries
You MAY use glibc (the GNU standard C library), including:
- Dynamic memory allocation: malloc, calloc, free
- Standard I/O library: fread, fwrite, fopen, fclose, etc.
- String functions: strcmp, strlen, etc.
- Other standard C functions
Prohibited Libraries
Academic Integrity Requirements
Original Work
For all assignments in this course:
- You must write your own code
- You may not submit source code you did not write yourself
- Exception: Code distributed as part of the base repository
- Exception: Code for which explicit written permission has been given
Implementation Requirements
Part 1: Command-Line Interface
Implement a command-line utility that:
- Opens a file specified by arguments
- Processes operations based on command-line flags
- Outputs results to a file or standard output
- Exits with EXIT_SUCCESS (0) on success
- Exits with EXIT_FAILURE (1) on error
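The open, process, and report-status flow above can be sketched as follows. This is a minimal illustration, not the required design: process_file is a hypothetical helper name, and the "processing" is a plain byte copy standing in for the real operations. It defines no main function, so the caller is expected to pass the returned status to exit or return it from its own entry point.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the assignment's API): copies the file at
 * in_path to the already-open stream `out` and reports the outcome using the
 * EXIT_SUCCESS / EXIT_FAILURE convention. */
int process_file(const char *in_path, FILE *out)
{
    if (in_path == NULL || out == NULL)
        return EXIT_FAILURE;

    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return EXIT_FAILURE;            /* input file could not be opened */

    unsigned char buf[4096];
    size_t n;
    int status = EXIT_SUCCESS;

    /* Read in chunks and stop at the first short write. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = EXIT_FAILURE;
            break;
        }
    }
    if (ferror(in))
        status = EXIT_FAILURE;          /* read error, not end of file */

    fclose(in);
    return status;
}
```

Passing stdout as `out` covers the standard-output case; passing a stream obtained from fopen covers the output-file case.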
Part 2: LZ77 Compression
Implement in lz.c:
- lz_compress_tokens() - Compress data into LZ77 tokens
- LZ77 decompression logic (within huffman_decode)
- Helper function find_match() for finding repeated sequences
- Find longest matching substring within 32,768 previous bytes
- Minimum match length: 3 bytes
- Maximum match length: 258 bytes
- Maximum distance: 32,768 bytes
- Return NULL on invalid data
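The match limits above can be illustrated with a brute-force search. This is only a sketch under stated assumptions: the name find_match_sketch, its parameters, and its return convention are hypothetical, and the real find_match() signature is defined by the provided headers. A practical implementation would likely use hashing rather than scanning every candidate.

```c
#include <assert.h>
#include <stddef.h>

#define LZ_WINDOW_SIZE 32768u   /* maximum distance */
#define LZ_MIN_MATCH   3u       /* minimum match length */
#define LZ_MAX_MATCH   258u     /* maximum match length */

/* Hypothetical signature. Looks for the longest match for data[pos..] that
 * starts within the previous LZ_WINDOW_SIZE bytes. On success stores the
 * distance in *out_dist and returns the match length; returns 0 when no
 * match of at least LZ_MIN_MATCH bytes exists. */
size_t find_match_sketch(const unsigned char *data, size_t len,
                         size_t pos, size_t *out_dist)
{
    if (data == NULL || out_dist == NULL || pos >= len)
        return 0;

    size_t window_start = pos > LZ_WINDOW_SIZE ? pos - LZ_WINDOW_SIZE : 0;
    size_t max_len = len - pos;
    if (max_len > LZ_MAX_MATCH)
        max_len = LZ_MAX_MATCH;

    size_t best_len = 0, best_dist = 0;

    /* Scan candidates from nearest to farthest, so equal-length matches
     * keep the smaller distance. */
    for (size_t cand = pos; cand-- > window_start; ) {
        size_t l = 0;
        /* The match may run past pos (overlap), which LZ77 permits. */
        while (l < max_len && data[cand + l] == data[pos + l])
            l++;
        if (l > best_len) {
            best_len = l;
            best_dist = pos - cand;
            if (l == max_len)
                break;              /* cannot do better */
        }
    }

    if (best_len < LZ_MIN_MATCH)
        return 0;
    *out_dist = best_dist;
    return best_len;
}
```

For the input "abcabcabc" at position 3, the only candidate that matches is position 0, giving a length of 6 and a distance of 3.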
Part 3: Huffman Coding
Implement in huff.c:
- huffman_encode_tokens() - Encode LZ77 tokens with Huffman coding
- huffman_decode() - Decode Huffman-encoded data
- Support for both static and dynamic Huffman codes
- Helper functions for tree construction and code generation
- Support fixed Huffman codes (predefined)
- Support dynamic Huffman codes (tree in data)
- Perform LZ77 decoding within huffman_decode()
- Return NULL on invalid data
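For the code-generation helpers, one common approach is the canonical code assignment described in the DEFLATE specification (RFC 1951, section 3.2.2), which derives every code from the list of code lengths alone. The sketch below assumes that approach; the function name and signature are hypothetical and not taken from the provided headers, and it does not check for over-subscribed or incomplete length sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CODE_BITS 15   /* longest code length DEFLATE allows */

/* Hypothetical helper: fills codes[i] with the canonical Huffman code for
 * symbol i, given lengths[i] (0 means the symbol is unused).
 * Returns 0 on success, -1 on invalid arguments. */
int assign_canonical_codes(const uint8_t *lengths, size_t n, uint16_t *codes)
{
    if (lengths == NULL || codes == NULL)
        return -1;

    unsigned bl_count[MAX_CODE_BITS + 1] = {0};
    unsigned next_code[MAX_CODE_BITS + 1] = {0};

    /* Step 1: count how many codes exist for each length. */
    for (size_t i = 0; i < n; i++) {
        if (lengths[i] > MAX_CODE_BITS)
            return -1;
        bl_count[lengths[i]]++;
    }
    bl_count[0] = 0;   /* unused symbols take no code space */

    /* Step 2: compute the smallest code value for each length. */
    unsigned code = 0;
    for (int bits = 1; bits <= MAX_CODE_BITS; bits++) {
        code = (code + bl_count[bits - 1]) << 1;
        next_code[bits] = code;
    }

    /* Step 3: hand out consecutive values within each length. */
    for (size_t i = 0; i < n; i++)
        codes[i] = lengths[i] ? (uint16_t)next_code[lengths[i]]++ : 0;

    return 0;
}
```

With the lengths (3, 3, 3, 3, 3, 2, 4, 4) used as the example in RFC 1951, this yields the codes 010, 011, 100, 101, 110, 00, 1110, 1111.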
Part 4: GZIP File Format
Implement in zlib.c:
- parse_member() - Parse GZIP member headers
- deflate() - Compress data using DEFLATE algorithm
- inflate() - Decompress data using INFLATE algorithm
- skip_gz_header_to_compressed_data() - Navigate to compressed data
- Parse all GZIP header fields correctly
- Handle optional fields (extra, name, comment, HCRC)
- Support multiple block types (uncompressed, fixed, dynamic)
- Handle multi-byte integers in little-endian format
- Compute and verify CRC checksums
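Two of the items above, little-endian integers and CRC checksums, can be sketched in a few lines. The helper names are hypothetical, and the CRC routine is a simple bit-at-a-time version of the CRC-32 that the GZIP format (RFC 1952) specifies; a table-driven version would be faster but longer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: assemble multi-byte integers stored least
 * significant byte first, independent of the host's byte order. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and final
 * XOR of 0xFFFFFFFF), as used in the GZIP member trailer. */
uint32_t crc32_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Verifying a member then amounts to comparing crc32_bitwise over the decompressed bytes against the value obtained by read_le32 from the trailer. The standard check value for the ASCII string "123456789" is 0xCBF43926.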
Output Format Requirements
Member Summary Format
The PRINT_MEMBER_LINE macro in global.h ensures correct formatting.
Error Message Format
Use the provided macros in global.h:
- PRINT_ERROR_BAD_HEADER()
- PRINT_ERROR_OPEN_FILE(filename)
- PRINT_ERROR_MISSING_I_FLAG()
- PRINT_ERROR_REQUIRE_ONE_OF_MCD()
- PRINT_ERROR_MISSING_O_FLAG()
Error messages go to stderr.
Debug Output
Use the debug macro from debug.h for debugging messages. Debug output is only shown when compiled with make debug.
In production mode (compiled with make), no debug output should appear.
Block Size Requirements
Compression
For consistency during testing:
- Compressed blocks should hold at most 65,535 bytes
- This is the same restriction as uncompressed blocks
- The DEFLATE spec allows the compressor to determine block boundaries
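The block limit above can be sketched as a simple chunking rule. This assumes the 65,535-byte limit applies to the input bytes each block represents, and that an empty input still produces one (final) block; both are assumptions, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCK_BYTES 65535u   /* per-block limit from the requirements */

/* Hypothetical helper: number of input bytes to place in the next block. */
size_t next_block_len(size_t remaining)
{
    return remaining < MAX_BLOCK_BYTES ? remaining : MAX_BLOCK_BYTES;
}

/* Hypothetical helper: how many blocks an input of input_len bytes needs.
 * The do/while ensures an empty input still yields one block. */
size_t count_blocks(size_t input_len)
{
    size_t blocks = 0;
    size_t remaining = input_len;
    do {
        remaining -= next_block_len(remaining);
        blocks++;
    } while (remaining > 0);
    return blocks;
}
```

Under these assumptions a 65,536-byte input splits into a 65,535-byte block followed by a 1-byte block.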
Decompression
Error Handling Requirements
Your functions must return NULL or error codes when encountering:
- NULL pointers passed as arguments
- Invalid compression data (illegal distance codes, etc.)
- Malformed GZIP headers
- Invalid block types (BTYPE = 11 in binary, the reserved value)
- Distance codes 30 or 31 in LZ77 data
- File I/O errors
- Memory allocation failures
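A few of the checks in this list can be written as small validation helpers. The names and the 0 / -1 return convention below are hypothetical; the actual functions may need to return NULL or other error codes instead, depending on their signatures.

```c
#include <assert.h>
#include <stddef.h>

/* DEFLATE block types as encoded in the 2-bit BTYPE field. */
enum {
    BTYPE_STORED   = 0,   /* 00: uncompressed */
    BTYPE_FIXED    = 1,   /* 01: fixed Huffman codes */
    BTYPE_DYNAMIC  = 2,   /* 10: dynamic Huffman codes */
    BTYPE_RESERVED = 3    /* 11: reserved, must be rejected */
};

#define MAX_VALID_DISTANCE_CODE 29u   /* codes 30 and 31 are never valid */

/* Returns 0 if btype names a usable block type, -1 otherwise. */
int validate_block_type(unsigned btype)
{
    return btype < BTYPE_RESERVED ? 0 : -1;
}

/* Returns 0 if code is a usable distance code, -1 otherwise. */
int validate_distance_code(unsigned code)
{
    return code <= MAX_VALID_DISTANCE_CODE ? 0 : -1;
}

/* Returns 0 if both pointer arguments are non-NULL, -1 otherwise. */
int validate_buffers(const unsigned char *in, const size_t *out_len)
{
    return (in != NULL && out_len != NULL) ? 0 : -1;
}
```

Running checks like these before any decoding work begins keeps the error path short and avoids having to free partially built output.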
Memory Management Requirements
- Use malloc or calloc to allocate memory
- Always check allocation return values
- Free all allocated memory when no longer needed
- Caller is responsible for freeing returned buffers
- Avoid memory leaks
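The ownership rules above can be sketched with a stand-in function. copy_buffer is hypothetical and only duplicates its input; it mimics the shape of a function that returns a heap buffer the caller must release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function that returns a malloc'd buffer.
 * Returns NULL on invalid arguments or allocation failure; on success the
 * caller owns the returned buffer and must free it. */
unsigned char *copy_buffer(const unsigned char *src, size_t len, size_t *out_len)
{
    if (src == NULL || out_len == NULL)
        return NULL;

    /* Request at least one byte so a zero-length input still yields a
     * distinct pointer that can be passed to free. */
    unsigned char *dst = malloc(len > 0 ? len : 1);
    if (dst == NULL)
        return NULL;            /* allocation failure reported to caller */

    memcpy(dst, src, len);
    *out_len = len;
    return dst;
}
```

A caller would check the result against NULL, use the buffer, and then call free on it exactly once.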
Functions like deflate(), inflate(), and huffman_decode() return malloc’d buffers. The caller must free these buffers.
Data Structure Requirements
Use the provided data structures:
- lz_token_t - LZ77 compression tokens
- gz_header_t - GZIP member headers
- huff_node_t - Huffman tree nodes
See include/ for complete definitions.
Testing Requirements
Test your implementation with:
- Valid GZIP files
- Files with different compression methods
- Files with optional header fields
- Edge cases (empty files, single bytes, maximum distances, etc.)
- Invalid data (corrupt headers, invalid block types, etc.)
Deliverables
Submit via git push to your Codegrade repository:
- All source files in src/ directory
- Any additional header files you created
- Working Makefile (do not modify unless necessary)
- Code that compiles without errors or warnings
- Code that passes basic Codegrade tests
Do not submit:
- Compiled binaries
- Object files (.o)
- Test data files (unless explicitly instructed)
- Your .git directory is automatically excluded