Grading Overview
Your assignment will be graded based on:
- Correctness - Does your code produce the correct output?
- Output Format - Does output exactly match specifications?
- Functionality - Do all required features work?
- Error Handling - Does your code handle errors properly?
- Code Quality - Is your code well-structured and free of memory leaks?
Test Categories
Basic Functionality Tests
These tests verify:
- Program compiles without errors
- Command-line argument processing works
- Help message format is correct
- Error messages match expected format
Compression Tests
These tests verify:
- LZ77 compression finds matches correctly
- Huffman encoding produces valid trees
- DEFLATE algorithm produces valid compressed output
- Compressed files are valid GZIP format
- CRC checksums are computed correctly
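As an illustration of the CRC item above, here is a minimal sketch of the CRC-32 that GZIP uses (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF). The function name crc32_update and its signature are assumptions made for this example; the assignment may specify a different interface or a table-driven version.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used in the GZIP trailer.
 * crc32_update is a hypothetical name, not part of the assignment's API.
 * Pass 0 as crc for the first chunk, then the previous return value
 * for each following chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;                              /* undo the final inversion */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when a 1 bit falls out. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;                             /* final inversion */
}
```

A common sanity check is the ASCII string "123456789", whose CRC-32 is 0xCBF43926. Feeding the same data in several chunks should give the same result as one call.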
Decompression Tests
These tests verify:
- GZIP header parsing works correctly
- Uncompressed blocks (BTYPE=00) decompress correctly
- Fixed Huffman blocks (BTYPE=01) decompress correctly
- Dynamic Huffman blocks (BTYPE=10) decompress correctly
- LZ77 distance/length pairs decode correctly
- Multi-block files decompress correctly
- CRC validation works
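Each DEFLATE block begins with a 1-bit BFINAL flag followed by the 2-bit BTYPE field, read starting from the least-significant bit of each byte. The sketch below shows one way to read that header. The bit_reader type and both function names are hypothetical and are not taken from the assignment's headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader; the assignment's real structures may differ. */
typedef struct {
    const uint8_t *data;   /* compressed input */
    size_t len;            /* number of bytes in data */
    size_t bitpos;         /* index of the next unread bit */
} bit_reader;

/* Read n bits (0 <= n <= 16), least-significant bit first.
 * Returns the value, or -1 if the input ends prematurely. */
int read_bits(bit_reader *br, int n)
{
    int value = 0;
    if (br == NULL || br->data == NULL || n < 0 || n > 16)
        return -1;
    for (int i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        if (byte >= br->len)
            return -1;                       /* ran out of input */
        int bit = (br->data[byte] >> (br->bitpos & 7)) & 1;
        value |= bit << i;                   /* first bit read is the LSB */
        br->bitpos++;
    }
    return value;
}

/* Read BFINAL and BTYPE. Returns 0 on success, -1 on truncated input
 * or on the reserved block type BTYPE=11. */
int read_block_header(bit_reader *br, int *bfinal, int *btype)
{
    if (bfinal == NULL || btype == NULL)
        return -1;
    int f = read_bits(br, 1);
    int t = (f < 0) ? -1 : read_bits(br, 2);
    if (f < 0 || t < 0 || t == 3)
        return -1;
    *bfinal = f;
    *btype = t;
    return 0;
}
```

For example, a first byte of 0x03 decodes as BFINAL=1 and BTYPE=01 (fixed Huffman), while 0x05 decodes as BFINAL=1 and BTYPE=10 (dynamic Huffman).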
Edge Case Tests
These tests verify your code handles:
- Empty files
- Single-byte files
- Files with no matches (all literals)
- Files with maximum distance references (32,768)
- Files with maximum length matches (258)
- LZ77 references that overlap the bytes being written (distance < length)
- Optional GZIP header fields (name, comment, extra, HCRC)
- Multiple blocks in a single member
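The distance < length case deserves a closer look: the copy must read bytes that the same copy has just written, so it has to proceed one byte at a time. memcpy has undefined behavior on overlapping ranges, and memmove copies the original bytes rather than the newly written ones, so neither gives the repeating pattern DEFLATE expects. The helper below is a minimal sketch; lz77_copy and its parameters are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Append len bytes to out, copied from dist bytes behind the current
 * position *pos. cap is the total capacity of out.
 * Hypothetical helper, not part of the assignment's API.
 * Returns 0 on success, -1 on invalid arguments. */
int lz77_copy(uint8_t *out, size_t cap, size_t *pos, size_t dist, size_t len)
{
    if (out == NULL || pos == NULL)
        return -1;
    if (dist == 0 || dist > *pos)
        return -1;               /* reference points before start of output */
    if (*pos > cap || len > cap - *pos)
        return -1;               /* copy would overflow the output buffer */

    /* Byte-by-byte so that bytes written earlier in this loop can be
     * re-read when dist < len. */
    for (size_t i = 0; i < len; i++) {
        out[*pos] = out[*pos - dist];
        (*pos)++;
    }
    return 0;
}
```

With "ab" already in the output, a copy with distance 2 and length 5 produces "abababa". With distance 1, a single byte is repeated len times.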
Error Handling Tests
These tests verify your code properly rejects:
- Invalid GZIP magic numbers
- Corrupt headers
- Invalid block types (BTYPE=11)
- Invalid distance codes (30, 31)
- Malformed Huffman trees
- Files that end prematurely
- NULL pointer arguments
- Invalid command-line arguments
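Several of these rejections can be made before any decompression starts. The sketch below checks the fixed 10-byte part of a GZIP member header: the magic bytes 0x1f 0x8b, the compression method (8 means DEFLATE), and the reserved FLG bits, which must be zero. The function name and the 0 / -1 return convention are assumptions for illustration; the assignment may require different return values or error macros.

```c
#include <stddef.h>
#include <stdint.h>

#define GZ_FIXED_HEADER_LEN 10     /* ID1 ID2 CM FLG MTIME(4) XFL OS */
#define GZ_FLG_RESERVED     0xE0   /* FLG bits 5-7 must be zero */

/* Hypothetical validator for the fixed part of a GZIP header.
 * Returns 0 if the header looks valid, -1 otherwise. */
int check_gzip_header(const uint8_t *buf, size_t len)
{
    if (buf == NULL)
        return -1;                           /* NULL pointer argument */
    if (len < GZ_FIXED_HEADER_LEN)
        return -1;                           /* file ends prematurely */
    if (buf[0] != 0x1f || buf[1] != 0x8b)
        return -1;                           /* bad magic numbers */
    if (buf[2] != 8)
        return -1;                           /* only CM=8 (DEFLATE) is defined */
    if (buf[3] & GZ_FLG_RESERVED)
        return -1;                           /* reserved flag bits set */
    return 0;
}
```

Optional fields (extra, name, comment, HCRC) follow the fixed part and need their own bounds checks as they are parsed.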
Output Format Grading
What is Checked
- Exact text of all output messages
- Spacing and punctuation
- Field ordering in member summary
- Number formatting (decimal, no leading zeros)
- Newline placement
- Error messages go to stderr, not stdout
- No extraneous output in production mode
Use Provided Macros
The assignment provides macros in global.h for all output:
Use these macros exactly as provided. Do not create your own printf statements for these messages.
Testing Approach
Automated Testing
Grading uses automated test scripts that:
- Compile your code with the provided Makefile
- Run your program with various inputs
- Compare output to expected results (byte-for-byte)
- Check exit codes (0 for success, 1 for errors)
- Verify compressed files are valid GZIP
- Decompress your output and verify it matches original
Test Data
Your code will be tested with:
- Text files (ASCII)
- Binary files
- Small files (< 100 bytes)
- Medium files (100 - 10,000 bytes)
- Large files (> 65,535 bytes, requiring multiple blocks)
- Highly compressible files (repetitive data)
- Low compressibility files (random data)
- Real-world files (images, documents, etc.)
Comparison Testing
Your output may be compared against:
- Reference implementation results
- Standard gzip utility output
- Other compression utilities
Your compressed output doesn’t need to be identical to gzip, but it must be valid and decompressible.
Memory Testing
Memory Leak Detection
Your code will be tested with Valgrind to detect:
- Memory leaks
- Invalid memory access
- Use of uninitialized memory
- Double frees
- Invalid frees
Best Practices
- Always check malloc/calloc return values
- Free all allocated memory before returning
- Free memory in error paths too
- Don’t access memory after freeing it
- Initialize all variables before use
- Don’t read past end of buffers
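One common way to follow these practices together is a single cleanup path that every failure jumps to. The sketch below allocates a small structure with two optional strings and releases everything if any allocation fails. The header_fields type and the function names are hypothetical and exist only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for optional header strings. */
typedef struct {
    char *name;
    char *comment;
} header_fields;

/* Free a header_fields and everything it owns. Safe to call with NULL. */
void header_fields_free(header_fields *h)
{
    if (h == NULL)
        return;
    free(h->name);       /* free(NULL) is a no-op */
    free(h->comment);
    free(h);
}

/* Copy a string into freshly allocated memory; NULL on failure. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Build a header_fields from optional strings (either may be NULL).
 * Returns NULL, with nothing leaked, if any allocation fails. */
header_fields *header_fields_new(const char *name, const char *comment)
{
    header_fields *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->name = NULL;      /* initialize before any early exit */
    h->comment = NULL;

    if (name != NULL && (h->name = dup_string(name)) == NULL)
        goto fail;
    if (comment != NULL && (h->comment = dup_string(comment)) == NULL)
        goto fail;
    return h;

fail:
    header_fields_free(h);   /* releases whatever was allocated so far */
    return NULL;
}
```

Because the pointers are set to NULL before the first failure point, the cleanup function can be called at any stage without touching uninitialized memory.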
Performance Considerations
While performance is not the primary grading criterion, your code should:
- Complete within reasonable time limits (e.g., < 10 seconds for most files)
- Not use excessive memory
- Not have O(n²) or worse algorithms where O(n) is possible
The assignment mentions “optimize a program” in the introduction, but the focus is on correctness and debugging rather than extreme optimization.
Code Quality
What is Evaluated
- Code organization (functions in correct files)
- Proper use of data structures
- Appropriate function decomposition
- Consistent coding style
- Meaningful variable names
- No compiler warnings
What is NOT Graded
- Comment density (though some comments are helpful)
- Specific coding style (tabs vs spaces, etc.)
- Efficiency beyond reasonable bounds
Partial Credit
The grading tests are designed to award partial credit:
- If compression doesn’t work but decompression does, you’ll get credit for decompression
- If fixed Huffman works but dynamic doesn’t, you’ll get credit for fixed
- If argument parsing works but compression fails, you’ll get credit for argument parsing
Implement features incrementally and test each one. Getting partial credit is better than trying to implement everything and having nothing work.
Common Mistakes to Avoid
Output Format Issues
- Using printf instead of provided macros
- Adding debug output in production builds
- Sending errors to stdout instead of stderr
- Extra newlines or spacing
Algorithm Issues
- Off-by-one errors in bit manipulation
- Incorrect byte/bit ordering (endianness)
- Not handling overlapping LZ77 copies (distance < length)
- Incorrect Huffman tree construction
- Not terminating blocks with code 256
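Two ordering rules cause many of the byte/bit ordering mistakes listed above. Multi-byte integers in GZIP, such as the CRC32 and ISIZE fields in the trailer, are stored least-significant byte first. Inside DEFLATE, ordinary fields are packed starting from the least-significant bit, but Huffman codes are packed starting from the most-significant bit of the code. The helpers below are a minimal sketch with hypothetical names. Reversing a code once when the table is built is one possible approach that lets an LSB-first bit writer emit it directly; it is not necessarily the approach the assignment expects.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value, independent of host endianness. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value in little-endian byte order. */
void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Reverse the low nbits bits of code (nbits <= 16), so that a Huffman
 * code can be emitted through an LSB-first bit writer. */
uint16_t reverse_bits(uint16_t code, int nbits)
{
    uint16_t out = 0;
    for (int i = 0; i < nbits; i++) {
        out = (uint16_t)((out << 1) | (code & 1u));
        code >>= 1;
    }
    return out;
}
```

For instance, the bytes 26 39 F4 CB read as 0xCBF43926, and the 3-bit code 110 becomes 011 after reversal.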
Memory Issues
- Not checking malloc return values
- Memory leaks in error paths
- Buffer overruns
- Using freed memory
- Not freeing optional header fields
Error Handling Issues
- Not returning NULL on errors
- Not checking for invalid inputs
- Continuing after errors instead of aborting
- Not validating distance codes
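Distance-code validation is a small table lookup with a range check in front. DEFLATE defines distance codes 0 through 29; codes 30 and 31 can appear in the bit stream but are invalid. The sketch below uses the base distances and extra-bit counts from the DEFLATE specification. The function name and the -1 error return are assumptions for this example.

```c
#include <stdint.h>

/* Base distance for each distance code 0..29 (DEFLATE specification). */
static const uint16_t dist_base[30] = {
    1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
    257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
    8193, 12289, 16385, 24577
};

/* Number of extra bits that follow each distance code. */
static const uint8_t dist_extra[30] = {
    0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
    7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13
};

/* Combine a distance code with its already-read extra-bits value.
 * Hypothetical helper. Returns the distance (1..32768), or -1 if the
 * code is invalid (including 30 and 31) or extra does not fit in the
 * number of extra bits for that code. */
long decode_distance(int code, unsigned extra)
{
    if (code < 0 || code > 29)
        return -1;
    if ((extra >> dist_extra[code]) != 0)
        return -1;
    return (long)dist_base[code] + (long)extra;
}
```

Code 29 with all 13 extra bits set gives 24577 + 8191 = 32768, the maximum distance. A decoded distance must still be checked against the number of bytes already output before it is used.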
Testing Strategy
Recommended Testing Process
Debugging Tips
- Use the debug macro liberally (only shows in debug builds)
- Print intermediate values (bit positions, decoded symbols, etc.)
- Test with very small files first (1-10 bytes)
- Compare your output bit-by-bit with expected results
- Use hexdump to examine binary files: hexdump -C file.gz
- Use xxd to see bit patterns: xxd -b file.gz
Criterion Testing Framework
The assignment uses Criterion for unit testing. While you don’t need to write Criterion tests yourself, understanding how they work helps:
- Tests are in the tests/ directory
- Each test calls your functions directly
- Tests check return values and output
- The reason main.c must only contain main() is so tests can link to your other functions
Final Checklist
Before submission, verify:
- Code compiles with no errors or warnings
- All command-line argument combinations work
- Member summary displays correctly
- Compression produces valid GZIP files
- Decompression works for all block types
- Round-trip (compress then decompress) works
- Error cases are handled properly
- Output format exactly matches specifications
- No memory leaks (check with Valgrind)
- No extraneous output in production builds
- main.c only contains #includes, #defines, and main()
- You wrote comprehensive tests beyond the examples
Start early and test frequently. This assignment is challenging and requires experimentation with the data formats.