Overview
The Html reader class loads HTML files containing tables and converts them into a Spreadsheet object. This is useful for importing data from HTML reports, web pages, or HTML-formatted data exports.
Namespace: PhpOffice\PhpSpreadsheet\Reader\Html
Extends: BaseReader
Implements: IReader
Source: src/PhpSpreadsheet/Reader/Html.php:32
Basic Usage
Simple File Loading
use PhpOffice\PhpSpreadsheet\Reader\Html;
$reader = new Html();
$spreadsheet = $reader->load('data.html');
// Access worksheet data
$sheet = $spreadsheet->getActiveSheet();
$data = $sheet->toArray();
Using IOFactory
use PhpOffice\PhpSpreadsheet\IOFactory;
// Auto-detect and load
$spreadsheet = IOFactory::load('data.html');
// Or create specific reader
$reader = IOFactory::createReader('Html');
$spreadsheet = $reader->load('data.html');
Key Methods
__construct()
Creates a new Html reader instance.
public function __construct();
Example:
$reader = new Html();
canRead()
Checks if the file can be read by this reader.
public function canRead(string $filename): bool;
$filename: Path to the file to check
Returns: bool - True if the file appears to be HTML
Example:
$reader = new Html();
if ($reader->canRead('data.html')) {
    $spreadsheet = $reader->load('data.html');
}
load()
Loads a spreadsheet from an HTML file.
public function load(string $filename, int $flags = 0): Spreadsheet;
$filename: Path to the HTML file to load
$flags: Optional flags (limited support for HTML format)
Returns: Spreadsheet object
Example:
$reader = new Html();
$spreadsheet = $reader->load('data.html');
HTML-Specific Configuration
setInputEncoding()
Sets the input character encoding for the HTML file.
public function setInputEncoding(string $encoding): self;
$encoding: Character encoding (e.g., 'UTF-8', 'ANSI', 'ISO-8859-1')
Example:
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$spreadsheet = $reader->load('data.html');
setSheetIndex()
Sets which worksheet index to use when loading (for multiple tables).
public function setSheetIndex(int $sheetIndex): self;
$sheetIndex: The 0-based worksheet index
Example:
$reader = new Html();
$reader->setSheetIndex(0);
$spreadsheet = $reader->load('data.html');
setSuppressLoadWarnings()
Controls whether to suppress libxml load warnings.
public function setSuppressLoadWarnings(?bool $suppressLoadWarnings): self;
$suppressLoadWarnings: True to suppress warnings, false to show them, null for default behavior
Example:
$reader = new Html();
$reader->setSuppressLoadWarnings(true);
$spreadsheet = $reader->load('data.html');
// Check for any warnings
$warnings = $reader->getLibxmlMessages();
foreach ($warnings as $warning) {
    echo $warning->message;
}
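To give a feel for what a libxml message looks like, here is a minimal sketch that uses PHP's own DOM extension directly rather than PhpSpreadsheet. The helper name `parseHtmlCollectingWarnings` is hypothetical, and the sketch does not claim to mirror how the reader collects its messages internally. Whether a given piece of markup produces any message depends on the installed libxml2 version.

```php
<?php
// Illustration only: plain PHP DOM/libxml, not the PhpSpreadsheet reader.

/**
 * Parse an HTML string and return the document plus any libxml messages.
 *
 * @return array{0: DOMDocument, 1: LibXMLError[]}
 */
function parseHtmlCollectingWarnings(string $html): array
{
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Collect the buffered messages, then restore the previous setting.
    $messages = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$dom, $messages];
}

[$dom, $messages] = parseHtmlCollectingWarnings(
    '<table><tr><td>ok</td><custom-tag>unknown</custom-tag></tr></table>'
);

foreach ($messages as $message) {
    // Each entry is a LibXMLError with message, line, column, and level.
    echo trim($message->message), ' (line ', $message->line, ")\n";
}
```

Each entry exposes `message`, `line`, `column`, and `level`, which is enough to log the problem or decide whether to reject the file.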
Supported HTML Features
The Html reader recognizes and converts the following HTML elements:
Table Structure
<table> - Converted to worksheet
<tr> - Converted to row
<td> - Converted to cell
<th> - Converted to cell (typically bold)
<thead>, <tbody>, <tfoot> - Structural elements
Text Formatting
<b>, <strong> - Bold text
<i>, <em> - Italic text
<u> - Underlined text
<s>, <strike> - Strikethrough text
<sup> - Superscript
<sub> - Subscript
<h1> to <h6> - Headers with different font sizes
<a> - Hyperlinks (blue, underlined)
<hr> - Horizontal rule (bottom border)
Table Attributes
colspan - Cell spanning multiple columns
rowspan - Cell spanning multiple rows
width - Column width
height - Row height
Style Attributes
The reader parses inline CSS styles:
font-family - Font name
font-size - Font size
font-weight - Bold text
font-style - Italic text
text-decoration - Underline, strikethrough
color - Text color
background-color - Cell background color
border - Cell borders
text-align - Horizontal alignment
vertical-align - Vertical alignment
width - Column width
height - Row height
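As a rough illustration of what parsing an inline style involves, the sketch below splits a `style` attribute into property/value pairs. The function `parseInlineStyle` is hypothetical and is not the reader's actual implementation. It also assumes values contain no semicolons, so something like a `url(data:...;base64,...)` value would be split incorrectly.

```php
<?php
/**
 * Hypothetical helper: split an inline style attribute into
 * lower-cased property names mapped to trimmed values.
 *
 * @return array<string, string>
 */
function parseInlineStyle(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $declaration) {
        // Skip empty fragments and anything without a "property: value" shape.
        if (!str_contains($declaration, ':')) {
            continue;
        }
        [$property, $value] = explode(':', $declaration, 2);
        $property = strtolower(trim($property));
        $value = trim($value);
        if ($property !== '' && $value !== '') {
            $declarations[$property] = $value;
        }
    }

    return $declarations;
}

$styles = parseInlineStyle('font-weight: bold; background-color: #cccccc;');
print_r($styles);
```

A helper like this can be handy for checking which of the properties listed above a source document actually uses before importing it.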
Simple HTML Table
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Sales Report</title>
</head>
<body>
  <table>
    <thead>
      <tr>
        <th>Product</th>
        <th>Quantity</th>
        <th>Price</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Widget</td>
        <td>100</td>
        <td>$10.00</td>
      </tr>
      <tr>
        <td>Gadget</td>
        <td>50</td>
        <td>$20.00</td>
      </tr>
    </tbody>
  </table>
</body>
</html>
$reader = new Html();
$spreadsheet = $reader->load('report.html');
HTML with Inline Styles
<table style="border: 1px solid black;">
  <tr>
    <td style="font-weight: bold; background-color: #cccccc;">Header</td>
    <td style="color: red;">Value</td>
  </tr>
  <tr>
    <td style="text-align: center;">Center</td>
    <td style="font-style: italic;">Italic</td>
  </tr>
</table>
$reader = new Html();
$spreadsheet = $reader->load('styled.html');
HTML with Colspan and Rowspan
<table>
  <tr>
    <td colspan="2">Merged across 2 columns</td>
  </tr>
  <tr>
    <td rowspan="2">Merged across 2 rows</td>
    <td>Cell 1</td>
  </tr>
  <tr>
    <td>Cell 2</td>
  </tr>
</table>
$reader = new Html();
$spreadsheet = $reader->load('merged.html');
// Colspan and rowspan are converted to merged cells
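To make the mapping from `colspan`/`rowspan` to merged ranges concrete, here is a small sketch of the arithmetic. The helpers `columnLetters` and `mergeRange` are hypothetical and written for illustration; they are not part of PhpSpreadsheet. The ranges computed for the table above are what one would expect from the attributes, not output captured from the reader.

```php
<?php
/** Convert a 1-based column index to spreadsheet letters (1 => A, 27 => AA). */
function columnLetters(int $index): string
{
    $letters = '';
    while ($index > 0) {
        $remainder = ($index - 1) % 26;
        $letters = chr(65 + $remainder) . $letters;
        $index = intdiv($index - 1, 26);
    }

    return $letters;
}

/**
 * Hypothetical helper: the A1-style range covered by a cell at
 * ($column, $row) with the given spans, or null if nothing is merged.
 */
function mergeRange(int $column, int $row, int $colspan = 1, int $rowspan = 1): ?string
{
    $colspan = max(1, $colspan);
    $rowspan = max(1, $rowspan);
    if ($colspan === 1 && $rowspan === 1) {
        return null;
    }

    $start = columnLetters($column) . $row;
    $end = columnLetters($column + $colspan - 1) . ($row + $rowspan - 1);

    return $start . ':' . $end;
}

// First row of the table above: colspan="2" starting at A1.
echo mergeRange(1, 1, 2, 1), "\n";
// Second row: rowspan="2" starting at A2.
echo mergeRange(1, 2, 1, 2), "\n";
```

To see the ranges the reader actually produced, inspect `$spreadsheet->getActiveSheet()->getMergeCells()` after loading.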
Multiple Tables
If an HTML file contains multiple <table> elements, each table is loaded as a separate worksheet:
$reader = new Html();
$spreadsheet = $reader->load('multi-table.html');
// Access different tables
$sheet1 = $spreadsheet->getSheet(0); // First table
$sheet2 = $spreadsheet->getSheet(1); // Second table
$sheet3 = $spreadsheet->getSheet(2); // Third table
echo "Loaded {$spreadsheet->getSheetCount()} tables\n";
Handling Encoding
UTF-8 HTML
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$spreadsheet = $reader->load('utf8.html');
Other Encodings
// ISO-8859-1 (Latin-1)
$reader = new Html();
$reader->setInputEncoding('ISO-8859-1');
$spreadsheet = $reader->load('latin1.html');
// Windows-1252
$reader->setInputEncoding('CP1252');
$spreadsheet = $reader->load('windows.html');
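An alternative to telling the reader about the encoding is to normalize the file to UTF-8 before loading it. The sketch below assumes the mbstring extension is available; the helper name `toUtf8` is hypothetical. The UTF-8 validity check is only a heuristic, since some byte sequences are valid in more than one encoding.

```php
<?php
/**
 * Hypothetical helper: return $bytes as UTF-8, converting from
 * $sourceEncoding only when the input is not already valid UTF-8.
 */
function toUtf8(string $bytes, string $sourceEncoding): string
{
    // Heuristic: leave input alone if it already decodes as UTF-8.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    return mb_convert_encoding($bytes, 'UTF-8', $sourceEncoding);
}

// "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
$latin1 = "<td>caf\xE9</td>";
$utf8 = toUtf8($latin1, 'ISO-8859-1');

echo bin2hex($utf8), "\n";
```

The converted string can then be written to a temporary file and passed to `load()` as in the examples above.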
Working with Images
The Html reader can load images from HTML:
$reader = new Html();
// Allow external images (use with caution)
$reader->setAllowExternalImages(true);
$spreadsheet = $reader->load('report.html');
Be cautious when enabling external images: the reader may then fetch URLs named in untrusted HTML, which can expose your application to security risks such as server-side request forgery (SSRF).
Error Handling
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
use PhpOffice\PhpSpreadsheet\Reader\Html;
$reader = new Html();
$reader->setSuppressLoadWarnings(true);
try {
    if (!$reader->canRead('data.html')) {
        // Use the imported alias; a bare "Exception" is not imported here
        throw new ReaderException('File is not valid HTML');
    }
    $spreadsheet = $reader->load('data.html');
    // Check for warnings
    $warnings = $reader->getLibxmlMessages();
    if (!empty($warnings)) {
        echo "Warnings during load:\n";
        foreach ($warnings as $warning) {
            echo "- {$warning->message}\n";
        }
    }
} catch (ReaderException $e) {
    echo 'Error loading HTML file: ' . $e->getMessage();
} catch (\Exception $e) {
    echo 'General error: ' . $e->getMessage();
}
Security Considerations
XML External Entity (XXE) Protection
The Html reader uses the XmlScanner security scanner to protect against XXE attacks.
External Resources
Be careful with external images and stylesheets:
$reader = new Html();
// Restrict external resources to a whitelist of trusted locations
$reader->setIsWhitelisted(function (string $path): bool {
    return str_starts_with($path, 'https://trusted-domain.com/');
});
$reader->setAllowExternalImages(true);
$spreadsheet = $reader->load('report.html');
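A prefix check works for a single fixed location. If several hosts need to be trusted, a callback that parses the URL and compares the host exactly may be easier to maintain. The following is a minimal sketch in plain PHP; the function name `isTrustedImageUrl` and the host list are assumptions made for illustration.

```php
<?php
// Hypothetical allow-list of hosts; replace with your own.
const TRUSTED_HOSTS = ['trusted-domain.com', 'cdn.trusted-domain.com'];

/**
 * Hypothetical whitelist callback: accept only https URLs whose host
 * exactly matches an entry in TRUSTED_HOSTS.
 */
function isTrustedImageUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // Local paths and malformed URLs are rejected.
    }
    if (strtolower($parts['scheme']) !== 'https') {
        return false;
    }
    if (isset($parts['user']) || isset($parts['pass'])) {
        return false; // Reject "https://trusted-domain.com@evil.example/" tricks.
    }

    return in_array(strtolower($parts['host']), TRUSTED_HOSTS, true);
}

var_dump(isTrustedImageUrl('https://trusted-domain.com/logo.png'));
var_dump(isTrustedImageUrl('https://trusted-domain.com.evil.example/logo.png'));
```

Because it takes a string and returns a bool, a function like this has the same shape as the closure passed to `setIsWhitelisted()` above.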
Complete Example
use PhpOffice\PhpSpreadsheet\Reader\Html;
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
// Create and configure reader
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$reader->setSuppressLoadWarnings(true);
try {
    // Verify file
    if (!$reader->canRead('report.html')) {
        // Throw the reader exception so the catch block below handles it
        throw new ReaderException('Invalid HTML file');
    }
    // Load file
    $spreadsheet = $reader->load('report.html');
    echo "Loaded {$spreadsheet->getSheetCount()} table(s)\n";
    // Process each table
    foreach ($spreadsheet->getAllSheets() as $index => $sheet) {
        echo "\nTable " . ($index + 1) . ":\n";
        $highestRow = $sheet->getHighestRow();
        $highestColumn = $sheet->getHighestColumn();
        echo "Rows: {$highestRow}, Columns: {$highestColumn}\n";
        // Process data
        $data = $sheet->toArray();
        foreach ($data as $row) {
            // Process row
            print_r($row);
        }
    }
    // Check for warnings
    $warnings = $reader->getLibxmlMessages();
    if (!empty($warnings)) {
        echo "\n" . count($warnings) . " warning(s) encountered\n";
    }
} catch (ReaderException $e) {
    echo 'Reader error: ' . $e->getMessage();
}
Limitations
- Only processes <table> elements; other HTML content is ignored
- CSS stylesheets are not fully supported (only inline styles)
- Complex HTML structures may not parse correctly
- JavaScript-generated content is not processed
- Some advanced CSS properties are not supported
- No support for formulas (everything is read as values)
- No support for charts
Tips for Best Results
- Use well-formed HTML - Valid HTML5 markup produces best results
- Use inline styles - External CSS stylesheets are not processed
- Specify encoding - Always set the correct character encoding
- Use simple table structures - Complex nested tables may not parse correctly
- Include charset meta tag - Add <meta charset="UTF-8"> to HTML
- Test with sample data - Test the reader with a small sample first