Available Formats
BORME publishes its data in three formats, each serving a different purpose and carrying a different level of detail. Understanding these formats is crucial for working effectively with BORME data.
PDF
Complete detailed data (bormeparser’s primary target)
XML
Metadata and document structure
HTML
Section C announcements only
PDF Format
PDF is the most important format for extracting detailed business information from BORME.
What PDF Contains
PDF files contain the complete and detailed information about all registered acts, including:
- Full company names
- Detailed act descriptions
- Names of appointed or revoked officers
- Capital amounts for increases/reductions
- Complete statutory change details
- Registry office information
- Official registration numbers
This is the format bormeparser is designed to parse. All the rich business data you want to extract is contained in these PDF files.
PDF File Structure
Downloading PDF Files
PDF files are available for Sections A and B, organized by province. See bormeparser/download.py:47 for the PDF URL format.
PDFs require the bulletin number (nbo) which must be obtained from the XML file first. The bormeparser library handles this automatically.
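The dependency on the bulletin number can be sketched as follows. This is a minimal illustration, not bormeparser's code: `PLACEHOLDER_PDF_URL` and `build_pdf_url` are hypothetical names, and the template is deliberately a placeholder rather than the real BORME URL pattern, which lives in bormeparser/download.py.

```python
from datetime import date

# Placeholder pattern for illustration only; it is NOT the real BORME URL.
PLACEHOLDER_PDF_URL = (
    "https://example.invalid/{year}/{month:02d}/{day:02d}/"
    "{section}-{year}-{nbo}-{province}.pdf"
)


def build_pdf_url(day, section, province, nbo, template=PLACEHOLDER_PDF_URL):
    """Fill a URL template for one provincial PDF.

    `nbo` cannot be derived from the date alone, so the caller has to
    read it from that day's XML summary before calling this function.
    """
    if section not in ("A", "B"):
        raise ValueError("per-province PDFs are published for Sections A and B")
    return template.format(
        year=day.year,
        month=day.month,
        day=day.day,
        section=section,
        nbo=nbo,
        province=province,
    )


# The nbo value here is invented; in practice it comes from the XML file.
url = build_pdf_url(date(2015, 6, 2), "A", "03", nbo=102)
```

The point of the sketch is the ordering: the date, section, and province are known up front, but the URL cannot be completed until the XML summary has supplied `nbo`.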
Why PDF?
Due to agreements between the Spanish government and the Mercantile Register, the most detailed and valuable data is only available in PDF format. While this makes automated processing more challenging, it’s why bormeparser exists.
XML Format
XML files serve as an index and metadata container for each day’s BORME publications.
What XML Contains
XML files contain:
- Bulletin metadata: date, bulletin number (nbo), previous/next dates
- Document structure: sections, provinces, announcement counts
- Download URLs: links to PDF, HTML, and XML files for each document
- File sizes: byte counts for each downloadable file
- CVE identifiers: unique identifiers for each BORME document
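To make the fields listed above concrete, here is a minimal sketch that reads them from a small summary document with the standard library. The sample XML is invented for illustration, and its element and attribute names (`diario`, `seccion`, `item`, `urlPdf`, `szBytes`) are assumptions; the real BORME summary may be laid out differently, and bormeparser's own `BormeXML` class should be preferred in practice.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are assumptions.
SAMPLE_XML = """\
<sumario>
  <diario nbo="102">
    <seccion num="A">
      <item id="EXAMPLE-A-1">
        <titulo>PROVINCE ONE</titulo>
        <urlPdf szBytes="212345">/example/pdfs/doc-a-1.pdf</urlPdf>
      </item>
    </seccion>
    <seccion num="C">
      <item id="EXAMPLE-C-1">
        <titulo>EXAMPLE COMPANY SA</titulo>
        <urlPdf szBytes="150000">/example/pdfs/doc-c-1.pdf</urlPdf>
      </item>
    </seccion>
  </diario>
</sumario>
"""


def list_documents(xml_text):
    """Return (nbo, documents) extracted from a summary XML string."""
    root = ET.fromstring(xml_text)
    diario = root.find("diario")
    nbo = int(diario.get("nbo"))

    documents = []
    for seccion in diario.iter("seccion"):
        for item in seccion.iter("item"):
            url_pdf = item.find("urlPdf")
            documents.append(
                {
                    "cve": item.get("id"),  # unique document identifier
                    "section": seccion.get("num"),
                    "title": item.findtext("titulo"),
                    "pdf_url": url_pdf.text.strip(),
                    "pdf_bytes": int(url_pdf.get("szBytes")),
                }
            )
    return nbo, documents


nbo, documents = list_documents(SAMPLE_XML)
section_a = [doc for doc in documents if doc["section"] == "A"]
```

Returning plain dictionaries keeps the sketch independent of any library types; a caller can filter by section, sum the byte counts to estimate download size, or pass `nbo` on to whatever builds the PDF URLs.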
XML File Structure
XML URL Format
See bormeparser/download.py:48 for the implementation.
Using XML Files
The XML file is essential for discovering which BORME documents are available.
XML Usage Example
XML Structure
The XML follows a defined hierarchy. See bormeparser/borme.py:186-250 for the BormeXML implementation.
HTML Format
HTML files are available only for Section C announcements.
What HTML Contains
HTML files contain Section C announcements, which include:
- Shareholder meeting announcements
- Capital increase/reduction notices
- Other public corporate announcements
HTML File Structure
HTML URL Format
See bormeparser/download.py:49 for the HTML URL pattern.
Section C is handled differently because it contains announcements (anuncios) that are not tied to specific provinces. See the Sections documentation for more details.
Format Comparison
Detailed Format Comparison
| Feature | PDF | XML | HTML |
|---|---|---|---|
| Detailed business data | ✅ Yes | ❌ No | ⚠️ Section C only |
| Company names | ✅ Yes | ❌ No | ✅ Yes (Section C) |
| Officer names | ✅ Yes | ❌ No | ❌ No |
| Act details | ✅ Yes | ❌ No | ✅ Yes (Section C) |
| Document structure | ⚠️ Implicit | ✅ Yes | ⚠️ Basic |
| Download URLs | ❌ No | ✅ Yes | ❌ No |
| File sizes | ❌ No | ✅ Yes | ❌ No |
| Machine-readable | ❌ Requires parsing | ✅ Yes | ⚠️ Requires parsing |
| Available sections | A, B, C | All | C only |
| By province | ✅ Yes (A, B) | ✅ Yes | ❌ No |
When to Use Each Format
Start with XML
Use XML to discover what documents are available for a given date and get their download URLs.
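The choice of format per section can be expressed as a small lookup. This is a sketch built from the comparison table, not part of bormeparser: `FORMATS_BY_SECTION`, `formats_for`, and `detail_format` are hypothetical names, and preferring HTML over PDF for Section C is an illustrative assumption rather than a documented recommendation.

```python
# Availability per section, taken from the format comparison table.
FORMATS_BY_SECTION = {
    "A": ("xml", "pdf"),
    "B": ("xml", "pdf"),
    "C": ("xml", "pdf", "html"),
}


def formats_for(section):
    """Return the formats published for a BORME section (A, B or C)."""
    try:
        return FORMATS_BY_SECTION[section.upper()]
    except KeyError:
        raise ValueError(f"unknown BORME section: {section!r}") from None


def detail_format(section):
    """Pick the format to fetch after consulting the XML index.

    Assumption: use HTML where it exists (Section C), PDF otherwise.
    """
    return "html" if "html" in formats_for(section) else "pdf"
```

With this in place, a caller would read the XML summary first, then call `detail_format` on each listed document's section to decide which file to download.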
Technical Implementation
The bormeparser library provides different backends for parsing different formats:
- PDF Parsing: Uses the PyPDF2 backend for extracting text from PDF files (bormeparser/backends/pypdf2/)
- XML Handling: Uses lxml for parsing the XML structure (BormeXML class)
- Section C HTML: Uses lxml for HTML parsing (bormeparser/backends/seccion_c/lxml/)
See bormeparser/download.py:42-51 for URL patterns and bormeparser/backends/ for parser implementations.
File Naming Conventions
Sections A and B (PDF)
Section C
XML Summary
Next Steps
Understanding Sections
Learn about BORME Sections A, B, and C
Parsing PDFs
Start parsing BORME PDF files