Converting to JSON
bormeparser can convert parsed BORME data to JSON format, making it easier to work with the data in other applications, databases, or analysis tools.
Quick Start
import bormeparser
# Parse and convert to JSON
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
output_path = borme.to_json()
print(f'JSON created at: {output_path}')
# Prints: JSON created at: BORME-A-2015-123-29.json
The to_json() Method
The to_json() method converts a Borme object to a JSON file:
Function Signature
borme.to_json(path=None, overwrite=True, pretty=True, include_url=True)
Parameters
path: Output path (file or directory). If None, the file is written to the current directory under the source file's name with a .json extension
overwrite: Overwrite existing file (default: True)
pretty: Use indentation for readability (default: True)
include_url: Include the official download URL (default: True, requires internet)
Return Value
Returns the path to the created JSON file, or False if the file exists and overwrite=False.
Basic Usage
Convert with default settings
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# Creates BORME-A-2015-123-29.json in current directory
json_path = borme.to_json()
print(f'Created: {json_path}')
Specify output directory
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# Save to specific directory (uses CVE as filename)
json_path = borme.to_json(path='output/')
print(f'Created: {json_path}')
# Output: output/BORME-A-2015-123-29.json
Specify output filename
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# Save with custom filename
json_path = borme.to_json(path='output/malaga-2015-06-01.json')
print(f'Created: {json_path}')
JSON Output Structure
The generated JSON file contains complete BORME data in a structured format:
{
  "cve": "BORME-A-2015-123-29",
  "date": "2015-06-01",
  "seccion": "A",
  "provincia": "Málaga",
  "num": 123,
  "from_anuncio": 1234,
  "to_anuncio": 1380,
  "num_anuncios": 147,
  "url": "https://boe.es/borme/dias/2015/06/01/pdfs/BORME-A-2015-123-29.pdf",
  "version": "2001",
  "raw_version": "1",
  "anuncios": {
    "1234": {
      "empresa": "EXAMPLE COMPANY SL",
      "registro": "MÁLAGA",
      "sucursal": false,
      "liquidacion": false,
      "datos registrales": "T 1234 F 567 S 8 H MA 12345 I/A 1",
      "num_actos": 3,
      "actos": [
        {
          "Constitución": "2015-05-15"
        },
        {
          "Objeto social": "Servicios de consultoría empresarial"
        },
        {
          "Nombramientos": {
            "Administrador único": [
              "DOE SMITH JOHN"
            ]
          }
        }
      ]
    }
  }
}
Top-Level Fields
cve: CVE identifier (document ID)
date: Publication date (ISO format)
seccion: Section (A, B, or C)
provincia: Province name
num: BORME number for this date
from_anuncio: First announcement ID
to_anuncio: Last announcement ID
num_anuncios: Total number of announcements
url: Official download URL
version: File format version
raw_version: Raw parser version
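As a rough illustration of how these fields relate to each other, the sketch below checks the header of an already-loaded JSON document using only the standard library. The helper name check_header is hypothetical and not part of bormeparser. It also assumes that announcement IDs are consecutive within one file, which holds for the sample above (1380 - 1234 + 1 = 147) but is not stated as a guarantee. The url field is left out of the required keys because it is only present when include_url is enabled.

```python
from datetime import date

# Keys taken from the "Top-Level Fields" list; "url" is optional.
REQUIRED_KEYS = {
    "cve", "date", "seccion", "provincia", "num",
    "from_anuncio", "to_anuncio", "num_anuncios", "anuncios",
}


def check_header(data):
    """Return a list of problems found in the top-level fields (empty if none)."""
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []
    try:
        date.fromisoformat(data["date"])
    except (TypeError, ValueError):
        problems.append(f"date is not ISO formatted: {data['date']!r}")

    # Assumption: announcement IDs are consecutive within one file.
    expected = data["to_anuncio"] - data["from_anuncio"] + 1
    if data["num_anuncios"] != expected:
        problems.append(
            f"num_anuncios is {data['num_anuncios']}, expected {expected}"
        )
    return problems


# Header values copied from the sample document above.
sample = {
    "cve": "BORME-A-2015-123-29",
    "date": "2015-06-01",
    "seccion": "A",
    "provincia": "Málaga",
    "num": 123,
    "from_anuncio": 1234,
    "to_anuncio": 1380,
    "num_anuncios": 147,
    "anuncios": {},
}

print(check_header(sample))
```

An empty list means the header passed every check. Drop the ID-range check if it turns out not to hold for your files.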
Announcement Fields
empresa: Company name
registro: Commercial registry
sucursal: Branch office indicator (boolean)
liquidacion: Liquidation status (boolean)
datos registrales: Registry data string
num_actos: Number of commercial acts
actos: Array of commercial acts
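Because actos is a list of single-key objects nested inside each announcement, it is often convenient to flatten it into rows before further processing. The following is a minimal sketch of one way to do that with plain dictionaries. The function name iter_actos is hypothetical, and the sample data is the announcement shown above.

```python
def iter_actos(data):
    """Yield (anuncio_id, empresa, acto_name, acto_value) for every act."""
    for anuncio_id, anuncio in data["anuncios"].items():
        for acto in anuncio["actos"]:
            # Each entry in "actos" is a small dict: {act name: act value}.
            for name, value in acto.items():
                yield anuncio_id, anuncio["empresa"], name, value


# The single announcement from the sample document above.
sample = {
    "anuncios": {
        "1234": {
            "empresa": "EXAMPLE COMPANY SL",
            "num_actos": 3,
            "actos": [
                {"Constitución": "2015-05-15"},
                {"Objeto social": "Servicios de consultoría empresarial"},
                {"Nombramientos": {"Administrador único": ["DOE SMITH JOHN"]}},
            ],
        }
    }
}

rows = list(iter_actos(sample))
for row in rows:
    print(row)
```

Note that the act value may be a string or a nested object (as with Nombramientos), so downstream code should handle both.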
Pretty vs Compact JSON
Pretty Printing (Default)
Easy to read, larger file size:
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
borme.to_json(pretty=True) # Default
Output:
{
  "cve": "BORME-A-2015-123-29",
  "date": "2015-06-01",
  "anuncios": {
    "1234": {
      "empresa": "EXAMPLE COMPANY SL"
    }
  }
}
Compact Printing
Minified, smaller file size:
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
borme.to_json(pretty=False)
Output:
{"cve":"BORME-A-2015-123-29","date":"2015-06-01","anuncios":{"1234":{"empresa":"EXAMPLE COMPANY SL"}}}
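To get a feel for the difference between the two modes, the sketch below reproduces both styles with Python's standard json module. It does not show how bormeparser serializes internally. The indent width of 2 and the ensure_ascii=False setting are assumptions made for the illustration.

```python
import json

data = {
    "cve": "BORME-A-2015-123-29",
    "date": "2015-06-01",
    "anuncios": {"1234": {"empresa": "EXAMPLE COMPANY SL"}},
}

# Indented output: one key per line (indent width is an assumption).
pretty = json.dumps(data, indent=2, ensure_ascii=False)

# Compact output: no whitespace after separators.
compact = json.dumps(data, separators=(",", ":"), ensure_ascii=False)

print(compact)
print(f"pretty: {len(pretty)} characters, compact: {len(compact)} characters")

# Both forms decode to the same data.
assert json.loads(pretty) == json.loads(compact)
```

The two forms are interchangeable for any JSON reader, so the choice only affects file size and human readability.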
Including or Excluding URLs
Control whether to include the official download URL:
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# Include URL (requires internet connection)
borme.to_json(include_url=True) # Default
# Exclude URL (works offline)
borme.to_json(include_url=False)
Setting include_url=True requires an internet connection to fetch the BORME number from the XML file.
Handling Existing Files
Control behavior when the output file already exists:
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# Overwrite existing file (default)
path = borme.to_json(overwrite=True)
print(f'Created: {path}')
# Don't overwrite existing file; to_json() returns False if it already exists
path = borme.to_json(overwrite=False)
if not path:
    print('File already exists, not overwriting')
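The return convention described here (a path on success, False when an existing file is left untouched) can be illustrated without bormeparser. The helper write_json below is hypothetical and only mirrors that documented behavior. It is not the library's implementation.

```python
import json
import os
import tempfile


def write_json(data, path, overwrite=True):
    """Write data to path; return the path, or False if it exists and overwrite is off."""
    if os.path.exists(path) and not overwrite:
        return False
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.json")

    # First call creates the file and returns its path.
    first = write_json({"cve": "BORME-A-2015-123-29"}, target)

    # Second call refuses to overwrite and returns False.
    second = write_json({"cve": "changed"}, target, overwrite=False)

    with open(target, encoding="utf-8") as f:
        kept = json.load(f)

print(first == target, second, kept)
```

Since a failed write is signalled by False rather than by an exception, check the return value before passing it to functions such as os.path.abspath().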
Complete Example: borme_to_json.py
Here’s the complete script from scripts/borme_to_json.py:
import bormeparser
import bormeparser.backends.pypdf2.parser
from bormeparser.backends.defaults import OPTIONS
# Enable company name sanitization
OPTIONS['SANITIZE_COMPANY_NAME'] = True
import argparse
import logging
import os
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Convert BORME A PDF files to JSON.'
    )
    parser.add_argument('filename', help='BORME A PDF filename')
    parser.add_argument(
        '--debug',
        action='store_true',
        default=False,
        help='Debug mode'
    )
    parser.add_argument(
        '-o', '--output',
        help='Output directory or filename (default is current directory)'
    )
    args = parser.parse_args()
    # Enable debug logging
    if args.debug:
        bormeparser.borme.logger.setLevel(logging.DEBUG)
        bormeparser.backends.pypdf2.parser.logger.setLevel(logging.DEBUG)
    print(f'\nParsing {args.filename}')
    borme = bormeparser.parse(args.filename, bormeparser.SECCION.A)
    path = borme.to_json(args.output)
    if path:
        print(f'Created {os.path.abspath(path)}')
    else:
        print(f'Error creating JSON for {args.filename}')
Batch Processing Multiple Files
Convert multiple BORME files to JSON:
import bormeparser
import os
import glob
def batch_convert_to_json(input_dir, output_dir):
    """Convert all PDF files in a directory to JSON"""
    # Create output directory
    os.makedirs(output_dir, exist_ok=True)
    # Find all BORME PDF files
    pdf_files = glob.glob(os.path.join(input_dir, 'BORME-A-*.pdf'))
    print(f'Found {len(pdf_files)} PDF files')
    success_count = 0
    error_count = 0
    for pdf_file in pdf_files:
        try:
            print(f'Processing {os.path.basename(pdf_file)}...')
            # Parse the PDF
            borme = bormeparser.parse(pdf_file, 'A')
            # Convert to JSON
            json_path = borme.to_json(
                path=output_dir,
                pretty=True,
                include_url=False  # Skip URL for faster processing
            )
            if json_path:
                print(f'  Created {os.path.basename(json_path)}')
                success_count += 1
            else:
                print('  Skipped (already exists)')
        except Exception as e:
            print(f'  Error: {e}')
            error_count += 1
    print(f'\nCompleted: {success_count} successful, {error_count} errors')

# Usage
batch_convert_to_json('downloads/pdfs/', 'downloads/json/')
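After a batch run, a quick summary of the output directory can confirm that every file was written and is readable. The sketch below relies only on the num_anuncios field documented earlier. The function name summarize_json_dir is hypothetical, and the demo writes two small placeholder files (with made-up counts) into a temporary directory instead of reading real BORME output.

```python
import glob
import json
import os
import tempfile


def summarize_json_dir(json_dir):
    """Return {filename: num_anuncios} for every BORME-A-*.json file in json_dir."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(json_dir, "BORME-A-*.json"))):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        summary[os.path.basename(path)] = data["num_anuncios"]
    return summary


# Demo with placeholder files; the second CVE and both counts are illustrative.
with tempfile.TemporaryDirectory() as tmp:
    for cve, count in [("BORME-A-2015-123-29", 147), ("BORME-A-2015-124-29", 98)]:
        with open(os.path.join(tmp, cve + ".json"), "w", encoding="utf-8") as f:
            json.dump({"cve": cve, "num_anuncios": count}, f)
    summary = summarize_json_dir(tmp)

for name, count in summary.items():
    print(f"{name}: {count} announcements")
print(f"Total: {sum(summary.values())}")
```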
Loading JSON Back to Borme Object
You can load a JSON file back into a Borme object:
import bormeparser
from bormeparser.borme import Borme
# Load from JSON file
borme = Borme.from_json('BORME-A-2015-123-29.json')
# Access data as normal
print(f'Date: {borme.date}')
print(f'Province: {borme.provincia}')
print(f'Announcements: {len(borme.get_anuncios())}')
# Iterate over companies
for anuncio in borme.get_anuncios():
print(f'{anuncio.id}: {anuncio.empresa}')
Loading from File Object
import bormeparser
from bormeparser.borme import Borme
# Load from file object
with open('BORME-A-2015-123-29.json', 'r') as f:
    borme = Borme.from_json(f)
print(f'Loaded {len(borme.get_anuncios())} announcements')
Working with JSON Data
Once you have JSON files, you can process them with standard tools:
Using Python’s json module
import json
# Read JSON data
with open('BORME-A-2015-123-29.json', 'r') as f:
    data = json.load(f)
print(f"Province: {data['provincia']}")
print(f"Date: {data['date']}")
print(f"Total announcements: {data['num_anuncios']}")
# Find companies by name
for anuncio_id, anuncio in data['anuncios'].items():
    if 'TECHNOLOGY' in anuncio['empresa']:
        print(f"{anuncio_id}: {anuncio['empresa']}")
Using jq command line
# Get total number of announcements
jq '.num_anuncios' BORME-A-2015-123-29.json
# Extract all company names
jq '.anuncios | to_entries | .[].value.empresa' BORME-A-2015-123-29.json
# Find announcements with "Constitución"
jq '.anuncios | to_entries | .[] | select(any(.value.actos[]; has("Constitución")))' BORME-A-2015-123-29.json
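For readers who prefer to stay in Python, the last jq filter can be mirrored with a dictionary comprehension. This is a sketch, with anuncios_with_acto as a hypothetical helper name. The second announcement in the sample is made up purely to show a non-matching entry.

```python
def anuncios_with_acto(data, acto_name):
    """Return the announcements that contain at least one act named acto_name."""
    return {
        anuncio_id: anuncio
        for anuncio_id, anuncio in data["anuncios"].items()
        if any(acto_name in acto for acto in anuncio["actos"])
    }


sample = {
    "anuncios": {
        "1234": {
            "empresa": "EXAMPLE COMPANY SL",
            "actos": [
                {"Constitución": "2015-05-15"},
                {"Objeto social": "Servicios de consultoría empresarial"},
            ],
        },
        # Illustrative entry without a "Constitución" act.
        "1235": {
            "empresa": "OTHER EXAMPLE SL",
            "actos": [
                {"Nombramientos": {"Administrador único": ["DOE SMITH JOHN"]}},
            ],
        },
    }
}

matches = anuncios_with_acto(sample, "Constitución")
for anuncio_id, anuncio in matches.items():
    print(f"{anuncio_id}: {anuncio['empresa']}")
```

Using any() means each matching announcement appears once, even if it contains several acts with the same name.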
JSON File Naming
The output filename depends on the path parameter:
import bormeparser
borme = bormeparser.parse('BORME-A-2015-123-29.pdf', 'A')
# No path: uses source filename with .json extension
borme.to_json() # -> BORME-A-2015-123-29.json
# Directory path: uses CVE as filename
borme.to_json('output/') # -> output/BORME-A-2015-123-29.json
# File path: uses specified filename
borme.to_json('output/malaga.json') # -> output/malaga.json
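The three naming rules can be summarized as a small decision function. The sketch below is only a model of the behavior described above, not bormeparser's code. The name resolve_json_path is hypothetical, and treating a path as a directory when it ends with a separator or already exists as a directory is an assumption.

```python
import os


def resolve_json_path(path, source_filename, cve):
    """Model of the output naming rules: no path, directory path, or file path."""
    if path is None:
        # No path: source file name with a .json extension, in the current directory.
        base = os.path.splitext(os.path.basename(source_filename))[0]
        return base + ".json"
    if path.endswith(("/", os.sep)) or os.path.isdir(path):
        # Directory: file is named after the CVE identifier.
        return os.path.join(path, cve + ".json")
    # Anything else is taken as the full output filename.
    return path


source = "BORME-A-2015-123-29.pdf"
cve = "BORME-A-2015-123-29"

print(resolve_json_path(None, source, cve))
print(resolve_json_path("output/", source, cve))
print(resolve_json_path("output/malaga.json", source, cve))
```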
Error Handling
import bormeparser
import os
def safe_convert_to_json(pdf_file, output_dir):
    """Convert PDF to JSON with error handling"""
    try:
        # Check if PDF exists
        if not os.path.isfile(pdf_file):
            print(f'Error: File not found - {pdf_file}')
            return False
        # Parse the file
        borme = bormeparser.parse(pdf_file, 'A')
        # Convert to JSON
        json_path = borme.to_json(
            path=output_dir,
            overwrite=False,
            include_url=False
        )
        if json_path:
            print(f'Success: {json_path}')
            return True
        else:
            print('Skipped: File already exists')
            return False
    except Exception as e:
        print(f'Error converting {pdf_file}: {e}')
        return False

# Usage
safe_convert_to_json('BORME-A-2015-123-29.pdf', 'output/')
Next Steps
With JSON files, you can:
- Import data into databases (MongoDB, PostgreSQL, etc.)
- Analyze trends using data science tools (pandas, numpy)
- Build web applications with the data
- Create data visualizations
- Integrate with business intelligence tools
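As one possible starting point for the database route, the sketch below loads announcements into SQLite, chosen here only because the sqlite3 module ships with Python. The same idea applies to MongoDB or PostgreSQL with their own drivers. The table layout and the function name load_into_sqlite are assumptions made for the example, and the data is the sample announcement from the structure section.

```python
import sqlite3


def load_into_sqlite(conn, data):
    """Insert one row per announcement; return the number of rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS anuncios ("
        "id INTEGER, cve TEXT, empresa TEXT, registro TEXT, num_actos INTEGER, "
        "PRIMARY KEY (id, cve))"
    )
    rows = [
        (int(anuncio_id), data["cve"], a["empresa"], a["registro"], a["num_actos"])
        for anuncio_id, a in data["anuncios"].items()
    ]
    # INSERT OR REPLACE keeps re-imports of the same file idempotent.
    conn.executemany("INSERT OR REPLACE INTO anuncios VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sample = {
    "cve": "BORME-A-2015-123-29",
    "anuncios": {
        "1234": {
            "empresa": "EXAMPLE COMPANY SL",
            "registro": "MÁLAGA",
            "num_actos": 3,
        }
    },
}

conn = sqlite3.connect(":memory:")
inserted = load_into_sqlite(conn, sample)
result = conn.execute("SELECT id, empresa FROM anuncios").fetchall()
print(inserted, result)
```

In real use, replace the in-memory database with a file path and load data with json.load() from the files produced by to_json().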