Overview
Data management commands help you inspect articles, reset processing state, and launch the web interface for browsing extracted entities.
Article Statistics
just check
Check article database statistics and processing status.
Alias: just stats
Source: scripts/check_articles_parquet.py
--sample
Display details of a sample article from the dataset, including content preview and processing metadata.
Default: false
Usage Examples
Output Format
Basic Statistics
Statistics Breakdown
| Metric | Description |
|---|---|
| Total Articles | All articles in the Parquet file |
| Processed | Articles that have completed entity extraction |
| Unprocessed | Articles awaiting processing |
| Relevance Checked | Articles that have been evaluated for domain relevance |
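The four metrics above amount to simple counts over per-article flags. The following is a minimal sketch of that counting, assuming each article is a dict with boolean fields named `processed` and `relevance_checked`. Those field names are hypothetical; the actual Parquet columns and the logic in scripts/check_articles_parquet.py may differ.

```python
from typing import Iterable, Mapping


def summarize_articles(articles: Iterable[Mapping[str, object]]) -> dict[str, int]:
    """Count articles by processing state.

    The field names `processed` and `relevance_checked` are assumptions
    made for illustration, not the project's confirmed schema.
    """
    total = processed = relevance_checked = 0
    for article in articles:
        total += 1
        # A missing flag is treated the same as False.
        if article.get("processed"):
            processed += 1
        if article.get("relevance_checked"):
            relevance_checked += 1
    return {
        "Total Articles": total,
        "Processed": processed,
        "Unprocessed": total - processed,
        "Relevance Checked": relevance_checked,
    }


# Made-up sample data, only to exercise the function.
sample = [
    {"title": "a", "processed": True, "relevance_checked": True},
    {"title": "b", "processed": False, "relevance_checked": True},
    {"title": "c"},
]
stats = summarize_articles(sample)
for metric, count in stats.items():
    print(f"{metric}: {count}")
```

In this sketch, Unprocessed is derived as the total minus the processed count, so the two figures always add up to Total Articles.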
Sample Article Output
Processing Status Reset
just reset
Reset processing status of all articles, allowing them to be reprocessed.
Source: scripts/reset_processing_status.py
Interactive Confirmation
The command prompts for confirmation before resetting.
Use Cases
After configuration changes
When you’ve modified extraction prompts or entity schemas and want to reprocess all articles with the new configuration.
After LLM improvements
When you’ve upgraded your LLM model or improved extraction logic and want to regenerate all entities.
Testing and development
During development when you need to test the full pipeline repeatedly.
Partial processing errors
If a previous processing run was interrupted and you want to start fresh.
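The confirm-then-reset behaviour of just reset can be illustrated with a short sketch. This is not the code in scripts/reset_processing_status.py. The prompt wording, the `processed` field, and the acceptance of "yes" alongside "y" are all assumptions made for illustration.

```python
from typing import Callable


def confirm(prompt: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the answer is 'y' or 'yes' (case-insensitive)."""
    return ask(prompt).strip().lower() in {"y", "yes"}


def reset_processing(articles: list[dict], ask: Callable[[str], str] = input) -> int:
    """Clear the (hypothetical) `processed` flag on every article after confirmation.

    Returns the number of articles reset, or 0 if the user declines.
    """
    if not confirm(f"Reset {len(articles)} articles? [y/N] ", ask):
        return 0
    for article in articles:
        article["processed"] = False
    return len(articles)


# Non-interactive demonstration: the `ask` callables stand in for typed answers.
articles = [{"id": 1, "processed": True}, {"id": 2, "processed": True}]
declined = reset_processing(articles, ask=lambda _: "n")
accepted = reset_processing(articles, ask=lambda _: "y")
print(declined, accepted)
```

Passing the prompt function as a parameter keeps the default interactive behaviour while letting the same logic run unattended in tests.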
The reset command only affects article metadata. Your extracted entities in data/<domain>/output/*.parquet are not deleted. To start completely fresh, manually delete the entity Parquet files as well.
Web Interface
just frontend
Start the web interface for browsing extracted entities.
Aliases: just web, just ui
Source: src/frontend/
Port: 5001
Accessing the Interface
After starting the server, open the Hinbox web interface in your browser on port 5001 (when running locally).
Web Interface Features
Home Page
- Entity counts by type (people, events, locations, organizations)
- Recent entities grid with confidence indicators
- Domain switcher to view different projects
Entity Browse Pages
- People
- Events
- Locations
- Organizations
Browse all extracted people with:
- Name and aliases
- Roles and affiliations
- Tag filters (military, political, legal, etc.)
- Confidence scores
- Related article counts
Entity Detail Pages
Each entity has a detail page showing:
- Profile text: AI-generated summary from source articles
- Profile versions: Historical versions with timestamps
- Confidence score: Extraction quality indicator
- Aliases: Alternative names found in sources
- Tags: Categorization labels
- Related articles: Source articles with citations
- Grounding report: Citation verification scores
Design System
The interface uses the “Archival Elegance” design system:
- Fonts: Crimson Pro (headings), IBM Plex Sans (body)
- Colors: Warm teal-slate primary, amber accents
- Layout: Sidebar filters + main content area
- Style: Minimalist, research-focused aesthetic
The frontend is built with FastHTML for fast, server-rendered HTML with minimal JavaScript.
Data Inspection Workflow
File Locations
Data management commands operate on these files:
| File | Purpose | Modified By |
|---|---|---|
| data/<domain>/raw_sources/articles.parquet | Source articles | just reset |
| data/<domain>/output/processing_status.json | Processing sidecar | just process |
| data/<domain>/output/*.parquet | Extracted entities | just process |
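When scripting against these files, it can help to build the paths from the domain name in one place. The sketch below follows the layout in the table and assumes the data directory sits under the current working directory. The function names and the "example_domain" placeholder are hypothetical, not part of the project.

```python
from pathlib import Path


def domain_paths(domain: str, root: Path = Path("data")) -> dict[str, Path]:
    """Build the documented file locations for one domain."""
    base = root / domain
    return {
        "articles": base / "raw_sources" / "articles.parquet",
        "status": base / "output" / "processing_status.json",
        "output_dir": base / "output",
    }


def entity_files(domain: str, root: Path = Path("data")) -> list[Path]:
    """List the entity Parquet files in the domain's output directory."""
    return sorted(domain_paths(domain, root)["output_dir"].glob("*.parquet"))


paths = domain_paths("example_domain")  # "example_domain" is a placeholder name
print(paths["articles"].as_posix())
print(paths["status"].as_posix())
```

A helper like `entity_files` gives a quick way to see which entity files exist before deleting them by hand for a completely fresh start.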
Additional Data Commands
Miami Herald Specific (Legacy)
These commands are specific to the Guantanamo Bay / Miami Herald dataset:
just fetch-miami
Fetch Miami Herald articles from the source API.
Source: scripts/get_miami_herald_articles.py
This is a legacy command for the original Guantanamo domain. For custom domains, you’ll provide your own article sources.
just import-miami
Import Miami Herald articles by converting JSONL article exports into the Parquet format used by the pipeline.
Source: scripts/import_miami_herald_articles.py
Monitoring Processing Progress
Combine commands to monitor progress.
Command Reference Summary
| Command | Purpose | Interactive |
|---|---|---|
| just check | Show article statistics | No |
| just check --sample | Show stats + sample article | No |
| just stats | Alias for just check | No |
| just reset | Reset processing status | Yes (confirms) |
| just frontend | Start web interface | Yes (server) |
| just web | Alias for just frontend | Yes (server) |
| just ui | Alias for just frontend | Yes (server) |
Troubleshooting
Articles file not found
Error: ERROR: Articles file not found at data/.../articles.parquet
Solution: Ensure your articles Parquet file exists at the path specified in your domain config, or provide a custom path with --articles-path.
Frontend port already in use
Error: OSError: [Errno 48] Address already in use
Solution: Another process is using port 5001. Either:
- Stop the other process
- Kill the existing frontend: pkill -f "python -m src.frontend"
- Change the port in src/frontend/app_config.py
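Before starting the frontend, you can check whether something is already listening on the port. The sketch below uses only Python's standard socket module and is not part of the project. It assumes the server is reachable on 127.0.0.1, and the helper name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(5001):
    print("Port 5001 is busy; stop the other process or change the port.")
else:
    print("Port 5001 is free.")
```

This only shows that a listener exists, not which process owns it, so it complements rather than replaces the steps above.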
No entities showing in frontend
Cause: No articles have been processed yet, or entity Parquet files are missing.
Solution:
- Run just check to verify processing status
- Run just process --limit 10 to process some articles
- Refresh the frontend browser page
Reset cancelled by accident
Solution: Simply run just reset again and enter y when prompted.
See Also
- Process & Extraction - Process articles and extract entities
- Domain Management - Create and configure domains
- Frontend Configuration - Customize the web interface