Skip to main content
Artifact Miner uses SQLite with SQLAlchemy ORM for all persistent data storage. The database schema is managed through Alembic migrations to ensure safe schema evolution.

Database Location

  • File: artifactminer.db (created in project root)
  • Engine: SQLite 3
  • ORM: SQLAlchemy 2.x
  • Migrations: Alembic

Schema Management

Models Definition

All models are defined in src/artifactminer/db/models.py. Key imports:
from sqlalchemy import Column, Integer, Float, String, DateTime, Date, Boolean, JSON, ForeignKey, Text, UniqueConstraint
from sqlalchemy.orm import relationship
from .database import Base

Alembic Migrations

Migrations are stored in alembic/versions/ and automatically applied on API startup. Apply migrations:
uv run alembic upgrade head
Create new migration:
uv run alembic revision --autogenerate -m "Description"
Downgrade one revision:
uv run alembic downgrade -1
Important: Always use migrations instead of manually recreating artifactminer.db.

Core Tables

questions Table

Stores configuration questions presented to users. Model: src/artifactminer/db/models.py:22-41
ColumnTypeDescription
idIntegerPrimary key
keyStringStable identifier (e.g., “email”, “end_goal”)
question_textStringQuestion displayed to user
orderIntegerDisplay order
is_activeBooleanWhether question is currently used
requiredBooleanWhether answer is required
answer_typeStringType of answer (“text”, “email”, “choice”)
created_atDateTimeWhen question was created
Relationships:
  • answersUserAnswer[]
Seeding: Questions are seeded on API startup when table is empty (see db/seeders.py).

user_answers Table

Stores user responses to configuration questions. Model: src/artifactminer/db/models.py:116-144
ColumnTypeDescription
idIntegerPrimary key
question_idIntegerForeign key to questions.id
answer_textTextUser’s answer
answered_atDateTimeWhen answer was submitted
Relationships:
  • questionQuestion
Common Queries:
# Get user email (question_id=1)
email_answer = db.query(UserAnswer).filter(UserAnswer.question_id == 1).first()
user_email = email_answer.answer_text if email_answer else None

# Get all user config
answers = db.query(UserAnswer).order_by(UserAnswer.question_id).all()
config = {ans.question_id: ans.answer_text for ans in answers}

consents Table

Tracks user consent for LLM usage. Model: src/artifactminer/db/models.py:43-50
ColumnTypeDescription
idIntegerPrimary key
consent_levelString”none”, “local”, “local-llm”, or “cloud”
LLM_modelString”ollama” or “chatGPT”
accepted_atDateTimeWhen consent was granted

ZIP Uploads & Portfolios

uploaded_zips Table

Tracks uploaded ZIP files and their extraction paths. Model: src/artifactminer/db/models.py:146-169
ColumnTypeDescription
idIntegerPrimary key
filenameStringOriginal filename
pathStringServer filesystem path
uploaded_atDateTimeUpload timestamp
extraction_pathStringPath to extracted contents
portfolio_idStringUUID linking multiple ZIPs
Portfolio Support: Multiple ZIPs with the same portfolio_id form a portfolio for multi-project analysis.

Repository Statistics

repo_stats Table

Repository-level statistics and metadata. Model: src/artifactminer/db/models.py:51-81
ColumnTypeDescription
idIntegerPrimary key
project_nameStringRepository name (from directory)
project_pathStringFilesystem path to repository
is_collaborativeBooleanMultiple authors or remotes detected
languagesJSONList of file extensions/languages
language_percentagesJSONPercentage breakdown by language
primary_languageStringMost common language
first_commitDateTimeEarliest commit date
last_commitDateTimeMost recent commit date
total_commitsIntegerTotal commit count
frameworksJSONDetected frameworks (Django, React, etc.)
collaboration_metadataJSONCollaboration statistics
ranking_scoreFloatProject ranking (higher = more impressive)
ranked_atDateTimeWhen ranking was calculated
deleted_atDateTimeSoft delete timestamp
health_scoreFloatRepository health (0-100)
thumbnail_urlStringProject thumbnail path or URL
created_atDateTimeWhen record was created
Relationships:
  • project_skillsProjectSkill[]
  • user_project_skillsUserProjectSkill[]
  • resume_itemsResumeItem[]
  • evidenceProjectEvidence[]
Populated by: getRepoStats() in src/artifactminer/RepositoryIntelligence/repo_intelligence_main.py:134-189

user_repo_stats Table

User-specific contribution statistics for each repository. Model: src/artifactminer/db/models.py:83-102
ColumnTypeDescription
idIntegerPrimary key
project_nameStringRepository name
project_pathStringFilesystem path
first_commitDateTimeUser’s first commit
last_commitDateTimeUser’s last commit
total_commitsIntegerUser’s commit count
userStatspercentagesFloatUser’s contribution %
commitFrequencyFloatCommits per week
activity_breakdownJSONActivity classification (code, test, docs, config, design)
user_roleStringUser’s role (e.g., “Lead Developer”)
created_atDateTimeWhen record was created
Populated by: getUserRepoStats() in src/artifactminer/RepositoryIntelligence/repo_intelligence_user.py:32-73 Activity Breakdown Format (JSON):
{
  "code": {"commits": 10, "lines_added": 500, "percentage": 60},
  "test": {"commits": 3, "lines_added": 150, "percentage": 18},
  "docs": {"commits": 2, "lines_added": 100, "percentage": 12},
  "config": {"commits": 1, "lines_added": 50, "percentage": 6},
  "design": {"commits": 1, "lines_added": 33, "percentage": 4}
}

Skills

skills Table

Global catalog of all detected skills. Model: src/artifactminer/db/models.py:172-184
ColumnTypeDescription
idIntegerPrimary key
nameStringSkill name (unique)
categoryStringSkill category (languages, frameworks, tools, etc.)
created_atDateTimeWhen skill was first detected
Relationships:
  • project_skillsProjectSkill[]
  • user_project_skillsUserProjectSkill[]
Example skills: Python, React, Docker, REST API Design, Test-Driven Development

project_skills Table

Links skills to projects with proficiency scores and evidence. Model: src/artifactminer/db/models.py:187-206
ColumnTypeDescription
idIntegerPrimary key
repo_stat_idIntegerForeign key to repo_stats.id
skill_idIntegerForeign key to skills.id
weightFloatSkill weight/importance
proficiencyFloatProficiency level (0.0-1.0)
evidenceJSONEvidence snippets supporting skill
created_atDateTimeWhen link was created
Constraints: Unique constraint on (repo_stat_id, skill_id) Relationships:
  • repo_statRepoStat
  • skillSkill

user_project_skills Table

User-scoped skills for collaborative repositories. Model: src/artifactminer/db/models.py:208-228
ColumnTypeDescription
idIntegerPrimary key
repo_stat_idIntegerForeign key to repo_stats.id
skill_idIntegerForeign key to skills.id
user_emailStringUser’s email (for attribution)
proficiencyFloatUser’s proficiency (0.0-1.0)
evidenceJSONUser-specific evidence
created_atDateTimeWhen link was created
Constraints: Unique constraint on (repo_stat_id, skill_id, user_email) Relationships:
  • repo_statRepoStat
  • skillSkill
Purpose: In collaborative repos, skills are attributed only to the user who demonstrated them.

Evidence

project_evidence Table

Structured evidence items supporting project skills and insights. Model: src/artifactminer/db/models.py:262-276
ColumnTypeDescription
idIntegerPrimary key
repo_stat_idIntegerForeign key to repo_stats.id
typeStringEvidence type (metric, testing, documentation, evaluation, etc.)
contentTextEvidence description
sourceStringSource of evidence (git_stats, repo_quality_signals, etc.)
dateDateDate associated with evidence
created_atDateTimeWhen evidence was created
Relationships:
  • repo_statRepoStat
Evidence Types:
  • metric - Quantitative metrics (commit count, contribution %, etc.)
  • testing - Test coverage and frameworks
  • documentation - Documentation presence
  • code_quality - Linting, type checking, etc.
  • evaluation - Derived insights
Example Evidence Items:
EvidenceItem(type="metric", content="Contributed 65.3% of repository commits", source="git_stats")
EvidenceItem(type="testing", content="Has 42 test files (pytest)", source="repo_quality_signals")
EvidenceItem(type="evaluation", content="API design and architecture: Clean API design with validation and DI shows architectural maturity.", source="insight")
Populated by: Evidence orchestrator in src/artifactminer/evidence/orchestrator.py:27-105

Resume & Portfolio

resume_items Table

Generated resume bullet points for projects. Model: src/artifactminer/db/models.py:231-246
ColumnTypeDescription
idIntegerPrimary key
titleStringItem title
contentTextResume bullet point
categoryStringCategory (experience, projects, etc.)
repo_stat_idIntegerForeign key to repo_stats.id
created_atDateTimeWhen item was created
Relationships:
  • repo_statRepoStat

representation_prefs Table

Stores representation preferences per portfolio. Model: src/artifactminer/db/models.py:279-290
ColumnTypeDescription
portfolio_idStringPrimary key (UUID)
prefs_jsonTextJSON preferences
updated_atDateTimeLast update timestamp
Purpose: Allows users to customize how each portfolio is represented (which projects to highlight, formatting preferences, etc.).

AI Summaries

user_intelligence_summaries Table

LLM-generated summaries of user contributions. Model: src/artifactminer/db/models.py:104-113
ColumnTypeDescription
idIntegerPrimary key
repo_pathStringRepository path
user_emailStringUser’s email
summary_textTextGenerated summary
generated_atDateTimeWhen summary was created
Populated by: generate_summaries_for_ranked() in src/artifactminer/RepositoryIntelligence/repo_intelligence_user.py:210-330 Consent Gated: Only populated when user grants LLM consent.

Legacy Tables

artifacts Table

Basic artifact tracking (legacy, may be deprecated). Model: src/artifactminer/db/models.py:8-19
ColumnTypeDescription
idIntegerPrimary key
nameStringArtifact name
pathStringFilesystem path (unique)
typeString”file” or “directory”
scanned_atDateTimeScan timestamp

exports Table

Export job tracking. Model: src/artifactminer/db/models.py:249-259
ColumnTypeDescription
idIntegerPrimary key
export_typeStringType of export
pathStringOutput path
statusString”pending”, “completed”, “failed”
created_atDateTimeCreation timestamp
completed_atDateTimeCompletion timestamp

Entity Relationship Diagram

Database Access Patterns

Session Management

from artifactminer.db.database import SessionLocal, get_db

# Direct session usage
db = SessionLocal()
try:
    # queries here
    db.commit()
finally:
    db.close()

# FastAPI dependency injection
from fastapi import Depends
from sqlalchemy.orm import Session

@app.get("/endpoint")
def endpoint(db: Session = Depends(get_db)):
    # db is automatically managed
    pass

Common Query Patterns

Get user email:
email_answer = db.query(UserAnswer).filter(UserAnswer.question_id == 1).first()
user_email = email_answer.answer_text if email_answer else None
Get top-ranked projects:
top_projects = db.query(RepoStat).order_by(RepoStat.ranking_score.desc()).limit(5).all()
Get skills for a project:
project_skills = (
    db.query(Skill)
    .join(ProjectSkill)
    .filter(ProjectSkill.repo_stat_id == project_id)
    .all()
)
Get evidence for a project:
evidence_items = (
    db.query(ProjectEvidence)
    .filter(ProjectEvidence.repo_stat_id == project_id)
    .order_by(ProjectEvidence.created_at.desc())
    .all()
)

Best Practices

  1. Always use migrations: Never manually edit artifactminer.db
  2. Use relationships: Leverage SQLAlchemy relationships for joins
  3. Avoid N+1 queries: Use joinedload() or selectinload() for eager loading
  4. Respect unique constraints: Check before inserting skills or project_skills
  5. Use UTC timestamps: All datetime columns use UTC (via datetime.now(UTC).replace(tzinfo=None))
  6. Soft deletes: Use deleted_at instead of hard deletes for repos
  7. JSON validation: Validate JSON structure before persisting to JSON columns

Build docs developers (and LLMs) love