
Overview

Portfolio Analysis enables you to analyze collections of git repositories from multiple ZIP uploads, intelligently discover projects, rank them by contribution significance, and generate comprehensive portfolio views. The system links multiple ZIPs through a portfolio_id and provides flexible filtering and ordering controls.

Multi-ZIP Workflow

Portfolio analysis supports incremental uploads where multiple ZIP files contribute to a single portfolio session:
# Upload first ZIP (generates portfolio_id automatically)
POST /zip
Content-Type: multipart/form-data

file: my-projects-2024.zip
portfolio_id: "optional-custom-id"
Response:
{
  "zip_id": 1,
  "filename": "my-projects-2024.zip",
  "portfolio_id": "550e8400-e29b-41d4-a716-446655440000"
}

Portfolio ID Linkage

The portfolio_id serves as the linking key across multiple uploads:
  • Auto-generated: If not provided, a UUID is generated automatically
  • Custom: Provide your own ID to group related uploads
  • Persistent: All ZIPs with the same portfolio_id are analyzed together
# Database schema (UploadedZip table)
portfolio_id = Column(String, nullable=True, index=True)
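The auto-generation behavior can be summarized in a small sketch (the function name `resolve_portfolio_id` is illustrative, not the project's actual helper):

```python
import uuid

def resolve_portfolio_id(provided=None):
    """Use the caller-supplied ID when present; otherwise generate a UUID."""
    return provided or str(uuid.uuid4())

# Two uploads sharing a custom ID land in the same portfolio:
first = resolve_portfolio_id("my-2024-portfolio")
second = resolve_portfolio_id("my-2024-portfolio")
# first == second == "my-2024-portfolio"
```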

Analyze Portfolio Projects

Once ZIPs are uploaded, trigger analysis to extract project data:
POST /analyze/{zip_id}
The analysis pipeline:
  1. Extracts ZIP to ./extracted/{zip_id}/
  2. Discovers git repositories recursively
  3. Analyzes each repository (stats, skills, insights)
  4. Stores extraction path in database for portfolio grouping
From src/artifactminer/api/analyze.py:351-354:
extraction_path = extract_zip_to_persistent_location(uploaded_zip.path, zip_id)

# Update the UploadedZip record with extraction path
uploaded_zip.extraction_path = str(extraction_path)

Repository Discovery

Automatic Discovery

The system recursively scans extraction directories to find git repositories:
from pathlib import Path
from typing import List

def _is_git_repo(path: Path) -> bool:
    """A directory is a git repository if it contains a .git folder."""
    return (path / ".git").exists()

def discover_git_repos(base_path: Path) -> List[Path]:
    """Recursively find all directories containing a .git folder."""
    git_repos = []
    if base_path.is_dir() and _is_git_repo(base_path):
        git_repos.append(base_path)
    
    # Walk through all directories
    for path in base_path.rglob("*"):
        if path.is_dir() and _is_git_repo(path):
            # Skip repositories nested inside another discovered repository
            is_nested = any(
                _is_git_repo(parent)
                for parent in path.parents
                if parent != base_path and parent.is_relative_to(base_path)
            )
            if not is_nested:
                git_repos.append(path)
    
    return git_repos

Multi-Path Discovery

For portfolios with multiple ZIPs, the system discovers repositories across all extraction paths:
from pathlib import Path
from typing import List

def discover_git_repos_from_multiple_paths(base_paths: List[Path]) -> List[Path]:
    """Discover git repositories across multiple extraction paths.
    
    Deduplicates repositories by name to avoid analyzing the same project twice.
    """
    all_repos = []
    seen_repo_names = set()
    
    for base_path in base_paths:
        repos = discover_git_repos(base_path)
        for repo in repos:
            if repo.name not in seen_repo_names:
                all_repos.append(repo)
                seen_repo_names.add(repo.name)
            else:
                print(f"[analyze] Skipping duplicate repo: {repo.name}")
    
    return all_repos

Selective Directory Analysis

Optionally specify which directories to analyze within a ZIP:
POST /analyze/{zip_id}
Content-Type: application/json

{
  "directories": ["work-projects", "open-source"]
}
The system resolves directory names relative to the extraction root:
from pathlib import Path

def resolve_selected_dirs(extraction_path: Path, directories: list[str]) -> list[Path]:
    """Resolve user-selected directories into valid paths under extraction root."""
    extraction_root = extraction_path.resolve()
    base_paths: list[Path] = []
    
    for raw in directories:
        candidate = extraction_root / Path(raw)
        if candidate.exists() and candidate.is_dir():
            base_paths.append(candidate)
    
    return base_paths
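Names that do not resolve to an existing directory are dropped silently rather than raising an error. A condensed, runnable restatement of that behavior:

```python
import tempfile
from pathlib import Path

def resolve_selected_dirs(extraction_path: Path, directories: list[str]) -> list[Path]:
    # Condensed from the function above: keep only names that are real directories
    root = Path(extraction_path).resolve()
    return [root / d for d in directories if (root / d).is_dir()]

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "work-projects").mkdir()
    resolved = resolve_selected_dirs(Path(tmp), ["work-projects", "missing-dir"])
    names = [p.name for p in resolved]
    # unknown names are dropped: names == ["work-projects"]
```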

Project Ranking

Ranking Algorithm

Projects are ranked based on user contribution significance:
from typing import Dict, List

def rank_projects(extraction_path: str, user_email: str) -> List[Dict]:
    """Rank projects by user contribution percentage and commit volume."""
    rankings = []
    for repo in discover_repos(extraction_path):
        user_stats = get_user_repo_stats(repo, user_email)
        score = calculate_ranking_score(
            user_commits=user_stats.total_commits,
            total_commits=repo.total_commits,
            contribution_pct=user_stats.contribution_percentage
        )
        rankings.append({
            "name": repo.project_name,
            "score": score,
            "total_commits": repo.total_commits,
            "user_commits": user_stats.total_commits
        })
    return sorted(rankings, key=lambda x: x["score"], reverse=True)
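`calculate_ranking_score` itself is not shown in the excerpt. One plausible shape (hypothetical weights, not the project's actual formula) blends contribution share with log-scaled commit volume:

```python
import math

def calculate_ranking_score(user_commits: int, total_commits: int,
                            contribution_pct: float) -> float:
    """Hypothetical scoring: 60% weight on contribution share,
    40% on log-scaled commit volume relative to the repository."""
    share = contribution_pct / 100.0
    volume = math.log1p(user_commits) / math.log1p(max(total_commits, 1))
    return round(0.6 * share + 0.4 * volume, 4)

# The sole author of a repository scores 1.0; a 10% contributor scores lower.
```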

Ranking Persistence

Ranking scores are stored in the RepoStat table:
# From analyze.py:536-538
repo_stat.ranking_score = rank_info["score"]
repo_stat.ranked_at = datetime.now(UTC).replace(tzinfo=None)
db.commit()
Database Schema:
ranking_score = Column(Float, nullable=True)
ranked_at = Column(DateTime, nullable=True)

Default Project Sorting

Projects are sorted by ranking score, then by recency:
def _project_sort_key(project: RepoStat) -> tuple:
    ts = project.last_commit.timestamp() if project.last_commit else 0.0
    return (
        project.ranking_score is None,  # Unranked projects last
        -(project.ranking_score or 0.0),  # Higher scores first
        project.last_commit is None,  # Projects without commits last
        -ts,  # More recent first
        project.id  # Tie-breaker
    )
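The tuple above puts ranked projects first (highest score leading), then falls back to recency. A self-contained illustration with a simple stand-in for the `RepoStat` model:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class FakeRepoStat:  # stand-in for the ORM model, for illustration only
    id: int
    ranking_score: Optional[float]
    last_commit: Optional[datetime]

def _project_sort_key(project) -> tuple:
    ts = project.last_commit.timestamp() if project.last_commit else 0.0
    return (
        project.ranking_score is None,    # Unranked projects last
        -(project.ranking_score or 0.0),  # Higher scores first
        project.last_commit is None,      # Projects without commits last
        -ts,                              # More recent first
        project.id,                       # Tie-breaker
    )

projects = [
    FakeRepoStat(1, None, datetime(2024, 6, 1)),
    FakeRepoStat(2, 0.9, datetime(2024, 1, 1)),
    FakeRepoStat(3, 0.9, datetime(2024, 5, 1)),
]
ranked = sorted(projects, key=_project_sort_key)
# ids: [3, 2, 1] — equal scores break ties by recency; the unranked project is last
```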

Generate Portfolio View

Basic Generation

Generate a complete portfolio view from analyzed projects:
POST /portfolio/generate
Content-Type: application/json

{
  "portfolio_id": "550e8400-e29b-41d4-a716-446655440000"
}
Response:
{
  "success": true,
  "portfolio_id": "550e8400-e29b-41d4-a716-446655440000",
  "consent_level": "none",
  "generated_at": "2026-03-05T12:34:56",
  "preferences": {
    "showcase_project_ids": [],
    "project_order": [],
    "hidden_skills": [],
    "highlighted_skills": []
  },
  "projects": [
    {
      "id": 1,
      "project_name": "artifact-miner",
      "project_path": "./.extracted/1/artifact-miner",
      "languages": ["Python", "JavaScript"],
      "frameworks": ["FastAPI", "React"],
      "first_commit": "2024-01-15T10:00:00",
      "last_commit": "2024-12-20T18:30:00",
      "ranking_score": 0.87,
      "health_score": 92.5,
      "thumbnail_url": "/uploads/thumbnails/project-1.png",
      "user_role": "Lead Developer",
      "evidence": [
        {
          "id": 1,
          "type": "evaluation",
          "content": "API design and architecture: Clean API design with validation and DI shows architectural maturity.",
          "source": "FastAPI decorators; Pydantic models; Dependency injection",
          "date": "2024-12-20"
        }
      ]
    }
  ],
  "resume_items": [],
  "summaries": [],
  "skills_chronology": [
    {
      "date": "2024-01-15T10:00:00",
      "skill": "FastAPI",
      "project": "artifact-miner",
      "proficiency": 0.85,
      "category": "Web Frameworks"
    }
  ],
  "errors": []
}

Portfolio Path Filtering

The system filters projects by extraction path prefixes:
# From portfolio.py:195-208
extraction_prefixes = sorted({
    z.extraction_path.rstrip("/")
    for z in zips
    if z.extraction_path and z.extraction_path.strip()
})

projects = (
    db.query(RepoStat)
    .filter(RepoStat.deleted_at.is_(None))
    .filter(_build_path_boundary_filter(RepoStat.project_path, extraction_prefixes))
    .all()
)

def _build_path_boundary_filter(column, paths: list[str]):
    return or_(*[or_(column == path, column.like(f"{path}/%")) for path in paths])
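The `/`-delimited boundary matters: a bare `LIKE "{path}%"` would let `./extracted/1` accidentally match projects under `./extracted/10`. A pure-Python mirror of the predicate the SQL filter expresses:

```python
def path_within(prefix: str, candidate: str) -> bool:
    """Mirror of the SQL boundary filter: exact match or a '/'-delimited descendant."""
    prefix = prefix.rstrip("/")
    return candidate == prefix or candidate.startswith(prefix + "/")

# "./extracted/1" matches its own projects but not those under "./extracted/10"
```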

Representation Preferences

Showcase Project Selection

Limit portfolio to specific showcase projects:
POST /portfolio/{portfolio_id}/edit
Content-Type: application/json

{
  "showcase_project_ids": [1, 3, 5]
}
Projects are matched by ID or name:
def _project_tokens(project: RepoStat) -> set[str]:
    return {str(project.id), project.project_name}

showcase = set(_normalize_tokens(prefs.showcase_project_ids))
matching = [p for p in ordered if _project_tokens(p) & showcase]
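A self-contained sketch of the token matching, using plain dicts in place of `RepoStat` rows:

```python
def _project_tokens(project) -> set:
    """A project matches by either its numeric ID (as a string) or its name."""
    return {str(project["id"]), project["name"]}

projects = [{"id": 1, "name": "artifact-miner"}, {"id": 2, "name": "demo-app"}]
showcase = {"1", "demo-app"}  # one token is an ID, the other a name
matching = [p for p in projects if _project_tokens(p) & showcase]
# both projects match: the first by ID, the second by name
```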

Custom Project Ordering

Override default ranking with manual order:
POST /portfolio/{portfolio_id}/edit
Content-Type: application/json

{
  "project_order": ["my-app", "side-project", "open-source-contrib"]
}
Ordering logic:
order_index = {token: idx for idx, token in enumerate(order_tokens)}

return sorted(
    ordered,
    key=lambda project: (
        min(
            (
                order_index[token]
                for token in _project_tokens(project)
                if token in order_index
            ),
            default=len(order_index) + 1,
        ),
        _project_sort_key(project),
    ),
)
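The same logic, condensed into a runnable sketch (plain dicts stand in for `RepoStat` rows; the real code also appends `_project_sort_key` as a secondary tie-breaker):

```python
def _project_tokens(project) -> set:
    # A project matches an order token by ID or by name
    return {str(project["id"]), project["name"]}

order_tokens = ["my-app", "side-project", "open-source-contrib"]
order_index = {token: idx for idx, token in enumerate(order_tokens)}

projects = [
    {"id": 1, "name": "open-source-contrib"},
    {"id": 2, "name": "side-project"},
    {"id": 3, "name": "my-app"},
]
ordered = sorted(
    projects,
    key=lambda project: min(
        (order_index[t] for t in _project_tokens(project) if t in order_index),
        default=len(order_index) + 1,  # unlisted projects sort after listed ones
    ),
)
# ordered names: my-app, side-project, open-source-contrib
```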

Skill Highlighting and Filtering

POST /portfolio/{portfolio_id}/edit
Content-Type: application/json

{
  "highlighted_skills": ["FastAPI", "React", "PostgreSQL"],
  "hidden_skills": ["Basic HTML/CSS"]
}

Skill Extraction

Learn how skills are detected and tracked across projects

Evidence Tracking

Understand how project evidence is extracted and stored

Resume Generation

Generate resume items from portfolio analysis

API Reference

Complete API documentation for portfolio endpoints

Error Handling

Portfolio Not Found

{
  "status_code": 404,
  "detail": "Portfolio not found."
}

No Analyzed Projects

{
  "status_code": 400,
  "detail": "Portfolio has no analyzed ZIPs yet. Run /analyze/{zip_id} for uploaded ZIPs first."
}

Preference Validation Errors

Errors are non-blocking and returned in the response:
{
  "success": true,
  "errors": [
    "No projects matched showcase_project_ids; returning all projects.",
    "project_order references unknown projects: ['deleted-project']"
  ]
}
