Query: Who has touched these files most frequently?
UNWIND $paths AS path
MATCH (pr:PullRequest {repo: $repo})-[:TOUCHES]->(f:File {repo: $repo, path: path})
MATCH (pr)-[:AUTHORED_BY]->(d:Developer)
RETURN d.login AS login, count(*) AS touch_count
ORDER BY touch_count DESC
LIMIT 5
Python wrapper:
# app/services/graph_builder.py:294-324
async def get_file_experts(
    repo_full_name: str,
    file_paths: list[str],
    top_k: int = 5,
) -> list[dict]:
    async with get_session() as session:
        result = await session.run(
            """
            UNWIND $paths AS path
            MATCH (pr:PullRequest {repo: $repo})-[:TOUCHES]->(f:File {repo: $repo, path: path})
            MATCH (pr)-[:AUTHORED_BY]->(d:Developer)
            RETURN d.login AS login, count(*) AS touch_count
            ORDER BY touch_count DESC
            LIMIT $top_k
            """,
            repo=repo_full_name,
            paths=file_paths,
            top_k=top_k,
        )
        return [{"login": r["login"], "touch_count": r["touch_count"]} async for r in result]
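One consequence of `count(*)` in this query is that `touch_count` counts (PR, file) pairs rather than distinct PRs: a PR that touches two of the requested files adds two to its author's score. The in-memory sketch below mirrors that counting under the assumption that each PR has exactly one author. The names `file_experts`, `touches`, and `authors` are hypothetical, chosen for illustration only, and are not part of Nectr.

```python
from collections import Counter


def file_experts(touches, authors, paths, top_k=5):
    """Rank authors by how often their PRs touch the given paths.

    touches: list of (pr_number, path) pairs, standing in for TOUCHES edges.
    authors: dict mapping pr_number -> login, standing in for AUTHORED_BY edges.
    """
    counts = Counter()
    for path in paths:                      # UNWIND $paths AS path
        for pr_number, touched in touches:  # MATCH (pr)-[:TOUCHES]->(f {path: path})
            if touched == path and pr_number in authors:
                counts[authors[pr_number]] += 1  # count(*) grouped by d.login
    return [
        {"login": login, "touch_count": n}
        for login, n in counts.most_common(top_k)  # ORDER BY ... DESC LIMIT $top_k
    ]


# Hypothetical data: PR 1 (alice) touches both files, PR 2 (bob) touches one.
touches = [(1, "app/a.py"), (1, "app/b.py"), (2, "app/a.py")]
authors = {1: "alice", 2: "bob"}
experts = file_experts(touches, authors, ["app/a.py", "app/b.py"])
```

Here alice scores 2 from a single PR because that PR touches both requested files, while bob scores 1.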
Query: Which PRs touched the same files (structural similarity)?
UNWIND $paths AS path
MATCH (pr:PullRequest {repo: $repo})-[:TOUCHES]->(f:File {repo: $repo, path: path})
WHERE ($exclude IS NULL OR pr.number <> $exclude)
  AND pr.verdict IS NOT NULL
WITH pr, count(DISTINCT f) AS overlap
ORDER BY overlap DESC
LIMIT $top_k
RETURN pr.number AS number, pr.title AS title, pr.author AS author,
       pr.verdict AS verdict, overlap
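The similarity query takes two parameters that the file-expert query does not: a nullable `$exclude` and `$top_k`. A wrapper in the style of `get_file_experts` might look like the sketch below. The function name `find_similar_prs`, the constants, and the explicit `session` argument are assumptions made for illustration and are not taken from Nectr's code. The session is assumed to behave like the async session used above, with an awaitable `run` whose result supports `async for`.

```python
SIMILAR_PRS_QUERY = """
UNWIND $paths AS path
MATCH (pr:PullRequest {repo: $repo})-[:TOUCHES]->(f:File {repo: $repo, path: path})
WHERE ($exclude IS NULL OR pr.number <> $exclude)
  AND pr.verdict IS NOT NULL
WITH pr, count(DISTINCT f) AS overlap
ORDER BY overlap DESC
LIMIT $top_k
RETURN pr.number AS number, pr.title AS title, pr.author AS author,
       pr.verdict AS verdict, overlap
"""

# Columns returned by the query, copied into plain dicts below.
FIELDS = ("number", "title", "author", "verdict", "overlap")


async def find_similar_prs(session, repo, paths, exclude=None, top_k=5):
    """Hypothetical wrapper: run the similarity query and return plain dicts."""
    result = await session.run(
        SIMILAR_PRS_QUERY,
        repo=repo,
        paths=paths,
        exclude=exclude,  # None is sent as Cypher null, which disables the filter
        top_k=top_k,
    )
    return [{key: record[key] for key in FIELDS} async for record in result]
```

Passing the number of the PR under review as `exclude` keeps it out of its own results. Leaving `exclude` as `None` returns every reviewed PR that overlaps.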
Query: Which of the referenced issues exist, and which PRs close them?
UNWIND $nums AS num
OPTIONAL MATCH (i:Issue {repo: $repo, number: num})
OPTIONAL MATCH (pr:PullRequest)-[:CLOSES]->(i)
RETURN num, i IS NOT NULL AS found, collect(pr.number) AS closed_by
Query: Which files are touched most frequently (high churn)?
MATCH (pr:PullRequest {repo: $repo})-[:TOUCHES]->(f:File)
RETURN f.path AS path, f.language AS language, count(pr) AS pr_count
ORDER BY pr_count DESC
LIMIT 10
Query: Which files repeatedly get REQUEST_CHANGES verdict (fragile code)?
MATCH (pr:PullRequest {repo: $repo, verdict: "REQUEST_CHANGES"})-[:TOUCHES]->(f:File)
RETURN f.path AS path, f.language AS language, count(pr) AS risk_count
ORDER BY risk_count DESC
LIMIT 8
Query: Which files in the repo have never been touched by any PR?
MATCH (r:Repository {full_name: $repo})-[:CONTAINS]->(f:File)
WHERE NOT (f)<-[:TOUCHES]-()
RETURN f.path AS path, f.language AS language
ORDER BY f.path
LIMIT 12
Query: For each high-churn file, who is the dominant contributor?
MATCH (pr:PullRequest {repo: $repo})-[:AUTHORED_BY]->(d:Developer),
      (pr)-[:TOUCHES]->(f:File)
WITH f.path AS path, d.login AS dev, count(*) AS touches
ORDER BY path, touches DESC
WITH path,
     collect({dev: dev, touches: touches})[0] AS top_owner,
     sum(touches) AS total_touches
WHERE total_touches >= 2
RETURN path, top_owner.dev AS owner, top_owner.touches AS owner_touches, total_touches
ORDER BY total_touches DESC
LIMIT 10
Query: For each developer, which top-level directories do they contribute to most?
MATCH (pr:PullRequest {repo: $repo})-[:AUTHORED_BY]->(d:Developer),
      (pr)-[:TOUCHES]->(f:File)
WITH d.login AS dev,
     CASE
       WHEN size(split(f.path, '/')) > 1 THEN split(f.path, '/')[0]
       ELSE '(root)'
     END AS directory,
     count(*) AS touches
ORDER BY dev, touches DESC
WITH dev,
     collect({directory: directory, touches: touches})[0..4] AS top_dirs,
     sum(touches) AS total_touches
RETURN dev, top_dirs, total_touches
ORDER BY total_touches DESC
LIMIT 8
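Two details of this query are easy to miss. The `CASE` expression buckets any path without a `/` under `(root)`. The slice `[0..4]` excludes its end index, so `top_dirs` holds at most four entries, while `total_touches` still sums over every directory. The Python sketch below reproduces the same bucketing on in-memory data. The names `top_level_dir` and `top_dirs_per_dev` and the sample rows are hypothetical, used only to illustrate the logic.

```python
from collections import Counter, defaultdict


def top_level_dir(path):
    """Mirror of the CASE expression: first path segment, or '(root)'."""
    parts = path.split("/")
    return parts[0] if len(parts) > 1 else "(root)"


def top_dirs_per_dev(rows, limit=4):
    """rows: iterable of (dev, path) pairs, one per (PR, file) touch."""
    per_dev = defaultdict(Counter)
    for dev, path in rows:
        per_dev[dev][top_level_dir(path)] += 1
    summary = {}
    for dev, dirs in per_dev.items():
        summary[dev] = {
            # Cypher's [0..4] excludes the end index, so keep at most 4 entries.
            "top_dirs": dirs.most_common(limit),
            # The total covers all directories, not only the ones kept above.
            "total_touches": sum(dirs.values()),
        }
    return summary


# Hypothetical sample rows.
rows = [
    ("alice", "app/core/x.py"),
    ("alice", "app/api/y.py"),
    ("alice", "README.md"),
    ("bob", "tests/t.py"),
]
summary = top_dirs_per_dev(rows)
```

With these rows, alice's top directories are `app` with 2 touches and `(root)` with 1, for a total of 3.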
Nectr creates constraints and indexes on startup for performance:
// app/core/neo4j_schema.py:10-45

// Unique constraints (also create indexes)
CREATE CONSTRAINT repo_full_name_unique IF NOT EXISTS
FOR (r:Repository) REQUIRE r.full_name IS UNIQUE;

CREATE CONSTRAINT file_composite_unique IF NOT EXISTS
FOR (f:File) REQUIRE (f.repo, f.path) IS UNIQUE;

CREATE CONSTRAINT pr_composite_unique IF NOT EXISTS
FOR (pr:PullRequest) REQUIRE (pr.repo, pr.number) IS UNIQUE;

CREATE CONSTRAINT developer_login_unique IF NOT EXISTS
FOR (d:Developer) REQUIRE d.login IS UNIQUE;

CREATE CONSTRAINT issue_composite_unique IF NOT EXISTS
FOR (i:Issue) REQUIRE (i.repo, i.number) IS UNIQUE;

// Indexes for common query patterns
CREATE INDEX pr_verdict IF NOT EXISTS
FOR (pr:PullRequest) ON (pr.verdict);

CREATE INDEX file_language IF NOT EXISTS
FOR (f:File) ON (f.language);
Every constraint and index statement uses IF NOT EXISTS, so the schema setup is idempotent and safe to run on every deployment.
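A startup hook that applies these statements could be sketched as follows. This is an illustration under stated assumptions, not the contents of `neo4j_schema.py`: the names `SCHEMA_STATEMENTS` and `ensure_schema` are hypothetical, only three of the statements are reproduced for brevity, and the session is assumed to behave like the Neo4j Python driver's async session (awaitable `run`, result with an awaitable `consume`). Each statement is sent in its own `run` call without the trailing semicolon.

```python
# Subset of the schema statements above, one string per statement.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT repo_full_name_unique IF NOT EXISTS "
    "FOR (r:Repository) REQUIRE r.full_name IS UNIQUE",
    "CREATE CONSTRAINT file_composite_unique IF NOT EXISTS "
    "FOR (f:File) REQUIRE (f.repo, f.path) IS UNIQUE",
    "CREATE INDEX pr_verdict IF NOT EXISTS "
    "FOR (pr:PullRequest) ON (pr.verdict)",
]


async def ensure_schema(session, statements=SCHEMA_STATEMENTS):
    """Run each schema statement on its own; return how many were issued."""
    for statement in statements:
        result = await session.run(statement)
        await result.consume()  # wait until the server has finished the statement
    return len(statements)
```

Because of IF NOT EXISTS, calling `ensure_schema` a second time should leave an existing schema unchanged rather than raise an error.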