
Overview

The Coding Practice feature provides LeetCode- and StrataScratch-style data science coding challenges. Practice SQL and Pandas problems with realistic datasets, get instant AI evaluation, and learn from model answers in both languages.

Getting Started

1

Select Difficulty Level

Choose from Easy, Medium, or Hard problems. Start with Easy to build fundamentals before tackling harder challenges.
2

Pick Question Count

Generate 1-5 problems per practice session. For focused learning, stick to 2-3 problems at a time.
3

Read Problem Statement

Each problem includes:
  • Clear business problem description
  • Table schemas with column names and types
  • Sample data (4-5 rows per table)
  • Expected output columns
4

Choose Your Language

Solve using either SQL or Python (Pandas). You can switch languages and compare approaches.
5

Submit Solution

Write your code in the editor and submit. The AI evaluates against multiple test cases and provides detailed feedback.

Question Generation

Problems are dynamically generated by AI to ensure variety:
# Source: backend/coding_engine.py:32
generate_coding_questions(question_count=3, difficulty="medium")

What Makes Questions Realistic

Based on Real Interview Platforms:
  • Mirrors LeetCode Database problems
  • Inspired by StrataScratch data science challenges
  • Reflects actual technical screening questions from top tech companies
Problem Structure:
{
  "id": "q1",
  "title": "Find Top Salaries by Department",
  "difficulty": "medium",
  "description": "Write a query to find the top 3 highest paid employees in each department. Return department_name, employee_name, and salary.",
  "tables": [
    {
      "name": "employees",
      "schema": "id INT, name VARCHAR, salary INT, department_id INT",
      "sample_data": "[{'id':1, 'name':'Alice', 'salary':100000, 'department_id':1}, ...]"
    },
    {
      "name": "departments",
      "schema": "id INT, name VARCHAR",
      "sample_data": "[{'id':1, 'name':'Engineering'}, ...]"
    }
  ],
  "expected_output_columns": ["department_name", "employee_name", "salary"]
}
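Given a spec in this shape, the sample data can be loaded into DataFrames for local prototyping. A minimal sketch (the spec literal and variable names below are illustrative, not actual generated output):

```python
import pandas as pd

# Hand-written spec in the structure shown above (illustrative, not generated output)
problem = {
    "id": "q1",
    "tables": [
        {"name": "employees",
         "sample_data": [{"id": 1, "name": "Alice", "salary": 100000, "department_id": 1},
                         {"id": 2, "name": "Bob", "salary": 90000, "department_id": 1}]},
        {"name": "departments",
         "sample_data": [{"id": 1, "name": "Engineering"}]},
    ],
}

# Build one DataFrame per table so a solution can be prototyped locally
frames = {t["name"]: pd.DataFrame(t["sample_data"]) for t in problem["tables"]}
employees, departments = frames["employees"], frames["departments"]
```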

Concepts Tested

SQL Skills:
  • Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF
  • Aggregations: GROUP BY, HAVING, COUNT(), SUM(), AVG(), MAX(), MIN()
  • CTEs: WITH clauses for complex multi-step queries
  • Subqueries: Correlated and non-correlated
  • String/Date Functions: SUBSTRING(), CONCAT(), DATE_DIFF(), etc.
Pandas Skills:
  • Data Manipulation: groupby(), merge(), apply(), transform()
  • Filtering: loc[], iloc[], boolean indexing, query()
  • Aggregation: agg(), pivot_table(), crosstab()
  • Window Operations: rolling(), expanding(), rank()
  • String Operations: str.contains(), str.split(), str.replace()
  • DateTime: to_datetime(), dt accessor methods
Problems are designed to be solvable in both SQL and Pandas. This helps you understand how database operations translate to DataFrame manipulations.
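To see that equivalence concretely, here is a small sketch that runs the same aggregation both ways, using Python's built-in sqlite3 for the SQL side (an assumption for illustration; the platform's actual SQL engine isn't specified):

```python
import sqlite3
import pandas as pd

rows = [("Alice", 100000, 1), ("Bob", 60000, 1), ("Cara", 80000, 2)]
df = pd.DataFrame(rows, columns=["name", "salary", "department_id"])

# SQL side: same data in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT, department_id INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    "SELECT department_id, AVG(salary) FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# Pandas side: the same aggregation as a groupby
pd_result = df.groupby("department_id")["salary"].mean().reset_index()
```

Both sides compute the same per-department averages, which is exactly the mapping the problems are built around.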

Difficulty Levels

Easy Problems

Concepts: Basic SELECT, simple WHERE clauses, single aggregations
Example: “Find all employees with salary > 50000. Return name and salary.”
SQL: Simple SELECT with WHERE
Pandas: df[df['salary'] > 50000][['name', 'salary']]

Medium Problems

Concepts: JOINs, GROUP BY with HAVING, window functions, CTEs
Example: “Find departments with average salary > 75000. Return department name and average salary, sorted by average salary descending.”
SQL: JOIN + GROUP BY + HAVING + ORDER BY
Pandas: merge() + groupby() + filter() + sort_values()
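The medium-level pipeline (merge, groupby, filter on the aggregate, sort) might look like this on hand-made data (table contents and column names are invented for illustration):

```python
import pandas as pd

employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "salary": [100000, 60000, 90000, 50000],
    "department_id": [1, 1, 2, 3],
})
departments = pd.DataFrame({"id": [1, 2, 3], "name": ["Eng", "Data", "Ops"]})

# JOIN -> merge; colliding 'name' columns get explicit suffixes
merged = employees.merge(departments, left_on="department_id", right_on="id",
                         suffixes=("_emp", "_dept"))

# GROUP BY + HAVING AVG(salary) > 75000 -> groupby, then filter on the mean
avg = (merged.groupby("name_dept")["salary"].mean()
             .reset_index(name="avg_salary"))
result = (avg[avg["avg_salary"] > 75000]
          .sort_values("avg_salary", ascending=False))
```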

Hard Problems

Concepts: Complex window functions, self-joins, multiple CTEs, advanced aggregations
Example: “Find employees whose salary is in the top 10% within their department. For each, show their salary percentile rank.”
SQL: Window functions with PERCENT_RANK(), partitioning, CTEs
Pandas: groupby() + transform() + quantile() + complex filtering
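One possible Pandas approach to the hard example, sketched on synthetic data (the 10-row dataset is invented; transform() broadcasts the per-department cutoff back onto each row):

```python
import pandas as pd

df = pd.DataFrame({
    "name": [f"e{i}" for i in range(10)],
    "salary": [50000 + 5000 * i for i in range(10)],
    "department_id": [1] * 10,
})

# Broadcast each department's 90th-percentile cutoff back onto its rows
cutoff = df.groupby("department_id")["salary"].transform(lambda s: s.quantile(0.9))
top = df[df["salary"] >= cutoff].copy()

# Percentile rank of each surviving salary within its department
top["pct_rank"] = df.groupby("department_id")["salary"].rank(pct=True).loc[top.index]
```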

AI Evaluation System

Your code is evaluated by Mistral AI against comprehensive test cases:
# Source: backend/coding_engine.py:92
evaluate_coding_answer(question, user_code, language="sql")

Evaluation Criteria

1. Correctness (60%)
  • Does the query return the correct results?
  • Are edge cases handled (nulls, duplicates, empty results)?
  • Is the output structure correct (column names, types)?
2. Code Quality (20%)
  • Is the solution efficient?
  • Are best practices followed (proper JOINs vs subqueries, avoiding SELECT *)?
  • Is the code readable and well-structured?
3. Edge Case Handling (20%)
  • Null values handled appropriately?
  • Duplicate records managed correctly (DISTINCT, RANK vs DENSE_RANK)?
  • Empty table scenarios considered?
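As an illustration only (the function name and 0-100 sub-scores are assumptions, not the actual rubric code), the 60/20/20 weights above might combine like this:

```python
def weighted_score(correctness: float, quality: float, edge_cases: float) -> float:
    """Combine 0-100 rubric sub-scores using the documented 60/20/20 weights."""
    return round(0.6 * correctness + 0.2 * quality + 0.2 * edge_cases, 1)

# A submission that is fully correct but mediocre on style and edge cases
# still lands well above one that fails correctness outright
```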

Test Cases

Each submission is tested against:
[
  {
    "description": "Basic exact match",
    "passed": true/false
  },
  {
    "description": "Handles null/empty values",
    "passed": true/false
  },
  {
    "description": "Handles duplicate/tied records appropriately",
    "passed": true/false
  }
]
The AI simulates execution against standard edge cases. Even if your code “looks right,” you might fail tests for missing DISTINCT, using RANK instead of DENSE_RANK, or not handling nulls.

Scoring

  • Passed: All test cases passed (score 85-100)
  • Partial: Some test cases passed (score 40-84)
  • Failed: Most/all test cases failed (score 0-39)
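These bands map to a simple bucket function; a hypothetical sketch (the actual implementation may differ):

```python
def verdict(score: int) -> str:
    """Map a 0-100 score to the documented Passed / Partial / Failed bands."""
    if score >= 85:
        return "Passed"
    if score >= 40:
        return "Partial"
    return "Failed"
```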

Detailed Feedback

Every submission receives:

1. Pass/Fail Status

Clear verdict: “✅ Passed all test cases” or “❌ Failed 2/3 test cases”

2. Specific Feedback (2-3 sentences)

Not generic: the feedback tells you exactly what’s wrong. Good feedback examples:
  • “You forgot to handle ties by using DENSE_RANK() instead of RANK(). This causes gaps in ranking when salaries are equal.”
  • “Your Pandas solution doesn’t handle null department_ids. Use dropna() or fillna() before the merge.”
  • “The query is correct but inefficient. Use a window function instead of a correlated subquery for better performance.”
What You Won’t See:
  • Generic: “Try harder”
  • Vague: “Something is wrong with your query”
  • Unhelpful: “Read the problem again”

3. Test Case Breakdown

See which specific cases passed/failed:
✅ Basic exact match: Passed
❌ Handles null/empty values: Failed
✅ Handles duplicate/tied records appropriately: Passed
This pinpoints where your logic fails.

4. Model Answers (Both Languages!)

Regardless of which language you used, you get optimal solutions in BOTH SQL and Pandas: SQL Model Answer:
-- Find top 3 salaries per department
WITH RankedSalaries AS (
  SELECT 
    d.name AS department_name,
    e.name AS employee_name,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
  FROM employees e
  JOIN departments d ON e.department_id = d.id
)
SELECT department_name, employee_name, salary
FROM RankedSalaries
WHERE salary_rank <= 3  -- avoid the alias "rank": it is reserved in some dialects (e.g. MySQL 8)
ORDER BY department_name, salary DESC;
Pandas Model Answer:
# Find top 3 salaries per department
import pandas as pd

# Merge employees with departments (colliding 'name'/'id' columns get _x/_y suffixes)
df = employees.merge(departments, left_on='department_id', right_on='id', how='inner')

# Rank within each department
df['rank'] = df.groupby('department_id')['salary'].rank(method='dense', ascending=False)

# Filter top 3 and select columns
result = df[df['rank'] <= 3][['name_y', 'name_x', 'salary']]
result.columns = ['department_name', 'employee_name', 'salary']
result = result.sort_values(['department_name', 'salary'], ascending=[True, False])
Studying both solutions helps you understand the conceptual mapping between SQL operations and Pandas methods. This is crucial for data engineering roles.
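To verify the dense-rank logic of the Pandas model answer, here is a runnable variant on tiny hand-made tables (explicit merge suffixes are used instead of the default _x/_y):

```python
import pandas as pd

employees = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "salary": [100000, 90000, 90000, 80000],
    "department_id": [1, 1, 1, 1],
})
departments = pd.DataFrame({"id": [1], "name": ["Engineering"]})

# Same shape as the model answer, with explicit suffixes instead of _x/_y
df = employees.merge(departments, left_on="department_id", right_on="id",
                     suffixes=("_emp", "_dept"))
df["rnk"] = df.groupby("department_id")["salary"].rank(method="dense", ascending=False)

# Ties at 90000 share rank 2, so the next rank is 3 and all four rows pass rnk <= 3
result = df[df["rnk"] <= 3][["name_dept", "name_emp", "salary"]]
result.columns = ["department_name", "employee_name", "salary"]
```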

Example Workflow

1

Generate Problems

Request 3 medium difficulty problems. System generates:
  1. “Most Profitable Companies” (aggregation + ORDER BY)
  2. “Employee Salary Ranking” (window functions)
  3. “Department Average Above Company Average” (subqueries + JOIN)
2

Attempt Problem 1 (SQL)

Problem: Find the 3 most profitable companies
Your SQL:
SELECT company, profits 
FROM forbes_global 
ORDER BY profits DESC 
LIMIT 3;
Result: ✅ Passed (Score: 92/100)
Feedback: “Correct solution! Minor note: consider using TOP(3) for SQL Server compatibility instead of LIMIT.”
3

Attempt Problem 2 (SQL)

Problem: Rank employees by salary within each department
Your SQL (first attempt):
SELECT name, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
FROM employees;
Result: ❌ Failed 1/3 tests (Score: 58/100)
Feedback: “You used RANK(), which creates gaps for ties (1, 2, 2, 4). The problem expects DENSE_RANK() for consecutive ranking (1, 2, 2, 3).”
4

Revise and Resubmit

Your SQL (revised):
SELECT 
  d.name AS department_name,
  e.name AS employee_name, 
  e.salary,
  DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
FROM employees e
JOIN departments d ON e.department_id = d.id;
Result: ✅ Passed (Score: 95/100)
5

Compare with Pandas Solution

View the model Pandas answer to see how the same logic translates:
  • JOIN → merge()
  • PARTITION BY → groupby()
  • DENSE_RANK() → rank(method='dense')

Practice Strategies

For SQL Mastery

Beginner Level (Weeks 1-2)
  • Focus on: SELECT, WHERE, ORDER BY, basic JOINs
  • Do 5 easy problems daily
  • Goal: 90%+ pass rate on easy problems
Intermediate Level (Weeks 3-6)
  • Focus on: Window functions, GROUP BY + HAVING, CTEs
  • Do 3 medium problems daily
  • Compare your solution to model answer efficiency
  • Goal: Understand RANK vs DENSE_RANK vs ROW_NUMBER
Advanced Level (Weeks 7+)
  • Focus on: Complex window functions, self-joins, query optimization
  • Do 2 hard problems daily
  • Try solving same problem multiple ways
  • Goal: Write production-quality, optimized queries

For Pandas Mastery

Beginner Level
  • Master: boolean indexing, loc/iloc, basic groupby()
  • Translate easy SQL problems to Pandas
  • Goal: Understand DataFrame as in-memory SQL table
Intermediate Level
  • Master: merge operations, transform(), apply()
  • Practice: multi-level groupby, pivot tables
  • Goal: Solve without looking at SQL first
Advanced Level
  • Master: Custom aggregations, method chaining, performance optimization
  • Challenge: Solve hard SQL problems in Pandas
  • Goal: Know when Pandas is better than SQL (and vice versa)

Cross-Training Strategy

Best Practice: Solve each problem in BOTH languages even if you only need one for your role. This deepens your understanding of data manipulation concepts.
Workflow:
  1. Solve in your primary language (SQL or Pandas)
  2. Get it passing all test cases
  3. Look at model answer in OTHER language
  4. Re-solve using the other language
  5. Compare execution logic
Key Mappings to Learn:
  • SELECT cols FROM table WHERE condition → df[condition][cols]
  • GROUP BY col HAVING AVG(val) > X → df.groupby('col').filter(lambda g: g['val'].mean() > X)
  • JOIN table2 ON table1.id = table2.id → df1.merge(df2, on='id')
  • RANK() OVER (PARTITION BY col ORDER BY val) → df.groupby('col')['val'].rank()
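These mappings can be exercised on a small DataFrame; a runnable sketch of the HAVING and RANK rows (method='min' approximates SQL's RANK(), which leaves gaps after ties; the data is invented):

```python
import pandas as pd

df = pd.DataFrame({"dept": ["a", "a", "b", "b"],
                   "val": [100, 200, 10, 20]})

# GROUP BY dept HAVING AVG(val) > 50  ->  groupby + filter on the group mean
kept = df.groupby("dept").filter(lambda g: g["val"].mean() > 50)

# RANK() OVER (PARTITION BY dept ORDER BY val DESC)  ->  groupby + rank
df["rnk"] = df.groupby("dept")["val"].rank(method="min", ascending=False)
```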

Common Mistakes

SQL Mistakes

1. RANK vs DENSE_RANK
-- Wrong for "top 3" problems with ties
RANK() OVER (ORDER BY salary DESC) -- produces 1,2,2,4,5...

-- Correct
DENSE_RANK() OVER (ORDER BY salary DESC) -- produces 1,2,2,3,4...
2. Forgetting NULL Handling
-- Silently drops employees whose department_id is NULL (or unmatched)
JOIN departments d ON e.department_id = d.id

-- Better: keep unmatched rows with a LEFT JOIN,
-- and make any NULL exclusion explicit
LEFT JOIN departments d ON e.department_id = d.id
WHERE e.department_id IS NOT NULL
3. Inefficient Correlated Subqueries
-- Slow: runs subquery for every row
SELECT name, (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
FROM employees e1;

-- Fast: window function
SELECT name, AVG(salary) OVER (PARTITION BY dept)
FROM employees;
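Both queries return the same rows, which can be checked with Python's built-in sqlite3 (window functions require SQLite ≥ 3.25; table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Alice", "eng", 100), ("Bob", "eng", 80), ("Cara", "ops", 60)])

# Correlated subquery: re-evaluated for every outer row
slow = conn.execute("""
    SELECT name, (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
    FROM employees e1""").fetchall()

# Window function: one pass per partition
fast = conn.execute("""
    SELECT name, AVG(salary) OVER (PARTITION BY dept)
    FROM employees""").fetchall()
```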

Pandas Mistakes

1. Chained Indexing (SettingWithCopyWarning)
# Wrong: may not work as expected
df[df['salary'] > 50000]['bonus'] = 1000

# Correct
df.loc[df['salary'] > 50000, 'bonus'] = 1000
2. Not Resetting Index After Groupby
# Returns a Series indexed by dept, not a DataFrame
df.groupby('dept')['salary'].mean()

# Returns DataFrame with regular index
df.groupby('dept')['salary'].mean().reset_index()
3. Forgetting to Sort Before Ranking
# Rank values without sorting first (rank != position)
df['rank'] = df['salary'].rank(ascending=False)

# For leaderboard, you need both rank AND sort
df['rank'] = df['salary'].rank(method='dense', ascending=False)
df = df.sort_values('salary', ascending=False)

Technical Details

AI Model

  • Engine: Mistral Large (via Mistral API)
  • Evaluation: Simulated execution with test case generation
  • Timeout: 30 seconds per evaluation
  • Fallback: If API fails, provides basic syntax feedback

Code Execution

Important: The system does NOT execute your code directly. Instead:
  1. AI analyzes your code syntax and logic
  2. Simulates execution against generated test cases
  3. Predicts output based on code understanding
  4. Compares predicted output to expected results
This is safe (no arbitrary code execution) but means:
  • Very novel syntax might confuse the AI
  • Stick to standard SQL/Pandas idioms for best results

Problem Generation

  • Prompt Engineering: Detailed prompts ensure realistic schemas and sample data
  • Validation: Generated problems checked for valid JSON structure
  • Fallback Problems: Pre-written problems used if generation fails
  • Variety: Session seed ensures different problems each time
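The validation-and-fallback step might look like this sketch (the function name, required keys, and fallback literal are assumptions, not the actual backend/coding_engine.py code):

```python
import json

# Hypothetical pre-written fallback used when generation fails
FALLBACK_PROBLEM = {"id": "fallback-1", "title": "Find Top Salaries by Department",
                    "difficulty": "medium", "tables": [], "expected_output_columns": []}

REQUIRED_KEYS = {"id", "title", "difficulty", "tables", "expected_output_columns"}

def parse_generated_problem(raw: str) -> dict:
    """Accept a generated problem only if it is valid JSON with the expected keys;
    otherwise fall back to a pre-written problem."""
    try:
        problem = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK_PROBLEM
    if not isinstance(problem, dict) or not REQUIRED_KEYS <= problem.keys():
        return FALLBACK_PROBLEM
    return problem
```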

Performance

  • Question generation: ~3-5 seconds for 3 problems
  • Code evaluation: ~2-4 seconds per submission
  • Model answer generation: Included in evaluation response
