
Overview

The Coding Practice feature provides LeetCode- and StrataScratch-style data science coding challenges. Practice SQL and Pandas problems with realistic datasets, get instant AI evaluation, and learn from model answers in both languages.

Getting Started

1

Select Difficulty Level

Choose from Easy, Medium, or Hard problems. Start with Easy to build fundamentals before tackling harder challenges.
2

Pick Question Count

Generate 1-5 problems per practice session. For focused learning, stick to 2-3 problems at a time.
3

Read Problem Statement

Each problem includes:
  • Clear business problem description
  • Table schemas with column names and types
  • Sample data (4-5 rows per table)
  • Expected output columns
4

Choose Your Language

Solve using either SQL or Python (Pandas). You can switch languages and compare approaches.
5

Submit Solution

Write your code in the editor and submit. The AI evaluates against multiple test cases and provides detailed feedback.

Question Generation

Problems are dynamically generated by AI to ensure variety:
# Source: backend/coding_engine.py:32
generate_coding_questions(question_count=3, difficulty="medium")

What Makes Questions Realistic

Based on Real Interview Platforms:
  • Mirrors LeetCode Database problems
  • Inspired by StrataScratch data science challenges
  • Reflects actual technical screening questions from top tech companies
Problem Structure:
{
  "id": "q1",
  "title": "Find Top Salaries by Department",
  "difficulty": "medium",
  "description": "Write a query to find the top 3 highest paid employees in each department. Return department_name, employee_name, and salary.",
  "tables": [
    {
      "name": "employees",
      "schema": "id INT, name VARCHAR, salary INT, department_id INT",
      "sample_data": "[{'id':1, 'name':'Alice', 'salary':100000, 'department_id':1}, ...]"
    },
    {
      "name": "departments",
      "schema": "id INT, name VARCHAR",
      "sample_data": "[{'id':1, 'name':'Engineering'}, ...]"
    }
  ],
  "expected_output_columns": ["department_name", "employee_name", "salary"]
}
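Given a spec in this shape, the sample data can be loaded into DataFrames for local prototyping. A minimal sketch (the spec literal and variable names below are illustrative, not actual generated output):

```python
import pandas as pd

# Hand-written spec in the structure shown above (illustrative, not generated output)
problem = {
    "id": "q1",
    "tables": [
        {"name": "employees",
         "sample_data": [{"id": 1, "name": "Alice", "salary": 100000, "department_id": 1},
                         {"id": 2, "name": "Bob", "salary": 90000, "department_id": 1}]},
        {"name": "departments",
         "sample_data": [{"id": 1, "name": "Engineering"}]},
    ],
}

# Build one DataFrame per table so a solution can be prototyped locally
frames = {t["name"]: pd.DataFrame(t["sample_data"]) for t in problem["tables"]}
employees, departments = frames["employees"], frames["departments"]
```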

Concepts Tested

SQL Skills:
  • Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE()
  • Joins: INNER, LEFT, RIGHT, FULL OUTER, CROSS, SELF
  • Aggregations: GROUP BY, HAVING, COUNT(), SUM(), AVG(), MAX(), MIN()
  • CTEs: WITH clauses for complex multi-step queries
  • Subqueries: Correlated and non-correlated
  • String/Date Functions: SUBSTRING(), CONCAT(), DATE_DIFF(), etc.
Pandas Skills:
  • Data Manipulation: groupby(), merge(), apply(), transform()
  • Filtering: loc[], iloc[], boolean indexing, query()
  • Aggregation: agg(), pivot_table(), crosstab()
  • Window Operations: rolling(), expanding(), rank()
  • String Operations: str.contains(), str.split(), str.replace()
  • DateTime: to_datetime(), dt accessor methods
Problems are designed to be solvable in both SQL and Pandas. This helps you understand how database operations translate to DataFrame manipulations.
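To see that equivalence concretely, here is a small sketch that runs the same aggregation both ways, using Python's built-in sqlite3 for the SQL side (an assumption for illustration; the platform's actual SQL engine isn't specified):

```python
import sqlite3
import pandas as pd

rows = [("Alice", 100000, 1), ("Bob", 60000, 1), ("Cara", 80000, 2)]
df = pd.DataFrame(rows, columns=["name", "salary", "department_id"])

# SQL side: same data in an in-memory SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INT, department_id INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    "SELECT department_id, AVG(salary) FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()

# Pandas side: the same aggregation as a groupby
pd_result = df.groupby("department_id")["salary"].mean().reset_index()
```

Both sides compute the same per-department averages, which is exactly the mapping the problems are built around.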

Difficulty Levels

Easy Problems

Concepts: Basic SELECT, simple WHERE clauses, single aggregations
Example: “Find all employees with salary > 50000. Return name and salary.”
SQL: Simple SELECT with WHERE
Pandas: df[df['salary'] > 50000][['name', 'salary']]

Medium Problems

Concepts: JOINs, GROUP BY with HAVING, window functions, CTEs
Example: “Find departments with average salary > 75000. Return department name and average salary, sorted by average salary descending.”
SQL: JOIN + GROUP BY + HAVING + ORDER BY
Pandas: merge() + groupby() + filter() + sort_values()
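The medium-level pipeline (merge, groupby, filter on the aggregate, sort) might look like this on hand-made data (table contents and column names are invented for illustration):

```python
import pandas as pd

employees = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "salary": [100000, 60000, 90000, 50000],
    "department_id": [1, 1, 2, 3],
})
departments = pd.DataFrame({"id": [1, 2, 3], "name": ["Eng", "Data", "Ops"]})

# JOIN -> merge; colliding 'name' columns get explicit suffixes
merged = employees.merge(departments, left_on="department_id", right_on="id",
                         suffixes=("_emp", "_dept"))

# GROUP BY + HAVING AVG(salary) > 75000 -> groupby, then filter on the mean
avg = (merged.groupby("name_dept")["salary"].mean()
             .reset_index(name="avg_salary"))
result = (avg[avg["avg_salary"] > 75000]
          .sort_values("avg_salary", ascending=False))
```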

Hard Problems

Concepts: Complex window functions, self-joins, multiple CTEs, advanced aggregations
Example: “Find employees whose salary is in the top 10% within their department. For each, show their salary percentile rank.”
SQL: Window functions with PERCENT_RANK(), partitioning, CTEs
Pandas: groupby() + transform() + quantile() + complex filtering
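One possible Pandas approach to the hard example, sketched on synthetic data (the 10-row dataset is invented; transform() broadcasts the per-department cutoff back onto each row):

```python
import pandas as pd

df = pd.DataFrame({
    "name": [f"e{i}" for i in range(10)],
    "salary": [50000 + 5000 * i for i in range(10)],
    "department_id": [1] * 10,
})

# Broadcast each department's 90th-percentile cutoff back onto its rows
cutoff = df.groupby("department_id")["salary"].transform(lambda s: s.quantile(0.9))
top = df[df["salary"] >= cutoff].copy()

# Percentile rank of each surviving salary within its department
top["pct_rank"] = df.groupby("department_id")["salary"].rank(pct=True).loc[top.index]
```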

AI Evaluation System

Your code is evaluated by Mistral AI against comprehensive test cases:
# Source: backend/coding_engine.py:92
evaluate_coding_answer(question, user_code, language="sql")

Evaluation Criteria

1. Correctness (60%)
  • Does the query return the correct results?
  • Are edge cases handled (nulls, duplicates, empty results)?
  • Is the output structure correct (column names, types)?
2. Code Quality (20%)
  • Is the solution efficient?
  • Are best practices followed (proper JOINs vs subqueries, avoiding SELECT *)?
  • Is the code readable and well-structured?
3. Edge Case Handling (20%)
  • Null values handled appropriately?
  • Duplicate records managed correctly (DISTINCT, RANK vs DENSE_RANK)?
  • Empty table scenarios considered?
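As an illustration only (the function name and 0-100 sub-scores are assumptions, not the actual rubric code), the 60/20/20 weights above might combine like this:

```python
def weighted_score(correctness: float, quality: float, edge_cases: float) -> float:
    """Combine 0-100 rubric sub-scores using the documented 60/20/20 weights."""
    return round(0.6 * correctness + 0.2 * quality + 0.2 * edge_cases, 1)

# A submission that is fully correct but mediocre on style and edge cases
# still lands well above one that fails correctness outright
```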

Test Cases

Each submission is tested against:
[
  {
    "description": "Basic exact match",
    "passed": true/false
  },
  {
    "description": "Handles null/empty values",
    "passed": true/false
  },
  {
    "description": "Handles duplicate/tied records appropriately",
    "passed": true/false
  }
]
The AI simulates execution against standard edge cases. Even if your code “looks right,” you might fail tests for missing DISTINCT, using RANK instead of DENSE_RANK, or not handling nulls.

Scoring

  • Passed: All test cases passed (score 85-100)
  • Partial: Some test cases passed (score 40-84)
  • Failed: Most/all test cases failed (score 0-39)
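These bands map to a simple bucket function; a hypothetical sketch (the actual implementation may differ):

```python
def verdict(score: int) -> str:
    """Map a 0-100 score to the documented Passed / Partial / Failed bands."""
    if score >= 85:
        return "Passed"
    if score >= 40:
        return "Partial"
    return "Failed"
```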

Detailed Feedback

Every submission receives:

1. Pass/Fail Status

Clear verdict: “✅ Passed all test cases” or “❌ Failed 2/3 test cases”

2. Specific Feedback (2-3 sentences)

Not generic: the feedback tells you exactly what’s wrong. Good feedback examples:
  • “You forgot to handle ties by using DENSE_RANK() instead of RANK(). This causes gaps in ranking when salaries are equal.”
  • “Your Pandas solution doesn’t handle null department_ids. Use dropna() or fillna() before the merge.”
  • “The query is correct but inefficient. Use a window function instead of a correlated subquery for better performance.”
What You Won’t See:
  • Generic: “Try harder”
  • Vague: “Something is wrong with your query”
  • Unhelpful: “Read the problem again”

3. Test Case Breakdown

See which specific cases passed/failed:
✅ Basic exact match: Passed
❌ Handles null/empty values: Failed
✅ Handles duplicate/tied records appropriately: Passed
This pinpoints where your logic fails.

4. Model Answers (Both Languages!)

Regardless of which language you used, you get optimal solutions in BOTH SQL and Pandas: SQL Model Answer:
-- Find top 3 salaries per department
WITH RankedSalaries AS (
  SELECT 
    d.name AS department_name,
    e.name AS employee_name,
    e.salary,
    DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
  FROM employees e
  JOIN departments d ON e.department_id = d.id
)
SELECT department_name, employee_name, salary
FROM RankedSalaries
WHERE salary_rank <= 3  -- avoid the alias "rank": it is reserved in some dialects (e.g. MySQL 8)
ORDER BY department_name, salary DESC;
Pandas Model Answer:
# Find top 3 salaries per department
import pandas as pd

# Merge employees with departments (colliding 'name'/'id' columns get _x/_y suffixes)
df = employees.merge(departments, left_on='department_id', right_on='id', how='inner')

# Rank within each department
df['rank'] = df.groupby('department_id')['salary'].rank(method='dense', ascending=False)

# Filter top 3 and select columns
result = df[df['rank'] <= 3][['name_y', 'name_x', 'salary']]
result.columns = ['department_name', 'employee_name', 'salary']
result = result.sort_values(['department_name', 'salary'], ascending=[True, False])
Studying both solutions helps you understand the conceptual mapping between SQL operations and Pandas methods. This is crucial for data engineering roles.
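To verify the dense-rank logic of the Pandas model answer, here is a runnable variant on tiny hand-made tables (explicit merge suffixes are used instead of the default _x/_y):

```python
import pandas as pd

employees = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "salary": [100000, 90000, 90000, 80000],
    "department_id": [1, 1, 1, 1],
})
departments = pd.DataFrame({"id": [1], "name": ["Engineering"]})

# Same shape as the model answer, with explicit suffixes instead of _x/_y
df = employees.merge(departments, left_on="department_id", right_on="id",
                     suffixes=("_emp", "_dept"))
df["rnk"] = df.groupby("department_id")["salary"].rank(method="dense", ascending=False)

# Ties at 90000 share rank 2, so the next rank is 3 and all four rows pass rnk <= 3
result = df[df["rnk"] <= 3][["name_dept", "name_emp", "salary"]]
result.columns = ["department_name", "employee_name", "salary"]
```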

Example Workflow

1

Generate Problems

Request 3 medium difficulty problems. System generates:
  1. “Most Profitable Companies” (aggregation + ORDER BY)
  2. “Employee Salary Ranking” (window functions)
  3. “Department Average Above Company Average” (subqueries + JOIN)
2

Attempt Problem 1 (SQL)

Problem: Find the 3 most profitable companies
Your SQL:
SELECT company, profits 
FROM forbes_global 
ORDER BY profits DESC 
LIMIT 3;
Result: ✅ Passed (Score: 92/100)
Feedback: “Correct solution! Minor note: consider using TOP(3) for SQL Server compatibility instead of LIMIT.”
3

Attempt Problem 2 (SQL)

Problem: Rank employees by salary within each department
Your SQL (first attempt):
SELECT name, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
FROM employees;
Result: ❌ Failed 1/3 tests (Score: 58/100)
Feedback: “You used RANK(), which creates gaps for ties (1, 2, 2, 4). The problem expects DENSE_RANK() for consecutive ranking (1, 2, 2, 3).”
4

Revise and Resubmit

Your SQL (revised):
SELECT 
  d.name AS department_name,
  e.name AS employee_name, 
  e.salary,
  DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS salary_rank
FROM employees e
JOIN departments d ON e.department_id = d.id;
Result: ✅ Passed (Score: 95/100)
5

Compare with Pandas Solution

View the model Pandas answer to see how the same logic translates:
  • JOIN → merge()
  • PARTITION BY → groupby()
  • DENSE_RANK() → rank(method='dense')

Practice Strategies

For SQL Mastery

Beginner Level (Weeks 1-2)
  • Focus on: SELECT, WHERE, ORDER BY, basic JOINs
  • Do 5 easy problems daily
  • Goal: 90%+ pass rate on easy problems
Intermediate Level (Weeks 3-6)
  • Focus on: Window functions, GROUP BY + HAVING, CTEs
  • Do 3 medium problems daily
  • Compare your solution to model answer efficiency
  • Goal: Understand RANK vs DENSE_RANK vs ROW_NUMBER
Advanced Level (Weeks 7+)
  • Focus on: Complex window functions, self-joins, query optimization
  • Do 2 hard problems daily
  • Try solving same problem multiple ways
  • Goal: Write production-quality, optimized queries

For Pandas Mastery

Beginner Level
  • Master: boolean indexing, loc/iloc, basic groupby()
  • Translate easy SQL problems to Pandas
  • Goal: Understand DataFrame as in-memory SQL table
Intermediate Level
  • Master: merge operations, transform(), apply()
  • Practice: multi-level groupby, pivot tables
  • Goal: Solve without looking at SQL first
Advanced Level
  • Master: Custom aggregations, method chaining, performance optimization
  • Challenge: Solve hard SQL problems in Pandas
  • Goal: Know when Pandas is better than SQL (and vice versa)

Cross-Training Strategy

Best Practice: Solve each problem in BOTH languages even if you only need one for your role. This deepens your understanding of data manipulation concepts.
Workflow:
  1. Solve in your primary language (SQL or Pandas)
  2. Get it passing all test cases
  3. Look at model answer in OTHER language
  4. Re-solve using the other language
  5. Compare execution logic
Key Mappings to Learn:
  • SELECT cols FROM table WHERE condition → df[condition][cols]
  • GROUP BY col HAVING AVG(val) > X → df.groupby('col').filter(lambda g: g['val'].mean() > X)
  • JOIN table2 ON table1.id = table2.id → df1.merge(df2, on='id')
  • RANK() OVER (PARTITION BY col ORDER BY val) → df.groupby('col')['val'].rank()
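These mappings can be exercised on a small DataFrame; a runnable sketch of the HAVING and RANK rows (method='min' approximates SQL's RANK(), which leaves gaps after ties; the data is invented):

```python
import pandas as pd

df = pd.DataFrame({"dept": ["a", "a", "b", "b"],
                   "val": [100, 200, 10, 20]})

# GROUP BY dept HAVING AVG(val) > 50  ->  groupby + filter on the group mean
kept = df.groupby("dept").filter(lambda g: g["val"].mean() > 50)

# RANK() OVER (PARTITION BY dept ORDER BY val DESC)  ->  groupby + rank
df["rnk"] = df.groupby("dept")["val"].rank(method="min", ascending=False)
```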

Common Mistakes

SQL Mistakes

1. RANK vs DENSE_RANK
-- Wrong for "top 3" problems with ties
RANK() OVER (ORDER BY salary DESC) -- produces 1,2,2,4,5...

-- Correct
DENSE_RANK() OVER (ORDER BY salary DESC) -- produces 1,2,2,3,4...
2. Forgetting NULL Handling
-- Silently drops employees whose department_id is NULL (or unmatched)
JOIN departments d ON e.department_id = d.id

-- Better: keep unmatched rows with a LEFT JOIN,
-- and make any NULL exclusion explicit
LEFT JOIN departments d ON e.department_id = d.id
WHERE e.department_id IS NOT NULL
3. Inefficient Correlated Subqueries
-- Slow: runs subquery for every row
SELECT name, (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
FROM employees e1;

-- Fast: window function
SELECT name, AVG(salary) OVER (PARTITION BY dept)
FROM employees;
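Both queries return the same rows, which can be checked with Python's built-in sqlite3 (window functions require SQLite ≥ 3.25; table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Alice", "eng", 100), ("Bob", "eng", 80), ("Cara", "ops", 60)])

# Correlated subquery: re-evaluated for every outer row
slow = conn.execute("""
    SELECT name, (SELECT AVG(salary) FROM employees e2 WHERE e2.dept = e1.dept)
    FROM employees e1""").fetchall()

# Window function: one pass per partition
fast = conn.execute("""
    SELECT name, AVG(salary) OVER (PARTITION BY dept)
    FROM employees""").fetchall()
```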

Pandas Mistakes

1. Chained Indexing (SettingWithCopyWarning)
# Wrong: may not work as expected
df[df['salary'] > 50000]['bonus'] = 1000

# Correct
df.loc[df['salary'] > 50000, 'bonus'] = 1000
2. Not Resetting Index After Groupby
# Returns a Series indexed by dept, not a DataFrame
df.groupby('dept')['salary'].mean()

# Returns DataFrame with regular index
df.groupby('dept')['salary'].mean().reset_index()
3. Forgetting to Sort Before Ranking
# Rank values without sorting first (rank != position)
df['rank'] = df['salary'].rank(ascending=False)

# For leaderboard, you need both rank AND sort
df['rank'] = df['salary'].rank(method='dense', ascending=False)
df = df.sort_values('salary', ascending=False)

Technical Details

AI Model

  • Engine: Mistral Large (via Mistral API)
  • Evaluation: Simulated execution with test case generation
  • Timeout: 30 seconds per evaluation
  • Fallback: If API fails, provides basic syntax feedback

Code Execution

Important: The system does NOT execute your code directly. Instead:
  1. AI analyzes your code syntax and logic
  2. Simulates execution against generated test cases
  3. Predicts output based on code understanding
  4. Compares predicted output to expected results
This is safe (no arbitrary code execution) but means:
  • Very novel syntax might confuse the AI
  • Stick to standard SQL/Pandas idioms for best results

Problem Generation

  • Prompt Engineering: Detailed prompts ensure realistic schemas and sample data
  • Validation: Generated problems checked for valid JSON structure
  • Fallback Problems: Pre-written problems used if generation fails
  • Variety: Session seed ensures different problems each time
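The validation-and-fallback step might look like this sketch (the function name, required keys, and fallback literal are assumptions, not the actual backend/coding_engine.py code):

```python
import json

# Hypothetical pre-written fallback used when generation fails
FALLBACK_PROBLEM = {"id": "fallback-1", "title": "Find Top Salaries by Department",
                    "difficulty": "medium", "tables": [], "expected_output_columns": []}

REQUIRED_KEYS = {"id", "title", "difficulty", "tables", "expected_output_columns"}

def parse_generated_problem(raw: str) -> dict:
    """Accept a generated problem only if it is valid JSON with the expected keys;
    otherwise fall back to a pre-written problem."""
    try:
        problem = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK_PROBLEM
    if not isinstance(problem, dict) or not REQUIRED_KEYS <= problem.keys():
        return FALLBACK_PROBLEM
    return problem
```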

Performance

  • Question generation: ~3-5 seconds for 3 problems
  • Code evaluation: ~2-4 seconds per submission
  • Model answer generation: Included in evaluation response
