Jupyter Notebooks

Overview

HAI Build provides specialized commands for working with Jupyter notebooks in VS Code. Get AI assistance to generate, explain, and improve notebook cells while maintaining the context of your data science or analysis workflow.

Features

Generate Cells

Create new notebook cells with AI-generated code

Explain Cells

Get detailed explanations of what cells do

Improve Cells

Optimize and enhance existing cells

Setup

Ensure you have the necessary extensions:

Install Required Extensions

Make sure you have installed:

HAI Build Code Generator (this extension)
Jupyter Extension for VS Code (by Microsoft)

VS Code typically prompts you to install the Jupyter extension the first time you open a .ipynb file.

Open a Jupyter Notebook

Open any .ipynb file in VS Code. The notebook interface will activate with HAI Build commands available.

Verify HAI Build is Active

Look for HAI Build icons in:

Notebook toolbar (top of notebook)
Individual cell toolbars (when hovering over cells)

Generating Notebook Cells

Create new cells with AI-generated code based on your prompts.

Access Generate Command

Click the Generate Jupyter Cell icon (sparkle icon) in the notebook toolbar at the top.

Or use the Command Palette:

Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
Type Generate Jupyter Cell with HAI
Press Enter

Enter Your Prompt

A prompt input box appears. Describe what you want the cell to do:

Load the CSV file 'sales_data.csv' and create a pandas DataFrame. 
Show the first 5 rows and basic statistics.

Press Enter to confirm or Esc to cancel.

Review Generated Cell

HAI Build:

Analyzes your notebook context (existing cells, variables, imports)
Generates appropriate code
Inserts a new cell above or below the current cell
Populates it with the generated code

The AI considers your existing notebook context, including imported libraries and defined variables.
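To make "notebook context" concrete, here is a minimal sketch of how the imports and top-level variable names in a notebook's code cells could be collected. It is not HAI Build's actual implementation: the function name `collect_notebook_context` is hypothetical, and the sketch only reads the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`) with the Python standard library.

```python
import ast
import json


def collect_notebook_context(notebook_json: str) -> dict:
    """Collect imported modules and assigned names from a notebook's code cells.

    Hypothetical helper for illustration; not part of HAI Build.
    """
    notebook = json.loads(notebook_json)
    imports, variables = set(), set()

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # In the .ipynb format, "source" is a string or a list of strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        try:
            tree = ast.parse(source)
        except SyntaxError:
            # Cells containing IPython magics are not plain Python; skip them.
            continue
        for node in tree.body:
            if isinstance(node, ast.Import):
                imports.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.add(node.module)
            elif isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        variables.add(target.id)

    return {"imports": sorted(imports), "variables": sorted(variables)}


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Sales analysis"]},
        {"cell_type": "code",
         "source": ["import pandas as pd\n", "df = pd.read_csv('sales_data.csv')"]},
    ]
})
context = collect_notebook_context(example)
print(context)
```

A prompt that names `df` directly is easier to satisfy when that variable already appears in this kind of summary.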

Execute and Refine

Run the cell to test the generated code
Request improvements if needed
Iterate by generating additional cells

Generation Examples

Data Loading
Data Visualization
Statistical Analysis

Prompt:

Load the JSON file 'config.json' and extract the database connection settings

Generated Cell:

import json

with open('config.json', 'r') as f:
    config = json.load(f)

db_settings = config.get('database', {})
print(f"Database: {db_settings.get('host')}:{db_settings.get('port')}")

Prompt:

Create a scatter plot of age vs income colored by customer segment

Generated Cell:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
for segment in df['segment'].unique():
    segment_data = df[df['segment'] == segment]
    plt.scatter(segment_data['age'], segment_data['income'], 
               label=segment, alpha=0.6)

plt.xlabel('Age')
plt.ylabel('Income')
plt.title('Age vs Income by Customer Segment')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Prompt:

Perform correlation analysis on numerical columns and display a heatmap

Generated Cell:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Calculate correlation matrix
correlation_matrix = df.select_dtypes(include=[np.number]).corr()

# Create heatmap
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', 
            center=0, fmt='.2f', linewidths=1)
plt.title('Correlation Heatmap of Numerical Features')
plt.tight_layout()
plt.show()

Explaining Notebook Cells

Get detailed explanations of what existing cells do.

Select a Cell

Click on the cell you want to understand.

Trigger Explain

Click the Explain Jupyter Cell icon (question mark icon) in the cell toolbar.

Or use the Command Palette:

Press Cmd+Shift+P / Ctrl+Shift+P
Type Explain Jupyter Cell with HAI
Press Enter

View Explanation

The HAI Build chat opens with a detailed explanation including:

What the code does
How it works step-by-step
Purpose of key functions or methods
Expected inputs and outputs
Potential issues or considerations

Explanation Example

Cell Code:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

AI Explanation:

This cell performs dimensionality reduction using Principal Component Analysis (PCA):

1. **StandardScaler**: Standardizes features by removing the mean and scaling 
   to unit variance. This is crucial for PCA because it's sensitive to the 
   scale of features.

2. **fit_transform**: Fits the scaler on X and transforms it in one step, 
   resulting in X_scaled where each feature has mean=0 and std=1.

3. **PCA(n_components=2)**: Creates a PCA instance that will reduce the 
   dimensionality to 2 components (principal components).

4. **fit_transform on PCA**: Finds the principal components from X_scaled 
   and transforms the data to the new 2-dimensional space.

Result: X_pca contains the data projected onto 2 principal components, 
making it suitable for 2D visualization while preserving maximum variance.

Use Explain when reviewing notebooks from colleagues or learning new techniques.
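One way to check an explanation like the one above is to run the cell on data you control and inspect the intermediate results. The sketch below uses synthetic data as a stand-in for `X` (an assumption; the original cell does not show where `X` comes from) and confirms the scaling claim and the output shape.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: 200 samples, 5 features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# After scaling, each feature should have mean ~0 and standard deviation ~1.
means_ok = np.allclose(X_scaled.mean(axis=0), 0.0)
stds_ok = np.allclose(X_scaled.std(axis=0), 1.0)
print(means_ok, stds_ok)

# Fraction of the total variance captured by each of the two components.
print(pca.explained_variance_ratio_)
print(X_pca.shape)
```

The `explained_variance_ratio_` attribute shows how much of the variance the two components actually retain, which is worth checking before relying on a 2D plot.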

Improving Notebook Cells

Optimize, enhance, or fix existing cells with AI assistance.

Select Cell to Improve

Click on the cell you want to enhance.

Trigger Improve Command

Click the Improve Jupyter Cell icon (lightbulb icon) in the cell toolbar.

Or use the Command Palette:

Press Cmd+Shift+P / Ctrl+Shift+P
Type Improve Jupyter Cell with HAI
Press Enter

Describe Improvements

Enter what you want to improve:

Optimize this for better performance with large datasets

Review Improvements

HAI provides improved code in the chat. You can:

Review the suggested changes
Ask for different approaches
Accept and apply to your cell

Improvement Examples

Add Error Handling
Optimize Performance
Enhance Visualization

Original Cell:

df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'])
result = df.groupby('category')['sales'].sum()

Improved Cell:

import os

# Check if file exists
if not os.path.exists('data.csv'):
    raise FileNotFoundError("data.csv not found")

try:
    df = pd.read_csv('data.csv')
    
    # Validate required columns and report only the ones that are missing
    required_cols = ['date', 'category', 'sales']
    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        raise ValueError(f"Missing required columns: {missing_cols}")
    
    # Convert date with error handling
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
    if df['date'].isna().any():
        print(f"Warning: {df['date'].isna().sum()} invalid dates found")
    
    # Group and sum
    result = df.groupby('category')['sales'].sum()
    
except Exception as e:
    print(f"Error processing data: {str(e)}")
    raise

Original Cell:

results = []
for i in range(len(df)):
    row = df.iloc[i]
    results.append(row['value'] * 2 + 10)
df['new_column'] = results

Improved Cell:

# Vectorized operation - much faster for large datasets
df['new_column'] = df['value'] * 2 + 10
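Before accepting a performance rewrite, it helps to confirm that it produces the same values as the original. The following sketch runs both versions of the cell above on a small synthetic DataFrame (the data is an assumption for illustration) and compares the results.

```python
import numpy as np
import pandas as pd

# Small synthetic DataFrame; the 'value' column name matches the cells above.
df = pd.DataFrame({"value": np.arange(1_000, dtype=float)})

# Row-by-row version from the original cell.
loop_results = []
for i in range(len(df)):
    row = df.iloc[i]
    loop_results.append(row["value"] * 2 + 10)

# Vectorized version from the improved cell.
df["new_column"] = df["value"] * 2 + 10

# Both approaches should give identical values.
same = np.allclose(df["new_column"].to_numpy(), np.array(loop_results))
print(same)
```

In a notebook, wrapping each version in the `%timeit` magic is a quick way to measure the speed difference on your own data.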

Original Cell:

plt.plot(x, y)
plt.show()

Improved Cell:

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
plt.plot(x, y, linewidth=2, color='#2E86AB', label='Data Series')
plt.xlabel('X Axis', fontsize=12, fontweight='bold')
plt.ylabel('Y Axis', fontsize=12, fontweight='bold')
plt.title('Time Series Analysis', fontsize=14, fontweight='bold', pad=20)
plt.grid(True, alpha=0.3, linestyle='--')
plt.legend(loc='best', frameon=True, shadow=True)
plt.tight_layout()
plt.show()

Best Practices for Notebook AI Assistance

Provide Context in Prompts

Help HAI understand your notebook’s context:

Good prompts:

✅ “Create a function to preprocess the text column removing special characters”
✅ “Calculate correlation between features X1-X10 and the target variable”
✅ “Generate a confusion matrix for the classifier stored in ‘model’ variable”

Vague prompts:

❌ “Make a chart”
❌ “Process the data”
❌ “Add some analysis”

Maintain Cell Organization

Keep your notebook well-structured:

Imports at top: Generate import cells first
Data loading: Load data before processing
Exploration: Analysis cells in logical order
Modeling: Train/test/evaluate sequentially
Visualization: Display results clearly

This helps HAI understand the notebook flow.

Run Cells to Establish Context

HAI works better when cells are executed:

Execute cells to define variables and imports
HAI can reference executed variables in generations
Kernel state helps determine what’s available

Run your notebook sequentially before generating new cells for best results.
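To see what "kernel state" amounts to, the sketch below summarizes a namespace as a mapping from variable names to type names. The helper `summarize_namespace` is hypothetical and is not how HAI Build inspects the kernel; it only illustrates the kind of information that exists once cells have run.

```python
import types


def summarize_namespace(namespace: dict) -> dict:
    """Map user-defined names to their type names, skipping private names and modules.

    Hypothetical helper for illustration; not part of HAI Build.
    """
    summary = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private names and IPython history variables such as _1
        if isinstance(value, types.ModuleType):
            continue  # imported modules are not data variables
        summary[name] = type(value).__name__
    return summary


# Stand-in for a kernel namespace after a few cells have run.
fake_namespace = {"_private": 1, "types": types, "rows": [1, 2, 3], "threshold": 0.5}
print(summarize_namespace(fake_namespace))
```

In a live notebook, IPython's built-in `%whos` magic prints a similar listing of the variables currently defined in the kernel.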

Iterate on Generated Code

Refine generated cells through conversation:

Workflow

Generate initial cell
Run and observe results
Use "Improve" to refine
Repeat until satisfied

Example:

"Generate a bar chart of sales by region"
→ Review generated chart
→ "Make it horizontal and sort by value descending"
→ Review improvements
→ "Add data labels on each bar"
→ Final version

Combine with Regular HAI Chat

For complex notebook tasks:

Use Jupyter commands for quick cell operations
Use HAI chat for multi-cell workflows or complex refactoring
Select multiple cells and add to HAI chat for broader context

Common Notebook Workflows

Data Science Pipeline

Data Loading

Prompt

Load the dataset from 'experiment_data.csv' and display basic info

Exploration

Prompt

Create a summary of missing values and data types for each column
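A cell produced for a prompt like this might resemble the sketch below. The DataFrame here is synthetic and its column names are assumptions; in the real workflow `df` would be the dataset loaded in the previous step.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loaded dataset (column names are illustrative).
df = pd.DataFrame({
    "subject": ["a", "b", None, "d"],
    "score": [1.5, np.nan, np.nan, 4.0],
    "trial": [1, 2, 3, 4],
})

# One row per column: its dtype, missing-value count, and missing percentage.
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "missing": df.isna().sum(),
    "missing_pct": (df.isna().mean() * 100).round(1),
})
print(summary)
```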

Visualization

Prompt

Generate distribution plots for all numerical features in a grid layout

Preprocessing

Prompt

Create a preprocessing pipeline that handles missing values, 
encodes categorical variables, and scales numerical features
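As a rough idea of what this prompt could yield, here is a minimal scikit-learn sketch. The column names (`age`, `income`, `segment`) and the imputation strategies are assumptions chosen for illustration, not output captured from HAI Build.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative column names; replace with the columns in your own DataFrame.
numeric_cols = ["age", "income"]
categorical_cols = ["segment"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

df = pd.DataFrame({
    "age": [25, np.nan, 47, 33],
    "income": [40_000, 52_000, np.nan, 61_000],
    "segment": ["a", "b", np.nan, "a"],
})
X_prepared = preprocessor.fit_transform(df)
print(X_prepared.shape)
```

Keeping the steps in one `ColumnTransformer` means the same transformations can later be applied to new data with `preprocessor.transform`.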

Modeling

Prompt

Train a logistic regression model with cross-validation 
and display accuracy scores
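A cell for this step might look like the following sketch. It uses synthetic data from `make_classification` as a stand-in for the real features and labels, and five folds is an arbitrary choice made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the real features and labels.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Scaling inside the pipeline is refit on each training fold,
# so the held-out fold never influences the scaler.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```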

Data Analysis Report

Import Libraries

Prompt

Import pandas, numpy, matplotlib, and seaborn with standard aliases

Load Data

Prompt

Load the quarterly sales data and parse date columns

Calculate Metrics

Prompt

Calculate year-over-year growth rates for each product category
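For reference, year-over-year growth can be computed with a grouped `pct_change` in pandas. The sketch below assumes one row per category per consecutive year and uses made-up column names and figures.

```python
import pandas as pd

# Synthetic yearly totals; column names and values are assumptions for illustration.
sales = pd.DataFrame({
    "year": [2022, 2023, 2024, 2022, 2023, 2024],
    "category": ["A", "A", "A", "B", "B", "B"],
    "sales": [100.0, 120.0, 150.0, 200.0, 180.0, 198.0],
})

sales = sales.sort_values(["category", "year"])
# pct_change within each category compares every year with the previous one,
# so the first year of each category has no growth value (NaN).
sales["yoy_growth_pct"] = sales.groupby("category")["sales"].pct_change() * 100
print(sales)
```

If the real data is quarterly or monthly, aggregate it to yearly totals first, or pass a suitable `periods` value to `pct_change`.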

Visualize Trends

Prompt

Create a multi-line chart showing sales trends for top 5 products

Summary Statistics

Prompt

Generate a formatted summary table of key metrics by region

Tips for Effective Jupyter AI Usage

Specify Libraries

Mention preferred libraries in your prompts: “Use seaborn to create…”, “With scikit-learn, implement…”

Request Comments

Ask for documented code: “Add comments explaining each step”, “Include docstrings”

Define Variables

Reference existing variables: “Using the ‘df’ DataFrame…”, “Apply this to ‘X_train’ and ‘y_train’”

Set Expectations

Be clear about outputs: “Return a pandas Series”, “Display results in a table format”

Troubleshooting

Commands Not Visible
Context Errors
Import Issues

Issue: HAI Build icons don’t appear in the notebook.

Solutions:

Ensure HAI Build extension is installed and enabled
Verify Jupyter extension is active
Reload the VS Code window: press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux) and run “Reload Window”
Check notebook type is jupyter-notebook
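If the file still does not open as a notebook, it may be worth checking that it is well-formed. The sketch below is a hypothetical helper, not part of HAI Build or VS Code, that performs a basic sanity check on `.ipynb` contents. It assumes the nbformat 4 layout, where `nbformat` and `cells` are top-level keys.

```python
import json


def check_notebook_json(text: str) -> list:
    """Return a list of problems found in .ipynb file contents (empty if none).

    Hypothetical helper for illustration; assumes the nbformat 4 layout.
    """
    try:
        notebook = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(notebook, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if not isinstance(notebook.get("nbformat"), int):
        problems.append("missing integer 'nbformat' field")
    if not isinstance(notebook.get("cells"), list):
        problems.append("missing 'cells' list")
    return problems


valid = json.dumps({"nbformat": 4, "nbformat_minor": 5, "metadata": {}, "cells": []})
print(check_notebook_json(valid))
print(check_notebook_json("{not json"))
```

To check a file on disk, read it as UTF-8 text and pass the contents to the function.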

Next Steps

Code Generation

Learn more about AI-powered code generation

Task Execution

Execute larger notebook development tasks

Settings

Configure LLM providers for notebook assistance

MCP Integration

Connect to data sources via Model Context Protocol

Data Science Tip: Use HAI Build to quickly prototype analysis workflows, then refine the code for production use. The AI excels at generating exploratory code and visualizations.
