Overview

The bootcamp uses industry-standard tools and libraries for data science and machine learning. This page covers installation, configuration, and essential usage for each tool.
All tools are open-source and widely used in professional data science environments.

Core Technology Stack

Python

Version: 3.8+
Core programming language

Jupyter

Tool: Jupyter Notebook/Lab
Interactive development environment

NumPy

Domain: Numerical Computing
Array operations and linear algebra

Pandas

Domain: Data Manipulation
DataFrames and data analysis

Matplotlib

Domain: Visualization
Static plotting library

Seaborn

Domain: Visualization
Statistical data visualization

scikit-learn

Domain: Machine Learning
Classical ML algorithms

TensorFlow

Domain: Deep Learning
Neural networks with Keras API

PyTorch

Domain: Deep Learning
Dynamic neural networks

Streamlit

Domain: Web Apps
Data app deployment

Keras

Domain: Deep Learning
High-level neural network API

lxml

Domain: Data Parsing
XML and HTML processing

Installation Guide

1. Install Python

Download Python 3.8 or higher from python.org.
Verify installation:
python --version
# or
python3 --version
2. Create Virtual Environment

It’s best practice to use a virtual environment, which keeps the bootcamp packages isolated from other projects:
# Create virtual environment
python -m venv bootcamp-env

# Activate (Windows)
bootcamp-env\Scripts\activate

# Activate (Mac/Linux)
source bootcamp-env/bin/activate
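To confirm from inside Python that the environment is active, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch: the helper name `in_virtualenv` is hypothetical, and the check only recognizes `venv`-style environments, so it reports False inside a Conda environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If the interpreter path does not point into bootcamp-env, the environment is probably not activated in the current shell.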
3. Install Core Libraries

Install all required packages:
# Data manipulation and analysis
pip install numpy pandas

# Visualization
pip install matplotlib seaborn

# Machine Learning
pip install scikit-learn

# Deep Learning
pip install tensorflow keras torch torchvision

# Jupyter and utilities
pip install jupyter jupyterlab
pip install streamlit
pip install lxml requests
4. Verify Installation

Test that everything works:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import tensorflow as tf
import torch

print("All libraries imported successfully!")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"TensorFlow: {tf.__version__}")
print(f"PyTorch: {torch.__version__}")

Method 2: Using Anaconda

Anaconda provides a pre-packaged data science environment:
  1. Download Anaconda from anaconda.com
  2. Create a new environment:
conda create -n bootcamp python=3.9
conda activate bootcamp
  3. Install packages:
# Most packages included with Anaconda
conda install numpy pandas matplotlib seaborn scikit-learn jupyter

# Deep learning frameworks
conda install -c conda-forge tensorflow
conda install pytorch torchvision -c pytorch

# Additional tools
conda install streamlit
  4. Launch Jupyter:
jupyter notebook
# or
jupyter lab

Using Requirements Files

The bootcamp includes requirements.txt files in project folders:
# Navigate to project directory
cd source/002_A3/PROYECTO/

# Install all requirements
pip install -r requirements.txt

Library Reference

Python

Python 3.8+

Official Documentation: python.org/doc
Purpose: Core programming language for all bootcamp activities
Key Features:
  • Easy-to-learn syntax
  • Extensive standard library
  • Rich ecosystem for data science
  • Cross-platform compatibility
Bootcamp Usage: Foundation for all modules (A1-A8)

Jupyter Notebook/Lab

Jupyter

Official Documentation: jupyter.org
Purpose: Interactive development environment for data science
Key Features:
  • Combine code, text, and visualizations
  • Cell-by-cell execution
  • Rich output display (plots, tables, HTML)
  • Markdown support
  • Easy sharing and collaboration
Launch Commands:
# Classic Notebook
jupyter notebook

# JupyterLab (modern interface)
jupyter lab

# Open specific notebook
jupyter notebook path/to/notebook.ipynb
Bootcamp Usage: All 111+ notebooks (A1-A8)
Tips:
  • Use JupyterLab for multi-file projects
  • Install extensions for enhanced functionality
  • Use %matplotlib inline for inline plots

NumPy

Official Documentation: numpy.org
Purpose: Numerical computing with multi-dimensional arrays
Key Features:
  • N-dimensional array object (ndarray)
  • Broadcasting for vectorized operations
  • Linear algebra functions
  • Random number generation
  • Fast mathematical operations
Common Operations:
import numpy as np

# Create arrays
arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2], [3, 4]])

# Array operations
result = arr * 2
mean = np.mean(arr)

# Linear algebra
dot_product = np.dot(matrix, matrix.T)
Bootcamp Usage: Module A3 (NumPy fundamentals), foundation for all numerical work
Version Requirement: 1.19+
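Broadcasting is listed among the key features but not shown in the operations above. The following is a small illustrative sketch with made-up data: it subtracts per-column means from a matrix and scales each row, without writing any loops.

```python
import numpy as np

# A 3x2 matrix: three samples, two features (made-up values)
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Column means have shape (2,); broadcasting stretches them across all rows
col_means = data.mean(axis=0)
centered = data - col_means

# A column vector of shape (3, 1) broadcasts across the columns instead
row_scale = np.array([[1.0], [2.0], [3.0]])
scaled = data * row_scale

print(centered)
print(scaled)
```

Shapes are compared from the trailing dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match the other operand.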

Pandas

Official Documentation: pandas.pydata.org
Purpose: Data manipulation and analysis with DataFrames
Key Features:
  • DataFrame and Series data structures
  • Reading/writing various file formats (CSV, Excel, SQL)
  • Data cleaning and transformation
  • Group by operations
  • Time series functionality
  • Missing data handling
Common Operations:
import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Exploration
df.head()
df.info()
df.describe()

# Manipulation
df_clean = df.dropna()
df_grouped = df.groupby('category')['value'].sum()

# Save results
df.to_csv('output.csv', index=False)
Bootcamp Usage: Module A3 (primary focus), used throughout A4-A8
Version Requirement: 1.2+
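The operations above drop rows with missing values. As an alternative, here is a hedged sketch of filling gaps with a per-group mean and then aggregating. The DataFrame is a hypothetical in-memory stand-in for data.csv, and the `value_filled` column name is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory data standing in for data.csv
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0, np.nan],
})

# Count missing values per column before deciding how to handle them
missing_counts = df.isna().sum()

# Fill missing values with each category's mean instead of dropping rows
df["value_filled"] = df.groupby("category")["value"].transform(
    lambda s: s.fillna(s.mean())
)

# Aggregate several statistics per category in one call
summary = df.groupby("category")["value_filled"].agg(["mean", "count"])

print(missing_counts)
print(summary)
```

Whether dropping or filling is appropriate depends on the dataset. Filling with a group mean keeps the row count but reduces the variance within each group.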

Matplotlib

Official Documentation: matplotlib.org
Purpose: Comprehensive plotting and visualization
Key Features:
  • Publication-quality figures
  • Multiple plot types (line, scatter, bar, histogram, etc.)
  • Fine-grained control over plot elements
  • Subplots and figure layouts
  • Save plots in various formats
Common Operations:
import matplotlib.pyplot as plt

# Basic plot
plt.plot(x, y)
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('My Plot')
plt.show()

# Subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(x, y)
ax2.scatter(x, z)

# Save figure
plt.savefig('plot.png', dpi=300, bbox_inches='tight')
Bootcamp Usage: Module A4 (primary), used in A5-A8 for visualizing results
Version Requirement: 3.3+
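The operations above assume that x, y, and z already exist. The sketch below supplies hypothetical sample data so the subplot example runs on its own. The `Agg` backend line is an assumption for running as a plain script without a display; leave it out inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data for the x, y, and z names used above
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)
z = np.cos(x)

# Two panels side by side, each drawn through its own Axes object
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(x, y)
ax1.set_title("sin(x)")
ax2.scatter(x, z, s=10)
ax2.set_title("cos(x)")

# Saving through the Figure object targets this figure explicitly
fig.savefig("plot.png", dpi=150, bbox_inches="tight")
```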

Seaborn

Official Documentation: seaborn.pydata.org
Purpose: Statistical data visualization built on Matplotlib
Key Features:
  • Beautiful default styles
  • Statistical plotting functions
  • Integration with Pandas DataFrames
  • Complex visualizations with less code
  • Color palettes and themes
Common Operations:
import seaborn as sns
import matplotlib.pyplot as plt

# Set style
sns.set_style('whitegrid')

# Statistical plots
sns.histplot(data=df, x='column', kde=True)
sns.boxplot(data=df, x='category', y='value')
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')

# Pair plot for multivariate analysis
sns.pairplot(df, hue='target')

plt.show()
Bootcamp Usage: Module A4 (primary), enhances visualizations in A5-A8
Version Requirement: 0.11+

scikit-learn

Official Documentation: scikit-learn.org
Purpose: Machine learning algorithms and tools
Key Features:
  • Classification, regression, clustering algorithms
  • Model selection and evaluation
  • Data preprocessing and feature engineering
  • Pipeline construction
  • Cross-validation tools
  • Extensive algorithm library
Common Operations:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocess
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Evaluate
y_pred = model.predict(X_test_scaled)
# Take the square root manually: the squared=False argument is
# deprecated in newer scikit-learn releases
rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)
Key Algorithms Used:
  • Linear/Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Decision Trees and Random Forests
  • Gradient Boosting
  • K-Means Clustering
  • PCA (Principal Component Analysis)
Bootcamp Usage: Modules A6-A7 (primary), introduction to ML workflows
Version Requirement: 0.24+
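Pipeline construction and cross-validation appear in the feature list but not in the operations above. A minimal sketch of combining the two follows, using synthetic data from `make_classification` as a stand-in for a real bootcamp dataset. The choice of logistic regression and five folds is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real bootcamp dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# The pipeline refits the scaler inside every fold, so statistics from
# the held-out fold never leak into training
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Wrapping preprocessing and model in one estimator also means a single `fit` or `predict` call applies every step in order.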

TensorFlow & Keras

TensorFlow + Keras

Official Documentation: tensorflow.org, keras.io
Purpose: Deep learning framework with high-level Keras API
Key Features:
  • Sequential and Functional APIs
  • Pre-built layers and models
  • Automatic differentiation
  • GPU acceleration
  • Model saving and deployment
  • Extensive pre-trained models
Common Operations:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build model
model = keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

# Compile
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
history = model.fit(
    X_train, y_train,
    epochs=10,
    validation_split=0.2,
    batch_size=32
)

# Evaluate
test_loss, test_acc = model.evaluate(X_test, y_test)

# Save
model.save('model.h5')
Bootcamp Usage: Module A8 (proyecto_mod8_keras.ipynb)
Version Requirement: TensorFlow 2.4+

PyTorch

Official Documentation: pytorch.org/docs
Purpose: Dynamic deep learning framework
Key Features:
  • Dynamic computational graphs
  • Pythonic and intuitive API
  • Strong GPU acceleration
  • Extensive neural network modules
  • Popular in research
  • TorchVision for computer vision
Common Operations:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define model
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28*28, 128)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Initialize
model = NeuralNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

# Training loop
for epoch in range(num_epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
Bootcamp Usage: Module A8 (proyecto_mod8_pytorch.ipynb)
Version Requirement: PyTorch 1.8+, TorchVision 0.9+
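The training loop above relies on `train_loader` and `num_epochs` being defined elsewhere. The sketch below fills them in with random tensors as a hypothetical stand-in for an image dataset, so the loop can run end to end. It uses `nn.Sequential` for brevity rather than the class shown above, and the loss values carry no meaning because the labels are random.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 256 random 28x28 "images" in 10 classes
images = torch.randn(256, 28, 28)
labels = torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(images, labels),
                          batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 10),
)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())

num_epochs = 2
epoch_losses = []
for epoch in range(num_epochs):
    model.train()  # enable dropout during training
    running_loss = 0.0
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size to get a per-sample average
        running_loss += loss.item() * data.size(0)
    epoch_losses.append(running_loss / len(train_loader.dataset))
    print(f"Epoch {epoch + 1}: loss {epoch_losses[-1]:.4f}")

model.eval()  # disable dropout before evaluation or inference
```

Switching between `model.train()` and `model.eval()` matters here because the network contains a dropout layer, which behaves differently in the two modes.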

Streamlit

Official Documentation: docs.streamlit.io
Purpose: Build and deploy data apps quickly
Key Features:
  • Pure Python - no HTML/CSS/JS required
  • Instant hot-reload
  • Interactive widgets
  • Built-in charting
  • Easy deployment
  • Session state management
Common Operations:
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt

# Title and text
st.title('My Data App')
st.write('Welcome to my analysis!')

# Load and display data
df = pd.read_csv('data.csv')
st.dataframe(df)

# Interactive widgets
option = st.selectbox('Choose a column:', df.columns)
slider_value = st.slider('Select a value', 0, 100, 50)

# Display charts
st.line_chart(df[option])

# Matplotlib integration
fig, ax = plt.subplots()
ax.hist(df[option])
st.pyplot(fig)
Run Streamlit App:
streamlit run app.py
Bootcamp Usage: Used in multiple modules for creating interactive demos
Version Requirement: 1.0+

Additional Libraries

lxml

Purpose: XML and HTML processing
Used for parsing web data and working with Excel files

requests

Purpose: HTTP library for API calls
Used for fetching data from web APIs

yfinance

Purpose: Yahoo Finance data
Used in Module A3 for financial data analysis

openpyxl

Purpose: Excel file support
Backend for Pandas Excel operations

Version Requirements

Recommended minimum versions at the time the bootcamp was created:
Python >= 3.8
numpy >= 1.19.0
pandas >= 1.2.0
matplotlib >= 3.3.0
seaborn >= 0.11.0
scikit-learn >= 0.24.0
tensorflow >= 2.4.0
keras >= 2.4.0
torch >= 1.8.0
torchvision >= 0.9.0
streamlit >= 1.0.0
jupyter >= 1.0.0
lxml >= 4.6.0
requests >= 2.25.0

Check Your Versions

import sys
import numpy as np
import pandas as pd
import matplotlib
import seaborn as sns
import sklearn
import tensorflow as tf
import torch
import streamlit as st

print(f"Python: {sys.version}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"TensorFlow: {tf.__version__}")
print(f"PyTorch: {torch.__version__}")
print(f"Streamlit: {st.__version__}")
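The script above prints versions but leaves the comparison against the minimums to the reader. One way to automate that check without importing the heavy libraries is sketched below, using the standard-library `importlib.metadata`. The helper names `parse_version` and `meets_minimum` are hypothetical, and the parser is deliberately simplified: it reads only the leading integers of the first three components and ignores pre-release tags. The third-party `packaging` library handles version strings more rigorously.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums copied from the Version Requirements list (pip package names)
MINIMUMS = {
    "numpy": "1.19.0",
    "pandas": "1.2.0",
    "matplotlib": "3.3.0",
    "seaborn": "0.11.0",
    "scikit-learn": "0.24.0",
    "tensorflow": "2.4.0",
    "torch": "1.8.0",
    "streamlit": "1.0.0",
}


def parse_version(text: str) -> tuple:
    """Turn '1.19.5' or '2.1.0+cu118' into a 3-tuple of integers."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '1.19' so tuples compare consistently
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (needs >= {minimum})")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "too old"
    print(f"{name}: {installed} ({status}, needs >= {minimum})")
```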

Troubleshooting

Problem: ModuleNotFoundError: No module named 'package'
Solutions:
  1. Install the package: pip install package-name
  2. Check you’re using the correct Python environment
  3. Restart Jupyter kernel after installation
  4. Verify installation: pip show package-name (or, on Mac/Linux, pip list | grep package-name)
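When it is unclear which interpreter a notebook or script is using, a quick check from inside Python can help. The sketch below prints the interpreter path and lists which modules cannot be found. The helper name `find_missing` and the module list are illustrative assumptions. Note that import names can differ from pip names (for example, scikit-learn is imported as sklearn).

```python
import importlib.util
import sys


def find_missing(module_names):
    """Return the top-level module names that cannot be found here."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Import names, not pip names (scikit-learn is imported as sklearn)
required = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]

print(f"Interpreter: {sys.executable}")
missing = find_missing(required)
print("Missing:", ", ".join(missing) if missing else "none")
```

If the printed interpreter path is not inside the expected environment, installing with `python -m pip install package-name` from that same interpreter avoids a mismatch between pip and Python.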
Problem: Models training slowly on CPU
Solutions:
  1. Check GPU availability:
    # TensorFlow
    import tensorflow as tf
    print(tf.config.list_physical_devices('GPU'))
    
    # PyTorch
    import torch
    print(torch.cuda.is_available())
    
  2. Install GPU versions:
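    # TensorFlow: GPU support ships in the main package since 2.1;
    # the separate tensorflow-gpu package is discontinued
    pip install tensorflow
    # On Linux with TensorFlow 2.14+, this extra also installs the CUDA libraries
    pip install "tensorflow[and-cuda]"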
    
    # PyTorch GPU (check pytorch.org for your system)
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    
  3. Install the NVIDIA driver, plus CUDA and cuDNN if your framework version does not bundle them
Problem: Kernel keeps dying or won’t start
Solutions:
  1. Restart kernel: Kernel > Restart
  2. Check for memory issues (close other apps)
  3. Reinstall kernel:
    pip install --upgrade jupyter ipykernel
    python -m ipykernel install --user
    
  4. Clear notebook output: Cell > All Output > Clear
Problem: Incompatible package versions
Solutions:
  1. Create a fresh virtual environment
  2. Install packages one by one to identify conflicts
  3. Use pip install --upgrade package-name
  4. Check compatibility with pip check
Problem: Plots don’t display in Jupyter
Solution: Add this magic command at the start of your notebook:
%matplotlib inline
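For interactive plots:
# JupyterLab and Notebook 7+ (requires the ipympl package)
%matplotlib widget
# or, in the classic Notebook (version 6 and earlier) only
%matplotlib notebook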

Additional Resources

Learning Resources

  • Python: Official Python Tutorial
  • NumPy: NumPy Quickstart
  • Pandas: 10 Minutes to Pandas
  • Matplotlib: Matplotlib Tutorials
  • Scikit-learn: Scikit-learn Tutorials
  • TensorFlow: TensorFlow Tutorials
  • PyTorch: PyTorch Tutorials
  • Streamlit: Streamlit Get Started

Cheat Sheets

Next Steps

Jupyter Notebooks

Explore the 111+ notebooks that use these tools

Datasets

Work with bootcamp datasets

Glossary

Learn data science terminology

Setup Guide

Get started with environment setup
