Overview
The bootcamp uses industry-standard tools and libraries for data science and machine learning. This page covers installation, configuration, and essential usage for each tool. All tools are open-source and widely used in professional data science environments.
Core Technology Stack
Python
Version: 3.8+. Core programming language.
Jupyter
Tool: Jupyter Notebook/Lab. Interactive development environment.
NumPy
Domain: Numerical Computing. Array operations and linear algebra.
Pandas
Domain: Data Manipulation. DataFrames and data analysis.
Matplotlib
Domain: Visualization. Static plotting library.
Seaborn
Domain: Visualization. Statistical data visualization.
scikit-learn
Domain: Machine Learning. Classical ML algorithms.
TensorFlow
Domain: Deep Learning. Neural networks with Keras API.
PyTorch
Domain: Deep Learning. Dynamic neural networks.
Streamlit
Domain: Web Apps. Data app deployment.
Keras
Domain: Deep Learning. High-level neural network API.
lxml
Domain: Data Parsing. XML and HTML processing.
Installation Guide
Method 1: Using pip (Recommended)
Install Python
Method 2: Using Anaconda
Anaconda Installation (Alternative)
Anaconda provides a pre-packaged data science environment:
- Download Anaconda from anaconda.com
- Create a new environment:
- Install packages:
- Launch Jupyter:
Using Requirements Files
The bootcamp includes requirements.txt files in project folders. To install a project's dependencies, run pip install -r requirements.txt from inside that folder.
Library Reference
Python
Python 3.8+
Official Documentation: python.org/doc
Purpose: Core programming language for all bootcamp activities
Key Features:
- Easy-to-learn syntax
- Extensive standard library
- Rich ecosystem for data science
- Cross-platform compatibility
Jupyter Notebook/Lab
Jupyter
Official Documentation: jupyter.org
Purpose: Interactive development environment for data science
Key Features:
- Combine code, text, and visualizations
- Cell-by-cell execution
- Rich output display (plots, tables, HTML)
- Markdown support
- Easy sharing and collaboration
Bootcamp Usage: All 111+ notebooks (A1-A8)
Tips:
- Use JupyterLab for multi-file projects
- Install extensions for enhanced functionality
- Use %matplotlib inline for inline plots
NumPy
Official Documentation: numpy.org
Purpose: Numerical computing with multi-dimensional arrays
Key Features:
- N-dimensional array object (ndarray)
- Broadcasting for vectorized operations
- Linear algebra functions
- Random number generation
- Fast mathematical operations
Bootcamp Usage: Module A3 (NumPy fundamentals), foundation for all numerical work
Version Requirement: 1.19+
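As a rough illustration of the features listed above, the sketch below touches broadcasting, a linear algebra routine, and random number generation. The array values are made up for the example and do not come from the bootcamp notebooks.

```python
import numpy as np

# A 3x2 array of made-up measurements (rows = samples, columns = features).
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Broadcasting: the length-2 vector of column means is applied to every row.
col_means = data.mean(axis=0)
centered = data - col_means

# Linear algebra: solve the system A x = b.
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)

# Reproducible random numbers via the Generator API (available in NumPy 1.17+).
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=3)

print(centered)
print(x)
```

Subtracting `col_means` works without an explicit loop because NumPy stretches the shape `(2,)` vector across the `(3, 2)` array.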
Pandas
Official Documentation: pandas.pydata.org
Purpose: Data manipulation and analysis with DataFrames
Key Features:
- DataFrame and Series data structures
- Reading/writing various file formats (CSV, Excel, SQL)
- Data cleaning and transformation
- Group by operations
- Time series functionality
- Missing data handling
Bootcamp Usage: Module A3 (primary focus), used throughout A4-A8
Version Requirement: 1.2+
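A minimal sketch of two of the features above, missing data handling and group-by aggregation. The DataFrame contents (city names and sales figures) are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical data with one missing sales value.
df = pd.DataFrame({
    "city": ["Lima", "Lima", "Quito", "Quito"],
    "sales": [100.0, None, 80.0, 120.0],
})

# Missing data handling: fill the gap with the mean of the observed values.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Group-by: total sales per city.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Filling with the column mean is only one possible strategy; dropping rows with `dropna()` or filling per group may fit a given dataset better.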
Matplotlib
Official Documentation: matplotlib.org
Purpose: Comprehensive plotting and visualization
Key Features:
- Publication-quality figures
- Multiple plot types (line, scatter, bar, histogram, etc.)
- Fine-grained control over plot elements
- Subplots and figure layouts
- Save plots in various formats
Bootcamp Usage: Module A4 (primary), used in A5-A8 for visualizing results
Version Requirement: 3.3+
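The following sketch shows subplots, two plot types, and saving a figure to disk. The data and the output file name `example_plot.png` are placeholders, and the `Agg` backend is selected only so the script also runs outside a notebook; inside Jupyter that line can be dropped.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# Two subplots side by side in a single figure.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))
ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("x")
ax_line.set_ylabel("x squared")
ax_bar.bar(x, y)
ax_bar.set_title("Bar chart")

fig.tight_layout()
out_path = Path("example_plot.png")  # placeholder file name
fig.savefig(out_path, dpi=150)       # the format is inferred from the extension
```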
Seaborn
Official Documentation: seaborn.pydata.org
Purpose: Statistical data visualization built on Matplotlib
Key Features:
- Beautiful default styles
- Statistical plotting functions
- Integration with Pandas DataFrames
- Complex visualizations with less code
- Color palettes and themes
Bootcamp Usage: Module A4 (primary), enhances visualizations in A5-A8
Version Requirement: 0.11+
scikit-learn
Official Documentation: scikit-learn.org
Purpose: Machine learning algorithms and tools
Key Features:
- Classification, regression, clustering algorithms
- Model selection and evaluation
- Data preprocessing and feature engineering
- Pipeline construction
- Cross-validation tools
- Extensive algorithm library
Key Algorithms Used:
- Linear/Logistic Regression
- K-Nearest Neighbors (KNN)
- Decision Trees and Random Forests
- Gradient Boosting
- K-Means Clustering
- PCA (Principal Component Analysis)
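To make pipeline construction and cross-validation concrete, here is a small sketch that combines both with logistic regression. The Iris dataset bundled with scikit-learn is used purely for convenience and is an assumption, not necessarily a dataset from the bootcamp.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline: the scaler is re-fit inside each fold, which avoids data leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation returns one accuracy score per fold.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Keeping preprocessing inside the pipeline means the same object can later be fit on the full training set and used for prediction without repeating the scaling step by hand.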
TensorFlow & Keras
TensorFlow + Keras
Official Documentation:
Purpose: Deep learning framework with high-level Keras API
Key Features:
- Sequential and Functional APIs
- Pre-built layers and models
- Automatic differentiation
- GPU acceleration
- Model saving and deployment
- Extensive pre-trained models
Bootcamp Usage: Module A8 (proyecto_mod8_keras.ipynb)
Version Requirement: TensorFlow 2.4+
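As a minimal sketch of the Sequential API, the example below builds, compiles, and briefly trains a tiny classifier. The layer sizes, the random placeholder data, and the training settings are assumptions chosen for illustration; they are not taken from the Module A8 notebook.

```python
import numpy as np
import tensorflow as tf

# Sequential API: a small classifier for 4 input features and 3 classes.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data, used only to demonstrate the training call.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4)).astype("float32")
y = rng.integers(0, 3, size=32)
model.fit(X, y, epochs=2, batch_size=8, verbose=0)

# Each row of the prediction is a probability distribution over the 3 classes.
probs = model.predict(X[:5], verbose=0)
print(probs.shape)
```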
PyTorch
Official Documentation: pytorch.org/docs
Purpose: Dynamic deep learning framework
Key Features:
- Dynamic computational graphs
- Pythonic and intuitive API
- Strong GPU acceleration
- Extensive neural network modules
- Popular in research
- TorchVision for computer vision
Bootcamp Usage: Module A8 (proyecto_mod8_pytorch.ipynb)
Version Requirement: PyTorch 1.8+, TorchVision 0.9+
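The sketch below shows one training step in PyTorch, which is where the dynamic graph and autograd come into play. The network shape, the random placeholder batch, and the learning rate are illustrative assumptions rather than the setup used in the Module A8 notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Random placeholder batch, used only to demonstrate a single training step.
X = torch.randn(32, 4, device=device)
y = torch.randint(0, 3, (32,), device=device)

optimizer.zero_grad()
loss = loss_fn(model(X), y)  # the forward pass builds the graph on the fly
loss.backward()              # autograd computes the gradients
optimizer.step()             # the optimizer updates the weights
print(loss.item())
```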
Streamlit
Official Documentation: docs.streamlit.io
Purpose: Build and deploy data apps quickly
Key Features:
- Pure Python - no HTML/CSS/JS required
- Instant hot-reload
- Interactive widgets
- Built-in charting
- Easy deployment
- Session state management
Run Streamlit App: streamlit run <your_script>.py
Bootcamp Usage: Used in multiple modules for creating interactive demos
Version Requirement: 1.0+
Additional Libraries
lxml
Purpose: XML and HTML processing. Used for parsing web data and working with Excel files.
requests
Purpose: HTTP library for API calls. Used for fetching data from web APIs.
yfinance
Purpose: Yahoo Finance data. Used in Module A3 for financial data analysis.
openpyxl
Purpose: Excel file support. Backend for Pandas Excel operations.
Version Requirements
Recommended versions at the time the bootcamp was created:
Check Your Versions
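One way to check installed versions from Python is to query package metadata with the standard library. This is a sketch under assumptions: the helper name `report_versions` is made up, and the package list uses PyPI distribution names for the libraries on this page (for example `torch` for PyTorch).

```python
import sys
from importlib import metadata

# PyPI distribution names for the libraries covered on this page.
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn",
            "scikit-learn", "tensorflow", "torch", "streamlit"]


def report_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


print("Python", sys.version.split()[0])
for name, version in report_versions(PACKAGES).items():
    print(f"{name}: {version}")
```

Because `importlib.metadata` reads metadata without importing the packages, the check stays fast even when heavy libraries such as TensorFlow are installed.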
Troubleshooting
Import Errors
Problem: ModuleNotFoundError: No module named 'package'
Solutions:
- Install the package: pip install package-name
- Check you’re using the correct Python environment
- Restart Jupyter kernel after installation
- Verify installation: pip list | grep package-name
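A frequent cause of this error is that the package was installed into a different environment than the one the notebook kernel runs in. The sketch below, with a hypothetical helper named `diagnose_import`, reports the active interpreter and whether it can locate a top-level module.

```python
import importlib.util
import sys


def diagnose_import(module_name):
    """Report the running interpreter and whether it can find a top-level module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # the Python that is actually in use
        "found": spec is not None,
        "location": spec.origin if spec else None,
    }


print(diagnose_import("json"))   # standard library, always present
print(diagnose_import("numpy"))  # found only if installed in this environment
```

If `found` is False, installing with that same interpreter (`python -m pip install package-name`, using the path shown under `interpreter`) targets the environment the kernel uses.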
TensorFlow/PyTorch Not Using GPU
Problem: Models training slowly on CPU
Solutions:
- Check GPU availability:
- Install GPU versions:
- Install CUDA and cuDNN drivers
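A quick GPU visibility check for both frameworks could look like the sketch below. The helper name `gpu_report` is an assumption; it relies on `torch.cuda.is_available()` and `tf.config.list_physical_devices("GPU")`, and it reports None for a framework that is not installed.

```python
def gpu_report():
    """Return GPU visibility per framework; None means the framework is not installed."""
    report = {}
    try:
        import torch
        report["pytorch"] = torch.cuda.is_available()
    except ImportError:
        report["pytorch"] = None
    try:
        import tensorflow as tf
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        report["tensorflow"] = None
    return report


print(gpu_report())
```

A value of False with the framework installed usually points to a CPU-only build or missing drivers rather than to a problem in the model code.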
Jupyter Kernel Issues
Problem: Kernel keeps dying or won’t start
Solutions:
- Restart kernel: Kernel > Restart
- Check for memory issues (close other apps)
- Reinstall kernel:
- Clear notebook output: Cell > All Output > Clear
Package Conflicts
Problem: Incompatible package versions
Solutions:
- Create a fresh virtual environment
- Install packages one by one to identify conflicts
- Use pip install --upgrade package-name
- Check compatibility with pip check
Matplotlib Plots Not Showing
Problem: Plots don’t display in Jupyter
Solution: Add the %matplotlib inline magic command at the start of your notebook.
For interactive plots:
Additional Resources
Learning Resources
Python
Official Python Tutorial
NumPy
NumPy Quickstart
Pandas
10 Minutes to Pandas
Matplotlib
Matplotlib Tutorials
scikit-learn
scikit-learn Tutorials
TensorFlow
TensorFlow Tutorials
PyTorch
PyTorch Tutorials
Streamlit
Streamlit Get Started
Cheat Sheets
Next Steps
Jupyter Notebooks
Explore the 111+ notebooks that use these tools
Datasets
Work with bootcamp datasets
Glossary
Learn data science terminology
Setup Guide
Get started with environment setup