NumPy and Pandas are the cornerstone libraries for data manipulation and analysis in Python. This module covers array operations, DataFrame manipulation, and efficient data processing techniques.
NumPy Arrays
Multi-dimensional arrays and vectorized operations
Extract specific data points or subsets efficiently:
import numpy as np

# Assumes precios is a 2-D NumPy array: rows are stocks, columns are days

# Get specific stock price on specific day
accion_idx = 1  # Second row (stock 2)
dia_idx = 3     # Fourth column (day 4)
precio_especifico = precios[accion_idx, dia_idx]
print("Stock 2 price on day 4:", precio_especifico)

# Extract all stocks on day 3
precios_dia3 = precios[:, 2]
print("Prices on day 3:", precios_dia3)

# Extract subset: stocks 1, 3, and 5 on days 2 and 5
acciones_idx = [0, 2, 4]
dias_idx = [1, 4]
subconjunto = precios[np.ix_(acciones_idx, dias_idx)]
print("Subset stocks [1,3,5] days [2,5]:\n", subconjunto)
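The indexing patterns above can be tried on a small made-up array. This is a minimal sketch that assumes a hypothetical 5×5 `precios` array (5 stocks, 5 days) filled with placeholder values from `np.arange`, not real prices. It also shows why `np.ix_` is needed for the row/column subset.

```python
import numpy as np

# Hypothetical data: 5 stocks (rows) x 5 days (columns); values are placeholders
precios = np.arange(25, dtype=float).reshape(5, 5)

# Single element: row 1 (stock 2), column 3 (day 4)
precio_especifico = precios[1, 3]

# Whole column: every stock on day 3
precios_dia3 = precios[:, 2]

# np.ix_ builds an open mesh, so the result is the 3x2 cross product
# of the selected rows and the selected columns
subconjunto = precios[np.ix_([0, 2, 4], [1, 4])]

# Passing the two lists directly asks NumPy to pair them element-wise,
# which fails here because lengths 3 and 2 cannot be broadcast together
try:
    precios[[0, 2, 4], [1, 4]]
    emparejado_ok = True
except IndexError:
    emparejado_ok = False

print(precio_especifico)
print(precios_dia3)
print(subconjunto)
print("Plain list pairing worked:", emparejado_ok)
```

With two index lists of equal length, plain list indexing would succeed but return paired elements rather than a sub-grid, so `np.ix_` is the safer choice whenever a rectangular subset is intended.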
import pandas as pd

df = pd.read_csv('empleados.csv')

# Overview of DataFrame (info() prints its report itself and returns None)
df.info()

# Statistical summary
print(df.describe())

# Check for missing values
print(df.isnull().sum())

# Check for duplicates
print(f"Duplicated rows: {df.duplicated().sum()}")
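To try the same checks without a CSV file, the sketch below builds a small in-memory DataFrame. The rows are hypothetical stand-ins for `empleados.csv`, with one missing department, one missing salary, and one repeated row added on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for empleados.csv
df = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Departamento": ["Ventas", "TI", "TI", None],
    "Salario": [1000.0, 3000.0, 3000.0, np.nan],
})

# Count of missing values per column
faltantes = df.isnull().sum()

# Number of rows that exactly repeat an earlier row
duplicadas = df.duplicated().sum()

print(faltantes)
print(f"Duplicated rows: {duplicadas}")
```

`duplicated()` marks only the second and later copies of a row, so the count reflects how many rows `drop_duplicates()` would remove.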
# Group by department and calculate statistics
resumen_depto = df.groupby('Departamento').agg({
    'ID': 'count',
    'Salario': ['mean', 'min', 'max']
}).round(2)

resumen_depto.columns = ['Cantidad', 'Salario_Promedio', 'Salario_Mínimo', 'Salario_Máximo']
print(resumen_depto)
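One possible variation on the grouping above is pandas named aggregation, which assigns the output column names inside `agg` and so avoids flattening the columns by hand. This is a sketch on hypothetical sample rows, not the contents of `empleados.csv`.

```python
import pandas as pd

# Hypothetical sample data standing in for empleados.csv
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Departamento": ["Ventas", "Ventas", "TI", "TI"],
    "Salario": [1000.0, 2000.0, 3000.0, 5000.0],
})

# Named aggregation: output_name=(source_column, function)
resumen_depto = df.groupby("Departamento").agg(
    Cantidad=("ID", "count"),
    Salario_Promedio=("Salario", "mean"),
    Salario_Mínimo=("Salario", "min"),
    Salario_Máximo=("Salario", "max"),
).round(2)

print(resumen_depto)
```

Because each output name sits next to the column and function it comes from, the labels stay aligned with the statistics if an aggregation is later added or reordered.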
# Calculate averages using NumPy (vectorized)
promedios = precios.mean(axis=1)
print("Averages with NumPy:", promedios)
Result: fast and concise, because the averaging loop runs inside NumPy's compiled C code rather than in the Python interpreter.
# Calculate averages using pure Python (loops)
precios_list = precios.tolist()
promedios_sin_numpy = []
for fila in precios_list:
    suma = 0
    for valor in fila:
        suma += valor
    promedio = suma / len(fila)
    promedios_sin_numpy.append(promedio)
print("Averages without NumPy:", promedios_sin_numpy)
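To compare the two approaches directly, the sketch below checks that they produce the same averages and times each with the standard-library `timeit` module. The array size (200 stocks by 250 days), the random placeholder data, and the helper names `promedios_numpy` and `promedios_bucles` are assumptions made for illustration. Actual timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

# Hypothetical data: 200 stocks x 250 days of random placeholder prices
rng = np.random.default_rng(0)
precios = rng.uniform(10, 100, size=(200, 250))
precios_list = precios.tolist()


def promedios_numpy(arr):
    """Row-wise mean using NumPy's vectorized reduction."""
    return arr.mean(axis=1)


def promedios_bucles(filas):
    """Row-wise mean using explicit Python loops."""
    resultado = []
    for fila in filas:
        suma = 0.0
        for valor in fila:
            suma += valor
        resultado.append(suma / len(fila))
    return resultado


# Both approaches should agree up to floating-point rounding
coinciden = np.allclose(promedios_numpy(precios), promedios_bucles(precios_list))
print("Same results:", coinciden)

# Time 20 repetitions of each approach
t_numpy = timeit.timeit(lambda: promedios_numpy(precios), number=20)
t_bucles = timeit.timeit(lambda: promedios_bucles(precios_list), number=20)
print(f"NumPy: {t_numpy:.4f} s, pure Python: {t_bucles:.4f} s")
```

`np.allclose` is used instead of `==` because the two methods add the values in a different order, which can change the last few bits of the floating-point result.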