Overview
The customer normalization process reads raw customer data from CSV files, cleans and validates the data, and generates normalized database tables. This module is implemented innormalizar_datos.py.
Input File
File:clientes_prueba_mas_datos.csv
Expected Columns:
cliente_id- Customer ID (optional, auto-generated if missing)nombre- Customer nameciudad- City namesegmento- Customer segmentfecha_registro- Registration date
Output Files
The normalization process generates four CSV files:-
clientes_normalizados.csv - Normalized customer table with foreign keys
- Fields:
cliente_id,nombre,ciudad_id,segmento_id,fecha_registro
- Fields:
-
ciudades.csv - City dimension table
- Fields:
ciudad_id,nombre
- Fields:
-
segmentos.csv - Segment dimension table
- Fields:
segmento_id,nombre
- Fields:
-
clientes_normalizados_completo.csv - Denormalized view with human-readable values
- Fields:
cliente_id,nombre,ciudad,segmento,fecha_registro
- Fields:
Data Cleaning Functions
The module uses three core cleaning functions documented in Data Cleaning Functions:limpiar_texto()- Cleans and normalizes text fieldsnormalizar_fecha()- Standardizes date formatsnormalizar_numero()- Parses and validates numeric values
Processing Logic
Main Processing Function
Theprocesar_clientes() function orchestrates the entire normalization workflow:
Customer ID Generation
The system handles missing or invalid customer IDs by auto-generating sequential IDs:Data Validation and Defaults
The system applies intelligent defaults for missing values:segmentois required - records without a segment are skipped- If
nombreis missing, generates “Cliente ” - If
ciudadis missing, uses “Sin especificar” - Empty
fecha_registrois stored as empty string
Dimension Table Creation
The normalization process extracts unique cities and segments into separate dimension tables:Mapping Logic
The system creates lookup dictionaries to map dimension names to IDs:ciudad_map Example
segmento_map Example
File Output
All output files are written with UTF-8 BOM encoding for Excel compatibility:Data Transformation Examples
Example 1: Complete Record
Input:Example 2: Missing Customer ID
Input:Example 3: Missing Name and City
Input:Usage
Run the normalization script from the command line:Error Handling
The script handles several error conditions:-
File Not Found:
- Encoding Errors: Automatically tries multiple encodings
- Invalid Data: Skips records that don’t meet validation requirements
- Missing Values: Applies default values where appropriate