Overview
This project develops an intelligent customer segmentation system for Retail Insights S.A. using unsupervised learning techniques. The goal is to identify hidden customer patterns, visualize segments, and enable personalized marketing strategies.
Business Context: The company currently uses basic segmentation (age, total spending) but needs more sophisticated clustering to:
- Identify hidden customer segments
- Design targeted marketing campaigns
- Improve customer retention and loyalty
- Optimize cross-selling and upselling strategies
- Allocate marketing budgets effectively
Project Structure
Dataset Description
Files: Train.csv (training set) and Test.csv (test set)
Variables
Demographic:
- customer_id: Unique customer identifier
- Gender: Customer gender
- Ever_Married: Marital status
- Age: Customer age
- Graduated: Education level (Yes/No)

- Profession: Occupation (Artist, Executive, Healthcare, Engineer, Lawyer, etc.)
- Work_Experience: Years of work experience
- Family_Size: Number of family members

- Spending_Score: Customer spending category (Low, Average, High)
- Var_1: Additional categorical variable
- Segmentation: Existing segmentation (A, B, C, D)
Data Preprocessing
1. Load and Explore Data
2. Handle Missing Values
3. Feature Encoding and Scaling
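The three preprocessing steps can be sketched with a scikit-learn `ColumnTransformer`. This is a minimal illustration, not the project's actual code: the imputation strategies (median, most frequent), the split of columns into numeric, nominal, and ordinal groups, and the toy rows are all assumptions. Only the column names and the Low/Average/High ordering come from the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Column grouping is an assumption based on the variable descriptions.
NUMERIC = ["Age", "Work_Experience", "Family_Size"]
NOMINAL = ["Gender", "Ever_Married", "Graduated", "Profession", "Var_1"]
ORDINAL = ["Spending_Score"]


def build_preprocessor() -> ColumnTransformer:
    """Impute, encode, and scale each column group."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    nominal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    ordinal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Keep the natural order Low < Average < High.
        ("encode", OrdinalEncoder(categories=[["Low", "Average", "High"]])),
        ("scale", StandardScaler()),
    ])
    # sparse_threshold=0.0 forces a dense array, which is convenient for PCA.
    return ColumnTransformer(
        [("num", numeric, NUMERIC), ("nom", nominal, NOMINAL), ("ord", ordinal, ORDINAL)],
        sparse_threshold=0.0,
    )


# In the project this would be: df = pd.read_csv("Train.csv")
# Toy rows with made-up values, used here only so the sketch runs on its own.
demo = pd.DataFrame({
    "Age": [22, 38, np.nan, 45],
    "Work_Experience": [1.0, np.nan, 0.0, 9.0],
    "Family_Size": [4.0, 3.0, 1.0, np.nan],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Ever_Married": ["No", "Yes", np.nan, "Yes"],
    "Graduated": ["No", "Yes", "Yes", "No"],
    "Profession": ["Healthcare", "Engineer", np.nan, "Lawyer"],
    "Var_1": ["A", "A", "B", np.nan],
    "Spending_Score": ["Low", "Average", "High", "Low"],
})

print(demo.isna().sum())          # explore: missing values per column
X = build_preprocessor().fit_transform(demo)
print(X.shape)
```

Fitting the preprocessor on Train.csv and reusing it with `transform` on Test.csv keeps the two files on the same scale and encoding.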
Dimensionality Reduction
1. PCA (Principal Component Analysis)
2. t-SNE (t-distributed Stochastic Neighbor Embedding)
- PCA: Linear projection, preserves global structure, faster
- t-SNE: Non-linear, preserves local neighborhoods, better for visualization
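The contrast between the two projections can be sketched as follows. The data here is synthetic (three Gaussian blobs standing in for the scaled feature matrix), and the `perplexity` and `random_state` values are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)

# Synthetic stand-in for the preprocessed features: 3 blobs in 10 dimensions.
centers = rng.normal(scale=5.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(60, 10)) for c in centers])

# PCA: linear projection; explained_variance_ratio_ reports how much
# of the total variance the two components retain.
pca = PCA(n_components=2, random_state=42)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()
print(f"Variance retained by 2 components: {explained:.1%}")

# t-SNE: non-linear embedding that keeps neighbors close; distances between
# far-apart groups in the output are not directly interpretable.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=42)
X_tsne = tsne.fit_transform(X)
print(X_pca.shape, X_tsne.shape)
```

PCA offers a `transform` method for projecting new customers, while scikit-learn's `TSNE` only provides `fit_transform`, which is one reason t-SNE is usually kept for visualization.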
Clustering Algorithms
1. K-Means Clustering
Elbow Method: choose the number of clusters at the point where inertia stops decreasing sharply.
2. DBSCAN (Density-Based Clustering)
3. Hierarchical (Agglomerative) Clustering
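A minimal sketch of the three algorithms, including an elbow scan for K-Means, might look like this. It runs on synthetic blobs from `make_blobs`, and the values of `eps`, `min_samples`, `linkage`, and the range of k are assumptions chosen for the toy data rather than the project's tuned settings.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled customer features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

# Elbow method: record inertia (within-cluster sum of squares) for each k
# and look for the k after which the curve flattens.
inertias = {}
for k in range(2, 9):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")

# K-Means with the chosen number of clusters.
kmeans_labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# DBSCAN: no cluster count is given; points labeled -1 are treated as noise.
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Agglomerative clustering with Ward linkage, cut at 4 clusters.
hier_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)
```

DBSCAN is sensitive to `eps` on scaled data, so in practice it is usually tuned (for example with a k-distance plot) rather than fixed up front.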
Algorithm Comparison
| Algorithm | N_Clusters | Silhouette_Score | Noise_Points |
|---|---|---|---|
| K-Means | 4 | 0.312 | 0 |
| DBSCAN | 3-5 | 0.285 | 50-150 |
| Hierarchical | 4 | 0.308 | 0 |
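A comparison table with these columns can be produced along the following lines. This is a sketch on synthetic data, so its numbers will not match the table above. The helper `summarize` is hypothetical, and excluding DBSCAN noise points before computing the silhouette score is an assumption about how the score was calculated.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def summarize(name, X, labels):
    """Build one comparison row; label -1 (DBSCAN noise) is excluded from scoring."""
    labels = np.asarray(labels)
    mask = labels != -1
    n_clusters = len(set(labels[mask]))
    # The silhouette score is only defined for at least two clusters.
    score = silhouette_score(X[mask], labels[mask]) if n_clusters >= 2 else float("nan")
    return {
        "Algorithm": name,
        "N_Clusters": n_clusters,
        "Silhouette_Score": round(float(score), 3),
        "Noise_Points": int((~mask).sum()),
    }


# Synthetic stand-in for the scaled customer features.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
    "Hierarchical": AgglomerativeClustering(n_clusters=4),
}
comparison = pd.DataFrame(
    [summarize(name, X, model.fit_predict(X)) for name, model in models.items()]
)
print(comparison.to_string(index=False))
```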
Cluster Interpretation
- Average age: 28 years
- Occupation: Artist, Healthcare
- Spending: Low to Average
- Family size: Small (1-2 members)
- Strategy: Entry-level products, loyalty programs

- Average age: 42 years
- Occupation: Executive, Lawyer
- Spending: High
- Family size: Medium (3-4 members)
- Strategy: Premium products, VIP programs, family bundles

- Average age: 35 years
- Occupation: Engineer, Healthcare
- Spending: Average
- Family size: Medium
- Strategy: Value-for-money products, upgrade campaigns

- Average age: 32 years
- Mixed occupations
- Spending: Low
- Family size: Variable
- Strategy: Discounts, promotions, payment plans
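Profiles like the ones above are typically derived by grouping customers by cluster label and aggregating each feature. The sketch below shows one way to do that with pandas. The function `profile_clusters`, the `cluster` column name, and the toy rows are illustrative assumptions, not the project's data or code.

```python
import pandas as pd


def profile_clusters(df: pd.DataFrame, label_col: str = "cluster") -> pd.DataFrame:
    """Summarize each cluster: size, mean age and family size, most common categories."""
    return df.groupby(label_col).agg(
        n_customers=("Age", "size"),
        avg_age=("Age", "mean"),
        avg_family_size=("Family_Size", "mean"),
        top_profession=("Profession", lambda s: s.mode().iat[0]),
        top_spending=("Spending_Score", lambda s: s.mode().iat[0]),
    )


# Made-up rows, used only to show the shape of the output.
demo = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "Age": [26, 28, 30, 40, 42, 44],
    "Family_Size": [1, 2, 1, 3, 4, 3],
    "Profession": ["Artist", "Healthcare", "Artist", "Executive", "Lawyer", "Executive"],
    "Spending_Score": ["Low", "Low", "Average", "High", "High", "High"],
})

profiles = profile_clusters(demo)
print(profiles)
```

Profiling on the original, unscaled columns keeps the summaries readable in business terms (years, family members) even though clustering runs on scaled features.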
Business Recommendations
1. Marketing Strategies by Segment
High-Value Customers (Cluster 1):
- Premium product launches
- Exclusive events and previews
- Personalized account management
- Loyalty rewards programs

- Upgrade campaigns (“move from Standard to Premium”)
- Cross-selling opportunities
- Bundle offers
- Referral incentives

- Seasonal promotions
- Clearance sales
- Payment installment plans
- Entry-level product lines

- Student/young professional discounts
- Onboarding programs
- Educational content
- Future high-value customers
2. Retention Strategies
- Churn risk prediction: Monitor customers showing signs of cluster migration
- Win-back campaigns: Re-engage inactive customers by cluster
- Satisfaction surveys: Tailored by segment
3. Product Development
- Cluster-specific products: Design offerings for each segment
- Pricing tiers: Align with segment spending capacity
- Feature prioritization: Based on segment preferences
Model Deployment
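One common way to deploy a fitted segmentation model is to persist the whole pipeline with `joblib` and reload it wherever new customers need a segment. The sketch below assumes a K-Means pipeline on already-numeric features. The file name `segmenter.joblib`, the random stand-in data, and the pipeline layout are hypothetical, not the project's actual deployment.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for the encoded training features (5 numeric columns assumed).
X_train = rng.normal(size=(200, 5))

# Bundle scaling and clustering so that new data gets identical treatment.
model = Pipeline([
    ("scale", StandardScaler()),
    ("kmeans", KMeans(n_clusters=4, n_init=10, random_state=42)),
])
model.fit(X_train)

# Persist the fitted pipeline (hypothetical file name, temp dir for the demo).
path = os.path.join(tempfile.mkdtemp(), "segmenter.joblib")
joblib.dump(model, path)

# Later, e.g. inside a scoring service: reload and assign segments.
loaded = joblib.load(path)
new_customers = rng.normal(size=(3, 5))
segments = loaded.predict(new_customers)
print(segments)
```

K-Means suits this pattern because it can assign unseen points via `predict`; scikit-learn's `DBSCAN` and `AgglomerativeClustering` do not offer a `predict` method, so they would need a separate classifier or nearest-centroid rule to score new customers.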
Conclusions
Key Achievements
- Segmentation System: Successfully identified 4 distinct customer segments
- Dimensionality Reduction: Reduced 20+ features to interpretable 2D visualizations
- Algorithm Comparison: Evaluated K-Means, DBSCAN, and Hierarchical clustering
- Business Insights: Translated clusters into actionable marketing strategies
Technical Insights
- K-Means: Best balance of performance and interpretability (highest silhouette score in the comparison, 0.312)
- DBSCAN: Identified outliers (3-5% of customers)
- Hierarchical: Provided hierarchical structure for nested campaigns
- PCA vs t-SNE: PCA better for interpretation, t-SNE better for visualization
Limitations
- Static segmentation: Doesn’t capture customer evolution over time
- Limited features: Could benefit from transactional history, web behavior
- Cluster stability: May need periodic re-clustering as business evolves
- Interpretability: Some clusters may overlap in certain dimensions
Future Work
- Dynamic segmentation: Time-series clustering to track customer journeys
- Additional features: Integrate purchase history, website clicks, customer service interactions
- Advanced algorithms: Try Gaussian Mixture Models, Self-Organizing Maps
- A/B testing: Validate segment-based campaigns against control groups
- Real-time scoring: Deploy model as API for real-time customer assignment
- Cluster transition analysis: Predict when customers will move between segments