CRISP-DM is a vendor-neutral, industry-standard process model for data science and machine learning projects. It was developed in the late 1990s and remains the most widely adopted framework for structuring analytical work. By following CRISP-DM, teams ensure that every phase — from problem definition to deployment — is documented, traceable, and repeatable.
The six phases
Babel divides every project into the following six phases. Each phase has its own color accent in the UI to help you visually distinguish where you are in the lifecycle.

| Phase | Name | Purpose | Accent color |
|---|---|---|---|
| 01 | Business Understanding | Define objectives, scope, and plan | #888780 |
| 02 | Data Understanding | Explore and assess data quality | #534AB7 |
| 03 | Data Preparation | Cleaning, ETL, feature engineering | #0F6E56 |
| 04 | Modeling | Train and tune models | #993556 |
| 05 | Evaluation | Validate against success criteria | #BA7517 |
| 06 | Deployment | Deploy, monitor, and report | #185FA5 |
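The phase table above could be represented in code as a configuration constant. This is an illustrative sketch, not Babel's actual data model; the object shape and names are assumptions.

```typescript
// Hypothetical phase configuration mirroring the table above.
// Field names are illustrative assumptions, not Babel's real schema.
interface PhaseConfig {
  id: number;
  name: string;
  purpose: string;
  accent: string; // hex color used as the UI accent
}

const PHASES: PhaseConfig[] = [
  { id: 1, name: "Business Understanding", purpose: "Define objectives, scope, and plan", accent: "#888780" },
  { id: 2, name: "Data Understanding",     purpose: "Explore and assess data quality",   accent: "#534AB7" },
  { id: 3, name: "Data Preparation",       purpose: "Cleaning, ETL, feature engineering", accent: "#0F6E56" },
  { id: 4, name: "Modeling",               purpose: "Train and tune models",              accent: "#993556" },
  { id: 5, name: "Evaluation",             purpose: "Validate against success criteria",  accent: "#BA7517" },
  { id: 6, name: "Deployment",             purpose: "Deploy, monitor, and report",        accent: "#185FA5" },
];
```

A lookup by phase number is then a simple array index (`PHASES[id - 1]`), since the phases are fixed and ordered.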
Phase 01 — Business understanding
This is the starting point of every project. The goal is to translate business or research goals into a data science problem definition. Typical tasks: stakeholder interviews, problem framing, defining success criteria, project scoping, creating a project plan, identifying risks. Traceability: decisions and constraints documented here inform every downstream phase. Tag decisions to this phase so the team can trace why certain modeling choices were made.

Phase 02 — Data understanding
Explore the data available to you. Assess its quality, coverage, and relevance before committing to any preparation or modeling approach. Typical tasks: data collection, exploratory data analysis (EDA), identifying missing values, documenting data sources, initial quality assessment. Traceability: datasets registered in this phase carry lineage information — source, schema, and quality notes — that feeds into Phase 03.

Phase 03 — Data preparation
Transform raw data into a dataset suitable for modeling. This is often the most time-consuming phase. Typical tasks: data cleaning, handling missing values, outlier treatment, feature engineering, joins and aggregations, ETL pipelines, creating train/validation/test splits. Traceability: each transformation step should be linked to a dataset record so reviewers can reproduce the prepared dataset from its source.

Phase 04 — Modeling
Select and apply modeling techniques. Tune hyperparameters and iterate until the model meets preliminary quality criteria. Typical tasks: algorithm selection, model training, hyperparameter tuning, cross-validation, model comparison. Traceability: experiment records capture model type, parameters, and evaluation metrics. Link experiments to the datasets used in Phase 03 for full lineage.

Phase 05 — Evaluation
Determine whether the model truly meets the business objectives defined in Phase 01. This is a critical gate before deployment. Typical tasks: evaluating model performance against success criteria, error analysis, business case validation, stakeholder sign-off, reviewing traceability records. Traceability: evaluation results and approval decisions are recorded here. These records form the audit trail required in regulated environments.

Phase 06 — Deployment
Deliver the solution to end users or production systems and set up monitoring to track its ongoing performance. Typical tasks: model deployment, API integration, monitoring setup, documentation, reporting, handoff to operations, post-deployment review. Traceability: deployment records capture the version deployed, the environment, and any rollback plans. Monitoring alerts and post-deployment decisions are tagged to this phase.

The iterative nature of CRISP-DM
CRISP-DM is explicitly designed to be iterative, not linear. In practice, data science teams frequently cycle back to earlier phases as they learn more about the data and the problem. For example, EDA in Phase 02 might reveal that the business question defined in Phase 01 needs refinement. Or evaluation in Phase 05 might surface data quality issues that require returning to Phase 03.
How Babel maps CRISP-DM to the UI
Phase cards and the Kanban board
The Project details view shows one card per phase. Each card displays:

- The phase name and number
- A progress bar showing overall phase completion
- A semaphore indicator (green, amber, or red)
- A count of high-priority tasks that are not yet completed
Tasks within each phase are organized on a Kanban board with five status columns: Pending, On hold, In progress, Under review, and Completed.
You can drag tasks between columns directly in the Kanban view to update their status without opening the task detail.
Traceability panel
Each phase has an associated traceability panel where you can attach datasets, experiments, and decisions. These records are searchable and exportable as part of the project report.

Progress calculation
Babel calculates phase progress as the average of the status weights of all tasks in that phase. Each task status carries a numeric weight:

| Status | Weight |
|---|---|
| Pending | 0% |
| On hold | 25% |
| In progress | 50% |
| Under review | 75% |
| Completed | 100% |
For example, if a phase contains four tasks with statuses inProgress, completed, pending, and underReview, the progress is (50 + 100 + 0 + 75) / 4 = 56.25%.
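The weighting scheme can be sketched as a small function. This is a minimal illustration of the documented formula; the type and function names are assumptions, not Babel's actual API.

```typescript
// Status names follow the camelCase identifiers used in the docs.
type TaskStatus = "pending" | "onHold" | "inProgress" | "underReview" | "completed";

// Numeric weights from the status table above.
const STATUS_WEIGHTS: Record<TaskStatus, number> = {
  pending: 0,
  onHold: 25,
  inProgress: 50,
  underReview: 75,
  completed: 100,
};

// Phase progress is the plain average of the weights of its tasks.
function phaseProgress(statuses: TaskStatus[]): number {
  if (statuses.length === 0) return 0; // assumption: an empty phase reads as 0%
  const total = statuses.reduce((sum, s) => sum + STATUS_WEIGHTS[s], 0);
  return total / statuses.length;
}
```

With four tasks in states inProgress, completed, pending, and underReview, `phaseProgress` returns 56.25.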
Semaphore color system
The semaphore on each phase card gives an at-a-glance health signal based on the calculated progress percentage.

| Color | Threshold | Meaning |
|---|---|---|
| Green | ≥ 80% | Phase is on track or nearly complete |
| Amber | ≥ 30% and < 80% | Phase is in progress but needs attention |
| Red | < 30% | Phase is at risk or has not started |
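The threshold mapping above reduces to a short function. This sketch assumes the documented cutoffs are inclusive at the lower bound, as the table's `≥` signs indicate; the names are illustrative.

```typescript
type Semaphore = "green" | "amber" | "red";

// Maps a progress percentage (0–100) to the documented semaphore color.
function semaphoreColor(progress: number): Semaphore {
  if (progress >= 80) return "green"; // on track or nearly complete
  if (progress >= 30) return "amber"; // in progress but needs attention
  return "red";                        // at risk or not started
}
```

Checking the thresholds in descending order keeps each branch a single comparison.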
High-priority tasks that are not yet completed are counted separately as an “at risk” metric on the dashboard. Even if a phase’s semaphore is green, the presence of high-priority incomplete tasks will surface a warning counter on the phase card.
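The at-risk counter is independent of the progress average: it simply counts high-priority tasks whose status is anything other than completed. A sketch, with the `Task` shape and function name as assumptions:

```typescript
// Hypothetical minimal task shape; Babel's real model likely has more fields.
interface Task {
  priority: "low" | "medium" | "high";
  status: "pending" | "onHold" | "inProgress" | "underReview" | "completed";
}

// Counts high-priority tasks that are not yet completed,
// as surfaced by the warning counter on the phase card.
function atRiskCount(tasks: Task[]): number {
  return tasks.filter(t => t.priority === "high" && t.status !== "completed").length;
}
```

Note that a high-priority task in underReview still counts as at risk, since only completed clears it.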
