Babel organizes every data science project using the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining). CRISP-DM provides a structured, iterative framework that guides a project from raw business requirements all the way through to a deployed, monitored solution.
CRISP-DM is a vendor-neutral, industry-standard process model for data science and machine learning projects. It was developed in the late 1990s and remains the most widely adopted framework for structuring analytical work. By following CRISP-DM, teams ensure that every phase — from problem definition to deployment — is documented, traceable, and repeatable.

The six phases

Babel divides every project into the following six phases. Each phase has its own color accent in the UI to help you visually distinguish where you are in the lifecycle.
Phase | Name                   | Purpose                                | Accent color
01    | Business Understanding | Define objectives, scope, and plan     | #888780
02    | Data Understanding     | Explore and assess data quality        | #534AB7
03    | Data Preparation       | Cleaning, ETL, feature engineering     | #0F6E56
04    | Modeling               | Train and tune models                  | #993556
05    | Evaluation             | Validate against success criteria      | #BA7517
06    | Deployment             | Deploy, monitor, and report            | #185FA5
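The phase list above can be expressed as a small data structure. This is an illustrative sketch only, not Babel's actual API; the `Phase` interface and `PHASES` constant are hypothetical names chosen for the example.

```typescript
// Hypothetical model of the six CRISP-DM phases as documented above.
// The interface and constant names are illustrative, not Babel's API.
interface Phase {
  number: string;      // zero-padded phase number, e.g. "01"
  name: string;        // phase name as shown on the phase card
  accentColor: string; // UI accent color (hex)
}

const PHASES: Phase[] = [
  { number: "01", name: "Business Understanding", accentColor: "#888780" },
  { number: "02", name: "Data Understanding",     accentColor: "#534AB7" },
  { number: "03", name: "Data Preparation",       accentColor: "#0F6E56" },
  { number: "04", name: "Modeling",               accentColor: "#993556" },
  { number: "05", name: "Evaluation",             accentColor: "#BA7517" },
  { number: "06", name: "Deployment",             accentColor: "#185FA5" },
];
```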

Phase 01 — Business understanding

This is the starting point of every project. The goal is to translate business or research goals into a data science problem definition. Typical tasks: stakeholder interviews, problem framing, defining success criteria, project scoping, creating a project plan, identifying risks. Traceability: decisions and constraints documented here inform every downstream phase. Tag decisions to this phase so the team can trace why certain modeling choices were made.

Phase 02 — Data understanding

Explore the data available to you. Assess its quality, coverage, and relevance before committing to any preparation or modeling approach. Typical tasks: data collection, exploratory data analysis (EDA), identifying missing values, documenting data sources, initial quality assessment. Traceability: datasets registered in this phase carry lineage information — source, schema, and quality notes — that feeds into Phase 03.

Phase 03 — Data preparation

Transform raw data into a dataset suitable for modeling. This is often the most time-consuming phase. Typical tasks: data cleaning, handling missing values, outlier treatment, feature engineering, joins and aggregations, ETL pipelines, creating train/validation/test splits. Traceability: each transformation step should be linked to a dataset record so reviewers can reproduce the prepared dataset from its source.

Phase 04 — Modeling

Select and apply modeling techniques. Tune hyperparameters and iterate until the model meets preliminary quality criteria. Typical tasks: algorithm selection, model training, hyperparameter tuning, cross-validation, model comparison. Traceability: experiment records capture model type, parameters, and evaluation metrics. Link experiments to the datasets used in Phase 03 for full lineage.

Phase 05 — Evaluation

Determine whether the model truly meets the business objectives defined in Phase 01. This is a critical gate before deployment. Typical tasks: evaluating model performance against success criteria, error analysis, business case validation, stakeholder sign-off, reviewing traceability records. Traceability: evaluation results and approval decisions are recorded here. These records form the audit trail required in regulated environments.

Phase 06 — Deployment

Deliver the solution to end users or production systems and set up monitoring to track its ongoing performance. Typical tasks: model deployment, API integration, monitoring setup, documentation, reporting, handoff to operations, post-deployment review. Traceability: deployment records capture the version deployed, the environment, and any rollback plans. Monitoring alerts and post-deployment decisions are tagged to this phase.

The iterative nature of CRISP-DM

CRISP-DM is explicitly designed to be iterative, not linear. In practice, data science teams frequently cycle back to earlier phases as they learn more about the data and the problem. For example, EDA in Phase 02 might reveal that the business question defined in Phase 01 needs refinement. Or evaluation in Phase 05 might surface data quality issues that require returning to Phase 03.
In Babel, you can add tasks to any phase at any time, regardless of the phase’s current progress. This reflects real-world project dynamics where work in one phase often surfaces requirements in another. The CRISP-DM phases in Babel are containers for organizing and tracking work — they do not enforce a strict linear gate.

How Babel maps CRISP-DM to the UI

Phase cards and the Kanban board

The Project details view shows one card per phase. Each card displays:
  • The phase name and number
  • A progress bar showing overall phase completion
  • A semaphore indicator (green, amber, or red)
  • A count of high-priority tasks that are not yet completed
Clicking a phase card expands it to reveal all tasks assigned to that phase. Within the expanded view, tasks are displayed in a Kanban-style board organized by status columns: Pending, On hold, In progress, Under review, and Completed. You can drag tasks between columns directly in the Kanban view to update their status without opening the task detail.
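Conceptually, dragging a card to a different column is just a status update on the task. The sketch below illustrates that idea; the `Task` shape and `moveTask` helper are hypothetical and do not reflect Babel's internal implementation.

```typescript
// The five Kanban columns map to the five task statuses.
type TaskStatus = "pending" | "onHold" | "inProgress" | "underReview" | "completed";

// Illustrative task shape (not Babel's actual data model).
interface Task {
  id: string;
  title: string;
  status: TaskStatus;
}

// Dropping a card into a column amounts to setting the task's status
// to that column's status (hypothetical helper for illustration).
function moveTask(task: Task, targetColumn: TaskStatus): Task {
  return { ...task, status: targetColumn };
}
```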

Traceability panel

Each phase has an associated traceability panel where you can attach datasets, experiments, and decisions. These records are searchable and exportable as part of the project report.

Progress calculation

Babel calculates phase progress as the average of the status weights of all tasks in that phase. Each task status carries a numeric weight:
Status       | Weight
Pending      | 0%
On hold      | 25%
In progress  | 50%
Under review | 75%
Completed    | 100%
Formula:
Phase progress = sum(task weights) / number of tasks
For example, if Phase 03 has four tasks with statuses inProgress, completed, pending, and underReview, the progress is:
(50 + 100 + 0 + 75) / 4 = 56.25%
Phases with no tasks show 0% progress.
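The formula and the worked example above can be sketched in code. This is a minimal illustration of the documented calculation, assuming the camelCase status names shown in the example; the function and constant names are hypothetical.

```typescript
type TaskStatus = "pending" | "onHold" | "inProgress" | "underReview" | "completed";

// Status weights as documented in the table above.
const STATUS_WEIGHT: Record<TaskStatus, number> = {
  pending: 0,
  onHold: 25,
  inProgress: 50,
  underReview: 75,
  completed: 100,
};

// Phase progress = sum(task weights) / number of tasks.
// Phases with no tasks show 0% progress.
function phaseProgress(statuses: TaskStatus[]): number {
  if (statuses.length === 0) return 0;
  const total = statuses.reduce((sum, s) => sum + STATUS_WEIGHT[s], 0);
  return total / statuses.length;
}

// Worked example from above: (50 + 100 + 0 + 75) / 4 = 56.25
phaseProgress(["inProgress", "completed", "pending", "underReview"]);
```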

Semaphore color system

The semaphore on each phase card gives an at-a-glance health signal based on the calculated progress percentage.
Color | Threshold       | Meaning
Green | ≥ 80%           | Phase is on track or nearly complete
Amber | ≥ 30% and < 80% | Phase is in progress but needs attention
Red   | < 30%           | Phase is at risk or has not started
The dashboard aggregates these signals across all phases so you can immediately see which parts of your project need attention.
High-priority tasks that are not yet completed are counted separately as an “at risk” metric on the dashboard. Even if a phase’s overall progress is green, the presence of high-priority incomplete tasks will surface a warning counter on the phase card.
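The threshold table above maps progress to a color in a straightforward way. Here is a minimal sketch of that mapping; the function name is illustrative, not Babel's API.

```typescript
// Documented semaphore thresholds: green >= 80%, amber >= 30%, red otherwise.
// (Hypothetical helper illustrating the table above.)
function semaphore(progress: number): "green" | "amber" | "red" {
  if (progress >= 80) return "green";
  if (progress >= 30) return "amber";
  return "red";
}
```

For instance, the worked progress example of 56.25% would show an amber semaphore, since it falls between the 30% and 80% thresholds.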
