Babel organizes every data science project using the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining). CRISP-DM provides a structured, iterative framework that guides a project from raw business requirements all the way through to a deployed, monitored solution.
CRISP-DM is a vendor-neutral, industry-standard process model for data science and machine learning projects. It was developed in the late 1990s and remains the most widely adopted framework for structuring analytical work. By following CRISP-DM, teams ensure that every phase — from problem definition to deployment — is documented, traceable, and repeatable.

The six phases

Babel divides every project into the following six phases. Each phase has its own color accent in the UI to help you visually distinguish where you are in the lifecycle.
Phase | Name                   | Purpose                                | Accent color
01    | Business Understanding | Define objectives, scope, and plan     | #888780
02    | Data Understanding     | Explore and assess data quality        | #534AB7
03    | Data Preparation       | Cleaning, ETL, feature engineering     | #0F6E56
04    | Modeling               | Train and tune models                  | #993556
05    | Evaluation             | Validate against success criteria      | #BA7517
06    | Deployment             | Deploy, monitor, and report            | #185FA5
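The phase list above can be expressed as a small data structure. This is an illustrative sketch only, not Babel's actual API; the `Phase` interface and `PHASES` constant are hypothetical names chosen for the example.

```typescript
// Hypothetical model of the six CRISP-DM phases as documented above.
// The interface and constant names are illustrative, not Babel's API.
interface Phase {
  number: string;      // zero-padded phase number, e.g. "01"
  name: string;        // phase name as shown on the phase card
  accentColor: string; // UI accent color (hex)
}

const PHASES: Phase[] = [
  { number: "01", name: "Business Understanding", accentColor: "#888780" },
  { number: "02", name: "Data Understanding",     accentColor: "#534AB7" },
  { number: "03", name: "Data Preparation",       accentColor: "#0F6E56" },
  { number: "04", name: "Modeling",               accentColor: "#993556" },
  { number: "05", name: "Evaluation",             accentColor: "#BA7517" },
  { number: "06", name: "Deployment",             accentColor: "#185FA5" },
];
```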

Phase 01 — Business understanding

This is the starting point of every project. The goal is to translate business or research goals into a data science problem definition. Typical tasks: stakeholder interviews, problem framing, defining success criteria, project scoping, creating a project plan, identifying risks. Traceability: decisions and constraints documented here inform every downstream phase. Tag decisions to this phase so the team can trace why certain modeling choices were made.

Phase 02 — Data understanding

Explore the data available to you. Assess its quality, coverage, and relevance before committing to any preparation or modeling approach. Typical tasks: data collection, exploratory data analysis (EDA), identifying missing values, documenting data sources, initial quality assessment. Traceability: datasets registered in this phase carry lineage information — source, schema, and quality notes — that feeds into Phase 03.

Phase 03 — Data preparation

Transform raw data into a dataset suitable for modeling. This is often the most time-consuming phase. Typical tasks: data cleaning, handling missing values, outlier treatment, feature engineering, joins and aggregations, ETL pipelines, creating train/validation/test splits. Traceability: each transformation step should be linked to a dataset record so reviewers can reproduce the prepared dataset from its source.

Phase 04 — Modeling

Select and apply modeling techniques. Tune hyperparameters and iterate until the model meets preliminary quality criteria. Typical tasks: algorithm selection, model training, hyperparameter tuning, cross-validation, model comparison. Traceability: experiment records capture model type, parameters, and evaluation metrics. Link experiments to the datasets used in Phase 03 for full lineage.

Phase 05 — Evaluation

Determine whether the model truly meets the business objectives defined in Phase 01. This is a critical gate before deployment. Typical tasks: evaluating model performance against success criteria, error analysis, business case validation, stakeholder sign-off, reviewing traceability records. Traceability: evaluation results and approval decisions are recorded here. These records form the audit trail required in regulated environments.

Phase 06 — Deployment

Deliver the solution to end users or production systems and set up monitoring to track its ongoing performance. Typical tasks: model deployment, API integration, monitoring setup, documentation, reporting, handoff to operations, post-deployment review. Traceability: deployment records capture the version deployed, the environment, and any rollback plans. Monitoring alerts and post-deployment decisions are tagged to this phase.

The iterative nature of CRISP-DM

CRISP-DM is explicitly designed to be iterative, not linear. In practice, data science teams frequently cycle back to earlier phases as they learn more about the data and the problem. For example, EDA in Phase 02 might reveal that the business question defined in Phase 01 needs refinement. Or evaluation in Phase 05 might surface data quality issues that require returning to Phase 03.
In Babel, you can add tasks to any phase at any time, regardless of the phase’s current progress. This reflects real-world project dynamics where work in one phase often surfaces requirements in another. The CRISP-DM phases in Babel are containers for organizing and tracking work — they do not enforce a strict linear gate.

How Babel maps CRISP-DM to the UI

Phase cards and the Kanban board

The Project details view shows one card per phase. Each card displays:
  • The phase name and number
  • A progress bar showing overall phase completion
  • A semaphore indicator (green, amber, or red)
  • A count of high-priority tasks that are not yet completed
Clicking a phase card expands it to reveal all tasks assigned to that phase. Within the expanded view, tasks are displayed in a Kanban-style board organized by status columns: Pending, On hold, In progress, Under review, and Completed. You can drag tasks between columns directly in the Kanban view to update their status without opening the task detail.
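Conceptually, dragging a card to a different column is just a status update on the task. The sketch below illustrates that idea; the `Task` shape and `moveTask` helper are hypothetical and do not reflect Babel's internal implementation.

```typescript
// The five Kanban columns map to the five task statuses.
type TaskStatus = "pending" | "onHold" | "inProgress" | "underReview" | "completed";

// Illustrative task shape (not Babel's actual data model).
interface Task {
  id: string;
  title: string;
  status: TaskStatus;
}

// Dropping a card into a column amounts to setting the task's status
// to that column's status (hypothetical helper for illustration).
function moveTask(task: Task, targetColumn: TaskStatus): Task {
  return { ...task, status: targetColumn };
}
```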

Traceability panel

Each phase has an associated traceability panel where you can attach datasets, experiments, and decisions. These records are searchable and exportable as part of the project report.

Progress calculation

Babel calculates phase progress as the average of the status weights of all tasks in that phase. Each task status carries a numeric weight:
Status       | Weight
Pending      | 0%
On hold      | 25%
In progress  | 50%
Under review | 75%
Completed    | 100%
Formula:
Phase progress = sum(task weights) / number of tasks
For example, if Phase 03 has four tasks with statuses inProgress, completed, pending, and underReview, the progress is:
(50 + 100 + 0 + 75) / 4 = 56.25%
Phases with no tasks show 0% progress.
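The formula and the worked example above can be sketched in code. This is a minimal illustration of the documented calculation, assuming the camelCase status names shown in the example; the function and constant names are hypothetical.

```typescript
type TaskStatus = "pending" | "onHold" | "inProgress" | "underReview" | "completed";

// Status weights as documented in the table above.
const STATUS_WEIGHT: Record<TaskStatus, number> = {
  pending: 0,
  onHold: 25,
  inProgress: 50,
  underReview: 75,
  completed: 100,
};

// Phase progress = sum(task weights) / number of tasks.
// Phases with no tasks show 0% progress.
function phaseProgress(statuses: TaskStatus[]): number {
  if (statuses.length === 0) return 0;
  const total = statuses.reduce((sum, s) => sum + STATUS_WEIGHT[s], 0);
  return total / statuses.length;
}

// Worked example from above: (50 + 100 + 0 + 75) / 4 = 56.25
phaseProgress(["inProgress", "completed", "pending", "underReview"]);
```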

Semaphore color system

The semaphore on each phase card gives an at-a-glance health signal based on the calculated progress percentage.
Color | Threshold       | Meaning
Green | ≥ 80%           | Phase is on track or nearly complete
Amber | ≥ 30% and < 80% | Phase is in progress but needs attention
Red   | < 30%           | Phase is at risk or has not started
The dashboard aggregates these signals across all phases so you can immediately see which parts of your project need attention.
High-priority tasks that are not yet completed are counted separately as an “at risk” metric on the dashboard. Even if a phase’s overall progress is green, the presence of high-priority incomplete tasks will surface a warning counter on the phase card.
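The threshold table above maps progress to a color in a straightforward way. Here is a minimal sketch of that mapping; the function name is illustrative, not Babel's API.

```typescript
// Documented semaphore thresholds: green >= 80%, amber >= 30%, red otherwise.
// (Hypothetical helper illustrating the table above.)
function semaphore(progress: number): "green" | "amber" | "red" {
  if (progress >= 80) return "green";
  if (progress >= 30) return "amber";
  return "red";
}
```

For instance, the worked progress example of 56.25% would show an amber semaphore, since it falls between the 30% and 80% thresholds.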
