KITTI Dataset Examples

The KITTI dataset is a widely-used benchmark for autonomous driving research. ORB-SLAM3 provides examples for processing KITTI’s stereo and monocular sequences.

Dataset Overview

The KITTI dataset provides:

Stereo camera setup (rectified grayscale, 10 Hz)
Large-scale outdoor environments (urban, residential, highway)
Ground truth from GPS/IMU (sequences 00-10 only)
22 stereo sequences for the odometry benchmark
Challenging conditions (moving objects, lighting changes)

Key Characteristics

Autonomous Driving

Real-world driving scenarios with moving vehicles and pedestrians.

Large Scale

Long sequences covering several kilometers.

Rectified Stereo

Pre-rectified images ready for stereo processing.

GPS Ground Truth

Accurate position data for evaluation.

Download Instructions

Visit KITTI website

Navigate to: http://www.cvlibs.net/datasets/kitti/eval_odometry.php

Download sequences

You’ll need:

Left camera images (grayscale)
Right camera images (grayscale) - for stereo
Ground truth poses - for evaluation
Calibration files

Download the odometry dataset (sequences 00-21).

Extract the data

# Extract sequences
unzip data_odometry_gray.zip -d ~/Datasets/KITTI/
unzip data_odometry_calib.zip -d ~/Datasets/KITTI/
unzip data_odometry_poses.zip -d ~/Datasets/KITTI/

Directory structure:

KITTI/
├── sequences/
│   ├── 00/
│   │   ├── image_0/     # Left camera
│   │   │   ├── 000000.png
│   │   │   ├── 000001.png
│   │   │   └── ...
│   │   ├── image_1/     # Right camera
│   │   ├── calib.txt    # Calibration
│   │   └── times.txt    # Timestamps
│   ├── 01/
│   └── .../
└── poses/
    ├── 00.txt           # Ground truth
    └── .../
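Before running an example, it can help to confirm that an extracted sequence matches the layout above. The following is a minimal sketch, not part of ORB-SLAM3; the helper name `check_sequence` is hypothetical, and it assumes one `.png` per frame and one timestamp per line in `times.txt`.

```python
from pathlib import Path


def check_sequence(seq_dir, stereo=True):
    """Return the frame count if images and timestamps agree.

    Hypothetical helper: raises ValueError when a camera folder holds a
    different number of .png files than times.txt has timestamps.
    """
    seq = Path(seq_dir)
    # times.txt holds one timestamp per line; blank lines are ignored.
    lines = (seq / "times.txt").read_text().splitlines()
    n_times = sum(1 for line in lines if line.strip())

    cameras = ["image_0", "image_1"] if stereo else ["image_0"]
    for cam in cameras:
        n_images = len(list((seq / cam).glob("*.png")))
        if n_images != n_times:
            raise ValueError(
                f"{cam}: {n_images} images but {n_times} timestamps"
            )
    return n_times
```

Calling `check_sequence` on a sequence folder such as `~/Datasets/KITTI/sequences/00/` (expanded to a full path) returns the number of frames, or raises if a camera folder is incomplete.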

Running Monocular Examples

Process KITTI sequences using only the left camera:

./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/

Code Structure

The monocular KITTI example (mono_kitti.cc:108) reads images sequentially:

// Load image from sequence
im = cv::imread(vstrImageFilenames[ni], cv::IMREAD_UNCHANGED);
double tframe = vTimestamps[ni];  // timestamp in seconds, read from times.txt

// Track monocular frame
SLAM.TrackMonocular(im, tframe);

Sequence-Specific Configurations

KITTI sequences have different calibrations:

Sequences 00-02

./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/

Sequence 03

./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI03.yaml \
    ~/Datasets/KITTI/sequences/03/

Sequences 04-12

./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI04-12.yaml \
    ~/Datasets/KITTI/sequences/04/

Different KITTI sequences use different camera calibrations. Always use the appropriate YAML file.
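When scripting runs over several sequences, the mapping from sequence number to settings file can be encoded once. This is a small sketch under stated assumptions: the function name `kitti_settings_file` is hypothetical, and it assumes the `Examples/Stereo/` folder uses the same three file names as `Examples/Monocular/`.

```python
def kitti_settings_file(sequence, mode="Monocular"):
    """Map a KITTI sequence number to the settings file named in this guide.

    Hypothetical helper; `mode` is the Examples subfolder, "Monocular" or
    "Stereo". Only sequences 00-12 have a settings file listed here.
    """
    if not 0 <= sequence <= 12:
        raise ValueError(f"no settings file listed for sequence {sequence:02d}")
    if sequence <= 2:
        name = "KITTI00-02.yaml"
    elif sequence == 3:
        name = "KITTI03.yaml"
    else:
        name = "KITTI04-12.yaml"
    return f"Examples/{mode}/{name}"
```

For example, `kitti_settings_file(7, mode="Stereo")` yields the path for the 04-12 calibration under `Examples/Stereo/`.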

Running Stereo Examples

Process both left and right cameras for accurate depth estimation:

./Examples/Stereo/stereo_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Stereo/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/

Why Stereo for KITTI?

Accurate Scale

Stereo provides metric scale and avoids the scale drift of monocular tracking, which is critical for driving applications.

Better Initialization

Instant depth from first frame enables faster startup.

Robust Tracking

Depth constraints improve tracking in feature-poor areas (sky, roads).

Direct Comparison

Easier evaluation against GPS/IMU ground truth.

Image Loading

The stereo example loads pre-rectified image pairs:

// LoadImages reads times.txt and constructs paths
vector<string> vstrImageLeft, vstrImageRight;
vector<double> vTimestamps;
LoadImages(strPathToSequence, vstrImageLeft, vstrImageRight, vTimestamps);

// Load left and right images
imLeft = cv::imread(vstrImageLeft[ni], cv::IMREAD_UNCHANGED);
imRight = cv::imread(vstrImageRight[ni], cv::IMREAD_UNCHANGED);
double tframe = vTimestamps[ni];  // timestamp in seconds, read from times.txt

// Track stereo frame
SLAM.TrackStereo(imLeft, imRight, tframe);
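To inspect a sequence outside the C++ example, the path construction described above can be mirrored in a few lines. This is a sketch of the same idea rather than a port of `LoadImages`; the name `load_image_lists` is hypothetical, and it assumes frames are named with a zero-padded six-digit index, as in the directory listing earlier.

```python
from pathlib import Path


def load_image_lists(sequence_dir):
    """Build left/right image paths and timestamps for one sequence."""
    seq = Path(sequence_dir)
    # One timestamp (seconds) per non-empty line of times.txt.
    timestamps = [
        float(line)
        for line in (seq / "times.txt").read_text().splitlines()
        if line.strip()
    ]
    n = len(timestamps)
    # Frame i is stored as image_0/00000i.png (left) and image_1/00000i.png (right).
    left = [str(seq / "image_0" / f"{i:06d}.png") for i in range(n)]
    right = [str(seq / "image_1" / f"{i:06d}.png") for i in range(n)]
    return left, right, timestamps
```

The function only builds paths from the timestamp count; it does not check that the image files exist.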

Typical Use Cases

KITTI sequences represent different driving scenarios:

Urban Sequences

Sequences: 00, 05 (details below are for sequence 00)

Environment: Urban neighborhood

Length: 4541 frames (about 7.5 minutes at 10 Hz)
Features: Buildings, parked cars, trees
Difficulty: Moderate

Highway Sequences

Sequences: 01, 02 (details below are for sequence 01)

Environment: Highway

Length: 1101 frames
Features: Guard rails, distant background
Difficulty: Moderate (high speed)

Challenging Sequences

Sequences: 08, 10 (details below are for sequence 08)

Environment: Residential

Length: 4071 frames
Features: Complex loops, similar structures
Difficulty: High (requires loop closure)

Output and Evaluation

Trajectory Format

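The stereo KITTI example saves the full camera trajectory in KITTI format via SLAM.SaveTrajectoryKITTI("CameraTrajectory.txt"). The monocular example saves the keyframe trajectory in TUM format (mono_kitti.cc:151):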

SLAM.SaveKeyFrameTrajectoryTUM("KeyFrameTrajectory.txt");

Format:

timestamp tx ty tz qx qy qz qw

Converting to KITTI Format

For submission to KITTI benchmark, convert to KITTI pose format:

# Each line is a 3x4 transformation matrix (flattened)
r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz
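The conversion from a TUM line to a KITTI line amounts to turning the quaternion into a rotation matrix and appending the translation as the fourth column. Below is a minimal sketch using only the standard library; the function name `tum_line_to_kitti` is hypothetical. Note that a keyframe trajectory contains only keyframes, and a monocular trajectory has arbitrary scale, so the converted file is not directly comparable to per-frame ground truth without further alignment.

```python
import math


def tum_line_to_kitti(line):
    """Convert 'timestamp tx ty tz qx qy qz qw' to 12 row-major pose values."""
    _, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
    # Normalize the quaternion so small rounding errors do not skew R.
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    x, y, z, w = qx / n, qy / n, qz / n, qw / n
    # Rows of [R | t], with R built from the unit quaternion (x, y, z, w).
    rows = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz],
    ]
    return " ".join(f"{v:.6e}" for row in rows for v in row)
```

Applying the function to each non-empty line of the TUM file produces one KITTI pose line per keyframe; the timestamp is dropped because the KITTI format does not store it.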

Evaluation with Ground Truth

# Evaluate trajectory against ground truth
python evaluation/evaluate_kitti.py \
    ~/Datasets/KITTI/poses/00.txt \
    KeyFrameTrajectory.txt

Metrics reported:

Translation error (%)
Rotation error (deg/m)
Success rate

Performance Considerations

Speed vs Accuracy

Real-time Processing

For 10 Hz real-time processing:

ORBextractor.nFeatures: 1000
ORBextractor.scaleFactor: 1.2
ORBextractor.nLevels: 8

High Accuracy

For best accuracy (slower):

ORBextractor.nFeatures: 2000
ORBextractor.scaleFactor: 1.2
ORBextractor.nLevels: 8

Hardware Requirements

KITTI sequences are long and computationally demanding. Recommended hardware:

CPU: Intel i7 or better
RAM: 8 GB minimum
Storage: 100 GB for full dataset

Configuration Files

KITTI uses pinhole camera parameters:

# Camera calibration (example from KITTI00-02.yaml)
Camera.type: "PinHole"

Camera.fx: 718.856
Camera.fy: 718.856
Camera.cx: 607.1928
Camera.cy: 185.2157

# Distortion parameters (already rectified)
Camera.k1: 0.0
Camera.k2: 0.0
Camera.p1: 0.0
Camera.p2: 0.0

# Stereo baseline times fx (baseline in meters, fx in pixels)
Camera.bf: 386.1448

# Image resolution
Camera.width: 1241
Camera.height: 376
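The intrinsics and the `bf` term can be derived from a sequence's `calib.txt` rather than copied by hand. The sketch below assumes the usual KITTI odometry layout, where each line is a label (`P0:`, `P1:`, ...) followed by a row-major 3x4 projection matrix, and where the fourth entry of `P1` equals `-fx * baseline` for the rectified pair. The function name and the sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical calibration text with round numbers (not a real KITTI file).
SAMPLE_CALIB = (
    "P0: 700 0 600 0 0 700 180 0 0 0 1 0\n"
    "P1: 700 0 600 -350 0 700 180 0 0 0 1 0\n"
)


def stereo_params_from_calib(calib_text):
    """Derive fx, fy, cx, cy, bf and baseline from P0/P1 of a calib.txt."""
    rows = {}
    for line in calib_text.splitlines():
        if ":" in line:
            key, values = line.split(":", 1)
            rows[key.strip()] = [float(v) for v in values.split()]
    p0, p1 = rows["P0"], rows["P1"]
    fx, cx, fy, cy = p0[0], p0[2], p0[5], p0[6]
    bf = -p1[3]  # fourth entry of P1 is -fx * baseline
    return {"fx": fx, "fy": fy, "cx": cx, "cy": cy,
            "bf": bf, "baseline": bf / fx}


params = stereo_params_from_calib(SAMPLE_CALIB)
print(params["bf"], params["baseline"])
```

With the sample values, `bf` is 350 and the baseline is 0.5 m; for a real sequence, pass the contents of its `calib.txt` instead.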

Troubleshooting

Scale Drift in Monocular

Monocular SLAM has no absolute scale:

Scale drift is expected over long sequences
Use stereo mode for metric scale
Loop closures help reduce drift

Tracking Lost on Highway

Highways have fewer features:

Increase ORBextractor.nFeatures
Use stereo mode for depth constraints
Guard rails and lane markings are often the main trackable features

Moving Object Interference

Cars and pedestrians can affect SLAM:

ORB-SLAM3 is designed to handle outliers
Most moving objects are automatically rejected
Some sequences may have temporary tracking issues

Loop Closure Not Working

For sequences with loops (e.g., 08):

Ensure vocabulary loaded correctly
Allow sufficient processing time
Check that place recognition is enabled

Comparing Results

Expected Performance

Typical results on KITTI sequences:

Sequence	Length	Translation Error	Rotation Error
00	3.7 km	0.75%	0.003 deg/m
01	2.5 km	1.2%	0.004 deg/m
02	5.1 km	0.9%	0.003 deg/m
05	2.2 km	0.8%	0.003 deg/m

Results shown are for stereo mode. Monocular mode typically has higher drift.
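The Length column can be cross-checked against a ground-truth pose file by summing the distances between consecutive positions. This is a rough sketch, assuming each line holds the 12 row-major values of a 3x4 pose, so the translation sits at indices 3, 7 and 11; the function name `path_length_m` is hypothetical.

```python
import math


def path_length_m(pose_lines):
    """Sum distances between consecutive positions in KITTI pose lines."""
    positions = []
    for line in pose_lines:
        values = [float(v) for v in line.split()]
        if len(values) == 12:  # skip blank or malformed lines
            positions.append((values[3], values[7], values[11]))  # tx, ty, tz
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
```

Passing the lines of a file such as `~/Datasets/KITTI/poses/00.txt` gives the travelled distance in meters. `math.dist` requires Python 3.8 or newer.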

Advanced: Custom KITTI Data

To use your own KITTI-format data:

Organize images

my_sequence/
├── image_0/
│   ├── 000000.png
│   └── ...
├── image_1/           # For stereo
│   ├── 000000.png  
│   └── ...
└── times.txt

Create times.txt

One timestamp per line, in seconds. For a 10 Hz camera:

0.0
0.1
0.2
...
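For a camera with a fixed frame rate, the timestamp lines can be generated rather than typed. This is a minimal sketch; the function name `make_timestamps` and the scientific-notation formatting are choices made for this example, and any format that parses as a floating-point number of seconds should work equally well.

```python
def make_timestamps(n_frames, rate_hz=10.0):
    """Return one timestamp string per frame, in seconds from the first frame."""
    return [f"{i / rate_hz:.6e}" for i in range(n_frames)]


# Print the first three timestamps of a 10 Hz sequence.
for stamp in make_timestamps(3):
    print(stamp)
```

Writing `"\n".join(make_timestamps(n))` to `my_sequence/times.txt`, with `n` equal to the number of images, gives a file with one line per frame.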

Calibrate camera

Create a custom YAML file with your camera parameters. See Camera Configuration.

Run example

./Examples/Stereo/stereo_kitti \
    Vocabulary/ORBvoc.txt \
    my_settings.yaml \
    my_sequence/

Next Steps

EuRoC Dataset

Try indoor sequences with IMU

Camera Calibration

Calibrate your own stereo rig

Loop Closure

Understanding place recognition

Custom Datasets

Process your own sequences
