The KITTI dataset is a widely used benchmark for autonomous driving research. ORB-SLAM3 provides examples for processing KITTI’s stereo and monocular sequences.

Dataset Overview

The KITTI dataset provides:
  • Stereo camera setup (rectified grayscale, 10 Hz)
  • Large-scale outdoor environments (urban, residential, highway)
  • Ground truth from GPS/IMU (public for sequences 00-10 only)
  • 22 stereo sequences (00-21) in the odometry benchmark
  • Challenging conditions (moving objects, lighting changes)

Key Characteristics

Autonomous Driving

Real-world driving scenarios with moving vehicles and pedestrians.

Large Scale

Long sequences covering several kilometers.

Rectified Stereo

Pre-rectified images ready for stereo processing.

GPS Ground Truth

Accurate position data for evaluation.

Download Instructions

2

Download sequences

You’ll need:
  • Left camera images (grayscale)
  • Right camera images (grayscale) - for stereo
  • Ground truth poses - for evaluation
  • Calibration files
Download the odometry dataset (sequences 00-21).
3

Extract the data

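# Extract sequences (the archives may unpack into a top-level dataset/ folder;
# if so, move its contents up or adjust the paths used below)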
unzip data_odometry_gray.zip -d ~/Datasets/KITTI/
unzip data_odometry_calib.zip -d ~/Datasets/KITTI/
unzip data_odometry_poses.zip -d ~/Datasets/KITTI/
Directory structure:
KITTI/
├── sequences/
│   ├── 00/
│   │   ├── image_0/     # Left camera
│   │   │   ├── 000000.png
│   │   │   ├── 000001.png
│   │   │   └── ...
│   │   ├── image_1/     # Right camera
│   │   ├── calib.txt    # Calibration
│   │   └── times.txt    # Timestamps
│   ├── 01/
│   └── .../
└── poses/
    ├── 00.txt           # Ground truth
    └── .../

Running Monocular Examples

Process KITTI sequences using only the left camera:
./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/

Code Structure

The monocular KITTI example (mono_kitti.cc:108) reads images sequentially:
// Load image from sequence
im = cv::imread(vstrImageFilenames[ni], cv::IMREAD_UNCHANGED);
double tframe = vTimestamps[ni];  // timestamp read from times.txt

// Track monocular frame
SLAM.TrackMonocular(im, tframe);

Sequence-Specific Configurations

KITTI sequences were recorded with different camera calibrations, so ORB-SLAM3 provides one settings file per calibration group: KITTI00-02.yaml, KITTI03.yaml, and KITTI04-12.yaml. For example, for sequence 00:
./Examples/Monocular/mono_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Monocular/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/
Always use the YAML file that matches the sequence number.
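Choosing the settings file by sequence number is easy to get wrong in batch scripts. The following is a minimal sketch of that lookup, assuming the three settings file names found in the repository's Examples folders; kitti_settings_file is a hypothetical helper, not part of ORB-SLAM3.

```python
def kitti_settings_file(sequence: int, mode: str = "Stereo") -> str:
    """Return the settings path for a KITTI odometry sequence.

    Hypothetical helper: the file names are assumed to match those
    shipped in Examples/Monocular and Examples/Stereo.
    """
    if mode not in ("Monocular", "Stereo"):
        raise ValueError(f"unsupported mode: {mode}")
    if 0 <= sequence <= 2:
        name = "KITTI00-02.yaml"
    elif sequence == 3:
        name = "KITTI03.yaml"
    elif 4 <= sequence <= 12:
        name = "KITTI04-12.yaml"
    else:
        # No settings file is assumed to exist for other sequences.
        raise ValueError(f"no bundled settings file for sequence {sequence:02d}")
    return f"Examples/{mode}/{name}"
```

For instance, kitti_settings_file(8) selects the 04-12 file for stereo mode, and an unsupported sequence number raises an error instead of silently falling back to the wrong calibration.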

Running Stereo Examples

Process both left and right cameras for accurate depth estimation:
./Examples/Stereo/stereo_kitti \
    Vocabulary/ORBvoc.txt \
    Examples/Stereo/KITTI00-02.yaml \
    ~/Datasets/KITTI/sequences/00/

Why Stereo for KITTI?

Accurate Scale

Stereo provides metric scale and avoids the scale drift of monocular SLAM, which is critical for driving applications.

Better Initialization

Instant depth from first frame enables faster startup.

Robust Tracking

Depth constraints improve tracking in feature-poor areas (sky, roads).

Direct Comparison

Easier evaluation against GPS/IMU ground truth.

Image Loading

The stereo example loads pre-rectified image pairs:
// LoadImages reads times.txt and constructs paths
vector<string> vstrImageLeft, vstrImageRight;
vector<double> vTimestamps;
LoadImages(strPathToSequence, vstrImageLeft, vstrImageRight, vTimestamps);

// Load left and right images
imLeft = cv::imread(vstrImageLeft[ni], cv::IMREAD_UNCHANGED);
imRight = cv::imread(vstrImageRight[ni], cv::IMREAD_UNCHANGED);
double tframe = vTimestamps[ni];  // timestamp read from times.txt

// Track stereo frame
SLAM.TrackStereo(imLeft, imRight, tframe);
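To check a sequence folder before running the example, the path construction can be mirrored outside of C++. The sketch below assumes the layout shown earlier (times.txt next to image_0/ and image_1/, with six-digit zero-padded frame names); load_kitti_stereo_paths is a hypothetical name, not an ORB-SLAM3 function.

```python
from pathlib import Path


def load_kitti_stereo_paths(sequence_dir):
    """Return (left_paths, right_paths, timestamps) for one KITTI sequence.

    Assumes one timestamp per line in times.txt and frames named
    000000.png, 000001.png, ... in image_0/ and image_1/.
    """
    sequence_dir = Path(sequence_dir)
    with open(sequence_dir / "times.txt") as f:
        # Skip blank lines so a trailing newline does not add a frame.
        timestamps = [float(line) for line in f if line.strip()]
    count = len(timestamps)
    left = [sequence_dir / "image_0" / f"{i:06d}.png" for i in range(count)]
    right = [sequence_dir / "image_1" / f"{i:06d}.png" for i in range(count)]
    return left, right, timestamps
```

Checking that every returned path exists (for example with Path.exists) catches a mismatched times.txt or a missing right image before tracking starts.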

Typical Use Cases

KITTI sequences represent different driving scenarios:

Urban Sequences

  • Environment: Urban neighborhood
  • Length: 4541 frames (about 7.5 minutes at 10 Hz), as in sequence 00
  • Features: Buildings, parked cars, trees
  • Difficulty: Moderate

Highway Sequences

  • Environment: Highway
  • Length: 1101 frames (about 2 minutes at 10 Hz), as in sequence 01
  • Features: Guard rails, distant background
  • Difficulty: Moderate (high speed)

Challenging Sequences

  • Environment: Residential
  • Length: 4071 frames (about 7 minutes at 10 Hz), as in sequence 08
  • Features: Complex loops, similar structures
  • Difficulty: High (requires loop closure)

Output and Evaluation

Trajectory Format

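The monocular KITTI example saves the keyframe trajectory in TUM format (mono_kitti.cc:151):
SLAM.SaveKeyFrameTrajectoryTUM("KeyFrameTrajectory.txt");
Format:
timestamp tx ty tz qx qy qz qw
The stereo KITTI example instead writes the pose of every frame in KITTI format with SLAM.SaveTrajectoryKITTI("CameraTrajectory.txt").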

Converting to KITTI Format

To submit to the KITTI benchmark, a TUM-format trajectory must be converted to the KITTI pose format:
# Each line is a 3x4 transformation matrix [R | t], flattened row by row
r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz
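The conversion from a TUM line to a KITTI line amounts to turning the quaternion into a rotation matrix and interleaving it with the translation. Here is a minimal sketch in pure Python; quat_to_rot and tum_line_to_kitti are hypothetical helper names, and the sketch assumes both formats express the same camera-to-world pose.

```python
def quat_to_rot(qx, qy, qz, qw):
    """Convert a quaternion (x, y, z, w order, as in TUM files) to a 3x3 matrix."""
    norm = (qx * qx + qy * qy + qz * qz + qw * qw) ** 0.5
    qx, qy, qz, qw = qx / norm, qy / norm, qz / norm, qw / norm
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def tum_line_to_kitti(line):
    """Map 'timestamp tx ty tz qx qy qz qw' to the 12-value KITTI pose line."""
    _, tx, ty, tz, qx, qy, qz, qw = (float(v) for v in line.split())
    rot = quat_to_rot(qx, qy, qz, qw)
    values = []
    for row, trans in zip(rot, (tx, ty, tz)):
        values.extend(row + [trans])  # r_i1 r_i2 r_i3 t_i
    return " ".join(f"{v:.6e}" for v in values)
```

Two caveats apply. A keyframe trajectory contains only keyframes, while the benchmark expects one pose per frame, so a converted keyframe file is useful for inspection rather than submission. Monocular trajectories also have arbitrary scale and need alignment before any metric comparison.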

Evaluation with Ground Truth

# Evaluate trajectory against ground truth
python evaluation/evaluate_kitti.py \
    ~/Datasets/KITTI/poses/00.txt \
    KeyFrameTrajectory.txt
Metrics reported:
  • Translation error (%)
  • Rotation error (deg/m)
  • Success rate
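To make the first two metrics concrete, the sketch below computes translation error (%) and rotation error (deg/m) for a single segment of two KITTI-format trajectories. It is a simplified illustration: the official benchmark averages such errors over many segments of fixed lengths, which this sketch does not do, and the function names are hypothetical.

```python
import math


def parse_kitti_poses(text):
    """Parse KITTI pose lines into (R, t) pairs: R is 3x3 (row lists), t has 3 values."""
    poses = []
    for line in text.strip().splitlines():
        v = [float(x) for x in line.split()]
        poses.append(([v[0:3], v[4:7], v[8:11]], [v[3], v[7], v[11]]))
    return poses


def relative_pose(a, b):
    """Return inverse(a) * b for rigid poses given as (R, t)."""
    rot_a, t_a = a
    rot_b, t_b = b
    rot = [[sum(rot_a[k][i] * rot_b[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    diff = [t_b[k] - t_a[k] for k in range(3)]
    trans = [sum(rot_a[k][i] * diff[k] for k in range(3)) for i in range(3)]
    return rot, trans


def segment_errors(gt, est, first, last):
    """Translation error (%) and rotation error (deg/m) over frames first..last."""
    # Segment length is measured along the ground-truth path.
    length = sum(math.dist(gt[i][1], gt[i + 1][1]) for i in range(first, last))
    gt_rel = relative_pose(gt[first], gt[last])
    est_rel = relative_pose(est[first], est[last])
    rot_err, t_err = relative_pose(est_rel, gt_rel)
    trace = rot_err[0][0] + rot_err[1][1] + rot_err[2][2]
    angle = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    t_norm = math.sqrt(sum(c * c for c in t_err))
    return 100.0 * t_norm / length, math.degrees(angle) / length
```

Both trajectories must use the same pose format and contain the same frames; a TUM-format keyframe file cannot be compared line by line with poses/00.txt.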

Performance Considerations

Speed vs Accuracy

For 10 Hz real-time processing:
ORBextractor.nFeatures: 1000
ORBextractor.scaleFactor: 1.2
ORBextractor.nLevels: 8
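At 10 Hz each frame has a budget of roughly 100 ms. One way to check whether a configuration keeps up is to compare per-frame tracking times against the gap to the next timestamp. The sketch below assumes you have logged those tracking times yourself; frames_over_budget and its inputs are hypothetical.

```python
def frames_over_budget(timestamps, track_times):
    """Count frames whose tracking time exceeded the time until the next frame.

    timestamps: frame times in seconds (as in times.txt).
    track_times: measured tracking duration per frame, in seconds.
    """
    late = 0
    for i, spent in enumerate(track_times):
        if i + 1 < len(timestamps):
            budget = timestamps[i + 1] - timestamps[i]
        elif i > 0:
            # Last frame: reuse the previous inter-frame gap.
            budget = timestamps[i] - timestamps[i - 1]
        else:
            continue  # a single frame has no meaningful budget
        if spent > budget:
            late += 1
    return late
```

If many frames run over budget, lowering ORBextractor.nFeatures is the first setting to try, at some cost in accuracy.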

Hardware Requirements

KITTI sequences are long and computationally demanding. Recommended hardware:
  • CPU: Intel i7 or better
  • RAM: 8GB minimum
  • Storage: 100GB for full dataset

Configuration Files

KITTI uses pinhole camera parameters:
# Camera calibration (example from KITTI00-02.yaml)
Camera.type: "PinHole"

Camera.fx: 718.856
Camera.fy: 718.856
Camera.cx: 607.1928
Camera.cy: 185.2157

# Distortion parameters (already rectified)
Camera.k1: 0.0
Camera.k2: 0.0
Camera.p1: 0.0
Camera.p2: 0.0

# Stereo baseline times fx (baseline in meters, fx in pixels)
Camera.bf: 386.1448  # baseline * fx

# Image resolution
Camera.width: 1241
Camera.height: 376
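Because Camera.bf stores the product of baseline and focal length, both the baseline and the depth of a matched point follow from simple divisions. The sketch below shows the arithmetic with round illustrative numbers rather than KITTI values; the function names are hypothetical.

```python
def baseline_from_bf(bf, fx):
    """Recover the stereo baseline in meters from bf (= baseline * fx) and fx in pixels."""
    return bf / fx


def depth_from_disparity(bf, disparity_px):
    """Depth in meters of a point seen with the given disparity on rectified images."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return bf / disparity_px


# Illustrative numbers only: fx = 700 px and a 0.5 m baseline give bf = 350.
example_bf = 700.0 * 0.5
example_depth = depth_from_disparity(example_bf, 35.0)  # 350 / 35 = 10 m
```

The same relation explains why distant points are poorly constrained: as disparity approaches zero, small pixel errors translate into large depth errors.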

Troubleshooting

Monocular SLAM has no absolute scale:
  • Scale drift is expected over long sequences
  • Use stereo mode for metric scale
  • Loop closures help reduce drift
Highways have fewer features:
  • Increase ORBextractor.nFeatures
  • Use stereo mode for depth constraints
  • Guard rails and lane markings provide trackable features
Cars and pedestrians can affect SLAM:
  • ORB-SLAM3 is designed to handle outliers
  • Most moving objects are automatically rejected
  • Some sequences may have temporary tracking issues
For sequences with loops (e.g., 08):
  • Ensure vocabulary loaded correctly
  • Allow sufficient processing time
  • Check that place recognition is enabled

Comparing Results

Expected Performance

Typical results on KITTI sequences:
Sequence   Length   Translation Error   Rotation Error
00         3.7 km   0.75%               0.003 deg/m
01         1.0 km   1.2%                0.004 deg/m
02         5.1 km   0.9%                0.003 deg/m
05         2.2 km   0.8%                0.003 deg/m
Results shown are for stereo mode. Monocular mode typically has higher drift.

Advanced: Custom KITTI Data

To use your own KITTI-format data:
1

Organize images

my_sequence/
├── image_0/
│   ├── 000000.png
│   └── ...
├── image_1/           # For stereo
│   ├── 000000.png  
│   └── ...
└── times.txt
2

Create times.txt

One timestamp per line (seconds):
0.0
0.1
0.2
...
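If your camera records at a fixed rate, times.txt can be generated from the frame count. This is a minimal sketch assuming a constant frame rate and PNG frames already placed in image_0/; write_times_file is a hypothetical helper, and real recordings with irregular timing should use the timestamps from the capture device instead.

```python
from pathlib import Path


def write_times_file(sequence_dir, fps=10.0):
    """Write times.txt with one timestamp per frame found in image_0/.

    Assumes a constant frame rate; returns the number of frames written.
    """
    sequence_dir = Path(sequence_dir)
    frames = sorted((sequence_dir / "image_0").glob("*.png"))
    lines = [f"{i / fps:.6e}" for i in range(len(frames))]
    (sequence_dir / "times.txt").write_text("\n".join(lines) + "\n")
    return len(frames)
```

The timestamps are written in scientific notation, which parses as plain seconds.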
3

Calibrate camera

Create a custom YAML file with your camera parameters. See Camera Configuration.
4

Run example

./Examples/Stereo/stereo_kitti \
    Vocabulary/ORBvoc.txt \
    my_settings.yaml \
    my_sequence/

Next Steps

EuRoC Dataset

Try indoor sequences with IMU

Camera Calibration

Calibrate your own stereo rig

Loop Closure

Understanding place recognition

Custom Datasets

Process your own sequences
