The KITTI Vision Benchmark Suite is a dataset for autonomous-vehicle research consisting of 6 hours of multi-modal data recorded at 10-100 Hz. It is widely used because it provides detailed documentation and includes data prepared for a variety of tasks, including stereo matching, optical flow, visual odometry and object detection. The data is open access but requires registration for download.

Download: http://www.cvlibs.net/datasets/kitti/

Sensor Modalities


The data was recorded from a mobile platform (an automobile) equipped with the following sensor modalities: RGB stereo cameras, monochrome stereo cameras, a 360-degree Velodyne 3D laser scanner and a GPS/IMU inertial navigation system.

The data is calibrated, synchronized and timestamped. Both rectified and raw image sequences are provided, divided into the categories 'Road', 'City', 'Residential', 'Campus' and 'Person'.

Sensor Configuration

Data was collected with a single automobile instrumented with the following configuration of sensors:

2 x PointGray Flea2 grayscale cameras (FL2-14S3M-C), 1.4 megapixels
2 x PointGray Flea2 color cameras (FL2-14S3C-C), 1.4 megapixels
1 x Velodyne HDL-64E 3D laser scanner, resolution 0.02 m / 0.09°, ~1.3 million points/sec, range: 360° horizontal, 26.8° vertical, 120 m
1 x OXTS RT3003 GPS/IMU, 6-axis, 100 Hz, resolution: 0.02 m / 0.1°

Data Description

All sensor readings of a sequence are zipped into a single file named {date}_{drive}.zip, where {date} and {drive} are placeholders for the recording date and the sequence number. In addition to the raw recordings ('raw data'), rectified and synchronized recordings ('sync_data') are provided. The extracted directory layout is shown below, followed by a short Python sketch for enumerating the per-frame files.

{date}/
  |--{date_drive}/
  |    |--{date_drive}.zip
  |    |    |--image_0x/ x={0,..,3}
  |    |    |    |--data/
  |    |    |    |    |--frame_number.png
  |    |    |    |--timestamps.txt
  |    |    |--oxts/
  |    |    |    |--data/
  |    |    |    |    |--frame_number.txt
  |    |    |    |--dataformat.txt
  |    |    |    |--timestamps.txt
  |    |    |--velodyne_points/
  |    |    |    |--data/
  |    |    |    |    |--frame_number.bin
  |    |    |    |--timestamps.txt
  |    |    |    |--timestamps_start.txt
  |    |    |    |--timestamps_end.txt
  |    |--{date_drive}_tracklets.zip
  |    |    |--tracklet_labels.xml
  |--{date}_calib.zip
  |    |--calib_cam_to_cam.txt
  |    |--calib_imu_to_velo.txt
  |    |--calib_velo_to_cam.txt
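
For orientation, a minimal Python sketch follows that collects the per-frame files of each sensor stream. It assumes a drive has been extracted into the layout above; the drive name used here is purely hypothetical.

```python
import os
from glob import glob

# Hypothetical drive directory; substitute whichever date/drive you extracted.
drive_dir = "2011_09_26/2011_09_26_drive_0001_sync"

# Per-frame files of each sensor stream, sorted so that indices line up across streams.
cam_frames = {
    cam: sorted(glob(os.path.join(drive_dir, "image_%02d" % cam, "data", "*.png")))
    for cam in range(4)   # image_00 .. image_03
}
velo_frames = sorted(glob(os.path.join(drive_dir, "velodyne_points", "data", "*.bin")))
oxts_frames = sorted(glob(os.path.join(drive_dir, "oxts", "data", "*.txt")))

print(len(cam_frames[0]), "camera frames,",
      len(velo_frames), "Velodyne scans,",
      len(oxts_frames), "GPS/IMU readings")
```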

Timestamps

Timestamps are stored in timestamps.txt, and per-frame sensor readings are provided in the corresponding data sub-folders. Each line in timestamps.txt contains the date and the time of day in hours, minutes and seconds. The Velodyne laser scanner has three timestamp files corresponding to positions in a spin: timestamps_start.txt (start of a spin), timestamps_end.txt (end of a spin) and timestamps.txt (the moment the scanner faces forward, which triggers the cameras).
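
As a sketch, the timestamp files can be parsed roughly as follows, assuming each line holds a date followed by a time of day with fractional seconds; Python's datetime keeps at most microsecond precision, so finer fractions are truncated.

```python
from datetime import datetime

def read_timestamps(path):
    """Parse a timestamps.txt file into one datetime object per frame."""
    stamps = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            date_part, time_part = line.split()
            # datetime stores at most microseconds, so truncate finer fractions.
            if "." in time_part:
                secs, frac = time_part.split(".")
                time_part = secs + "." + frac[:6]
            else:
                time_part += ".000000"
            stamps.append(datetime.strptime(date_part + " " + time_part,
                                            "%Y-%m-%d %H:%M:%S.%f"))
    return stamps

# Example (hypothetical path):
# stamps = read_timestamps("2011_09_26/2011_09_26_drive_0001_sync/oxts/timestamps.txt")
```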

Cameras

Color and grayscale images are stored as compressed 8-bit PNG files, cropped to remove the engine hood and the sky, and are also provided as rectified images.
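
A minimal loading sketch; the frame path is hypothetical and NumPy and Pillow are assumed to be available.

```python
import numpy as np
from PIL import Image

# Hypothetical path to one frame of camera 2; any image_0x stream works the same way.
frame = "2011_09_26/2011_09_26_drive_0001_sync/image_02/data/0000000000.png"
img = np.asarray(Image.open(frame))   # uint8, HxW for grayscale or HxWx3 for color
print(img.shape, img.dtype)
```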

Velodyne

For compactness, Velodyne scans are stored as floating-point binaries, with each point stored as an (x, y, z) coordinate plus a reflectance value (r).
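
A minimal reading sketch, assuming single-precision (32-bit) floats, which is how the scans are commonly read.

```python
import numpy as np

def read_velodyne_scan(path):
    """Read one Velodyne .bin file into an Nx4 array of (x, y, z, reflectance)."""
    # The file holds consecutive 32-bit floats, four values per point.
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# points = read_velodyne_scan(".../velodyne_points/data/0000000000.bin")
# xyz, reflectance = points[:, :3], points[:, 3]
```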

GPS/IMU

For each frame, GPS/IMU values including geographic coordinates, altitude, velocities, accelerations, angular rates and accuracies are stored in a text file. Accelerations and angular rates are specified in two coordinate systems: one attached to the vehicle body (x, y, z) and one mapped to the tangent plane of the earth's surface at that location.
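
A minimal parsing sketch; the path is hypothetical, and the meaning and order of the fields are documented in the per-drive oxts/dataformat.txt.

```python
def read_oxts_frame(path):
    """Read one GPS/IMU frame: a single line of whitespace-separated numbers."""
    with open(path) as f:
        return [float(x) for x in f.read().split()]

# Field order is given in oxts/dataformat.txt.
# frame = read_oxts_frame(".../oxts/data/0000000000.txt")
```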

Coordinate Systems

The coordinate systems are defined as illustrated in Fig. 1 and Fig. 3, where l = left, r = right, u = up, d = down and f = forward, as summarized in the table and the sketch below:

Sensor    Local axes   Vehicle frame
Camera    (x, y, z)    (r, d, f)
Velodyne  (x, y, z)    (f, l, u)
GPS/IMU   (x, y, z)    (f, l, u)
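
To illustrate how these conventions relate, the sketch below applies only the axis permutation from the Velodyne/GPS-IMU convention to the camera convention; the actual sensor-to-sensor extrinsics must be taken from the calibration files (e.g. calib_velo_to_cam.txt).

```python
import numpy as np

# Axis permutation from the Velodyne/GPS-IMU convention (x=forward, y=left, z=up)
# to the camera convention (x=right, y=down, z=forward). This only relabels axes;
# it is not a substitute for the calibrated velo-to-cam transformation.
R_axes = np.array([
    [ 0, -1,  0],   # camera x (right)   = -Velodyne y (left)
    [ 0,  0, -1],   # camera y (down)    = -Velodyne z (up)
    [ 1,  0,  0],   # camera z (forward) =  Velodyne x (forward)
], dtype=float)

p_velo = np.array([1.0, 0.0, 0.0])   # a point one metre ahead of the scanner
print(R_axes @ p_velo)               # -> [0. 0. 1.]: one metre along camera z
```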
