COCO2017 url
Common
The COCO dataset is a large-scale object detection, segmentation, keypoint detection, and captioning dataset consisting of 328K images. It has been released in several editions, of which COCO 2017 is the most widely used. The 2017 release splits the data into 118K training and 5K validation images, and its test set is a 41K-image subset of the 2015 test set.
Baidu Netdisk (extraction code: bcmm) train2017.zip (18GB) val2017.zip (1GB) annotations_trainval2017.zip (241MB)
✓ |
data_source=dict(type='DetSourceCoco2017', path='{root path}', download=True, split='train')
LICENSE |
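COCO 2017 detection annotations (for example `annotations/instances_val2017.json` from annotations_trainval2017.zip) are a single JSON file with `images`, `annotations`, and `categories` lists, where each box is `[x, y, width, height]` in pixels. The sketch below, assuming that layout, counts boxes per category with the standard library; `load_coco` and `count_boxes_per_category` are hypothetical helpers, not part of EasyCV.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Load a COCO-style annotation file such as annotations/instances_val2017.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_boxes_per_category(coco: dict) -> dict:
    """Return {category name: number of boxes} for a COCO-style annotation dict."""
    # Map numeric category ids to human-readable names.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(ann["category_id"] for ann in coco["annotations"])
    return {id_to_name[cat_id]: n for cat_id, n in counts.items()}


if __name__ == "__main__":
    # Tiny made-up stand-in for a real annotation file.
    sample = {
        "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80], "iscrowd": 0},
            {"id": 11, "image_id": 1, "category_id": 18, "bbox": [200, 150, 60, 40], "iscrowd": 0},
            {"id": 12, "image_id": 1, "category_id": 1, "bbox": [300, 40, 30, 90], "iscrowd": 0},
        ],
        "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    }
    print(count_boxes_per_category(sample))  # {'person': 2, 'dog': 1}
```

On the real data, `count_boxes_per_category(load_coco("annotations/instances_val2017.json"))` gives a quick view of class imbalance before training.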
VOC2007 url |
Common |
PASCAL VOC 2007 is an image recognition dataset covering 20 object categories. It provides bounding box and object class annotations, and a subset of the images also has pixel-level segmentation annotations.
VOCtrainval_06-Nov-2007.tar (439MB) |
✓ |
data_source=dict(type='DetSourceVOC2007', path='{root path}', download=True, split='train')
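VOC stores one XML file per image (under `VOC2007/Annotations/`), with an `<object>` element per instance holding `<name>`, `<difficult>`, and a `<bndbox>` of `xmin/ymin/xmax/ymax`. A minimal sketch of reading one file with `xml.etree.ElementTree`, assuming that layout; `parse_voc_xml` and the sample are hypothetical and made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text: str, keep_difficult: bool = False):
    """Parse one VOC annotation; return a list of (class_name, (xmin, ymin, xmax, ymax))."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        # Objects flagged "difficult" are usually excluded from VOC evaluation.
        difficult = int(obj.findtext("difficult", default="0"))
        if difficult and not keep_difficult:
            continue
        box = obj.find("bndbox")  # direct child only, so part boxes are not picked up
        coords = tuple(int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((obj.findtext("name"), coords))
    return objects


SAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>chair</name><difficult>0</difficult>
    <bndbox><xmin>263</xmin><ymin>211</ymin><xmax>324</xmax><ymax>339</ymax></bndbox>
  </object>
  <object>
    <name>chair</name><difficult>1</difficult>
    <bndbox><xmin>5</xmin><ymin>244</ymin><xmax>67</xmax><ymax>374</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_xml(SAMPLE))  # [('chair', (263, 211, 324, 339))]
```

Pass `keep_difficult=True` if difficult instances should be kept for training.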
|
VOC2012 url |
Common |
From 2009 to 2011, each release added data on top of the previous year's dataset, and from 2011 to 2012 the data for the classification, detection, and person layout tasks did not change. The 2012 update mainly extends the data subsets and annotations for the segmentation and action recognition tasks.
Baidu Netdisk (extraction code: ro9f) VOCtrainval_11-May-2012.tar (2GB)
✓ |
data_source=dict(type='DetSourceVOC2012', path='{root path}', download=True, split='train')
LICENSE |
LVIS url |
Common |
LVIS uses the COCO 2017 train, validation, and test image sets, so if you have already downloaded the COCO images, you only need to download the LVIS annotations. Note that the LVIS val set contains images from the COCO 2017 train split in addition to the COCO 2017 val split.
Baidu Netdisk (extraction code: 8ief); for the images, refer to COCO
✓ |
data_source=dict(type='DetSourceLvis', path='{root path}', download=True, split='train')
LICENSE |
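Because the LVIS val set mixes images from COCO `train2017` and `val2017`, an image's file has to be located by its source split rather than by the LVIS split. A minimal sketch, assuming each LVIS image entry carries a `coco_url` field of the form `http://images.cocodataset.org/<split>/<file>.jpg` (verify this against your annotation version); `local_coco_path` is a hypothetical helper.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_coco_path(coco_root: str, coco_url: str) -> str:
    """Map an LVIS image's coco_url to the matching file in a local COCO 2017 image tree."""
    # ".../train2017/000000391895.jpg" -> split folder "train2017", file "000000391895.jpg"
    parts = PurePosixPath(urlparse(coco_url).path).parts
    split_dir, file_name = parts[-2], parts[-1]
    return os.path.join(coco_root, split_dir, file_name)


# Assumed shape of one LVIS image entry (other fields omitted).
image = {"id": 391895, "coco_url": "http://images.cocodataset.org/train2017/000000391895.jpg"}
print(local_coco_path("data/coco", image["coco_url"]))  # data/coco/train2017/000000391895.jpg on POSIX
```

Resolving paths this way lets one local copy of the COCO images serve both COCO and LVIS training.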
Objects365 url
Common |
Objects365 is a new large-scale dataset designed to spur object detection research with a focus on diverse objects in the wild. It has 365 categories, 2 million images, and 30 million bounding boxes.
refer to data-set-detail |
✓ |
data_source=dict(type='DetSourceObjects365', ann_file='{annotation file path}', img_prefix='{images file root path}', pipeline=[{pipeline parameter}])
|
CrowdHuman url |
Common |
CrowdHuman is a benchmark dataset for evaluating detectors in crowd scenarios. It is large, richly annotated, and highly diverse, with 15,000, 4,370, and 5,000 images for training, validation, and testing, respectively. The train and validation subsets contain a total of 470K human instances, about 23 persons per image on average, with various kinds of occlusion. Each human instance is annotated with a head bounding box, a human visible-region bounding box, and a human full-body bounding box.
refer to crowdhuman |
✓ |
data_source=dict(type='DetSourceCrowdHuman', ann_file='{annotation file path}', img_prefix='{images file root path}', gt_op='vbox')
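The `gt_op='vbox'` setting above chooses which of the three per-instance boxes is used as ground truth. To show what that choice means, here is a sketch that reads CrowdHuman's `.odgt` annotations, assuming one JSON object per line with an `ID` and a `gtboxes` list whose entries carry `tag`, `hbox`/`vbox`/`fbox` as `[x, y, w, h]`, and an optional `extra.ignore` flag. `load_odgt_boxes` is hypothetical and is not EasyCV's implementation.

```python
import json


def load_odgt_boxes(odgt_text: str, box_key: str = "vbox"):
    """Return {image ID: list of [x1, y1, x2, y2]} using one of hbox / vbox / fbox."""
    if box_key not in ("hbox", "vbox", "fbox"):
        raise ValueError(f"unknown box type: {box_key}")
    result = {}
    for line in odgt_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        boxes = []
        for gt in record.get("gtboxes", []):
            # Skip non-person regions and instances marked as ignored.
            if gt.get("tag") != "person" or gt.get("extra", {}).get("ignore", 0) == 1:
                continue
            x, y, w, h = gt[box_key]
            boxes.append([x, y, x + w, y + h])  # convert xywh to xyxy
        result[record["ID"]] = boxes
    return result


SAMPLE = json.dumps({
    "ID": "example_0001",
    "gtboxes": [
        {"tag": "person", "hbox": [123, 129, 63, 64], "vbox": [72, 126, 102, 297],
         "fbox": [61, 123, 191, 453], "extra": {"ignore": 0}},
        {"tag": "mask", "vbox": [500, 100, 40, 80], "fbox": [500, 100, 40, 80], "extra": {"ignore": 1}},
    ],
})
print(load_odgt_boxes(SAMPLE))  # {'example_0001': [[72, 126, 174, 423]]}
```

Switching `box_key` to `"fbox"` trains on full-body extents (which may extend past occluders or the image border), while `"vbox"` keeps only the visible region.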
|
Openimages url |
Common |
Open Images is a dataset of ~9 million URLs to images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. |
refer to cvdfoundation/open-images-dataset |
|
|
|
WIDER FACE url |
Face |
The WIDER FACE dataset contains 32,203 images with 393,703 labeled faces that vary widely in scale, pose, and occlusion. It is split into training (40%), validation (10%), and testing (50%) sets. In addition, the images are divided into three difficulty levels (Easy ⊆ Medium ⊆ Hard) according to how hard the faces are to detect.
WIDER Face Training Images [Google Drive] [Tencent Drive] (1.36GB) WIDER Face Validation Images [Google Drive] [Tencent Drive] (345.95MB) WIDER Face Testing Images [Google Drive] [Tencent Drive] (1.72GB) Face annotations (3.6MB) |
✓ |
data_source=dict(type='DetSourceWiderFace', ann_file='{annotation file path}', img_prefix='{images file root path}')
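The WIDER FACE ground-truth text files list, for each image, its relative path, the number of faces, and then one line per face: `x1 y1 w h blur expression illumination invalid occlusion pose`. Images with no faces are still followed by a single all-zero line. A minimal parser sketch under that assumption; `parse_wider_face` and the sample file names are hypothetical.

```python
def parse_wider_face(text: str):
    """Parse WIDER FACE ground truth into {image path: list of (x, y, w, h)}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    annotations = {}
    i = 0
    while i < len(lines):
        image_path = lines[i]
        num_faces = int(lines[i + 1])
        # Images with no faces are still followed by one all-zero placeholder row.
        num_rows = max(num_faces, 1)
        boxes = []
        for row in lines[i + 2 : i + 2 + num_rows]:
            values = [int(v) for v in row.split()]
            x, y, w, h = values[:4]
            invalid = values[7]  # 8th field is the "invalid" flag
            if num_faces > 0 and not invalid and w > 0 and h > 0:
                boxes.append((x, y, w, h))
        annotations[image_path] = boxes
        i += 2 + num_rows
    return annotations


SAMPLE = """0--Parade/example_a.jpg
1
449 330 122 149 0 0 0 0 0 0
0--Parade/example_b.jpg
0
0 0 0 0 0 0 0 0 0 0
0--Parade/example_c.jpg
2
10 20 30 40 0 0 0 0 0 0
50 60 5 5 0 0 0 1 0 0
"""
print(parse_wider_face(SAMPLE))
```

Forgetting the placeholder row for zero-face images is a common way to misalign every record that follows it.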
|
DeepFashion url |
Clothing |
DeepFashion is a large-scale clothes database. First, it contains over 800,000 diverse fashion images, ranging from well-posed shop images to unconstrained consumer photos. Second, it is annotated with rich information about clothing items: images are labeled using 50 categories and 1,000 descriptive attributes, together with bounding boxes and clothing landmarks. Third, it contains over 300,000 cross-pose/cross-domain image pairs.
Category and Attribute Prediction Benchmark: [Download Page] In-shop Clothes Retrieval Benchmark: [Download Page] Consumer-to-shop Clothes Retrieval Benchmark: [Download Page] Fashion Landmark Detection Benchmark: [Download Page] |
|
|
|
Fruit Images url |
Fruit |
This dataset contains labeled fruit images for training object detection systems: 240 images in the train folder and 60 images in the test folder. It covers only three fruits: apple, banana, and orange.
archive.zip (30MB) |
✓ |
data_source=dict(type='DetSourceFruit', path='{data root path}')
|
Oxford-IIIT Pet url |
Animal |
The Oxford-IIIT Pet Dataset is a 37-category pet dataset with roughly 200 images per class, created by the Visual Geometry Group at Oxford. The images have large variations in scale, pose, and lighting. All images have an associated ground-truth annotation of the breed, the head ROI, and a pixel-level trimap segmentation.
archive.zip (818MB) |
✓ |
data_source=dict(type='DetSourcePet', path='{annotation file path}')
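The trimap can also yield a whole-pet box in addition to the annotated head ROI. A sketch, assuming the usual trimap labels (1 = pet, 2 = background, 3 = border/unclassified) and a trimap already loaded as a list of pixel rows; `trimap_to_box` and the label constants are hypothetical names.

```python
from typing import List, Optional, Tuple

# Assumed trimap labels: 1 = pet, 2 = background, 3 = border / not classified.
PET, BACKGROUND, BOUNDARY = 1, 2, 3


def trimap_to_box(trimap: List[List[int]], include_boundary: bool = True) -> Optional[Tuple[int, int, int, int]]:
    """Return the tight inclusive (xmin, ymin, xmax, ymax) box around pet pixels, or None."""
    wanted = {PET, BOUNDARY} if include_boundary else {PET}
    xs, ys = [], []
    for y, row in enumerate(trimap):
        for x, value in enumerate(row):
            if value in wanted:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


trimap = [
    [2, 2, 2, 2, 2],
    [2, 3, 3, 3, 2],
    [2, 3, 1, 3, 2],
    [2, 3, 3, 3, 2],
    [2, 2, 2, 2, 2],
]
print(trimap_to_box(trimap))  # (1, 1, 3, 3)
```

For a real single-channel trimap PNG, rows can be obtained with `numpy.asarray(PIL.Image.open(path)).tolist()`. Whether to count border pixels is a judgment call: including them gives a slightly looser box.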
|
Arthropod Taxonomy Orders url |
Animal |
The ArTaxOr data set covers arthropods, which include insects, spiders, crustaceans, centipedes, millipedes, and more. More than 1.3 million arthropod species have been described. The dataset consists of arthropod images in JPEG format and object bounding boxes in JSON format, with between 1 and 50 objects per image.
archive.zip (12GB) |
✓ |
data_source=dict(type='DetSourceArtaxor', path='{data root path}')
|
African Wildlife url |
Animal |
This data set represents four animal classes commonly found in nature reserves in South Africa: buffalo, elephant, rhino, and zebra. It contains at least 376 images for each animal. Each example consists of a jpg image and a txt label file. The images have differing aspect ratios and contain at least one instance of the specified animal class. Each txt file lists the detectable instances of that class, one per line, in the YOLOv3 labeling format.
archive.zip (469MB) |
✓ |
data_source=dict(type='DetSourceAfricanWildlife', path='{data root path}')
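In the YOLO labeling format, each line is `class_id x_center y_center width height`, with the four coordinates normalized to [0, 1] by the image width and height. A sketch of converting one label file to pixel corner boxes; the id-to-name order in `CLASS_NAMES` is an assumption to check against the dataset, and `yolo_to_xyxy` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed id-to-name order; verify against the dataset's own class list.
CLASS_NAMES = ["buffalo", "elephant", "rhino", "zebra"]


def yolo_to_xyxy(label_text: str, img_w: int, img_h: int) -> List[Tuple[str, float, float, float, float]]:
    """Convert YOLO lines 'cls cx cy w h' (normalized) into pixel (name, x1, y1, x2, y2)."""
    boxes = []
    for line in label_text.splitlines():
        if not line.strip():
            continue
        cls, cx, cy, w, h = line.split()
        # Scale normalized values back to pixels.
        cx, w = float(cx) * img_w, float(w) * img_w
        cy, h = float(cy) * img_h, float(h) * img_h
        boxes.append((CLASS_NAMES[int(cls)], cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


print(yolo_to_xyxy("2 0.5 0.5 0.25 0.5\n", 640, 480))  # [('rhino', 240.0, 120.0, 400.0, 360.0)]
```

Since the images have differing aspect ratios, the width and height must be read per image rather than assumed.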
|
AI-TOD (aerial images) url
Aerial (small objects) |
AI-TOD contains 700,621 objects across 8 categories in 28,036 aerial images. Compared with existing aerial-image object detection datasets, its objects are much smaller: the average object size is about 12.8 pixels.
download url (22.95GB) |
|
|
|
TinyPerson url |
Person (small objects) |
There are 1610 labeled and 759 unlabeled images in TinyPerson (both mostly from the same video set), for a total of 72651 annotations. |
download url (1.6GB) |
✓ |
data_source=dict(type='DetSourceTinyPerson', ann_file='{annotation file path}', img_prefix='{images file root path}', pipeline=[{pipeline parameter}])
|
WiderPerson url |
Person (Dense pedestrian detection) |
The WiderPerson dataset is a benchmark for pedestrian detection in the wild, with images selected from a wide range of scenes rather than only traffic scenes. It contains 13,382 images with about 400K annotations covering various kinds of occlusion.
download url (969.72MB) |
✓ |
data_source=dict(type='DetSourceWiderPerson', path='{annotation file path}')
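A sketch of reading one WiderPerson annotation file, assuming the layout described in the dataset's readme: a first line with the box count, then one `class_label x1 y1 x2 y2` line per box. The label meanings in `WIDERPERSON_CLASSES` follow that readme as best understood here and should be verified; the helper name is hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed label meanings (verify against the WiderPerson readme).
WIDERPERSON_CLASSES = {1: "pedestrian", 2: "rider", 3: "partially-visible person", 4: "ignore region", 5: "crowd"}


def parse_widerperson(text: str) -> Dict[str, List[Tuple[int, int, int, int]]]:
    """Group the (x1, y1, x2, y2) boxes of one WiderPerson annotation file by class name."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    count = int(lines[0][0])
    grouped: Dict[str, List[Tuple[int, int, int, int]]] = {}
    for fields in lines[1 : 1 + count]:
        label = int(fields[0])
        x1, y1, x2, y2 = (int(float(v)) for v in fields[1:5])
        grouped.setdefault(WIDERPERSON_CLASSES[label], []).append((x1, y1, x2, y2))
    return grouped


SAMPLE = "3\n1 100 50 140 170\n1 200 60 236 160\n4 300 40 360 200\n"
print(parse_widerperson(SAMPLE))
```

Grouping by class makes it easy to drop ignore regions from the training targets while still masking them out during evaluation.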
|
Caltech Pedestrian Dataset url |
Person |
The Caltech Pedestrian dataset consists of about 10 hours of 640x480 30Hz video taken from vehicles driving through regular traffic in an urban environment. About 250,000 frames (in 137 roughly minute-long clips) were annotated for a total of 350,000 bounding boxes and 2300 unique pedestrians. Annotations include temporal correspondence between bounding boxes and detailed occlusion labels. |
download url (1.98GB) |
|
|
|
DOTA url |
Aerial |
DOTA is a large-scale dataset for developing and evaluating object detectors in aerial images. The images are collected from different sensors and platforms, range in size from 800 × 800 to 20,000 × 20,000 pixels, and contain objects that vary widely in scale, orientation, and shape.
download url (156.33GB) |
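Images up to 20,000 × 20,000 pixels are too large to feed to a detector directly, so a common preprocessing step is to crop them into overlapping tiles. The sketch below computes such crop windows; the 1024-pixel tile and 200-pixel overlap are illustrative defaults rather than values from this document, and `tile_starts`/`tile_windows` are hypothetical helpers.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that tiles of size `tile` cover [0, length)."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    # Add a final tile flush with the far edge so no pixels are left uncovered.
    starts.append(length - tile)
    return starts


def tile_windows(width: int, height: int, tile: int = 1024, overlap: int = 200) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) crop windows covering a width x height image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


print(tile_starts(2000, 1024, 200))  # [0, 824, 976]
print(len(tile_windows(2000, 2000)))  # 9
```

Adjacent tiles always share at least `overlap` pixels, so an object cut by one tile border is more likely to appear whole in a neighboring tile; box annotations then have to be clipped or shifted into each tile's coordinates.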
|
|
|