FineAction Dataset

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization

  Yi Liu     Limin Wang     Xiao Ma     Yali Wang     Yu Qiao  

MMLAB @ Shenzhen Institute of Advanced Technology

MCG Group @ Nanjing University


Temporal action localization techniques have achieved great success on the existing benchmark datasets THUMOS14 and ActivityNet. However, several problems remain: the sources of actions are too narrow (THUMOS14 contains only sports categories), and the coarse instances with uncertain boundaries in ActivityNet and HACS Segments interfere with proposal generation and action prediction. To take temporal action localization to a new level, we develop FineAction, a new large-scale fine-grained video dataset collected from existing video datasets and web videos. Overall, this dataset contains 139K fine-grained action instances densely annotated in almost 17K untrimmed videos spanning 106 action categories. Compared with existing action localization datasets, FineAction provides a finer-grained definition of action categories and high-quality annotations that reduce boundary uncertainty. We systematically investigate representative temporal action localization methods on our dataset and obtain some interesting findings with further analysis. Experimental results reveal that FineAction brings new challenges for action localization on fine-grained, multi-label instances with shorter durations. The dataset will be released publicly, and we hope FineAction can advance research on temporal action localization.

Dataset Statistics

Our FineAction is a large-scale dataset suitable for training deep learning models: it contains 103,324 instances across a total of 705 video hours. On average, there are 6.17 instances per video and 975 instances per category.
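As a quick sanity check of these averages (pure arithmetic on the totals quoted above; the exact video count is not stated here, so it is inferred from the per-video average):

```python
# Sanity-check the per-video and per-category averages quoted above.
# Only the totals (103,324 instances, 106 categories) come from this page;
# the video count is inferred from the stated 6.17 instances per video.
num_instances = 103_324
num_categories = 106
instances_per_video = 6.17

implied_videos = num_instances / instances_per_video   # ~16.7K, i.e. "almost 17K" videos
per_category = num_instances / num_categories          # ~975 instances per category

print(round(implied_videos), round(per_category))      # -> 16746 975
```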

  1. Comparison of FineAction with existing datasets for temporal action localization.
  2. Number of instances per category in FineAction.
  3. Comparison of datasets: the number of instances (a) and categories per video (c), and the distribution of instance durations (b) and overlaps (d) on FineAction.

Experiment Results

  1. Comparison with state-of-the-art proposal generation methods on FineAction in terms of AR@AN, where SNMS stands for Soft-NMS.
  2. Performance evaluation of state-of-the-art methods on FineAction in terms of mAP at IoU thresholds from 0.3 to 0.7.
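For readers reproducing these results, a minimal sketch of the Soft-NMS (SNMS) step used when ranking temporal proposals for AR@AN is shown below. It follows the common Gaussian-decay formulation of Soft-NMS; the function names and the default `sigma` are illustrative assumptions, not values taken from the paper.

```python
import math

def temporal_iou(a, b):
    """Temporal IoU of two [start, end] segments (in seconds)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(proposals, sigma=0.5, top_k=100):
    """proposals: list of ([start, end], score) pairs.

    Instead of discarding proposals that overlap the current best one
    (hard NMS), Soft-NMS decays their scores with a Gaussian of the IoU,
    so dense, overlapping fine-grained instances are not suppressed outright.
    """
    proposals = sorted(proposals, key=lambda p: p[1], reverse=True)
    kept = []
    while proposals and len(kept) < top_k:
        best = proposals.pop(0)
        kept.append(best)
        proposals = [
            (seg, score * math.exp(-temporal_iou(best[0], seg) ** 2 / sigma))
            for seg, score in proposals
        ]
        proposals.sort(key=lambda p: p[1], reverse=True)
    return kept
```

Score decay rather than hard suppression matters on FineAction because short, fine-grained instances often overlap heavily, and hard NMS would delete true positives.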


Please refer to the competition page for more information.