Track Organizers: Yi Liu , Xiao Ma
Most existing action localization datasets are built upon coarse action categories, e.g., “Layup drill in basketball” in ActivityNet rather than “dunk basketball” or “cast basketball”. Because such coarse categories are closely tied to background context, their temporal annotations often lack the clear boundaries needed to describe detailed actions.
To fill this gap, we propose a new benchmark, FineAction, a large-scale, fine-grained video dataset for temporal action localization. Unlike previous datasets, FineAction has two distinctive features.
- The definition of action categories is much more fine-grained, and an action can be divided into more subtle atomic actions.
- The annotation rules and procedures are more rigorous, with cross-checking that reduces boundary uncertainty for temporal action localization.
We expect that such a fine-grained dataset will bring new challenges and inspire novel contributions in research on temporal action localization.