Toward Improving the Visual Characterization of Sport Activities With Abstracted Scene Graphs
We present techniques for abstracting relevant information from scene graph features to improve action recognition in sports videos. Feature representations that capture relevant information can dramatically increase the utility of machine learning across many tasks. Despite the advantages of incorporating objects and relations as building blocks of semantic information, sports videos still contain many irrelevant objects and relations, which add uncertainty to classifiers. This paper describes four fundamentally different scene abstraction techniques, each searching for the relevant information within features aggregated from the pixel level to the object level. The four methods formulate relevancy through co-occurrence statistics, semantic similarity, feature decomposition, and correlation-based mapping, respectively, and we evaluate each technique's efficacy through gains in action recognition performance and the decay rate of the training loss. We demonstrate that by creating a relevant and more concise knowledge representation, we improve the performance (mAP) of action recognition in sports by 26.6% and obtain faster-converging models due to higher representational power.