Detecting Objects in Less Response Time for Processing Multimedia Events in Smart Cities

Asra Aslam; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 2044-2054

Abstract


Due to the increase in multimedia traffic in smart cities, we face the problem of processing unseen classes in real time. Existing neural-network-based object detectors may support this growing demand for multimedia data, but they are limited by the availability of trained classifiers for unseen concepts, resulting in a long waiting time for users who want to detect unseen classes. In this paper, we propose three approaches that utilize existing object detection models and can train unseen classes within a short training time. Our approaches are based on the similarity of unseen classes to seen classes, and on the availability (presence or absence) of bounding boxes. Our results indicate that the proposed framework achieves accuracy between 95.14% and 98.53% within a response time of 0.01 min to 30 min for seen and partially unseen classes. Moreover, we achieve state-of-the-art results (68.78 mAP within 10 min) for unseen classes that have only image-level labels for training and no bounding boxes. Our qualitative results indicate that our approaches work well for arbitrary unseen classes, not only those in conventional object detection datasets.
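The routing idea in the abstract — choosing a training strategy from a class's similarity to seen classes and from whether bounding boxes are available — can be sketched as a small decision function. This is a hypothetical illustration, not the paper's implementation: the seen-class list, the string-based similarity measure, the threshold, and the strategy names are all placeholder assumptions (the paper would use a semantic or visual similarity over a real detection vocabulary such as COCO's).

```python
from difflib import SequenceMatcher

# Hypothetical seen classes from a conventional detection dataset (placeholder).
SEEN_CLASSES = ["car", "bus", "person", "dog", "traffic light"]

def similarity(a: str, b: str) -> float:
    """Toy label similarity on strings; a real system would compare semantics."""
    return SequenceMatcher(None, a, b).ratio()

def choose_approach(target: str, has_boxes: bool, threshold: float = 0.8) -> str:
    """Route a requested class to one of three hypothetical strategies."""
    if target in SEEN_CLASSES:
        return "reuse"                        # seen: existing detector works as-is
    best = max(similarity(target, s) for s in SEEN_CLASSES)
    if best >= threshold:
        return "transfer-from-similar"        # partially unseen: adapt a close class
    if has_boxes:
        return "fast-fine-tune"               # unseen with boxes: short training run
    return "image-level-weak-supervision"     # unseen, only image-level labels
```

For example, `choose_approach("car", True)` returns `"reuse"`, while a genuinely novel class with no box annotations falls through to the weakly supervised branch, matching the abstract's case of training from image-level labels alone.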

Related Material


[pdf]
[bibtex]
@InProceedings{Aslam_2022_CVPR,
  author    = {Aslam, Asra},
  title     = {Detecting Objects in Less Response Time for Processing Multimedia Events in Smart Cities},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2022},
  pages     = {2044-2054}
}