DISCO - U-Net Based Autoencoder Architecture With Dual Input Streams for Skeleton Image Drawing

Soonyong Song, Heechul Bae, Junhee Park; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 2128-2135

Abstract


In this paper, we propose DISCO, an approach to designing an autoencoder architecture that processes dual input streams for skeletal image generation. DISCO is designed to handle binary masks and skeletonized images concurrently at the input side. We expected that skeletonized images produced by traditional thinning algorithms would help boost skeleton prediction performance. The DISCO architecture contains two encoders and a single decoder, each a functional block built from a stack of multiple logical layers. Following the U-Net architecture, the outputs of the encoders' logical layers are transferred to the corresponding layers in the decoder. In addition, we propose hybrid-type encoder models based on the DISCO architecture to capitalize on the effect of model ensembling. We evaluate the DISCO-A and DISCO-B models derived from the proposed architecture in terms of F1-score and loss convergence per epoch. We confirmed that DISCO-B produced the best performance when symbolic labels were used. In the development phase, our best score reached 0.7386 with 500 epochs.
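
The dual-encoder, single-decoder layout described above can be illustrated with a small shape-tracing sketch. It is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the two skip streams are merged, so channel concatenation is assumed here, and the names `FeatureShape`, `encoder_shapes`, `decoder_plan`, the base width of 16 channels, the depth of 4, and the 256x256 input size are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureShape:
    channels: int
    height: int
    width: int


def encoder_shapes(size, base_channels, depth):
    """Feature shapes emitted by each level of one encoder.

    Assumption: each level doubles the channels and halves the resolution,
    as in a typical U-Net encoder.
    """
    shapes = []
    channels, h, w = base_channels, size, size
    for _ in range(depth):
        shapes.append(FeatureShape(channels, h, w))
        channels, h, w = channels * 2, h // 2, w // 2
    return shapes


def decoder_plan(mask_feats, skel_feats):
    """Input shape of each decoder level after merging both skip streams.

    Assumption: skips from the mask encoder and the skeleton encoder are
    concatenated with the upsampled decoder features along the channel axis.
    """
    assert len(mask_feats) == len(skel_feats)
    deep_a, deep_b = mask_feats[-1], skel_feats[-1]
    # Bottleneck: fuse the deepest outputs of the two encoders.
    current = FeatureShape(deep_a.channels + deep_b.channels,
                           deep_a.height, deep_a.width)
    plan = []
    for a, b in zip(reversed(mask_feats[:-1]), reversed(skel_feats[:-1])):
        # Upsample by 2 and project to this level's channel count.
        up = FeatureShape(a.channels, current.height * 2, current.width * 2)
        assert (up.height, up.width) == (a.height, a.width) == (b.height, b.width)
        # Merge the decoder features with the counterpart layer of each encoder.
        plan.append(FeatureShape(up.channels + a.channels + b.channels,
                                 up.height, up.width))
        current = up
    return plan


mask_feats = encoder_shapes(size=256, base_channels=16, depth=4)  # binary mask stream
skel_feats = encoder_shapes(size=256, base_channels=16, depth=4)  # thinned skeleton stream
plan = decoder_plan(mask_feats, skel_feats)
for level in plan:
    print(level)
```

Under these assumptions, every decoder level receives three times the channel count of its encoder counterparts (one share from the upsampled path and one from each encoder), which is the main cost of adding the second input stream.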

Related Material


[bibtex]
@InProceedings{Song_2021_ICCV,
    author    = {Song, Soonyong and Bae, Heechul and Park, Junhee},
    title     = {DISCO - U-Net Based Autoencoder Architecture With Dual Input Streams for Skeleton Image Drawing},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2021},
    pages     = {2128-2135}
}