iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer

WACV 2023

Jooyeol Yun*
blizzard072@kaist.ac.kr
Sanghyeon Lee*
shlee6825@kaist.ac.kr
Minho Park*
m.park@kaist.ac.kr
Jaegul Choo
jchoo@kaist.ac.kr
Korea Advanced Institute of Science and Technology (KAIST)
* indicates equal contribution.
iColoriT empowers users to enliven grayscale images with only a few clicks.

Abstract

Point-interactive image colorization aims to colorize grayscale images when a user provides the colors for specific locations. It is essential for point-interactive colorization methods to appropriately propagate user-provided colors (i.e., user hints) across the entire image to obtain a reasonably colorized image with minimal user effort. However, existing approaches often produce partially colorized results because stacking convolutional layers is an inefficient way to propagate hints to distant relevant regions. To address this problem, we present iColoriT, a novel point-interactive colorization Vision Transformer capable of propagating user hints to relevant regions by leveraging the global receptive field of Transformers. The self-attention mechanism of Transformers enables iColoriT to selectively colorize relevant regions with only a few local hints. Our approach colorizes images in real time by utilizing pixel shuffling, an efficient upsampling technique that replaces the decoder architecture. In addition, to mitigate the artifacts caused by pixel shuffling with large upsampling ratios, we present the local stabilizing layer. Extensive quantitative and qualitative results demonstrate that our approach substantially outperforms existing methods for point-interactive colorization, producing accurately colorized images with minimal user effort.
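To make the pixel shuffling step mentioned in the abstract concrete, here is a minimal NumPy sketch of the rearrangement itself. It is an illustration under stated assumptions, not the iColoriT implementation: the function name `pixel_shuffle`, the 224x224 input size, the patch size of 16, and the two output color channels are all hypothetical choices made for this example.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r).

    Each group of r*r channels at one low-resolution position becomes
    an r x r block of pixels in the high-resolution output.
    """
    c_rr, h, w = x.shape
    if c_rr % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, dy, dx)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, dy, w, dx)
    return x.reshape(c, h * r, w * r)


# Hypothetical setting: a 224x224 image split into 16x16 patches gives a
# 14x14 grid of tokens, and each token carries 2*16*16 output values
# (two color channels per pixel of its patch; an assumption for this sketch).
tokens = np.zeros((14 * 14, 2 * 16 * 16), dtype=np.float32)
feature_map = tokens.reshape(14, 14, -1).transpose(2, 0, 1)  # (512, 14, 14)
color = pixel_shuffle(feature_map, 16)
print(color.shape)  # (2, 224, 224)
```

Because the operation is a pure reshape and transpose with no learned parameters, it upsamples by the full patch size in a single step, which is why it can stand in for a multi-layer decoder.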


Paper

[Paper PDF] [arXiv] [GitHub]

WACV, 2023.
Jooyeol Yun, Sanghyeon Lee, Minho Park, and Jaegul Choo.
"iColoriT: Towards Propagating Local Hint to the Right Region in Interactive Colorization by Leveraging Vision Transformer"


Method Overview

An overview of our proposed method.

Qualitative Comparisons

(Left) User hints   (Center) Zhang et al.   (Right) iColoriT

Diverse Colorization Results


Effect of the Local Stabilizing Layer

Inconsistent colorization results observed in images produced without the local stabilizing layer.

Visualizing the Internal Representation


Limitation

iColoriT may fail to colorize small/detailed regions.

Citation

@InProceedings{Yun_2023_WACV,
    author    = {Yun, Jooyeol and Lee, Sanghyeon and Park, Minho and Choo, Jaegul},
    title     = {iColoriT: Towards Propagating Local Hints to the Right Region in Interactive Colorization by Leveraging Vision Transformer},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2023},
    pages     = {1787-1796}
}