Abstract

Image colorization is inherently an ill-posed problem with multi-modal uncertainty. Previous methods use deep neural networks to map input grayscale images directly to plausible color outputs. Although these learning-based methods have shown impressive performance, they usually fail on input images that contain multiple objects. The main cause is that existing colorization models always perform learning and colorization on the whole image. Without a clear figure-ground separation, these models cannot effectively locate and learn meaningful semantics at the object level. In this paper, we propose a novel deep learning framework for instance-aware colorization. Our network architecture uses an off-the-shelf object detector to obtain cropped object images, which are fed to an instance colorization network to extract object-level features. Full-image features are extracted with a similar network and then fused with the object-level features by a fusion module to predict the final colors. Both colorization networks and the fusion modules are learned from a large-scale dataset. Experimental results show that our method outperforms existing methods on several quality metrics and achieves state-of-the-art performance on image colorization.
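The abstract does not specify how the fusion module combines object-level and full-image features. The following is a minimal sketch of one plausible design, assumed here for illustration only: each instance feature map is resized back to its detected bounding box, every branch (full image and each instance) supplies a per-pixel weight logit, and the features at each pixel are blended with softmax-normalized weights. The function names `resize_nearest` and `fuse`, the nearest-neighbor resizing, the `(top, left, bottom, right)` box convention, and the use of nested Python lists in place of tensors are all hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical types: feature maps are H x W x C nested lists; weight maps are H x W.
Feature = List[List[List[float]]]
Weight = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def resize_nearest(grid, out_h: int, out_w: int):
    """Nearest-neighbor resize of an H x W grid (elements may be floats or channel lists)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse(full_feat: Feature, full_weight: Weight,
         instances: List[Tuple[Feature, Weight, Box]]) -> Feature:
    """Blend full-image and instance features with per-pixel softmax weights (assumed design)."""
    height, width = len(full_feat), len(full_feat[0])
    channels = len(full_feat[0][0])

    # Resize every instance feature map and weight map to the size of its box.
    placed = []
    for inst_feat, inst_weight, (top, left, bottom, right) in instances:
        box_h, box_w = bottom - top, right - left
        placed.append((resize_nearest(inst_feat, box_h, box_w),
                       resize_nearest(inst_weight, box_h, box_w),
                       top, left, box_h, box_w))

    fused = []
    for y in range(height):
        row = []
        for x in range(width):
            # Candidates: the full-image branch always contributes; an instance
            # contributes only at pixels inside its bounding box.
            cands = [(full_weight[y][x], full_feat[y][x])]
            for feat, weight, top, left, box_h, box_w in placed:
                if top <= y < top + box_h and left <= x < left + box_w:
                    cands.append((weight[y - top][x - left], feat[y - top][x - left]))
            # Numerically stable softmax over the candidate logits.
            peak = max(logit for logit, _ in cands)
            exps = [math.exp(logit - peak) for logit, _ in cands]
            total = sum(exps)
            row.append([sum(e * vec[c] for e, (_, vec) in zip(exps, cands)) / total
                        for c in range(channels)])
        fused.append(row)
    return fused
```

With equal logits, a pixel covered by one instance receives the average of the full-image and instance features; pixels outside every box keep the full-image feature unchanged. Letting the network learn the weight maps, rather than simply overwriting box regions with instance features, lets overlapping objects and background context be blended smoothly.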