Much work has been done on systems for converting black and white images into color. A common method is to use deep learning, but how effective is this process on video? I aim to analyse various deep learning systems on video to gauge their effectiveness and potentially offer some improvements.
To conduct this study, I'll start by comparing various deep learning algorithms for image colorization developed in the past, converting color images to greyscale, recolorizing them, and measuring how true to the original the results are. From this I'll select some of the best-performing algorithms to test against video. For this test I'll again convert a film to greyscale and compare the generated colors to the original. I'll also be looking at consistency, as the algorithms may generate slightly different colors on different frames, which would not be ideal.
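As a concrete illustration of the comparison step, here is a minimal sketch (the file names are placeholders of mine, and per-pixel RMSE is just one possible fidelity measure) that converts an original color image to greyscale and measures how far a colorizer's output drifts from the original:

```python
import numpy as np
from skimage import color, io

# Load the ground-truth color image and the colorizer's output.
# "original.png" and "colorized.png" are hypothetical file names.
original = io.imread("original.png") / 255.0
colorized = io.imread("colorized.png") / 255.0

# The greyscale version that would be fed to the colorizer.
grey = color.rgb2gray(original)

# Per-pixel RMSE between the original colors and the generated ones.
rmse = np.sqrt(np.mean((original - colorized) ** 2))
print("RMSE vs. original: %.4f" % rmse)
```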
Depending on the results of the survey, I may be able to make some modifications to the best-performing algorithm in order to improve its performance on video.
For this study I'll be examining one or more of the following deep learning papers: Colorful Image Colorization, Deep Colorization, and Let There Be Color!
So far I've found the most success with the tools that the researchers behind Colorful Image Colorization have created. I've been working on getting their machine learning algorithms running on my machine so I can work on modifications to improve color stability, but so far I've only been using the algorithms they developed.
I've started work on a system to automatically convert black and white video sources to color: I use ffmpeg to split the video into image frames, pass each frame through the colorizer, then convert the images back into a video. You can see a manually compiled example above.
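A minimal sketch of that pipeline follows. The file names, the 24 fps frame rate, and the colorize_frame() wrapper are placeholders of mine, not the project's actual code; the wrapper stands in for a call into the Colorful Image Colorization demo code.

```python
import os
import subprocess

def colorize_frame(src_path, dst_path):
    # Placeholder: in the real pipeline this wraps the Colorful Image
    # Colorization demo code, which loads src_path, runs it through the
    # Caffe model, and writes the colorized result to dst_path.
    raise NotImplementedError

os.makedirs("frames", exist_ok=True)
os.makedirs("colorized", exist_ok=True)

# 1. Split the source video into numbered PNG frames with ffmpeg.
subprocess.run(["ffmpeg", "-i", "input.mp4", "frames/frame_%05d.png"],
               check=True)

# 2. Run the colorizer over every frame in order.
for name in sorted(os.listdir("frames")):
    colorize_frame(os.path.join("frames", name),
                   os.path.join("colorized", name))

# 3. Reassemble the colorized frames into a video (24 fps assumed here;
#    this should match the source's actual frame rate).
subprocess.run(["ffmpeg", "-framerate", "24",
                "-i", "colorized/frame_%05d.png",
                "-pix_fmt", "yuv420p", "output.mp4"],
               check=True)
```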
This example also demonstrates the main problem with colorizers on video: color inconsistency. The wall behind the character in the colorized scene seems to flash because the colorizer interprets each frame independently, and this is the problem I'm planning to focus on next.
For the final report, I'd like to have finished analysing the algorithms on video sources, with an idea of how true to the original they are and how well they perform on sequential images. I may also have some improvements to implement on the existing algorithms to make them better suited to the task. Potentially I'd also like to colorize some originally black and white images to see how the algorithms perform on sources that have not been converted to greyscale from color.
For this study I examined automated image colorization processes and how they could be applied to video. Automated colorization of single images has been covered in the past, so I wanted to determine how effective these systems are on a series of similar but slightly different frames, as you would find in a video. I examined the tools from three studies: Colorful Image Colorization, Deep Colorization, and Let There Be Color! For most of the testing, I applied these colorizers to the 1916 film The Pawnshop, starring Charlie Chaplin. Some example image output can be seen below.
Of the tools I examined, Colorful Image Colorization proved to be the best at estimating the colors in the film, so that is the tool I chose to examine in further detail. The Colorful Image Colorization network consists of eight blocks of repeated convolutional and rectified linear unit (ReLU) layers. An image describing the dimensions of the layers and how they are connected can be found on the right.
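To give a rough feel for that block structure, here is an illustrative sketch written in PyTorch for readability (the actual model is a Caffe network). The layer counts and channel sizes below are from my reading of the paper and may not match exactly, and I omit the strides, dilated convolutions, and normalization layers the real network uses, so treat the architecture figure as authoritative:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    # One block: a few 3x3 convolutions, each followed by a ReLU.
    layers = []
    for i in range(n_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

# Eight conv+ReLU blocks, ending in a 1x1 convolution that predicts a
# distribution over quantized ab color values for each pixel. The input
# is the single L (lightness) channel of the greyscale frame.
model = nn.Sequential(
    conv_block(1, 64, 2),
    conv_block(64, 128, 2),
    conv_block(128, 256, 3),
    conv_block(256, 512, 3),
    conv_block(512, 512, 3),
    conv_block(512, 512, 3),
    conv_block(512, 512, 3),
    conv_block(512, 256, 3),
    nn.Conv2d(256, 313, kernel_size=1),  # 313 quantized ab bins
)
```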
The main software requirement for the colorizer is Caffe. Additional requirements are numpy, matplotlib (for pyplot), scikit-image, and scipy. For video separation and recombination, ffmpeg is required. The code used in the project is a modification of Colorful Image Colorization and can be found here.
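A quick way to confirm the environment is set up, assuming the standard package names for those dependencies:

```python
# Sanity-check that the colorizer's Python dependencies are importable
# and that ffmpeg is available on the PATH.
import caffe
import matplotlib.pyplot
import numpy
import scipy
import skimage

import shutil
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
print("All dependencies found.")
```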
As this project was all about video colorization, it wouldn't be complete without a fully colorized video. To the right you can find the fully colorized version of the Charlie Chaplin movie I used for testing.
With more time I would have liked to tackle the problem of color strobing, which is apparent in my output video. If there was time I wanted to look at either training the neural network on the previous frames of the video or using the previous frame to weight the color choices when colorizing the next frame; a sketch of the second idea follows.
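To make that second idea concrete, here is a minimal sketch of what I have in mind, assuming a simple blend of the ab channels in Lab space between consecutive frames; both the blending scheme and the alpha value are untested assumptions of mine, and a real fix would also need to account for motion between frames:

```python
import numpy as np
from skimage import color

def stabilize(prev_rgb, curr_rgb, alpha=0.7):
    """Damp frame-to-frame color strobing by blending the current
    colorized frame's ab channels toward the previous frame's.
    alpha is the weight given to the current frame (a guess, untuned)."""
    prev_lab = color.rgb2lab(prev_rgb)
    curr_lab = color.rgb2lab(curr_rgb)
    out = curr_lab.copy()
    # Keep the current frame's lightness (L) but smooth the colors (ab).
    out[..., 1:] = alpha * curr_lab[..., 1:] + (1 - alpha) * prev_lab[..., 1:]
    return color.lab2rgb(out)
```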