Project 1 - Images of the Russian Empire

In class, we learned about Sergei Mikhailovich Prokudin-Gorskii, an early photographer who tried to take "color photographs" before color photography was invented. He captured three filtered exposures of the same scene onto a glass plate, representing the B, G, and R channels of the image. While he was unable to combine these images into a single colored image in his time, the Library of Congress saved his glass plate negatives and combined them digitally in the 21st century.

Project 1 does just that: using the original glass plate negatives from Prokudin-Gorskii, we combine the three filtered exposures into a single, cohesive color image.

Approach

I first tackled the smaller .jpg images provided (cathedral.jpg, monastery.jpg, tobolsk.jpg), shown below:

Cathedral
Monastery
Tobolsk

Following the guidelines, I divided each glass plate negative into three equal sections corresponding to the B, G, and R channels. To align the images, I searched for the optimal x, y offsets of the G and R channels relative to the B channel. Optimality can be defined by several metrics, such as the L2 norm between images (which did not work well in my case) or the structural similarity index (SSIM) from the scikit-image library.
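The splitting step and the L2 metric can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and it assumes the plate is a 2-D grayscale NumPy array with the B, G, and R exposures stacked top to bottom.

```python
import numpy as np

def split_plate(plate):
    """Split a scanned plate into three equal-height channels.

    Assumes the exposures are stacked vertically in B, G, R order;
    any leftover rows at the bottom are dropped.
    """
    height = plate.shape[0] // 3
    b = plate[:height]
    g = plate[height:2 * height]
    r = plate[2 * height:3 * height]
    return b, g, r

def l2_distance(a, b):
    """L2 norm of the pixel-wise difference (lower means more similar)."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))
```

Note that the L2 norm is a distance (minimized), whereas SSIM is a similarity (maximized), so a search loop has to flip the comparison depending on the metric used.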

I chose the SSIM metric, and used a displacement search approach over the range of -15 to 15 pixels in both the x and y directions. Whichever displacement yielded the highest SSIM score was chosen as the optimal offset. Additionally, I used a naive 10% crop of the images to remove the black borders, which helped improve the SSIM score.
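A rough sketch of this exhaustive search is given below. The helper names are hypothetical. It assumes circular shifts via `np.roll`, a 10% crop taken from each side before scoring, and float images scaled to [0, 1] for SSIM. The score function is pluggable: `ssim_score` wraps scikit-image's `structural_similarity`, while `ncc_score` is a NumPy-only stand-in (normalized cross-correlation) that is not the metric used in this project.

```python
import numpy as np

def crop_border(img, frac=0.10):
    """Drop `frac` of the height/width from each side (assumed crop style)."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc_score(a, b):
    """Normalized cross-correlation; a NumPy-only stand-in metric."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def ssim_score(a, b):
    """SSIM via scikit-image; assumes float images in the range [0, 1]."""
    from skimage.metrics import structural_similarity
    return structural_similarity(a, b, data_range=1.0)

def find_offset(channel, reference, radius=15, score_fn=ncc_score):
    """Try every (dx, dy) in [-radius, radius] and keep the best score."""
    ref = crop_border(reference)
    best_score, best_offset = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = score_fn(crop_border(shifted), ref)
            if score > best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset
```

Calling `find_offset(g, b, score_fn=ssim_score)` would use SSIM. Cropping before scoring also keeps the wrapped-around pixels from `np.roll` out of the comparison, as long as the shift is smaller than the cropped margin.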

Cathedral

Monastery

Tobolsk

However, this naive approach does not scale well to the massive .tif images. Here, we can extend the original approach with a pyramid search algorithm, which starts by running the displacement search on coarser versions of the images and then iteratively refines the result on finer versions using a smaller search range.

Specifically, for each image, I build a list of images, starting with the original and repeatedly scaling it down by a factor of 2 until the next image's height would be less than 100 pixels. I run the displacement search on the smallest image with the original range of -15 to 15 pixels. I then multiply the resulting displacement by 2 and run the displacement search on the next larger image, this time with a range of only -2 to 2 pixels around the scaled displacement. I repeat this process until I reach the original image, at which point I have the optimal displacement at full resolution.
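The coarse-to-fine procedure might look roughly like the sketch below. All names are hypothetical, and several details are assumptions rather than the project's actual choices: downscaling is done by 2x2 block averaging, shifts are circular (`np.roll`), a 10% border is cropped before scoring, and normalized cross-correlation stands in for SSIM to keep the example NumPy-only.

```python
import numpy as np

def crop_border(img, frac=0.10):
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def score(a, b):
    """Normalized cross-correlation (stand-in for SSIM)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def search(channel, reference, center, radius):
    """Best (dx, dy) within `radius` pixels of the `center` estimate."""
    cx, cy = center
    ref = crop_border(reference)
    best_score, best_offset = -np.inf, center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(crop_border(shifted), ref)
            if s > best_score:
                best_score, best_offset = s, (dx, dy)
    return best_offset

def downscale(img):
    """Halve each dimension by averaging 2x2 blocks."""
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2
    return img[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))

def build_pyramid(img, min_height=100):
    """Full-resolution image first; stop before height drops below min_height."""
    levels = [img]
    while levels[-1].shape[0] // 2 >= min_height:
        levels.append(downscale(levels[-1]))
    return levels

def pyramid_offset(channel, reference, coarse_radius=15, fine_radius=2,
                   min_height=100):
    ch_levels = build_pyramid(channel, min_height)
    ref_levels = build_pyramid(reference, min_height)
    # Wide search on the coarsest level only.
    offset = search(ch_levels[-1], ref_levels[-1], (0, 0), coarse_radius)
    # Walk back up: double the estimate, then refine within a small window.
    for ch, ref in zip(reversed(ch_levels[:-1]), reversed(ref_levels[:-1])):
        offset = (2 * offset[0], 2 * offset[1])
        offset = search(ch, ref, offset, fine_radius)
    return offset
```

With these ranges, each full-resolution channel needs only 25 score evaluations instead of 961, and the wide search runs only on the smallest image.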

Results

Cathedral - G: (2, 5), R: (3, 12)

Monastery - G: (0, 3), R: (2, 3)

Tobolsk - G: (3, 3), R: (3, 6)

Camel (New Example) - G: (27, 46), R: (40, 104)

Church - G: (4, 25), R: (-4, 58)

Emir - G: (23, 50), R: (40, 105)

Harvesters - G: (16, 59), R: (13, 123)

Icon - G: (17, 40), R: (23, 89)

Lady - G: (9, 56), R: (12, 119)

Melons - G: (10, 81), R: (13, 177)

Mill (New Example) - G: (25, 67), R: (34, 131)

Onion Church - G: (28, 51), R: (35, 108)

Sculpture - G: (-11, 33), R: (35, 108)

Self Portrait - G: (29, 78), R: (37, 175)

Three Generations - G: (17, 55), R: (11, 113)

Train - G: (7, 41), R: (31, 85)
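For reference, offsets like those reported here could be applied to produce the final color image roughly as follows. This is a sketch under assumptions: the function name is hypothetical, each offset is read as (x, y) relative to the B channel, and shifts are applied circularly with `np.roll`.

```python
import numpy as np

def compose_rgb(b, g, r, g_offset, r_offset):
    """Shift G and R onto B and stack the result in RGB order.

    Offsets are assumed to be (x, y) pairs, so x shifts columns and
    y shifts rows.
    """
    gx, gy = g_offset
    rx, ry = r_offset
    g_aligned = np.roll(g, (gy, gx), axis=(0, 1))
    r_aligned = np.roll(r, (ry, rx), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

For example, under these assumptions the cathedral image would be composed with `compose_rgb(b, g, r, (2, 5), (3, 12))`.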