Project 1 - Images of the Russian Empire
In class, we learned about Sergei Mikhailovich Prokudin-Gorskii, an early photographer who tried to take "color photographs" before color photography was invented. He captured three filtered exposures of the same scene onto a glass plate, representing the B, G, and R channels of the image. While he was unable to combine these images into a single colored image in his time, the Library of Congress saved his glass plate negatives and combined them digitally in the 21st century.
Project 1 does just that - using the original glass plate negatives from Prokudin-Gorskii, we attempt to combine the three filtered exposures into a single, cohesive color image.
Approach
I first tackled the smaller .jpg images provided (cathedral.jpg, monastery.jpg, tobolsk.jpg), as shown below:
Following the guidelines, I divided each glass plate negative into three equal sections corresponding to the B, G, and R channels. To align the images, I searched for the optimal x, y offsets of the G and R channels relative to the B channel. Optimality can be defined by several metrics, such as the L2 norm between images (which did not work well in my case) or the structural similarity index (SSIM) from the scikit-image library.
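The splitting step and the L2 metric mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `split_plate` and `l2_distance` are hypothetical, and it assumes the plate is a 2-D grayscale array with the three exposures stacked vertically in B, G, R order from top to bottom.

```python
import numpy as np


def split_plate(plate):
    """Split a plate scan into three equal-height parts.

    Assumption: the exposures are stacked vertically in B, G, R order.
    """
    h = plate.shape[0] // 3  # leftover rows at the bottom are dropped
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]


def l2_distance(a, b):
    """L2 norm of the pixelwise difference; lower means more similar."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))
```

A lower L2 distance indicates a better match, whereas a higher SSIM score does, so the two metrics would be minimized and maximized respectively.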
I chose the SSIM metric and used a displacement search approach over the range of -15 to 15 pixels in both the x and y directions. Whichever displacement yielded the highest SSIM score was chosen as the optimal offset. Additionally, I applied a naive 10% crop to the images to remove the black borders, which helped improve the SSIM score.
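The displacement search described above might look something like the sketch below. All names here (`crop_border`, `ncc`, `best_offset`) are hypothetical. To keep the example dependent on NumPy only, it scores candidates with normalized cross-correlation as a stand-in for SSIM; the metric is a parameter, so a different one can be swapped in. It also assumes shifts are applied with `np.roll`, with the 10% crop discarding the wrapped-around border.

```python
import numpy as np


def crop_border(img, frac=0.10):
    """Remove `frac` of the height and width from each side."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def ncc(a, b):
    """Normalized cross-correlation (stand-in metric); higher is better."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_offset(channel, reference, radius=15, score=ncc):
    """Try every (dx, dy) in [-radius, radius] and keep the best-scoring one."""
    ref = crop_border(reference)
    best, best_score = (0, 0), -np.inf
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            # Shift rows by dy and columns by dx, then score the cropped interior.
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            s = score(crop_border(shifted), ref)
            if s > best_score:
                best, best_score = (dx, dy), s
    return best
```

To score with SSIM instead, one option is to pass `score=lambda a, b: structural_similarity(a, b, data_range=1.0)` after `from skimage.metrics import structural_similarity`, assuming float images scaled to [0, 1].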
Cathedral
Monastery
Tobolsk
However, this naive approach does not scale well to the massive .tif images. Here, we can extend the original approach with a pyramid search algorithm, which first runs the displacement search on coarser versions of the images and then iteratively refines the result on finer versions using a smaller search range.
Specifically, for each image, I save a list of images, starting with the original and then scaling it down by a factor of 2 repeatedly, stopping before the image's height would fall below 100 pixels. I then run the displacement search on the smallest image with the original range of -15 to 15 pixels. Next, I scale the displacement by a factor of 2 and run the displacement search on the next larger image with a range of -2 to 2 pixels around that scaled displacement. I repeat this process until I reach the original image, at which point I have the optimal displacement for the original image.
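A rough, self-contained sketch of the coarse-to-fine procedure follows. It is illustrative only: the helper names are hypothetical, downscaling is done by 2x2 block averaging (the write-up does not say which resampling method was used), and normalized cross-correlation stands in for SSIM so that only NumPy is required.

```python
import numpy as np


def halve(img):
    """Downscale by 2 via 2x2 block averaging (an assumed resampling method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def build_pyramid(img, min_height=100):
    """Return [original, 1/2, 1/4, ...], stopping before height < min_height."""
    levels = [img.astype(np.float64)]
    while levels[-1].shape[0] // 2 >= min_height:
        levels.append(halve(levels[-1]))
    return levels


def score(a, b, frac=0.10):
    """Normalized cross-correlation on a 10% border crop (stand-in for SSIM)."""
    h, w = a.shape
    dy, dx = int(h * frac), int(w * frac)
    a, b = a[dy:h - dy, dx:w - dx], b[dy:h - dy, dx:w - dx]
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def search(channel, reference, center, radius):
    """Best (dx, dy) within `radius` of `center`."""
    best, best_score = center, -np.inf
    for dx in range(center[0] - radius, center[0] + radius + 1):
        for dy in range(center[1] - radius, center[1] + radius + 1):
            shifted = np.roll(channel, shift=(dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s > best_score:
                best, best_score = (dx, dy), s
    return best


def pyramid_align(channel, reference, coarse_radius=15, fine_radius=2):
    """Search widely at the coarsest level, then refine at each finer level."""
    levels = list(zip(build_pyramid(channel), build_pyramid(reference)))[::-1]
    offset, radius = (0, 0), coarse_radius
    for i, (chan, ref) in enumerate(levels):
        if i > 0:
            # Moving to a level twice as large doubles the pixel offset.
            offset = (2 * offset[0], 2 * offset[1])
            radius = fine_radius
        offset = search(chan, ref, offset, radius)
    return offset
```

Because each finer level only checks a 5x5 neighborhood around the doubled estimate, the expensive 31x31 search is paid only once, on the smallest image.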
Results
Offsets are (x, y) displacements in pixels, relative to the B channel.
Cathedral - G: (2, 5), R: (3, 12)
Monastery - G: (0, 3), R: (2, 3)
Tobolsk - G: (3, 3), R: (3, 6)
Camel (New Example) - G: (27, 46), R: (40, 104)
Church - G: (4, 25), R: (-4, 58)
Emir - G: (23, 50), R: (40, 105)
Harvesters - G: (16, 59), R: (13, 123)
Icon - G: (17, 40), R: (23, 89)
Lady - G: (9, 56), R: (12, 119)
Melons - G: (10, 81), R: (13, 177)
Mill (New Example) - G: (25, 67), R: (34, 131)
Onion Church - G: (28, 51), R: (35, 108)
Sculpture - G: (-11, 33), R: (35, 108)
Self Portrait - G: (29, 78), R: (37, 175)
Three Generations - G: (17, 55), R: (11, 113)
Train - G: (7, 41), R: (31, 85)