A group of research students had tried to use a 2D discrete Fourier Transform to characterize the pattern of repeating protein units on a bacteria surface layer image. Since their spectral graph looked unusually messy and their pattern estimates seemed very wrong, my research professor asked me to take a crack at interpreting and cleaning it. This was a neat application of understanding the mathematical assumptions of a technique to properly isolate and interpret results.
Table of Contents
1 THE HOT-PEANUTS CHALLENGE
In 2015-2016, Professor Jean Huang and her research group cultured a phototrophic bacteria from the Little Sippewissett Salt Marsh. They nicknamed it ‘Hot Peanuts’ based on its interesting shape and growth at high temperatures. They wanted to study and publish their findings about its physiology and metabolism in a paper. The group performed a wide variety of analysis, including growth studies, 16sRNA analysis, and S-layer protein dissociation and identification.
As part of their analysis, the researchers wanted to know the pattern and size of the S-layer subunits. To do this, they took a transmission electron microscopy (TEM) on a ‘Hot Peanut’ sample stained with tungsten at Boston University, and attempted to do FFT analysis on the image to discern its dominant periodic patterns.
From their FFT magnitude plot, the 2015-2016 research group could tell that their S-layer was probably hexagonal, but they held back on publishing because their FFT plot looked unusually messy and they did not know how to read size information from the magnitude plot.
Fortunately, I figured it out. See my post on 1D and 2D Fourier Transforms to learn more about the basic concepts (magnitude, phase, shifting, log transforms). The rest of this blog post will use the bacteria S-layer as a case-study for performing image analysis with Fourier Transforms.
2 INTRO TO BACTERIA S-LAYERS
Most prokaryotic cells are encapsulated by a surface layer (S-layer) consisting of repeating units of S-layer proteins. These S-layers protect cells from the outside, provide mechanical stability, and play roles in spreading disease.
The S-layer arrangement has a pattern of lattice symmetry with a center-to-center subunit spacing ranging from 3 to 35 nm. The patterns are named by the number of monomers involved to make a repeating unit. The most common is hexagonal symmetry (p3, p6), but oblique (p1, p2) and square symmetry (p4) have been observed as well.
3 FFT WITH PYTHON CODE
Packages used in analysis:
import matplotlib.pyplot as plt
import numpy as np
from scipy.fftpack import fft2, fftshift, ifft2, ifftshift
from skimage import img_as_float
from skimage.color import rgb2gray
from skimage.filters import window
How to get image, image’s DFT magnitude plot, and log transformed magnitude plot:
## image
image = img_as_float(plt.imread("2A.tif"))
## take 2D DFT, center it, and get magnitude
image_f = np.abs(fftshift(fft2(image)))
## log transform to compress big values into smaller range
## lets us see more information in the magnitude plot
image_f_log = np.log(1 + image_f)
4 CLEANING AND INTERPRETING THE 2D FFT
Step 1. Remove Vertical and Horizontal Pattern Noise
When the FFT algorithm looks at an image, it assumes the image is one period of an infinitely repeating spectrum:
The vertical and horizontal leakage is caused by discontinuities at the seams where the images join. The patterns are caused by the edge and shadows where the bacteria and the white background meet, and the sharp white box of the scale. We don’t care about these artificial patterns. Covering these areas with a uniform neutral grey color reduces the magnitude of the noise frequencies.
The edge mismatches at the seams of the images still produce sharp horizontal and vertical periodic patterns that the FFT picks up on, but overall the noise has gone down and the hexagonal rings are more evident.
To reduce the frequencies of the edge discontinuities, I can use a windowing function. Windowing smoothly reduces the amplitude of the signal as it reaches the edges, removing the effect of the artificial discontinuity from the FFT. I use the Hann windowing function in this analysis.
## windowed image
wimage = image * window('hann', image.shape)
wimage_f = np.abs(fftshift(fft2(wimage)))
Step 2. Identify Key Periodic Frequencies
The FFT magnitude plot clearly shows two hexagonal rings of bright points:
To identify the key frequencies I need to find the distance from the brightest points of the log magnitude plot to the center of the plot. I threshold by log of pixel intensity to create a mask of dominant frequency points (keep points with a log pixel intensity greater than 4).
thresh_min = 4
binary_min = np.log(1 + wimage_f) > thresh_min
The mask contains some points from the inner and outer rings, as well as lower frequency points that are not important. I can use the mask to find how far the dominant frequency points are from the center of the plot.
def get_distance_matrix(N):
x = np.linspace(-N/2, N/2, N)
y = np.linspace(-N/2, N/2, N)
X, Y = np.meshgrid(x, y)
distMat = np.sqrt((X**2) + (Y**2))
return distMat
distMat = get_distance_matrix(400)
distances = binary_min * distMat
Then I can simply sort the result, select ranges of similar values, and I have the distances for the points in the rings!
Inner Ring (Low Frequency Ring) (Red)
* number of points in mask: 20
* mean distance: 46.11 pixels
* standard deviation: 1.39 pixels
Outer Ring (High Frequency Ring) (Blue)
* number of points in mask: 8
* mean distance: 79.27 pixels
* standard deviation: 0.90 pixels
The Nyquist frequency is 0.5 cycles / pixel (the highest frequency pattern I can pick up on, i.e. if the pattern alternated each pixel). So the normalized range of possible frequency basis in the FFT goes from -0.5 to +0.5 cycles / pixel. Since our image shape is 400x400 pixels, each pixel in the frequency domain represents 1/400 cycles per pixel. Thus, I divide the frequencies by a constant 400 to normalize them to the unit scale.
Inner Ring (Low Frequency Ring) (Red)
* normalized average frequency: 0.115 cycles / pixel
Outer Ring (High Frequency Ring) (Blue)
* normalized average frequency: 0.198 cycles / pixel
Step 3. Calculate Size and Scale
From the normalized average frequency, I can find the period length of one cycle:
Inner Ring (Low Frequency Ring) (Red)
* average cycle length: 8.67 pixels
Outer Ring (High Frequency Ring) (Blue)
* average cycle length: 5.04 pixels
The inner (red) ring has a longer period, compared to the outer (blue) ring. This is expected, as the inner ring is closer to the center and therefore has a lower frequency.
Finally, I have to convert from pixels to nanometers using the scale on the original image.
Inner Ring (Low Frequency Ring) (Red)
* average cycle length: 14.7 nm
Outer Ring (High Frequency Ring) (Blue)
* average cycle length: 8.5 nm
These cycle lengths map to the dominant periodic frequencies in a hexagonal pattern!
As a quick check, we verify the 8.5 nm and 14.7 nm periods are consistent with the ratios expected in a hexagon pattern. We can see the periods make up the legs of a 30-60-90 right triangle, so if the shorter leg is $a = 8.5$, we expect the longer leg to be $a\sqrt(3) ~= 14.72$. This looks good!
We can get our center-to-center spacing by multiplying the period length by 2:
Inner Ring (Low Frequency Ring) (Red)
* center-to-center spacing: 29.4 nm
Outer Ring (High Frequency Ring) (Blue)
* center-to-center spacing: 17.0 nm
Step 4. Identify Shape
The Fourier magnitude plot pretty clearly shows two hexagonal rings. Lets compare the layout to the example S-layer images from earlier in the post:
Here, I cropped the images, multiplied them with the Hann windowing function, and plotted their log of centered FFT magnitude plots. The magnitude plots are quite distinct: oblique S-layer has a dominant slant, square S-layer has a repeating grid of squares, and the hexagonal S-layer has hexagonal rings.
The Hot Peanut magnitude plot has the same layout as the example hexagonal S-layer!
Bonus: Visualize the S-Layer Units
I thought it might be cool to visualize the S-layer units.
I use the threshold logic from earlier to get a pixel intensity cutoff to isolate the key frequencies. This time, I am using a non-windowed image because when I inverse FFT, I want the result to have edges.
The resulting mask can be used as a filter. I multiply the original FFT by it, then perform de-centering and inverse FFT.
The resulting filtered image has hexagonal features! If I manually measure the center-to-center subunit distance along both of the key hexagonal frequencies, I get 16.24 nm and 29.6 nm. That is pretty close to the calculated center-to-center spacings! The error margin can be attributed to human error when trying to estimate the center of hexagons.
I can see what I filtered out by inverting the mask and doing the same inverse FFT reconstruction process. I have used an orange color map to display the faint noise more clearly.
5 CONCLUSION
The 2D discrete Fast Fourier plot is a nice tool for picking out key periodic frequencies in an image. By using FFT analysis, I determined that the ‘Hot Peanuts’ bacteria S-layer had a hexagonal lattice pattern, and center-to-center subunit spacings of 29.4 nm and 17.0 nm.
This analysis was verified by comparing the ‘Hot Peanut’ FFT lattice pattern to known S-layer FFT patterns and by manually measuring the center-to-center spacing of the hexagonal subunits in a ‘Hot Peanut’ low-pass filtered spatial reconstruction.