Bug #4891

open

FFT in redux of large micrographs is too slow

Added by Neil Voss over 7 years ago. Updated over 7 years ago.

Status: Assigned
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 04/07/2017
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 3.2
Show in known bugs: No
Workaround:

Description

It would be great if you could also take a look at the FFT of the large images, as these are very slow. There are two layers of problems with the FFT:

1. redux was originally written to be multi-threaded, but the FFTW implementation
for Python had a memory leak when multi-threaded. There might be
a fix now, or a better library to use.

2. Large image FFT is slow by default, but K2 super resolution images are
even worse. Currently the way to make it faster is to generate an FFTW wisdom plan.
If it can be improved, please do so.

If you are going to make changes in there, keep in mind that
K2 images are not square, nor are their dimensions FFT-friendly. We also need to crop
to a square before the transform so that the result looks more like optical diffraction,
with the same pixel size in both axes.
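To make the crop-before-transform requirement concrete, here is a minimal NumPy sketch. It is not the redux code: the function names are hypothetical, and picking the largest side length with only small prime factors (2, 3, 5, 7) is an assumption about what counts as an FFT-friendly size.

```python
import numpy as np

def largest_smooth_size(limit, primes=(2, 3, 5, 7)):
    """Return the largest n <= limit whose prime factors all lie in `primes`."""
    for n in range(limit, 0, -1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:
            return n
    raise ValueError("limit must be >= 1")

def center_crop_square(image, size=None):
    """Crop a centered square so both axes keep the same pixel size."""
    rows, cols = image.shape
    if size is None:
        size = largest_smooth_size(min(rows, cols))
    r0 = (rows - size) // 2
    c0 = (cols - size) // 2
    return image[r0:r0 + size, c0:c0 + size]

def power_spectrum(image):
    """Power spectrum of the centered square crop, zero frequency in the middle."""
    crop = center_crop_square(np.asarray(image, dtype=np.float32))
    crop = crop - crop.mean()  # suppress the DC peak
    ft = np.fft.fftshift(np.fft.fft2(crop))
    return np.abs(ft) ** 2
```

Cropping to the short axis discards part of the long axis, which is the price of getting a square spectrum with equal sampling in both directions.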


I was looking over the FT code in Appion. It seems that both fftw3 and fftpack have been tried to speed this up. I see Anchi has played with the wisdom file to optimize the calculations. There is only so much you can do to speed up an FFT calculation.

Things that could be sped up with their caveats:

A. Figure out multi-threading. It may just work on CentOS 7 where it did not on CentOS 6.

B. Change the shape. It appears that the image dimensions are not powers of 2, so we do not get the fastest FFT case; sizes with large prime factors are noticeably slower. Possible workarounds:
1. Pad the micrograph into a power of 2 box, such as from 7k x 5k into 8192 x 8192 (slower in my tests).
2. Transform the four corners of the micrograph and average them (some loss of outer resolution, but it would be minimal). Going from 5k x 7k to 4096 x 4096 was twice as fast, but if we do it 4 times we are doubling the total time.

C. We could pre-calculate the FT and save it to disk. This could be done using a daemon running on a GPU node. Do we save only the power spectrum, or the complex form?

D. We could go full GPU with this and require that the reduxd server be run on a GPU system. [Or, even crazier, use an FPGA/ASIC hardware FFT solution: http://www.dilloneng.com/2d-fft.html ; it handles 16-bit pixel data at a resolution of 2K x 2K pixels and a frame rate of 120 fps.]
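As a rough illustration of option B.2, the sketch below averages the power spectra of four corner tiles using plain NumPy. The function names are hypothetical, and this is not how redux currently does it. When the tile size exceeds half of an image dimension (as with 4096 tiles on a 5k x 7k image), the tiles simply overlap.

```python
import numpy as np

def corner_tiles(image, size):
    """Return the four size x size corner tiles of a 2D image."""
    rows, cols = image.shape
    if size > rows or size > cols:
        raise ValueError("tile size exceeds image dimensions")
    return [
        image[:size, :size],                  # top-left
        image[:size, cols - size:],           # top-right
        image[rows - size:, :size],           # bottom-left
        image[rows - size:, cols - size:],    # bottom-right
    ]

def averaged_power_spectrum(image, size):
    """Average the power spectra of the four corner tiles."""
    image = np.asarray(image, dtype=np.float64)
    total = np.zeros((size, size))
    for tile in corner_tiles(image, size):
        tile = tile - tile.mean()  # suppress the DC peak
        total += np.abs(np.fft.fft2(tile)) ** 2
    # Shift once at the end so zero frequency sits in the middle.
    return np.fft.fftshift(total / 4.0)
```

Each tile is an independent transform, so the four FFTs could be handed to separate workers if the multi-threading in option A gets sorted out.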

I think it needs more discussion.
