Bug #2005

closed

redux crashing on fft

Added by Scott Stagg over 12 years ago. Updated over 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 08/17/2012
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 2.2.0
Show in known bugs: No
Workaround:

Description

I am having a problem where the redux server is crashing when I'm looking at FFTs. The redux log looks like this:

REQUEST: pipeline=standard&filename=/data/lustre/leginondata/12jun20b/rawdata/12jun20b_00008ex.mrc&power=1&maskradius=5.12&shape=512x512&scaletype=stdev&scalemin=-3&scalemax=3&oformat=JPEG
NOT IN MEMORY: Format(38359760,oformat=JPEG,rgb=False)
NOT IN MEMORY: Scale(38360912,scalemax=3.0,scalemin=-3.0,scaletype=stdev)
NOT IN MEMORY: Mask(38360848,maskradius=5)
NOT IN MEMORY: Shape(38360784,shape=(512, 512))
NOT IN MEMORY: Power(38360656,)
IN MEMORY: Read(38359312,filename=/data/lustre/leginondata/12jun20b/rawdata/12jun20b_00008ex.mrc,info=False)
NOT IN FILE: Read(38359312,filename=/data/lustre/leginondata/12jun20b/rawdata/12jun20b_00008ex.mrc,info=False)
Running Power(38360656,)
1345235788.14
Traceback (most recent call last):
  File "/panfs/storage.local/imb/stagg/software/myami_dev/redux/server.py", line 37, in run_process
    result = pipeline.process(**kwargs)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/redux/pipeline.py", line 104, in process
    result = pipe(result)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/redux/pipe.py", line 162, in call
    return self.run(input, **self.kwargs)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/redux/pipes/power.py", line 7, in run
    output = pyami.fft.calculator.power(input, full=True, centered=True)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/pyami/fft/calc_base.py", line 101, in power
    pow = self.post_fft(pow, full, centered, mask)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/pyami/fft/calc_base.py", line 72, in post_fft
    fft_array = pyami.imagefun.swap_quadrants(fft_array)
  File "/panfs/storage.local/imb/stagg/software/myami_dev/pyami/imagefun.py", line 183, in swap_quadrants
    a = numpy.roll(a, shift0, 0)
  File "/usr/lib64/python2.6/site-packages/numpy/core/numeric.py", line 1063, in roll
    res = a.take(indexes, axis)
MemoryError

Thoughts?


Related issues 1 (0 open, 1 closed)

Related to Appion - Bug #2058: redux caching problems (Closed, Dmitry Lyumkis, 09/21/2012)

Actions #1

Updated by Jim Pulokas over 12 years ago

  • Status changed from New to Assigned
  • Assignee set to Jim Pulokas

Scott, there are a few things I am curious about. Is there any chance that the redux disk cache is actually using a virtual disk, and therefore using memory instead of disk?

Can you provide more details about this machine?

How much memory does it have?

Is it a dedicated web server or sharing memory resources with other tasks?

What is the size of your memory cache in redux/pipeline.py?

You should be able to reproduce the exception from a command line using the stand-alone redux client:

redux --server_host=localhost --request="pipeline=standard&filename=/data/lustre/leginondata/12jun20b/rawdata/12jun20b_00008ex.mrc&power=1&maskradius=5.12&shape=512x512&scaletype=stdev&scalemin=-3&scalemax=3&oformat=JPEG" > result.jpg
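The --request argument is an ordinary URL query string, so it can be assembled programmatically when the same test has to be repeated over many images. A minimal sketch, assuming Python 3's standard library; build_request is a hypothetical helper, not part of redux, and the parameter values are copied from the log above:

```python
from urllib.parse import urlencode


def build_request(filename, **options):
    """Assemble a redux-style query string (hypothetical helper, not part of redux)."""
    params = {"pipeline": "standard", "filename": filename}
    params.update(options)
    # safe="/" leaves path separators unescaped, matching the logged request.
    return urlencode(params, safe="/")


request = build_request(
    "/data/lustre/leginondata/12jun20b/rawdata/12jun20b_00008ex.mrc",
    power=1, maskradius=5.12, shape="512x512",
    scaletype="stdev", scalemin=-3, scalemax=3, oformat="JPEG",
)
print(request)
```

The resulting string can be passed to the client as the value of --request in the command above.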

Try running "top" with processes sorted by memory usage to see if redux is the main culprit for running out of memory (type shift-m in top to sort by memory).

Actions #2

Updated by Scott Stagg over 12 years ago

The cache is on an actual disk, /var/www/html/cache , where /var is a separate partition that is currently 90% full.

The machine has 24 GB of RAM.

It is a semi-dedicated webserver with a handful of shared processes, but none of them are consuming resources when the crash happens

In pipeline.py: mem_cache_size = 400*1024*1024 # 400 MB

The crash only happens when browsing from one FFT image to the next.

I ran top with processes sorted by memory and discovered something interesting. When using redux to browse from image to image, python memory briefly rose to 1% then fell to 0. When I turned on FFT and browsed from image to image, memory use rose by about a half a percent each time I clicked an image. I stopped clicking when python hit 9%. Memory use stayed 9% despite not doing anything with the browser. Seems like a memory leak?
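The same pattern (memory that steps up on every request and never comes back down) can be checked from inside Python rather than by watching top. A rough sketch, assuming a modern Python 3 with the standard tracemalloc module; leaky_request and clean_request are hypothetical stand-ins for a request handler and are not redux code:

```python
import tracemalloc

_leaked = []  # stands in for whatever keeps references alive between requests


def leaky_request(nbytes=100_000):
    """Hypothetical handler that keeps its buffer alive after returning."""
    buf = bytearray(nbytes)
    _leaked.append(buf)
    return len(buf)


def clean_request(nbytes=100_000):
    """Hypothetical handler whose buffer is freed when it returns."""
    buf = bytearray(nbytes)
    return len(buf)


def growth_per_call(func, calls=20):
    """Return the average number of traced bytes retained per call."""
    tracemalloc.start()
    func()  # warm-up so one-time allocations are not counted
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / calls


print("leaky:", growth_per_call(leaky_request))
print("clean:", growth_per_call(clean_request))
```

A handler that retains roughly its working-set size on every call is behaving like the FFT browsing case described above; one that returns to baseline is not.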

I tried updating to the latest trunk version of redux, but that broke everything, so I reverted to r17113.

Actions #3

Updated by Anchi Cheng over 12 years ago

  • Affected Version changed from Appion/Leginon 2.1.0 to Appion/Leginon 2.2.0

I got a similar error on longboard this morning. Longboard does not appear to be running the cache (I could not find redux.cfg). I started reduxd with /etc/init.d/reduxd start.

After that I accessed a few images at random. The next two smaller images still went through, but the big ones did not. After the two smaller images got through, reduxd stopped by itself without an error.

Here is the sequence.

  1. all images ran fine before
  2. 4k image memory error during binning
  3. 4k image memory error during read
  4. 1k image not in memory processed successfully
  5. 4k image memory error during read
  6. 4k image read from memory successfully
  7. 1k image (the last one before redux stopped) showed the following in the log (can't remember if the image showed up on the web browser, I think it did):

REQUEST: pipeline=standard&filename=/ami/data00/leginon/12nov29d/rawdata/12nov29d_b_00014gr_00013sq_v02_00008hl_v08.mrc&shape=494x512&scaletype=stdev&scalemin=-5&scalemax=5&oformat=JPEG
NOT IN MEMORY: Format(74429776,oformat=JPEG,rgb=False)
NOT IN MEMORY: Scale(74429392,scalemax=5.0,scalemin=-5.0,scaletype=stdev)
NOT IN MEMORY: Shape(74429584,shape=(494, 512))
NOT IN MEMORY: Read(74429328,filename=/ami/data00/leginon/12nov29d/rawdata/12nov29d_b_00014gr_00013sq_v02_00008hl_v08.mrc,info=False)
Running Read(74429328,filename=/ami/data00/leginon/12nov29d/rawdata/12nov29d_b_00014gr_00013sq_v02_00008hl_v08.mrc,info=False)
Running Shape(74429584,shape=(494, 512))
Running Scale(74429392,scalemax=5.0,scalemin=-5.0,scaletype=stdev)
Running Format(74429776,oformat=JPEG,rgb=False)

Actions #4

Updated by Jim Pulokas over 12 years ago

Found a major problem introduced in r17138: the cache size from the config file was read as a string rather than an integer, causing completely wrong cache cleanup. This is probably not Scott's problem, because that was reported before r17138. I fixed this in r17291.
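For context on why a string-valued limit breaks cleanup: in CPython 2, comparing an int with a str does not raise an error, and the result depends only on the types, so a size check against a string limit gives the same answer no matter how full the cache is. A minimal sketch of reading the value as an integer, assuming Python 3's configparser; the section name, option name, and megabyte unit are hypothetical and not taken from redux.cfg:

```python
import configparser

# Hypothetical config contents; the real redux.cfg layout is not shown in this thread.
CFG_TEXT = """
[cache]
disk_cache_size = 200
"""


def read_cache_limit_mb(cfg_text, section="cache", option="disk_cache_size"):
    """Return the raw string and the parsed integer for the cache limit."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get(section, option)  # always a str
    # getint() converts the string and raises ValueError on non-numeric input.
    return raw, parser.getint(section, option)


def over_limit(current_bytes, limit_bytes):
    """True when the cache has grown past its limit and needs cleaning."""
    return current_bytes > limit_bytes


raw, limit_mb = read_cache_limit_mb(CFG_TEXT)
limit_bytes = limit_mb * 1024 * 1024
print(repr(raw), limit_mb, limit_bytes)
```

Converting once at load time keeps every later size comparison numeric.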

For Scott's problem, I suspect the way that pyami.fft causes an image to hold a reference to its fft and power spectrum, which would hold on to a lot more memory than we are actually keeping track of in the caching code. My attempt to fix it is in r17292.
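The accounting gap described here can be illustrated with a toy model. This is only a sketch of the general pattern, not the pyami.fft implementation; the class, the attribute names, and the buffer sizes are all hypothetical:

```python
class CachedImage:
    """Toy stand-in for an image array; not the pyami implementation."""

    def __init__(self, nbytes):
        self.pixels = bytearray(nbytes)
        self.attached = {}  # hypothetical: derived results kept on the image


def tracked_size(img):
    """What a cache would count if it only looked at the pixel buffer."""
    return len(img.pixels)


def retained_size(img):
    """What the image actually keeps alive, including attached results."""
    return len(img.pixels) + sum(len(buf) for buf in img.attached.values())


def power_spectrum(img):
    """Pretend computation that stores its intermediates on the input image."""
    n = len(img.pixels)
    img.attached["fft"] = bytearray(2 * n)  # complex values: twice the real input
    img.attached["power"] = bytearray(n)
    return img.attached["power"]


img = CachedImage(1_000_000)
power_spectrum(img)
print(tracked_size(img), retained_size(img))
```

In this model the cache believes it holds 1 MB per image while each image actually pins 4 MB, so a size limit enforced on the tracked figure is exceeded several times over.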

Actions #5

Updated by Anchi Cheng over 12 years ago

Got the same memory error while running the Leginon Manual Application with FFT calculation turned on for 4kx4k images. It went fine for maybe 15 images, then hit the same memory error in the same numpy.roll call. It looks like this issue originates at a lower level than redux.

Exception in thread data binder handler thread:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/acheng/myami/leginon/databinder.py", line 131, in handleData
    method(args)
  File "/home/acheng/myami/leginon/watcher.py", line 35, in handleEvent
    self.processEvent(pubevent)
  File "/home/acheng/myami/leginon/watcher.py", line 43, in processEvent
    self.processData(newdata)
  File "/home/acheng/myami/leginon/imagewatcher.py", line 44, in processData
    self.processImageData(idata)
  File "/home/acheng/myami/leginon/fftmaker.py", line 47, in processImageData
    pow = self.calculatePowerImage(imagedata)
  File "/home/acheng/myami/leginon/fftmaker.py", line 68, in calculatePowerImage
    pow = imagefun.power(imarray, self.settings['mask radius'])
  File "/home/acheng/myami/pyami/imagefun.py", line 101, in power
    pow = swap_quadrants(pow)
  File "/home/acheng/myami/pyami/imagefun.py", line 184, in swap_quadrants
    a = numpy.roll(a, shift1, 1)
  File "/usr/lib64/python2.4/site-packages/numpy/core/numeric.py", line 356, in roll
    res = a.take(indexes, axis)
MemoryError
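
For scale: numpy.roll returns a new array rather than shifting in place, so each of the two rolls in swap_quadrants needs a temporary as large as its input. Rough arithmetic, assuming "4k" means 4096x4096; the dtype of the power spectrum is not stated in this thread, so several candidates are shown:

```python
def array_bytes(shape, itemsize):
    """Bytes needed for a dense array with the given shape and element size."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


MIB = 1024 * 1024
shape = (4096, 4096)  # assumption: "4k" means 4096 pixels per side

float32_mib = array_bytes(shape, 4) / MIB
float64_mib = array_bytes(shape, 8) / MIB
complex128_mib = array_bytes(shape, 16) / MIB

print("float32:", float32_mib, "MiB")
print("float64:", float64_mib, "MiB")
print("complex128:", complex128_mib, "MiB")
```

A few such arrays held per image add up quickly if earlier results are never released.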

Actions #6

Updated by Jim Pulokas over 11 years ago

  • Assignee changed from Jim Pulokas to Scott Stagg

Scott,
Have you had any issues related to this lately? We are cleaning up issues and think this may have been solved in another issue regarding a memory leak in pyfftw.
Thanks,
Jim

Actions #7

Updated by Scott Stagg over 11 years ago

I haven't had problems related to this specific issue, but I have had some problems with random redux crashes lately. I have tracked it down (I think) to the redux log. Redux seems to be OK for a long time (weeks), then starts to crash with increasing frequency until I'm restarting it multiple times a day. I have been able to fix this temporarily by removing redux.log from /var/log. To answer your next question: no, it is not unusually large, and no, the disk partition is not close to being full.

Actions #8

Updated by Anchi Cheng over 7 years ago

  • Status changed from Assigned to Closed