Bug #4622


AverageStack fails with "Stack is too large" error when the computer has too much memory

Added by Anchi Cheng almost 8 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 11/28/2016
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 3.2
Show in known bugs: No
Workaround:

Description

This error was reported by a local user on semc-cluster. It can be reproduced with a stack of 154635 particles at box size 160. The stack-averaging step fails because the memory needed for a chunk, calculated from self.freememory, is larger than the hard-coded memorylimit value defined at the top of the apImagicFile module.

>>> from appionlib import apStack
>>> a = apStack.AverageStack(True)
>>> a.start('start.hed',None)

 ... processStack: Free memory: 206.4 GB
 ... processStack: Box size: 160
 ... processStack: Memory used per part: 100.0 kB
 ... processStack: Max particles in memory: 2164206
 ... processStack: Particles allowed in memory: 108210
 ... processStack: Number of particles in stack: 154635
 ... processStack: Particle loop num chunks: 2
 ... processStack: Particle loop step size: 77317
 ... processStack: partnum 1 to 77317 of 154635
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/acheng/myami/appion/appionlib/apImagicFile.py", line 974, in start
    stackdata = readImagic(stackfile, first=first, last=last, msg=False)
  File "/home/acheng/myami/appion/appionlib/apImagicFile.py", line 168, in readImagic
    %(numpart, apDisplay.bytes(partbytes*numpart)))
  File "/home/acheng/myami/appion/appionlib/apDisplay.py", line 65, in printError
    raise Exception, colorString("\n *** FATAL ERROR ***\n"+text+"\n\a","red")
Exception: 
 *** FATAL ERROR ***
Stack is too large to read 77317 particles, requesting 7.4 GB

Neil, how do you want this to be handled? Many cluster nodes have large memory, but it is shared by many processors. Unless a user always blocks everyone else from accessing the node, the shared memory is not really all available. I hope the rest of you weigh in as well, since if we set the limit too small it will slow things down.
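The arithmetic behind the log above can be sketched as follows. This is a minimal re-creation inferred from the reported numbers, not the actual Appion code: the function name plan_chunks, the 4-bytes-per-pixel (32-bit float) assumption, and the safety divisor of 20 are illustrative guesses.

```python
import math

def plan_chunks(numpart, boxsize, free_bytes, safety_divisor=20):
    """Hypothetical re-creation of processStack-style chunk planning:
    split a particle stack into chunks sized from free memory."""
    partbytes = boxsize * boxsize * 4      # assume 32-bit floats per pixel
    maxpart = free_bytes // partbytes      # particles that fit in free RAM
    allowed = maxpart // safety_divisor    # conservative fraction of that
    numchunks = math.ceil(numpart / allowed)
    stepsize = math.ceil(numpart / numchunks)
    chunkbytes = stepsize * partbytes
    return partbytes, allowed, numchunks, stepsize, chunkbytes

# Numbers from the report: 154635 particles, box size 160, ~206.4 GB free
partbytes, allowed, numchunks, stepsize, chunkbytes = plan_chunks(
    154635, 160, free_bytes=int(206.4 * 2**30))
print(partbytes)             # 102400 bytes = the 100.0 kB per particle in the log
print(numchunks)             # 2 chunks, as in the log
print(chunkbytes / 2**30)    # ~7.4 GB per chunk
```

The point of the bug: the chunk size is derived from free memory, so on a large-memory node the ~7.4 GB request sails past the separate hard-coded byte limit that guards the actual read.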


Related issues: 1 (0 open, 1 closed)

Related to Appion - Bug #4651: Xmipp (Duplicate, 12/06/2016)

Actions #2

Updated by Anchi Cheng almost 8 years ago

Actions #3

Updated by Anchi Cheng almost 8 years ago

Neil,

Unless you are doing something about this, I am going to increase the limit to 16 GB as a workaround.

Actions #4

Updated by Neil Voss almost 8 years ago

I could take the available memory divided by the number of processors.

Actions #5

Updated by Anchi Cheng almost 8 years ago

How about setting it to mem.free()*1024 / 2 (i.e., half of the free memory)?
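The proposal amounts to a one-line calculation. A sketch, assuming mem.free() reports kilobytes (which is what the *1024 factor suggests); here the kB figure is passed in explicitly rather than queried, and the helper name is made up:

```python
def dynamic_bytelimit(free_kb):
    """Proposed replacement for the hard-coded bytelimit:
    half of the currently free memory, converted kB -> bytes."""
    return free_kb * 1024 // 2

# With roughly 206.4 GiB free (~216432000 kB), the limit becomes ~103 GB,
# comfortably above the 7.4 GB chunk that triggered the error.
limit = dynamic_bytelimit(216432000)
print(limit)   # 110813184000 bytes
```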

Actions #6

Updated by Neil Voss almost 8 years ago

Actually, it is already mem.free()*1024 / 10

Actions #7

Updated by Neil Voss almost 8 years ago

 
 ... processStack: Free memory: 206.4 GB
 ... processStack: Memory used per part: 100.0 kB
 ... processStack: Max particles in memory: 2164206
 ... processStack: Particles allowed in memory: 108210
 ... processStack: Number of particles in stack: 154635
 ... processStack: Particle loop num chunks: 2
 ... processStack: Particle loop step size: 77317
 ... processStack: partnum 1 to 77317 of 154635

So based on these numbers, it is only requesting 7.4 GB of the 206.4 GB available.
Actions #8

Updated by Anchi Cheng almost 8 years ago

The code has two parts that conflict, which is why it failed:

there is a hard-coded bytelimit at the top of the file, used in readImagicData, and there is self.freememory, used in initValues of processStack.

I have tested changing bytelimit to be calculated as mem.free()*1024 / 2, which resolves this. As long as mem.free() does not change suddenly between the two calculations, it should work.
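The conflict described above, and the tested fix, can be sketched like this. Names and the 4 GB constant are illustrative stand-ins (the source does not state the old limit's value), simplified from the apImagicFile structure rather than copied from it:

```python
HARD_BYTELIMIT = 4 * 1024**3   # stand-in for the old module-level constant

def chunk_size_from_free_memory(numpart, partbytes, free_bytes, divisor=20):
    """initValues-style planning: chunk size driven by measured free memory."""
    allowed = free_bytes // partbytes // divisor
    chunks = -(-numpart // allowed)      # ceiling division
    return -(-numpart // chunks)

def read_chunk(numpart, partbytes, bytelimit=HARD_BYTELIMIT):
    """readImagic-style guard: an independent hard limit on a single read."""
    if numpart * partbytes > bytelimit:
        raise MemoryError("Stack is too large to read %d particles" % numpart)

# On a large-memory node the planner hands read_chunk a request bigger
# than the constant limit -- the reported failure.  The tested fix derives
# bytelimit from the same free-memory figure (mem.free()*1024 // 2), so
# the two limits can no longer disagree.
```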

Actions #9

Updated by Anchi Cheng almost 8 years ago

Pushed my workaround idea to the repository.

Actions #10

Updated by Anchi Cheng almost 7 years ago

  • Status changed from Assigned to Closed