Bug #4622
AverageStack fails with "Stack is too large" error when the computer has too much memory
Description
This error was reported by a local user on semc-cluster. It can be reproduced with a stack of 154635 particles at a box size of 160. The stack averaging failed because the memory needed by a chunk, which is sized from self.freememory, is larger than the hard-coded memorylimit value defined at the start of the apImagicFile module.
>>> from appionlib import apStack
>>> a = apStack.AverageStack(True)
>>> a.start('start.hed',None)
... processStack: Free memory: 206.4 GB
... processStack: Box size: 160
... processStack: Memory used per part: 100.0 kB
... processStack: Max particles in memory: 2164206
... processStack: Particles allowed in memory: 108210
... processStack: Number of particles in stack: 154635
... processStack: Particle loop num chunks: 2
... processStack: Particle loop step size: 77317
... processStack: partnum 1 to 77317 of 154635
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/acheng/myami/appion/appionlib/apImagicFile.py", line 974, in start
    stackdata = readImagic(stackfile, first=first, last=last, msg=False)
  File "/home/acheng/myami/appion/appionlib/apImagicFile.py", line 168, in readImagic
    %(numpart, apDisplay.bytes(partbytes*numpart)))
  File "/home/acheng/myami/appion/appionlib/apDisplay.py", line 65, in printError
    raise Exception, colorString("\n *** FATAL ERROR ***\n"+text+"\n\a","red")
Exception:
 *** FATAL ERROR ***
Stack is too large to read 77317 particles, requesting 7.4 GB
Neil, how do you want this to be handled? Many cluster nodes have large memory that is shared by many processors. Unless the user always blocks everyone else from accessing a node, the shared memory is not really all available. I hope the rest of you weigh in as well, since setting the limit too small will slow things down.
Updated by Anchi Cheng about 8 years ago
Neil,
Unless you are doing something about this, I am going to increase the limit to 16 GB as a workaround.
Updated by Neil Voss about 8 years ago
I could take the available memory divided by the number of processors.
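A minimal sketch of that idea, assuming the free memory is already known as a byte count and that every processor on the node should get an equal share. The function name per_process_limit is hypothetical and is not part of appionlib.

```python
import multiprocessing


def per_process_limit(free_bytes, nproc=None):
    """Split the free memory evenly among the processors on the node.

    free_bytes: free memory in bytes (how it is measured is left to the caller)
    nproc: number of processors; defaults to the count reported by the OS
    """
    if nproc is None:
        nproc = multiprocessing.cpu_count()
    if nproc < 1:
        raise ValueError("nproc must be at least 1")
    # Integer division keeps the result a whole number of bytes
    return free_bytes // nproc
```

On a hypothetical 32-core node with 206.4 GB free, this works out to roughly 6.45 GB per process.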
Updated by Anchi Cheng about 8 years ago
How about setting it to mem.free()*1024 / 2 (i.e. half of the free memory)?
Updated by Neil Voss about 8 years ago
Actually, it is already mem.free()*1024 / 10
Updated by Neil Voss about 8 years ago
... processStack: Free memory: 206.4 GB
... processStack: Memory used per part: 100.0 kB
... processStack: Max particles in memory: 2164206
... processStack: Particles allowed in memory: 108210
... processStack: Number of particles in stack: 154635
... processStack: Particle loop num chunks: 2
... processStack: Particle loop step size: 77317
... processStack: partnum 1 to 77317 of 154635
So based on these numbers, it is only requesting 7.4 GB of the 206.4 GB that is free.
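The arithmetic behind these log lines can be reproduced with a short sketch. Two things here are assumptions rather than facts from the appionlib source: the divisor of 20 between "Max particles in memory" and "Particles allowed in memory" is inferred from the logged values (2164206 / 20 is about 108210), and the 4 bytes per pixel is chosen because it matches the logged 100.0 kB per particle at box size 160.

```python
import math

# Values copied from the processStack log
max_in_memory = 2164206      # "Max particles in memory"
num_particles = 154635       # "Number of particles in stack"
box_size = 160               # "Box size"

# Assumption: 4 bytes per pixel (32-bit float), consistent with 100.0 kB per part
bytes_per_part = box_size * box_size * 4

# Assumption: divisor of 20 inferred from the log, not taken from the source
allowed_in_memory = max_in_memory // 20

# Number of chunks needed so that no chunk exceeds the allowed particle count
num_chunks = int(math.ceil(num_particles / float(allowed_in_memory)))
step_size = num_particles // num_chunks

# Size of one chunk read, in GB (1 GB = 1024**3 bytes)
request_gb = step_size * bytes_per_part / 1024.0 ** 3
```

The resulting request is a small fraction of the free memory, which suggests the error raised in readImagic comes from a separate limit rather than from the chunk calculation.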
Updated by Anchi Cheng about 8 years ago
Two parts of the code conflict, which is why it failed:
There is a hard-coded bytelimit at the start of the file, used in readImagicData, and there is self.freememory, used in initValues of processStack.
I have tested changing bytelimit so that it is calculated from mem.free()*1024 / 2 to resolve this. As long as mem.free() does not change suddenly, it should work.
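A minimal sketch of this kind of change, under the assumption that mem.free() reports free memory in kB (which the *1024 conversion to bytes suggests). The names byte_limit and check_request are hypothetical stand-ins, and the free-memory value is passed in as a parameter; this is not the code that was pushed to the repository.

```python
def byte_limit(free_kb, fraction=0.5):
    """Read limit in bytes: a fraction of the free memory given in kB."""
    return int(free_kb * 1024 * fraction)


def check_request(numpart, partbytes, free_kb):
    """Return the requested bytes, or raise if they exceed the derived limit."""
    requested = numpart * partbytes
    limit = byte_limit(free_kb)
    if requested > limit:
        raise MemoryError(
            "Stack is too large to read %d particles, requesting %d bytes "
            "(limit %d bytes)" % (numpart, requested, limit))
    return requested
```

Because the chunk size and the read limit would then both be derived from a free-memory reading, they should only disagree if free memory drops sharply between the two readings.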
Updated by Anchi Cheng about 8 years ago
Pushed my workaround to the repository.
Updated by Anchi Cheng almost 7 years ago
- Status changed from Assigned to Closed