Bug #2303
runXmippCL2D.py cannot generate average.mrc
Description
This bug crashes my CL2D runs. It seems to be specific to a particular stack, so that may be a clue.
... merged 1866 particles in 13.33 sec ...
Getting stack data for stackid=1171
Old stack info: 'initial stack ... combined stack ids 1170,1169'
...
Stack 1171 pixel size: 2.05
...
averaging stack for summary web page
!!! WARNING: could not create stack average, average.mrc
...
Inserting CL2D Run into DB
Traceback (most recent call last):
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 624, in <module>
    cl2d.start()
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 605, in start
    self.insertAlignStackRunIntoDatabase("alignedStack.hed")
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 386, in insertAlignStackRunIntoDatabase
    apDisplay.printError("could not find average mrc file: "+avgmrcfile)
  File "/opt/applications/myami/2.2/lib/python2.6/site-packages/appionlib/apDisplay.py", line 57, in printError
    raise Exception, colorString("\n *** FATAL ERROR ***\n"+text+"\n\a","red")
Exception:
 *** FATAL ERROR ***
could not find average mrc file: /ami/data15/appion/13feb17a/align/cl2d3/average.mrc
I ran the following command to generate this error:
runXmippCL2D.py --description="further refining raw particles for proj. matching" \
  --stack=1171 --lowpass=20 --highpass=0 --num-part=1866 --num-ref=64 --bin=2 \
  --max-iter=15 --nproc=$nproc --fast --classical_multiref --correlation --commit \
  --rundir=/ami/data15/appion/13feb17a/align/cl2d3 --runname=cl2d3 --projectid=391 \
  --expid=11189 --jobtype=partalign
Updated by Ryan Hoffman over 11 years ago
- File bug2013mar22.png bug2013mar22.png added
I've added a screenshot of the stack's summary page when viewed on Longboard. The only odd thing I notice is that the mean/STDEV montage is broken. Travis noted the same thing yesterday in a different session; it may be an entirely unrelated issue.
Updated by Ryan Hoffman over 11 years ago
I've now seen this with multiple stacks. I think the problem may generalize. I don't know whether it's a database issue.
Updated by Ryan Hoffman over 11 years ago
Acquired three data sets yesterday. CL2D for all of them failed with the same error as was originally reported. The directories are /gpfs/home/rmhoff/processing/13mar26[efg]. Here's the PBS script for the 13mar26g session:
#!/bin/bash
#PBS -q myami
#PBS -l nodes=16:ppn=8
#PBS -l mem=752gb
#PBS -l walltime=80:00:00

cd $PBS_O_WORKDIR
nproc=$(cat $PBS_NODEFILE | wc -l)
runXmippCL2D.py --description="initial alignment" --stack=1173 --lowpass=20 --highpass=0 \
  --num-part=33793 --num-ref=32 --bin=3 --max-iter=15 --nproc=$nproc --fast \
  --classical_multiref --correlation --commit --rundir=/ami/data15/appion/13mar26g/align/cl2d1 \
  --runname=cl2d1 --projectid=392 --expid=11446 --jobtype=partalign
Updated by Gabriel Lander over 11 years ago
This is an odd bug - Appion is looking for a file named something like part####_level###_.hed (that I'm assuming is created by CL2D) to generate the "average.mrc" file.
Looking in all your directories, I don't see a file like this anywhere.
So either CL2D is not generating this file and there is a bug in CL2D, or Appion isn't looking for the appropriate file name.
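If it helps, a quick way to check which case we're in is to look in the run directory (the file pattern below is just what I'd expect from the description above, not verified against this CL2D version):

cd /ami/data15/appion/13feb17a/align/cl2d3
ls part*_level*_.hed 2>/dev/null || echo "no CL2D level stack found"
ls -l average.mrc 2>/dev/null || echo "average.mrc was never created"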
As far as you can tell, CL2D is running to completion without error?
Updated by Amber Herold over 11 years ago
I noticed a couple of lines in xmipp.std.
Are you using your own installation of mpi?
Updated by Ryan Hoffman over 11 years ago
Gabe wrote:
This is an odd bug - Appion is looking for a file named something like part####_level###_.hed (that I'm assuming is created by CL2D) to generate the "average.mrc" file.
Looking in all your directories, I don't see a file like this anywhere.
So either CL2D is not generating this file and there is a bug in CL2D, or Appion isn't looking for the appropriate file name.
As far as you can tell, CL2D is running to completion without error?
Yeah... I didn't see any particular complaints from Xmipp, although Amber may have found one relating to MPI. I don't know whether Xmipp is completing successfully.
Amber wrote:
I noticed a couple of lines in xmipp.std.
Are you using your own installation of mpi?
Yeah, good spot! I may be shadowing the system MPI with a demo version of ifort that I installed. I just un-commented the appropriate line in my .bashrc and tried again.
BUT I learned that Helen has managed to reproduce the same error, and I checked: she's not sourcing my ifort installation.
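For reference, this is roughly the sanity check I ran to see which MPI and compiler were being picked up (nothing Appion-specific here; the paths are just what I looked at):

which mpirun          # should resolve to the cluster's OpenMPI, not anything under my home directory
which ifort           # make sure the demo ifort install isn't first in PATH
echo $PATH
echo $LD_LIBRARY_PATH # check that no private install directories come ahead of the system ones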
Updated by Amber Herold over 11 years ago
Where is the job file?
In your .cshrc you are loading myami. You could try loading myami/trunk, unless you are doing that in the job file directly.
And does qsub need the -V option?
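For reference, exporting the submitting shell's environment to the job would look something like this (the script name is just a placeholder):

qsub -V cl2d.pbs    # export the submitting shell's environment variables to the job
# or put the equivalent directive in the script header:
#PBS -V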
Updated by Amber Herold over 11 years ago
Oh, I see you pasted the job file here :)
Updated by Ryan Hoffman over 11 years ago
In your .cshrc you are loading myami.
I use bash as my default shell on Garibaldi. The only way I get a csh environment through batch PBS is if I add the directive #PBS -S /usr/bin/csh. So, I'm not using csh.
Updated by Ryan Hoffman over 11 years ago
Un-sourcing my ifort installation seemed to solve this problem. I haven't been able to reproduce Helen's error, and I haven't delved further into this. At this point I have no reason to think this is a real bug (I think it was an error on my part), so I'm closing this issue. Thanks for everyone's help!
Updated by Dipali Sashital over 11 years ago
- Status changed from Closed to New
I had the same error on two cl2d runs that failed earlier today. They can be found in these directories:
/ami/data15/appion/13feb28b/align/cl2d25
/ami/data15/appion/13feb28d/align/cl2d20
I'm not running my own version of MPI, so I agree with Ryan's earlier assessment that this may be a more general problem.
Updated by Dipali Sashital over 11 years ago
Per Ryan's suggestion, I restarted my job requesting the maximum memory per node. The job seems to be running normally now and has progressed past the point where it failed before. The current job is in directory /ami/data15/appion/13feb28b/align/cl2d27
I'll update again if something goes wrong past this point.
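For reference, the memory request in my PBS header looks roughly like the lines in Ryan's script above (exact limits depend on the queue, so treat the numbers as illustrative):

#PBS -l nodes=16:ppn=8
#PBS -l mem=752gb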
Updated by Ryan Hoffman over 11 years ago
This also seems to be an MPI-related issue. From /ami/data15/appion/13feb28d/align/cl2d20/xmipp.std:

/usr/lib64/mpi/gcc/openmpi/bin/mpirun: symbol lookup error: /usr/lib64/mpi/gcc/openmpi/bin/mpirun: undefined symbol: orte_dss
Updated by Dipali Sashital over 11 years ago
After three successful cl2d runs, the average.mrc error has returned for me. I am requesting the maximum memory per node. My failed runs are in these directories:
/ami/data15/appion/13mar29d/align/cl2d2
/ami/data15/appion/13mar29d/align/cl2d3
Updated by Amber Herold over 11 years ago
Dipa, it looks like you are having the same mpi error as Ryan.
/usr/lib64/mpi/gcc/openmpi/bin/mpirun: symbol lookup error: /usr/lib64/mpi/gcc/openmpi/bin/mpirun: undefined symbol: orte_dss
I'll open a ticket with IT services and see if they can help out.
Updated by Amber Herold over 11 years ago
JC's response:
It looks like the module openmpi was not loaded/used? It uses the MPI version from the OpenSUSE distribution, which is pretty outdated. Maybe it should read only mpirun instead of the entire path. -- JC
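If I'm reading that right, the fix on the job side would look something like the following (the module name and the Xmipp command are placeholders, not verified against our setup):

module load openmpi      # use the cluster's OpenMPI module instead of the OpenSUSE copy
which mpirun             # should no longer point at /usr/lib64/mpi/gcc/openmpi/bin/mpirun
mpirun -np $nproc <xmipp_cl2d_program>   # call mpirun from PATH rather than hard-coding the full path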
I'll look into this more on Monday.
Updated by Amber Herold over 11 years ago
Hey folks,
At this point I'm thinking there might be an issue with cl2d itself.
The plan is to install xmipp3 and see if that takes care of this.
Until we have a chance to do the xmipp upgrade, please use maximum likelihood alignment instead.
Updated by Melody Campbell over 9 years ago
Hi,
I am getting this problem again on Garibaldi.
Directory: /gpfs/group/em/appion/15mar13a/align/cl2d3-wontupload/
For this run, I requested the max memory on the standard Garibaldi nodes, and I have "module load openmpi" in my .cshrc. Any suggestions?
P.S. Also, I can't seem to assign or add watchers since Redmine moved...