Bug #2303

open

runXmippCL2D.py cannot generate average.mrc

Added by Ryan Hoffman over 11 years ago. Updated over 9 years ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 03/22/2013
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 2.1.0
Show in known bugs: No
Workaround:

Description

This bug crashes my CL2D runs. It seems to be specific to a particular stack, so that may be a clue.

...
merged 1866 particles in 13.33 sec
 ... Getting stack data for stackid=1171
Old stack info: 'initial stack ... combined stack ids 1170,1169'
 ... Stack 1171 pixel size: 2.05
 ... averaging stack for summary web page
!!! WARNING: could not create stack average, average.mrc
 ... Inserting CL2D Run into DB
Traceback (most recent call last):
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 624, in <module>
    cl2d.start()
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 605, in start
    self.insertAlignStackRunIntoDatabase("alignedStack.hed")
  File "/opt/applications/myami/2.2/bin/runXmippCL2D.py", line 386, in insertAlignStackRunIntoDatabase
    apDisplay.printError("could not find average mrc file: "+avgmrcfile)
  File "/opt/applications/myami/2.2/lib/python2.6/site-packages/appionlib/apDisplay.py", line 57, in printError
    raise Exception, colorString("\n *** FATAL ERROR ***\n"+text+"\n\a","red")
Exception: 
 *** FATAL ERROR ***
could not find average mrc file: /ami/data15/appion/13feb17a/align/cl2d3/average.mrc

I ran the following command to generate this error:

runXmippCL2D.py --description="further refining raw particles for proj. matching" --stack=1171\
                             --lowpass=20 --highpass=0 --num-part=1866 --num-ref=64 --bin=2 --max-iter=15 --nproc=$nproc --fast --classical_multiref --correlation --commit\
                             --rundir=/ami/data15/appion/13feb17a/align/cl2d3 --runname=cl2d3 --projectid=391 --expid=11189 --jobtype=partalign


Files

bug2013mar22.png (331 KB): screenshot of problematic stack's summary page. Ryan Hoffman, 03/22/2013 09:27 AM
Actions #1

Updated by Ryan Hoffman over 11 years ago

I've added a screenshot of the stack's summary page when viewed on Longboard. The only off thing I notice is that the mean/STDEV montage is broken. Travis noted that yesterday in a different session; may be an entirely unrelated issue.

Actions #2

Updated by Ryan Hoffman over 11 years ago

I've now seen this with multiple stacks. I think the problem may generalize. I don't know whether it's a database issue.

Actions #3

Updated by Ryan Hoffman over 11 years ago

Acquired three data sets yesterday. CL2D for all of them failed with the same error as was originally reported. The directories are /gpfs/home/rmhoff/processing/13mar26[efg]. Here's the PBS script for the 13mar26g session:

#!/bin/bash
#PBS -q myami
#PBS -l nodes=16:ppn=8
#PBS -l mem=752gb
#PBS -l walltime=80:00:00

cd $PBS_O_WORKDIR
nproc=$(cat $PBS_NODEFILE | wc -l)

runXmippCL2D.py --description="initial alignment" --stack=1173 --lowpass=20 --highpass=0 --num-part=33793 --num-ref=32 --bin=3 --max-iter=15 \
--nproc=$nproc --fast --classical_multiref --correlation --commit --rundir=/ami/data15/appion/13mar26g/align/cl2d1 --runname=cl2d1 --projectid=392 --expid=11446 --jobtype=partalign
Actions #4

Updated by Gabriel Lander over 11 years ago

This is an odd bug - Appion is looking for a file named something like part####_level###_.hed (that I'm assuming is created by CL2D) to generate the "average.mrc" file.
Looking in all your directories, I don't see a file like this anywhere.
So either CL2D is not generating this file and there is a bug in CL2D, or Appion isn't looking for the appropriate file name.
As far as you can tell, CL2D is running to completion without error?
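
One way to tell the two possibilities above apart is to list whatever class-average stacks CL2D actually left in the run directory. The sketch below is only illustrative: the glob pattern `part*_level*_.hed` is a guess based on the "something like part####_level###_.hed" description above, and `find_cl2d_level_stacks` is a hypothetical helper, not an Appion function.

```python
import glob
import os


def find_cl2d_level_stacks(rundir, pattern="part*_level*_.hed"):
    """Return sorted paths in rundir that match the assumed CL2D output name.

    The default pattern is an assumption taken from the description in this
    thread; adjust it to whatever naming your Xmipp version really uses.
    """
    return sorted(glob.glob(os.path.join(rundir, pattern)))


# Example use (the path is one of the run directories reported in this bug):
# stacks = find_cl2d_level_stacks("/ami/data15/appion/13feb17a/align/cl2d3")
# An empty list would suggest that CL2D never wrote its class averages.
```

If the list is empty while the run log shows no Xmipp error, that points toward CL2D failing silently rather than Appion looking for the wrong file name.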

Actions #5

Updated by Amber Herold over 11 years ago

I noticed a couple of lines in xmipp.std.
Are you using your own installation of mpi?

Actions #6

Updated by Ryan Hoffman over 11 years ago

Gabe wrote:

This is an odd bug - Appion is looking for a file named something like part####_level###_.hed (that I'm assuming is created by CL2D) to generate the "average.mrc" file.
Looking in all your directories, I don't see a file like this anywhere.
So either CL2D is not generating this file and there is a bug in CL2D, or Appion isn't looking for the appropriate file name.
As far as you can tell, CL2D is running to completion without error?

Yeah...I didn't see any particular complaints from Xmipp, although Amber maybe found one relating to MPI. I don't know if Xmipp's completing successfully.

Amber wrote:

I noticed a couple of lines in xmipp.std.
Are you using your own installation of mpi?

Yeah good spot! I may be overshadowing the MPI with a demo version of ifort that I installed. I just un-commented the appropriate line from my bashrc and tried again.

BUT I learned that Helen has managed to reproduce the same error, and I checked, and she's not sourcing my ifort installation.

Actions #7

Updated by Amber Herold over 11 years ago

Where is the job file?
In your .cshrc you are loading myami. You could try loading myami/trunk, unless you are doing that in the job file directly.
And does qsub need the -V option?

Actions #8

Updated by Amber Herold over 11 years ago

Oh, I see you pasted the job file here :)

Actions #9

Updated by Ryan Hoffman over 11 years ago

In your .cshrc you are loading myami.

I use BASH as my default shell on Garibaldi. The only way I get a CSH environment through batch PBS is if I add the directive #PBS -S /usr/bin/csh. So, I'm not using CSH.

Actions #10

Updated by Ryan Hoffman over 11 years ago

Un-sourcing my ifort installation seemed to solve this problem. I haven't been able to reproduce Helen's error, and I haven't delved further into this. So I have no reason to think this is a real bug at this point (I think this was an error on my part). So I'm closing this issue. Thanks much for everyone's help!

Actions #11

Updated by Ryan Hoffman over 11 years ago

  • Status changed from New to Closed
Actions #12

Updated by Dipali Sashital over 11 years ago

  • Status changed from Closed to New

I had the same error on two cl2d runs that failed earlier today. They can be found in directories

/ami/data15/appion/13feb28b/align/cl2d25
/ami/data15/appion/13feb28d/align/cl2d20

I'm not running my own version of MPI, so I agree with Ryan's early assessment that this may be a generalized problem.

Actions #13

Updated by Dipali Sashital over 11 years ago

Per Ryan's suggestion, I restarted my job requesting the maximum memory per node. The job seems to be running normally now and has progressed past the point where it failed before. The current job is in directory /ami/data15/appion/13feb28b/align/cl2d27
I'll update again if something goes wrong past this point.

Actions #14

Updated by Ryan Hoffman over 11 years ago

This also seems to be an MPI-related issue. From /ami/data15/appion/13feb28d/align/cl2d20/xmipp.std:

/usr/lib64/mpi/gcc/openmpi/bin/mpirun: symbol lookup error: /usr/lib64/mpi/gcc/openmpi/bin/mpirun: undefined symbol: orte_dss
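
Since this failure only shows up in xmipp.std and not in the Appion traceback, it may help to scan that log for dynamic-loader errors before assuming the averaging step itself is at fault. A minimal sketch, assuming Python 3 and that the log is plain text; `find_loader_errors` is a hypothetical helper, and the two phrases it searches for are simply the ones that appear in the line quoted above.

```python
import re

# Phrases the dynamic loader prints when a binary picks up a mismatched
# shared library; both occur in the mpirun line quoted above.
LOADER_ERROR = re.compile(r"symbol lookup error|undefined symbol")


def find_loader_errors(log_path):
    """Return (line_number, text) pairs for loader errors found in a log."""
    hits = []
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(log_path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if LOADER_ERROR.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


# Example use:
# for number, text in find_loader_errors("xmipp.std"):
#     print(number, text)
```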
Actions #15

Updated by Dipali Sashital over 11 years ago

After running 3 successful cl2d jobs, this average.mrc error has returned for me. I am requesting the maximum memory per node. My failed runs are in the directories:

/ami/data15/appion/13mar29d/align/cl2d2
/ami/data15/appion/13mar29d/align/cl2d3

Actions #16

Updated by Amber Herold over 11 years ago

Dipa, it looks like you are having the same mpi error as Ryan.

/usr/lib64/mpi/gcc/openmpi/bin/mpirun: symbol lookup error: /usr/lib64/mpi/gcc/openmpi/bin/mpirun: undefined symbol: orte_dss

I'll open a ticket with IT services and see if they can help out.

Actions #17

Updated by Amber Herold over 11 years ago

JC's response:

It looks like the openmpi module was not loaded/used? It uses the MPI
version from the openSUSE distribution, which is pretty outdated. Maybe it
should read only mpirun instead of the entire path.
JC

I'll look into this more on Monday.
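
To check JC's point on a compute node, one can compare the mpirun that the current environment resolves from PATH with the distribution copy named in the error message. This is a rough sketch, assuming Python 3 on a POSIX system; `describe_mpirun` and its `search_path` argument are hypothetical names introduced here for illustration.

```python
import os
import shutil

# Path quoted in the error message (the distribution's own Open MPI).
DISTRO_MPIRUN = "/usr/lib64/mpi/gcc/openmpi/bin/mpirun"


def describe_mpirun(hardcoded=DISTRO_MPIRUN, search_path=None):
    """Compare the mpirun found on PATH with a hard-coded copy.

    Returns (resolved, same): resolved is the path found on the search path
    (None if there is no mpirun at all), and same says whether it is the
    same file as `hardcoded` once symlinks are followed.
    """
    resolved = shutil.which("mpirun", path=search_path)
    if resolved is None:
        return None, False
    same = os.path.realpath(resolved) == os.path.realpath(hardcoded)
    return resolved, same


# Example use inside a job, after any "module load" lines have run:
# print(describe_mpirun())
```

If `same` comes back True even after loading the openmpi module, the job is still running the outdated distribution MPI.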

Actions #18

Updated by Amber Herold over 11 years ago

Hey folks,
At this point I'm thinking there might be an issue with cl2d itself.
The plan is to install xmipp3 and see if that takes care of this.
Until we have a chance to do the xmipp upgrade, please use maximum likelihood alignment instead.

Actions #19

Updated by Melody Campbell over 9 years ago

Hi,

I am getting this problem again on Garibaldi.
Directory: /gpfs/group/em/appion/15mar13a/align/cl2d3-wontupload/

For this run, I requested the max mem on the standard Garibaldi nodes, and I have module load openmpi in my .cshrc. Any suggestions?

P.S. Also, I can't seem to assign or add watchers since Redmine moved.
