Bug #3646 (Closed)

maxlikelihood run cannot take a large number of processors

Added by Venkata Dandey about 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: High
Category: -
Target version: -
Start date: 10/08/2015
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 3.2
Show in known bugs:
Workaround:

Description

When submitting the maximum-likelihood job with 256 processors, the job is not submitted and an error is thrown (screenshot attached) saying there are no write permissions, even though I used the correct ones.


Files

Actions #1

Updated by Venkata Dandey about 9 years ago

  • Priority changed from Normal to High
Actions #2

Updated by Anchi Cheng about 9 years ago

  • Project changed from 138 to Appion
  • Assignee set to Sargis Dallakyan
  • Affected Version set to Appion/Leginon 3.2

Since a smaller number of processors works, this may be an error caught by myamiweb while testing the cluster rather than a real permission problem.

Actions #3

Updated by Yong Zi Tan about 9 years ago

The problem is that the --ppn set by the system is automatically 4, even though it is 24 for our cluster. Therefore, if you ask for, say, 240 processors, Appion divides that by 4 and ends up requesting 60 nodes, which is more than we have, hence the crash. One way to fix this could be to add an option in the ML2D Appion GUI that lets you pick the number of nodes and processors per node (see the sketch below).
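To make the arithmetic concrete, here is a minimal Python sketch (not the actual Appion code; the function name, the cluster size, and the print calls are illustrative assumptions) of how dividing the requested processors by a too-small ppn inflates the node request:

import math

def nodes_requested(nproc, ppn):
    # Size the request as enough whole nodes to hold nproc processors
    return math.ceil(nproc / ppn)

# With the default ppn of 4, a 240-processor request becomes 60 nodes;
# with the cluster's real ppn of 24 it is only 10 nodes.
print(nodes_requested(240, ppn=4))   # 60 -> more nodes than the cluster has
print(nodes_requested(240, ppn=24))  # 10 -> fits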

Actions #4

Updated by Yong Zi Tan about 9 years ago

  • Assignee changed from Sargis Dallakyan to Anchi Cheng

Dear Anchi, maybe you can take care of this? This does not seem to be an issue with the cluster, as I have successfully run 360-processor ML2D jobs just by submitting on the command line. Thank you!

Run that worked: /gpfs/appion/yztan/15oct15f/align/maxlike6/

Default run command:
runJob.py /opt/myamisnap/bin/appion maxlikeAlignment.py --description="ml2d of original stack" --stack=4 --lowpass=15 --highpass=600 --num-part=76029 --num-ref=300 --bin=4 --angle-interval=5 --max-iter=15 --nproc=360 --fast --fast-mode=normal --mirror --savemem --commit --converge=normal --rundir=/gpfs/appion/yztan/15oct15f/align/maxlike7 --runname=maxlike7 --projectid=124 --expid=853 --jobtype=partalign --ppn=4 --nodes=90 --walltime=240 --queue=longq --jobid=52

Altered run command:
runJob.py /opt/myamisnap/bin/appion maxlikeAlignment.py --description="ml2d of original stack" --stack=4 --lowpass=15 --highpass=600 --num-part=76029 --num-ref=300 --bin=4 --angle-interval=5 --max-iter=15 --nproc=360 --fast --fast-mode=normal --mirror --savemem --commit --converge=normal --rundir=/gpfs/appion/yztan/15oct15f/align/maxlike7 --runname=maxlike7 --projectid=124 --expid=853 --jobtype=partalign --ppn=15 --nodes=24 --walltime=999 --queue=longq --jobid=52

Actions #5

Updated by Anchi Cheng about 9 years ago

  • Status changed from New to In Test
  • Assignee changed from Anchi Cheng to Venkata Dandey

r19274 forces ppn to be assigned the maximum ppn of the cluster. This would not be a good thing if a variable number of processors per node is present on that cluster. A validation of the total number of nodes requested should also be added. This is a temporary fix; the new Appion cluster form really should be used for this. A rough sketch of the idea follows below.
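As an illustration only (this is not the r19274 change itself; the function and parameter names, and the validation step, are assumptions based on this comment), the temporary fix amounts to something like:

import math

def submit_params(nproc, cluster_max_ppn, cluster_max_nodes):
    # Temporary fix: always use the cluster's maximum processors per node
    ppn = cluster_max_ppn
    nodes = math.ceil(nproc / ppn)
    # Suggested validation: reject requests that exceed the cluster size
    if nodes > cluster_max_nodes:
        raise ValueError("requested %d nodes, but the cluster only has %d"
                         % (nodes, cluster_max_nodes))
    return nodes, ppn

# e.g. submit_params(360, cluster_max_ppn=24, cluster_max_nodes=24) -> (15, 24)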

Venkata, please see if this comes out right now.

Actions #6

Updated by Anchi Cheng about 9 years ago

Yong Zi and Venkata,

If you know of other places where defaulting to the maximum ppn would be better, add them here. My fix is general and can be applied to them.

Actions #7

Updated by Anchi Cheng about 9 years ago

  • Status changed from In Test to Closed

No feedback.
