Bug #1084

closed

There are unknown errors (in refine*.txt) for this job, you should resubmit??

Added by Bridget Carragher over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: Low
Category: -
Target version:
Start date: 12/14/2010
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 2.1.0
Show in known bugs: No
Workaround:

Description

Got this after an EMAN run. Does anyone have any idea what it means? Can we do better than this in reporting the error?

#1

Updated by Lauren Fisher over 13 years ago

I have seen this error numerous times while working on the Rotavirus dataset. This error message is a bit annoying because (1) it doesn't tell you anything and (2) it doesn't kill the job even though something is clearly wrong. My problems are due to memory and the size of the stack/images, and every time I see this error message the run fails. I don't know if this helps, but the "unknown errors" I am seeing in the refine1.txt file are:

Warning! Sigma=0 for image 0

Error, running 'classalign2 cls0000.lst finalref saverefs logit=1 keep=0.800000 mask=478 imask=-1 8 locfromfile phase' on guppy-18

Error, running 'make3d classes.1.hed out=threed.1.mrc mask=478 hard=25.00 logit=1 sym=icos mode=2 pad=1200 lowmem' on guppy-18

Are you seeing the same errors, Bridget?
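Both complaints above (the message says nothing, and the job keeps going) could be addressed by scanning each refineN.txt for the `Error, running '<command>' on <host>` lines and stopping with a specific message. A minimal Python sketch, assuming the log text is available as a string and that failures always use exactly that line format; `parse_refine_errors`, `check_refine_log`, and `FailedCommand` are hypothetical names, not part of Appion or EMAN:

```python
import re
from dataclasses import dataclass

# Matches log lines such as:
# Error, running 'make3d classes.1.hed out=threed.1.mrc ...' on guppy-18
# (assumed format, taken from the refine1.txt excerpt above)
ERROR_RE = re.compile(
    r"^Error, running '(?P<cmd>(?P<prog>[^'\s]+)[^']*)' on (?P<host>\S+)"
)


@dataclass
class FailedCommand:
    program: str  # first word of the command, e.g. "classalign2"
    command: str  # full command line as logged
    host: str     # node the command ran on


def parse_refine_errors(text: str) -> list[FailedCommand]:
    """Collect every "Error, running ..." line from one refineN.txt log."""
    failures = []
    for line in text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            failures.append(
                FailedCommand(m.group("prog"), m.group("cmd"), m.group("host"))
            )
    return failures


def check_refine_log(text: str) -> None:
    """Raise with a readable summary instead of letting the job carry on."""
    failures = parse_refine_errors(text)
    if failures:
        details = "\n".join(
            f"{f.program} failed on {f.host}: {f.command}" for f in failures
        )
        raise RuntimeError(f"{len(failures)} EMAN command(s) failed:\n{details}")
```

Raising is only one policy; a job wrapper could instead catch the error, mark the run as failed, and report which program broke on which node, which is already more useful than "unknown errors".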

#2

Updated by Eric Hou over 13 years ago

  • Status changed from New to Assigned
  • Assignee set to Lauren Fisher
#3

Updated by Eric Hou over 13 years ago

  • Assignee changed from Lauren Fisher to Eric Hou
#4

Updated by Eric Hou over 13 years ago

I would love to know how to implement this, but I also need more input from everyone.

Thanks.

Eric

#5

Updated by Bridget Carragher over 13 years ago

I am getting these errors:

RUNPAR COMPLETE
EMAN 1.9 Cluster ($Date: 2009/02/18 05:12:22 $).
Run 'runpar help' for detailed help.
EMAN 1.9 Cluster ($Date: 2009/02/18 05:12:22 $).
Run 'proc2d help' for detailed help.
Warning! Sigma=0 for image 0
Warning! Sigma=0 for image 1
Warning! Sigma=0 for image 2
Warning! Sigma=0 for image 3
Warning! Sigma=0 for image 4
Warning! Sigma=0 for image 5
Warning! Sigma=0 for image 6
Warning! Sigma=0 for image 7
Warning! Sigma=0 for image 8
10 complete
Warning! Sigma=0 for image 9
Warning! Sigma=0 for image 10
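These warnings arrive one per image. In EMAN, sigma is the standard deviation of an image's pixel values, so Sigma=0 presumably means the image is blank or constant; an unbroken run of them starting at image 0 looks more like a bad or empty input stack than a handful of bad particles, though that is an interpretation rather than something the log states. A small sketch that collapses the warnings into one summary line, assuming the log is available as text (`zero_sigma_images` and `summarize_sigma_warnings` are hypothetical helpers):

```python
import re

# Matches EMAN warnings such as "Warning! Sigma=0 for image 7"
SIGMA_RE = re.compile(r"Warning! Sigma=0 for image (\d+)")


def zero_sigma_images(text: str) -> list[int]:
    """Indices of images that EMAN reported with Sigma=0, in log order."""
    return [int(n) for n in SIGMA_RE.findall(text)]


def summarize_sigma_warnings(text: str) -> str:
    """One summary line instead of one warning per image."""
    idx = zero_sigma_images(text)
    if not idx:
        return "no Sigma=0 warnings"
    return f"{len(idx)} images with Sigma=0 (first: {idx[0]}, last: {idx[-1]})"
```

Because the pattern is searched anywhere in the text, interleaved output such as the "10 complete" line above does not get in the way.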

#6

Updated by Bridget Carragher over 13 years ago

And the sum total of errors for this run were:

bcarr@garibaldi recon> grep Error refine*.txt
refine17.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0591
refine17.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0594
refine17.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0593
refine17.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0592
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0591
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0592
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0593
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0594
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0591
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0592
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0593
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0594
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0591
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0592
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0593
refine18.txt:Error, running 'classalign2 cls0287.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0594
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0591
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0592
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0593
refine19.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0592
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0593
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine19.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0591
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0592
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0593
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0594
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0591
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0592
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0593
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0594
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0593
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0592
refine20.txt:Error, running 'classalign2 cls0253.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0591
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0592
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0593
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0594
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0591
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0592
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0593
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0594
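The raw grep output is easier to read when grouped by iteration and program. A sketch that parses lines of the form `refineN.txt:Error, running '<program> ...' on <node>`; the function names are hypothetical and the pattern assumes exactly the format shown above:

```python
import re
from collections import Counter, defaultdict

# Matches grep output lines such as:
# refine17.txt:Error, running 'make3d classes.17.hed ...' on node0591
GREP_RE = re.compile(
    r"^refine(?P<iter>\d+)\.txt:Error, running '(?P<prog>[^'\s]+)[^']*' on (?P<node>\S+)"
)


def summarize(grep_output: str) -> dict[int, Counter]:
    """Count failed EMAN programs per refinement iteration."""
    per_iter: dict[int, Counter] = defaultdict(Counter)
    for line in grep_output.splitlines():
        m = GREP_RE.match(line.strip())
        if m:
            per_iter[int(m.group("iter"))][m.group("prog")] += 1
    return dict(per_iter)


def nodes_involved(grep_output: str) -> set[str]:
    """Distinct nodes that reported at least one failure."""
    nodes = set()
    for line in grep_output.splitlines():
        m = GREP_RE.match(line.strip())
        if m:
            nodes.add(m.group("node"))
    return nodes
```

On the 45 lines above, this groups the failures into iterations 17 to 20: classalign2 and make3d fail in every iteration, proc3d in every iteration except 18, and all four nodes (node0591 to node0594) report errors. Failures spread evenly across every node point toward a shared cause, such as the input files, rather than one bad node; that is an inference, not something the log proves.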

#7

Updated by Eric Hou over 13 years ago

  • Assignee changed from Eric Hou to Dmitry Lyumkis
  • Priority changed from Normal to Urgent
  • Target version set to Appion/Leginon 2.2.0
#8

Updated by Bridget Carragher over 13 years ago

Thanks to Dmitry for figuring out that at least part of the reason the job crashed and burned was that the dmf copy of the stack and model failed.
Thanks to Christopher for figuring out that this seems to be because RC has changed from rsh to ssh for dmf.
Thanks to Christopher for figuring out how to get around this for Bridget. (Lauren, you should ask him to set you up the same way, as I think some of your problems arise from this.)
Christopher is planning to come up with a generic way of doing this for all of us on all external clusters that will make all this happen like fairy dust and completely avoid the use of dmf.
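Since the root cause was a failed copy of the stack and model that the refinement then ran on anyway, one cheap safeguard is to verify the copied inputs before any EMAN step starts. A minimal sketch, assuming the original and the copied file are both readable from the same machine (on a remote cluster you would instead compute the checksum on each side and compare the strings); `verify_copy` and `sha256_of` are hypothetical helpers, not the fix Christopher describes:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large particle stacks need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(src: Path, dst: Path) -> None:
    """Raise if dst is missing or its contents differ from src."""
    if not dst.exists():
        raise FileNotFoundError(f"{dst} was not copied")
    # Cheap check first: a truncated copy usually shows up as a size difference.
    if src.stat().st_size != dst.stat().st_size:
        raise OSError(f"size mismatch: {src} vs {dst}")
    # Full check: identical size but corrupted contents.
    if sha256_of(src) != sha256_of(dst):
        raise OSError(f"checksum mismatch: {src} vs {dst}")
```

Calling `verify_copy` on each copied input (the stack files and the initial model) before launching would turn a silent copy failure into an immediate, specific error instead of a run full of Sigma=0 warnings and failed commands.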

Actions #9

Updated by Bridget Carragher over 13 years ago

  • Status changed from Assigned to In Test
  • Assignee changed from Dmitry Lyumkis to Bridget Carragher
  • Priority changed from Urgent to Low
#10

Updated by Bridget Carragher over 13 years ago

  • Status changed from In Test to Closed