Bug #1084
Closed
There are unknown errors (in refine*.txt) for this job, you should resubmit??
Description
Got this after a run of EMAN. Does anyone have any idea what it means? Can we do better than this in reporting the error?
Updated by Lauren Fisher about 14 years ago
I have seen this error numerous times while working on the Rotavirus dataset. This error message is a bit annoying because 1) it doesn't tell you anything and 2) it doesn't kill the job even though something is clearly wrong. My problems are due to memory and the size of the stack/images, and every time I see this error message the run fails. I don't know if this helps, but the "unknown errors" I am seeing in the refine1.txt file are:
Warning! Sigma=0 for image 0
Error, running 'classalign2 cls0000.lst finalref saverefs logit=1 keep=0.800000 mask=478 imask=-1 8 locfromfile phase' on guppy-18
Error, running 'make3d classes.1.hed out=threed.1.mrc mask=478 hard=25.00 logit=1 sym=icos mode=2 pad=1200 lowmem' on guppy-18
Are you seeing the same errors, Bridget?
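For reference, here is a rough sketch of how the report could say something more useful than "unknown errors". It assumes only the two line formats quoted above ("Error, running '...' on <node>" and "Warning! Sigma=0 for image <n>"); the function names `scan_refine_log` and `describe` are made up for illustration and are not part of EMAN or Appion.

```python
import re

# Patterns copied from the log lines quoted in this ticket.
ERROR_RE = re.compile(r"^Error, running '(?P<cmd>[^']*)' on (?P<node>\S+)")
SIGMA_RE = re.compile(r"^Warning! Sigma=0 for image (?P<index>\d+)")


def scan_refine_log(text):
    """Return (failed, sigma_zero) for the text of one refine log.

    failed is a list of (program, node) tuples in log order;
    sigma_zero is a list of image indices that had a Sigma=0 warning.
    """
    failed, sigma_zero = [], []
    for line in text.splitlines():
        line = line.strip()
        m = ERROR_RE.match(line)
        if m:
            words = m.group("cmd").split()
            failed.append((words[0] if words else "", m.group("node")))
            continue
        m = SIGMA_RE.match(line)
        if m:
            sigma_zero.append(int(m.group("index")))
    return failed, sigma_zero


def describe(name, text):
    """Build a one-line report naming the first command that failed."""
    failed, sigma_zero = scan_refine_log(text)
    if not failed:
        return "%s: no failed commands" % name
    program, node = failed[0]
    return "%s: %d failed command(s), first was %s on %s; %d Sigma=0 warning(s)" % (
        name, len(failed), program, node, len(sigma_zero))


# Sample text taken from the refine1.txt lines quoted above.
sample = """Warning! Sigma=0 for image 0
Error, running 'classalign2 cls0000.lst finalref saverefs logit=1 keep=0.800000 mask=478 imask=-1 8 locfromfile phase' on guppy-18
Error, running 'make3d classes.1.hed out=threed.1.mrc mask=478 hard=25.00 logit=1 sym=icos mode=2 pad=1200 lowmem' on guppy-18
"""
print(describe("refine1.txt", sample))
```

A job wrapper could stop the run whenever the failed list is non-empty instead of carrying on to the next iteration, which would address point 2) above. Whether that is the right policy for every kind of failure is a separate question.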
Updated by Eric Hou about 14 years ago
- Status changed from New to Assigned
- Assignee set to Lauren Fisher
Updated by Eric Hou about 14 years ago
- Assignee changed from Lauren Fisher to Eric Hou
Updated by Eric Hou about 14 years ago
I would love to know how to fix this issue, but I need more input from everyone.
Thanks.
Eric
Updated by Bridget Carragher about 14 years ago
I am getting these errors:
- RUNPAR COMPLETE
EMAN 1.9 Cluster ($Date: 2009/02/18 05:12:22 $).
Run 'runpar help' for detailed help.
EMAN 1.9 Cluster ($Date: 2009/02/18 05:12:22 $).
Run 'proc2d help' for detailed help.
Warning! Sigma=0 for image 0
Warning! Sigma=0 for image 1
Warning! Sigma=0 for image 2
Warning! Sigma=0 for image 3
Warning! Sigma=0 for image 4
Warning! Sigma=0 for image 5
Warning! Sigma=0 for image 6
Warning! Sigma=0 for image 7
Warning! Sigma=0 for image 8
10 complete
Warning! Sigma=0 for image 9
Warning! Sigma=0 for image 10
Updated by Bridget Carragher about 14 years ago
And the sum total of errors for this run were:
bcarr@garibaldi recon> grep Error refine*.txt
refine17.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0591
refine17.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0594
refine17.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0593
refine17.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=17 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0592
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0591
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0592
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0593
refine17.txt:Error, running 'make3d classes.17.hed out=threed.17.mrc mask=158 hard=25.00 logit=17 sym=d7 mode=2 pad=400' on node0594
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0591
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0592
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0593
refine17.txt:Error, running 'proc3d threed.17.mrc threed.17a.mrc norm mask=158' on node0594
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0591
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0592
refine18.txt:Error, running 'classalign2 cls0288.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0593
refine18.txt:Error, running 'classalign2 cls0287.lst finalref saverefs logit=18 keep=0.800000 mask=158 imask=-1 8 locfromfile phase refine quiet' on node0594
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0591
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0592
refine18.txt:Error, running 'make3d classes.18.hed out=threed.18.mrc mask=158 hard=25.00 logit=18 sym=d7 mode=2 pad=400' on node0593
refine19.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0592
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0593
refine19.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine19.txt:Error, running 'classalign2 cls0252.lst finalref saverefs logit=19 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0591
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0592
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0593
refine19.txt:Error, running 'make3d classes.19.hed out=threed.19.mrc mask=158 hard=25.00 logit=19 sym=d7 mode=2 pad=400' on node0594
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0591
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0592
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0593
refine19.txt:Error, running 'proc3d threed.19.mrc threed.19a.mrc norm mask=158' on node0594
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0593
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0594
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine20.txt:Error, running 'classalign2 cls0254.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0592
refine20.txt:Error, running 'classalign2 cls0253.lst finalref saverefs logit=20 keep=0.800000 mask=158 imask=-1 3 locfromfile phase refine quiet' on node0591
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0591
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0592
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0593
refine20.txt:Error, running 'make3d classes.20.hed out=threed.20.mrc mask=158 hard=25.00 logit=20 sym=d7 mode=2 pad=400' on node0594
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0591
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0592
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0593
refine20.txt:Error, running 'proc3d threed.20.mrc threed.20a.mrc norm mask=158' on node0594
Updated by Eric Hou about 14 years ago
- Assignee changed from Eric Hou to Dmitry Lyumkis
- Priority changed from Normal to Urgent
- Target version set to Appion/Leginon 2.2.0
Updated by Bridget Carragher about 14 years ago
Thanks to Dmitry for figuring out that at least part of the reason the job crashed and burned was that the dmf copy of the stack and model failed.
Thanks to Christopher for figuring out that this seems to be because RC has changed from rsh to ssh for dmf.
Thanks to Christopher for figuring out how to get around this for Bridget. (Lauren, you should ask him to set you up the same way, as I think some of your problems arise from this.)
Christopher is planning to come up with a generic way of doing this for all of us on all external clusters that will make all this happen like fairy dust and completely avoid the use of dmf.
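Since the root cause here was a failed copy of the stack and model, a check before the refinement starts would have caught it much earlier than the refine*.txt errors did. Below is a minimal sketch of such a check. The file names in `REQUIRED` are assumptions used only for the demonstration, and `missing_inputs` is a hypothetical helper, not something that exists in Appion.

```python
import os
import tempfile


def missing_inputs(directory, required):
    """Return the required file names that are absent or empty in directory."""
    problems = []
    for name in required:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(name)
    return problems


# Hypothetical file names, used only for this demonstration.
REQUIRED = ["start.hed", "start.img", "threed.0a.mrc"]

with tempfile.TemporaryDirectory() as workdir:
    # Simulate a copy that delivered the stack but not the model.
    for name in ("start.hed", "start.img"):
        with open(os.path.join(workdir, name), "wb") as handle:
            handle.write(b"\0" * 16)
    demo_missing = missing_inputs(workdir, REQUIRED)

print("missing or empty:", demo_missing)
```

If the returned list is non-empty, the job script could stop right there with a message naming the files, rather than letting every classalign2, make3d and proc3d call fail in turn.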
Updated by Bridget Carragher about 14 years ago
- Status changed from Assigned to In Test
- Assignee changed from Dmitry Lyumkis to Bridget Carragher
- Priority changed from Urgent to Low
Updated by Bridget Carragher about 14 years ago
- Status changed from In Test to Closed