Bug #1087

closed

Difficult to tell when a Guppy job is queued, running or stalled/dead.

Added by Bridget Carragher over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: High
Assignee: Amber Herold
Category: -
Target version:
Start date: 12/14/2010
Due date:
% Done: 0%
Estimated time:
Affected Version: Appion/Leginon 2.1.0
Show in known bugs: No
Workaround:

Description

I have submitted a job to Guppy (trying to figure out if it will run there, as it will not run on Garibaldi). The job says queued, but there does not seem to be anything else running on Guppy. Or perhaps there is and I can't see it. How can we check the queue so that we know how long the wait is? If it is queuing instead of starting, how do I get it going?


Files

pastedGraphic.tiff (3.18 MB), Amber Herold, 12/14/2010 09:09 PM

Related issues 6 (2 open, 4 closed)

Related to Appion - Bug #1069: Cluster Job Status not updating properly (Closed, Amber Herold, 12/08/2010)
Related to Appion - Bug #534: Jobs show as qeued instead of running (Closed, 05/21/2010)
Related to Appion - Bug #747: Appion shows job is queued when job doesn't run (New, 07/19/2010)
Related to Appion - Bug #706: Reconstruction jobs on Goby are being listed as "queued" when they are running (Closed, Amber Herold, 06/24/2010)
Related to Appion - Bug #612: Stack creation states complete though still running (Assigned, 06/03/2010)
Related to Appion - Bug #449: updateAppionDB does not always work (Closed, Amber Herold, 05/11/2010)

Actions #1

Updated by Amber Herold over 13 years ago

From Bridget:

And just for more info. This is what the page is currently telling me:

I.e. I can't figure out if it is running or not. The Jobs currently running seems to imply running, but the other info is saying queued.
Can someone enlighten me, please?

Actions #2

Updated by Amber Herold over 13 years ago

  • Subject changed from Guppy jobs to Difficult to tell when a Guppy job is queued, running or stalled/dead.
  • Priority changed from Normal to High
  • Target version set to Appion/Leginon 2.2.0

Actions #3

Updated by Amber Herold over 13 years ago

  • Deliverable set to 2.2 Bug Reduction

Actions #4

Updated by Bridget Carragher over 13 years ago

So Guppy does seem to run and then really does upload. I have a successful job now reported all the way back to the web pages. What is broken is that queuing vs. running is not reported properly (or at all), so it is impossible to monitor the progress of the job. Then, when the job is finished, it garbles the job report in the upload web pages. But if you just go ahead and try to upload anyway, that seems to work. I suspect it is just a little change somewhere that has broken this. Christopher and I chatted and he is going to upgrade Garibaldi to the latest version (just for me, not for everyone). Then we will know if this breaks Garibaldi in the same way (in which case we can search for diffs to see why); if not, we will know it is a Garibaldi vs. Guppy difference.

Actions #5

Updated by Amber Herold over 13 years ago

  • Assignee set to Neil Voss

r15197 should fix this. Neil, would you mind reviewing this if you have a chance? Bridget, could you please test this again? I tested with an Eman Reconstruction on Guppy.

The fixed code should be available for testing on cronus3/betamyamiweb tomorrow.

Neil, it looks like the running status was being overwritten with Queued after the job is submitted. I noticed that you have some comments that we should be updating the status in the python code rather than in the job file but I'm not sure how that would be implemented. We update the status at the end of the job file to indicate that it is complete and it seems like there should be a better way. Did you already have something in mind?
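The overwrite described above could be guarded against with a one-way status transition. What follows is only a minimal sketch in Python: the status names, their ordering, and the `next_status` helper are assumptions made for illustration and are not taken from the Appion code or from r15197.

```python
# Hypothetical status ordering; Appion's real status values may differ.
STATUS_RANK = {"Queued": 0, "Running": 1, "Done": 2}


def next_status(current, proposed):
    """Return the status to store, refusing to move a job backwards.

    current  -- status already recorded for the job, or None if none yet
    proposed -- status that the caller wants to write
    """
    if current is None:
        return proposed
    # A late "Queued" write must not clobber an existing "Running".
    if STATUS_RANK[proposed] < STATUS_RANK[current]:
        return current
    return proposed
```

With a guard like this, a "Queued" write that arrives after the job file has already reported "Running" would leave the stored status unchanged.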

After we test this out some more it needs to be merged into the branches. This should also fix #706.

Actions #6

Updated by Amber Herold over 13 years ago

  • Status changed from New to In Code Review

Actions #7

Updated by Neil Voss over 13 years ago

I looked at the code. I am not familiar enough with what is going wrong to really understand it, but it looks okay.

Actions #8

Updated by Amber Herold over 13 years ago

  • Status changed from In Code Review to In Test
  • Assignee changed from Neil Voss to Bridget Carragher

Sorry Neil, the history made it look like you had been involved in this code...Anchi also reviewed it so it's on to test...

Actions #9

Updated by Bridget Carragher over 13 years ago

  • Status changed from In Test to Closed

Looks like this is all working OK now. Well done, Appion team!
But the old zombie jobs still need to be killed. Is there some sort of automated procedure we could run over all jobs now and again to clear out all zombie jobs? This is for sure not critical, but it might be useful for the overall management of the software.
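
One possible shape for such a sweep, as a sketch only: compare the jobs the database still lists as active against the job IDs the cluster scheduler actually knows about, and flag the ones that have gone missing. The record fields (`id`, `status`, `updated`), the `find_zombies` name, and the one-hour grace period are all assumptions for illustration and are not part of Appion.

```python
from datetime import datetime, timedelta


def find_zombies(db_jobs, scheduler_ids, now, grace=timedelta(hours=1)):
    """Return IDs of jobs marked active in the database but unknown to the scheduler.

    db_jobs       -- iterable of dicts with hypothetical keys 'id', 'status', 'updated'
    scheduler_ids -- set of job IDs the scheduler currently reports
    now           -- current time as a datetime
    grace         -- how long a job may be missing before it counts as a zombie
    """
    zombies = []
    for job in db_jobs:
        active = job["status"] in ("Queued", "Running")
        missing = job["id"] not in scheduler_ids
        stale = now - job["updated"] > grace
        if active and missing and stale:
            zombies.append(job["id"])
    return zombies
```

The grace period is there so that a job submitted moments ago, which the scheduler may not list yet, is not flagged by mistake.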

Actions #10

Updated by Bridget Carragher over 13 years ago

  • Status changed from Closed to Assigned
  • Assignee changed from Bridget Carragher to Amber Herold

Hmm, sorry, but I think I spoke too soon. The Frealign job did not seem to ever complete. So I killed it and tried submitting an Eman job. It is just sitting there in the queue, but nothing else seems to be running.

Actions #11

Updated by Amber Herold over 13 years ago

I believe the job was actually queued and did finish eventually.

Actions #12

Updated by Eric Hou over 13 years ago

Since we spent lots of time making Frealign work, please make sure all the documentation and user guides are up to date.

Thanks.

Eric

Actions #13

Updated by Amber Herold over 13 years ago

  • Status changed from Assigned to Closed

Looks like Lauren and Dmitry have completed documentation updates.
