Bug #1087
closed
Difficult to tell when a Guppy job is queued, running or stalled/dead.
Added by Bridget Carragher almost 14 years ago.
Updated almost 14 years ago.
Affected Version:
Appion/Leginon 2.1.0
Description
I have submitted a job to guppy (trying to figure out whether it will run there, as it will not run on garibaldi). The job says queued, but there does not seem to be anything else running on guppy. Or perhaps there is and I can't see it. How can we check the queue so that we know how long the wait is? If it is queuing instead of starting, how do I get it going?
From Bridget:
And just for more info. This is what the page is currently telling me:
I.e. I can't figure out if it is running or not. The "Jobs currently running" section seems to imply it is running, but the other info says queued.
Can someone enlighten me, please?
- Subject changed from Guppy jobs to Difficult to tell when a Guppy job is queued, running or stalled/dead.
- Priority changed from Normal to High
- Target version set to Appion/Leginon 2.2.0
- Deliverable set to 2.2 Bug Reduction
So guppy does seem to run and then really does upload. I have a successful job now reported all the way back to the web pages. What is broken is that queuing vs. running is not reported properly (or at all), so it is impossible to monitor the progress of the job. Then, when the job is finished, it garbles the job report in the upload web pages. But if you just go ahead and try to upload anyway, that seems to work. I suspect it is just a little change somewhere that has broken this. Christopher and I chatted and he is going to upgrade Garibaldi to the latest version (just for me, not for everyone). Then we will know whether this breaks Garibaldi in the same way (in which case we can search for diffs to see why); if not, we will know it is a Garibaldi vs. guppy difference.
- Assignee set to Neil Voss
r15197 should fix this. Neil, would you mind reviewing this if you have a chance? Bridget could you please test this again. I tested with an Eman Reconstruction on Guppy.
The fixed code should be available for testing on cronus3/betamyamiweb tomorrow.
Neil, it looks like the running status was being overwritten with Queued after the job is submitted. I noticed that you have some comments that we should be updating the status in the python code rather than in the job file but I'm not sure how that would be implemented. We update the status at the end of the job file to indicate that it is complete and it seems like there should be a better way. Did you already have something in mind?
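To illustrate the kind of overwrite described above, here is a minimal sketch of one way to keep a late "Queued" write from clobbering "Running". This is not the r15197 change and not Appion's actual code; `JobStatus` and `next_status` are hypothetical names, and the sketch assumes statuses can be ordered by progress.

```python
from enum import IntEnum


class JobStatus(IntEnum):
    """Hypothetical job states, ordered by how far the job has progressed."""
    QUEUED = 1
    RUNNING = 2
    DONE = 3


def next_status(current, reported):
    """Return the status to store, never moving a job backwards.

    A late "Queued" write that arrives after the job has started is ignored.
    """
    return reported if reported > current else current


status = JobStatus.QUEUED
# Simulate the job file reporting RUNNING, then the submit step writing QUEUED late.
for update in (JobStatus.RUNNING, JobStatus.QUEUED, JobStatus.DONE):
    status = next_status(status, update)
    print(update.name, "->", status.name)
```

With a guard like this the order in which the submit step and the job file write their status would not matter, at the cost of never being able to move a job back to an earlier state on purpose.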
After we test this out some more it needs to be merged into the branches. This should also fix #706.
- Status changed from New to In Code Review
I looked at the code. I am not familiar enough with what is going wrong to really understand it, but it looks okay.
- Status changed from In Code Review to In Test
- Assignee changed from Neil Voss to Bridget Carragher
Sorry Neil, the history made it look like you had been involved in this code...Anchi also reviewed it so it's on to test...
- Status changed from In Test to Closed
Looks like this is all working OK now. Well done, Appion team!
But the old zombie jobs still need to be killed. Is there some sort of automated procedure we could run over all jobs now and again to clear out all zombie jobs? This is for sure not critical, but it might be useful for the overall management of the software.
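As a rough sketch of what such a periodic sweep could look like: compare the jobs the database still lists as active against the job IDs the cluster scheduler reports as alive, and flag the rest. Everything here is an assumption for illustration; `JobRecord`, `find_zombies`, and the example IDs are hypothetical and do not reflect Appion's schema or how guppy's queue is actually queried.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical row from a job-tracking table."""
    job_id: int
    cluster_job_id: str
    status: str  # "Queued", "Running", or "Done"


def find_zombies(records, live_cluster_ids):
    """Return records marked active whose cluster job no longer exists."""
    active = {"Queued", "Running"}
    return [
        r for r in records
        if r.status in active and r.cluster_job_id not in live_cluster_ids
    ]


records = [
    JobRecord(1, "101.guppy", "Running"),
    JobRecord(2, "102.guppy", "Queued"),
    JobRecord(3, "103.guppy", "Done"),
]
# Assumed to come from the scheduler's own queue listing.
live = {"102.guppy"}

for job in find_zombies(records, live):
    print("zombie:", job.job_id)
```

A sweep like this only identifies candidates; whether to mark them as failed or delete them outright would be a separate decision.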
- Status changed from Closed to Assigned
- Assignee changed from Bridget Carragher to Amber Herold
Hmm, sorry, but I think I spoke too soon. The Frealign job never seemed to complete, so I killed it and tried submitting an EMAN job. It is just sitting there in the queue, but nothing else seems to be running.
I believe the job was actually queued and did finish eventually.
Since we spent a lot of time making Frealign work, please make sure all the documentation and the user guide are up to date.
Thanks.
Eric
- Status changed from Assigned to Closed
Looks like Lauren and Dmitry have completed documentation updates.