Feature #1922
Use multiple qsub to launch multiple frealign to refine chunks of particles (Closed)
Description
This allows the frealign refinement to be divided into more chunks than the total number of available processors, since the refine step uses a single processor per frealign instance. The job then gives the processors back to the queue while frealign runs the time-consuming but OpenMP-only reconstruction step.
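For illustration, a minimal sketch of the idea (not actual Appion code): split the particle range into more chunks than there are processors and submit one single-processor frealign refine task per chunk with qsub, recording the job ids for later monitoring. The chunk count, per-chunk script names, and qsub options below are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical sketch of chunked refinement submission; not Appion code.
import subprocess

NCHUNK = 64                           # may exceed the number of available processors
FIRST_PARTICLE, LAST_PARTICLE = 1, 64000

chunk_size = (LAST_PARTICLE - FIRST_PARTICLE + 1) // NCHUNK
jobids = []
for i in range(NCHUNK):
    first = FIRST_PARTICLE + i * chunk_size
    last = min(first + chunk_size - 1, LAST_PARTICLE)
    # each task script (assumed to exist) runs one single-processor
    # frealign refine instance over the particle range FIRST..LAST
    script = 'frealign.refine.%03d.csh' % i
    out = subprocess.check_output(
        ['qsub', '-l', 'nodes=1:ppn=1',
         '-v', 'FIRST=%d,LAST=%d' % (first, last), script],
        text=True)
    jobids.append(out.strip())

# record the qsub job ids so the main job can wait on them
with open('task_jobids.save', 'w') as f:
    f.write('\n'.join(jobids) + '\n')
```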
Updated by Anchi Cheng over 12 years ago
- Status changed from New to Assigned
- Assignee changed from Anchi Cheng to Amber Herold
- Priority changed from Normal to High
r16954 and r16955 do as much as I could implement for now.
The behavior is the following:
apRefineJobFrealign calls the new setupMPIRun or setupMPRun functions defined in its base class apRefineJob.py. These create the taskSender.py script and set launch_as_shell to True for both the refinement (MPI) and reconstruction (MP) parts of the Frealign execution. Since we can not start qsub within a qsub, the main job needs to be run from the head node and must monitor the parallel tasks submitted to the queue before executing the next command. This is achieved by saving the task qsub job ids to the file 'task_jobids.save' and clearing them after taskMonitor.py finds all tasks in the file done or aborted.
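A minimal sketch of that monitoring step, assuming a PBS-style qstat and the 'task_jobids.save' convention described above; the real taskMonitor.py in Appion may differ, and the qstat output parsing here is illustrative only.

```python
#!/usr/bin/env python
# Hypothetical monitor: wait until every qsub job id recorded in
# 'task_jobids.save' is done or aborted, then clear the file so the
# main job can run the next Frealign command.
import os
import subprocess
import time

JOBID_FILE = 'task_jobids.save'


def queued_jobids():
    """Return the set of job ids the scheduler currently reports."""
    output = subprocess.check_output(['qstat'], text=True)
    active = set()
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].split('.')[0].isdigit():
            active.add(fields[0].split('.')[0])
    return active


def wait_for_tasks(poll_interval=60):
    """Block until all saved task job ids have left the queue."""
    if not os.path.exists(JOBID_FILE):
        return
    with open(JOBID_FILE) as f:
        jobids = [line.strip().split('.')[0] for line in f if line.strip()]
    while set(jobids) & queued_jobids():
        time.sleep(poll_interval)
    # all tasks done or aborted; clear the record so the main job proceeds
    open(JOBID_FILE, 'w').close()


if __name__ == '__main__':
    wait_for_tasks()
```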
Amber, I have to hand this off to you if you get time to work on it before the end of September.
What is left is to find a way to keep the AppionJobData status at 'R' while the shell is running. The return code can not be used because xxx.appionsub.commands needs to run in the background so that the ssh connection can be closed. As it stands, the Appion web interface thinks the job is done because I have to give it a fake jobid.
Updated by Anchi Cheng over 11 years ago
- Target version set to Appion/Leginon 3.0.0
Let me know if you can not work on this; we will need a workaround for the 3.0 release.
Updated by Amber Herold over 10 years ago
- Status changed from Assigned to Won't Fix or Won't Do
This is no longer relevant as Brad has rewritten the frealign integration.