Feature #3891

open

Distributed Resource Management Application API

Added by Sargis Dallakyan almost 9 years ago. Updated over 8 years ago.

Status:
Assigned
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
01/20/2016
Due date:
% Done:

0%

Estimated time:

Description

Add an option to the Appion config file so that it can submit jobs using Slurm or other schedulers.


Related issues 2 (2 open, 0 closed)

Related to Appion - Feature #3000: support for SGE cluster type | Merge | Sargis Dallakyan | 01/14/2015
Blocks Appion - Feature #2836: bake appion processing recipes | Assigned | Neil Voss | 07/14/2014

Actions #1

Updated by Sargis Dallakyan almost 9 years ago

  • Description updated (diff)
Actions #2

Updated by Sargis Dallakyan over 8 years ago

Adding a link to https://pegasus.isi.edu/, which Donny Shrum suggested as a workflow management system.

Actions #3

Updated by Donny Shrum over 8 years ago

I’ve added support for a parameter named destinationsURL in the format:
destinationsURL=http://127.0.0.1/job_api/handler.php

The workflow when this parameter exists in the .appion.cfg file is as follows:

In processingHost.py:
destinationsURL gets added to the confDict.
If destinationsURL is in the confDict, the job headers are returned by a webservice using the headersFromWebSevice method in this class. If the parameter does not exist, the job headers are generated in torqueHost/sgeHost etc., just as before.
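The branch described above can be sketched roughly as follows. This is a minimal illustration, not the actual processingHost.py code: the helper names parse_config_text and header_source are hypothetical, and the sketch assumes .appion.cfg entries are simple key=value lines as in the destinationsURL example.

```python
def parse_config_text(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments.

    Hypothetical helper; Appion's real config parsing may differ.
    """
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def header_source(conf_dict):
    """Report which path would produce the job headers for this config."""
    if conf_dict.get("destinationsURL"):
        return "webservice"   # headers come back from the external service
    return "scheduler"        # headers come from torqueHost/sgeHost/slurmHost


sample = """
# example entry taken from the comment above
destinationsURL=http://127.0.0.1/job_api/handler.php
"""
conf = parse_config_text(sample)
print(header_source(conf))
```

Removing the destinationsURL line from the sample makes header_source fall back to the scheduler path, which mirrors the "just as before" behavior.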

I’m passing/exposing the command and jobType from the Agent class to the processingHost class so that the webservice can in turn use this information to customize the job parameters.

I added imports for requests, json and pwd to processingHost.

This code will replace the generateHeaders method in slurmHost.py but I’ll leave that in place as it is consistent with the other scheduler classes. I anticipate we’ll also use this design to replace translateOutput and checkJobStatus in the scheduler class.

As a result of this change we can set job parameters such as nodes, cores, memory and queue based on individual users and groups at the RCC (Scott Stagg doesn’t always use the same resources grad students might be using). We’ve also been able to force some jobs to run locally instead of on our cluster.

I’ve also attached the changes in a zip file. If you’d prefer that I check with you prior to committing changes let me know and I’ll do that… I wasn’t sure what makes it easiest for you to take a look at the changes but I was guessing it’s easy for you to take a quick look after I check the code in.

Actions #4

Updated by Sargis Dallakyan over 8 years ago

Actions #5

Updated by Neil Voss over 8 years ago

Actions #6

Updated by Neil Voss over 8 years ago

  • Status changed from New to Assigned
  • Assignee set to Donny Shrum

Putting Donny as the assignee.

Actions #7

Updated by Anchi Cheng over 8 years ago

Donny, I am using this for the first time. The way you pass command and jobtype to createProcessingHost seems to restrict it to one command per call. Is that intended?

Actions #8

Updated by Donny Shrum over 8 years ago

Anchi Cheng wrote:

Donny, I am using this for the first time. The way you pass command and jobtype to createProcessingHost seems to restrict it to one command per call. Is that intended?

Hi Anchi,

That is a result of how generateHeaders is called. The scheduler-specific classes (slurmHost.py, sgeHost.py, torqueHost.py) each have a method named generateHeaders that returns the lines printed to the job file. Those lines are based on that method as well as on the job parameters contained in the configDict that is built per job.

An example:

#!/usr/bin/sh
#SBATCH -n 8
#SBATCH -t 240:00:00
#SBATCH -p stagg_q
#SBATCH --mem 4GB

Since those headers are generated per job, that is where I inserted the webservice, as it seemed to be the least disruptive and most logical place based on the existing code. It works out that we would like to customize those parameters and likely ignore the user input, directing jobs to specific queues with specific settings based on the user submitting the job and the job being run (grad students go to backfill, for example). We have a json file on our end that maps settings per user/group/job, and Scott will be able to make quick edits to that.

Changes to the web service I have on our end are easy so I'm open to any changes in how that external service is called.

--Donny
FSU RCC

Actions #9

Updated by Anchi Cheng over 8 years ago

Donny,

Should we pass the job parameter into generateHeader in a more specific way rather than the command? I was using generateHeader to make job files with more than one command which can be handled by jobObject by setting its attributes. Here you take one command, and it is not clear what I would need to include in the command for your webservice to parse correctly the job parameters.

A related question: What does jobObject do in your new code? It is totally ignored by headersFromWebService. (By the way, I assume WebSevice is a typo?)

Actions #10

Updated by Donny Shrum over 8 years ago

Hi Anchi,

Should we pass the job parameter into generateHeader in a more specific way rather than the command?

I might be missing something here as I'm not nearly as familiar with the code as I'd like to be :) I'm bypassing the generateHeaders method which appears in multiple scheduler specific classes in the createJobFile method in the processingHost class.
    header = None
    if (self.destinationsURL):
        header = self.headersFromWebSevice(currentJob)
    else:
        header = self.generateHeaders(currentJob)

I was using generateHeader to make job files with more than one command which can be handled by jobObject by setting its attributes. Here you take one command, and it is not clear what I would need to include in the command for your webservice to parse correctly the job parameters.

My thought is that the implementation of the webservice is site-specific. Our webservice looks at the specific job, the user that submitted it, and the various queues they can submit to, and we (Scott really) can edit a json file that sets submission parameters such as the number of nodes, memory, and queue (partition for slurm). The webservice receives json from appion that looks like this:
{"username": "sstagg", "command": "destinations", "jobType": "partalign", "script": ["maxlikeAlignment.py", "--description=test", "--stack=1", "--lowpass=10", "--highpass=2000", "--num-part=173", "--num-ref=2", "--bin=3", "--angle-interval=5", "--max-iter=15", "--fast", "--fast-mode=normal", "--mirror", "--savemem", "--commit", "--converge=normal", "--rundir=/lustre/cryo/lustre/appiondata/15nov02z/align/maxlike14", "--runname=maxlike14", "--projectid=461", "--expid=9681", "--jobtype=partalign", "--ppn=1", "--nodes=1", "--walltime=240", "--jobid=3241"]}
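For illustration, a request body of that shape could be assembled along these lines. This is only a sketch under stated assumptions: build_payload and post_payload are hypothetical names, the actual change imports requests (urllib from the standard library is swapped in here to keep the example self-contained), and sending the JSON as the raw POST body is a guess about how the service expects it.

```python
import json
import os
import pwd
import urllib.request


def build_payload(job_type, script_args, username=None):
    """Assemble the dict that is serialized and sent to the web service.

    Hypothetical helper; field names follow the example JSON above.
    """
    if username is None:
        # Look up the login name of the user running the process.
        username = pwd.getpwuid(os.getuid()).pw_name
    return {
        "username": username,
        "command": "destinations",
        "jobType": job_type,
        "script": list(script_args),
    }


def post_payload(url, payload, timeout=30):
    """POST the payload as JSON and decode the JSON reply (assumed transport)."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) a shortened version of the example request.
payload = build_payload(
    "partalign",
    ["maxlikeAlignment.py", "--runname=maxlike14", "--jobtype=partalign"],
    username="sstagg",
)
print(json.dumps(payload))
```

Calling post_payload with the destinationsURL value from .appion.cfg would then return the decoded reply, provided a service is listening at that address.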

And the custom webservice on our end responds by sending back: {"result":0,"customResponse":{"header":"#!\/usr\/bin\/sh","prefix":"#SBATCH","execCommand":"sbatch","statusCommand":"squeue","options":{"-N":"1","-n":"8","-t":"96:00:00","--mem":"6GB","-p":"condor"}}}

And the headersFromWebService method converts that json into a header that the rest of the code base is expecting. Something like this:
#!/usr/bin/sh
#SBATCH -N 1
#SBATCH -n 8
#SBATCH -t 96:00:00
#SBATCH -p condor
#SBATCH --mem 6GB
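The conversion from the reply to header lines can be sketched as below. This is not the real headersFromWebService method; header_lines_from_response is a hypothetical stand-in, and treating "result": 0 as success is an assumption drawn from the single example reply above.

```python
import json


def header_lines_from_response(response_text):
    """Turn the web service's JSON reply into job-file header lines.

    Hypothetical helper; the real method may handle more fields or errors.
    """
    reply = json.loads(response_text)
    if reply.get("result") != 0:
        # Assumption: any non-zero result signals a failure on the service side.
        raise ValueError("web service reported result %r" % reply.get("result"))
    custom = reply["customResponse"]
    lines = [custom["header"]]
    # One scheduler directive per option, e.g. "#SBATCH -n 8".
    for option, value in custom["options"].items():
        lines.append("%s %s %s" % (custom["prefix"], option, value))
    return lines


# The example reply quoted above, kept verbatim in a raw string.
example_reply = r'{"result":0,"customResponse":{"header":"#!\/usr\/bin\/sh","prefix":"#SBATCH","execCommand":"sbatch","statusCommand":"squeue","options":{"-N":"1","-n":"8","-t":"96:00:00","--mem":"6GB","-p":"condor"}}}'

lines = header_lines_from_response(example_reply)
print("\n".join(lines))
```

The directives come out in the order the service lists them in "options", so the order of lines in the job file can differ from the hand-written example without changing their meaning.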

That allows us to bury all our hairy site-specific code in that web service and send a reply to appion with just the header it expects. We can also decide whether to include or ignore the parameters that our users plug into the submit form regarding cores / memory, as not all our users would know what to use or even what is available to them.

A related question: What does jobObject do in your new code? It is totally ignored by headersFromWebService. (By the way, I assume WebSevice is a typo?)

The jobObject contains all the script parameters that are passed to the headersFromWebService method and ultimately the web service. I might need to look again but as I recall the jobObject is exposed to the Agent Class but not to the processingHost class so I passed the command and job type from the agent class over to the processingHost class so that it could be used to generate the headers.

That was all done in an effort to disturb the existing code as little as possible and allow us to make custom changes to job submissions in a code base that is all on our end.

Donny Shrum
FSU RCC
