
Revision 34 (Neil Voss, 03/02/2012 01:35 PM) → Revision 35/43 (Neil Voss, 03/02/2012 01:36 PM)

h1. Setup job submission server 

 In this case, we are setting up a job submission server that will have all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed on the compute nodes. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for this scenario (a server without access to the data) except for large reconstruction jobs.  

 ______ 

 h2. PBS and the Torque Resource Manager 

 PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system, meaning that users submit many jobs and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open-source PBS system called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.  

 A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://www.clusterresources.com/products/torque/docs/ 

 ______ 

 h2. Head node installation 

 h3. Install Torque-server 

 Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). For YUM-based systems, type: 

 <pre> 
 sudo yum -y install torque-server torque-scheduler torque-client 
 </pre> 

 h3. Initialize Torque-server (you will need to become root because of the PATH setting) 

 Make sure the directory containing the _pbs_server_ executable is in your PATH. For CentOS this is usually /usr/sbin. 

 <pre> 
 sudo pbs_server -t create 
 </pre> 
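
As a quick check before initializing, the following sketch looks for the daemon on the current PATH and also in /usr/sbin and /sbin. The helper name @locate_daemon@ is hypothetical and not part of Torque; adjust the directories if your distribution installs the daemon elsewhere.

```shell
# Hypothetical helper (not part of Torque): print the full path of an
# executable, also searching /usr/sbin and /sbin, which are often missing
# from a non-root user's PATH.
locate_daemon() {
    # The subshell keeps the PATH change from leaking into the caller.
    ( PATH="$PATH:/usr/sbin:/sbin"; command -v "$1" )
}

# Example: complain if pbs_server cannot be found anywhere we looked.
locate_daemon pbs_server || echo "pbs_server not found; check the package installation" >&2
```

If the path is printed but plain @pbs_server@ still fails in your shell, running the command through sudo or as root, as shown above, is usually enough.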

 h3. Activate Torque-server 

 Enable the Torque pbs_server and pbs_sched daemons on reboot, and start them now: 

 <pre> 
 sudo /sbin/chkconfig pbs_server on 
 sudo /sbin/service pbs_server restart 
 sudo /sbin/chkconfig pbs_sched on 
 sudo /sbin/service pbs_sched start 
 </pre> 

 h3. Add nodes to Torque-server nodes file: /var/torque/server_priv/nodes 

 The format is: 
 <pre> 
 node-name[:ts] [np=] [properties] 
 </pre> 

 To add the localhost with two processors as a node, you would add: 

 <pre> 
 localhost np=2 
 </pre> 

 You should add every *compute node* to this file, e.g., 

 <pre> 
 node01.INSTITUTE.EDU np=2 
 node02.INSTITUTE.EDU np=4 
 node03.INSTITUTE.EDU np=2 
 </pre> 
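
For clusters with many nodes, typing each entry by hand is error-prone. A minimal sketch, assuming you keep a plain list of "hostname cores" pairs, could generate the entries instead. The function name @make_nodes_entries@ is hypothetical.

```shell
# Hypothetical helper: read "hostname cores" pairs from stdin and print
# one nodes-file entry per pair in the "node-name np=N" format.
make_nodes_entries() {
    while read -r host cores; do
        # Skip blank lines in the input list.
        [ -z "$host" ] && continue
        printf '%s np=%s\n' "$host" "$cores"
    done
}

# Example input: two compute nodes with 2 and 4 processors.
printf 'node01.INSTITUTE.EDU 2\nnode02.INSTITUTE.EDU 4\n' | make_nodes_entries
# prints:
#   node01.INSTITUTE.EDU np=2
#   node02.INSTITUTE.EDU np=4
```

The output could then be appended to the nodes file, for example by piping it to @sudo tee -a /var/torque/server_priv/nodes@. pbs_server typically needs a restart before it picks up changes to this file.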

 ______ 

 h2. Compute node installation 

 h3. Install Torque-mom 

 Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). For YUM-based systems, type: 

 <pre> 
 sudo yum -y install torque-mom torque-client 
 </pre> 

 h3. Configure node to receive jobs from headnode: 

 bq. see http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details 

 Edit the /var/torque/mom_priv/config (CentOS 5) OR /var/lib/torque/mom_priv/config (CentOS 6) file: 

 <pre> 
 $pbsserver    headnode.INSTITUTE.EDU     # hostname running pbs_server 
 </pre> 

 If the head node runs on the same machine (localhost), add: 

 <pre> 
 $pbsserver    localhost     # hostname running pbs_server 
 </pre> 
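
When configuring several compute nodes, the same line can be written from a script. The sketch below is one way to do it; the function name @write_mom_config@ is hypothetical, and the config path depends on your CentOS version as noted above.

```shell
# Hypothetical helper: write the $pbsserver line to a pbs_mom config file.
# Usage: write_mom_config <config-file> <head-node-hostname>
write_mom_config() {
    config_file="$1"
    head_node="$2"
    # Single quotes keep the shell from expanding $pbsserver as a variable.
    printf '$pbsserver    %s\n' "$head_node" > "$config_file"
}

# Example (run as root on a CentOS 5 compute node; left commented out here):
#   write_mom_config /var/torque/mom_priv/config headnode.INSTITUTE.EDU
```

Note that this overwrites the file, so any other settings already in the config would need to be added back.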

 h3. Activate Torque-mom 

 Enable the torque pbs_mom daemon on reboot: 

 <pre> 
 sudo /sbin/chkconfig pbs_mom on 
 sudo /sbin/service pbs_mom start 
 </pre> 

 h2. Munge 

 http://www.clusterresources.com/torquedocs/1.3advconfig.shtml 

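 Munge is an authentication service that creates and validates user credentials. Together with the server's authorized_users list, it lets you restrict which users on which hosts may access the server, among other features. Note that qmgr's @=@ replaces the existing list while @+=@ appends to it, so the three qmgr commands below are alternative examples rather than a sequence. 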

 <pre> 
 sudo create-munge-key 
 sudo /sbin/chkconfig munge on 
 sudo service munge start 
 sudo qmgr -c 'set server authorized_users=user01@host01' 
 sudo qmgr -c 'set server authorized_users=user01@host02' 
 sudo qmgr -c 'set server authorized_users=user01@*' 
 </pre> 
 _________ 

 h2. Test Torque Setup 

 On the head node, see if you can run a @qstat@:<pre>qstat</pre> 

 To check the state of the compute nodes, type: 
 <pre> 
 pbsnodes 
 </pre> 
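
The @pbsnodes@ output can be long on larger clusters. The following sketch filters it down to the nodes that are not usable, assuming the usual layout where each node name is on an unindented line followed by indented @state = ...@ lines. The function name @list_unavailable_nodes@ is hypothetical.

```shell
# Hypothetical helper: read pbsnodes-style output from stdin and print the
# names of nodes whose state includes down, offline, or unknown.
list_unavailable_nodes() {
    awk '
        # An unindented line starts a new node record; remember its name.
        /^[^ \t]/         { node = $1 }
        # The indented state line looks like: state = down,offline
        /^[ \t]+state = / { if ($3 ~ /down|offline|unknown/) print node }
    '
}

# Example usage on the head node (left commented out: it needs Torque):
#   pbsnodes | list_unavailable_nodes
```

An empty result suggests that every node reported a usable state such as @free@ or @job-exclusive@.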

 On the head node, create a job and submit it: 
 <pre> 
 echo "sleep 60" > test.job 
 echo "echo hello" >> test.job 
 qsub test.job 
 qstat 
 </pre> 
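
A slightly fuller test job can carry its resource requests in @#PBS@ directives. The sketch below writes such a script and shows how to pull the job number out of the identifier that @qsub@ prints. The function names and the job name @hello_test@ are hypothetical, and the resource values are only examples.

```shell
# Hypothetical helper: write a small Torque job script to the given path.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N hello_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
sleep 60
echo hello
EOF
}

# qsub prints a job identifier of the form <number>.<server>;
# this keeps only the number in front of the first dot.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Example usage on the head node (left commented out: it needs a running server):
#   write_test_job test.job
#   job_id=$(qsub test.job)
#   qstat "$job_id"
```

With @-j oe@ the job's standard output and error are merged, so once the job finishes the word hello should appear in a file named after the job and its number (here, hello_test.o followed by the job number) in the directory you submitted from.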






 _________ 

 [[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]] 

 ______