Setup job submission server » History » Version 20

Neil Voss, 05/27/2010 11:33 AM

h1. Setup job submission server

In this case, we are setting up a job submission server that has all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed on the system. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario except for large reconstruction jobs.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open source PBS implementation called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here":http://www.clusterresources.com/products/torque/docs/.

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). For YUM-based systems, type:

<pre>
sudo yum -y install torque-server torque-scheduler
</pre>

h3. Initialize Torque-server

Because of the PATH setting, you will need to become root:

<pre>
sudo su
/usr/share/doc/torque-2.3.10/torque.setup root
exit
</pre>

h3. Activate Torque-server

Enable the Torque pbs_server daemon on reboot and restart it now:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
</pre>

h3. Add nodes to the Torque-server nodes file: /var/torque/server_priv/nodes

The format is:

<pre>
node-name[:ts] [np=] [properties]
</pre>

To add the localhost with two processors as a node, you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
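
For a cluster with many compute nodes, the entries can be generated instead of typed by hand. The following is a minimal sketch, assuming a hypothetical inventory of host names and core counts; the helper name @make_nodes_entry@ is illustrative and not part of Torque. It writes to a scratch file so the result can be reviewed before it is copied into place.

```shell
#!/bin/sh
# Print one nodes-file line in the "node-name np=N" form shown above.
# make_nodes_entry is a hypothetical helper, not a Torque command.
make_nodes_entry() {
    printf '%s np=%s\n' "$1" "$2"
}

# Hypothetical inventory: "hostname cores" pairs, one per line.
scratch=$(mktemp)
while read -r host cores; do
    make_nodes_entry "$host" "$cores"
done > "$scratch" <<'EOF'
node01.INSTITUTE.EDU 2
node02.INSTITUTE.EDU 4
node03.INSTITUTE.EDU 2
EOF

# Review the result, then copy it to /var/torque/server_priv/nodes as root.
cat "$scratch"
```

pbs_server normally reads the nodes file when it starts, so a restart (@sudo /sbin/service pbs_server restart@) is likely needed after the file changes.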

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). For YUM-based systems, type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from the head node

bq. See http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details.

Edit the /var/torque/mom_priv/config file:

<pre>
$pbsserver      headnode          # hostname running pbs_server
</pre>

For the localhost add:

<pre>
$pbsserver      localhost          # hostname running pbs_server
</pre>
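
When several compute nodes need the same setting, the config line can be written by a small script. This is only a sketch: @write_mom_config@ is a hypothetical helper, @headnode@ is a placeholder host name, and the example writes to a scratch file rather than the real config so it can be tried without root access.

```shell
#!/bin/sh
# write_mom_config SERVER FILE: write a one-line pbs_mom config naming the head node.
# Hypothetical helper; it overwrites FILE.
write_mom_config() {
    printf '$pbsserver      %s\n' "$1" > "$2"
}

# Try it against a scratch file first ("headnode" is a placeholder host name).
scratch_config=$(mktemp)
write_mom_config headnode "$scratch_config"
cat "$scratch_config"
```

On a real compute node the second argument would be /var/torque/mom_priv/config, and the script would need to run as root.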

h3. Activate Torque-mom

Enable the Torque pbs_mom daemon on reboot and start it now:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>
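
Once pbs_mom is running on the compute nodes, the head node should be able to see them. The @pbsnodes -a@ command prints a block of attributes per node, including a @state@ line. The sketch below counts the nodes reported as free; @count_free_nodes@ is a hypothetical helper, and it assumes the indented @state = free@ layout, which may differ between Torque versions.

```shell
#!/bin/sh
# count_free_nodes: read `pbsnodes -a` output on stdin and print the number
# of nodes whose state is exactly "free". Hypothetical helper.
count_free_nodes() {
    # grep -c exits non-zero when the count is 0, so mask that exit status.
    grep -c '^[[:space:]]*state = free[[:space:]]*$' || true
}

# Only query the server if the Torque client tools are installed here.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | count_free_nodes
else
    echo "pbsnodes not found; run this on a host with the Torque client installed"
fi
```

A count lower than the number of entries in the nodes file suggests that some nodes are down, offline, or busy; the full @pbsnodes -a@ output shows which.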

_________

h2. Test Torque Setup

On the head node, see if you can run @qstat@:

<pre>
qstat
</pre>

On the head node, create a job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>
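
A slightly fuller test job can carry its options inside the script as @#PBS@ directives. The following is a sketch under stated assumptions: the job name, the resource request, and the walltime are illustrative values, and the file name @hello.job@ is arbitrary. The script is only submitted if @qsub@ is found on the current host.

```shell
#!/bin/sh
# Write a small job script; lines starting with "#PBS" are read by qsub as options.
#   -N    job name
#   -l    resource request (one processor on one node, five minutes)
#   -j oe merge standard error into standard output
cat > hello.job <<'EOF'
#!/bin/sh
#PBS -N hello_test
#PBS -l nodes=1:ppn=1,walltime=00:05:00
#PBS -j oe
echo "running on $(hostname)"
sleep 60
echo hello
EOF

# Submit only when qsub is available on this host.
if command -v qsub >/dev/null 2>&1; then
    qsub hello.job
    qstat
else
    echo "qsub not found; hello.job written but not submitted"
fi
```

While the job sleeps, @qstat@ should list it; after it finishes, the output is normally delivered to a file named after the job (here something like @hello_test.o@ followed by the job number) in the directory where @qsub@ was run.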

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Configure web server to submit jobs|Configure web server to submit jobs >]]

______