Setup job submission server » History » Version 36

Neil Voss, 03/05/2012 01:20 PM

h1. Setup job submission server

In this case, we are setting up a job submission server whose compute nodes have all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario, except for large reconstruction jobs.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open-source PBS implementation called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://www.clusterresources.com/products/torque/docs/

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). On YUM-based systems, type:

<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

Make sure the directory containing the _pbs_server_ executable is in your PATH; on CentOS this is usually /usr/sbin. Because of this PATH setting, you may need to become root before running the command below.

<pre>
sudo pbs_server -t create
</pre>
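The PATH check can be scripted before initializing. The following is a minimal sketch: the helper name @ensure_in_path@ is hypothetical, and /usr/sbin is the CentOS location mentioned above, so adjust it for other systems.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH unless it is already listed.
ensure_in_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;        # append at the end
    esac
    export PATH
}

ensure_in_path /usr/sbin

# Report whether pbs_server can now be found.
if command -v pbs_server >/dev/null 2>&1; then
    echo "pbs_server found at: $(command -v pbs_server)"
else
    echo "pbs_server not found; is torque-server installed?"
fi
```

Calling the helper a second time with the same directory leaves PATH unchanged, so it is safe to place in a login script.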

h3. Activate Torque-server

Enable the Torque pbs_server and pbs_sched daemons on reboot and start them now:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/torque/server_priv/nodes

The format is:
<pre>
node-name[:ts] [np=] [properties]
</pre>

To add the localhost with two processors as a node, you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
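For clusters with many nodes, the entries can be generated rather than typed by hand. This is a minimal sketch, assuming a hypothetical input list of @hostname processors@ pairs. It writes to a scratch file (@nodes.new@, also a made-up name) so that the result can be reviewed before it replaces /var/torque/server_priv/nodes.

```shell
#!/bin/sh
# Build a Torque nodes file from "hostname processor-count" pairs.
# The host names are the placeholders used above; nodes.new is a scratch file.
make_nodes_file() {
    while read -r host np; do
        # Skip blank lines and comment lines in the input list.
        case "$host" in ''|'#'*) continue ;; esac
        printf '%s np=%s\n' "$host" "$np"
    done
}

make_nodes_file > nodes.new <<'EOF'
# hostname              processors
node01.INSTITUTE.EDU    2
node02.INSTITUTE.EDU    4
node03.INSTITUTE.EDU    2
EOF

cat nodes.new
```

After review, the file could be installed with something like @sudo cp nodes.new /var/torque/server_priv/nodes@. pbs_server typically needs a restart before it picks up the change.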

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora and CentOS 5.4 (through the EPEL repository). On YUM-based systems, type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from the head node

bq. See http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details.

Edit the mom configuration file, /var/torque/mom_priv/config (CentOS 5) or /var/lib/torque/mom_priv/config (CentOS 6), so that it names the host running pbs_server:

<pre>
$pbsserver  headnode.INSTITUTE.EDU   # hostname running pbs_server
</pre>

For the localhost, add:

<pre>
$pbsserver  localhost   # hostname running pbs_server
</pre>
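Because the configuration directory differs between CentOS 5 and CentOS 6, a script can pick whichever one exists. The sketch below is only an illustration: both function names are hypothetical, and it accepts an optional root prefix so that it can be tried in a scratch directory before touching the real system.

```shell
#!/bin/sh
# Print the mom_priv directory found under an optional root prefix.
# Checks the CentOS 6 location first, then the CentOS 5 one.
find_mom_priv() {
    root="${1:-}"
    for dir in /var/lib/torque/mom_priv /var/torque/mom_priv; do
        if [ -d "$root$dir" ]; then
            echo "$root$dir"
            return 0
        fi
    done
    return 1
}

# Write a config file that points pbs_mom at the given head node.
write_mom_config() {
    headnode="$1"
    root="${2:-}"
    mom_priv=$(find_mom_priv "$root") || {
        echo "no mom_priv directory found; is torque-mom installed?" >&2
        return 1
    }
    printf '$pbsserver  %s   # hostname running pbs_server\n' "$headnode" \
        > "$mom_priv/config"
    echo "wrote $mom_priv/config"
}

# Dry run in a scratch tree that mimics the CentOS 6 layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/var/lib/torque/mom_priv"
write_mom_config headnode.INSTITUTE.EDU "$scratch"
cat "$scratch/var/lib/torque/mom_priv/config"
```

On a real compute node the call would be @write_mom_config headnode.INSTITUTE.EDU@, run as root and without the second argument.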

h3. Activate Torque-mom

Enable the Torque pbs_mom daemon on reboot and start it now:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

http://www.clusterresources.com/torquedocs/1.3advconfig.shtml

Munge is an authentication tool. Among other features, it can be used to restrict which users, on which hosts, are allowed to talk to the server. Note that each qmgr @set@ with @=@ replaces the previous value of authorized_users.

<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo service munge start
sudo qmgr -c 'set server authorized_users=user01@host01'
sudo qmgr -c 'set server authorized_users=user01@host02'
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>

_________

h2. Test Torque Setup

On the head node, check that you can run @qstat@:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>
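On larger clusters the full pbsnodes listing gets long. A hedged sketch of a summary filter follows. It assumes the usual layout in which each node record contains an indented @state = ...@ line. The function name is hypothetical, and the sample records are made up for illustration rather than captured from a real cluster.

```shell
#!/bin/sh
# Count nodes per state from pbsnodes-style output read on stdin.
summarize_states() {
    awk '$1 == "state" && $2 == "=" { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Illustrative sample in the assumed pbsnodes layout (not real output).
summarize_states <<'EOF'
node01.INSTITUTE.EDU
     state = free
     np = 2

node02.INSTITUTE.EDU
     state = down
     np = 4

node03.INSTITUTE.EDU
     state = free
     np = 2
EOF
```

On a working head node the same filter would be used as @pbsnodes | summarize_states@.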

On the head node, create a job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>
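The test job above relies on the queue defaults. A slightly fuller sketch that sets common @#PBS@ directives explicitly is shown below. The file name @test2.job@, the job name, and the resource values are arbitrary placeholders, and the script only calls qsub when that command is actually installed.

```shell
#!/bin/sh
# Write a small job script with explicit PBS directives.
cat > test2.job <<'EOF'
#!/bin/sh
#PBS -N torque_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
# Start in the directory the job was submitted from.
cd "${PBS_O_WORKDIR:-.}"
echo "running on $(hostname)"
sleep 60
echo hello
EOF

# Submit only where the Torque client tools are available.
if command -v qsub >/dev/null 2>&1; then
    qsub test2.job
else
    echo "qsub not found; wrote test2.job without submitting"
fi
```

With @-j oe@ the standard output and standard error streams are merged, so after the job finishes the result should appear in a single file named after the job (for example @torque_test.o@ followed by the job number) in the submission directory.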

To list all of the server settings:

<pre>
sudo qmgr -c 'list server'
</pre>

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______