Setup job submission server » History » Version 40

Amber Herold, 01/08/2014 10:16 AM

h1. Setup job submission server

This page describes how to set up a job submission server whose compute nodes have all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario except for large reconstruction jobs.

h2. .appion.cfg config file

The .appion.cfg config file is used to automatically create and submit job files to your job submission server. The sample config file provided in the [[Processing Server Installation]] instructions was written for the Torque Resource Manager. If you use a different resource manager, modify the .appion.cfg file accordingly.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open source PBS system called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://www.clusterresources.com/products/torque/docs/

______

h2. Alternate instructions

It may be helpful to review the [[Install Torque|installation notes]] from a recent installation on CentOS 6.

h2. Head node installation

h3. Install Torque-server

Torque is available with Fedora and CentOS 5.4 (through the EPEL). For YUM-based systems, type:
<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

Because of the PATH setting, you will need to become root for this step. Make sure the directory containing the _pbs_server_ executable is in your PATH. For CentOS this is usually /usr/sbin.
<pre>
sudo pbs_server -t create
</pre>

h3. Activate Torque-server

Enable the Torque pbs_server and pbs_sched daemons on reboot and start them:
<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/lib/torque/server_priv/nodes

The format is:
<pre>
node-name[:ts] [np=] [properties]
</pre>

To add the localhost with two processors as a node, you would add:
<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,
<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
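For a cluster with many compute nodes, the entries above can be generated instead of typed by hand. The following is a minimal sketch, assuming a list of hostname and core-count pairs; the helper name @nodes_entries@ and the host names are illustrative only, and the lines are printed to standard output rather than written directly to the nodes file.

```shell
#!/bin/sh
# Sketch: print one "hostname np=N" line per compute node.
# nodes_entries is a hypothetical helper, not part of Torque.
nodes_entries() {
    # Each argument is a "hostname:cores" pair.
    for pair in "$@"; do
        host=${pair%%:*}     # text before the first colon
        cores=${pair##*:}    # text after the last colon
        printf '%s np=%s\n' "$host" "$cores"
    done
}

# Example with placeholder host names; review the output before
# appending it to /var/lib/torque/server_priv/nodes.
nodes_entries node01.INSTITUTE.EDU:2 node02.INSTITUTE.EDU:4 node03.INSTITUTE.EDU:2
```

After editing the nodes file, pbs_server usually has to be restarted before it sees the new entries.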

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available with Fedora and CentOS 5.4 (through the EPEL). For YUM-based systems, type:
<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from headnode

bq. See http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details.

Edit the /var/torque/mom_priv/config (CentOS 5) or /var/lib/torque/mom_priv/config (CentOS 6) file:
<pre>
$pbsserver  headnode.INSTITUTE.EDU   # hostname running pbs_server
</pre>

For the localhost add:
<pre>
$pbsserver  localhost   # hostname running pbs_server
</pre>
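Because the config file lives in a different directory on CentOS 5 and CentOS 6, it can help to detect the directory before writing the file. The sketch below assumes only the two paths named above; @mom_priv_dir@ and @write_mom_config@ are hypothetical helper names, and the script itself only reports which directory it found.

```shell
#!/bin/sh
# Sketch: locate the pbs_mom config directory and write the $pbsserver line.
# mom_priv_dir and write_mom_config are hypothetical helpers, not Torque tools.

# Print the first mom_priv directory that exists (CentOS 6 path, then CentOS 5).
mom_priv_dir() {
    for dir in /var/lib/torque/mom_priv /var/torque/mom_priv; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Write "<dir>/config" so that pbs_mom reports to the given head node.
# Note: this overwrites any existing config file in that directory.
write_mom_config() {
    headnode=$1
    target_dir=$2
    printf '%s  %s   # hostname running pbs_server\n' '$pbsserver' "$headnode" \
        > "$target_dir/config"
}

if dir=$(mom_priv_dir); then
    echo "pbs_mom config directory: $dir"
else
    echo "no mom_priv directory found; is torque-mom installed?"
fi
```

To write the file, call @write_mom_config headnode.INSTITUTE.EDU "$dir"@ as root once the reported directory looks right.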

h3. Activate Torque-mom

Enable the Torque pbs_mom daemon on reboot and start it:
<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

http://www.clusterresources.com/torquedocs/1.3advconfig.shtml

Munge is an authentication service that creates and validates user credentials, among other features.
<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo service munge start
sudo qmgr -c 'set server authorized_users=user01@host01'
sudo qmgr -c 'set server authorized_users=user01@host02'
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>
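Before relying on Munge for Torque authentication, it may be worth confirming that a credential can be created and decoded on the local host. The sketch below uses the @munge -n | unmunge@ round trip; the helper name @check_munge@ is hypothetical, and the check is skipped with a message if the tools are not in PATH.

```shell
#!/bin/sh
# Sketch: confirm that munge can create and decode a credential locally.
# check_munge is a hypothetical helper name.
check_munge() {
    if ! command -v munge >/dev/null 2>&1 || ! command -v unmunge >/dev/null 2>&1; then
        echo "munge is not installed or not in PATH"
        return 1
    fi
    # munge -n encodes an empty payload; unmunge decodes and validates it.
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge credential round trip succeeded"
    else
        echo "munge credential round trip failed; is the munge daemon running?"
        return 1
    fi
}

check_munge || true
```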

_________

h2. Test Torque Setup

On the head node, check that @qstat@ runs:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>
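On a larger cluster the full @pbsnodes@ listing can be long. The following sketch reduces it to one line per node, assuming the usual layout of an unindented node name followed by indented @attribute = value@ lines; @node_states@ is a hypothetical helper, and the layout assumption should be checked against the output of your Torque version.

```shell
#!/bin/sh
# Sketch: reduce pbsnodes output to "node state" pairs.
# node_states is a hypothetical helper that reads pbsnodes output on stdin.
node_states() {
    awk '
        # An unindented, non-empty line is taken to be a node name.
        /^[^ \t]/ { node = $1 }
        # An indented "state = value" line gives the state of that node.
        $1 == "state" && $2 == "=" { print node, $3 }
    '
}

# Run against the live cluster only if pbsnodes is available.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | node_states
else
    echo "pbsnodes not found; run this on a host with the Torque client installed"
fi
```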

On the head node, create a job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>
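A slightly fuller test job can also exercise resource requests. The sketch below writes a job script with @#PBS@ directives for the job name (@-N@), the node and walltime requests (@-l@), and joined output (@-j oe@), then submits it only if @qsub@ is available. The file name @test_directives.job@, the job name, and the resource values are placeholders chosen for illustration.

```shell
#!/bin/sh
# Sketch: write a small job script with #PBS directives, then submit it
# only if qsub is available. File name and resource values are placeholders.
# Directives used: -N job name, -l resource requests, -j oe joins stderr
# into stdout. PBS_O_WORKDIR is the directory qsub was run from.
cat > test_directives.job <<'EOF'
#!/bin/sh
#PBS -N appion_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
cd "$PBS_O_WORKDIR"
echo "running on $(hostname)"
sleep 60
echo hello
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub test_directives.job
    qstat
else
    echo "qsub not found; job script written to test_directives.job"
fi
```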

To list all server settings:

<pre>
sudo qmgr -c 'list server'
</pre>

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______