Setup job submission server » History » Version 42

Amber Herold, 01/08/2014 10:23 AM

h1. Setup job submission server

In this case, we set up a job submission server whose compute nodes have all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario, except for large reconstruction jobs.

h2. .appion.cfg config file

The .appion.cfg config file is used to automatically create and submit job files to your job submission server. The sample config file provided in the [[Processing Server Installation]] instructions was created for the Torque Resource Manager. If you use a different resource manager, the .appion.cfg file will need to be modified accordingly.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open source PBS system called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://www.clusterresources.com/products/torque/docs/

______

h2. Alternate instructions

It may be helpful to review the [[Install Torque|head node installation notes]] and [[Install Torque Client|client installation notes]] from a recent installation on CentOS 6.

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). For YUM-based systems, type:

<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

Because of the PATH setting, you will need to become root for this step. Make sure the directory containing the _pbs_server_ executable is in your PATH. For CentOS this is usually /usr/sbin.

<pre>
sudo pbs_server -t create
</pre>
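
As a minimal sketch of the PATH check described above, the following appends a directory to PATH only if it is missing and then reports where _pbs_server_ resolves. The helper name @path_with@ is our own, and /usr/sbin is assumed from the CentOS note; adjust it if your installation differs.

```shell
#!/bin/sh
# Sketch: append a directory to a PATH-style string only if it is missing.
# Usage: path_with DIR PATHSTRING   (prints the resulting string)
path_with() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;      # already present, leave unchanged
    *)        printf '%s\n' "$2:$1" ;;   # append at the end
  esac
}

# /usr/sbin is the usual CentOS location mentioned above (an assumption here).
PATH=$(path_with /usr/sbin "$PATH")
export PATH

# Report whether pbs_server can now be found.
if command -v pbs_server >/dev/null 2>&1; then
  echo "pbs_server found at: $(command -v pbs_server)"
else
  echo "pbs_server not found on PATH; check that torque-server is installed"
fi
```

This changes PATH only for the current shell session; add the export to your shell startup file if you want it to persist.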

h3. Activate Torque-server

Enable the Torque pbs_server and pbs_sched daemons on reboot, and start them now:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/lib/torque/server_priv/nodes

The format is:

<pre>
node-name[:ts] [np=] [properties]
</pre>

To add the localhost with two processors as a node (np is the processor count), you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
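
For clusters with many nodes, the file contents can be generated rather than typed by hand. The sketch below is one possible approach and is not part of Torque: the helper @make_nodes_lines@ and its input format (one hostname and processor count per line) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: turn "hostname count" pairs on stdin into nodes-file lines
# of the form "hostname np=count".
# Blank lines and lines whose first field starts with '#' are skipped.
make_nodes_lines() {
  awk 'NF >= 2 && $1 !~ /^#/ { printf "%s np=%d\n", $1, $2 }'
}

# Hypothetical host list, reusing the placeholder names from the example above.
make_nodes_lines <<'EOF'
# hostname            processors
node01.INSTITUTE.EDU  2
node02.INSTITUTE.EDU  4
node03.INSTITUTE.EDU  2
EOF
```

Redirect the output to a scratch file, review it, and then copy it to /var/lib/torque/server_priv/nodes as root. pbs_server typically needs a restart to pick up changes to this file.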

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). For YUM-based systems, type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from head node

bq. See http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details.

Edit the /var/torque/mom_priv/config (CentOS 5) or /var/lib/torque/mom_priv/config (CentOS 6) file:

<pre>
$pbsserver  headnode.INSTITUTE.EDU   # hostname running pbs_server
</pre>

For the localhost, add:

<pre>
$pbsserver  localhost   # hostname running pbs_server
</pre>
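
When the same line has to be written on many compute nodes, it can help to generate it from a script. The following is a minimal sketch under stated assumptions: the helper name @mom_config_line@ is hypothetical, and the two directories checked are the CentOS 5 and CentOS 6 locations listed above. It only prints the line and the detected location; it does not modify any file.

```shell
#!/bin/sh
# Sketch: print the mom_priv/config line that points pbs_mom at a head node.
# Usage: mom_config_line HEADNODE_HOSTNAME
mom_config_line() {
  # The single-quoted format keeps the shell from expanding the literal
  # "$pbsserver" keyword as a variable.
  printf '$pbsserver  %s   # hostname running pbs_server\n' "$1"
}

# Report which of the two config locations listed above exists on this machine.
for dir in /var/lib/torque/mom_priv /var/torque/mom_priv; do
  if [ -d "$dir" ]; then
    echo "config file location: $dir/config"
    break
  fi
done

# headnode.INSTITUTE.EDU is the placeholder hostname used above.
mom_config_line headnode.INSTITUTE.EDU
```

Once the output looks right, write it to the config file as root on each compute node.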

h3. Activate Torque-mom

Enable the Torque pbs_mom daemon on reboot, and start it now:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

http://www.clusterresources.com/torquedocs/1.3advconfig.shtml

Munge is an authentication service that creates and validates user credentials, among other features.

<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo service munge start
# Each "set" below overwrites authorized_users; to append instead, qmgr accepts +=
sudo qmgr -c 'set server authorized_users=user01@host01'
sudo qmgr -c 'set server authorized_users=user01@host02'
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>

_________

h2. Test Torque Setup

On the head node, see if you can run @qstat@:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>
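
The full @pbsnodes@ listing can be long on a large cluster. The sketch below reduces it to one line per node. It assumes the usual layout of an unindented node name followed by indented "key = value" lines; the helper name @summarize_nodes@ is our own, and the sample data is illustrative only, not output from a real cluster.

```shell
#!/bin/sh
# Sketch: reduce pbsnodes-style output to one "node state" line per node.
# Assumes an unindented node name followed by indented "key = value" lines.
summarize_nodes() {
  awk '
    /^[^ \t]/                  { node = $1 }        # unindented line: node name
    $1 == "state" && $2 == "=" { print node, $3 }   # indented state line
  '
}

if command -v pbsnodes >/dev/null 2>&1; then
  # Real cluster: summarize the live listing of all nodes.
  pbsnodes -a | summarize_nodes
else
  # Illustrative sample only, not real output from any cluster.
  summarize_nodes <<'EOF'
node01.INSTITUTE.EDU
     state = free
     np = 2

node02.INSTITUTE.EDU
     state = down
     np = 4
EOF
fi
```

A node reported as @down@ usually means the head node cannot reach pbs_mom on that host, so the mom configuration and daemon status on that node are the first things to check.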

On the head node, create a job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>
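
The two-line job above relies on the server defaults. A slightly fuller test job can state its resource requests explicitly with @#PBS@ directives. The sketch below is one way to write such a file: the helper @write_test_job@, the job name @appion_test@, and the requested limits are assumptions chosen for illustration, so adjust them to your cluster.

```shell
#!/bin/sh
# Sketch: write a small job script with explicit resource requests,
# then submit it only if qsub is available on this host.
# Usage: write_test_job OUTPUT_FILE
write_test_job() {
  # The quoted heredoc delimiter keeps $PBS_O_WORKDIR literal in the file.
  cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N appion_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
cd "$PBS_O_WORKDIR"
sleep 60
echo hello
EOF
}

write_test_job test.job

if command -v qsub >/dev/null 2>&1; then
  qsub test.job
  qstat
else
  echo "qsub not found; wrote test.job only"
fi
```

With @-j oe@ the standard output and standard error are merged into one file, which should appear in the submission directory once the job finishes.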

To list all server settings:

<pre>
sudo qmgr -c 'list server'
</pre>

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______