Setup job submission server » History » Version 43

Sargis Dallakyan, 05/26/2015 10:39 AM

h1. Setup job submission server

This page covers setting up a job submission server on which all of the data directories are mounted and the external packages (EMAN, Xmipp, etc.) are installed on the compute nodes. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario except for large reconstruction jobs.

h2. .appion.cfg config file

The .appion.cfg config file is used to automatically create and submit job files to your job submission server. The sample config file provided in the [[Processing Server Installation]] instructions was created for the Torque Resource Manager. If you use a different resource manager, you will need to modify the .appion.cfg file accordingly.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open source PBS system called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/0-intro/introduction.htm

______

h2. Alternate instructions

It may be helpful to review the [[Install Torque|head node installation notes]] and [[Install Torque Client|client installation notes]] from a recent installation on CentOS 6.

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). For YUM-based systems type:

<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

Make sure the directory containing the _pbs_server_ executable is in your PATH. For CentOS this is usually /usr/sbin. If that directory is not in your own PATH, you will need to become root to run the command below.

<pre>
sudo pbs_server -t create
</pre>

h3. Activate Torque-server

Enable the Torque pbs_server and pbs_sched daemons on reboot and start them:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/lib/torque/server_priv/nodes

The format is:
<pre>
node-name[:ts] [np=] [properties]
</pre>

Here @np@ is the number of processors on the node. To add the localhost with two processors as a node, you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
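
As a rough sketch, entries like those above could be generated from a list of host names and processor counts instead of being typed by hand. The function name @make_nodes_entry@ is hypothetical and not part of Torque; the output only goes to the terminal, so it still has to be reviewed and copied into the nodes file as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): print one nodes-file entry
# in the form "hostname np=N".
make_nodes_entry() {
    host="$1"
    np="$2"
    # Refuse an empty host name.
    [ -n "$host" ] || { echo "missing host name" >&2; return 1; }
    # Refuse an empty or non-numeric processor count.
    case "$np" in
        ''|*[!0-9]*) echo "invalid np for $host: $np" >&2; return 1 ;;
    esac
    printf '%s np=%s\n' "$host" "$np"
}

# Example: one "hostname processors" pair per input line.
while read -r host np; do
    make_nodes_entry "$host" "$np"
done <<'EOF'
node01.INSTITUTE.EDU 2
node02.INSTITUTE.EDU 4
node03.INSTITUTE.EDU 2
EOF
```

Validating the processor count before printing keeps a typo from ending up in the nodes file.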

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). For YUM-based systems type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from headnode

bq. See http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/1-installConfig/computeNodes.htm for more details

Edit the /var/torque/mom_priv/config (CentOS 5) or /var/lib/torque/mom_priv/config (CentOS 6) file so that it names the head node:

<pre>
$pbsserver  headnode.INSTITUTE.EDU   # hostname running pbs_server
</pre>
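
The edit above can also be scripted. The sketch below, under stated assumptions, picks whichever of the two mom_priv directories exists and writes the $pbsserver line into its config file. The function name @write_mom_config@ and its optional second argument (a path prefix, used only to try the sketch against a scratch directory) are hypothetical. Note that it overwrites the config file, so any existing settings would be lost.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): write the $pbsserver line
# into the pbs_mom config file.
#   $1 = hostname running pbs_server
#   $2 = optional path prefix for a dry run against a scratch tree
#        (default: empty, i.e. the real filesystem, which needs root)
write_mom_config() {
    server="$1"
    prefix="${2:-}"
    [ -n "$server" ] || { echo "missing server hostname" >&2; return 1; }
    # Try the CentOS 6 location first, then the CentOS 5 location.
    for dir in /var/lib/torque/mom_priv /var/torque/mom_priv; do
        if [ -d "$prefix$dir" ]; then
            # Overwrites any existing config file.
            printf '$pbsserver  %s   # hostname running pbs_server\n' "$server" \
                > "$prefix$dir/config"
            echo "$prefix$dir/config"
            return 0
        fi
    done
    echo "no mom_priv directory found" >&2
    return 1
}

# Usage on a compute node, as root:
#   write_mom_config headnode.INSTITUTE.EDU
```

The function prints the path it wrote, which makes it easy to confirm which of the two locations was used on a given node.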

If the head node itself also serves as the compute node (localhost), add instead:

<pre>
$pbsserver  localhost   # hostname running pbs_server
</pre>

h3. Activate Torque-mom

Enable the Torque pbs_mom daemon on reboot and start it:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

bq. See http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/1-installConfig/serverConfig.htm#usingMUNGEAuth for more details

Munge is an authentication service that creates and validates user credentials. The commands below create a Munge key, enable and start the munge service, and add users to the Torque server's authorized_users list:

<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo service munge start
# '=' replaces the authorized_users list; '+=' appends to it
sudo qmgr -c 'set server authorized_users=user01@host01'
sudo qmgr -c 'set server authorized_users+=user01@host02'
# alternatively, authorize user01 from any host
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>

_________

h2. Test Torque Setup

On the head node, check that you can run @qstat@:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>
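
On a cluster with many nodes, the pbsnodes listing can be condensed to one line per node. The sketch below assumes the usual layout of pbsnodes output, in which each node name starts at the beginning of a line and its attributes follow on indented @name = value@ lines. The function name @summarize_nodes@ is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): read pbsnodes-style text on
# stdin and print "node-name state" for each node.
summarize_nodes() {
    awk '
        # An unindented, non-empty line is assumed to be a node name.
        /^[^ \t]/ { node = $1 }
        # An indented "state = ..." line gives the state of that node.
        /^[ \t]+state = / { print node, $3 }
    '
}

# Usage on the head node:
#   pbsnodes | summarize_nodes
```

If the output layout differs on your Torque version, the two awk patterns are the only parts that would need adjusting.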

On the head node, create a job and submit it:
<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>
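
A slightly fuller test job can carry its resource requests in @#PBS@ directive lines at the top of the script. The sketch below writes such a script to a file. The function name @write_test_job@, the file name @test_pbs.job@, the job name, and the resource values are all arbitrary choices for illustration, not Appion or site defaults.

```shell
#!/bin/sh
# Hypothetical helper: write a small PBS job script to the given file
# (default: test_pbs.job in the current directory).
write_test_job() {
    cat > "${1:-test_pbs.job}" <<'EOF'
#!/bin/sh
#PBS -N hello_test
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -j oe
# Change to the directory the job was submitted from, if known.
cd "${PBS_O_WORKDIR:-.}"
sleep 60
echo hello
EOF
}

# Usage on the head node:
#   write_test_job test_pbs.job
#   qsub test_pbs.job
#   qstat
```

The @-j oe@ directive merges standard error into the standard output file, so a single output file in the submission directory should contain "hello" once the job finishes.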

To list all server settings:

<pre>
sudo qmgr -c 'list server'
</pre>

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______