Version 38 (Amber Herold, 11/20/2013 04:12 PM)

h1. Setup job submission server

In this case, we are setting up a job submission server whose compute nodes have all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed. Most institutions already have a job submission server, but the Appion data is not accessible from it. Appion does not support that scenario, except for large reconstruction jobs.

h2. .appion.cfg config file

The .appion.cfg config file is used to automatically create job files and submit them to your job submission server. The sample config file provided in the [[Processing Server Installation]] instructions was written for the Torque Resource Manager. If you use a different resource manager, modify .appion.cfg accordingly.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes them and executes each one as resources permit. Below we show how to install the popular open source PBS system called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host, including hosts that run neither pbs_server nor pbs_mom. More documentation about Torque is "available here.":http://www.clusterresources.com/products/torque/docs/

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora, and for CentOS 5.4 through the EPEL repository. On YUM-based systems, type:

<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

Because of the PATH setting, you will need to become root for this step. Make sure the directory containing the _pbs_server_ executable is in your PATH; on CentOS this is usually /usr/sbin. Note that the @-t create@ option discards any existing server configuration, queues, and jobs and starts from default settings.

<pre>
sudo pbs_server -t create
</pre>

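
Before running @pbs_server -t create@, the PATH requirement can be checked with a small script. The Python sketch below is only an illustration: it looks for an executable named pbs_server on the search path and, if none is found, suggests a PATH with /usr/sbin (the usual CentOS location mentioned above) appended. The helper names @find_pbs_server@ and @path_with_sbin@ are hypothetical and are not part of Torque or Appion.

```python
import os
import shutil
from typing import Optional

# Directory that usually holds pbs_server on CentOS (from the text above).
SBIN_DIR = "/usr/sbin"


def find_pbs_server(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of pbs_server on search_path ($PATH if None), or None."""
    return shutil.which("pbs_server", path=search_path)


def path_with_sbin(search_path: Optional[str] = None) -> str:
    """Return search_path ($PATH if None) with /usr/sbin appended when it is missing."""
    current = os.environ.get("PATH", "") if search_path is None else search_path
    entries = [p for p in current.split(os.pathsep) if p]
    if SBIN_DIR not in entries:
        entries.append(SBIN_DIR)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    location = find_pbs_server()
    if location:
        print("pbs_server found at", location)
    else:
        # Suggest a PATH value the user could export before running pbs_server.
        print("pbs_server not on PATH; try: export PATH=" + path_with_sbin())
```

The shell one-liner @which pbs_server@ answers the same question; the script form is only convenient if the check is folded into a larger setup script.
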
h3. Activate Torque-server

Enable the pbs_server and pbs_sched daemons at boot and (re)start them:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/torque/server_priv/nodes

The format is:

<pre>
node-name[:ts] [np=] [properties]
</pre>

Here @np@ is the number of processors on the node. To add the localhost with two processors as a node, you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>
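
If the nodes file is generated or checked by a script, the line format above maps onto a small record type. The following Python sketch assumes that everything after the node name other than @np=N@ is a property label, and it treats the optional @:ts@ suffix as the marker Torque documents for a time-shared node. @NodeEntry@ and @parse_nodes_line@ are hypothetical helpers, not part of Torque.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeEntry:
    """One line of the nodes file: node-name[:ts] [np=] [properties]."""
    name: str
    np: Optional[int] = None      # processors on the node; left out when None
    timeshared: bool = False      # True adds the optional ":ts" suffix
    properties: List[str] = field(default_factory=list)

    def to_line(self) -> str:
        parts = [self.name + (":ts" if self.timeshared else "")]
        if self.np is not None:
            parts.append(f"np={self.np}")
        parts.extend(self.properties)
        return " ".join(parts)


def parse_nodes_line(line: str) -> Optional[NodeEntry]:
    """Parse one nodes-file line into a NodeEntry; blank lines give None."""
    tokens = line.split()
    if not tokens:
        return None
    name, timeshared = tokens[0], False
    if name.endswith(":ts"):
        name, timeshared = name[: -len(":ts")], True
    entry = NodeEntry(name=name, timeshared=timeshared)
    for token in tokens[1:]:
        match = re.fullmatch(r"np=(\d+)", token)
        if match:
            entry.np = int(match.group(1))
        else:
            # Anything that is not np=N is kept as a node property.
            entry.properties.append(token)
    return entry


if __name__ == "__main__":
    # The three example compute nodes from the text above.
    compute_nodes = [
        NodeEntry("node01.INSTITUTE.EDU", np=2),
        NodeEntry("node02.INSTITUTE.EDU", np=4),
        NodeEntry("node03.INSTITUTE.EDU", np=2),
    ]
    print("\n".join(node.to_line() for node in compute_nodes))
```

Run as a script, it prints the three example lines shown above, ready to be appended to the nodes file.
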

______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora, and for CentOS 5.4 through the EPEL repository. On YUM-based systems, type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from the head node

bq. See http://www.clusterresources.com/products/torque/docs/1.2basicconfig.shtml#initializenode for more details.

Edit the /var/torque/mom_priv/config (CentOS 5) or /var/lib/torque/mom_priv/config (CentOS 6) file so that it points at the head node:

<pre>
$pbsserver  headnode.INSTITUTE.EDU   # hostname running pbs_server
</pre>

If pbs_server runs on the same machine (localhost), add this instead:

<pre>
$pbsserver  localhost   # hostname running pbs_server
</pre>
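
When many compute nodes have to be configured, the edit above can be scripted. The Python sketch below is a hypothetical helper, not a Torque tool. It assumes a single head node, so it replaces the first @$pbsserver@ line (dropping any others) or appends one, and it leaves every other line of the config file unchanged.

```python
from pathlib import Path

# pbs_mom config locations given in the text above.
MOM_CONFIG = {
    "centos5": Path("/var/torque/mom_priv/config"),
    "centos6": Path("/var/lib/torque/mom_priv/config"),
}


def set_pbsserver(config_text: str, server_host: str) -> str:
    """Return config_text with its $pbsserver line pointing at server_host.

    Assumes a single head node: the first $pbsserver line is replaced, any
    further $pbsserver lines are dropped, and a line is appended if none exists.
    """
    new_line = f"$pbsserver  {server_host}   # hostname running pbs_server"
    out, replaced = [], False
    for line in config_text.splitlines():
        if line.split()[:1] == ["$pbsserver"]:
            if not replaced:
                out.append(new_line)
                replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


def update_mom_config(server_host: str, release: str = "centos6") -> Path:
    """Edit the pbs_mom config for the given release in place (needs root)."""
    path = MOM_CONFIG[release]
    existing = path.read_text() if path.exists() else ""
    path.write_text(set_pbsserver(existing, server_host))
    return path
```

For example, @update_mom_config("headnode.INSTITUTE.EDU", "centos5")@ would edit the CentOS 5 file. pbs_mom only reads the file at startup, so start or restart it afterwards as described in the next step.
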

h3. Activate Torque-mom

Enable the pbs_mom daemon at boot and start it:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

See http://www.clusterresources.com/torquedocs/1.3advconfig.shtml for details.

Munge is an authentication service that creates and validates user credentials. Create the Munge key, enable and start the munge service, and then use @qmgr@ to set which users may submit jobs:

<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo /sbin/service munge start
sudo qmgr -c 'set server authorized_users=user01@host01'
sudo qmgr -c 'set server authorized_users=user01@host02'
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>

Each @set ... =@ replaces the current value of @authorized_users@, so run only the form you need (the last one allows user01 from any host), or use @+=@ to add further entries to the list.

_________

h2. Test Torque Setup

On the head node, check that you can run @qstat@:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>
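
To spot problem nodes in a larger cluster, the @pbsnodes@ output can be filtered by a script. This Python sketch assumes the default text layout: an unindented node name followed by indented @attribute = value@ lines (such as @state = free@ or @state = down,offline@), with blank lines between nodes. The function names are hypothetical.

```python
from typing import Dict, List


def parse_pbsnodes(output: str) -> Dict[str, Dict[str, str]]:
    """Turn default pbsnodes text output into {node_name: {attribute: value}}."""
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in output.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # An unindented line starts a new node block.
            current = raw.strip()
            nodes[current] = {}
        elif current is not None and " = " in raw:
            # Split on the first " = " only; values such as status contain '='.
            key, value = raw.strip().split(" = ", 1)
            nodes[current][key] = value
    return nodes


def unavailable_nodes(nodes: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of nodes whose comma-separated state includes down or offline."""
    bad = []
    for name, attrs in nodes.items():
        states = attrs.get("state", "").split(",")
        if "down" in states or "offline" in states:
            bad.append(name)
    return bad
```

The text to parse can be captured with @subprocess.run(["pbsnodes"], capture_output=True, text=True, check=True).stdout@ (Python 3.7 or later). A node that stays @down@ usually means its pbs_mom is not running or cannot reach the head node.
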
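
On the head node, create a test job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>

The job should be listed by @qstat@. When it finishes (after about a minute), its standard output, @hello@, should be written to a file named @test.job.o@ followed by the job number, in the directory you submitted from.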
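
The test job above contains only shell commands. Real job files usually also carry @#PBS@ directive lines, which qsub reads as options. The Python sketch below builds such a file using the standard @-N@ (job name), @-l nodes=N:ppn=P@ (processors per node) and @-l walltime=HH:MM:SS@ directives. The function names and the @#!/bin/sh@ first line are assumptions for illustration, not what Appion generates from .appion.cfg.

```python
from pathlib import Path
from typing import List, Optional


def make_job_script(commands: List[str], name: Optional[str] = None,
                    nodes: int = 1, ppn: int = 1,
                    walltime: Optional[str] = None) -> str:
    """Build the text of a PBS job file.

    #PBS lines must come before the first command: -N sets the job name,
    -l nodes=N:ppn=P requests P processors on each of N nodes, and
    -l walltime=HH:MM:SS caps the run time.
    """
    lines = ["#!/bin/sh"]
    if name:
        lines.append(f"#PBS -N {name}")
    lines.append(f"#PBS -l nodes={nodes}:ppn={ppn}")
    if walltime:
        lines.append(f"#PBS -l walltime={walltime}")
    lines.extend(commands)
    return "\n".join(lines) + "\n"


def write_job_file(path: str, script: str) -> List[str]:
    """Write the job file and return the qsub command line that would submit it."""
    Path(path).write_text(script)
    return ["qsub", path]
```

For example, @subprocess.run(write_job_file("test.job", make_job_script(["sleep 60", "echo hello"])), check=True)@ would submit the same test job with an explicit one-processor request.
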

To list all server settings:

<pre>
sudo qmgr -c 'list server'
</pre>
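
To confirm individual settings from a script (for example that @scheduling@ is @True@ or what @authorized_users@ contains after the Munge step), the listing can be split into attributes. This Python sketch assumes the usual layout of a first line @Server <name>@ followed by indented @attribute = value@ lines; @parse_list_server@ is a hypothetical helper.

```python
from typing import Dict, Tuple


def parse_list_server(output: str) -> Tuple[str, Dict[str, str]]:
    """Split `qmgr -c 'list server'` output into (server name, {attribute: value})."""
    server = ""
    attrs: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace() and line.startswith("Server "):
            # Header line, e.g. "Server headnode.INSTITUTE.EDU".
            server = line[len("Server "):]
        elif " = " in line:
            key, value = line.split(" = ", 1)
            attrs[key] = value
    return server, attrs
```
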

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______