Setup job submission server » History » Version 43
Sargis Dallakyan, 05/26/2015 10:39 AM
h1. Setup job submission server

In this case, we are setting up a job submission server that has all of the data directories mounted and the external packages (EMAN, Xmipp, etc.) installed on the compute nodes. Most institutions already have a job submission server, but the data is not accessible from it. Appion is not set up for that scenario except for large reconstruction jobs.

h2. .appion.cfg config file

The .appion.cfg config file is used to automatically create and submit job files to your job submission server. The sample config file provided in the [[Processing Server Installation]] instructions was created for the Torque Resource Manager. If a different resource manager is used, the .appion.cfg file will need to be modified appropriately.

______

h2. PBS and the Torque Resource Manager

PBS stands for "Portable Batch System":http://en.wikipedia.org/wiki/Portable_Batch_System. It is a job submission system: users submit many jobs, and the server prioritizes and executes each job as resources permit. Below we show how to install the popular open source PBS implementation called "TORQUE":http://en.wikipedia.org/wiki/TORQUE_Resource_Manager.

A TORQUE cluster consists of one head node and many compute nodes. The head node runs the *pbs_server daemon* and the compute nodes run the *pbs_mom daemon*. Client commands for submitting and managing jobs can be installed on any host (including hosts not running pbs_server or pbs_mom). More documentation about Torque is "available here.":http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/0-intro/introduction.htm
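
As a quick way to see which role a given host plays, the sketch below reports whether each Torque daemon is currently running. It assumes that @pgrep@ is installed and that the daemons run under their default process names; the function name @torque_role@ is made up for this example and is not part of Torque.

```shell
# Report which Torque daemons are running on this host.
# torque_role is a hypothetical helper name, not a Torque command.
torque_role() {
    for daemon in pbs_server pbs_sched pbs_mom; do
        # pgrep -x matches the exact process name
        if pgrep -x "$daemon" >/dev/null 2>&1; then
            echo "$daemon: running"
        else
            echo "$daemon: not running"
        fi
    done
}

torque_role
```

On a head node you would expect pbs_server (and pbs_sched) to be reported as running, and on a compute node pbs_mom.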

______

h2. Alternate instructions

It may be helpful to review the [[Install Torque|head node installation notes]] and [[Install Torque Client|client installation notes]] from a recent installation on CentOS 6.

______

h2. Head node installation

h3. Install Torque-server

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). On YUM-based systems, type:

<pre>
sudo yum -y install torque-server torque-scheduler torque-client
</pre>

h3. Initialize Torque-server

You will need root privileges for this step. Make sure the directory containing the _pbs_server_ executable is in root's PATH. For CentOS this is usually /usr/sbin.

<pre>
sudo pbs_server -t create
</pre>

h3. Activate Torque-server

Enable the Torque pbs_server and pbs_sched daemons at boot and start them:

<pre>
sudo /sbin/chkconfig pbs_server on
sudo /sbin/service pbs_server restart
sudo /sbin/chkconfig pbs_sched on
sudo /sbin/service pbs_sched start
</pre>

h3. Add nodes to Torque-server nodes file: /var/lib/torque/server_priv/nodes

The format is:

<pre>
node-name[:ts] [np=] [properties]
</pre>

To add the localhost with two processors as a node, you would add:

<pre>
localhost np=2
</pre>

You should add every *compute node* to this file, e.g.,

<pre>
node01.INSTITUTE.EDU np=2
node02.INSTITUTE.EDU np=4
node03.INSTITUTE.EDU np=2
</pre>

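If you add nodes often, the edit can be scripted. The following is a minimal sketch, not part of Torque: the names @add_node@ and @NODES_FILE@ are made up for this example, and the default path is the one given in the heading above. It appends a @host np=N@ line only when the host is not already listed.

```shell
# Append a node entry to a Torque nodes file unless the host is already listed.
# add_node and NODES_FILE are hypothetical names used only in this sketch.
NODES_FILE="${NODES_FILE:-/var/lib/torque/server_priv/nodes}"

add_node() {
    host="$1"
    np="$2"
    # Look for the host name as the first whitespace-delimited field
    if awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$NODES_FILE" 2>/dev/null; then
        echo "$host already listed in $NODES_FILE"
    else
        echo "$host np=$np" >> "$NODES_FILE"
        echo "added $host np=$np"
    fi
}
```

For example, @add_node node04.INSTITUTE.EDU 4@ run as root would add a fourth node. It is safest to try this on a copy of the file first (set @NODES_FILE@ to the copy). pbs_server generally has to be restarted before it picks up manual edits to the nodes file.
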
______

h2. Compute node installation

h3. Install Torque-mom

Torque is available for Fedora and for CentOS 5.4 (through the EPEL repository). On YUM-based systems, type:

<pre>
sudo yum -y install torque-mom torque-client
</pre>

h3. Configure node to receive jobs from the head node

bq. See http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/1-installConfig/computeNodes.htm for more details.

Edit the /var/torque/mom_priv/config (CentOS 5) OR /var/lib/torque/mom_priv/config (CentOS 6) file:

<pre>
$pbsserver headnode.INSTITUTE.EDU # hostname running pbs_server
</pre>

For the localhost add:

<pre>
$pbsserver localhost # hostname running pbs_server
</pre>

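When many compute nodes need the same setting, the edit above can be scripted. This is a rough sketch under stated assumptions: @set_pbsserver@ and @MOM_CONFIG@ are names invented for this example, and the default path is the CentOS 6 location mentioned above. It replaces any existing @$pbsserver@ line and leaves other settings in the file untouched.

```shell
# Set the $pbsserver line in a pbs_mom config file.
# MOM_CONFIG and set_pbsserver are hypothetical names used only in this sketch.
MOM_CONFIG="${MOM_CONFIG:-/var/lib/torque/mom_priv/config}"

set_pbsserver() {
    head_node="$1"
    tmp="$(mktemp)"
    # Keep every existing line except an old $pbsserver entry
    if [ -f "$MOM_CONFIG" ]; then
        grep -v '^\$pbsserver' "$MOM_CONFIG" > "$tmp" || true
    fi
    # Single quotes stop the shell from expanding $pbsserver as a variable
    printf '%s %s\n' '$pbsserver' "$head_node" >> "$tmp"
    cat "$tmp" > "$MOM_CONFIG"
    rm -f "$tmp"
}
```

Run as root, @set_pbsserver headnode.INSTITUTE.EDU@ would point the node at the head node. On CentOS 5, set @MOM_CONFIG@ to /var/torque/mom_priv/config first.
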
h3. Activate Torque-mom

Enable the Torque pbs_mom daemon at boot and start it:

<pre>
sudo /sbin/chkconfig pbs_mom on
sudo /sbin/service pbs_mom start
</pre>

h2. Munge

http://docs.adaptivecomputing.com/torque/5-1-0/help.htm#topics/torque/1-installConfig/serverConfig.htm#usingMUNGEAuth

Munge is an authentication service that creates and validates user credentials.

<pre>
sudo create-munge-key
sudo /sbin/chkconfig munge on
sudo service munge start

# allow user01 to submit from host01
sudo qmgr -c 'set server authorized_users=user01@host01'
# += appends host02; a plain = would replace the host01 entry
sudo qmgr -c 'set server authorized_users+=user01@host02'
# alternatively, allow user01 from any host
sudo qmgr -c 'set server authorized_users=user01@*'
</pre>

_________

h2. Test Torque Setup

On the head node, check that you can run @qstat@:

<pre>
qstat
</pre>

To check the state of the compute nodes, type:

<pre>
pbsnodes
</pre>

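On a larger cluster the full @pbsnodes@ listing gets long. The sketch below condenses it into one line per node state. It assumes the usual @pbsnodes@ layout, in which each node block contains a @state = ...@ line; @count_node_states@ is a hypothetical helper name.

```shell
# Summarize pbsnodes output as "<state> <count>" lines.
# count_node_states is a hypothetical helper; it reads pbsnodes output on stdin.
count_node_states() {
    awk '$1 == "state" && $2 == "=" { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Only call pbsnodes when it is installed on this host
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes | count_node_states
fi
```

Nodes reported as @down@ or @offline@ usually point to a pbs_mom daemon that is not running or cannot reach the head node.
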
On the head node, create a job and submit it:

<pre>
echo "sleep 60" > test.job
echo "echo hello" >> test.job
qsub test.job
qstat
</pre>

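A slightly fuller test job can also exercise the resource request. The following sketch writes a job script with @#PBS@ directives and submits it only if @qsub@ is found. The job name, resource values, and file name @test_pbs.job@ are illustrative choices, not requirements.

```shell
# Write a small Torque job script and submit it if qsub is available.
# The job name, resource request, and file name are illustrative choices.
cat > test_pbs.job <<'EOF'
#!/bin/sh
#PBS -N appion_test
#PBS -l nodes=1:ppn=1,walltime=00:05:00
#PBS -j oe
# PBS_O_WORKDIR is set by Torque to the directory qsub was run from
cd "${PBS_O_WORKDIR:-.}"
echo "running on $(hostname)"
sleep 60
echo hello
EOF

if command -v qsub >/dev/null 2>&1; then
    qsub test_pbs.job
else
    echo "qsub not found; job script written but not submitted"
fi
```

With @-j oe@ the job's standard error is merged into its standard output file, so after the job finishes a single output file in the submission directory should contain the host name and @hello@.
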
To list all server settings, type:

<pre>
sudo qmgr -c 'list server'
</pre>

_________

[[Setup Remote Processing|^ Setup Remote Processing]] | [[Install SSH module for PHP|Install SSH module for PHP >]]

______