Remote processing job trouble

Added by Jason van Rooyen over 10 years ago

Hi,

I am having trouble submitting remote processing jobs on my newly installed appion web interface. Any job launched from the interface returns: "Error: Job submission ###.ac.za failed. No error code has been set by the system".

My setup:
(1) Virtualized server running myami web 3.0, MySQL database, and PBS server.
Details: Linux ########.ac.za 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

(2) 24-core processing box with pbs client and all image processing software.
Details: Linux #### 3.2.0-60-generic #91-Ubuntu SMP Wed Feb 19 03:54:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

PBS works and the user has permission to submit jobs from the command line. As proof, if I open a terminal (as the appion user) and run “runJob.py pyace2.py….”, the job is submitted, runs remotely on the cluster, and the results are visible in the appion web interface. Job status is also monitored by checkappionJob.php.

However, I get the error stated above when I log in as the same user and try to submit via the web interface. I figure this means that the problem lies between the web interface and the shell, i.e., most likely user permissions. My info.php shows that the php-ssh modules are installed, and I have also restarted the server, so I have run out of options.

Are there any log files I can access to see why the appion user logged into the website cannot run jobs?

Thanks for your help,
Jason


Replies (4)

RE: Remote processing job trouble - Added by Jason van Rooyen over 10 years ago

Hi guys,

Just an update after finding (http://emg.nysbc.org/boards/15/topics/1791?r=1801#message-1801).

I have checked the /var/log/secure file and my web-interface user is being granted access:

Jul 4 09:39:18 srvcntleg001 sshd[4091]: pam_unix(sshd:session): session opened for user appion_user by (uid=0)
Jul 4 09:39:29 srvcntleg001 sshd[4093]: Received disconnect from 137.158.154.160: 11: PECL/ssh2 (http://pecl.php.net/packages/ssh2)
Jul 4 09:39:29 srvcntleg001 sshd[4091]: pam_unix(sshd:session): session closed for user appion_user
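
To pull only these session lines out of a busy log, a filter along the following lines may help. This is a minimal sketch: the function name `sshd_sessions_for` is made up for illustration, and the log path (/var/log/secure on this server) differs between distributions.

```shell
#!/bin/sh
# sshd_sessions_for USER LOGFILE
# Print the sshd lines in LOGFILE that open or close a session for USER.
# (Hypothetical helper name; reading the log usually requires root.)
sshd_sessions_for() {
    # First keep lines written by sshd, then keep those naming the user.
    grep 'sshd[[][0-9]*[]]' "$2" | grep -F "for user $1"
}

# Example (path as used on this server):
#   sshd_sessions_for appion_user /var/log/secure
```

If the web interface's login attempts produce "session opened" and "session closed" pairs, authentication itself is working and the failure lies in what happens after the login.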

I have also changed the PROCESSING_HOST field in config.php to be the head node of the cluster. Unfortunately, neither of these made a difference, and the problem persists.

To narrow down the issue, I tried running PHP scripts that call the ssh2_connect function (inspired by http://kvz.io/blog/2007/07/24/make-ssh-connections-with-php/). Interestingly, from the terminal (logged in as me and not appion_user), if I run the attached "test_ssh_shell.php", which launches a shell and runs "py.job", the job is submitted to the PBS queue. However, running "test_ssh_exec_pyjob.php", which calls ssh2_exec, fails.

To be clear, the ssh2_exec function can run terminal commands such as "ls -la" and returns the expected output. It just seems that runJob.py is not executing.

Could this be an environment variable issue, i.e., is the ssh2_exec function not receiving the correct variables to run runJob.py?
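
One way to test this hypothesis is to ask the non-interactive shell what it actually sees. The sketch below is only illustrative: `probe_env` is a made-up helper and `cluster-head` is a placeholder host name, not something from this setup. It prints the PATH seen by a command run without a login shell or terminal, which is how ssh2_exec runs things, and reports whether the given program resolves there.

```shell
#!/bin/sh
# probe_env CMD RUNNER [ARGS...]
# Run a small probe through RUNNER (for example: ssh user@host) and print
# the PATH that shell sees, followed by where CMD resolves, if anywhere.
# (Hypothetical helper; CMD should be a plain command name.)
probe_env() {
    cmd=$1
    shift
    # \$PATH is escaped so it is expanded by the probed shell, not this one.
    "$@" "echo PATH=\$PATH; command -v $cmd || echo '$cmd: not found'"
}

# Example with a placeholder host name:
#   probe_env runJob.py ssh appion_user@cluster-head
# Local dry run, using a plain non-login shell as the runner:
#   probe_env runJob.py sh -c
```

If runJob.py is found from an interactive login but reported as not found here, the PATH is being set in a file that the non-interactive shell never reads.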

Any advice would be appreciated.

Thanks,
Jason

py.job (325 Bytes): script calling runJob.py
test_ssh_exec_pyjob.php (958 Bytes): PHP script calling ssh2_exec
test_ssh_shell.php (1.04 KB): PHP script calling ssh2_shell

RE: Remote processing job trouble - Added by Jason van Rooyen over 10 years ago

Apologies for not figuring out my own configuration issues first. But hopefully this might help someone else in future.

All of our program environment variables were set in /etc/profile. This file is only read automatically when a login shell is started (http://www.linuxquestions.org/questions/linux-general-1/etc-profile-v-s-etc-bashrc-273992/). The PHP function ssh2_exec runs commands in a non-interactive, non-login shell (obvious in retrospect, I know), and therefore the /bin/appion scripts were not in the path. The environment variables therefore need to be sourced in .bashrc.

Having sourced the /etc/profile file in appion_user's .bashrc file, everything works and the web server can submit jobs to the cluster :)
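
For reference, the change amounts to something like the fragment below. Treat it as a sketch rather than the exact lines from appion_user's .bashrc: the guard variable `APPION_PROFILE_LOADED` is an invented name that only prevents /etc/profile from being sourced twice in the same shell. On distributions whose default .bashrc returns early for non-interactive shells, the fragment has to sit above that check, otherwise it is never reached.

```shell
# ~/.bashrc for appion_user (fragment, placed near the top of the file)

# Load the system-wide environment even when the shell is neither a login
# shell nor interactive, as is the case for commands started by ssh2_exec.
# APPION_PROFILE_LOADED is a hypothetical guard against double sourcing.
if [ -z "${APPION_PROFILE_LOADED:-}" ] && [ -r /etc/profile ]; then
    APPION_PROFILE_LOADED=1
    . /etc/profile
fi
```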

RE: Remote processing job trouble - Added by Amber Herold over 10 years ago

Thanks for the update Jason. Glad you are up and running.

RE: Remote processing job trouble - Added by Jason van Rooyen over 10 years ago

One more thing to consider when setting up the remote processing cluster.

After solving the remote job submission permission problem I found that log files and images were not being displayed in the appion web interface.

After much digging, it turns out that you need to change the default umask settings in pbs_mom/config on the cluster client in order for the apache user to be able to see the appion processing result files. The default umask is set to 0600 and changing this to 0002 gives the apache user read access.
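
The effect of the two settings can be checked on any Linux box without touching PBS. The sketch below (with a made-up helper name, `mode_under_umask`) creates a scratch file under a given umask and prints its permission string. Note that a umask lists the permission bits to remove, so owner-only files with mode 0600 correspond to a umask of 0077; how pbs_mom applies its own setting is not shown here.

```shell
#!/bin/sh
# mode_under_umask MASK
# Create a scratch file under the given umask and print its permission
# string (the first ten characters of "ls -l"). Hypothetical helper name.
mode_under_umask() {
    (
        umask "$1"
        d=$(mktemp -d) || exit 1      # private scratch directory
        : > "$d/result.log"           # new file gets mode 0666 minus the mask
        ls -l "$d/result.log" | cut -c1-10
        rm -rf "$d"
    )
}

mode_under_umask 0077   # prints -rw-------  (only the owner can read)
mode_under_umask 0002   # prints -rw-rw-r--  (group and others can read)
```

With the second setting, files written by the job owner are readable by other accounts such as the apache user, provided the directories above them are also accessible.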
