reduxd segfault
Added by Wei Guo almost 5 years ago
Hi, we frequently see reduxd die with a segfault such as:
kernel: reduxd10184: segfault at 7fffbb727f70 ip 00007f04500522aa sp 00007fffbb727f40 error 6 in libpython2.7.so.1.0[7f044ff5e000+17e000]
When it fails, the imageviewer complains "redux failed to return anything" and it cannot display images.
At that point, we have to restart the reduxd service.
I can capture the abrt report next time, but it would be appreciated if you could give us some clue in the meantime.
Thanks
Wei
Replies (8)
RE: reduxd segfault - Added by Anchi Cheng almost 5 years ago
This is an unusual error in the library.
It likely has to do with your supporting package versions. Please give details about them: OS, Python version, etc.
Technically we only support CentOS, but there are cases where it can be made to work on other systems.
RE: reduxd segfault - Added by Wei Guo almost 5 years ago
We are using rhel 7.7, kernel version 3.10.0-1062.9.1.el7.x86_64
- cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.7 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.7"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.7:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.7
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.7"
Red Hat Enterprise Linux Server release 7.7 (Maipo)
Red Hat Enterprise Linux Server release 7.7 (Maipo)
- uname -a
Linux splpcrodb01 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
- python --version
Python 2.7.5
I think the difference between CentOS 7 and RHEL 7 should be marginal as far as the libraries are concerned.
Thanks Anchi.
RE: reduxd segfault - Added by Anchi Cheng almost 5 years ago
Thanks for the info.
Our IT just informed me that he saw a similar error on our new CentOS 7 web server yesterday, so I may be able to reproduce this and see whether there is a patch that I can add.
"redux failed to return anything" is just a general error message shown when the server cannot be reached after the segfault. The debugging will need to be done in the redux code, or perhaps by looking for differences in the libraries used. Please run abrt so I can look at a traceback and see whether there is a common pattern between your crash and ours.
In the meantime, you can probably set up an automatic restart so that the service comes back when it fails.
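One way to sketch the automatic restart mentioned above is a small watchdog that probes the redux TCP port and restarts the service when nothing answers. This is only an illustration: the host, the port number, and the `systemctl restart reduxd` command are assumptions about the local setup and are not taken from this thread.

```python
import socket
import subprocess
import time

# All three values below are assumptions; adjust them to the local install.
HOST = "localhost"
PORT = 55123
RESTART_CMD = ["systemctl", "restart", "reduxd"]


def is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        return False
    sock.close()
    return True


def watch(interval=60):
    """Probe the server forever and restart it whenever the probe fails."""
    while True:
        if not is_listening(HOST, PORT):
            subprocess.call(RESTART_CMD)
        time.sleep(interval)
```

Calling `watch()` from a short script run as root (or under a user allowed to restart the service) would be enough to try it. Where the service is managed by systemd, a `Restart=on-failure` setting in the unit file is likely the simpler route.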
RE: reduxd segfault - Added by Wei Guo almost 5 years ago
Thanks. Please see attached core_backtrace.
$ cat environ
SHELL=/bin/bash
USER=appdpcryolab
PATH=/sbin:/usr/sbin:/bin:/usr/bin
PWD=/
LANG=en_US.UTF-8
SHLVL=1
HOME=/home/appdpcryolab
LOGNAME=appdpcryolab
_=/bin/reduxd
$ cat executable
/usr/bin/python2.7
- cat reason
python2.7 killed by SIGSEGV
Feb 14 18:43:31 splpcrodb01 kernel: reduxd29948: segfault at 7ffea864dff8 ip 00007f68c921547a sp 00007ffea864dfb0 error 6 in libpython2.7.so.1.0[7f68c918b000+17e000]
Feb 14 18:43:31 splpcrodb01 abrt-hook-ccpp20715: Process 29948 (python2.7) of user 1018095 killed by SIGSEGV - dumping core
core_backtrace (337 KB)
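For anyone comparing crashes across machines, the kernel segfault line itself can be decoded a little. The sketch below is a hypothetical helper (`decode_segfault` is not part of redux or abrt) that computes the instruction offset inside the named library and splits the x86 page-fault error code into its low three bits.

```python
import re

# Matches the tail of a kernel "segfault at ..." log line.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return library, in-library offset, and error-code bits, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    error = int(match.group("error"))
    return {
        "library": match.group("lib"),
        # Offset of the faulting instruction from the library's load address.
        "offset": "0x%x" % (ip - base),
        # x86 page-fault error code: bit 0 present, bit 1 write, bit 2 user.
        "page_present": bool(error & 1),
        "write": bool(error & 2),
        "user_mode": bool(error & 4),
    }
```

Applied to the Feb 14 line above, this gives offset `0x8a47a` in `libpython2.7.so.1.0`, with error 6 decoding to a user-mode write to a page that was not present. If two crashes report the same offset, they are probably hitting the same code path in the library.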
RE: reduxd segfault - Added by Wei Guo over 4 years ago
Hello, Dr. Cheng, is there any news on this issue? Thanks.
RE: reduxd segfault - Added by Swapnil Bhatkar over 4 years ago
Hi, this is Swapnil from SEMC IT.
Whenever we see that error, we simply restart the reduxd server.
To keep it from happening as often, we recently moved the redux server to a compute node with more memory (384 GB).
RE: reduxd segfault - Added by Anchi Cheng over 4 years ago
The trace indicates that the crash is related to fftw3, which redux uses to create power spectrum images in the web viewer. I have asked our local IT to reply to this. The code we use to run it is based on pyFFTW and was written a long time ago.
From an internet search, it may be related to a missing memory allocation, and how to add that is beyond me. The only thing I can think of is to create wisdom files for all the image dimensions you may encounter ahead of time. If there are problems, they would show up during that process. Most likely you will have enough memory on the system to get through it.
This is a hack and a workaround, unfortunately. I've asked one of our IT guys to find a different way to handle this, but it has not progressed fast enough for you to use it. Sorry about that.
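As a rough illustration of pre-building wisdom for a set of dimensions, here is a minimal sketch that assumes the current pyFFTW package (`pyfftw.builders`, `pyfftw.export_wisdom`). The wrapper redux actually uses may expose a different interface, and the shape list, the `float32` dtype, and the output path are all hypothetical.

```python
import pickle

try:
    import pyfftw  # third-party; assumed here to be the current pyFFTW package
except ImportError:
    pyfftw = None  # keeps the sketch importable where pyFFTW is absent

# Hypothetical (rows, cols) image shapes the viewer might request.
EXPECTED_SHAPES = [(1024, 1024), (2048, 2048), (4096, 4096)]


def unique_shapes(shapes):
    """Return distinct 2-D shapes, smallest area first."""
    return sorted(set(tuple(s) for s in shapes), key=lambda s: s[0] * s[1])


def build_wisdom(shapes, path):
    """Plan a real 2-D FFT for each shape, then save the accumulated wisdom."""
    if pyfftw is None:
        raise RuntimeError("pyFFTW is not installed")
    for shape in unique_shapes(shapes):
        buf = pyfftw.empty_aligned(shape, dtype="float32")
        # Planning is the step that generates wisdom; the plan is discarded.
        pyfftw.builders.rfft2(buf, planner_effort="FFTW_MEASURE")
    with open(path, "wb") as handle:
        pickle.dump(pyfftw.export_wisdom(), handle)
```

Running `build_wisdom(EXPECTED_SHAPES, "redux_wisdom.pkl")` once on the server would surface any planning failure up front. A process that later loads the file with `pickle.load` and passes the result to `pyfftw.import_wisdom` could then reuse those plans, provided it requests the same shapes and precision.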