Leginon got stuck when using the K2 super mode
Added by Xing Zhang about 11 years ago
Hi,
We are testing a Gatan K2 Summit camera. Everything works fine in counting mode, but when using the K2 super mode, Leginon repeatedly gets stuck until we quit the software and start it again. Sometimes it shows an error message, and sometimes no error message at all. Attached are the screenshots and error messages from when Leginon got stuck.
Error message:
-----------------
Leginon version: pre3.0
DARK (0, 1) 500.0
RAW (0, 1, 2, 3, 4) 200.0
MUTLIPLIER 1.0
remove zeros
Exception in thread data binder handler thread:
Traceback (most recent call last):
File "/usr/lib64/python2.4/threading.py", line 442, in bootstrap
self.run()
File "/usr/lib64/python2.4/threading.py", line 422, in run
self.__target(self.__args, **self.__kwargs)
File "/usr/lib/python2.4/site-packages/leginon/databinder.py", line 131, in handleData
method(args)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 449, in handleTransformTargetEvent
newtarget = self.transformTarget(oldtarget, level)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 243, in transformTarget
newparentimage = self.reacquire(newparenttarget)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 406, in reacquire
imagedata = self.acquireCorrectedCameraImageData(channel)
File "/usr/lib/python2.4/site-packages/leginon/correctorclient.py", line 33, in acquireCorrectedCameraImageData
imagedata = self.acquireCameraImageData(*kwargs)
File "/usr/lib/python2.4/site-packages/leginon/cameraclient.py", line 93, in acquireCameraImageData
imagedata['image'] = self.instrument.ccdcamera.Image
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 220, in __getattr
return self._objectservice._call(self._nodename, self._name, name, 'r')
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 354, in _call
return self.clients[node].send(request)
File "/usr/lib/python2.4/site-packages/leginon/datatransport.py", line 62, in send
raise result
error: (10053, 'An established connection was aborted by the software in your host machine')
------------------------
The way I set up the super mode is:
1. Set up everything needed for super mode using counting mode as usual;
2. Then simply switch the camera of the “ed” preset to K2super.
Exposure.jpg (449 KB)
Target_Adjustment.jpg (450 KB)
Focus.jpg (122 KB)
error.txt (1.78 KB)
Replies (15)
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
The setup is reasonable. The error message is a generic Windows socket connection error.
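For reference, the number in the last line of the traceback is a Winsock error code, and 10053 is WSAECONNABORTED. A minimal sketch for naming the code found in such a log line is below. The helper name `winsock_name` and the small lookup table are illustrative only and are not part of Leginon; the table covers just a few common codes.

```python
import re

# A few common Winsock error codes (not an exhaustive list).
WINSOCK_CODES = {
    10053: "WSAECONNABORTED",  # connection aborted by software on the host
    10054: "WSAECONNRESET",    # connection reset by the peer
    10060: "WSAETIMEDOUT",     # connection attempt timed out
    10061: "WSAECONNREFUSED",  # connection refused by the target
}

# Matches lines such as: error: (10053, 'An established connection ...')
_ERROR_LINE = re.compile(r"^error: \((\d+),")


def winsock_name(log_line):
    """Return the symbolic name for the error code in a traceback line.

    Returns None if the line is not a socket error line, and "unknown"
    if the code is not in the small table above.
    """
    match = _ERROR_LINE.match(log_line.strip())
    if match is None:
        return None
    return WINSOCK_CODES.get(int(match.group(1)), "unknown")
```

The name only says that the connection was dropped on the Windows side; it does not say which program dropped it.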
How often does this occur? Every time it needs to adjust the target for Exposure, or only once in a while when it goes through this step? If you can give us a count of how many ed images are taken after each successful target adjustment, and how many target adjustments and holes are processed before this occurs, we can try to reproduce it here. Is there any software on your system that can cause a time-out in the socket connection? Some kind of server failure on the Gatan camera computer is causing this.
The other simple thing to try, just because that is how we do it at NRAMM, is to save the frame stack on the D drive, not X, and see if it is more stable.
Anchi
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
This only happened after we started using the K2super camera, and it never happened when we used the K2count camera in Leginon. From our observation, this always happens at the "Target_adjustment" step, not during the exposure or focusing steps. It usually happens about every 3.5 hours, which is ~30 ed images. But sometimes it happened just several minutes after I restarted everything. In most cases, after restarting Leginon and the clients, it works fine for another few hours.
I keep a continuous ping running from the Leginon PC to the Gatan PC, and the connection is fine when this problem happens. We did start using a new Gatan computer last week when we began testing the K2super mode. Maybe it is this new computer that caused the problem.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Xing,
My test is still running. No problem yet. It will run 100 target lists (meaning 100 target adjustments and 200 es images, plus 100 drift measurements and autofocusing). The frame stacks are saved to the D drive.
I am monitoring the activity on the X drive. K2 saves data on it and then erases it after processing. This makes me wonder whether the X drive had filled up when the problem occurred. Please try saving on the D drive. Another reason for me to suspect that DM is stopping the socket connection is that I just noticed I got the same error when I closed DM without restarting the Leginon client. The socket connection was made to talk to DM, after all.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
My test finished without any problem.
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
Our situation looks improved after I upgraded the Leginon PC memory from 4 GB to 8 GB, but it still gets stuck eventually after 8-9 hours. Attached are the new screenshots, and the following is the error message from the Leginon server.
----
MUTLIPLIER 1.0
remove zeros
DARK (0, 1) 500.0
RAW (0, 1) 500.0
MUTLIPLIER 1.0
remove zeros
Exception in thread data binder handler thread:
Traceback (most recent call last):
File "/usr/lib64/python2.4/threading.py", line 442, in bootstrap
self.run()
File "/usr/lib64/python2.4/threading.py", line 422, in run
self.__target(self.__args, **self.__kwargs)
File "/usr/lib/python2.4/site-packages/leginon/databinder.py", line 131, in handleData
method(args)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 449, in handleTransformTargetEvent
newtarget = self.transformTarget(oldtarget, level)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 243, in transformTarget
newparentimage = self.reacquire(newparenttarget)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 406, in reacquire
imagedata = self.acquireCorrectedCameraImageData(channel)
File "/usr/lib/python2.4/site-packages/leginon/correctorclient.py", line 33, in acquireCorrectedCameraImageData
imagedata = self.acquireCameraImageData(*kwargs)
File "/usr/lib/python2.4/site-packages/leginon/cameraclient.py", line 93, in acquireCameraImageData
imagedata['image'] = self.instrument.ccdcamera.Image
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 220, in __getattr
return self._objectservice._call(self._nodename, self._name, name, 'r')
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 354, in _call
return self.clients[node].send(request)
File "/usr/lib/python2.4/site-packages/leginon/datatransport.py", line 62, in send
raise result
error: (10053, 'An established connection was aborted by the software in your host machine')
Exposure.jpg (499 KB)
Focus.jpg (553 KB)
Preset_Manager.jpg (185 KB)
Target_Adjustment.jpg (505 KB)
error.txt (1.72 KB)
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Xing,
Have you checked whether the X drive is filling up when this happens? I asked Gatan about it. It is bad if X is filling up. DM needs it for processing. Don't use the X or Y drive if you cannot transfer files off fast enough to keep them from accumulating. Save to D or directly to your network data storage.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Xing,
More thoughts. Jim also feels that monitoring memory, disk usage, etc. on the Gatan PC side is likely to lead us to the cause of the problem. For example, are you transferring the data off through the rawtransfer script during your long run? Can it keep up? How many frames are you saving per image? Are you using the background collection feature? The background collection might be a problem here.
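Since the failure time is unpredictable, one way to do the disk part of this monitoring is to log free space at a fixed interval and read the log back after a failure. Below is a minimal sketch using only the Python standard library. The function names, the log file name, and the drive paths in the usage note are assumptions for illustration; memory monitoring is not covered here.

```python
import shutil
import time


def free_gb(path):
    """Return the free space, in GiB, of the drive that contains path."""
    return shutil.disk_usage(path).free / 1024 ** 3


def log_free_space(paths, logfile, samples=1, interval=60.0):
    """Append timestamped free-space readings for each path to logfile.

    Takes `samples` readings, sleeping `interval` seconds between them.
    """
    with open(logfile, "a") as handle:
        for i in range(samples):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            for path in paths:
                handle.write("%s\t%s\t%.1f\n" % (stamp, path, free_gb(path)))
            handle.flush()  # keep the log current even if the run is killed
            if i + 1 < samples:
                time.sleep(interval)
```

On the Gatan PC this might be called as `log_free_space(["X:\\", "D:\\"], "free_space.log", samples=1440, interval=60.0)` to cover a 24-hour run, assuming those drive letters exist on that machine.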
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
Hi, Anchi,
The X drive was never filled up, and it always has more than 100 GB of free space. So the disk usage of the K2 PC should not be the cause. To transfer data, I just move the files to an external disk without using the rawtransfer script. The problem happened even when I was not transferring any data, and with or without "background collection". For each super mode image, I saved either 30 or 60 frames, and both have the same problem.
Right now, I have switched back to counting mode, and it has been OK for about two days without any problem. So it looks like it is the super mode that somehow causes the problem. I have not monitored memory usage on the K2 PC yet and will monitor it later when we use the super mode again.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Xing,
When you try super mode again, do try to write to the D drive. We've never had a problem there. I am out of ideas. It is hard to help you since I cannot reproduce it here. I did a bit of looking at the code. I don't think background collection works with target adjustment.
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
Unfortunately, even in counting mode, Leginon got stuck last night, with the same symptoms as in super mode. The K2 PC has 32 GB of memory, which should be more than sufficient, and it is hard to monitor the memory usage because we cannot predict when it will get stuck.
The error message is as follows:
------
MUTLIPLIER 1.0
remove zeros
DARK (0, 1) 500.0
RAW (0, 1, 2, 3, 4) 200.0
MUTLIPLIER 1.0
remove zeros
Exception in thread data binder handler thread:
Traceback (most recent call last):
File "/usr/lib64/python2.4/threading.py", line 442, in bootstrap
self.run()
File "/usr/lib64/python2.4/threading.py", line 422, in run
self.__target(self.__args, **self.__kwargs)
File "/usr/lib/python2.4/site-packages/leginon/databinder.py", line 131, in handleData
method(args)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 449, in handleTransformTargetEvent
newtarget = self.transformTarget(oldtarget, level)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 243, in transformTarget
newparentimage = self.reacquire(newparenttarget)
File "/usr/lib/python2.4/site-packages/leginon/transformmanager.py", line 406, in reacquire
imagedata = self.acquireCorrectedCameraImageData(channel)
File "/usr/lib/python2.4/site-packages/leginon/correctorclient.py", line 33, in acquireCorrectedCameraImageData
imagedata = self.acquireCameraImageData(*kwargs)
File "/usr/lib/python2.4/site-packages/leginon/cameraclient.py", line 93, in acquireCameraImageData
imagedata['image'] = self.instrument.ccdcamera.Image
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 220, in __getattr
return self._objectservice._call(self._nodename, self._name, name, 'r')
File "/usr/lib/python2.4/site-packages/leginon/remotecall.py", line 354, in _call
return self.clients[node].send(request)
File "/usr/lib/python2.4/site-packages/leginon/datatransport.py", line 62, in send
raise result
error: (10053, 'An established connection was aborted by the software in your host machine')
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Please try D drive.
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
I forgot to say that I was using the D drive this time when it got stuck.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng about 11 years ago
Issue #2518 has been created for further debugging of this problem.
RE: Leginon got stuck when using the K2 super mode - Added by Xing Zhang about 11 years ago
Leginon got stuck this morning. Attached are some screenshots. The gatnsocket.log is too big (~230 MB), so I will upload it to our FTP server for you to download.
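When a log is this large, the entries just before the hang are usually the most useful part. As a rough sketch (the function names and the default line count are arbitrary choices for illustration, not anything Leginon provides), the last lines can be extracted without loading the whole file into memory:

```python
from collections import deque


def tail_lines(path, count=1000):
    """Return the last `count` lines of a text file as a list.

    The file is streamed line by line, so only `count` lines are held
    in memory at any time. Undecodable bytes are replaced, not raised.
    """
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=count))


def write_tail(path, out_path, count=1000):
    """Write the last `count` lines of path to out_path; return lines written."""
    lines = tail_lines(path, count)
    with open(out_path, "w") as out:
        out.writelines(lines)
    return len(lines)
```

For example, `write_tail("gatnsocket.log", "gatnsocket_tail.log", 5000)` would produce a much smaller file that can be attached directly.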
Thanks.
RE: Leginon got stuck when using the K2 super mode - Added by Anchi Cheng almost 11 years ago
Update on this:
After two months of investigation, Peng Ge at Xing's facility finally found a likely cause and a solution for the problem.
The K2 computer at Xing's facility is on their university network without protection. Something on the internet has been scanning open ports on that computer and has been able to interrupt the communication at SERIALEMCCD_PORT and at the ports opened by the Leginon client.
Isolating the network containing the K2 computer (and Leginon) from the unsecured university network stopped the attack, and the 'An established connection was aborted by the software in your host machine' problem has not been seen since.
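To confirm this kind of interference at another site, one option is to compare the peer addresses seen in firewall or connection logs against the subnet the instruments are supposed to live on. The sketch below assumes a hypothetical private subnet, 192.168.10.0/24, for the Leginon and K2 computers; it only classifies addresses and is not a substitute for isolating the network.

```python
import ipaddress

# Hypothetical instrument subnet; replace with the site's actual network.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.168.10.0/24")]


def is_trusted(peer_ip, networks=TRUSTED_NETWORKS):
    """Return True if peer_ip belongs to one of the trusted networks."""
    address = ipaddress.ip_address(peer_ip)
    return any(address in network for network in networks)


def untrusted_peers(peer_ips, networks=TRUSTED_NETWORKS):
    """Return the sorted, de-duplicated peers outside the trusted networks."""
    return sorted({ip for ip in peer_ips if not is_trusted(ip, networks)})
```

Any address returned by `untrusted_peers` that has connected to the camera or Leginon client ports would be worth investigating.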