Feature #6748
Provide native Appion support for Topaz particle picking
Status: open, 0% done
Description
Based on Micah's recent presentation, Neil is trying to get Topaz into Appion:
Scope of Project:
Implement Topaz deep learning particle picker within Appion particle picking framework and provide compatibility with Relion Processing Software.
Project Stages:
1. Python backend to process images and launch Topaz picking sessions
A. Image processing, prepare micrographs for Topaz
B. Copy manual, template, or DoG particle picks from Appion and give to Topaz for initial training
C. Implement Topaz training, choose default or user custom values
D. Apply Topaz neural network and process all micrographs
E. Python script to extract particles from Topaz (star file) for Appion use
F. Apply threshold and upload particles to Appion database
G. Automatically generate Star file for going straight into Relion
2. Topaz GPU and parameter optimization
3. Web interface
A. Create standard PHP web interface to launch Topaz jobs; initial implementation will be simple for new users to use
B. Create PHP web interface for re-processing images with a different threshold
Limitations of the proposed implementation that could be addressed in the future:
• Limited feedback: only standard particle picking tools and images will be available.
• Requires an existing particle picking run to be used to train the neural network.
• Complex model training methods will be unavailable in first implementation (requires more user interaction during training process than Appion is built for).
• Neural network models are saved but not easily transferred to other data sets.
• Neural network is trained from scratch for each job submission.
• Particle picking threshold is chosen at initial launch and cannot be changed without launching another job (design proposed above).
Files
Updated by Neil Voss almost 6 years ago
- File topaz_instructions.pptx topaz_instructions.pptx added
Updated by Sargis Dallakyan almost 6 years ago
- File topaz_input_prep.py topaz_input_prep.py added
Updated by Neil Voss almost 6 years ago
It looks like as of Feb 1, 2019 topaz now just uses MRC files, no more TIFF file requirement.
I would prefer to stick to MRC, which means you would need to upgrade Topaz to a newer version. Do you see any problem with this?
Updated by Sargis Dallakyan almost 6 years ago
I don't see a problem. As a side note, we have topaz running in a singularity container.
[root@SEMC-head ~]# more /gpfs/sw/bin/topaz
#!/bin/bash
/gpfs/sw/bin/singularity exec --nv -B /gpfs:/gpfs /gpfs/sw/singularity/images/centos7 topaz "$@"
Updated by Alex Noble almost 6 years ago
- File topaz_GUI_full.html topaz_GUI_full.html added
Topaz training needs to be done on preprocessed images (normalization in particular is a must). 'topaz preprocess' can output as mrcs, tiffs, and pngs. Note that mrcs and tiffs use floats while pngs are integers.
Things have been simplified some since the workflow that Micah wrote up. Here are some ways:
1) 'topaz train' now accepts a single folder --train-images. All that is required is that the file passed in --train-targets points to images in that directory.
2) 'topaz train' doesn't need --test-images or --test-targets. Just use the --k-fold option to subdivide the training data by a factor of 1/kfold where by default one fold is held out (--fold=0). E.g. --k-fold=5 will divide the training data into 5 chunks and the first chunk (20%) will be used as the test data [this has actually always been true].
3) 'topaz train' now accepts star, tab-delimited, and CSV particle coordinates (CSV is from the standalone GUI).
4) 'topaz convert' now exists to convert between the above formats.
5) The Relion issue with the extracted star file is fixed.
Also, I am actively writing a standalone GUI that includes command generation and particle picking all in one html file. The preprocessing command generation is done. I am working on the train and extract commands now. I have attached the current dev version. A release version will be ready in possibly a few days.
Tristan is also working on a basic tutorial, a workflow for cross-validation, and implementation of a method that will allow for pi to be optimized.
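As a rough illustration of item 3 above, the sketch below writes particle picks as a tab-delimited coordinate table. The column names (image_name, x_coord, y_coord) are an assumption about the layout Topaz expects and should be checked against the installed version; the function name and the sample picks are hypothetical and are not taken from Appion.

```python
import csv
import io


def write_topaz_coordinates(picks, handle):
    """Write (image_name, x, y) picks as a tab-delimited table.

    The header names are an assumption about the Topaz coordinate layout.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["image_name", "x_coord", "y_coord"])
    for image_name, x, y in picks:
        # Coordinates are rounded to whole pixels before writing.
        writer.writerow([image_name, int(round(x)), int(round(y))])


# Hypothetical picks, e.g. dumped from an existing Appion picking run.
picks = [
    ("mic_001", 1024.0, 512.4),
    ("mic_001", 300.6, 2210.0),
    ("mic_002", 88.0, 90.0),
]

buffer = io.StringIO()
write_topaz_coordinates(picks, buffer)
coordinate_text = buffer.getvalue()
```

The image names are written without a file extension here; whether Topaz wants the extension included is another detail to confirm before use.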
Updated by Neil Voss almost 6 years ago
Hi Micah,
Can you upload a sample of all_images.txt, all_filtered_images.txt, and all_coord.txt? I am trying to recreate these files from an Appion dump, so having an actual file would be helpful.
Updated by Neil Voss almost 6 years ago
Hi again,
I managed to find a copy. Computer was just running slow.
Updated by Neil Voss almost 6 years ago
One question.
I see that when topaz splits the test and training sets, it selects whole micrographs for each category. Would it be better to have test and training particles within the same micrograph?
Updated by Alex Noble almost 6 years ago
Hi Neil,
Tristan and I have just released Topaz v0.1.0 with a full working GUI:
https://github.com/tbepler/topaz/
You can use this GUI to understand the whole workflow and which options are useful. For instance, the user only needs to provide a directory to micrographs and particle picks (in either .star, .box, Topaz .txt, or Topaz GUI .csv files); the K-fold option takes care of splitting the dataset into test/train.
To answer your question, I think that Topaz requires test and train sets to be different sets of micrographs.
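To make the micrograph-level split concrete, here is a minimal sketch that holds out whole micrographs as test data, in the spirit of the --k-fold and --fold options discussed above. It is not Topaz's own splitting code: the round-robin assignment, the function name, and the sample data are assumptions chosen for illustration.

```python
def split_by_micrograph(picks, k_fold=5, fold=0):
    """Hold out every k-th micrograph (offset by `fold`) as test data.

    Each pick is (image_name, x, y). All picks from one micrograph end up
    on the same side of the split.
    """
    names = sorted({name for name, _, _ in picks})
    test_names = {name for i, name in enumerate(names) if i % k_fold == fold}
    train = [p for p in picks if p[0] not in test_names]
    test = [p for p in picks if p[0] in test_names]
    return train, test


# Hypothetical data set: 10 micrographs with 3 picks each.
picks = [
    ("mic_%03d" % m, 100 * p, 200 * p)
    for m in range(10)
    for p in range(3)
]

train, test = split_by_micrograph(picks, k_fold=5, fold=0)
```

With k_fold=5, one fifth of the micrographs (here 2 of 10) are held out, which matches the 20% figure quoted earlier in the thread.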
Updated by Neil Voss almost 6 years ago
Hi Alex,
I know I am kind of re-inventing the wheel here, but as we discussed my initial hope was to create a fully-automated Topaz picker. I do worry that no one will use my fully automated picker, because you would have to have particle picks in Appion to launch Topaz. Let me know if I am doing something worthless to the group.
So, instead of using 15 Topaz scripts, I am doing all the heavy lifting in one simple Appion script. From the format of the input txt files, it looks like I could split the training and test sets in the same micrograph, but there might be unexpected memory limitations on the GPU side, because now we are using all 30 micrographs for training instead of 20 for training and 10 for testing.
Neil
Updated by Alex Noble almost 6 years ago
Hi Neil,
NYSBC will use your Appion implementation. Several internal users/staff are already using it with success and pushing resolution/isotropy with it. We want it to become routine and will likely begin including Topaz in our regular Appion workshops.
Right now, Topaz requires 3 or 4 commands that will take you from {micrographs + particle picks} to {particle coordinates}, not 15. The GUI goes through these steps and fills in most of the values for the user. I have attached it - please look it over. There is also a basic tutorial now on the GitHub page that goes over the same steps with less verbosity: https://github.com/tbepler/topaz/blob/gui/tutorial/01_quick_start_guide.ipynb
I don't think RAM considerations are an issue - someone internally tried training on 60,000 particles across several thousand images and it worked fine. It just took a little longer.
BTW, I am going on a 2-week vacation in a couple days, so I won't be available very soon. Tristan will be available: tbepler at gmail
Updated by Alex Noble almost 6 years ago
Hi Neil,
What is the status on Topaz integration, and can Tristan and I help? Have you tried the Topaz GUI? It might be easy to just throw that into Appion and connect it to the database.
Thanks,
-Alex
Updated by Sargis Dallakyan over 5 years ago
- Assignee changed from Neil Voss to Sargis Dallakyan
Thanks Alex, Topaz GUI looks good. I've added it to myamiweb/processing. For now, I made sure it's positioned correctly with the rest of Appion UI. I'll work on pre-processing next; will need to change the input/output and add an option to run jobs from the web UI.
Updated by Bridget Carragher over 5 years ago
Cool! I am delighted that this might be a possibility - Topaz is very popular with our users and I would love for it to be an integral Appion option.
Updated by Sargis Dallakyan over 5 years ago
Finished the pre-processing part. Logged-in users can now submit pre-processing jobs. I've created a new submitJob function based on submitAppionJob so that it returns to the Topaz GUI, loops up to 100 times until it finds the output files, then selects the Picking tab and loads the images there.
Since topaz.html has around 11k lines, it takes a lot of scrolling to go up and down. Should we split it into .css and .js files to make development faster?
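The polling step described above (check for output files up to 100 times, then continue) can be sketched as follows. The actual implementation lives in the web interface; this Python version, including the function name, the delay, and the file names, is a hypothetical illustration of the same bounded-retry idea.

```python
import os
import tempfile
import time


def wait_for_outputs(paths, attempts=100, delay=1.0):
    """Return True once every path exists, or False after `attempts` checks."""
    for _ in range(attempts):
        if all(os.path.exists(p) for p in paths):
            return True
        time.sleep(delay)
    return False


# Small demonstration with a temporary directory and short delays.
with tempfile.TemporaryDirectory() as tmp:
    done = os.path.join(tmp, "done.mrc")
    missing = os.path.join(tmp, "missing.mrc")
    open(done, "w").close()  # simulate a finished output file
    found = wait_for_outputs([done], attempts=3, delay=0.01)
    timed_out = not wait_for_outputs([missing], attempts=3, delay=0.01)
```

Bounding the number of attempts keeps the page from waiting forever if the job fails without producing any output.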
Updated by Bridget Carragher over 5 years ago
This sounds great! Shall we have a little mini meeting to chat this through? We can also chat at Appion developers tomorrow of course...
Updated by Sargis Dallakyan over 5 years ago
Sounds good. Would be great to have feedback from users and developers as we move forward on this project.
Updated by Alex Noble over 5 years ago
Hi Sargis,
Awesome work, this is a great start!
I took a look at the code difference and it seems that apart from adding the 'name=' to the input tags, there are only a few code changes. This is nice because it allows for easier re-integration of future GUI releases. I think the remaining parts of the GUI will be similar.
Do you have time some day to skype with me and show me what was done on the php side? I may be able to finish integration if it's relatively straight-forward.
Thanks!
-Alex
Updated by Sargis Dallakyan over 5 years ago
Hi Alex,
Sounds good. I also have saving particle picks working. I will commit that code later this afternoon. The rest should be relatively straight-forward. I have a meeting with Bridget tomorrow at 3pm EST (Tuesday July 2nd). If you can join us, we can go through code changes. If not, we can schedule a meeting some other time.
Thanks,
Sargis
Updated by Alex Noble over 5 years ago
Hi Sargis,
Great, I will join you tomorrow.
Best,
-Alex
Updated by Sargis Dallakyan over 5 years ago
Added Topaz denoiser; it's in between 'CTF Estimation' and 'Object Selection' in the processing menu. Binning is set to 2, otherwise it might run out of GPU memory. It selects the last preset from the drop-down. Since it takes a while for Topaz to warm up, I run topaz denoise for all images at once in postLoopFunctions instead of doing one image at a time. Once it's done, it adds a new preset (+_a_td) with the corresponding denoised image in the viewer.
Updated by Alex Noble over 5 years ago
Thank you so much for working on this Sargis!
I just had Kotaro try it on 19jul11a efn images, on krios02buffer and it instantly skipped all of the images. We tried deleting the donedict and re-running on krios04buffer where there were no other processes running, and it skipped them all again. `which topaz` correctly reports /gpfs/sw/bin/topaz
Here's the command used:
/opt/myamisnap/bin/appion topazDenoiser.py --bin=1 --denoiselabel=td --patchsize=2048 --patchpadding=128 --runname=topaz_denoise2 --rundir=/gpfs/appion/kkelley/19jul11a/topaz_denoise/topaz_denoise1 --preset=efn --commit --projectid=614 --session=19jul11a --no-rejects --continue --expid=9403 --jobtype=topazdenoise
I'm not sure what is wrong.
Also, could you set the default appended label to 'td', the default binning to 1, and the default patch size to 2048? Patches break up the images for loading on the GPU, and images denoise best when unbinned.
Thank you Sargis!
Updated by Alex Noble over 5 years ago
Correction on the command used:
/opt/myamisnap/bin/appion topazDenoiser.py --bin=1 --denoiselabel=td --patchsize=2048 --patchpadding=128 --runname=topaz_denoise1 --rundir=/gpfs/appion/kkelley/19jul11a/topaz_denoise/topaz_denoise1 --preset=efn --commit --projectid=614 --session=19jul11a --no-rejects --continue --expid=9403 --jobtype=topazdenoise
Updated by Sargis Dallakyan over 5 years ago
Thank you too Alex. I moved running topaz denoise from postLoopFunctions to processImage. This way it can run while data is collected. I also changed the default appended label to 'td', the default binning to 1, and the default patch size to 2048. Please try again tomorrow after nightly updates.
Updated by Sargis Dallakyan over 5 years ago
Buffer4 was not responding. I've rebooted it and rerun the same command with updated topazDenoiser.py. It seems to be working; created efn_td preset and processed 7 images so far.
Updated by Alex Noble over 5 years ago
Hi Sargis,
Thank you!!
I missed one important thing: Can you make the --device option in `topaz denoise` default to 0, and add the option to the Appion webpage? The options are: non-negative integers for GPU id, or -1 for CPU.
Best,
-Alex
Updated by Alex Noble over 5 years ago
Hi Sargis,
Oh this is weird... It seems that the device is set to 0, but it doesn't use the GPU and instead uses the CPU... I am testing on semccatchup01 and krios04buffer where nobody else is running anything. Do you see the same behavior? Here is what the output is for me:
Starting image 5 ( skip:468, remain:5 ) id:12182678, file: 19jul10e_grid1new_022gr_01sq_02hln_014enn
... Pixel size: 0.854905
...
... # using device=0 with cuda=False
- using model: L2
- 1 of 1 completed.
==== Committing data to database ====
SUMMARY: topazDenoiser
------------------------------------------
TIME: 1 min 21 sec
AVG TIME: 1.31 +/- 0.64 min
(- REMAINING TIME: 7 min 50 sec for 4 images)
-----------------------------------------
When I run a `topaz denoise` command outside of the loop, it has the same problem.
BUT when I run the same topaz_denoiser.py command on node44, it uses the GPU properly! Any idea what the issue is?
Also, could the ice thickness measurement be retained for the denoised image? And could the `topaz denoise` command be printed to the screen in the Appion loop?
Thanks!
-Alex
Updated by Alex Noble over 5 years ago
Hi Sargis,
Since Topaz Denoiser works best on non-frame aligned images, early return creates an issue: the enn images become a single frame rather than a sum. Since there is no flag in the database saying whether early return was used, could you put a flag in the Appion form asking the user if early return was used? If yes, then before denoising a normal summed enn.mrc image should be created to replace the original enn.mrc image that had one frame (using either whatever default method is used in Leginon for returning enn.mrc or motioncor2 with -Align set to 0).
This would be very useful because at least 1/3 of datasets use early return, from my estimation.
Thanks!
-Alex
Updated by Sargis Dallakyan over 5 years ago
Hi Alex,
Great thanks, I plan on making the following changes:
- Print `topaz denoise` command to the screen in the Appion loop.
- Add --device option and help text (non-negative integers for GPU id, or -1 for CPU). If we leave it blank, does topaz automatically pick the right device? I don't want to set the default to 0, in case someone else is already using device 0.
When I ran it on krios04buffer yesterday with no --device option, it was using the GPU correctly. Do you mean that on semccatchup01 you asked for GPU and it used CPU instead?
I'll look into the ice thickness measurement. I copy image info and change preset and file name in the Appion loop. The ice thickness measurements are taken from another table. I'll need to create new entries in that table or modify the viewer to handle it.
I'll need to read up on what early return is and how to implement related changes.
Updated by Alex Noble over 5 years ago
Hi Sargis,
Awesome, thank you! You're the best!
When I run `topaz denoise --device 0 [...]` on either semccatchup01 or krios04buffer, it recognizes that it should use the GPU, but doesn't. ie. it says 'using device=0 with cuda=False', but it should say 'using device=0 with cuda=True'. When I monitor nvidia-smi using `watch -n 1 nvidia-smi`, it never has any processes running, but `top` shows high CPU usage by topaz and it takes about 1.5 minutes to denoise an image. When I run it on a cluster GPU node like node44, it says 'using device=0 with cuda=True', which is correct; nvidia-smi shows a topaz process running and it takes 13 seconds to denoise an image. I'm not sure what the problem here might be.
Early return is an option on the K2 camera to just return the first frame of a movie so that the camera takes less time to return control to Leginon, which speeds up collection overall. Leginon then simply writes the one frame to enn.mrc (or whatever the preset is). Since Topaz Denoiser works best on non-frame aligned images, it would be great to have the Topaz Denoiser Appion integration ask the user whether early return was used during collection - if it was, then the frames should be summed and gain corrected (not frame aligned) before denoising. It might be best to just leave the single-frame enn.mrc files alone and just add the summing+gain correction step to the denoising loop. Or maybe enn.mrc should also be replaced by the sum.
Let me know if I can explain further.
Best,
-Alex
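The symptom described above, a GPU id requested but 'cuda=False' reported, can be detected directly from the log text. The sketch below assumes the log line has the form quoted in this thread ('using device=0 with cuda=False'); the function name and the idea of scanning the log are hypothetical and not part of Topaz or Appion.

```python
import re

# Pattern based on the log line quoted in this thread.
DEVICE_LINE = re.compile(r"using device=(-?\d+) with cuda=(True|False)")


def gpu_fallback(log_text):
    """Return True if a GPU id was requested but the log reports cuda=False."""
    for match in DEVICE_LINE.finditer(log_text):
        device = int(match.group(1))
        cuda = match.group(2) == "True"
        # device >= 0 means a GPU was requested; -1 means CPU was requested.
        if device >= 0 and not cuda:
            return True
    return False


buffer_log = "... # using device=0 with cuda=False\n- using model: L2\n"
node_log = "# using device=0 with cuda=True\n- using model: L2\n"

buffer_fell_back = gpu_fallback(buffer_log)
node_fell_back = gpu_fallback(node_log)
```

A check like this could be used to warn the user early instead of letting every image take the slower CPU path.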
Updated by Sargis Dallakyan over 5 years ago
Buffer4 is using Cuda 8, topaz singularity container is using Cuda 9 and node44 is Cuda 10. Maybe this is why on buffer4 it's using CPU instead of GPU.
Updated by Alex Noble over 5 years ago
Hi Sargis,
You're exactly correct! Topaz requires CUDA 9+. I didn't realize some nodes have different versions. Can the nodes with 8 be upgraded, or would that break things?
I told Tristan so that he will add a warning for the user in stdout.
Thank you!
-Alex
Updated by Sargis Dallakyan over 5 years ago
Hi Alex,
Buffers use CUDA 8 because of the MotionCor2 build we currently use (MotionCor2_1.2.1-Cuda80-Centos6). It would break frame alignment if we upgrade.
I've just committed changes for adding --device option, copying ice thickness info and printing topaz denoise command. This will be live tomorrow after nightly updates. I'll work on early return related changes next week.
Thank you,
Sargis
Updated by Alex Noble over 5 years ago
Hi Sargis,
Motioncor2 v1.2.6 now supports CUDA 8.0, 9.2, and 10.1
https://docs.google.com/forms/d/e/1FAIpQLSfAQm5MA81qTx90W9JL6ClzSrM77tytsvyyHh1ZZWrFByhmfQ/viewform
Thank you for the updates!
-Alex
Updated by Alex Noble over 5 years ago
Hi Sargis,
Sometimes the denoising loop skips images for no apparent reason. See the beginning of 19jul18g, for example. Do you know why this happens?
Thanks!
-Alex
Updated by Alex Noble over 5 years ago
I found the answer to my previous question. It was running out of memory for some images...
Sargis, could you change the default for 'patch size' from 2048 to 1536?
Thanks!
-Alex
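As a rough illustration of why a smaller patch size lowers the GPU memory needed per patch, the sketch below tiles one image axis into patches and counts the pixels in a padded patch. It is not Topaz's tiling code; the 4096-pixel axis length and both function names are hypothetical, and the padding of 128 is taken from the command quoted earlier in the thread.

```python
def patch_windows(length, patch_size=2048, padding=128):
    """Tile one axis into patches.

    Returns (core_start, core_stop, read_start, read_stop) tuples, where the
    read window extends the core by `padding` and is clipped to the image.
    """
    windows = []
    for start in range(0, length, patch_size):
        stop = min(start + patch_size, length)
        read_start = max(start - padding, 0)
        read_stop = min(stop + padding, length)
        windows.append((start, stop, read_start, read_stop))
    return windows


def padded_patch_pixels(patch_size, padding):
    """Pixel count of one square patch including padding on both sides."""
    side = patch_size + 2 * padding
    return side * side


windows_2048 = patch_windows(4096, patch_size=2048, padding=128)
windows_1536 = patch_windows(4096, patch_size=1536, padding=128)
pixels_2048 = padded_patch_pixels(2048, 128)
pixels_1536 = padded_patch_pixels(1536, 128)
```

Under these assumptions a padded 1536 patch holds about 60% of the pixels of a padded 2048 patch, at the cost of more patches per image.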
Updated by Sargis Dallakyan over 5 years ago
Hi Alex,
Good find. I've changed the default 'patch size' to 1536. This will be live tomorrow after nightly updates.
Thanks.
Updated by Alex Noble over 5 years ago
Hi Sargis,
Denoising has been working great here! Thank you.
Has there been progress on making early return collections compatible so that the enn images can be denoised?
Thank you,
-Alex
Updated by Sargis Dallakyan over 5 years ago
Hi Alex,
Glad to hear that denoising has been working great! I've made some progress with the early return option, but got sidetracked into other projects recently.
I've read Leginon code and couldn't find any code that would sum and gain correct the frames. This made me think that early return option is something that is implemented in Gatan software.
I asked Anchi where to find code to sum the frames and do gain correction. She directed me to the right place (apDDprocess.py); thanks Anchi. We have code to do this for mrc files but not for tiff files. I recently wrote code to read individual frames from LZW compressed tiff movies (#7713). I'll use that to sum the frames and do gain correction. Hope to have it working by the end of next week, unless something more urgent comes up.
Thank you,
Sargis
Updated by Alex Noble over 5 years ago
Wonderful, thank you Sargis! You're awesome.
Another way would be to use motioncor2 with the flag '-Align 0', and it should just sum and gain correct (I think).
Best,
-Alex
Updated by Sargis Dallakyan over 5 years ago
Added early return option. This will be available tomorrow after nightly updates.
I've used apK2process.GatanK2Processing.correctFrameImage to sum and gain correct the frames. Seems to be working fine.
Updated by Anchi Cheng over 5 years ago
To be consistent with the "-" used in aligned image presets, so that I can stop the denoised preset from being imported into the next Leginon session, I changed the preset name postfix to "-td". I also added a flag in AcquisitionImageData to mark whether an image is denoised. This will help future queries of total session images, etc.
Updated by Sargis Dallakyan over 5 years ago
Finished last step of Topaz picks extraction and committing them to Appion db. For Train and Extract step that require gpu, I've added a new parameter called Queue that defaults to gpu1 to match our queue on SEMC cluster. I've also added ppn=12 because otherwise it might run a job in a gpu node where someone else is already running a gpu job. This is again specific to SEMC cluster.
For Topaz picks extraction, it sets particle diameter as 2*radius*scale which is in pixels. DoG picker, on the other hand, sets particle diameter in Ångstroms.
View picks in multi-assessor doesn't work for Topaz picks but P button in the viewer shows them.
Some of the input/output folders might be better set as read only because changing defaults might break things downstream. It only commits final picks in Appion db; it saves all intermediate steps on a file system.
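The unit difference noted above (Topaz diameter in pixels, DoG picker diameter in Ångstroms) comes down to one multiplication by the pixel size. The sketch below shows that conversion; the radius, scale, and function names are hypothetical, and the pixel size is the value printed in the denoiser log earlier in this thread, assumed here to be in Ångstroms per pixel.

```python
def topaz_diameter_pixels(radius, scale):
    """Particle diameter in pixels, as 2 * radius * scale."""
    return 2 * radius * scale


def pixels_to_angstroms(diameter_px, pixel_size):
    """Convert a diameter in pixels to Angstroms (pixel_size in A/pixel)."""
    return diameter_px * pixel_size


# Hypothetical extraction radius and downsampling scale.
diameter_px = topaz_diameter_pixels(radius=14, scale=4)
diameter_angstroms = pixels_to_angstroms(diameter_px, pixel_size=0.854905)
```

Storing both values, or converting before the database commit, would let Topaz picks and DoG picks be compared on the same scale.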
Updated by Alex Noble over 5 years ago
Thank you Sargis. I have begun testing Topaz specific picking (v0.2.1) in Appion and so far it works as advertised! I have only tested on hole mag images to quickly go through... I will test on some real datasets soon.
Shaker and Swapnil: Tristan has a developmental version of Topaz that includes a general picker. It is currently in the dev branch of the github:
https://github.com/tbepler/topaz/tree/dev
Can you make another Singularity container for the dev branch and link/alias it to 'topaz-dev' so that we can test the general picker on the cluster in real-time as images come off the krioses?
Updated by Alex Noble over 5 years ago
Can the topaz-dev Singularity be updated from the dev branch:
https://github.com/tbepler/topaz/tree/dev
Tristan just pushed a critical update.
Thanks!
Updated by Alex Noble over 5 years ago
Can the topaz-dev Singularity be updated from the dev branch:
https://github.com/tbepler/topaz/tree/dev
Tristan just pushed another update.
Thanks!
Updated by Alex Noble over 5 years ago
- File topaz.html topaz.html added
Could http://emgweb.nysbc.org/topaz.html be updated to the attached html file?
Thanks!