Feature #3971

open

Add RELION 2D alignment and classification

Added by Neil Voss over 8 years ago. Updated almost 8 years ago.

Status: In Test
Priority: Normal
Assignee: Carl Negro
Category: -
Target version: -
Start date: 02/19/2016
Due date:
% Done: 0%
Estimated time:


Related issues 2 (1 open, 1 closed)

Related to Appion - Bug #4216: add function to get box size with EMAN1 (In Test, Carl Negro, 05/26/2016)
Related to Appion - Bug #4396: Relion 2d alignment fails on upload with empty classes (Closed, Neil Voss, 08/18/2016)

Actions #1

Updated by Neil Voss about 8 years ago

Reference for program:
http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Calculate_2D_class_averages

Installed RELION inside docker
Needed to use FLTK v1.3.0 (CentOS only had v1.1.10)
Native FFTW 3.2.1 (RELION came with 3.2.2)

Annoyed that I had to open the GUI to figure out what program to run. Appears to use relion_refine as its main program.

+++ RELION: command line arguments (with defaults for optional ones between parantheses) +++
====== General options ===== 
                                --i : Input images (in a star-file or a stack)
                                --o : Output rootname
                           --angpix : Pixel size (in Angstroms)
                        --iter (50) : Maximum number of iterations to perform
                   --tau2_fudge (1) : Regularisation parameter (values higher than 1 give more weight to the data)
                            --K (1) : Number of references to be refined
           --particle_diameter (-1) : Diameter of the circular mask that will be applied to the experimental images (in Angstroms)
                --zero_mask (false) : Mask surrounding background in particles to zero (by default the solvent area is filled with random noise)
          --flatten_solvent (false) : Perform masking on the references as well?
              --solvent_mask (None) : User-provided mask for the references (default is to use spherical mask with particle_diameter)
             --solvent_mask2 (None) : User-provided secondary mask (with its own average density)
                       --tau (None) : STAR file with input tau2-spectrum (to be kept constant)
      --split_random_halves (false) : Refine two random halves of the data completely separately
       --low_resol_join_halves (-1) : Resolution (in Angstrom) up to which the two random half-reconstructions will not be independent to prevent diverging orientations
====== Initialisation ===== 
                       --ref (None) : Image, stack or star-file with the reference(s). (Compulsory for 3D refinement!)
                       --offset (3) : Initial estimated stddev for the origin offsets
             --firstiter_cc (false) : Perform CC-calculation in the first iteration (use this if references are not on the absolute intensity scale)
                    --ini_high (-1) : Resolution (in Angstroms) to which to limit refinement in the first iteration 
====== Orientations ===== 
                 --oversampling (1) : Adaptive oversampling order to speed-up calculations (0=no oversampling, 1=2x, 2=4x, etc)
                --healpix_order (2) : Healpix order for the angular sampling (before oversampling) on the (3D) sphere: hp2=15deg, hp3=7.5deg, etc
                    --psi_step (-1) : Sampling rate (before oversampling) for the in-plane angle (default=10deg for 2D, hp sampling for 3D)
                 --limit_tilt (-91) : Limited tilt angle: positive for keeping side views, negative for keeping top views
                         --sym (c1) : Symmetry group
                 --offset_range (6) : Search range for origin offsets (in pixels)
                  --offset_step (2) : Sampling rate (before oversampling) for origin offsets (in pixels)
                    --perturb (0.5) : Perturbation factor for the angular sampling (0=no perturb; 0.5=perturb)
              --auto_refine (false) : Perform 3D auto-refine procedure?
     --auto_local_healpix_order (4) : Minimum healpix order (before oversampling) from which autosampling procedure will use local searches
                   --sigma_ang (-1) : Stddev on all three Euler angles for local angular searches (of +/- 3 stddev)
                   --sigma_rot (-1) : Stddev on the first Euler angle for local angular searches (of +/- 3 stddev)
                  --sigma_tilt (-1) : Stddev on the second Euler angle for local angular searches (of +/- 3 stddev)
                   --sigma_psi (-1) : Stddev on the in-plane angle for local angular searches (of +/- 3 stddev)
               --skip_align (false) : Skip orientational assignment (only classify)?
              --skip_rotate (false) : Skip rotational assignment (only translate and classify)?
====== Corrections ===== 
                      --ctf (false) : Perform CTF correction?
    --ctf_intact_first_peak (false) : Ignore CTFs until their first peak?
        --ctf_corrected_ref (false) : Have the input references been CTF-amplitude corrected?
        --ctf_phase_flipped (false) : Have the data been CTF phase-flipped?
         --only_flip_phases (false) : Only perform CTF phase-flipping? (default is full amplitude-correction)
                     --norm (false) : Perform normalisation-error correction?
                    --scale (false) : Perform intensity-scale corrections on image groups?
====== Computation ===== 
                            --j (1) : Number of threads to run in parallel (only useful on multi-core machines)
            --memory_per_thread (2) : Available RAM (in Gb) for each thread
  --dont_combine_weights_via_disc (false) : Send the large arrays of summed weights through the MPI network, instead of writing large files to disc
          --onthefly_shifts (false) : Calculate shifted images on-the-fly, do not store precalculated ones in memory
      --no_parallel_disc_io (false) : Do NOT let parallel (MPI) processes access the disc simultaneously (use this option with NFS)
           --preread_images (false) : Use this to let the master process read all particles into memory. Be careful you have enough RAM for large data sets!
====== Expert options ===== 
                          --pad (2) : Oversampling factor for the Fourier transforms of the references
                       --NN (false) : Perform nearest-neighbour instead of linear Fourier-space interpolation?
                    --r_min_nn (10) : Minimum number of Fourier shells to perform linear Fourier-space interpolation
                         --verb (1) : Verbosity (1=normal, 0=silent)
                 --random_seed (-1) : Number for the random seed generator
                 --coarse_size (-1) : Maximum image size for the first pass of the adaptive sampling approach
        --adaptive_fraction (0.999) : Fraction of the weights to be considered in the first pass of adaptive oversampling 
                     --maskedge (5) : Width of the soft edge of the spherical mask (in pixels)
          --fix_sigma_noise (false) : Fix the experimental noise spectra?
         --fix_sigma_offset (false) : Fix the stddev in the origin offsets?
                   --incr_size (10) : Number of Fourier shells beyond the current resolution to be included in refinement
    --print_metadata_labels (false) : Print a table with definitions of all metadata labels, and exit
       --print_symmetry_ops (false) : Print all symmetry transformation matrices, and exit
          --strict_highres_exp (-1) : Resolution limit (in Angstrom) to restrict probability calculations in the expectation step
          --dont_check_norm (false) : Skip the check whether the images are normalised correctly
                --always_cc (false) : Perform CC-calculation in all iterations (useful for faster denovo model generation?)
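
For reference, a 2D classification call can be assembled from the options in this listing. The following is a minimal sketch in Python, not the Appion implementation: the helper name `build_relion_2d_command` and every example value (file names, pixel size, class count, diameter) are hypothetical, and only flags that appear in the listing are used. A real run may well need further options.

```python
def build_relion_2d_command(star_file, out_root, angpix, num_classes,
                            diameter_angstrom, iterations=50, threads=1,
                            ctf=False):
    """Assemble a relion_refine argument list for 2D classification.

    Hypothetical helper: it only strings together flags from the usage
    listing and does not run anything.
    """
    cmd = [
        "relion_refine",
        "--i", star_file,                               # input star file or stack
        "--o", out_root,                                # output rootname
        "--angpix", str(angpix),                        # pixel size in Angstroms
        "--iter", str(iterations),                      # maximum number of iterations
        "--K", str(num_classes),                        # number of references (classes)
        "--particle_diameter", str(diameter_angstrom),  # circular mask diameter
        "--j", str(threads),                            # threads per process
    ]
    if ctf:
        cmd.append("--ctf")  # boolean switch, off by default
    return cmd


if __name__ == "__main__":
    # Example values are made up for illustration only.
    print(" ".join(build_relion_2d_command(
        "particles.star", "run1/part", 1.63, 8, 200, iterations=30)))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess` without shell quoting.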

Actions #2

Updated by Neil Voss about 8 years ago

Alright I thought I would move the conversation here.

Hi Carl. Why do we need SO MANY cluster parameters? Xmipp maximum likelihood has 1 cluster parameter; RELION has 9+ cluster parameters (attached). They are basically the same program. Do we really need this fine control? How can we streamline this?

Actions #3

Updated by Carl Negro about 8 years ago

Hi Neil,

We found that with certain jobs relionmaxlike would break or hang forever unless we micromanaged the number of processors and memory. Not sure how we could get around this. We can probably set the number of threads to one for all jobs. The number of MPI nodes is the same as the number of nodes in the cluster parameters, but I couldn't figure out how to pass this value over properly.

Actions #4

Updated by Neil Voss about 8 years ago

Maybe we could use the boxsize and the number of classes (or other parameters) to calculate the needed memory. I will check the documentation.
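
One possible starting point, pending that documentation check, is a crude lower bound from the raw image data alone. This is only a sketch under stated assumptions: it is not RELION's own memory estimate, the function name `raw_data_gb` is hypothetical, and it assumes square images stored as 4-byte floats. The real requirement will be higher once sampling and Fourier padding are accounted for.

```python
def raw_data_gb(num_particles, boxsize, num_classes, bytes_per_pixel=4):
    """Lower bound, in GB (1 GB = 1024**3 bytes), on the memory needed to
    hold the particle images plus the class references.

    Assumption: square boxsize x boxsize images, 4 bytes per pixel.
    This is NOT how RELION estimates memory; it only bounds it from below.
    """
    pixels_per_image = boxsize * boxsize
    total_bytes = (num_particles + num_classes) * pixels_per_image * bytes_per_pixel
    return total_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # Made-up example: 50,000 particles, 128 pixel box, 50 classes.
    print("%.3f GB" % raw_data_gb(50000, 128, 50))
```

A bound like this could at least be used to reject requests that obviously cannot fit in the memory given per thread.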

Actions #5

Updated by Neil Voss about 8 years ago

The uploader is working, but there is currently no way to create the command from the web. I want to dig into the databases to look at how the information is stored.

The easiest way to test is to go to the alignment directory and give it the project number:

cd /emg/data/appion/06jul12a/align/maxlike1
uploadRelion2DMaxlikeAlign.py --commit --projectid=1

Actions #6

Updated by Neil Voss about 8 years ago

Hi Anchi, I was looking at the database tables and I am very confused.

Amber created this file 'checkAlignJobs.php' and it looks like it was for SPARX ISAC, but I dunno.

How do we upload the data from Xmipp CL2D? Can we upload it? Or does it automatically upload after the job is done? Is Xmipp Maximum likelihood the only alignment where alignment and uploading are separated? The alignment section seems pretty broken.

Actions #7

Updated by Anchi Cheng about 8 years ago

I asked the users. Xmipp CL2D is uploaded automatically without a separate upload step. They said only Xmipp Max Likelihood runs need a separate upload, but neither of them uses ISAC.

Actions #8

Updated by Neil Voss about 8 years ago

Thanks for checking. So, I guess I have a philosophy question. It is easier to just make them upload after finishing, but in the past if the alignment runs for more than 3 days (or whatever is set in the mysql config) then python loses the database connection and cannot upload.

Should we assume RELION 2d alignment will take more than 3 days and do a separate upload, or should I just plug them together into one file?

Actions #9

Updated by Neil Voss about 8 years ago

Or do you see any reason why, at the end of the run, the python program could not launch the upload script using subprocess.Popen? I could change Xmipp maxlike to do this too.

Actions #10

Updated by Bridget Carragher about 8 years ago

Can we have it so that it automatically uploads if it can and if not offers the user the option of the upload?
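
A minimal sketch of that behavior, assuming the upload script is launched as a separate process with subprocess.Popen and the user is shown the manual command if it fails. The function names `try_upload` and `finish_run` are hypothetical and this is not the code that was committed. Because the upload runs in a new process, it would presumably open its own database connection rather than reuse one that timed out during a long alignment.

```python
import subprocess


def try_upload(upload_cmd):
    """Run the upload command; return True on success, False otherwise."""
    try:
        proc = subprocess.Popen(upload_cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        proc.communicate()  # wait for the upload to finish
    except OSError as err:
        # Raised when the script cannot be started at all (e.g. not on PATH).
        print("could not start upload: %s" % err)
        return False
    if proc.returncode != 0:
        print("upload exited with code %d" % proc.returncode)
        return False
    return True


def finish_run(upload_cmd):
    """Upload automatically if possible, otherwise tell the user how to."""
    if try_upload(upload_cmd):
        print("upload finished")
        return True
    print("automatic upload failed; to upload manually, run:")
    print("  " + " ".join(upload_cmd))
    return False


if __name__ == "__main__":
    # Command taken from the test instructions earlier in this thread.
    finish_run(["uploadRelion2DMaxlikeAlign.py", "--commit", "--projectid=1"])
```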

Actions #11

Updated by Neil Voss about 8 years ago

  • Status changed from Assigned to In Test
  • Assignee changed from Neil Voss to Carl Negro

I am calling this done. Need a tester.

Actions #12

Updated by Carl Negro about 8 years ago

I get the following error when running uploadRelion2DMaxlikeAlign.py:

=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 66816
OrientationalSampling= 2.5 NrOrientations= 576
TranslationalSampling= 1 NrTranslations= 116
=============================
Estimated memory for expectation step > 0.101925 Gb, available memory = 2 Gb.
Estimated memory for maximization step > 0.000118688 Gb, available memory = 2 Gb.
Expectation iteration 30 of 30
0/ 0 sec ............................................................~~(,_,">
Maximization ...
0/ 0 sec ............................................................~~(,_,">
... Sorting files into clean folders
... Sorted 155 iteration files
... Sorted 154 reference files
Reading star format file: ref16may26m53_final_data.star
Looking for Data Block named data_images...
Found Data Block: data_images
001 -- 36.8 - 36.803607
002 - 23.2 -- 23.196393
003 -- 119.3 - 119.303607
004 - 124.3 - 124.303607
005 - 30.7 -- 30.696393
006 -- 111.8 - 111.803607
007 - 74.3 - 74.303607
008 - 126.8 - 126.803607
... read rotation and shift parameters for 8 references
Reading star format file: part16may26m53_final_data.star
Looking for Data Block named data_images...
Found Data Block: data_images
001 - 112.7 - 143.352276
002 - 85.9 - 38.409389
003 -- 92.6 -- 61.894234
004 -- 61.9 -- 178.807379
005 - 119.1 -- 88.409389
006 -- 125.9 - 156.590611
007 - 24.9 - 48.105766
008 - 74.9 - 105.605766
009 - 43.4 - -74.090611
... read rotation and shift parameters for 3969 particles
... rotating and shifting particles at Thu May 26 12:59:47 2016
........................................
... writing aligned particles to file alignstack3969.hed
... 3969 particles in alignstack3969.img (11.9 MB)
... found 3969 particles
... size match 11.9 MB vs. 11.9 MB
... alignstack3969.hed (3969 kB)
... wrote 3969 particles to file alignstack.hed
... size match 3.9 MB vs. 3.9 MB
... finished stack merge of alignstack.hed in 181.69 msec
... rotated and shifted 3969 particles in 5.18 sec
/bin/sh: iminfo: command not found
Traceback (most recent call last):
  File "/home/cnegro/myami-trunk/appion/bin/uploadRelion2DMaxlikeAlign.py", line 537, in <module>
    maxLike.start()
  File "/home/cnegro/myami-trunk/appion/bin/uploadRelion2DMaxlikeAlign.py", line 521, in start
    apStack.averageStack(alignimagicfile, msg=False)
  File "/home/cnegro/myami-trunk/appion/appionlib/apStack.py", line 332, in averageStack
    avgStack.start(stackfile, partlist)
  File "/home/cnegro/myami-trunk/appion/appionlib/apImagicFile.py", line 1016, in start
    self.processStack(stackarray)
  File "/home/cnegro/myami-trunk/appion/appionlib/apStack.py", line 355, in processStack
    self.average += stackarray.sum(0)
ValueError: invalid return array shape

Actions #13

Updated by Neil Voss about 8 years ago

  • Related to Bug #4216: add function to get box size with EMAN1 added
Actions #14

Updated by Neil Voss about 8 years ago

First, I have no idea why it would crash there.

I did find this interesting: "/bin/sh: iminfo: command not found". Why is iminfo being called? apFile.getBoxSize() uses EMAN1 to get the box size.

So it could be dying on that command.
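
One way to surface this kind of problem early would be to check for required external programs before the upload starts. The following is a sketch only: the helper name `missing_executables` is hypothetical, and `shutil.which` requires Python 3.3 or later, so an older interpreter would need an equivalent PATH search.

```python
import shutil


def missing_executables(names):
    """Return the subset of program names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # iminfo is the EMAN1 program named in the error message above.
    missing = missing_executables(["iminfo"])
    if missing:
        print("missing required programs: " + ", ".join(missing))
```

Failing with a clear message at startup is easier to diagnose than a shell error buried in the log followed by an unrelated traceback.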

Actions #15

Updated by Bridget Carragher about 8 years ago

I am happy to test and have done so. It is cool, but there are a few bugs for sure. See:
#4229
in which I can't upload, so I can't check whether any of the metadata tracking is working.
There are also some issues with the doc pop-up meanings and defaults. The worst one is that the diameter default is wrong, which means that only the very center of the images is focused on. I think the diameter default should either be something that was entered earlier or be based on the box size.
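
A box-size-based default could look like the sketch below. This is an illustration under assumptions, not the fix that was applied: the function name `default_particle_diameter` and the 0.8 fraction are hypothetical choices. The result is in Angstroms, matching the units of the --particle_diameter option.

```python
def default_particle_diameter(boxsize_px, angpix, fraction=0.8):
    """Return a default mask diameter in Angstroms derived from the box size.

    boxsize_px: box edge length in pixels
    angpix:     pixel size in Angstroms per pixel
    fraction:   share of the box covered by the mask (hypothetical default)
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return boxsize_px * angpix * fraction


if __name__ == "__main__":
    # Made-up example: 128 pixel box at 1.5 Angstroms per pixel.
    print(default_particle_diameter(128, 1.5))
```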

Actions #16

Updated by Carl Negro almost 8 years ago

I added a couple of parameters, invert and normalization error checking, and cleaned up the web interface and pop up help files. The uploader is working as intended.

Actions #17

Updated by Neil Voss almost 8 years ago

Why do we need invert? I thought we force the white particles idea.

Actions #18

Updated by Neil Voss almost 8 years ago

  • Related to Bug #4396: Relion 2d alignment fails on upload with empty classes added