How to add a new refinement method » History » Version 33

Dmitry Lyumkis, 08/01/2011 11:41 PM

h1. How to add a new refinement method

h2. Database architecture for refinement methods

The current database scheme for every refinement method (both single-model and multi-model) is shown below:

"database architecture for refinements":http://emg.nysbc.org/attachments/955/database_scheme_diagram.pdf

For reference, below is a diagram of the modifications made to the refinement pipeline during the refactoring:

"changes to the database architecture for refinements":http://emg.nysbc.org/attachments/954/database_scheme_diagram_changes.pdf

Color coding is as follows:

* database tables / pointers that remained unchanged during refactoring are blue
* database tables that are completely new are outlined AND filled in red
* database tables that already existed but were modified are outlined in red and filled in white; the new additions are highlighted
* new pointers to other database tables are red; unmodified pointers are blue
* pointers to other database tables are all combined under "REFS"; if "REFS" is highlighted, new pointers have been added

h2. How to add a new refinement

# Determine the name of the new table in the database. In most cases there will be only one new table, called "ApYourPackageRefineIterData". Unless there are specific per-particle parameters that you would like to save, this table should contain all of your package-specific parameters.
# Write a job setup script in Python (see example below).
# Write an upload script in Python (see example below). Another option is to write a script that converts your parameters into Appion / 3DEM format (see below) and then upload the result as an external package (see below).

h2. What's being done in the background

The ReconUploader base class takes care of many different functions, specifically:

* general error-checking
* untarring results and creating necessary directories
* querying the database for stack / model parameters
* reading the .pickle file for run parameters or, if it is absent, calling a package-specific parameter-parsing function (which should be in the uploadYourPackageRefine.py script)
* reading the particle data file (the general .txt file that contains all particle parameters)
* determining which iterations can / should be uploaded
* inserting all data associated with refinement tables, including the package-specific parameters; the latter are defined in the subclass
* creating metadata associated with parameters (e.g. Euler plots, Euler jump calculations, FSC insertions, etc.)
* verifying the number of inserted iterations
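
The division of labour listed above can be pictured as a template-method pattern: the base class drives the generic steps and the subclass supplies only the package-specific pieces. The toy sketch below illustrates that idea and nothing more. @ToyReconUploader@, @ToyPackageUploader@, @run()@ and @package_params()@ are hypothetical names invented for this illustration; they do not reproduce the real ReconUploader API.

```python
class ToyReconUploader:
    """Hypothetical stand-in for a base class; NOT the real ReconUploader."""

    package = None  # each subclass is expected to set this

    def run(self, available_iterations):
        # generic error-checking lives in the base class
        if not self.package:
            raise ValueError("subclass must define a package name")
        uploaded = []
        for iteration in sorted(available_iterations):
            # package-specific parameters are supplied by the subclass
            params = self.package_params(iteration)
            uploaded.append((iteration, params))
        # verify that every requested iteration was processed
        if len(uploaded) != len(set(available_iterations)):
            raise RuntimeError("not all iterations were uploaded")
        return uploaded

    def package_params(self, iteration):
        raise NotImplementedError


class ToyPackageUploader(ToyReconUploader):
    package = "YourPackage"

    def package_params(self, iteration):
        # hypothetical parameters, for illustration only
        return {"package": self.package, "iteration": iteration}


result = ToyPackageUploader().run([2, 1, 3])
print(result)
```

In this sketch the subclass never touches the loop or the final check, which mirrors the statement that the package-specific parameters are the only part defined in the subclass.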

h2. Job setup script in Python

NOT DONE YET ... NEED TO TALK TO ANCHI

h2. Upload refinement script in Python

The script should be titled 'uploadYourPackageRefine.py'.

This script performs all of the basic operations needed to upload a refinement to the database so that it can be displayed in AppionWeb. The bulk of the job is performed by the ReconUploader.py base class, which is inherited by each new uploadYourPackageRefine.py subclass script. This means that the developer's job is simply to make sure that all of the particle / package parameters are passed in a specific format. Effectively, the only things that need to be written in this script are:

# Define the basic operations that will be performed: this sets up the basic package parameters and calls the converter functions. The simplest case is the external refinement package uploader, in which only the general refinement parameters are uploaded to the database:

<pre>
NOT DONE YET ...
</pre>

In the single-model refinement case (example: Xmipp projection-matching):
<pre>
def __init__(self):
	###	DEFINE THE NAME OF THE PACKAGE
	self.package = "Xmipp"
	self.multiModelRefinementRun = False
	super(uploadXmippProjectionMatchingRefinementScript, self).__init__()

def start(self):

	### database entry parameters
	package_table = 'ApXmippRefineIterData|xmippParams'

	### set projection-matching path
	self.projmatchpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir']))

	### check for variable root directories between file systems
	apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

	### determine which iterations to upload
	lastiter = self.findLastCompletedIteration()
	uploadIterations = self.verifyUploadIterations(lastiter)

	### upload each iteration
	for iteration in uploadIterations:

		apDisplay.printColor("uploading iteration %d" % iteration, "cyan")

		### set package parameters, as they will appear in database entries
		package_database_object = self.instantiateProjMatchParamsData(iteration)

		### move FSC file to results directory
		oldfscfile = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_resolution.fsc" % iteration)
		newfscfile = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.fsc" % (self.params['timestamp'],iteration))
		if os.path.exists(oldfscfile):
			shutil.copyfile(oldfscfile, newfscfile)

		### create a stack of class averages and reprojections (optional)
		self.compute_stack_of_class_averages_and_reprojections(iteration)

		### create a text file with particle information
		self.createParticleDataFile(iteration)

		### create mrc file of map for iteration and reference number
		oldvol = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_reconstruction.vol" % iteration)
		newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.mrc" % (self.params['timestamp'], iteration))
		mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
		apParam.runCmd(mrccmd, "EMAN")

		### make chimera snapshot of volume
		self.createChimeraVolumeSnapshot(newvol, iteration)

		### instantiate database objects
		self.insertRefinementRunData(iteration)
		self.insertRefinementIterationData(package_table, package_database_object, iteration)

	### calculate Euler jumps
	self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

	### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
	### e.g. {1: [5, 4, 3, 2, 1]} means 5 iters completed for refine 1
	complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=False)
	if self.params['cleanup_files'] is True:
		self.cleanupFiles(complete_refinements)
</pre>
In the multi-model refinement case (example: Xmipp ML3D):
<pre>
def __init__(self):
	###	DEFINE THE NAME OF THE PACKAGE
	self.package = "XmippML3D"
	self.multiModelRefinementRun = True
	super(uploadXmippML3DScript, self).__init__()

def start(self):

	### database entry parameters
	package_table = 'ApXmippML3DRefineIterData|xmippML3DParams'

	### set ml3d path
	self.ml3dpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir'], "RunML3D"))

	### check for variable root directories between file systems
	apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

	### determine which iterations to upload
	lastiter = self.findLastCompletedIteration()
	uploadIterations = self.verifyUploadIterations(lastiter)

	### create ml3d_lib.doc file somewhat of a workaround, but necessary to make projections
	total_num_2d_classes = self.createModifiedLibFile()

	### upload each iteration
	for iteration in uploadIterations:

		### set package parameters, as they will appear in database entries
		package_database_object = self.instantiateML3DParamsData(iteration)

		for j in range(self.runparams['package_params']['NumberOfReferences']):

			### calculate FSC for each iteration using split selfile (selfile requires root directory change)
			self.calculateFSCforIteration(iteration, j+1)

			### create a stack of class averages and reprojections (optional)
			self.compute_stack_of_class_averages_and_reprojections(iteration, j+1)

			### create a text file with particle information
			self.createParticleDataFile(iteration, j+1, total_num_2d_classes)

			### create mrc file of map for iteration and reference number
			oldvol = os.path.join(self.ml3dpath, "ml3d_it%.6d_vol%.6d.vol" % (iteration, j+1))
			newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol%.3d.mrc" % (self.params['timestamp'], iteration, j+1))
			mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
			apParam.runCmd(mrccmd, "EMAN")

			### make chimera snapshot of volume
			self.createChimeraVolumeSnapshot(newvol, iteration, j+1)

			### instantiate database objects
			self.insertRefinementRunData(iteration, j+1)
			self.insertRefinementIterationData(package_table, package_database_object, iteration, j+1)

	### calculate Euler jumps
	self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

	### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
	### e.g. {1: [5, 4, 3, 2, 1], 2: [6, 5, 4, 3, 2, 1]} means 5 iters completed for refine 1 & 6 iters completed for refine 2
	complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=True)
	if self.params['cleanup_files'] is True:
		self.cleanupFiles(complete_refinements)
</pre>
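
Both examples write their results using the same file-name pattern, "recon_<timestamp>_it<NNN>_vol<NNN>.<extension>", and both end by inspecting a dictionary of completed iterations per refinement. The short sketch below only illustrates that pattern and that dictionary shape. @result_filename@ is a hypothetical helper that is not part of Appion, and the sample dictionary simply reuses the values from the code comments.

```python
def result_filename(timestamp, iteration, reference=1, ext="mrc"):
    """Hypothetical helper mirroring the "recon_%s_it%.3d_vol%.3d" pattern."""
    return "recon_%s_it%.3d_vol%.3d.%s" % (timestamp, iteration, reference, ext)


# single-model runs always use reference number 1
fsc_name = result_filename("11jul18z", 1, ext="fsc")
print(fsc_name)

# shape of the dictionary described in the code comments:
# refinement (reference) number -> list of completed iterations
complete_refinements = {1: [5, 4, 3, 2, 1], 2: [6, 5, 4, 3, 2, 1]}
for reference, iterations in sorted(complete_refinements.items()):
    print("refine %d: %d iterations completed" % (reference, len(iterations)))
```

The "%.3d" conversion zero-pads the iteration and reference numbers to three digits, which is why iteration 1 of reference 1 appears as "it001_vol001".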

# Write Python functions that will convert parameters. Examples of these converters can be found in the Python scripts below:

http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippRefine.py (simplest)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippML3DRefine.py (simple multi-model refinement case)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadEMANRefine.py (complicated, due to additional features / add-ons)

Below is a list of the necessary functions; everything else is optional:

* @def __init__()@: defines the name of the package
* @def findLastCompletedIteration()@: finds the last completed iteration in the refinement protocol
* @def instantiateProjMatchParamsData()@: this one is for projection-matching in Xmipp; an equivalent function needs to be written for each package that is added
* @def compute_stack_of_class_averages_and_reprojections()@: creates .img/.hed files that show, for each angular increment, (1) the projection and (2) the class average corresponding to that projection
* @def createParticleDataFile()@: makes a .txt file that puts all parameters in Appion format. The information in this file is read by the ReconUploader.py class and uploaded to the database.
* @def cleanupFiles()@: removes all the redundant or unwanted files that were created during the refinement procedure
* (optional) @def some_function_for_computing_FSC_into_standard_format()@: called in start(). It should only be written if the FSC file is not in the specified format.
* (optional) @def parseFileForRunParameters()@: this is a BACKUP. It parses the output files created by the refinement to determine the parameters that were specified. It is only needed if the parameters were not found in the .pickle file created during the job setup.
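
As an illustration of the kind of function a package has to provide, here is a minimal sketch of how a "last completed iteration" search might look. It is not the implementation in uploadXmippRefine.py. It assumes the Xmipp projection-matching layout used in the example above ("Iter_N/Iter_N_reconstruction.vol"), and it assumes that an iteration counts as completed once its reconstructed volume exists. The function name and the temporary directory used for the self-check are hypothetical.

```python
import os
import re
import tempfile


def find_last_completed_iteration(projmatchpath):
    """Return the highest N for which Iter_N/Iter_N_reconstruction.vol exists, or 0."""
    last = 0
    for name in os.listdir(projmatchpath):
        match = re.fullmatch(r"Iter_(\d+)", name)
        if match is None:
            continue  # not an iteration directory
        iteration = int(match.group(1))
        volume = os.path.join(projmatchpath, name, "Iter_%d_reconstruction.vol" % iteration)
        if os.path.isfile(volume) and iteration > last:
            last = iteration
    return last


# small self-check on a temporary directory tree
with tempfile.TemporaryDirectory() as workdir:
    for i in (1, 2, 3):
        os.makedirs(os.path.join(workdir, "Iter_%d" % i))
    for i in (1, 2):  # iteration 3 was started but produced no volume
        path = os.path.join(workdir, "Iter_%d" % i, "Iter_%d_reconstruction.vol" % i)
        open(path, "w").close()
    last_iteration = find_last_completed_iteration(workdir)

print(last_iteration)
```

Other packages will need a different completion criterion, for example the presence of a final log line or of a per-iteration parameter file.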

h2. Appion parameter format

In order to use the ReconUploader.py base class to upload all parameters associated with the refinement, the following files must exist:

# an "FSC file":http://emg.nysbc.org/attachments/964/recon_11jul18z_it001_vol001.fsc. Lines that should not be read must begin with a "#". Otherwise, the first column must contain the spatial frequency in inverse pixels, and the second column must contain the Fourier shell correlation for that spatial frequency. You can have as many additional columns as you like, but they will be skipped by ReconUploader.py
# .img/.hed files describing projections from the model and class averages belonging to those Euler angles. The format is as follows: image 1 - projection 1, image 2 - class average 1, image 3 - projection 2, image 4 - class average 2, etc.; see below: !projections_and_averages.png!
# the 3D volume in MRC format
# a text file describing the parameters for each particle. NOTE: PARTICLE NUMBERING STARTS WITH 1, NOT 0. An "example file":http://emg.nysbc.org/attachments/963/particle_data_11jul18z_it001_vol001.txt is attached. The columns are as follows:
## particle number - starts with 1!!!
## phi Euler angle - rotation Euler angle around Z, in degrees
## theta Euler angle - rotation Euler angle around new Y, in degrees
## omega Euler angle - rotation Euler angle around new Z (in-plane rotation), in degrees
## shiftx - in pixels
## shifty - in pixels
## mirror - specify 1 if the particle is mirrored, 0 otherwise. If mirrors are NOT handled in the package and are instead represented by different Euler angles, leave as 0
## 3D reference # - 1, 2, 3, etc. Use 1 for the single-model refinement case
## 2D class # - the number of the class to which the particle belongs. Leave as 0 if classes are not defined
## quality factor - leave as 0 if not defined
## kept particle - specifies whether or not the particle was discarded during the reconstruction routine. If it was KEPT, specify 1; if it was DISCARDED, specify 0. If all particles are kept, all should have a 1.
## post Refine kept particle (optional) - in most cases just leave as 1 for all particles
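
The two text formats above can be sketched in a few lines of Python. This is a hedged illustration, not the parser used by ReconUploader.py: the function names, the short column names, the whitespace separator and the number formatting are all assumptions, so the attached example files remain the reference. The default values follow the column descriptions above (mirror 0, 3D reference 1, 2D class 0, quality factor 0, kept 1, post-refine kept 1).

```python
import io

# column order as listed above; the short names are hypothetical
PARTICLE_COLUMNS = (
    "particle", "phi", "theta", "omega", "shiftx", "shifty",
    "mirror", "reference", "class2d", "quality", "kept", "post_refine_kept",
)


def read_fsc(handle):
    """Return (spatial frequency in inverse pixels, FSC) pairs, skipping '#' lines."""
    points = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # only the first two columns are used; extra columns are ignored
        points.append((float(fields[0]), float(fields[1])))
    return points


def format_particle_line(**values):
    """Format one particle record as a whitespace-separated line (assumed separator)."""
    if values["particle"] < 1:
        raise ValueError("particle numbering starts with 1, not 0")
    row = dict(mirror=0, reference=1, class2d=0, quality=0, kept=1, post_refine_kept=1)
    row.update(values)
    return " ".join(str(row[name]) for name in PARTICLE_COLUMNS)


fsc_points = read_fsc(io.StringIO("# freq fsc extra\n0.01 0.99 7\n0.02 0.95 7\n"))
particle_line = format_particle_line(
    particle=1, phi=10.0, theta=20.0, omega=30.0, shiftx=1.5, shifty=-2.0
)
print(fsc_points)
print(particle_line)
```

Rejecting particle number 0 up front reflects the note that numbering starts with 1; a converter for a package that counts from 0 would add 1 before writing each line.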