h1. How to add a new refinement method

h2. Database architecture for refinement methods

The current database scheme for every refinement method (both single-model and multi-model) is shown below:
"database architecture for refinements":http://emg.nysbc.org/attachments/955/database_scheme_diagram.pdf

For reference, below is a diagram of the modifications made to the refinement pipeline during the refactoring. The color coding is as follows:
"changes to the database architecture for refinements":http://emg.nysbc.org/attachments/954/database_scheme_diagram_changes.pdf

* all pre-existing database tables / pointers that remained unchanged during the refactoring are blue
* database tables that are completely new are outlined AND filled in red
* database tables that already existed but were modified are outlined in red and filled in white; the new additions are highlighted
* new pointers to other database tables are red; unmodified pointers are blue
* pointers to other database tables are all combined under "REFS"; if "REFS" is highlighted, this means that new pointers have been added

h2. How to add a new refinement

# determine the name of the new table in the database. In most cases, this will simply be called "ApYourPackageRefineIterData". Unless there are specific per-particle parameters that you would like to save, this table should contain all of your package-specific parameters.
# write a refinement preparation script in python (see example below).
# write the refinement job creation script in python (see example below).
# write an upload script in python (see example below). Alternatively, write a script that converts your parameters into Appion / 3DEM format (see below), then upload them as an external package (see below).

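For orientation, the scripts described in the sections below live in the following locations; the "YourPackage" names here are only placeholders for whatever your method is called:

<pre>
myami/appion/bin/prepRefineYourPackage.py          (refinement preparation script)
myami/appion/appionlib/apRefineJobYourPackage.py   (refinement job creation script)
myami/appion/bin/uploadYourPackageRefine.py        (upload script)
</pre>
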
h2. What's being done in the background

The ReconUploader base class takes care of many different functions, specifically:

* general error-checking
* untarring results and creating the necessary directories
* querying the database for stack / model parameters
* reading the .pickle file for run parameters or, if it is absent, calling a package-specific parameter-parsing function (which should be in the uploadYourPackageRefine.py script)
* reading the particle data file (the general .txt file that contains all particle parameters)
* determining which iterations can / should be uploaded
* inserting all data associated with the refinement tables, including the package-specific parameters; the latter are defined in the subclass
* creating metadata associated with the parameters (e.g. Euler plots, Euler jumper calculations, FSC insertions, etc.)
* verifying the number of inserted iterations

h2. Write refinement preparation script in python

* In /myami/appion/bin, create a new python module with a name starting with prepRefine and define in your module a subclass of Prep3DRefinement, found in appionlib/apPrepRefine.py. A working example is in myami/appion/bin/PrepRefine.py; you may copy that as a start. A minimal skeleton is also sketched after this list.
* The required function definitions are setRefineMethod(), setFormat(), addStackToSend(), and addModelToSend(). Please see the base class code for what variables they take.
* If there are options that need to be defined for the refinement method that are not in the base class, you may redefine setupParserOptions(). See /myami/appion/bin/prepRefineFrealign.py for an example of this.

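Below is a minimal, hypothetical sketch of such a module. Everything other than the base class name and the four required methods (taken from the list above) is an assumption; consult appionlib/apPrepRefine.py for the real argument lists and attributes.

<pre>
#!/usr/bin/env python
# prepRefineYourPackage.py -- hypothetical skeleton, not the actual API
from appionlib import apPrepRefine

class PrepYourPackageRefine(apPrepRefine.Prep3DRefinement):
    def setRefineMethod(self):
        # tell the pipeline which refinement method this prepares
        # (the attribute name used here is an assumption)
        self.refinemethod = 'yourpackagerecon'

    def setFormat(self):
        # declare the stack / model formats your package expects;
        # see the base class for the attributes it actually reads
        pass

    def addStackToSend(self, *args):
        # register the prepared particle stack file(s) to send to the cluster;
        # consult the base class for the expected arguments
        pass

    def addModelToSend(self, *args):
        # register the initial model file(s) to send to the cluster;
        # consult the base class for the expected arguments
        pass

if __name__ == '__main__':
    # typical Appion script invocation pattern (assumed to apply here as well)
    app = PrepYourPackageRefine()
    app.start()
    app.close()
</pre>
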
h2. Write refinement job script in python

* In /myami/appion/appionlib, create a new module with a name starting with apRefineJob and define in your module a subclass of RefineJob, found in appionlib/apRefineJob.py. Working examples are myami/appion/appionlib/apRefineJobEman.py and apRefineJobFrealign.py (which uses an MPI setup). A bare skeleton is sketched after this list.
* For testing, use /myami/appion/bin/testRefine.py in place of runJob.py when constructing the appion script command with all options. When this script runs successfully, it only prints out the list of commands without doing any real job creation or running.

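The sketch below shows only the shell of such a module; which methods actually need to be overridden is defined by the RefineJob base class, so treat this as a placeholder and follow apRefineJobEman.py / apRefineJobFrealign.py for the details.

<pre>
# apRefineJobYourPackage.py -- hypothetical skeleton only
from appionlib import apRefineJob

class YourPackageRefineJob(apRefineJob.RefineJob):
    # override the command-construction methods required by the base class
    # to build the package-specific refinement commands for each iteration
    pass
</pre>
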
h3. Add job type to Agent.

After you have added the new refinement method's job class, it needs to be added to the job-running agent by editing the file apAgent.py in appionlib.

# Add the name of the module you created to the import statements at the top of the file.
# In the method _createJobInst_ add the new refinement job type to the conditional statements.
<pre>
  Ex.
  elif "newJobType" == jobType:
      jobInstance = newModuleName.NewRefinementClass(command)
</pre>

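Putting both steps together, a hypothetical edit to apAgent.py (the module, class, and job-type names below are placeholders) might look like this:

<pre>
# at the top of apAgent.py, with the other appionlib imports
from appionlib import apRefineJobYourPackage

# inside createJobInst(), alongside the existing job types
elif "yourpackagerecon" == jobType:
    jobInstance = apRefineJobYourPackage.YourPackageRefineJob(command)
</pre>
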
h2. Upload refinement script in python

The script should be titled 'uploadYourPackageRefine.py'.

This script performs all of the basic operations needed to upload a refinement to the database so that it can be displayed in AppionWeb. The bulk of the work is done by the ReconUploader.py base class, which is inherited by each new uploadYourPackageRefine.py subclass script. This means that the developer's job is simply to make sure that all of the particle / package parameters are passed in a specific format. Effectively, the only things that need to be written into this script are:

# define the basic operations that will be performed: this sets up the basic package parameters and calls the converter functions. The simplest case is the external refinement package uploader, in which only the general refinement parameters are uploaded to the database:

<pre>
def __init__(self):
    ### DEFINE THE NAME OF THE PACKAGE
    self.package = "external_package"
    super(uploadExternalPackageScript, self).__init__()

#=====================
def start(self):

    ### determine which iterations to upload; last iter is defaulted to infinity
    uploadIterations = self.verifyUploadIterations()

    ### upload each iteration
    for iteration in uploadIterations:
        for j in range(self.runparams['numberOfReferences']):

            ### general error checking, these are the minimum files that are needed
            vol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol%.3d.mrc" % (self.params['timestamp'], iteration, j+1))
            particledatafile = os.path.join(self.resultspath, "particle_data_%s_it%.3d_vol%.3d.txt" % (self.params['timestamp'], iteration, j+1))
            if not os.path.isfile(vol):
                apDisplay.printError("you must have an mrc volume file in the 'external_package_results' directory")
            if not os.path.isfile(particledatafile):
                apDisplay.printError("you must have a particle data file in the 'external_package_results' directory")

            ### make chimera snapshot of volume
            self.createChimeraVolumeSnapshot(vol, iteration, j+1)

            ### instantiate database objects
            self.insertRefinementRunData(iteration, j+1)
            self.insertRefinementIterationData(iteration, j+1)

    ### calculate Euler jumps
    self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)
</pre>

In the single-model refinement case (example Xmipp projection-matching):
<pre>
def __init__(self):
    ### DEFINE THE NAME OF THE PACKAGE
    self.package = "Xmipp"
    self.multiModelRefinementRun = False
    super(uploadXmippProjectionMatchingRefinementScript, self).__init__()

def start(self):

    ### database entry parameters
    package_table = 'ApXmippRefineIterData|xmippParams'

    ### set projection-matching path
    self.projmatchpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir']))

    ### check for variable root directories between file systems
    apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

    ### determine which iterations to upload
    lastiter = self.findLastCompletedIteration()
    uploadIterations = self.verifyUploadIterations(lastiter)

    ### upload each iteration
    for iteration in uploadIterations:

        apDisplay.printColor("uploading iteration %d" % iteration, "cyan")

        ### set package parameters, as they will appear in database entries
        package_database_object = self.instantiateProjMatchParamsData(iteration)

        ### move FSC file to results directory
        oldfscfile = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_resolution.fsc" % iteration)
        newfscfile = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.fsc" % (self.params['timestamp'], iteration))
        if os.path.exists(oldfscfile):
            shutil.copyfile(oldfscfile, newfscfile)

        ### create a stack of class averages and reprojections (optional)
        self.compute_stack_of_class_averages_and_reprojections(iteration)

        ### create a text file with particle information
        self.createParticleDataFile(iteration)

        ### create mrc file of map for iteration and reference number
        oldvol = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_reconstruction.vol" % iteration)
        newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.mrc" % (self.params['timestamp'], iteration))
        mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
        apParam.runCmd(mrccmd, "EMAN")

        ### make chimera snapshot of volume
        self.createChimeraVolumeSnapshot(newvol, iteration)

        ### instantiate database objects
        self.insertRefinementRunData(iteration)
        self.insertRefinementIterationData(package_table, package_database_object, iteration)

    ### calculate Euler jumps
    self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

    ### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
    ### e.g. {1: [5, 4, 3, 2, 1]} means 5 iters completed for refine 1
    complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=False)
    if self.params['cleanup_files'] is True:
        self.cleanupFiles(complete_refinements)
</pre>

In the multi-model refinement case (example Xmipp ML3D):
<pre>
def __init__(self):
    ### DEFINE THE NAME OF THE PACKAGE
    self.package = "XmippML3D"
    self.multiModelRefinementRun = True
    super(uploadXmippML3DScript, self).__init__()

def start(self):

    ### database entry parameters
    package_table = 'ApXmippML3DRefineIterData|xmippML3DParams'

    ### set ml3d path
    self.ml3dpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir'], "RunML3D"))

    ### check for variable root directories between file systems
    apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

    ### determine which iterations to upload
    lastiter = self.findLastCompletedIteration()
    uploadIterations = self.verifyUploadIterations(lastiter)

    ### create ml3d_lib.doc file: somewhat of a workaround, but necessary to make projections
    total_num_2d_classes = self.createModifiedLibFile()

    ### upload each iteration
    for iteration in uploadIterations:

        ### set package parameters, as they will appear in database entries
        package_database_object = self.instantiateML3DParamsData(iteration)

        for j in range(self.runparams['package_params']['NumberOfReferences']):

            ### calculate FSC for each iteration using split selfile (selfile requires root directory change)
            self.calculateFSCforIteration(iteration, j+1)

            ### create a stack of class averages and reprojections (optional)
            self.compute_stack_of_class_averages_and_reprojections(iteration, j+1)

            ### create a text file with particle information
            self.createParticleDataFile(iteration, j+1, total_num_2d_classes)

            ### create mrc file of map for iteration and reference number
            oldvol = os.path.join(self.ml3dpath, "ml3d_it%.6d_vol%.6d.vol" % (iteration, j+1))
            newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol%.3d.mrc" % (self.params['timestamp'], iteration, j+1))
            mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
            apParam.runCmd(mrccmd, "EMAN")

            ### make chimera snapshot of volume
            self.createChimeraVolumeSnapshot(newvol, iteration, j+1)

            ### instantiate database objects
            self.insertRefinementRunData(iteration, j+1)
            self.insertRefinementIterationData(package_table, package_database_object, iteration, j+1)

    ### calculate Euler jumps
    self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

    ### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
    ### e.g. {1: [5, 4, 3, 2, 1], 2: [6, 5, 4, 3, 2, 1]} means 5 iters completed for refine 1 & 6 iters completed for refine 2
    complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=True)
    if self.params['cleanup_files'] is True:
        self.cleanupFiles(complete_refinements)
</pre>

# write python functions that will convert parameters. Examples of these converters can be found in the python scripts below:
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippRefine.py (simplest)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippML3DRefine.py (simple multi-model refinement case)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadEMANRefine.py (complicated, due to additional features / add-ons)

Below is a list of the necessary functions; everything else is optional:

* def __init__(): defines the name of the package
* def findLastCompletedIteration(): finds the last completed iteration in the refinement protocol
* def instantiateProjMatchParamsData(): this is for projection-matching in Xmipp; it needs to be specific to each package that is added
* def compute_stack_of_class_averages_and_reprojections(): creates .img/.hed files that show, for each angular increment, (1) the projection and (2) the class average corresponding to that projection
* def createParticleDataFile(): makes a .txt file that puts all parameters into Appion format. The information in this file is read by the ReconUploader.py class and uploaded to the database.
* def cleanupFiles(): removes all of the redundant or unwanted files that were created during the refinement procedure.
* (optional) def some_function_for_computing_FSC_into_standard_format(): this will be called in start(). It only needs to be written if the FSC file is not already in the specified format.
* (optional) def parseFileForRunParameters(): this is a BACKUP. It parses the output files created by the refinement to determine the parameters that were specified. It is only needed if the parameters were not found in the .pickle file created during the job setup.

h2. Appion parameter format

In order to utilize the base class ReconUploader.py to upload all parameters associated with the refinement, the following files must exist:

# an "FSC file":http://emg.nysbc.org/attachments/964/recon_11jul18z_it001_vol001.fsc. Lines that are not read should begin with a "#". Otherwise, the first column must have values in inverse pixels. The second column must have the Fourier shell correlation for that spatial frequency. You can have as many additional columns as you would like, but they will be skipped by ReconUploader.py
259 28 Dmitry Lyumkis
# .img/.hed files describing projections from the model and class averages belonging to those Euler angles. The format is as follows: image 1 - projection 1, image 2 - class average 1, image 3 - projection 2, image 4 - class average 2, etc., see below !projections_and_averages.png!
260 15 Dmitry Lyumkis
# the 3D volume in mrc format
261 29 Dmitry Lyumkis
# a text file describing the parameters for each particle. NOTE: PARTICLE NUMBERING STARTS WITH 1, NOT 0. An "example file":http://emg.nysbc.org/attachments/963/particle_data_11jul18z_it001_vol001.txt is attached. The columns are as follows:
262 23 Dmitry Lyumkis
## particle number - starts with 1!!!
263
## phi Euler angle - rotation Euler angle around Z, in degrees
264
## theta Euler angle - rotation Euler angle around new Y, in degrees
265
## omega Euler angle - rotation Euler angle around new Z (in-plane rotation), in degrees
266
## shiftx - in pixels
267
## shifty - in pixels
268 16 Dmitry Lyumkis
## mirror - specify 1 if particle is mirrored, 0 otherwise. If mirrors are NOT handled in the package, and are represented by different Euler angles, leave as 0
269
## 3D reference # - 1, 2, 3, etc. Use 1 for single-model refinement case
270
## 2D class # - the number of the class to which the particle belongs. Leave as 0 if these are not defined
271 22 Dmitry Lyumkis
## quality factor - leave as 0 if not defined 
272 16 Dmitry Lyumkis
## kept particle - specifies whether or not the particle was discarded during the reconstruction routine. If it was KEPT, specify 1, if it was DISCARDED, specify 0. If all particles are kept, all should have a 1. 
273
## post Refine kept particle (optional) - in most cases just leave as 1 for all particles
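
For illustration only, below is a hypothetical fragment of such a particle data file; the numbers are made up and simply show the column layout described above (see the attached example file for the real format, and note that the comment header here is only for readability).

<pre>
# part  phi     theta   omega   shiftx  shifty  mirror  3Dref  2Dclass  quality  kept  postRefine_kept
1       24.50   88.20   12.10   1.25    -0.75   0       1      14       0        1     1
2       130.00  45.75   350.30  -2.00   0.50    0       1      3        0        1     1
3       77.80   102.40  200.00  0.00    1.10    0       1      27       0        0     1
</pre>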