How to add a new refinement method » History » Version 36

Dmitry Lyumkis, 08/02/2011 12:15 PM

h1. How to add a new refinement method

h2. Database architecture for refinement methods

The current database scheme for every refinement method (both single-model and multi-model) is shown below:

"database architecture for refinements":http://emg.nysbc.org/attachments/955/database_scheme_diagram.pdf

For reference, below is a diagram of the modifications made to the refinement pipeline during the refactoring. Color coding is as follows:

"changes to the database architecture for refinements":http://emg.nysbc.org/attachments/954/database_scheme_diagram_changes.pdf

* database tables and pointers that remained unchanged during the refactoring are blue
* database tables that are completely new are outlined AND filled in red
* database tables that already existed but were modified are outlined in red and filled in white; the new additions are highlighted
* new pointers to other database tables are red; unmodified pointers are blue
* pointers to other database tables are all combined under "REFS"; if "REFS" is highlighted, new pointers have been added

h2. How to add a new refinement

# determine the name of the new table in the database. In most cases, this will simply be called "ApYourPackageRefineIterData". Unless there are specific parameters for each particle that you would like to save, this table should probably contain all of your package-specific parameters.
# write a job setup script in python (see example below).
# write an upload script in python (see example below). Another option is to write a script that converts your parameters into Appion / 3DEM format (see below), then upload the results as an external package (see below).

h2. What's being done in the background

The ReconUploader base class takes care of many different functions, specifically:

* general error-checking
* untarring results and creating necessary directories
* querying the database for stack / model parameters
* reading the .pickle file for run parameters or, if it is absent, calling a package-specific parameter parsing function (should be in the uploadYourPackageRefine.py script)
* reading the particle data file (the general .txt file that contains all particle parameters)
* determining which iterations can / should be uploaded
* inserting all data associated with the refinement tables, including the package-specific parameters; the latter are defined in the subclass
* creating metadata associated with the parameters (e.g. Euler plots, Euler jumper calculations, FSC insertions, etc.)
* verifying the number of inserted iterations
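
The run-parameter step in the list above (read the .pickle file or, if it is absent, fall back to a package-specific parser) can be sketched roughly as follows. This is only an illustration and not the ReconUploader implementation: the function load_run_parameters, the pickle file name, and the parser callback with its placeholder values are all hypothetical.

```python
import os
import pickle


def load_run_parameters(pickle_path, fallback_parser):
    """Return run parameters from a .pickle file, or from a fallback parser.

    Hypothetical helper: fallback_parser stands in for the package-specific
    parsing function defined in uploadYourPackageRefine.py.
    """
    if os.path.isfile(pickle_path):
        with open(pickle_path, "rb") as handle:
            return pickle.load(handle)
    # no pickle from the job setup step: parse the package output instead
    return fallback_parser()


def parse_package_output():
    # placeholder values; a real parser would read the refinement output files
    return {"numberOfReferences": 1, "apix": 1.5}


# the pickle file does not exist here, so the fallback parser is used
runparams = load_run_parameters("missing-run-params.pickle", parse_package_output)
```

Passing the parser in as a callback keeps the generic lookup separate from the package-specific parsing, which mirrors the base class / subclass split described above.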

h2. Job setup script in python

NOT DONE YET ... NEED TO TALK TO ANCHI

h3. Add job type to Agent

After you have added the job class for the new refinement method, it needs to be added to the job-running agent by editing the file apAgent.py in appionlib.

# Add the name of the module you created to the import statements at the top of the file.
# In the method _createJobInst_ add the new refinement job type to the condition statements.
<pre>
# Example: extra branch in the condition statements (names are placeholders)
elif "newJobType" == jobType:
    jobInstance = newModuleName.NewRefinementClass(command)
</pre>
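
For context, the kind of dispatch that this branch extends can be pictured with the self-contained sketch below. It is not the real apAgent.py: the two job classes, the job type strings, and the stand-alone createJobInst function are hypothetical stand-ins that only show how a job type string is mapped to a job class.

```python
class ExistingRefineJob(object):
    """Stand-in for a refinement job class that is already registered."""
    def __init__(self, command):
        self.command = command


class NewRefinementClass(object):
    """Stand-in for the job class of the new refinement method."""
    def __init__(self, command):
        self.command = command


def createJobInst(jobType, command):
    """Return a job instance for jobType, or None if the type is unknown."""
    jobInstance = None
    if "existingJobType" == jobType:
        jobInstance = ExistingRefineJob(command)
    elif "newJobType" == jobType:
        # the branch added for the new refinement method
        jobInstance = NewRefinementClass(command)
    return jobInstance


job = createJobInst("newJobType", ["--rundir=/tmp/example"])
```

If the new branch is missing, the job type falls through every condition and no job instance is created, which is why the agent has to be edited whenever a refinement method is added.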

h2. Upload refinement script in python

The script should be titled 'uploadYourPackageRefine.py'.

This script performs all of the basic operations needed to upload a refinement to the database so that it can be displayed in AppionWeb. The bulk of the work is done by the ReconUploader.py base class, which is inherited by each new uploadYourPackageRefine.py subclass script. This means that the developer's job is simply to make sure that all of the particle / package parameters are passed in a specific format. Effectively, the only things that need to be written in this script are:

# define the basic operations that will be performed: this will set up the basic package parameters and call the converter functions. The simplest case is the external refinement package uploader, in which case only the general refinement parameters are uploaded to the database:

<pre>
def __init__(self):
	###	DEFINE THE NAME OF THE PACKAGE
	self.package = "external_package"
	super(uploadExternalPackageScript, self).__init__()

#=====================
def start(self):

	### determine which iterations to upload; last iter is defaulted to infinity
	uploadIterations = self.verifyUploadIterations()

	### upload each iteration
	for iteration in uploadIterations:
		for j in range(self.runparams['numberOfReferences']):

			### make chimera snapshot of volume
			vol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol%.3d.mrc" % (self.params['timestamp'], iteration, j+1))
			self.createChimeraVolumeSnapshot(vol, iteration, j+1)

			### instantiate database objects
			self.insertRefinementRunData(iteration, j+1)
			self.insertRefinementIterationData(iteration, j+1)

	### calculate Euler jumps
	self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)
</pre>

In the single-model refinement case (example: Xmipp projection matching):
<pre>
def __init__(self):
	###	DEFINE THE NAME OF THE PACKAGE
	self.package = "Xmipp"
	self.multiModelRefinementRun = False
	super(uploadXmippProjectionMatchingRefinementScript, self).__init__()

def start(self):

	### database entry parameters
	package_table = 'ApXmippRefineIterData|xmippParams'

	### set projection-matching path
	self.projmatchpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir']))

	### check for variable root directories between file systems
	apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

	### determine which iterations to upload
	lastiter = self.findLastCompletedIteration()
	uploadIterations = self.verifyUploadIterations(lastiter)

	### upload each iteration
	for iteration in uploadIterations:

		apDisplay.printColor("uploading iteration %d" % iteration, "cyan")

		### set package parameters, as they will appear in database entries
		package_database_object = self.instantiateProjMatchParamsData(iteration)

		### move FSC file to results directory
		oldfscfile = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_resolution.fsc" % iteration)
		newfscfile = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.fsc" % (self.params['timestamp'],iteration))
		if os.path.exists(oldfscfile):
			shutil.copyfile(oldfscfile, newfscfile)

		### create a stack of class averages and reprojections (optional)
		self.compute_stack_of_class_averages_and_reprojections(iteration)

		### create a text file with particle information
		self.createParticleDataFile(iteration)

		### create mrc file of map for iteration and reference number
		oldvol = os.path.join(self.projmatchpath, "Iter_%d" % iteration, "Iter_%d_reconstruction.vol" % iteration)
		newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol001.mrc" % (self.params['timestamp'], iteration))
		mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
		apParam.runCmd(mrccmd, "EMAN")

		### make chimera snapshot of volume
		self.createChimeraVolumeSnapshot(newvol, iteration)

		### instantiate database objects
		self.insertRefinementRunData(iteration)
		self.insertRefinementIterationData(package_table, package_database_object, iteration)

	### calculate Euler jumps
	self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

	### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
	### e.g. {1: [5, 4, 3, 2, 1]} means 5 iters completed for refine 1
	complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=False)
	if self.params['cleanup_files'] is True:
		self.cleanupFiles(complete_refinements)
</pre>
In the multi-model refinement case (example: Xmipp ML3D):
<pre>
def __init__(self):
	###	DEFINE THE NAME OF THE PACKAGE
	self.package = "XmippML3D"
	self.multiModelRefinementRun = True
	super(uploadXmippML3DScript, self).__init__()

def start(self):

	### database entry parameters
	package_table = 'ApXmippML3DRefineIterData|xmippML3DParams'

	### set ml3d path
	self.ml3dpath = os.path.abspath(os.path.join(self.params['rundir'], self.runparams['package_params']['WorkingDir'], "RunML3D"))

	### check for variable root directories between file systems
	apXmipp.checkSelOrDocFileRootDirectoryInDirectoryTree(self.params['rundir'], self.runparams['cluster_root_path'], self.runparams['upload_root_path'])

	### determine which iterations to upload
	lastiter = self.findLastCompletedIteration()
	uploadIterations = self.verifyUploadIterations(lastiter)

	### create ml3d_lib.doc file; somewhat of a workaround, but necessary to make projections
	total_num_2d_classes = self.createModifiedLibFile()

	### upload each iteration
	for iteration in uploadIterations:

		### set package parameters, as they will appear in database entries
		package_database_object = self.instantiateML3DParamsData(iteration)

		for j in range(self.runparams['package_params']['NumberOfReferences']):

			### calculate FSC for each iteration using split selfile (selfile requires root directory change)
			self.calculateFSCforIteration(iteration, j+1)

			### create a stack of class averages and reprojections (optional)
			self.compute_stack_of_class_averages_and_reprojections(iteration, j+1)

			### create a text file with particle information
			self.createParticleDataFile(iteration, j+1, total_num_2d_classes)

			### create mrc file of map for iteration and reference number
			oldvol = os.path.join(self.ml3dpath, "ml3d_it%.6d_vol%.6d.vol" % (iteration, j+1))
			newvol = os.path.join(self.resultspath, "recon_%s_it%.3d_vol%.3d.mrc" % (self.params['timestamp'], iteration, j+1))
			mrccmd = "proc3d %s %s apix=%.3f" % (oldvol, newvol, self.runparams['apix'])
			apParam.runCmd(mrccmd, "EMAN")

			### make chimera snapshot of volume
			self.createChimeraVolumeSnapshot(newvol, iteration, j+1)

			### instantiate database objects
			self.insertRefinementRunData(iteration, j+1)
			self.insertRefinementIterationData(package_table, package_database_object, iteration, j+1)

	### calculate Euler jumps
	self.calculateEulerJumpsAndGoodBadParticles(uploadIterations)

	### query the database for the completed refinements BEFORE deleting any files ... returns a dictionary of lists
	### e.g. {1: [5, 4, 3, 2, 1], 2: [6, 5, 4, 3, 2, 1]} means 5 iters completed for refine 1 & 6 iters completed for refine 2
	complete_refinements = self.verifyNumberOfCompletedRefinements(multiModelRefinementRun=True)
	if self.params['cleanup_files'] is True:
		self.cleanupFiles(complete_refinements)
</pre>
# write python functions that will convert parameters. Examples of these converters can be found in the python scripts below:

http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippRefine.py (simplest)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadXmippML3DRefine.py (simple multi-model refinement case)
http://emg.nysbc.org/svn/myami/trunk/appion/bin/uploadEMANRefine.py (complicated, due to additional features / add-ons)

Below is a list of the necessary functions; everything else is optional:

* def __init__(): defines the name of the package
* def findLastCompletedIteration(): finds the last completed iteration in the refinement protocol
* def instantiateProjMatchParamsData(): this one is for projection matching in Xmipp; the equivalent function needs to be specific to each package that is added
* def compute_stack_of_class_averages_and_reprojections(): creates .img/.hed files that show, for each angular increment: (1) the projection and (2) the class average corresponding to that projection
* def createParticleDataFile(): makes a .txt file that puts all parameters in Appion format. The information in this file is read by the ReconUploader.py class and uploaded to the database.
* def cleanupFiles(): removes all of the redundant or unwanted files that were created during the refinement procedure.
* (optional) def some_function_for_computing_FSC_into_standard_format(): this will be called in start(). It should only be written if the FSC file is not in the specified format.
* (optional) def parseFileForRunParameters(): this is a BACKUP. It parses the output files created by the refinement to determine the parameters that were specified. It is only needed if the parameters were not found in the .pickle file created during the job setup.
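
As a rough orientation, the functions listed above might fit together as in the skeleton below. It is a sketch under stated assumptions rather than working Appion code: ReconUploaderStandIn is a hypothetical stand-in for the real base class, instantiateYourPackageParamsData is an invented name for the package-specific counterpart of instantiateProjMatchParamsData, and every method body is a placeholder that only records that it was called.

```python
class ReconUploaderStandIn(object):
    """Hypothetical stand-in for the ReconUploader base class."""
    def __init__(self):
        self.calls = []

    def verifyUploadIterations(self, lastiter):
        # placeholder: simply upload every iteration up to lastiter
        return list(range(1, lastiter + 1))


class UploadYourPackageRefine(ReconUploaderStandIn):
    def __init__(self):
        # define the name of the package, then initialize the base class
        self.package = "YourPackage"
        super(UploadYourPackageRefine, self).__init__()

    def findLastCompletedIteration(self):
        return 2  # placeholder: normally found by inspecting output files

    def instantiateYourPackageParamsData(self, iteration):
        return {"iteration": iteration}  # placeholder for a database object

    def compute_stack_of_class_averages_and_reprojections(self, iteration):
        self.calls.append(("stack", iteration))

    def createParticleDataFile(self, iteration):
        self.calls.append(("particles", iteration))

    def cleanupFiles(self, complete_refinements):
        self.calls.append(("cleanup", complete_refinements))

    def start(self):
        lastiter = self.findLastCompletedIteration()
        for iteration in self.verifyUploadIterations(lastiter):
            self.instantiateYourPackageParamsData(iteration)
            self.compute_stack_of_class_averages_and_reprojections(iteration)
            self.createParticleDataFile(iteration)
        # placeholder dictionary of completed refinements
        self.cleanupFiles({1: [2, 1]})


uploader = UploadYourPackageRefine()
uploader.start()
```

The database insertions, snapshots and Euler jump calculations that appear in the real start() examples are left out here because they are provided by the base class.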

h2. Appion parameter format

In order to use the base class ReconUploader.py to upload all parameters associated with the refinement, the following files must exist:

# an "FSC file":http://emg.nysbc.org/attachments/964/recon_11jul18z_it001_vol001.fsc. Lines that should not be read must begin with a "#". In all other lines, the first column must contain the spatial frequency in inverse pixels, and the second column must contain the Fourier shell correlation at that spatial frequency. You can have as many additional columns as you would like, but they will be skipped by ReconUploader.py
# .img/.hed files containing projections from the model and the class averages belonging to those Euler angles. The format is as follows: image 1 - projection 1, image 2 - class average 1, image 3 - projection 2, image 4 - class average 2, etc. (see the image below) !projections_and_averages.png!
# the 3D volume in mrc format
# a text file describing the parameters for each particle. NOTE: PARTICLE NUMBERING STARTS WITH 1, NOT 0. An "example file":http://emg.nysbc.org/attachments/963/particle_data_11jul18z_it001_vol001.txt is attached. The columns are as follows:
## particle number - starts with 1!!!
## phi Euler angle - rotation Euler angle around Z, in degrees
## theta Euler angle - rotation Euler angle around new Y, in degrees
## omega Euler angle - rotation Euler angle around new Z (in-plane rotation), in degrees
## shiftx - in pixels
## shifty - in pixels
## mirror - specify 1 if the particle is mirrored, 0 otherwise. If mirrors are NOT handled in the package and are represented by different Euler angles, leave as 0
## 3D reference # - 1, 2, 3, etc. Use 1 for the single-model refinement case
## 2D class # - the number of the class to which the particle belongs. Leave as 0 if classes are not defined
## quality factor - leave as 0 if not defined
## kept particle - specifies whether or not the particle was discarded during the reconstruction routine. If it was KEPT, specify 1; if it was DISCARDED, specify 0. If all particles are kept, all should have a 1.
## post Refine kept particle (optional) - in most cases just leave as 1 for all particles
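
The two text formats described above can be illustrated with a short sketch. It assumes whitespace-separated columns and four decimal places, neither of which is stated on this page, so the attached example files remain the authoritative reference. The function names format_particle_line and read_fsc_lines are hypothetical and are not part of ReconUploader.py.

```python
def format_particle_line(partnum, phi, theta, omega, shiftx, shifty,
                         mirror=0, refnum=1, classnum=0, quality=0.0,
                         kept=1, post_refine_kept=1):
    """Format one particle as a row with the twelve columns listed above."""
    if partnum < 1:
        raise ValueError("particle numbering starts with 1, not 0")
    # assumption: columns are separated by single spaces
    return "%d %.4f %.4f %.4f %.4f %.4f %d %d %d %.4f %d %d" % (
        partnum, phi, theta, omega, shiftx, shifty,
        mirror, refnum, classnum, quality, kept, post_refine_kept)


def read_fsc_lines(lines):
    """Return (inverse-pixel frequency, FSC) pairs from FSC file lines."""
    curve = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        columns = line.split()
        # only the first two columns are used; extra columns are ignored
        curve.append((float(columns[0]), float(columns[1])))
    return curve


row = format_particle_line(1, 10.0, 45.0, 90.0, -1.5, 2.25)
curve = read_fsc_lines(["# freq  fsc  extra", "0.01 0.99 7", "0.02 0.95 7"])
```

The defaults follow the guidance in the column list: no mirror, 3D reference 1, 2D class 0, quality factor 0, and both "kept" flags set to 1.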