Feature #2915
a way to archive old leginon session databases by project so that they can be removed from the production one
0%
Description
leginondata is getting too big. The queries are slow.
Updated by Anchi Cheng about 10 years ago
- Subject changed from archiving old leginon session database by project so that they can be removed from the working one to a way to archive old leginon session database by project so that they can be removed from the production one
r18540 is done in the myami-dbcopy branch. This feature is not possible in the main branch.
Sargis, if you want to look at the code, you should only need the sinedon subpackage.
I will upload an example on how I use it later.
Updated by Anchi Cheng about 10 years ago
r18544 adds the script for archiving a project by project id.
To use:
1. Create a database, readable and writable by leginon's mysql user, to import the data into.
2. Add a module [importdata] in sinedon.cfg and set the archive database created in step 1 as its database.
3. Make sure myami-dbcopy/sinedon is the sinedon used when archive_project.py is run.
4. Run the script in myami/dbschema like this (the project id in this example is 1):
python archive_project.py 1
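For step 2, a minimal sinedon.cfg fragment might look like the following. This is a sketch only: the host, user, and password values are placeholders, and `leginon_archive` is an assumed name for the database created in step 1.

```ini
; sinedon.cfg -- module-to-database mapping (placeholder values)
[global]
host: localhost
user: usr_object
passwd: your_password

[leginondata]
db: leginondb

; archive destination used by archive_project.py
[importdata]
db: leginon_archive
```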
Updated by Sargis Dallakyan about 10 years ago
Thank you Anchi. I tried this on one of the projects and I got the following error:
In [34]: %run /home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py 231
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    176     else:
    177         filename = fname
--> 178     __builtin__.execfile(filename, *where)

/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py in <module>()
    708
    709 checkSinedon()
--> 710 archiveProject(projectid)

/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py in archiveProject(projectid)
    688     p = projectdata.projects().direct_query(projectid)
    689     source_sessions = projectdata.projectexperiments(project=p).query()
--> 690     session_names = map((lambda x:x['session']['name']),source_sessions)
    691     session_names.reverse() #oldest first
    692     for session_name in session_names:

/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py in <lambda>(x)
    688     p = projectdata.projects().direct_query(projectid)
    689     source_sessions = projectdata.projectexperiments(project=p).query()
--> 690     session_names = map((lambda x:x['session']['name']),source_sessions)
    691     session_names.reverse() #oldest first
    692     for session_name in session_names:

TypeError: 'NoneType' object has no attribute '__getitem__'
Working on debugging this...
Updated by Anchi Cheng about 10 years ago
Sargis
r18552 fixed the bug that caused the error and also greatly improved the speed by using sinedon's timelimit query rather than timestamp comparison in Python, which had to iterate over all results and Python iteration is slow.
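The speedup pattern is generic and not specific to sinedon: push the time limit into the SQL query so the database returns only the matching rows, instead of fetching everything and comparing timestamps in a Python loop. A minimal illustration using sqlite3 (this is not the sinedon API; the table and dates are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE caldata (id INTEGER, ts TEXT)")
conn.executemany("INSERT INTO caldata VALUES (?, ?)",
                 [(i, "2009-05-%02d" % (i + 1)) for i in range(20)])

# Slow pattern: fetch every row, then filter by timestamp in Python.
rows = conn.execute("SELECT id, ts FROM caldata ORDER BY id").fetchall()
recent_py = [r for r in rows if r[1] >= "2009-05-10"]

# Fast pattern: let the database apply the time limit (can use an index).
recent_sql = conn.execute(
    "SELECT id, ts FROM caldata WHERE ts >= ? ORDER BY id",
    ("2009-05-10",)).fetchall()

assert recent_py == recent_sql
print(len(recent_sql))  # 11
```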
The cause of the bug was that some early imported calibrations have no session field (defaulting to None).
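The failure and the fix can be sketched as follows. The row structure is simplified to plain dicts and the helper name is made up; the point is that rows whose session reference is None must be skipped before the names are collected.

```python
def session_names_oldest_first(source_sessions):
    # source_sessions: dict-like rows; 'session' may be None for early
    # imported calibrations, which is what raised the TypeError above.
    names = [row['session']['name'] for row in source_sessions
             if row.get('session') is not None]
    names.reverse()  # query returns newest first; archive oldest first
    return names

rows = [
    {'session': {'name': '09jul11a'}},
    {'session': None},                  # early import with no session field
    {'session': {'name': '09may25a'}},
]
print(session_names_oldest_first(rows))  # ['09may25a', '09jul11a']
```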
Updated by Sargis Dallakyan about 10 years ago
Thank you Anchi. This time it made good progress and I'm now getting this result:
[sargis@dewey ~]$ python /home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py 231
['09may25a', '09may25b', '09may25c', '09jul11a']
****Session 09may25a ****
Importing session....
Importing instrument....
Importing calibrations....
Importing C2ApertureSize....
number of images in the session = 278
. . . . . . . . . . (one dot per image; remainder omitted) . . .
imported 278 images
Importing image ddinfo....
Importing image stats....
Importing queuing....
Importing mosaic tiles....
Importing dequeuing....
Importing drift....
Importing focus results....
importing TransformManager Settings....
Traceback (most recent call last):
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 742, in <module>
    archiveProject(projectid)
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 726, in archiveProject
    app.run()
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 710, in run
    self.runStep3()
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 696, in runStep3
    self.importSettings()
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 623, in importSettings
    self.importSettingsByClassAndAlias(allalias)
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 590, in importSettingsByClassAndAlias
    results = self.researchSettings(settingsname,name=node_name)
  File "/home/sargis/ami-workspace/myami-trunk/dbschema/archive_project.py", line 188, in researchSettings
    r2 = q.query()
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/data.py", line 429, in query
    results = db.query(self, **kwargs)
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/dbdatakeeper.py", line 99, in query
    result = self._query(*args, **kwargs)
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/dbdatakeeper.py", line 118, in _query
    result = self.dbd.multipleQueries(queryinfo, readimages=readimages)
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/sqldict.py", line 250, in multipleQueries
    return _multipleQueries(self.db, queryinfo, readimages)
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/sqldict.py", line 525, in __init__
    self.queries = setQueries(queryinfo)
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/sqldict.py", line 983, in setQueries
    query = queryFormatOptimized(queryinfo,value['alias'])
  File "/home/sargis/ami-workspace/myami-dbcopy/sinedon/sqldict.py", line 1020, in queryFormatOptimized
    joinTable = queryinfo[id]
KeyError: 41493216
Will look into this...
Updated by Anchi Cheng about 10 years ago
- Status changed from Assigned to In Test
- Assignee changed from Anchi Cheng to Sargis Dallakyan
r18561 fixes this problem. Sinedon thinks "source_session" is in the destination_dbname even though the original database is source_dbname.
The workaround is to redo the query for source_session in get.
Updated by Anchi Cheng about 10 years ago
r18645, r18646, and r18647 create a working pipeline.
Usage
Using any myami >= 3.2
- initArchiveTables.php is similar to initTablesReport.php; it initializes the tables in the archive databases so that myamiweb works even if data is missing in the archive.
- Set the databases in myamiweb/config.php to the archive databases.
- Go to http://your_myamiweb/setup/initArchiveTables.php.
- dbschema/tools/copy_refimages.py makes a local copy of the reference images that are not in the imaging session.
Using non-dbcopy myami
- archive_initialize.py makes sure that the administrator, the project owner, and the users those sessions are shared with are included in the archive.
- This produces archive_users_1.cfg, which is used in the later part.
Using dbcopy myami: run archiverun.py to do the real work
- Edit archiverun.py to set the correct databases, the paths to both the dbcopy and non-dbcopy versions of myami, and self.projectids in the autoSet function so it can switch as needed.
- archive_leginondb.py (renamed from the previous archive_project.py) archives the leginon database.
- archive_projectdb.py archives the project database.
- archive_activate.py forces auto_increment on the `DEF_id` column of all sinedon tables.
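As a sketch of that last step: after a row-by-row copy, the archive tables end up with a plain `DEF_id` column, so AUTO_INCREMENT has to be restored with one ALTER TABLE per table. The snippet below only builds the MySQL statements; the table names and the INT(20) column type are assumptions for illustration, not taken from archive_activate.py.

```python
def activate_statements(tables):
    # Build one MySQL ALTER TABLE statement per sinedon table to restore
    # AUTO_INCREMENT on its DEF_id primary key column.
    return ["ALTER TABLE `%s` MODIFY `DEF_id` INT(20) NOT NULL AUTO_INCREMENT"
            % name for name in tables]

for stmt in activate_statements(["SessionData", "AcquisitionImageData"]):
    print(stmt)
```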
Updated by Anchi Cheng almost 10 years ago
- Related to Bug #2963: archive project database by project added
Updated by Anchi Cheng almost 10 years ago
r18698 and r18799 give more options for leginon archiving.