Task #9127
closed
Remove no longer files stored in Jackrabbit before migration to postgres
Added by Costantino Perciante almost 8 years ago.
Updated almost 8 years ago.
Assignee:
Costantino Perciante
Infrastructure:
Production
Description
Before MongoDB (the actual storage used behind the workspace), files were stored into jackrabbit itself (through the DataStore facility). Some GBs of no longer used files need to be freed up. In order to do so:
- a script must be executed to replace the payload (if any) of file nodes with an empty payload (a background job that can be executed with workspace up and running);
- the DataStore Garbage Collector must be executed (it needs the workspace down, so we need to schedule its execution properly and extimate how long it takes to finish)
We tested once this operations onto a snapshot of the current content of jackrabbit in production and we were able to free up 25GB of files. However, we need to better extimate the time needed for the whole task to finish.
It is important because will let us migrate the datastore into postgres (no need of shared file system if we plan to have replicated jcr instances) and speedup the migration phase
- Description updated (diff)
- Status changed from New to In Progress
- % Done changed from 0 to 30
We managed to free up more or less the same space in a couple of hours on a snapshotted workspace. Valentina's script can run with jcr up and running and then the datastore garbage collector will take 2 hours more or less to actually remove data.
We can schedule its execution during the downtime needed for gcube 4.6 upgrade, can't we?
Moreover, a backup of production workspace is needed just before we proceed. Last but not least, the datastore will be migrated on postgres as well.
Costantino Perciante wrote:
We managed to free up more or less the same space in a couple of hours on a snapshotted workspace. Valentina's script can run with jcr up and running and then the datastore garbage collector will take 2 hours more or less to actually remove data.
If we can run it on the production workspace, why we have to wait fro gcube 4.6 rollout?
Can it e used to clean up the current version of the workspace?
Pasquale Pagano wrote:
If we can run it on the production workspace, why we have to wait fro gcube 4.6 rollout?
Can it e used to clean up the current version of the workspace?
Unfortunately the datastore garbage collector needs jackrabbit down (tomcat/home library must be down during its execution). This means that Valentina's script can run when we want and with jackrabbit up, but then we need to shut it down for the collector.
- Status changed from In Progress to Closed
- Assignee changed from Valentina Marioli to Costantino Perciante
- % Done changed from 30 to 100
This activity has been successfully concluded on friday 7/07/17. 28GB of useless data has been removed.
Also available in: Atom
PDF