Documentum Scanner
Introduction
The Documentum Scanner extracts objects such as files, folders, and relations from a source Documentum repository and saves this data to migration-center for further processing. Since migration-center 3.2, the Documentum Scanner and Importer are no longer tied to one another: data scanned with the Documentum Scanner can be imported by any other importer, including of course the Documentum Importer. Starting with version 3.2.9, objects derived from dm_sysobject are supported.
Scanner is the term used in migration-center for an input adapter. It is used to read the data that needs processing into migration-center and is the first step in a migration project. The term scan therefore also refers to the process used to input data to migration-center in general.
A scanner works as a job that can be run at any time and can be executed repeatedly. For every run a detailed history and log file are created.
A scanner is defined by a unique name, a set of configuration parameters and an optional description.
Documentum Scanners can be created, configured, started and monitored through migration-center Client but the corresponding processes are executed by migration-center Job Server.
Supported Documentum Content Server versions
The Documentum Scanner currently supports Documentum Content Server versions 4i, 5.2.5 to 20.4, including service packs.
Accessing a Documentum repository requires Documentum Foundation Classes 5.3 or newer. Any combination of DFC version and Content Server version supported by EMC Documentum is also supported by migration-center’s Documentum Scanner, but it is recommended to use the DFC version matching the version of the Content Server being scanned. The DFC must be installed and configured on every machine where migration-center Server Components is deployed.
For scanning a Documentum 4i or 5.2.5 source repository, DFC version 5.3 must be used since newer DFC versions do not support accessing older Documentum repositories properly. At the same time, migration-center does not support DFC versions older than 5.3, therefore DFC 5.3 is the only option in this case.
DFC (Documentum Foundation Classes) configuration
Starting from version 3.9 of migration-center additional configurations need to be made for the Documentum adapter to be able to locate Documentum Foundation Classes. This is done by modifying the dfc.conf file, located in the Job Server installation folder.
There are two settings inside the file that by default match the paths of a standard DFC install. One needs to have the path for the config folder of DFC and the other needs the path to the dctm.jar.
See below example:
wrapper.java.classpath.dfcConfig=C:/Documentum/config
wrapper.java.classpath.dfcDctmJar=C:/Program Files/Documentum/dctm.jar
The dfcConfig parameter must point to the configuration folder. The dfcDctmJar parameter must point to the dctm.jar file!
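The two settings can be sanity-checked before the Job Server is started. The following is a minimal Python sketch and not part of migration-center: the function names read_dfc_conf and check_dfc_paths are hypothetical, and it assumes that dfc.conf contains plain key=value lines as in the example above.

```python
from pathlib import Path

# The two settings described above.
REQUIRED_KEYS = (
    "wrapper.java.classpath.dfcConfig",
    "wrapper.java.classpath.dfcDctmJar",
)


def read_dfc_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_dfc_paths(settings):
    """Return a list of problems found (empty if everything looks fine)."""
    problems = [f"missing setting: {key}" for key in REQUIRED_KEYS if key not in settings]
    config_dir = settings.get(REQUIRED_KEYS[0])
    if config_dir is not None and not Path(config_dir).is_dir():
        problems.append(f"dfcConfig is not a folder: {config_dir}")
    jar = settings.get(REQUIRED_KEYS[1])
    if jar is not None and not Path(jar).is_file():
        problems.append(f"dfcDctmJar is not a file: {jar}")
    return problems
```

Calling check_dfc_paths(read_dfc_conf(text)) on the contents of dfc.conf returns an empty list when both settings are present and point to an existing folder and file on the machine where the check is run.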
Information on folder migration
When scanning Documentum documents, their folder paths are also scanned, and the folder structure can be automatically re-created by migration-center in the target system. This procedure will not keep any of the metadata attached to the folder objects, such as owners, permissions, specific object types, or any custom attributes. Depending on project requirements, it may be required to do a “folder-only migration” first, e.g. for migrating a complete folder structure including custom folder object types, permissions and other attributes first, and then populate this folder structure with documents afterwards. In order to execute a folder-only migration the following steps should be performed to configure the migration process accordingly:
Scanner: on the scanner’s |Parameters| tab check the exportFolderStructure option. Set scanFolderPaths (mandatory in this case) and, optionally, excludeFolderPaths. Leave the parameter documentTypes empty to scan only the folder structure without the documents it may contain; list document types as well if both folders and documents should be scanned. Note: Scanning folders is not possible via the dqlString option in the scanner.
Migration set: When creating a new migration set choose a <source type to target type>(folder) object type. Now only the scanner runs containing folder objects will be displayed on the |Filescan Selection| tab. Note that the number of objects contained in the displayed scanner runs now indicates folders and not documents, which is why the number on display (folders) can be different from the total number of objects processed by the scan (if it contains other types of objects besides folders, such as documents).
The approach described above is necessary when migrating folder structures with complex folder objects containing custom object types, permissions, attributes, relations, etc. This information will be lost if exportFolderStructure is not selected during the scan. If the exportFolderStructure parameter was not set during a scan, it is of course possible to re-run the scan after setting this option, or to copy/create a new scanner and scan the missing folder information with that one.
Supported Documentum features
Versions
Versions (and branches) are supported by the Documentum Scanner, including custom version labels. The exportVersions parameter in the scanner’s configuration parameters determines if all versions (checked) or only current versions of documents (not checked, default setting) are scanned.
It is important to know that a consistency check of the version tree is performed by migration-center before scanning. A version tree containing invalid or missing references will not be exported at all and the operation will be reported as an error in the scanner’s log. It is not possible for migration-center to process or repair such a version structure because of the missing references.
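The kind of problem such a consistency check looks for can be illustrated with a small Python sketch. It is not migration-center's actual implementation, which is not described here. The sketch assumes that each version is known by its r_object_id and its i_antecedent_id (the Documentum attribute referencing the previous version), that the first version carries the null ID as antecedent, and that a valid tree has exactly one such first version. The function names are hypothetical.

```python
NULL_ID = "0000000000000000"  # Documentum null object ID (antecedent of the first version)


def find_broken_references(versions):
    """Return the IDs of versions whose antecedent is not part of the tree.

    `versions` maps r_object_id -> i_antecedent_id for one version tree.
    """
    known = set(versions)
    return sorted(
        obj_id
        for obj_id, antecedent in versions.items()
        if antecedent != NULL_ID and antecedent not in known
    )


def is_consistent(versions):
    """Assumed rule: exactly one root version and no dangling antecedent."""
    roots = [obj_id for obj_id, antecedent in versions.items() if antecedent == NULL_ID]
    return len(roots) == 1 and not find_broken_references(versions)
```

A tree in which one version points to an antecedent that no longer exists is reported by find_broken_references and would, by the rule described above, not be exported.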
For documents with versions, the version label extracted by the scanner from the Documentum attribute r_version_label can be changed by the means of the transformation rules during processing. The version structure (i.e. the ordering of the objects relative to their antecedents) cannot be changed using migration-center.
If objects are scanned with the exportVersions option checked, all versions must be imported as well since each object references its antecedent, going back to the very first version. Therefore, it is advised not to drop the versions of an object between the scan and the import processes since this will most likely generate inconsistencies and errors. If an object is intended to be migrated without versions (i.e. only the current version of the object needs to be migrated), then the affected objects should be scanned without enabling the exportVersions option.
The number of latest versions to be scanned can be limited through the exportLatestVersions parameter. See more about using this parameter in Documentum scanner parameters.
Scanning large version trees
Processing a version tree is based on a recursive algorithm, which implies that all objects which are part of a version tree must be loaded into memory together. This can be problematic with very large version trees (several thousand versions). By default, the Documentum Scanner can load and process version trees up to around 2,000 versions in size. For even larger version trees to be processed, the Java thread stack size (the -Xss setting) for the Job Server must be increased according to the following steps:
Stop the Job Server
Open the wrapper.conf file located in the migration-center Server Components installation folder (by default it is %programfiles%\fme AG\migration-center Server Components <Version>)
Search for
# Java Additional Parameters
# Increase the value of this parameter it if your documentum scanner needs
# to scan a large number of versions per document. Alocate 256k for every
# 1000 versions/document.
Edit the line
wrapper.java.additional.1=-Xss512k
incrementing the default 512k by 256k for every additional 1,000 versions mc should be able to process. E.g. for enabling processing of version trees containing up to 4,000 versions (2,000+1,000+1,000 versions), set the value to 1024k (512k+256k+256k).
Save the file
Start the Job Server
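The sizing rule from the steps above can be written as a short calculation. This Python sketch only restates the arithmetic given in this section (512k for up to 2,000 versions, plus 256k per additional 1,000 versions). Rounding a partial block of 1,000 versions up to a full block is an assumption, and the function names are hypothetical.

```python
import math

BASE_VERSIONS = 2000   # versions covered by the default setting
BASE_STACK_KB = 512    # default -Xss value in wrapper.conf
STEP_VERSIONS = 1000   # size of one additional block of versions
STEP_STACK_KB = 256    # stack added per additional block


def required_stack_kb(max_versions):
    """Stack size in KB for version trees of up to `max_versions` versions."""
    extra = max(0, max_versions - BASE_VERSIONS)
    steps = math.ceil(extra / STEP_VERSIONS)  # assumption: partial blocks round up
    return BASE_STACK_KB + steps * STEP_STACK_KB


def wrapper_line(max_versions):
    """The wrapper.conf line to use for the given version tree size."""
    return f"wrapper.java.additional.1=-Xss{required_stack_kb(max_versions)}k"
```

For 4,000 versions, wrapper_line(4000) yields wrapper.java.additional.1=-Xss1024k, matching the example above.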
Primary content and renditions
The scanner exports the primary content of all documents unless skipContent or exportStoragePath is checked. The locations where the content was exported can be seen in the column Content location and in the source attribute mc_content_location. If a primary content has multiple pages, the column Content location stores the location where page 0 was exported, while mc_content_location stores the locations of all pages.
If skipContent is checked, the primary content and the renditions of the document will not be exported, so the documents will be exported as contentless objects.
If exportStoragePath is checked, the primary content and the renditions of the document will not be exported to the staging area. Instead, the content-related attributes (content_location, mc_content_location, dctm_obj_rendition) will be set with the full path of the content in the repository file store.
Renditions are supported by the Documentum Scanner. The “exportRenditions” parameter in the scanner’s configuration parameters determines if renditions are scanned. Renditions of an object will not count as individual objects, since they are different instances of content belonging to one and the same object. The scanner extracts each rendition’s content, format, page modifier, page number and storage location. This information is exposed to the user via migration-center source object attributes starting with dctm_obj_rendition* in any documents migration set that has Documentum or FirstDoc as source system.
Documentum 4i does not have the page modifier attribute/feature for renditions, therefore such information will not be extracted from a Documentum 4i repository.
Document paths
The scanner collects the full folder paths where the document is linked and adds them to the following source attributes:
dctm_obj_link - stores the first path of every folder where the document is linked. If the folder itself is linked on multiple other folders, only the first path of the folder is extracted.
dctm_obj_al_links - stores all paths (r_folder_path) of all folders where the document is linked.
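The difference between the two attributes can be illustrated with a small Python sketch. The input structure is an assumption made for the example (one entry per folder the document is linked to, each holding that folder's r_folder_path values in stored order), and the function name link_attributes is hypothetical.

```python
def link_attributes(linked_folders):
    """Derive both attributes from the folders a document is linked to.

    `linked_folders` has one entry per folder; each entry is the list of
    r_folder_path values of that folder.
    """
    # First path of every folder only.
    first_paths = [paths[0] for paths in linked_folders if paths]
    # Every path of every folder.
    all_paths = [path for paths in linked_folders for path in paths]
    return {"dctm_obj_link": first_paths, "dctm_obj_al_links": all_paths}
```

For a document linked to two folders, one of which is itself linked under two parents, dctm_obj_link would hold two values and dctm_obj_al_links three.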
Relations
Relations are supported by the Documentum Scanner. The option named exportRelations in the scanner’s configuration determines if they are scanned and added to the migration-center database. Information about them cannot be altered using transformation. Migration-center will manage relations automatically if the appropriate options in the scanner and importer have been selected. They will always be connected to their parent object and can be viewed in migration-center by right-clicking on an object in any view of a migration set and selecting <View Relations> from the context menu. The resulting dialog will list all relations of the selected object with their associated metadata, such as relation name, child object, etc.
IMPORTANT: The children of the scanned relations are not scanned automatically if they are not in the scope of the scanner. The user must ensure the documents and folders that are children in the scanned relations are included in the scope of the scanner (they are linked under the scanned path or they are returned by dqlString).
migration-center’s Documentum Scanner supports relations between folders and/or documents only (i.e. “dm_folder” and “dm_document” objects, as well as their respective subtypes). “dm_subscription” type objects, for example, although they are relations from a technical point of view, will be ignored by the scanner because they are relations involving a “dm_user” object. Custom relation objects (i.e. relation-type objects which are subtypes of “dm_relation”) are also supported, including any custom attributes they may have. The restrictions mentioned above regarding the types of objects connected by a relation also apply to custom relation objects.
Export relations as renditions
As an alternative to scanning relations as they are in Documentum, the scanner offers the possibility to scan the child related documents as renditions of the parent document. For that, the parameter “exportRelationsAsRenditions” should be checked. This requires “exportRelations” to be checked as well. You can filter the relations that will be scanned as renditions by setting the relation names in the parameter “relationsAsRenditionNames”. If this is not set, all relations to documents will be processed as renditions.
Virtual Documents
Documentum Virtual Documents are supported by the Documentum Scanner. The option named exportVirtualDocs in the configuration of the scanner determines if virtual documents are scanned and exported to migration-center.
There is a second parameter related to virtual documents, named maintainVirtualDocsIntegrity. This option will allow the scanner to include children of VDs which may be outside the scope of the scanner (paths to scan or dqlString) in order to maintain the integrity of the VD. If this parameter is disabled, any children in the VD that are out of scope (they are not linked under the scanned path or they are not returned by dqlString) will not be scanned and the VD may be incomplete.
The VD binding information (dmr_containment objects) is always scanned and attached to the root object of a VD regardless of the maintainVirtualDocsIntegrity option. In this way it is possible to scan any missing child objects later on and still be able to restore the correct VD structure based on the information stored with the root object.
The exportVDVersions option allows exporting only the latest version of the VD documents. This option applies only to virtual documents since the exportVersions option applies only to normal documents.
The exportVersions option needs to be checked for scanning Virtual Documents (i.e. if the exportVirtualDocs option is checked) even if the virtual documents themselves do not have multiple versions, otherwise the virtual documents export might produce unexpected results. This is because the VD parents may still reference child objects that are not current versions of those respective objects. This is not an actual product limitation, but rather an issue caused by this particular combination of scanner options and Documentum’s VD features, which rely on information related to versioning.
The Snapshot feature of virtual documents is not supported by migration-center.
Audit Trails
The Documentum Scanner also supports audit trail entries for documents and folders. To enable scanning audit trails, the scanner parameter exportAuditTrail must be checked; in this case the audit trail entries of all documents and folders within the scope of the scan will be scanned as Documentum(audittrail) type objects, similarly to Documentum(document) or Documentum(folder) type objects.
There are some additional parameters used for fine tuning the selection and type of audit trail entries the scanner should consider:
auditTrailType – is the Documentum audit trail object type. By default, this is dm_audittrail but custom audit trail types (derived from dm_audittrail) are supported as well
auditTrailSelection – is used for narrowing down the selection of audit trail records, since the number of audit trails can grow large, especially in old systems, and not all audit trail entries may be relevant for a migration to a new system. This option accepts a DQL conformant WHERE clause as would be used in a SELECT statement. If this returns no results, all audit trail objects of scanned documents and folders will be scanned.
Example 1: event_name in ('dm_save', 'dm_checkin')
Example 2: event_name = 'dm_checkin' and time_stamp >= DATE('01.01.2012', 'DD.MM.YYYY')
auditTrailIgnoreAttributes – contains a comma separated list of dm_audittrail attributes the scanner should ignore. Again, this option can be used to eliminate audit trail information that is not needed for migration right from the scan.
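When the same kind of selection is needed for several scanners, the value for auditTrailSelection can be assembled programmatically. The following Python sketch is only a convenience illustration and not part of migration-center. The function name audit_trail_selection is hypothetical, and it produces clauses of the same shape as the examples given for auditTrailSelection.

```python
from datetime import date
from typing import Iterable, Optional


def audit_trail_selection(event_names: Iterable[str], since: Optional[date] = None) -> str:
    """Build a DQL WHERE-clause body for the auditTrailSelection parameter."""
    # Quote each event name; a single quote inside a name is doubled.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in event_names)
    clause = f"event_name in ({quoted})"
    if since is not None:
        stamp = since.strftime("%d.%m.%Y")
        clause += f" and time_stamp >= DATE('{stamp}', 'DD.MM.YYYY')"
    return clause
```

For example, audit_trail_selection(["dm_save", "dm_checkin"]) returns event_name in ('dm_save', 'dm_checkin'), which can be pasted into the parameter as is.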
Export audit trail as renditions
Because there are target systems that don’t allow importing audit trail objects, the Documentum Scanner allows exporting audit trail objects to PDF renditions of the scanned documents. Exporting audit trail objects as PDF renditions applies only to documents.
The following scanner parameters are used for applying this feature:
exportAuditTrailAsRendition – when checked, the audit trail entries are written to PDF files that are saved as renditions of the documents. This parameter can be checked only when exportAuditTrail is checked and skipContent is not checked. If not checked, the audit trail entries are exported as Documentum(audittrail) objects.
auditTrailPerVersionTree – this applies only when exportAuditTrailAsRendition is checked. When it is checked, one PDF is generated for all audit trail entries of all versions of the document. The audit trail entries related to deleted versions are exported as well. The rendition is assigned to the latest version in the tree. When not checked, one PDF rendition is generated for every version in the tree. In this case the audit trail entries related to deleted versions are not exported, because those versions no longer exist in the repository and are therefore not exported by the scanner.
Exporting audit trail per version tree may have a big impact on the scanner performance. That’s because audit trail entries for documents are queried by the attribute dm_audittrail.chronicle_id. The performance might be dramatically improved by adding an index in the underlying table DM_AUDITTRAIL_S for the column CHRONICLE_ID.
Aspects
Scanning aspects is supported with the latest update of the migration-center Documentum Scanner. Attributes resulting from aspects are scanned automatically for any document or folder type object within scope of the scan.
The notation used by the Documentum Scanner to identify attributes which result from aspects appended to the objects being scanned is the same as used by Documentum, namely <aspect_name>.<aspect_attribute>.
Any number of aspects per document/folder, as well as any number of attributes per aspect are supported.
After a scan has finished, attributes scanned from aspects are available for further processing just like any other source attribute and can be used normally in any transformation rule.
Aspects are supported only for document and folder type objects!
PDF Annotations
Starting with version 3.2.8 Update 2 of migration-center, the Documentum Scanner can scan PDF annotations. When “exportAnnotations” is activated, the scanner will scan the related “dm_note” objects together with the DM_ANNOTATE relations. The “dm_note” objects are scanned as normal objects, while the DM_ANNOTATE relations are exported as MC relations having relation_type = “DctmAnnotationRelation”.
During delta migration, the scanner is able to identify the annotation changes and scan them accordingly.
Comments
Scanning the comments related to documents and folders is possible and can be activated (default is deactivated) by changing the scanner parameter “exportComments” to true.
The best-known use case for document and folder comments is within xCP (xCelerated Composition Platform) applications, but they can also be used in custom WDK Documentum solutions.
The comment objects will be scanned as MC relation objects and can be seen in the MC client by opening the relations view of a scanned object. They will have the value of RELATION_TYPE as “CommentRelation”. All comment related attributes that have values will be scanned as attributes of these relations.
For performance reasons, when a document has multiple versions, the comment relations will be attached only to the first document version that had comments (since all document versions share the same comments).
During delta migration, the scanner is able to identify comment changes, based on modifications of “i_vstamp” so it will rescan the corresponding document with all its comments (the first version document that had comments – see paragraph above) even if the document did not change.
To be able to scan a document’s comments, the DFC in use must have a valid Global Registry configured, because the adapter uses the “CommentManager” BOF service to read them.
Migrating Updates (“Update” or “Delta” Migration)
Objects that have changed in the source system since the last scan are scanned as update objects. Whether an object in a migration set is an update or not can be seen by checking the value of the Is_update column – if it’s 1, the current object is an update to a previously scanned object (the base object). There are some things to consider when working with the update migration feature:
Updated objects are detected based on the r_modify_date and i_vstamp attributes. If one of these attributes has changed, the object itself is considered to have changed and will be scanned and added as an update. Typically any action performed in Documentum changes at least one if not both of these attributes, offering a reliable way to detect whether an object has changed since the last scan or not; on the other hand, objects changed by third party code/applications without touching these attributes might not be detected by migration-center as having changed.
Objects deleted from the source after having been migrated are not detected and will not be deleted in the target system. This is by design (due to the added overhead, complexity and risk involved in deleting customer data).
Updates/changes to primary content, renditions, metadata, VD structures, and relations of objects will be detected and updated accordingly.
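The detection rule described above can be summarized in a short Python sketch. It only illustrates the comparison of r_modify_date and i_vstamp and is not migration-center's implementation. The function name classify and the dictionary-based input are assumptions made for the example.

```python
def classify(previous, current):
    """Classify an object found during a scan.

    `previous` holds the r_modify_date and i_vstamp values stored at the last
    scan (or None if the object was never scanned); `current` holds the values
    read from the repository now. Returns "new", "update" or "unchanged".
    """
    if previous is None:
        return "new"
    changed = (
        previous["r_modify_date"] != current["r_modify_date"]
        or previous["i_vstamp"] != current["i_vstamp"]
    )
    return "update" if changed else "unchanged"
```

A change to either attribute is enough for the object to be treated as an update; an object modified without either attribute changing is classified as unchanged, which is the blind spot mentioned above for third party code.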
Documentum Scanner Properties
To create a new Documentum Scanner job, specify the respective adapter type in the Scanner Properties window – from the list of available adapters “Documentum” must be selected. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type.
The Properties window of a scanner can be accessed by double-clicking a scanner in the list or selecting the Properties button/menu item from the toolbar/context menu.
A detailed description is always displayed at the bottom of the window for the currently selected parameter.
Common scanner parameters
Documentum scanner parameters
When editing a field in the |Parameters| tab (e.g. loggingLevel), a description/help/hint appears in the lower part of the window.
DQLString Limitations
There are some limitations and best practices regarding the dqlString parameter:
The query must return only r_object_id and therefore it must start with "select r_object_id" or "select distinct r_object_id"
"order by" clauses and "union" are not allowed in the query
The query should return only ids of documents (strings starting with "09..")
The query should return only the current version of the document (check "exportVersions" for exporting all versions)
For better performance, the query should return distinct values for r_object_id
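The syntactic rules from this list can be checked before a query is entered into the scanner. The following Python sketch is a rough pre-check, not the validation performed by migration-center, and the function name check_dql_string is hypothetical. It cannot verify what the query returns (document IDs only, current versions only), and it may wrongly flag the words "union" or "order by" when they appear inside a string literal.

```python
import re


def check_dql_string(dql):
    """Return a list of rule violations found in a dqlString value."""
    problems = []
    # Collapse whitespace and ignore case before matching.
    normalized = " ".join(dql.split()).lower()
    if not re.match(r"select (distinct )?r_object_id from\b", normalized):
        problems.append('query must start with "select r_object_id" or '
                        '"select distinct r_object_id" and return only that attribute')
    if re.search(r"\border by\b", normalized):
        problems.append('"order by" is not allowed')
    if re.search(r"\bunion\b", normalized):
        problems.append('"union" is not allowed')
    return problems
```

An empty list means none of the checked rules is violated; each entry otherwise names one rule from the list above.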
Log files
A complete history is available for any Documentum Scanner job from the respective items’ History window. It is accessible through the History button/menu entry on the toolbar/context menu. The list of all runs for the selected job together with additional information, such as the number of processed objects, the starting time, the ending time and the status are displayed in a grid format.
Double clicking an entry or clicking the Open button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
Version information of the migration-center Server Components the job was run with
The parameters the job was run with
Execution Summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime.
Log files generated by the Documentum Scanner can be found in the “logs” folder chosen during the installation of the Server Components, on the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs\Dctm-Scanner
The amount of information written to the log files depends on the setting specified in the loggingLevel start parameter for the respective job.