Scanner is the term used in migration-center for an input connector. Using the IBM Domino scanner module to read the data that needs processing into migration-center is the first step in a migration project, thus “scan” also refers to the process used to input data to migration-center.
The IBM Domino Scanner has been available since migration-center 3.2.5. It extracts documents, metadata and attachments from IBM Domino/Notes applications and uses them as input for migration-center. After the scan, the generated data can be processed and migrated to other systems supported by the various migration-center importers.
The currently supported export formats for documents are Domino XML (DXL), Hypertext Markup Language (HTML), ARPA Internet Text Message (RFC 822/EML) and HTML generated from EML (EML2HTML). In addition, the scanner can generate a Portable Document Format (PDF) rendition of a document based on its DXL file.
The IBM Domino Scanner currently supports all IBM Notes/Domino versions 9.x and above. Documents from applications that have been built with older IBM Notes/Domino versions can be extracted without any limitation.
The module works as a job that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created.
A Scanner is defined by a unique name, a set of configuration parameters and an optional description.
IBM Domino scanners can be created, configured, started and monitored through migration-center client, but the corresponding processes are executed by migration-center Job Server.
The scanner is available in 32-bit and 64-bit versions. Each version has different prerequisites and limitations. Both versions require additional software to be installed on the migration-center Jobserver.
The 32-bit scanner requires:
- Microsoft Windows 32-bit or 64-bit
- Java 1.8.x (32-bit)
- IBM Notes 9.0.1 or later
The 64-bit scanner requires:
Because the 64-bit version uses the IBM Domino software, it currently cannot generate any formats other than DXL and PDF.
The PDF Generation Module is licensed separately.
Regardless of using the 32-bit or 64-bit scanner, the installation steps are the same. All the steps should be performed on the Jobserver machine where the Domino scanner will be run.
- 1. Install the IBM Notes and/or IBM Domino software
- 2. Add the folder containing the software's executables to the PATH environment variable
- 3. Install the appropriate Microsoft Visual C++ 2017 Redistributable Package
- 5. Locate the mc-domino-scanner_windows-x86-x64_[ver].exe installer in the Domino package
- 6. Run the installer using “Run as administrator”
- 7. Set the install location to the .../lib/mc-domino-scanner folder of the Jobserver
- 8. Start the Migration Center Jobserver Service
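Step 2 is easy to get wrong. The following Python sketch is not part of migration-center; it simply lists every PATH entry that contains a given file, so you can confirm that the Notes/Domino program folder is on the PATH. The file name “nnotes.dll” used below is an assumption about the contents of that folder; replace it with any executable or library you know is located there.

```python
import os
from typing import List, Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> List[str]:
    """Return every PATH directory that contains `filename`.

    `path_value` defaults to the PATH of the current process; pass a string
    (entries separated by os.pathsep) to check another value.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # Windows PATH entries may be quoted
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Assumption: nnotes.dll lives in the Notes/Domino program folder.
    folders = find_on_path("nnotes.dll")
    print(folders if folders else "Notes/Domino folder not found on PATH")
```

Run it under the account that runs the Jobserver service, because user and system PATH values can differ.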
By default, the Jobserver is configured to work with the 32-bit version of the Domino Scanner.
To use the 64-bit version, change the following lines containing “x64” in the wrapper.conf file:
Also switch the Java runtime used by the Jobserver to 64-bit by changing the JRE_HOME environment variable and reinstalling the Jobserver service.
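To confirm that JRE_HOME now points to a 64-bit Java runtime, you can ask the java launcher for its properties. A minimal Python sketch, assuming a HotSpot-based JRE that prints the “sun.arch.data.model” property (32 or 64) when started with “-XshowSettings:properties -version”:

```python
import os
import re
import subprocess
from typing import Optional


def parse_data_model(settings_output: str) -> Optional[int]:
    """Extract sun.arch.data.model (32 or 64) from -XshowSettings output."""
    match = re.search(r"sun\.arch\.data\.model\s*=\s*(\d+)", settings_output)
    return int(match.group(1)) if match else None


def jre_bitness(jre_home: str) -> Optional[int]:
    """Run the java launcher below jre_home and report its data model."""
    java = os.path.join(jre_home, "bin", "java")
    result = subprocess.run(
        [java, "-XshowSettings:properties", "-version"],
        capture_output=True, text=True,
    )
    # The settings are printed to stderr, so search both streams.
    return parse_data_model(result.stderr + result.stdout)


if __name__ == "__main__":
    home = os.environ.get("JRE_HOME")
    print("JRE_HOME bitness:", jre_bitness(home) if home else "JRE_HOME is not set")
```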
IBM Domino stores all date and time information based on GMT/UTC internally. When a datetime value is converted into text for display purposes, the value is always displayed using the client’s current timezone settings.
Therefore, the timezone settings of the migration-center Jobserver are used to convert the values of datetime attributes.
If you require date and time values to be scanned based on a specific timezone, set the migration-center Jobserver’s timezone accordingly.
If you require “normalized” date and time values in migration-center, set the migration-center Jobserver’s timezone to GMT/UTC.
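The effect of the Jobserver’s timezone can be illustrated with a short Python sketch. It is not the scanner’s code; it only shows how one instant stored in UTC turns into different text depending on the timezone used for display. Fixed UTC offsets are used instead of named timezones to keep the example free of daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

# A value as Domino stores it internally: an absolute instant in UTC.
stored = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)


def as_text(value: datetime, offset_hours: float) -> str:
    """Render `value` the way a client with the given UTC offset would."""
    local = value.astimezone(timezone(timedelta(hours=offset_hours)))
    return local.strftime("%Y-%m-%d %H:%M")


print(as_text(stored, 0))   # 2024-03-01 12:00 (Jobserver on GMT/UTC)
print(as_text(stored, 1))   # 2024-03-01 13:00 (Jobserver on UTC+01:00)
print(as_text(stored, -5))  # 2024-03-01 07:00 (Jobserver on UTC-05:00)
```

Only the GMT/UTC setting reproduces the stored value unchanged, which is why it yields “normalized” values.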
The IBM Domino Scanner connects to a specified IBM Domino/Notes application and can extract documents, content of richtext fields (composite items), metadata and attachments out of this application based on user-defined criteria. See chapter IBM Domino Scanner parameters below for more information about the features and configuration parameters available in the IBM Domino Scanner.
After a scan has completed, the newly scanned documents along with their metadata, attachments and the content of the richtext fields they contain are available for further processing in migration-center.
To create a new IBM Domino Scanner job, specify the respective adapter type in the scanner properties window – from the list of available connectors, “Domino” must be selected. Once the adapter type has been selected, the list of parameters will be populated with the parameters specific to the selected adapter type.
The properties window of a scanner can be accessed by double-clicking a scanner in the list or by selecting the [Properties] button for the corresponding selected entry on the toolbar or context menu.
The IBM Domino Scanner for fme migration-center supports the generation of different output formats for a Domino document. Each of the formats has its advantages and disadvantages. Which one best suits your needs can be determined by closely looking at your requirements, i.e. how users should work with the documents once migration into the new target system has been completed.
The formats currently supported will be described in detail in the following sections.
Generating the MSG and EML2HTML formats requires an additional license.
The Domino XML (DXL) format is an XML format that has been defined by IBM/Lotus. It has been around for a while (at least since Domino version 6). A DXL file represents an entire Domino document including all its metadata, richtext elements and attachments.
The generation of DXL files from Domino documents relies on core functionality of Domino’s C-API as provided by IBM.
DXL files can be used to extract any document information from Domino applications. Based on special helper applications that are not part of Domino/Notes, a DXL file can be re-imported back into the original Domino application in order to read its content or otherwise work with the document at a later point in time.
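Because DXL is plain XML, standard XML tools can read the exported files. The following Python sketch lists the names of all items (fields) in a DXL document. It assumes that items appear as “item” elements with a “name” attribute, optionally in the “http://www.lotus.com/dxl” namespace; the sample document is made up.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix such as '{http://www.lotus.com/dxl}'."""
    return tag.rsplit("}", 1)[-1]


def dxl_item_names(dxl_text: str) -> List[str]:
    """Return the name attribute of every <item> element in a DXL document."""
    root = ET.fromstring(dxl_text)
    return [
        elem.get("name", "")
        for elem in root.iter()
        if local_name(elem.tag) == "item"
    ]


sample = """<document xmlns='http://www.lotus.com/dxl' form='Memo'>
  <item name='Subject'><text>Quarterly report</text></item>
  <item name='From'><text>Jane Doe</text></item>
</document>"""

print(dxl_item_names(sample))  # ['Subject', 'From']
```

For exported files, ET.parse(path).getroot() can replace ET.fromstring so that the encoding declared in the file is honored.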
DXL is especially useful whenever Domino documents should be transformed into PDF. The “PDF Generation Module” which is available as an add-on product for the IBM Domino Scanner makes use of the DXL format for PDF generation.
The ARPA Internet Text Message format (RFC 822) describes the syntax for messages that are sent among computer users (e-mail). The EML file format adheres to RFC 822.
Any Domino document – not only e-mails – can be transformed into EML format based on core functionality of Domino’s C-API as provided by IBM. An EML file contains the document’s content, its metadata as well as its attachments.
The EML format does not guarantee that the Domino document’s integrity is preserved. Information from the document may be lost or changed during conversion into EML (see the Domino C-API documentation).
The major benefit of EML is that – since version 8 of Notes – an EML file can be opened in Notes again without the need for special helper applications.
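Since EML follows RFC 822, standard mail libraries can also read the scanner’s EML output, for example to spot-check subjects and attachments after a scan. A minimal sketch using Python’s standard “email” package; the sample message is made up.

```python
from email import message_from_string, policy
from typing import List, Tuple


def summarize_eml(raw: str) -> Tuple[str, List[str]]:
    """Return the subject and the attachment file names of an RFC 822 message."""
    msg = message_from_string(raw, policy=policy.default)
    subject = str(msg.get("Subject", ""))
    # iter_attachments() skips the body parts and yields the remaining parts;
    # it yields nothing for single-part messages.
    names = [part.get_filename() or "" for part in msg.iter_attachments()]
    return subject, names


# Made-up sample message with one text body and one attachment.
sample = (
    "From: jane@example.com\n"
    "Subject: Quarterly report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attachment.\n"
    "--XYZ\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="report.pdf"\n'
    "\n"
    "placeholder\n"
    "--XYZ--\n"
)

print(summarize_eml(sample))  # ('Quarterly report', ['report.pdf'])
```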
Hypertext Markup Language (HTML) files can be generated for Domino documents based on two different approaches both of which will now be described.
Hypertext Markup Language (HTML) – direct approach
The Domino C-API offers the ability to directly transform a domino document into an HTML file.
As with the EML file format, direct HTML generation based on the Domino C-API has some issues regarding completeness of information. For example, images that had been embedded into richtext fields are not visible in the generated HTML file.
EML to Hypertext Markup Language (EML2HTML) – indirect approach
Besides the direct approach described in the previous section, HTML can also be created from the EML format.
In most scenarios in which the Domino scanner has been tested, the indirect approach produced results of much higher quality than the direct approach.
Generating EML2HTML requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.
The MSG format is the format that is used by Microsoft Outlook to store e-mails on the filesystem. It’s a container format that includes the e-mail and all its attachments.
Generating MSG requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.
The Domino scanner can extract the entire Domino document (not just the document’s richtext fields) as a single RTF file. This functionality is provided by the Domino C-API.
Based on the add-on “PDF Generation Module” (see Exporting OLE objects), the Domino scanner can generate PDF, PDF/A-1a or PDF/A-1b files for any type of Domino document, independent of the application it originates from.
All the PDF formats preserve the Domino document in a read-only form that looks like the document opened in Notes.
The PDF Generation Module takes care of collapsible sections, fixed-width images and tables, and other Domino-specific features that PDF printing might interfere with.
If required, all of the Domino document’s attachments can be re-attached to the generated PDF file (see parameter “embedAttachmentsIntoPDF”). This way, the entire document, e.g. an e-mail, is preserved in a read-only format that can be viewed anywhere at any time with nothing more than a standard PDF reader.
If the IBM Domino documents contain OLE embedded objects, Apache OpenOffice 4.1.5 or later must be installed and configured on the migration-center job server in order to properly extract the OLE objects.
Install Apache OpenOffice 4.1.5 on the migration-center job server.
Add the folder containing the “soffice.exe” file to the system’s search path. This folder is typically:
<Apache OpenOffice installation folder>/program
Add the following entry to the file “wrapper.conf” inside the migration-center server components installation folder and replace <#> with an appropriate value for your installation:
wrapper.java.classpath.<#>=<Apache OpenOffice installation folder>/program/classes/*.jar
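To pick a value for <#>, use the next number after the highest classpath entry already present in wrapper.conf. The following Python sketch computes that number and builds the line to add; it assumes the existing entries are numbered consecutively and ignores commented-out entries. The function names are illustrative only.

```python
import re
from typing import Iterable

# Matches active (not commented-out) lines like "wrapper.java.classpath.3=...".
CLASSPATH_KEY = re.compile(r"^\s*wrapper\.java\.classpath\.(\d+)\s*=")


def next_classpath_index(lines: Iterable[str]) -> int:
    """Return one more than the highest wrapper.java.classpath.<#> in use."""
    used = [int(m.group(1)) for m in map(CLASSPATH_KEY.match, lines) if m]
    return max(used, default=0) + 1


def openoffice_entry(lines: Iterable[str], install_dir: str) -> str:
    """Build the classpath line for the Apache OpenOffice jar files."""
    index = next_classpath_index(lines)
    return f"wrapper.java.classpath.{index}={install_dir}/program/classes/*.jar"
```

Usage: pass the lines of wrapper.conf, e.g. openoffice_entry(open("wrapper.conf").read().splitlines(), "C:/OpenOffice4"), and append the returned line to the file.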
Open the configuration file “documentDirectoryRuntimeConfiguration.xml”, located in the subfolder “lib/mc-domino-scanner/conf” of your migration-center Server Components installation folder, in an XML editor of your choice.
Go to line 83 of the file which looks like:
and replace “false” with “true”.
The entry inside the configuration file should look like:
If you want to use a different port for the Apache OpenOffice server than the default port (8100), go to line 84 of the file:
Uncomment it and replace “8100” with the port number to use, e.g. “1234”.
The entry inside the configuration file should look like:
Save the configuration file.
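When troubleshooting OLE extraction, it can help to check whether anything is listening on the configured OpenOffice port (8100 by default) while the scanner is using OpenOffice. A minimal Python sketch; it only tests whether a TCP connection can be opened and does not verify that the listener is actually Apache OpenOffice.

```python
import socket


def port_is_listening(host: str = "localhost", port: int = 8100,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("OpenOffice port reachable:", port_is_listening())
```

If you changed the port in line 84, pass the same value, e.g. port_is_listening(port=1234).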
While PDF generation can be activated in the scanner’s configuration (parameters “primaryDocumentFormat”, “secondaryDocumentFormats” and “embedAttachmentsIntoPDF”), setting it up requires the additional “PDF Generation Module”.
From a technical perspective, the “PDF Generation Module” requires an additional system (“rendition server”). This system will be used to print any IBM Notes document using a PDF printer driver based on IBM Notes’ standard print functionality. The process for PDF generation is as follows:
- 1. The scanner submits a request to the PDF Generation Module on the rendition server to create a PDF rendition of an existing Domino document or a DXL file.
- 2. The PDF Generation Module creates a PDF rendition of the document.
- 3. If PDF generation was successful, the PDF Generation Module saves the PDF to a shared network drive.
- 4. The PDF Generation Module signals success or failure to the scanner.
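The four steps above can be modeled as a simple request/response flow. The following Python sketch is purely illustrative: RenditionResult, request_pdf and the render callback are hypothetical names that stand in for the scanner and the rendition server, not a migration-center API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class RenditionResult:
    """Outcome reported back to the scanner (step 4)."""
    success: bool
    pdf_path: Optional[Path] = None
    error: str = ""


def request_pdf(source: Path, share: Path,
                render: Callable[[Path], bytes]) -> RenditionResult:
    """Model of the rendition flow; `render` stands in for the rendition server.

    Step 1: the scanner hands `source` (a document or DXL file) to the module.
    """
    try:
        pdf_bytes = render(source)               # step 2: create the rendition
        target = share / (source.stem + ".pdf")
        target.write_bytes(pdf_bytes)            # step 3: save to the share
    except Exception as exc:                     # step 4: signal failure
        return RenditionResult(False, error=str(exc))
    return RenditionResult(True, pdf_path=target)  # step 4: signal success
```

The sketch makes one design point visible: the PDF only appears on the shared drive after a successful rendition, so the scanner can rely on the reported result instead of scanning the share.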
Setting up the rendition server requires additional configuration. For each IBM Domino application/database template that was used to create documents, an empty database needs to be created from this template and made available either locally on the rendition server or on the IBM Domino server.
Each of these empty databases needs to be prepared for PDF printing. As necessary configuration steps vary depending on the application that is being worked on, they cannot be described here.
Please contact your fme representative should you wish to implement PDF generation for migration of an IBM Domino application/database.
A complete history is available for any IBM Domino Scanner job from the respective item’s history window. It is accessible through the [History] button/menu entry on the toolbar/context menu.
The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and ending time and the status.
Double clicking an entry or clicking the [Open] button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
- version information of the migration-center Server Components the job was run with
- the parameters the job was run with
- the execution summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime
Log files generated by the IBM Domino Scanner can be found in the server components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs
The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
Here are causes and solutions for some common errors when setting up the Domino Scanner:
DominoBackendJNI is missing dependent libraries; please check the installation prerequisites!
This error message means that one of the following installation steps was not performed correctly:
- Add the IBM Notes or IBM Domino installation path to the PATH variable
- Install the correct VC++ Redistributable package
- Reinstall the Jobserver Windows Service using
The following issues with the MC Domino Scanner are known to exist and will be fixed in later releases:
- The scanner requires that the temporary directory of the user running the MC Job Server Service exists and that this user can write to it. If the directory does not exist or the user does not have write permission for it, creating temporary files during document and attachment extraction fails. The log file will show error messages like
“INFO | jvm 1 | 2014/10/02 12:06:26 | 12:06:26,850 ERROR [Job 1351] com.think_e_solutions.application.documentdirectory… - java.io.IOException: The system cannot find the path specified”.
To work around this issue, make sure the temporary folder exists and the user has write permission for it. If the MC Job Server is started manually as a normal user, the Temp folder should be C:\Users\Username\AppData\Local\Temp. If the MC Job Server is run as a service by the Local System account, the folder is one of the following:
For the 32-bit version of Windows:
For the 64-bit version of Windows:
- If a document is exported from IBM Domino but the related entries in the mc database cannot be created (e.g. because an attribute’s value exceeds the maximum number of characters allowed for a field in the mc database), the related files can be found in the filesystem (inside the export directory). If this document is scanned again, it will be treated as a new document, not as an update.
- If the scanner parameter “relationType” is set to “relation”, relations will be automatically deleted by migration-center if they do not exist anymore. If the scanner parameter “relationType” is set to “object”, objects representing relationships cannot be deleted if the relation is invalidated.
Example: If a document had one attachment when scanned in scanner run #1 and that attachment was removed from the document before scanner run #2, the scanner cannot remove the object representing the “attachment” relation between document and attachment (created in scanner run #1) in scanner run #2.
- If a PDF rendition is requested and DXLUtility receives the request to generate the rendition but isn’t able to import the DXL file into the appropriate IBM Domino database on the rendition server, it’s likely that the shared folder used to transfer DXL and PDF files between the scanner and PDF Generation Module cannot be read by the user running PDF Generation Module on the rendition server.
- The scanner will crash the Java VM if the parameter “exportCompositeItems” is set to “true” and the log level in log4j.xml (located in subdirectory “conf” of the scanner installation directory) is set to “ERROR”.
- The 64-bit version of the scanner relies on IBM Domino. As Domino lacks the required libraries to export “EML”, “HTML” or “RTF”, the 64-bit version of the scanner cannot export documents in any other format than “DXL” or “PDF”. If other formats are required, the scanner’s 32-bit version needs to be run based on IBM Notes instead.
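For the first known issue (missing or read-only temporary directory), the following Python sketch checks whether a folder exists and is writable. It tests writability by actually creating and deleting a temporary file, because os.access() does not reflect all Windows permission rules. To be meaningful, run it under the same account as the Job Server.

```python
import os
import tempfile
from typing import Tuple


def check_temp_dir(path: str) -> Tuple[bool, bool]:
    """Return (exists, writable) for a candidate temporary directory."""
    if not os.path.isdir(path):
        return False, False
    try:
        # Creating a real file is the most direct test of write permission.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True, True
    except OSError:
        return True, False


if __name__ == "__main__":
    # On Windows, the JVM normally derives java.io.tmpdir from TMP/TEMP.
    candidate = os.environ.get("TMP") or os.environ.get("TEMP") or ""
    print(candidate, check_temp_dir(candidate))
```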
The following table lists all (relevant) Domino attribute types.
The scanner parameter “excludedAttributeTypes” is a logical “OR” of all types that should be excluded from the scan.
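Assuming, as the wording suggests, that each attribute type has a distinct bit-flag value, the value for “excludedAttributeTypes” is obtained by combining the values of all excluded types with a bitwise OR. The Python sketch below illustrates this; the type names and codes in ATTRIBUTE_TYPE_CODES are hypothetical placeholders, so take the actual values from the attribute type table of your scanner version.

```python
from functools import reduce
from operator import or_
from typing import Dict, Iterable

# HYPOTHETICAL codes for illustration only; use the values from the
# attribute type table instead.
ATTRIBUTE_TYPE_CODES: Dict[str, int] = {
    "TYPE_A": 1,
    "TYPE_B": 2,
    "TYPE_C": 4,
}


def excluded_attribute_types(names: Iterable[str]) -> int:
    """Combine the codes of all types to exclude with a bitwise OR."""
    return reduce(or_, (ATTRIBUTE_TYPE_CODES[n] for n in names), 0)


def is_excluded(type_code: int, mask: int) -> bool:
    """True if all bits of a type's code are set in the exclusion mask."""
    return (type_code & mask) == type_code
```

For example, excluding TYPE_A (1) and TYPE_C (4) gives 1 | 4 = 5.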