Domino Scanner
Introduction
Scanner is the term used in migration-center for an input connector. Using the IBM Domino scanner module to read the data that needs processing into migration-center is the first step in a migration project, thus “scan” also refers to the process used to input data to migration-center.
The IBM Domino Scanner is available since migration-center 3.2.5. It extracts documents, metadata and attachments from IBM Domino/Notes applications and use them as input for migration-center. After the scan the generated data can be processed and migrated to other systems supported by the various migration-center importers.
The currently supported formats of the documents export are Domino XML (dxl), Hypertext Markup Language (html), ARPA Internet Text Message (rfc 822/eml) and HTML from the EML. In addition, the scanner is capable of generating a Portable Document Format (pdf) rendition based on a DXL file of that document.
The IBM Domino Scanner currently supports all IBM Notes/Domino versions 9.x and above. Documents from applications that have been built with older IBM Notes/Domino versions can be extracted without any limitation.
The module works as a job that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created.
A Scanner is defined by a unique name, a set of configuration parameters and an optional description.
IBM Domino scanners can be created, configured, started and monitored through migration-center client, but the corresponding processes are executed by migration-center Job Server.
Installation
Prerequisites
32 bit vs 64 bit
The scanner is available in 32 bit and 64 bit versions. Each version has different prerequisites and limitations. Both versions require additional software installed on the migration-center Jobserver.
The 32-bit scanner requires:
Microsoft Windows 32-bit or 64-bit
Java 1.8.x (32-bit)
IBM Notes 9.0.1 or later
Microsoft Visual C++ 2017 Redistributable Package (x86) Can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x86.exe
The 64-bit scanner requires:
Microsoft Windows 64-bit
Java 1.8.x (64-bit)
IBM Domino 9.0.1 or later
Microsoft Visual C++ 2017 Redistributable Package (x64) Can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x64.exe
Because the 64-bit version uses the IBM Domino software the scanner can currently not generate any formats other than DXL and PDF
Additional features
If you are scanning Domino documents containing Object Linking and Embedding (OLE) objects, Apache OpenOffice 4.1.5 or later must be installed. See section Exporting OLE objects.
For transforming the documents into PDFs an additional PDF Generation Module needs to be installed on a second system which acts as a Rendition Server. See section Generating PDF renditions.
The PDF Generation Module is licensed separately
Installing the software
Regardless of using the 32-bit or 64-bit scanner, the installation steps are the same. All the steps should be performed on the Jobserver machine where the Domino scanner will be run.
Install IBM Notes and/or IBM Domino software
Add the folder path of the software's executables in the PATH environment variable
Install the appropriate Microsoft Visual C++ 2017 Redistributable Package
Install the migration-center Jobserver. See Installation guide
Locate the mc-domino-scanner_windows-x86-x64_[ver].exe installer in the Domino package
Run the installer using Run As Admin
Set the install location to the .../lib/mc-domino-scanner folder of the Jobserver
Start the Migration Center Jobserver Service
Switching between 32 bit and 64 bit
By default the Jobserver is configured to work with the 32 bit version of Domino Scanner.
In order to use the 64 bit version you need to change the following lines having x86
to x64
in the wrapper.conf:
wrapper.java.additional.4=-Djava.library.path=./lib/mc-dctm-adaptor;./lib/mc-outlook-adaptor;./lib/mc-domino-scanner/lib/x86;${path_env}
wrapper.app.env.path=./lib/mc-domino-scanner/lib/x86;${path_env}
And also change the java used by the jobserver to 64 bit by changing the JAVA_HOME
or JRE_HOME
environment variable and re-installing the Jobserver service.
Timezone settings
IBM Domino stores all date and time information based on GMT/UTC internally. When a datetime value is converted into text for display purposes, the value is always displayed using the client’s current timezone settings.
Therefore the timezone settings on the migration-center Jobserver will be used to convert values of datetime attributes.
If you require date and time values to be scanned based on a specific timezone set migration-center Jobserver’s timezone accordingly.
If you require “normalized” date and time values in migration-center, set the migration-center Jobserver’s timezone to GMT/UTC.
Exporting objects from an IBM Domino/Notes application
The IBM Domino Scanner connects to a specified IBM Domino/Notes application and can extract documents, content of richtext fields (composite items), metadata and attachments out of this application based on user-defined criteria. See chapter IBM Domino Scanner parameters below for more information about the features and configuration parameters available in the IBM Domino Scanner.
After a scan has completed, the newly scanned documents along with their metadata, attachments and the content of the richtext fields they contain are available for further processing in migration-center.
IBM Domino Scanner properties
To create a new IBM Domino Scanner job, specify the respective adapter type in the scanner properties window – from the list of available connectors, “Domino” must be selected. Once the adapter type has been selected, the list of parameters will be populated with the parameters specific to the selected adapter type.
The properties window of a scanner can be accessed by double-clicking a scanner in the list or by selecting the [Properties] button for the corresponding selected entry on the toolbar or context menu.
Common scanner parameters
IBM Domino Scanner parameters
Document Formats
The IBM Domino Scanner for fme migration-center supports the generation of different output formats for a Domino document. Each of the formats has its advantages and disadvantages. Which one best suits your needs can be determined by closely looking at your requirements, i.e. how users should work with the documents once migration into the new target system has been completed.
The formats currently supported will be described in detail in the following sections.
The .MSG and eml2HTML formats require an additional license for creating.
Domino XML (DXL)
The Domino XML (DXL) format is an XML format that has been defined by IBM/Lotus. It has been around for a while (at least since Domino version 6). A DXL file represents an entire Domino document including all its metadata, richtext elements and attachments.
The generation of DXL files from Domino documents relies on core functionality of Domino’s C-API as provided by IBM.
DXL files can be used to extract any document information from Domino applications. Based on special helper applications that are not part of Domino/Notes, a DXL file can be re-imported back into the original Domino application in order to read its content or otherwise work with the document at a later point in time.
DXL is especially useful whenever Domino documents should be transformed into PDF. The “PDF Generation Module” which is available as an add-on product for the IBM Domino Scanner makes use of the DXL format for PDF generation.
ARPA Internet Text Message (RFC 822/EML)
The ARPA Internet Text Message format (RFC 822) describes the syntax for messages that are sent among computer users (e-mail). The EML file format adheres to RFC 822.
Any Domino document – not only e-mails – can be transformed into EML format based on core functionality of Domino’s C-API as provided by IBM. An EML file contains the document’s content, its metadata as well as its attachments.
The EML format does not guarantee preservation of the Domino document’s integrity. Information from the document maybe lost or changed during conversion into EML (see Domino C-API documentation).
The major benefit of EML is that – since version 8 of Notes – an EML file can be opened in Notes again without the need for special helper applications.
Hypertext Markup Language (HTML)
Hypertext Markup Language (HTML) files can be generated for Domino documents based on two different approaches both of which will now be described.
Hypertext Markup Language (HTML) – direct approach
The Domino C-API offers the ability to directly transform a domino document into an HTML file.
As with the EML file format, the direct HTML generation based on the Domino C-API has some issues regarding completeness of information. One example are images that had been embedded into richtext fields. Those images will not be visible in the HTML file created.
EML to Hypertext Markup Language (EML2HTML) – indirect approach
Besides the direct approach described in the previous section, HTML can also be created from the EML format.
In most scenarios that the Domino scanner has been tested on, the result of the indirect approach had a much higher quality than that of the direct approach.
Generating EML2HTML requires a third party library that needs to be purchased separately. Please contact your fme sales representative for details.
Microsoft Message Format (MSG)
The MSG format is the format that is used by Microsoft Outlook to store e-mails on the filesystem. It’s a container format that includes the e-mail and all its attachments.
Generating MSG requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.
Richtext format (RTF)
The Domino scanner can extract the entire Domino document (not just the document’s richtext fields) as a single RTF file. This functionality is provided by the Domino C-API.
Portable Document Format (PDF/PDF/a-1a/PDF/a-1b)
Based on the add-on “PDF Generation Module” (see Exporting OLE objects), the Domino scanner is capable of generating PDF, PDF/a-1a or PDF/a-1b files for any type of Domino document – independent of the application it originates from.
All the PDF formats preserve the Domino document in a read-only form that looks like the document opened in Notes.
The PDF generation module takes care of collapsible sections, fixed-width images and tables and other Domino specific features that PDF printing might interfere with.
If required, all the Domino document’s attachments can be re-attached to the PDF file that was generated (see parameter “embedAttachmentsIntoPDF”). Thereby, the entire e-mail will be preserved in a read-only format that can be viewed anywhere at any time requiring a standard PDF reader only.
Exporting OLE objects
If the IBM Domino documents contain OLE embedded objects, Apache OpenOffice 4.1.5 or later must be installed and configured on the migration-center job server in order to properly extract the OLE objects.
Install Apache OpenOffice 4.1.5 on the migration-center job server.
Add the folder containing the “soffice.exe” file to the system’s search path. This folder is typically:
<Apache OpenOffice installation folder>/program
Add the following entry to the file “wrapper.conf” inside the migration-center server components installation folder and replace <#> with an appropriate value for your installation:
wrapper.java.classpath.<#>=<Apache OpenOffice installation folder>/program/classes/*.jar
Open the configuration file „documentDirectoryRuntimeConfiguration.xml“ located in subfolder „lib/mc-domino-scanner/conf“ of your migration-center server components‘ installation folder in your favorite editor for XML files.
Go to line 83 of the file which looks like:
<parameter name="exportOLEObjects">false</parameter>
and replace “false” with “true”.
The entry inside the configuration file should look like:
<parameter name="exportOLEObjects">true</parameter>
If you want to use a different port for the Apache OpenOffice server than the default port (8100), go to line 84 of the file:
<!--<parameter name="apacheOpenOfficePort">8100</parameter>-->
Uncomment it and and replace “8100” with the portnumber to use, e.g “1234”.
The entry inside the configuration file should look like:
<parameter name="apacheOpenOfficePort">1234</parameter>
Save the configuration file.
Generating PDF renditions
While PDF generation can be activated in the scanner’s configuration (parameters “primaryDocumentFormat”, “secondaryDocumentFormats” and “embedAttachmentsIntoPDF”), the setup of PDF generation requires and the additional “PDF Generation Module”.
From a technical perspective, the “PDF Generation Module” requires an additional system (“rendition server”). This system will be used to print any IBM Notes document using a PDF printer driver based on IBM Notes’ standard print functionality. The process for PDF generation is as follows:
The scanner submits a request to create a PDF rendition for an existing Domino document or a DXL file to PDF Generation Module on the rendition server.
PDF Generation Module creates a PDF rendition of the document.
If PDF generation was successful, PDF Generation Module will save the PDF to a shared network drive.
PDF Generation Module will signal success or failure to the scanner.
Setting up the rendition server requires additional configurative actions. For each IBM Domino application/database template that was used to create documents, an empty database needs to be created based on this template and either made available locally on the rendition server or on the IBM Domino server.
Each of these empty databases needs to be prepared for PDF printing. As necessary configuration steps vary depending on the application that is being worked on, they cannot be described here.
Please contact your fme representative should you wish to implement PDF generation for migration of an IBM Domino application/database.
Log files
A complete history is available for any IBM Domino Scanner job from the respective item’s history window. It is accessible through the [History] button/menu entry on the toolbar/context menu.
The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and ending time and the status.
Double clicking an entry or clicking the [Open] button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
version information of the migration-center Server Components the job was run with
the parameters the job was run with
the execution summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime
Log files generated by the IBM Domino Scanner can be found in the server components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs
The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
Troubleshooting
Common errors
Here are causes and solutions for some common errors when trying to setup the Domino Scanner:
DominoBackendJNI is missing dependent libraries; please check the installation prerequisites!
This error message means you need to do one of the following things if not done correctly:
- The IBM Notes or IBM Domino install path was not correctly to the PATH variable
- Install the correct VC++ Redistributable package
- Reinstall the Jobserver Windows Service using UninstallWinNTService.bat
and InstallWinNTService.bat
Known Issues
The following issues with the MC Domino Scanner are known to exist and will be fixed in later releases:
The scanner requires that the temporary directory for the user running MC Job Server Service exists and that the user can write to this directory. If the directory does either not exist or the user does not have write permission to the directory, the creation of temporary files during document and attachment extraction will fail. The logfile will show error messages like
„INFO | jvm 1 | 2014/10/02 12:06:26 | 12:06:26,850 ERROR [Job 1351] com.think_e_solutions.application.documentdirectory… - java.io.IOException: The system cannot find the path specified“.
To work around this issue, make sure the temporary folder exists and the user has write permission for this folder. If the MC Job Server is started manually as a normal user then the Temp folder should be C:\Users\Username\AppData\Local\Temp. Therefore, if the MC Job Server is run as a service by the Local System account, the folder is one of the following:
For the 32-bit version of Windows:
C:\Windows\System32\config\systemprofile\AppData\Local\Temp
For the 64-bit version of Windows:
C:\Windows\SysWOW64\config\systemprofile\AppData\Local\Temp
If a document is exported from IBM Domino but the related entries in the mc database cannot be created (e.g. because an attribute’s value exceeds the maximum number of characters allowed for a field in the mc database), the related files can be found in the filesystem (inside the export directory). If this document is scanned again, it will be treated as a new document, not as an update.
If the scanner parameter “relationType” is set to “relation”, relations will be automatically deleted by migration-center if they do not exist anymore. If the scanner parameter “relationType” is set to “object”, objects representing relationships cannot be deleted if the relation is invalidated.
Example: If a document had one attachment when scanned in scanner run #1 and that attachment was removed from the document before scanner run #2, the scanner cannot remove the object representing the “attachment” relation between document and attachment (created in scanner run #1) in scanner run #2.
If a PDF rendition is requested and DXLUtility receives the request to generate the rendition but isn’t able to import the DXL file into the appropriate IBM Domino database on the rendition server, it’s likely that the shared folder used to transfer DXL and PDF files between the scanner and PDF Generation Module cannot be read by the user running PDF Generation Module on the rendition server.
The scanner will crash the Java VM if the parameter “exportCompositeItems” is set to “true” and the log level in log4j.xml (located in subdirectory “conf” of the scanner installation directory) is set to “ERROR”.
The 64-bit version of the scanner relies on IBM Domino. As Domino lacks the required libraries to export “EML”, “HTML” or “RTF”, the 64-bit version of the scanner cannot export documents in any other format than “DXL” or “PDF”. If other formats are required, the scanner’s 32-bit version needs to be run based on IBM Notes instead.
Domino attribute types
The following table lists all (relevant) Domino attribute types.
The scanner parameter “excludedAttributeTypes” is a logical “OR” of all types that should be excluded from the scan.