Domino Scanner

Introduction

Scanner is the term used in migration-center for an input adapter. Using the IBM Domino Scanner to read the data to be processed into migration-center is the first step in a migration project; “scan” therefore also refers to the process of loading data into migration-center.

The IBM Domino Scanner has been available since migration-center 3.2.5. It extracts documents, metadata and attachments from IBM Domino/Notes applications and uses them as input for migration-center. After the scan, the generated data can be processed and migrated to other systems supported by the various migration-center importers.

The scanner currently supports exporting documents as Domino XML (DXL), Hypertext Markup Language (HTML), ARPA Internet Text Message (RFC 822/EML) and HTML generated from the EML. In addition, the scanner can generate a Portable Document Format (PDF) rendition of a document based on its DXL file.

The IBM Domino Scanner currently supports all IBM Notes/Domino versions 6.x and above. Documents from applications that have been built with older IBM Notes/Domino versions can be extracted without any limitation.

The module works as a job that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created.

A Scanner is defined by a unique name, a set of configuration parameters and an optional description.

IBM Domino scanners can be created, configured, started and monitored through the migration-center Client, but the corresponding processes are executed by the migration-center Job Server.

Installation

Prerequisites

To use the IBM Domino Scanner, additional software must be installed on the migration-center Job Server that executes the IBM Domino Scanner jobs.

The scanner is available in a 32-bit and a 64-bit version, each with different requirements.

The 32-bit version of the scanner relies on IBM Notes, whereas the 64-bit version uses IBM Domino. Because IBM Domino lacks some libraries the scanner uses to generate specific document formats (e.g. “EML”, “HTML” and “RTF”), the 64-bit version of the scanner currently cannot generate any formats other than “DXL” and “PDF”.

The 32-bit scanner requires:

  • Microsoft Windows based 32-bit or 64-bit operating system

  • Java Runtime Environment (JRE) 1.8.x (32-bit)

  • IBM Notes 9.0.1 or later, installed on the migration-center Job Server. Please refer to chapter “9. IBM Notes/Domino installation and configuration” for detailed instructions about installing and configuring IBM Notes.

  • Microsoft Visual C++ 2017 Redistributable Package (x86), which can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x86.exe

The 64-bit scanner requires:

  • Microsoft Windows based 64-bit operating system

  • Java Runtime Environment (JRE) 1.8.x (64-bit)

  • IBM Domino 9.0.1 or later, installed on the migration-center Job Server. The community version may be used. Please refer to chapter “9. IBM Notes/Domino installation and configuration” for detailed instructions about installing and configuring IBM Domino.

  • Microsoft Visual C++ 2017 Redistributable Package (x64), which can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x64.exe
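To confirm that the JRE on the Job Server matches the chosen scanner variant, a small check such as the following can be compiled and run with that JRE. It is a hypothetical helper and not part of migration-center. Its format lists only restate the restriction described above (64-bit: DXL and PDF only), and it leaves out the separately licensed formats.

```java
// Hypothetical pre-flight check (not part of migration-center): reports whether the
// JVM running it matches the prerequisites above and which formats the matching
// scanner variant can produce. Written to compile with Java 8.
import java.util.Arrays;
import java.util.List;

final class JobServerJvmCheck {

    // Per this guide, the 64-bit (IBM Domino based) variant only produces DXL and PDF.
    // Separately licensed formats (MSG, EML2HTML) are omitted from this sketch.
    static List<String> supportedFormats(boolean is64Bit) {
        return is64Bit
                ? Arrays.asList("DXL", "PDF")
                : Arrays.asList("DXL", "PDF", "EML", "HTML", "RTF");
    }

    // "sun.arch.data.model" is HotSpot-specific, so fall back to "os.arch"
    // ("x86" for a 32-bit JVM, "amd64" for a 64-bit JVM on Windows).
    static boolean is64BitJvm() {
        String model = System.getProperty("sun.arch.data.model");
        if ("32".equals(model) || "64".equals(model)) {
            return "64".equals(model);
        }
        return System.getProperty("os.arch", "").contains("64");
    }

    // Java 8 reports its version as "1.8.0_<update>".
    static boolean isJava8(String javaVersion) {
        return javaVersion != null && javaVersion.startsWith("1.8");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        boolean is64 = is64BitJvm();
        System.out.println("Java version   : " + version
                + (isJava8(version) ? " (OK)" : " (scanner expects 1.8.x)"));
        System.out.println("JVM bitness    : " + (is64 ? "64-bit" : "32-bit"));
        System.out.println("Scanner variant: " + (is64 ? "64-bit (IBM Domino)" : "32-bit (IBM Notes)"));
        System.out.println("Formats        : " + supportedFormats(is64));
    }
}
```

Run it with the same java.exe that the Job Server uses. Otherwise it reports on a different JRE.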

If the scanner is to be used for Domino documents containing Object Linking and Embedding (OLE) objects, Apache OpenOffice 4.1.5 or later must be installed. Please refer to the section “Exporting OLE objects” for details.

If the scanner is to transform documents extracted from IBM Domino/Notes applications into PDF (PDF, PDF/A-1a, PDF/A-1b), a second system, a “rendition server”, is required. The rendition server must have the optional PDF Generation Module installed. For details about setting up a rendition server with the PDF Generation Module, refer to the PDF Generation Module manual.

Timezone settings

IBM Domino stores all date and time information based on GMT/UTC internally.

When software based on the IBM Domino API converts a date and time value into text for display, it always uses the client’s current timezone settings.

As the scanner is based on the IBM Domino API, it uses the migration-center Job Server’s timezone setting when extracting date and time values from IBM Domino documents. All date and time values stored in the migration-center database are therefore relative to the Job Server’s timezone.

If you require date and time values in the migration-center database to be based on a specific timezone, set the migration-center Job Server’s timezone accordingly.

If you require “normalized” date and time values in migration-center, set the migration-center Job Server’s timezone to GMT/UTC.
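The effect can be illustrated with the plain java.time classes instead of the Domino API. This substitution is made only to keep the example self-contained. The same value stored in UTC produces different text depending on the timezone used to render it. The class name is hypothetical.

```java
// Illustration only: uses java.time, not the IBM Domino API the scanner relies on.
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class TimezoneDemo {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss XXX");

    // Renders a value stored in UTC as text in the given timezone, the way a
    // Domino API client renders it in its own (here: the Job Server's) timezone.
    static String render(Instant storedUtc, ZoneId zone) {
        return ZonedDateTime.ofInstant(storedUtc, zone).format(FORMAT);
    }

    public static void main(String[] args) {
        Instant stored = Instant.parse("2014-10-02T10:06:26Z");
        System.out.println("Job Server default: " + render(stored, ZoneId.systemDefault()));
        System.out.println("Europe/Berlin     : " + render(stored, ZoneId.of("Europe/Berlin")));
        System.out.println("UTC (normalized)  : " + render(stored, ZoneId.of("UTC")));
    }
}
```

Setting the Job Server’s timezone to GMT/UTC corresponds to the last line, where every value is rendered with a zero offset.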

Running the installer

If the installer is run separately from the migration-center installer, it must be run with administrative privileges. If it’s run as a normal user, the installer cannot update configuration files and set environment variables as required.

Exporting objects from an IBM Domino/Notes application

The IBM Domino Scanner connects to a specified IBM Domino/Notes application and can extract documents, content of richtext fields (composite items), metadata and attachments from this application based on user-defined criteria. See the chapter “IBM Domino Scanner parameters” below for more information about the features and configuration parameters available in the IBM Domino Scanner.

After a scan has completed, the newly scanned documents along with their metadata, attachments and the content of the richtext fields they contain are available for further processing in migration-center.

IBM Domino Scanner properties

To create a new IBM Domino Scanner job, select “Domino” from the list of available adapters in the scanner properties window. Once the adapter type has been selected, the list of parameters is populated with the parameters specific to that adapter type.

The properties window of a scanner can be opened by double-clicking the scanner in the list, or by selecting it and clicking the [Properties] button on the toolbar or the [Properties] entry in the context menu.

Common scanner parameters

IBM Domino Scanner parameters

Document Formats

The IBM Domino Scanner for fme migration-center can generate different output formats for a Domino document. Each format has its advantages and disadvantages. Which one best suits your needs depends on your requirements, in particular on how users will work with the documents once the migration into the new target system has been completed.

The currently supported formats are described in detail in the following sections.

Generating the MSG and EML2HTML formats requires an additional license.

Domino XML (DXL)

The Domino XML (DXL) format is an XML format defined by IBM/Lotus and has been available since at least Domino version 6. A DXL file represents an entire Domino document including all its metadata, richtext elements and attachments.

The generation of DXL files from Domino documents relies on core functionality of Domino’s C-API as provided by IBM.

DXL files can be used to extract any document information from Domino applications. Using special helper applications that are not part of Domino/Notes, a DXL file can be re-imported into the original Domino application in order to read its content or otherwise work with the document at a later point in time.

DXL is especially useful whenever Domino documents should be transformed into PDF. The “PDF Generation Module”, which is available as an add-on for the IBM Domino Scanner, uses the DXL format for PDF generation.
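To see which items (fields) an exported DXL file contains, the JDK’s DOM parser can be used. The following minimal sketch makes several assumptions. First, it assumes the usual DXL layout of a “document” element with “item” children that carry a “name” attribute in the “http://www.lotus.com/dxl” namespace. Second, it does not load the external DTD that DXL files usually reference, so “xmldxl.dtd” does not have to be present. The class name is hypothetical.

```java
// Hypothetical helper: lists the item (field) names contained in a DXL file.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DxlItemLister {

    static List<String> itemNames(InputStream dxl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // DXL files usually reference "xmldxl.dtd"; do not try to load it.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder().parse(dxl);
        // "*" matches the DXL namespace without hard-coding it.
        NodeList items = doc.getElementsByTagNameNS("*", "item");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            names.add(((Element) items.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<document xmlns='http://www.lotus.com/dxl' form='Memo'>"
                + "<item name='Subject'><text>Quarterly report</text></item>"
                + "<item name='SendTo'><textlist><text>jdoe</text></textlist></item>"
                + "</document>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(itemNames(in)); // [Subject, SendTo]
    }
}
```

To inspect a real export, pass a FileInputStream for the DXL file instead of the sample string.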

ARPA Internet Text Message (RFC 822/EML)

The ARPA Internet Text Message format (RFC 822) describes the syntax for messages that are sent among computer users (e-mail). The EML file format adheres to RFC 822.

Any Domino document – not only e-mails – can be transformed into EML format based on core functionality of Domino’s C-API as provided by IBM. An EML file contains the document’s content, its metadata as well as its attachments.

The EML format does not guarantee that the integrity of the Domino document is preserved. Information from the document may be lost or changed during conversion into EML (see the Domino C-API documentation).

The major benefit of EML is that – since version 8 of Notes – an EML file can be opened in Notes again without the need for special helper applications.
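The RFC 822 structure is easiest to see in the header fields of an exported EML file. Header fields come first and end at the first empty line, and the body follows after that line. The minimal sketch below reads only the headers. The class name is hypothetical, and the sketch simplifies unfolding and does no MIME decoding. Decoding bodies and attachments would need a full MIME parser such as JavaMail.

```java
// Hypothetical helper: reads the header fields of an RFC 822 (EML) message.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class EmlHeaderReader {

    // Headers end at the first empty line; everything after it is the body.
    // A line starting with a space or tab continues ("folds") the previous field.
    // Simplification: repeated fields (e.g. "Received") keep only the last value.
    static Map<String, String> headers(String message) {
        Map<String, String> result = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(message))) {
            String line;
            String current = null;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                boolean folded = line.startsWith(" ") || line.startsWith("\t");
                if (folded && current != null) {
                    result.put(current, result.get(current) + " " + line.trim());
                } else {
                    int colon = line.indexOf(':');
                    if (colon > 0) {
                        current = line.substring(0, colon).trim();
                        result.put(current, line.substring(colon + 1).trim());
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not normally throw
        }
        return result;
    }

    public static void main(String[] args) {
        String eml = "From: jdoe@example.com\r\n"
                + "Subject: Quarterly\r\n report\r\n"
                + "\r\n"
                + "Note: this line belongs to the body\r\n";
        System.out.println(headers(eml)); // {From=jdoe@example.com, Subject=Quarterly report}
    }
}
```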

Hypertext Markup Language (HTML)

Hypertext Markup Language (HTML) files can be generated for Domino documents using two different approaches, both of which are described below.

Hypertext Markup Language (HTML) – direct approach

The Domino C-API can transform a Domino document directly into an HTML file.

As with the EML format, direct HTML generation based on the Domino C-API has some issues regarding completeness of information. For example, images embedded in richtext fields are not visible in the generated HTML file.

EML to Hypertext Markup Language (EML2HTML) – indirect approach

Besides the direct approach described in the previous section, HTML can also be created from the EML format.

In most scenarios the Domino scanner has been tested with, the indirect approach produced considerably better results than the direct approach.

Generating EML2HTML requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.

Microsoft Message Format (MSG)

The MSG format is the format that is used by Microsoft Outlook to store e-mails on the filesystem. It’s a container format that includes the e-mail and all its attachments.

Generating MSG requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.

Richtext format (RTF)

The Domino scanner can extract the entire Domino document (not just the document’s richtext fields) as a single RTF file. This functionality is provided by the Domino C-API.

Portable Document Format (PDF, PDF/A-1a, PDF/A-1b)

Based on the add-on “PDF Generation Module” (see Generating PDF renditions), the Domino scanner can generate PDF, PDF/A-1a or PDF/A-1b files for any type of Domino document, independent of the application it originates from.

All the PDF formats preserve the Domino document in a read-only form that looks like the document opened in Notes.

The PDF Generation Module takes care of collapsible sections, fixed-width images and tables, and other Domino-specific features that PDF printing might interfere with.

If required, all the Domino document’s attachments can be re-attached to the generated PDF file (see parameter “embedAttachmentsIntoPDF”). The entire document is thereby preserved in a read-only format that can be viewed anywhere at any time with nothing more than a standard PDF reader.

Exporting OLE objects

If the IBM Domino documents contain OLE embedded objects, Apache OpenOffice 4.1.5 or later must be installed and configured on the migration-center job server in order to properly extract the OLE objects.

Install Apache OpenOffice 4.1.5 or later on the migration-center Job Server.

Add the folder containing the “soffice.exe” file to the system’s search path. This folder is typically:

<Apache OpenOffice installation folder>/program

Add the following entry to the file “wrapper.conf” inside the migration-center server components installation folder and replace <#> with an appropriate value for your installation:

wrapper.java.classpath.<#>=<Apache OpenOffice installation folder>/program/classes/*.jar
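Choosing <#> amounts to taking the next unused classpath index. The hypothetical helper below derives the new entry from the current lines of wrapper.conf. It assumes that the existing wrapper.java.classpath entries are numbered without gaps, as the Java Service Wrapper normally expects. The OpenOffice folder in the example is only a sample path.

```java
// Hypothetical helper: builds the Apache OpenOffice classpath entry for
// wrapper.conf using the next free wrapper.java.classpath.<#> index.
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WrapperClasspathEntry {

    // Matches active entries only; lines commented out with '#' are ignored.
    private static final Pattern CLASSPATH_KEY =
            Pattern.compile("^\\s*wrapper\\.java\\.classpath\\.(\\d+)\\s*=");

    // Highest index already in use plus one (1 if there is none yet).
    static int nextIndex(List<String> wrapperConfLines) {
        int max = 0;
        for (String line : wrapperConfLines) {
            Matcher m = CLASSPATH_KEY.matcher(line);
            if (m.find()) {
                max = Math.max(max, Integer.parseInt(m.group(1)));
            }
        }
        return max + 1;
    }

    static String entry(List<String> wrapperConfLines, String openOfficeFolder) {
        return "wrapper.java.classpath." + nextIndex(wrapperConfLines)
                + "=" + openOfficeFolder + "/program/classes/*.jar";
    }

    public static void main(String[] args) {
        // In practice, read the lines with java.nio.file.Files.readAllLines(...).
        List<String> conf = Arrays.asList(
                "wrapper.java.classpath.1=../lib/wrapper.jar",
                "#wrapper.java.classpath.7=disabled entry",
                "wrapper.java.classpath.2=../lib/*.jar");
        System.out.println(entry(conf, "C:/Program Files (x86)/OpenOffice 4"));
        // prints wrapper.java.classpath.3=C:/Program Files (x86)/OpenOffice 4/program/classes/*.jar
    }
}
```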

Open the configuration file “documentDirectoryRuntimeConfiguration.xml”, located in the subfolder “lib/mc-domino-scanner/conf” of your migration-center Server Components installation folder, in an XML editor of your choice.

Go to line 83 of the file, which looks like:

<parameter name="exportOLEObjects">false</parameter>

and replace “false” with “true”.

The entry inside the configuration file should look like:

<parameter name="exportOLEObjects">true</parameter>

If you want to use a different port for the Apache OpenOffice server than the default port (8100), go to line 84 of the file:

<!--<parameter name="apacheOpenOfficePort">8100</parameter>-->

Uncomment it and replace “8100” with the port number to use, e.g. “1234”.

The entry inside the configuration file should look like:

<parameter name="apacheOpenOfficePort">1234</parameter>

Save the configuration file.
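As an alternative to locating the entries by line number, both parameters can be set by name using the JDK’s DOM and XSLT APIs. The sketch below assumes that the “parameter” elements are not in an XML namespace, and its class name is hypothetical. Note that the commented-out “apacheOpenOfficePort” entry is a comment node rather than an element, so the sketch adds a new element next to “exportOLEObjects” instead of uncommenting the existing entry.

```java
// Hypothetical helper: sets <parameter name="...">value</parameter> entries in
// documentDirectoryRuntimeConfiguration.xml by name rather than by line number.
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class RuntimeConfigEditor {

    static Element findParameter(Document doc, String name) {
        NodeList params = doc.getElementsByTagName("parameter");
        for (int i = 0; i < params.getLength(); i++) {
            Element p = (Element) params.item(i);
            if (name.equals(p.getAttribute("name"))) {
                return p;
            }
        }
        return null; // commented-out entries are comment nodes, not elements
    }

    static boolean setParameter(Document doc, String name, String value) {
        Element p = findParameter(doc, name);
        if (p == null) {
            return false;
        }
        p.setTextContent(value);
        return true;
    }

    // Sets the parameter, or adds it as a sibling directly after anchorName.
    static void setOrAddParameter(Document doc, String name, String value, String anchorName) {
        if (setParameter(doc, name, value)) {
            return;
        }
        Element anchor = findParameter(doc, anchorName);
        if (anchor == null) {
            throw new IllegalStateException("Parameter not found: " + anchorName);
        }
        Element added = doc.createElement("parameter");
        added.setAttribute("name", name);
        added.setTextContent(value);
        anchor.getParentNode().insertBefore(added, anchor.getNextSibling());
    }

    // Usage: java RuntimeConfigEditor <path to documentDirectoryRuntimeConfiguration.xml> [port]
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        if (!setParameter(doc, "exportOLEObjects", "true")) {
            throw new IllegalStateException("exportOLEObjects not found in " + file);
        }
        if (args.length > 1) {
            setOrAddParameter(doc, "apacheOpenOfficePort", args[1], "exportOLEObjects");
        }
        // Writes the file back; comments are kept, but whitespace and the
        // XML declaration may be reformatted, so keep a backup.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(file));
    }
}
```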

Generating PDF renditions

While PDF generation can be activated in the scanner’s configuration (parameters “primaryDocumentFormat”, “secondaryDocumentFormats” and “embedAttachmentsIntoPDF”), setting up PDF generation requires the additional “PDF Generation Module”.

The “PDF Generation Module” is licensed separately.

From a technical perspective, the “PDF Generation Module” requires an additional system (“rendition server”). This system will be used to print any IBM Notes document using a PDF printer driver based on IBM Notes’ standard print functionality. The process for PDF generation is as follows:

  1. The scanner submits a request to the PDF Generation Module on the rendition server to create a PDF rendition of an existing Domino document or of a DXL file.

  2. PDF Generation Module creates a PDF rendition of the document.

  3. If PDF generation was successful, PDF Generation Module will save the PDF to a shared network drive.

  4. PDF Generation Module will signal success or failure to the scanner.
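The four steps can be modelled in a few lines. This is only a sketch under stated assumptions, because this guide does not describe the actual interface between the scanner and the PDF Generation Module. The names “PdfGenerationModule” and “render”, and the convention that the PDF keeps the source file’s base name, are all hypothetical. The sketch treats a rendition as successful only if the module signals success and the PDF can actually be read from the shared folder.

```java
// Hypothetical model of the four steps above. "PdfGenerationModule" and its
// method are illustrative names, NOT the real interface of the module.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

interface PdfGenerationModule {
    // Steps 1 to 4: accepts a request, renders the PDF, saves it to the shared
    // folder and signals success (true) or failure (false).
    boolean render(Path sourceDxl, Path sharedFolder);
}

final class RenditionRequest {

    // Assumption: the PDF is saved under the source file's base name.
    static Path expectedPdf(Path sourceDxl, Path sharedFolder) {
        String name = sourceDxl.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return sharedFolder.resolve((dot > 0 ? name.substring(0, dot) : name) + ".pdf");
    }

    // Returns the PDF only if the module signalled success AND the file can be
    // read from the shared folder.
    static Optional<Path> requestPdf(PdfGenerationModule module, Path sourceDxl, Path sharedFolder) {
        if (!module.render(sourceDxl, sharedFolder)) {
            return Optional.empty();
        }
        Path pdf = expectedPdf(sourceDxl, sharedFolder);
        return Files.isReadable(pdf) ? Optional.of(pdf) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path share = Files.createTempDirectory("rendition-share");
        // Fake module for illustration: "renders" by creating an empty PDF file.
        PdfGenerationModule fake = (dxl, dir) -> {
            try {
                Files.createFile(expectedPdf(dxl, dir));
                return true;
            } catch (IOException e) {
                return false;
            }
        };
        System.out.println(requestPdf(fake, share.resolve("doc1.dxl"), share).isPresent()); // true
    }
}
```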

Setting up the rendition server requires additional configuration. For each IBM Domino application/database template that was used to create documents, an empty database needs to be created based on this template and made available either locally on the rendition server or on the IBM Domino server.

Each of these empty databases needs to be prepared for PDF printing. As the necessary configuration steps vary depending on the application, they cannot be described here.

Please contact your fme representative should you wish to implement PDF generation for migration of an IBM Domino application/database.

Log files

A complete history is available for any IBM Domino Scanner job from the respective item’s history window. It can be opened via the [History] button on the toolbar or the [History] entry in the context menu.

The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and end time and the status.

Double-clicking an entry or clicking the [Open] button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:

  • version information of the migration-center Server Components the job was run with

  • the parameters the job was run with

  • the execution summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime

Log files generated by the IBM Domino Scanner can be found in the server components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs

The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.

Known Issues

The following issues with the MC Domino Scanner are known to exist and will be fixed in later releases:

  • The scanner requires that the temporary directory for the user running the MC Job Server service exists and that the user can write to this directory. If the directory does not exist or the user lacks write permission for it, the creation of temporary files during document and attachment extraction will fail. The log file will show error messages like

INFO | jvm 1 | 2014/10/02 12:06:26 | 12:06:26,850 ERROR [Job 1351] com.think_e_solutions.application.documentdirectory… - java.io.IOException: The system cannot find the path specified

To work around this issue, make sure the temporary folder exists and the user has write permission for this folder. If the MC Job Server is started manually as a normal user, the temp folder should be C:\Users\Username\AppData\Local\Temp. If, however, the MC Job Server is run as a service by the Local System account, the folder is one of the following:

For the 32-bit version of Windows, or for a 64-bit JRE on the 64-bit version of Windows:

C:\Windows\System32\config\systemprofile\AppData\Local\Temp

For a 32-bit JRE on the 64-bit version of Windows:

C:\Windows\SysWOW64\config\systemprofile\AppData\Local\Temp

  • If a document is exported from IBM Domino but the related entries in the migration-center database cannot be created (e.g. because an attribute’s value exceeds the maximum number of characters allowed for a field in the migration-center database), the exported files remain in the file system (inside the export directory). If this document is scanned again, it will be treated as a new document, not as an update.

  • If the scanner parameter “relationType” is set to “relation”, relations will be automatically deleted by migration-center if they do not exist anymore. If the scanner parameter “relationType” is set to “object”, objects representing relationships cannot be deleted if the relation is invalidated.

Example: If a document had one attachment when scanned in scanner run #1 and that attachment was removed from the document before scanner run #2, the scanner cannot remove the object representing the “attachment” relation between document and attachment (created in scanner run #1) in scanner run #2.

  • If a PDF rendition is requested and DXLUtility receives the request to generate the rendition but isn’t able to import the DXL file into the appropriate IBM Domino database on the rendition server, it’s likely that the shared folder used to transfer DXL and PDF files between the scanner and PDF Generation Module cannot be read by the user running PDF Generation Module on the rendition server.

  • The scanner will crash the Java VM if the parameter “exportCompositeItems” is set to “true” and the log level in log4j.xml (located in subdirectory “conf” of the scanner installation directory) is set to “ERROR”.

  • The 64-bit version of the scanner relies on IBM Domino. As Domino lacks the required libraries to export “EML”, “HTML” or “RTF”, the 64-bit version of the scanner cannot export documents in any other format than “DXL” or “PDF”. If other formats are required, the scanner’s 32-bit version needs to be run based on IBM Notes instead.
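For the temporary-folder issue above, and for write access to the shared rendition folder, a quick write test can confirm the cause. Run it under the same account as the MC Job Server service, or as the user running PDF Generation Module when checking the share. The sketch is hypothetical. Note that java.io.tmpdir is the temp folder as seen by the JVM and may differ from the folder used by the scanner’s native components. If in doubt, pass the folder to check explicitly.

```java
// Hypothetical diagnostic: checks that a directory exists and that the current
// account can create files in it.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class DirectoryAccessCheck {

    // Creating (and deleting) a real probe file is more conclusive than
    // inspecting attributes, because it exercises the actual permissions.
    static boolean canCreateFiles(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try {
            Path probe = Files.createTempFile(dir, "mc-probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without an argument, check the JVM's temp folder (java.io.tmpdir).
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println(dir + (canCreateFiles(dir) ? ": OK" : ": missing or not writable"));
    }
}
```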

Domino attribute types

The following table lists all (relevant) Domino attribute types.

The value of the scanner parameter “excludedAttributeTypes” is the logical (bitwise) “OR” of the values of all types that should be excluded from the scan.
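The sketch below shows how such a combined value is formed and tested. It assumes that each type in the table corresponds to a distinct bit, which is what an OR combination implies. The numeric values used in it are placeholders, not the values from the table, and the class name is hypothetical.

```java
// Hypothetical illustration: combining attribute type values with bitwise OR.
// The values used in main() are PLACEHOLDERS, not values from the Domino
// attribute type table; take the real values from that table.
final class ExcludedAttributeTypes {

    // OR of all type values to be excluded.
    static int combine(int... typeValues) {
        int mask = 0;
        for (int t : typeValues) {
            mask |= t;
        }
        return mask;
    }

    // True if every bit of typeValue is set in mask.
    static boolean isExcluded(int mask, int typeValue) {
        return typeValue != 0 && (mask & typeValue) == typeValue;
    }

    public static void main(String[] args) {
        int placeholderTypeA = 1; // hypothetical value
        int placeholderTypeB = 4; // hypothetical value
        int mask = combine(placeholderTypeA, placeholderTypeB);
        System.out.println("excludedAttributeTypes=" + mask); // excludedAttributeTypes=5
        System.out.println(isExcluded(mask, 2)); // false
    }
}
```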
