Domino Scanner
Introduction
Scanner is the term used in migration-center for an input adapter. Using the IBM Domino scanner module to read the data that needs processing into migration-center is the first step in a migration project, thus “scan” also refers to the process used to input data to migration-center.
The IBM Domino Scanner has been available since migration-center 3.2.5. It extracts documents, metadata and attachments from IBM Domino/Notes applications and uses them as input for migration-center. After the scan, the generated data can be processed and migrated to other systems supported by the various migration-center importers.
The currently supported document export formats are Domino XML (DXL), Hypertext Markup Language (HTML), ARPA Internet Text Message (RFC 822/EML) and HTML generated from EML (EML2HTML). In addition, the scanner can generate a Portable Document Format (PDF) rendition based on a DXL file of the document.
The IBM Domino Scanner currently supports all IBM Notes/Domino versions 6.x and above. Documents from applications that have been built with older IBM Notes/Domino versions can be extracted without any limitation.
The module works as a job that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created.
A Scanner is defined by a unique name, a set of configuration parameters and an optional description.
IBM Domino scanners can be created, configured, started and monitored through migration-center client, but the corresponding processes are executed by migration-center Job Server.
Installation
Prerequisites
To be able to use the IBM Domino scanner, additional software must be installed on the migration-center job server that executes IBM Domino scanner operations.
The scanner is available in a 32-bit and a 64-bit version, each of which has different requirements.
The 32-bit version of the scanner relies on IBM Notes, whereas the 64-bit version uses IBM Domino. Because IBM Domino lacks some libraries the scanner uses to generate specific document formats (e.g. “EML”, “HTML” and “RTF”), the 64-bit version of the scanner currently cannot generate any formats other than “DXL” and “PDF”.
The 32-bit scanner requires:
Microsoft Windows based 32-bit/64-bit Operating System.
Java Runtime Environment (JRE) 1.8.x (32-bit)
IBM Notes 9.0.1 or later
IBM Notes 9.0.1 or later must be installed on the migration-center job server. Please refer to chapter “9. IBM Notes/Domino installation and configuration” for detailed instructions about installing and configuring IBM Notes.
Microsoft Visual C++ 2017 Redistributable Package (x86)
The Microsoft Visual C++ 2017 Redistributable Package (x86) can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x86.exe
The 64-bit scanner requires:
Microsoft Windows based 64-bit Operating System.
Java Runtime Environment (JRE) 1.8.x (64-bit)
IBM Domino 9.0.1 or later
IBM Domino 9.0.1 or later must be installed on the migration-center job server. The community version may be used. Please refer to chapter “9. IBM Notes/Domino installation and configuration” for detailed instructions about installing and configuring IBM Domino.
Microsoft Visual C++ 2017 Redistributable Package (x64)
The Microsoft Visual C++ 2017 Redistributable Package (x64) can be downloaded from: https://aka.ms/vs/16/release/vc_redist.x64.exe
If the scanner shall be used for Domino documents containing Object Linking and Embedding (OLE) objects, Apache OpenOffice 4.1.5 or later must be installed. Please refer to the section “Exporting OLE objects” for details.
If documents extracted from IBM Domino/Notes applications should be transformed into PDF (PDF, PDF/a-1a, PDF/a-1b) by the scanner, a second system (a “rendition server”) is required. The rendition server must have the optional PDF Generation Module installed. For details about setting up a rendition server based on the PDF Generation Module, refer to the PDF Generation Module manual.
Timezone settings
IBM Domino stores all date and time information based on GMT/UTC internally.
If a date and time value is converted into a text value for display purposes in an IBM Domino API based software solution, the date and time value is always displayed using the client’s current timezone settings.
As the scanner is based on the IBM Domino API, it uses the migration-center Job Server’s timezone setting when extracting date and time values from IBM Domino documents, i.e. all date and time values stored in the migration-center database are relative to the Job Server’s timezone.
If you require date and time values in the migration-center database to be based on a specific timezone, set the migration-center Job Server’s timezone accordingly.
If you require “normalized” date and time values in migration-center, set the migration-center Job Server’s timezone to GMT/UTC.
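To illustrate the effect, the following minimal Python sketch renders one UTC instant as text under two job server timezones. It only mimics the display conversion the Domino API performs; the function name and the fixed UTC offsets are illustrative assumptions, not part of the scanner.

```python
from datetime import datetime, timedelta, timezone

# One instant as IBM Domino stores it internally: in GMT/UTC.
stored_utc = datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc)

def as_job_server_text(value_utc: datetime, server_tz: timezone) -> str:
    """Render a UTC instant as text in the given (job server) timezone."""
    return value_utc.astimezone(server_tz).strftime("%Y-%m-%d %H:%M:%S %z")

# Hypothetical job server timezones, expressed as fixed UTC offsets.
utc_server = timezone.utc
cet_server = timezone(timedelta(hours=1))

print(as_job_server_text(stored_utc, utc_server))  # 2024-03-15 09:30:00 +0000
print(as_job_server_text(stored_utc, cet_server))  # 2024-03-15 10:30:00 +0100
```

The stored instant is identical in both cases; only the text representation changes, which is why setting the Job Server to GMT/UTC yields normalized values.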
Running the installer
If the installer is run separately from the migration-center installer, it must be run with administrative privileges. If it’s run as a normal user, the installer cannot update configuration files and set environment variables as required.
Exporting objects from an IBM Domino/Notes application
The IBM Domino Scanner connects to a specified IBM Domino/Notes application and can extract documents, content of richtext fields (composite items), metadata and attachments out of this application based on user-defined criteria. See chapter IBM Domino Scanner parameters below for more information about the features and configuration parameters available in the IBM Domino Scanner.
After a scan has completed, the newly scanned documents along with their metadata, attachments and the content of the richtext fields they contain are available for further processing in migration-center.
IBM Domino Scanner properties
To create a new IBM Domino Scanner job, specify the respective adapter type in the scanner properties window – from the list of available adapters, “Domino” must be selected. Once the adapter type has been selected, the list of parameters will be populated with the parameters specific to the selected adapter type.
The properties window of a scanner can be accessed by double-clicking a scanner in the list or by selecting the [Properties] button for the corresponding selected entry on the toolbar or context menu.
Common scanner parameters
Configuration parameters
Values
Name
Enter a unique name for this scanner
Mandatory
Adapter type
Select the “LotusDomino” adapter from the list of available adapters
Mandatory
Location
Select the job server location where this job should be run. Job servers are defined in the Jobserver window. If no job server is selected, migration-center will prompt the user to define a job server location when saving the scanner.
Mandatory
Description
Enter a description for this job (optional)
IBM Domino Scanner parameters
Configuration parameters
Values
dominoServer
The IBM Domino server used to connect to the application. If the application (“.nsf” file) is stored and accessed locally without using an IBM Domino server, leave this field empty.
dominoDatabase*
The filename of the “.nsf” file that holds the application’s documents. If the “.nsf” file is stored inside the IBM Domino/Notes data directory, the path of the “.nsf” file relative to the IBM Domino/Notes data directory is sufficient, otherwise specify the fully qualified filename of the “.nsf” file.
If PDF is used as either the primary format or one of the secondary formats and the PDF is to be generated from existing documents (see “Generating PDF renditions”), the values of “dominoServer” and “dominoDatabase” will be passed to the PDF Generation Module. Therefore, the database filename should be specified relative to the IBM Domino/Notes data directory.
Mandatory
idFilename*
The filename of the ID file used to access the application.
This ID must have full permissions for all documents that are due to be scanned.
Mandatory
password
The password for the ID file referenced in parameter “idFilename”.
selectionFormula*
An IBM Notes formula to select the documents that should be processed by the scanner.
The default is “select @all”, which processes all documents.
Mandatory
profileName*
The name of the profile used to extract information out of the IBM Domino/Notes application.
The default value for this parameter is “mcProfile”, which causes the scanner to process the application according to the other scanner configuration parameters, e.g. extract document metadata, document contents and attachments.
If the value is changed to “mcStatistics”, the scanner ignores most of the other scanner configuration parameters and, instead of processing each document (extracting metadata, document contents and attachments), generates a text file with statistical information about the application (forms, documents, attributes). The generated file will be placed inside the folder specified by the scanner parameter “exportLocation” and named “<jobID>_statistics.txt”. The profile “mcStatistics” will not generate any objects in the migration-center database.
This parameter’s value must not be changed to any other value than “mcProfile” or “mcStatistics” unless a customized profile has been developed to fulfill specific customer needs.
Mandatory
primaryDocumentFormat*
The primary format used to extract the document. The resulting file will be treated as the primary document content in mc database. Valid values are “dxl”, “html”, “eml”, “eml2html” and “pdf”.
The default value is “dxl”.
The 64-bit version of the scanner can only generate “DXL” and “PDF”. Configuring any other format will cause the scanner to fail.
Mandatory
secondaryDocumentFormats
A list of all document formats that should be generated in addition to the primary document format (see “primaryDocumentFormat” above). Multiple values must be separated by “|” (pipe). Valid values are “dxl”, “html”, “eml”, “eml2html” and “pdf”.
The resulting files will be associated with the mc object as secondary formats. Their (fully-qualified) filenames are made available using the mc object’s “secondaryFormats” attribute which is a multi-value attribute.
The 64-bit version of the scanner can only generate “DXL” and “PDF”. Configuring any other format will cause the scanner to fail.
includeAttributes
A list of all document attributes (metadata) that should be extracted from the IBM Domino/Notes application and made available inside the MC database. If all attributes should be extracted, leave this field empty.
excludedAttributeTypes
A filter specifying Domino data types that should not be exported from the IBM Domino/Notes application.
The default value is “1”, which excludes all composite items (TYPE_COMPOSITE) from being exported to the migration-center database.
attributeSplitterMaxChunkSizeBytes
Large attribute values are split into chunks of at most the specified number of bytes. Multi-byte characters are handled correctly, i.e. never split across two chunks, to avoid SQL exceptions.
migration-center uses Oracle’s “varchar2” datatype, which has a maximum size of 4,000 bytes.
exportCompositeItems*
Specifies whether composite items (i.e. richtext fields) contained in an IBM Domino/Notes document (e.g. an e-mail’s “Body” element) should be extracted from the document and made available as separate richtext files (RTF format). Valid values are “false” and “true” as well as “0” and “1”.
If this option is chosen, the scanner will generate one RTF file for each of an IBM Domino/Notes document’s composite items. The file is named <document’s NoteID>_<item’s name>.rtf.
This option is especially useful if the document’s contents (typically contained in richtext fields) should be editable once the document has been migrated into the target system.
This feature is not supported with the 64-bit version of the scanner.
Mandatory
includedCompositeItems
A list of names of composite items in a document (e.g. “Body”) that should be extracted into separate richtext files. Multiple values must be separated by “|” (pipe). If all composite items should be extracted, leave this field empty.
If you want to exclude specific composite items, prefix each item name with “!”.
It is not possible to mix include and exclude operations. If any composite item name in the list is prefixed with “!”, only the names starting with “!” will be considered, and the corresponding items will be excluded.
exportAttachments*
Specifies whether attachments contained in the IBM Domino/Notes documents should be extracted from the document in their native format and made available as separate MC objects. Valid values are “false” and “true” as well as “0” and “1”.
Mandatory
embedAttachmentsIntoPDF*
Determines whether the Domino documents’ attachments are extracted and embedded into a PDF rendition of the Domino document. If this parameter is set to true:
- all attachments will automatically be extracted from the document independent of “exportAttachments” parameter’s value,
- a PDF rendition will automatically be created even if it has not been requested according to the values of parameters “primaryDocumentFormat” or “secondaryDocumentFormats”.
Mandatory
embedLinksIntoPDF
If a PDF rendition is requested and this parameter is set to true, links (Domino document links and URL links) contained in the original Domino document will be added as bookmarks to the PDF file.
The default value is “false”.
exportLocation*
The location where the exported object content should be temporarily saved. It can be a local folder on the machine that runs the job server or a shared folder on the network.
This folder must exist prior to launching the scanner and the MC user must have write permission for it. MC will not create this folder automatically. If the folder cannot be found, an appropriate error will be raised and logged.
This path must be accessible by both scanner and importer. Therefore, if scanner and importer are running on different machines, using a shared network folder is advisable.
Mandatory
loggingLevel*
Sets the verbosity of the log file.
Valid values are:
1 - logs only errors during scan
2 - logs all warnings and errors (default value)
3 - logs all successfully performed operations in addition to any warnings or errors
4 - logs all events (for debugging only, use only if instructed by fme product support since it generates a very large amount of output. Do not use in production)
Mandatory
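The byte-limited splitting described for “attributeSplitterMaxChunkSizeBytes” and the include/exclude rule of “includedCompositeItems” can be sketched in Python as follows. This is a hypothetical illustration, not the scanner’s implementation: it assumes attribute values are measured in UTF-8 bytes and that composite item names are compared case-sensitively; both function names are invented for this example.

```python
from typing import Iterable, List

def split_utf8_chunks(value: str, max_chunk_bytes: int = 4000) -> List[str]:
    """Split value into pieces of at most max_chunk_bytes UTF-8 bytes,
    never cutting a multi-byte character in half."""
    if max_chunk_bytes < 4:
        # A single UTF-8 encoded character can need up to 4 bytes.
        raise ValueError("max_chunk_bytes must be at least 4")
    chunks, current, current_size = [], [], 0
    for ch in value:
        size = len(ch.encode("utf-8"))
        if current_size + size > max_chunk_bytes:
            # The next character would overflow: close the current chunk first.
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(ch)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks

def select_composite_items(item_names: Iterable[str], parameter_value: str) -> List[str]:
    """Apply an includedCompositeItems-style filter to a document's item names.

    Empty value: keep all items. Plain names: keep only those. If any name is
    prefixed with '!', only the '!' names are considered and they are excluded.
    Names are compared case-sensitively (an assumption of this sketch).
    """
    entries = [e.strip() for e in parameter_value.split("|") if e.strip()]
    if not entries:
        return list(item_names)
    excludes = {e[1:] for e in entries if e.startswith("!")}
    if excludes:
        return [n for n in item_names if n not in excludes]
    includes = set(entries)
    return [n for n in item_names if n in includes]

print(split_utf8_chunks("abc€xyz", 5))  # ['abc', '€xy', 'z']
print(select_composite_items(["Body", "Notes", "Sig"], "!Notes"))  # ['Body', 'Sig']
```

In the first call, “€” occupies 3 bytes, so “abc€” (6 bytes) would exceed the 5-byte limit and the character starts a new chunk instead of being cut.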
Document Formats
The IBM Domino Scanner for fme migration-center supports the generation of different output formats for a Domino document. Each of the formats has its advantages and disadvantages. Which one best suits your needs can be determined by closely looking at your requirements, i.e. how users should work with the documents once migration into the new target system has been completed.
The formats currently supported will be described in detail in the following sections.
Creating the MSG and EML2HTML formats requires an additional license.
Domino XML (DXL)
The Domino XML (DXL) format is an XML format defined by IBM/Lotus. It has been available since at least Domino version 6. A DXL file represents an entire Domino document including all its metadata, richtext elements and attachments.
The generation of DXL files from Domino documents relies on core functionality of Domino’s C-API as provided by IBM.
DXL files can be used to extract any document information from Domino applications. Based on special helper applications that are not part of Domino/Notes, a DXL file can be re-imported back into the original Domino application in order to read its content or otherwise work with the document at a later point in time.
DXL is especially useful whenever Domino documents should be transformed into PDF. The “PDF Generation Module” which is available as an add-on product for the IBM Domino Scanner makes use of the DXL format for PDF generation.
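To inspect a DXL file produced by the scanner outside of Notes, a generic XML parser is sufficient. The minimal Python sketch below lists each item’s name and values. It assumes the usual DXL layout, in which fields are “item” elements with a “name” attribute in the “http://www.lotus.com/dxl” namespace; the function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

DXL_NS = "http://www.lotus.com/dxl"  # assumed DXL namespace URI

def list_dxl_items(dxl_text: str) -> dict:
    """Map each DXL item name to the non-empty text values found inside it.

    In this simplified sketch, a later item with the same name overwrites an
    earlier one, and attachment payloads (base64 text) would be collected too.
    """
    root = ET.fromstring(dxl_text)
    items = {}
    for item in root.iter(f"{{{DXL_NS}}}item"):
        values = [
            el.text.strip()
            for el in item.iter()
            if el is not item and el.text and el.text.strip()
        ]
        items[item.get("name")] = values
    return items

sample = (
    '<document xmlns="http://www.lotus.com/dxl" form="Memo">'
    '<item name="Subject"><text>Quarterly report</text></item>'
    '<item name="SendTo"><textlist><text>alice</text><text>bob</text></textlist></item>'
    "</document>"
)
print(list_dxl_items(sample))
```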
ARPA Internet Text Message (RFC 822/EML)
The ARPA Internet Text Message format (RFC 822) describes the syntax for messages that are sent among computer users (e-mail). The EML file format adheres to RFC 822.
Any Domino document – not only e-mails – can be transformed into EML format based on core functionality of Domino’s C-API as provided by IBM. An EML file contains the document’s content, its metadata as well as its attachments.
The EML format does not guarantee preservation of the Domino document’s integrity. Information from the document may be lost or changed during conversion into EML (see Domino C-API documentation).
The major benefit of EML is that – since version 8 of Notes – an EML file can be opened in Notes again without the need for special helper applications.
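Because EML follows the RFC 822 message syntax, exported EML files can also be examined with any standard mail parser. The following Python sketch uses the standard library’s email package to read the subject, sender, attachment filenames and body of a message. The function name and the in-memory sample message are hypothetical, and real Domino exports may structure their MIME parts differently.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

def summarize_eml(raw: bytes) -> dict:
    """Return subject, sender, attachment filenames and body text of an EML message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer a plain-text body, fall back to HTML if no plain part exists.
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "subject": msg.get("Subject"),
        "from": msg.get("From"),
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "body": body.get_content() if body is not None else None,
    }

# Build a small sample message in memory instead of reading an exported file.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample["From"] = "alice@example.com"
sample.set_content("See attached.\n")
sample.add_attachment(b"%PDF-1.4 ...", maintype="application", subtype="pdf",
                      filename="report.pdf")

print(summarize_eml(sample.as_bytes()))
```

To process a real export, pass the bytes of the file instead, e.g. from open(path, "rb").read().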
Hypertext Markup Language (HTML)
Hypertext Markup Language (HTML) files can be generated for Domino documents based on two different approaches both of which will now be described.
Hypertext Markup Language (HTML) – direct approach
The Domino C-API offers the ability to transform a Domino document directly into an HTML file.
As with the EML file format, direct HTML generation based on the Domino C-API has some issues regarding completeness of information. For example, images embedded in richtext fields will not be visible in the generated HTML file.
EML to Hypertext Markup Language (EML2HTML) – indirect approach
Besides the direct approach described in the previous section, HTML can also be created from the EML format.
In most scenarios in which the Domino scanner has been tested, the indirect approach produced results of much higher quality than the direct approach.
Generating EML2HTML requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.
Microsoft Message Format (MSG)
The MSG format is the format that is used by Microsoft Outlook to store e-mails on the filesystem. It’s a container format that includes the e-mail and all its attachments.
Generating MSG requires a third-party library that needs to be purchased separately. Please contact your fme sales representative for details.
Richtext format (RTF)
The Domino scanner can extract the entire Domino document (not just the document’s richtext fields) as a single RTF file. This functionality is provided by the Domino C-API.
Portable Document Format (PDF/PDF/a-1a/PDF/a-1b)
Based on the add-on “PDF Generation Module” (see “Generating PDF renditions”), the Domino scanner is capable of generating PDF, PDF/a-1a or PDF/a-1b files for any type of Domino document, independent of the application it originates from.
All the PDF formats preserve the Domino document in a read-only form that looks like the document opened in Notes.
The PDF generation module takes care of collapsible sections, fixed-width images and tables and other Domino specific features that PDF printing might interfere with.
If required, all the Domino document’s attachments can be re-attached to the generated PDF file (see parameter “embedAttachmentsIntoPDF”). Thereby, the entire document will be preserved in a read-only format that can be viewed anywhere at any time with nothing more than a standard PDF reader.
Exporting OLE objects
If the IBM Domino documents contain OLE embedded objects, Apache OpenOffice 4.1.5 or later must be installed and configured on the migration-center job server in order to properly extract the OLE objects.
Install Apache OpenOffice 4.1.5 or later on the migration-center job server.
Add the folder containing the “soffice.exe” file to the system’s search path. This folder is typically:
<Apache OpenOffice installation folder>/program
Add the following entry to the file “wrapper.conf” inside the migration-center server components installation folder and replace <#> with an appropriate value for your installation:
wrapper.java.classpath.<#>=<Apache OpenOffice installation folder>/program/classes/*.jar
Open the configuration file “documentDirectoryRuntimeConfiguration.xml” located in the subfolder “lib/mc-domino-scanner/conf” of your migration-center server components’ installation folder in your favorite XML editor.
Go to line 83 of the file which looks like:
<parameter name="exportOLEObjects">false</parameter>
and replace “false” with “true”.
The entry inside the configuration file should look like:
<parameter name="exportOLEObjects">true</parameter>
If you want to use a different port for the Apache OpenOffice server than the default port (8100), go to line 84 of the file:
<!--<parameter name="apacheOpenOfficePort">8100</parameter>-->
Uncomment it and replace “8100” with the port number to use, e.g. “1234”.
The entry inside the configuration file should look like:
<parameter name="apacheOpenOfficePort">1234</parameter>
Save the configuration file.
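The two file edits described in the steps above can also be scripted. The following Python sketch works on the file contents as plain text so that existing comments are preserved. It assumes that the next unused “wrapper.java.classpath” index is an appropriate value for <#> and that the XML lines look exactly as shown above; the function names and the example paths are hypothetical.

```python
import re

def next_classpath_entry(wrapper_conf: str, jar_glob: str) -> str:
    """Build a wrapper.java.classpath.<#> line using the next unused index."""
    used = [
        int(n)
        for n in re.findall(r"^\s*wrapper\.java\.classpath\.(\d+)\s*=",
                            wrapper_conf, flags=re.MULTILINE)
    ]
    return f"wrapper.java.classpath.{max(used, default=0) + 1}={jar_glob}"

def enable_ole_export(xml_text: str, port=None) -> str:
    """Switch exportOLEObjects to true and, optionally, set the OpenOffice port."""
    xml_text = xml_text.replace(
        '<parameter name="exportOLEObjects">false</parameter>',
        '<parameter name="exportOLEObjects">true</parameter>',
    )
    if port is not None:
        # Matches the port parameter whether or not it is still commented out
        # and writes it back uncommented with the new value.
        xml_text = re.sub(
            r'(?:<!--\s*)?(<parameter name="apacheOpenOfficePort">)\d+(</parameter>)(?:\s*-->)?',
            lambda m: f"{m.group(1)}{port}{m.group(2)}",
            xml_text,
        )
    return xml_text

# Hypothetical usage on in-memory file contents.
wrapper = "wrapper.java.classpath.1=../lib/a.jar\nwrapper.java.classpath.2=../lib/b/*.jar\n"
print(next_classpath_entry(wrapper, "C:/OpenOffice 4/program/classes/*.jar"))
config = ('<parameter name="exportOLEObjects">false</parameter>\n'
          '<!--<parameter name="apacheOpenOfficePort">8100</parameter>-->')
print(enable_ole_export(config, 1234))
```

Reading and writing the actual files (and keeping a backup) is left out on purpose; the configuration file should still be reviewed after any automated change.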
Generating PDF renditions
While PDF generation can be activated in the scanner’s configuration (parameters “primaryDocumentFormat”, “secondaryDocumentFormats” and “embedAttachmentsIntoPDF”), setting up PDF generation requires the additional “PDF Generation Module”.
The “PDF Generation Module” is licensed separately.
From a technical perspective, the “PDF Generation Module” requires an additional system (“rendition server”). This system will be used to print any IBM Notes document using a PDF printer driver based on IBM Notes’ standard print functionality. The process for PDF generation is as follows:
The scanner submits a request to create a PDF rendition for an existing Domino document or a DXL file to PDF Generation Module on the rendition server.
PDF Generation Module creates a PDF rendition of the document.
If PDF generation was successful, PDF Generation Module will save the PDF to a shared network drive.
PDF Generation Module will signal success or failure to the scanner.
Setting up the rendition server requires additional configuration steps. For each IBM Domino application/database template that was used to create documents, an empty database needs to be created based on this template and made available either locally on the rendition server or on the IBM Domino server.
Each of these empty databases needs to be prepared for PDF printing. As necessary configuration steps vary depending on the application that is being worked on, they cannot be described here.
Please contact your fme representative should you wish to implement PDF generation for migration of an IBM Domino application/database.
Log files
A complete history is available for any IBM Domino Scanner job from the respective item’s history window. It is accessible through the [History] button/menu entry on the toolbar/context menu.
The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and ending time and the status.
Double clicking an entry or clicking the [Open] button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
version information of the migration-center Server Components the job was run with
the parameters the job was run with
the execution summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime
Log files generated by the IBM Domino Scanner can be found in the server components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs
The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
Known Issues
The following issues with the MC Domino Scanner are known to exist and will be fixed in later releases:
The scanner requires that the temporary directory for the user running the MC Job Server service exists and that the user can write to this directory. If the directory does not exist or the user does not have write permission for it, the creation of temporary files during document and attachment extraction will fail. The log file will show error messages like
“INFO | jvm 1 | 2014/10/02 12:06:26 | 12:06:26,850 ERROR [Job 1351] com.think_e_solutions.application.documentdirectory… - java.io.IOException: The system cannot find the path specified”.
To work around this issue, make sure the temporary folder exists and the user has write permission for this folder. If the MC Job Server is started manually as a normal user, the temp folder should be C:\Users\Username\AppData\Local\Temp. If the MC Job Server is run as a service by the Local System account, the folder is one of the following:
For the 32-bit version of Windows:
C:\Windows\System32\config\systemprofile\AppData\Local\Temp
For the 64-bit version of Windows:
C:\Windows\SysWOW64\config\systemprofile\AppData\Local\Temp
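Before starting the scanner, you can check whether a temp folder exists and is writable with a short script such as the following Python sketch. It must be run under the same user account as the MC Job Server to be meaningful, and the function name is hypothetical. Which folder the Job Server’s JVM actually uses depends on its environment (on Windows typically the TMP or TEMP variable), so the usage line is an assumption as well.

```python
import os
import tempfile

def check_temp_dir(path):
    """Return (exists, writable) for the given temp folder.

    Writability is tested by actually creating and deleting a file, which is
    more reliable on Windows than os.access(), since the latter does not
    evaluate ACLs there.
    """
    if not os.path.isdir(path):
        return (False, False)
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return (True, True)
    except OSError:
        return (True, False)

if __name__ == "__main__":
    # Hypothetical usage: check the folder named by TMP/TEMP for the current user.
    folder = os.environ.get("TMP") or os.environ.get("TEMP") or tempfile.gettempdir()
    print(folder, check_temp_dir(folder))
```

To check the Local System account’s folder, pass one of the paths listed above explicitly and run the script in that account’s context.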
If a document is exported from IBM Domino but the related entries in the mc database cannot be created (e.g. because an attribute’s value exceeds the maximum number of characters allowed for a field in the mc database), the related files can be found in the filesystem (inside the export directory). If this document is scanned again, it will be treated as a new document, not as an update.
If the scanner parameter “relationType” is set to “relation”, relations will be automatically deleted by migration-center if they do not exist anymore. If the scanner parameter “relationType” is set to “object”, objects representing relationships cannot be deleted if the relation is invalidated.
Example: If a document had one attachment when scanned in scanner run #1 and that attachment was removed from the document before scanner run #2, the scanner cannot remove the object representing the “attachment” relation between document and attachment (created in scanner run #1) in scanner run #2.
If a PDF rendition is requested and DXLUtility receives the request to generate the rendition but isn’t able to import the DXL file into the appropriate IBM Domino database on the rendition server, it’s likely that the shared folder used to transfer DXL and PDF files between the scanner and PDF Generation Module cannot be read by the user running PDF Generation Module on the rendition server.
The scanner will crash the Java VM if the parameter “exportCompositeItems” is set to “true” and the log level in log4j.xml (located in subdirectory “conf” of the scanner installation directory) is set to “ERROR”.
The 64-bit version of the scanner relies on IBM Domino. As Domino lacks the required libraries to export “EML”, “HTML” or “RTF”, the 64-bit version of the scanner cannot export documents in any other format than “DXL” or “PDF”. If other formats are required, the scanner’s 32-bit version needs to be run based on IBM Notes instead.
Domino attribute types
The following table lists all (relevant) Domino attribute types.
The scanner parameter “excludedAttributeTypes” is a logical “OR” of all types that should be excluded from the scan.
Type                      Numeric value
TYPE_ACTION               16
TYPE_ASSISTANT_INFO       17
TYPE_CALENDAR_FORMAT      24
TYPE_COLLATION            2
TYPE_COMPOSITE            1
TYPE_ERROR                0
TYPE_FORMULA              1536
TYPE_HIGHLIGHTS           12
TYPE_HTML                 21
TYPE_ICON                 6
TYPE_INVALID_OR_UNKNOWN   0
TYPE_LS_OBJECT            20
TYPE_MIME_PART            25
TYPE_NOTELINK_LIST        7
TYPE_NOTEREF_LIST         4
TYPE_NUMBER               768
TYPE_NUMBER_RANGE         769
TYPE_OBJECT               3
TYPE_QUERY                15
TYPE_RFC822_TEXT          1282
TYPE_SCHED_LIST           22
TYPE_SEAL                 9
TYPE_SEAL_LIST            11
TYPE_SEALDATA             10
TYPE_SIGNATURE            8
TYPE_TEXT                 1280
TYPE_TEXT_LIST            1281
TYPE_TIME                 1024
TYPE_TIME_RANGE           1025
TYPE_UNAVAILABLE          512
TYPE_USER_DATA            14
TYPE_USERID               1792
TYPE_VIEW_FORMAT          5
TYPE_VIEWMAP_DATASET      18
TYPE_VIEWMAP_LAYOUT       19
TYPE_WORKSHEET_DATA       13