Database Scanner
Introduction
The Database scanner can access SQL compliant databases and extract information via user specified SQL SELECT statements.
Note that the Database Scanner can extract content from a database or use the content files which are specified by the user during the transformation process via the mc_content_location system attribute.
Exporting objects from a database
The Database scanner can access SQL compliant databases and extract information via user specified SQL SELECT statements. Since many content management systems rely on some type of SQL compliant database to manage their data, the Database scanner can also be used as a generic interface for extracting information from unsupported/obsolete/custom built content management systems. The types of database management systems supported are not limited to any particular vendor or brand, as access happens via JDBC, hence any SQL compliant database having a (compatible) JDBC adapter available can be accessed and queried.
Some common content management system features supported by migration-center Database scanner are metadata, including system metadata such as permission information, owner, creator, content path, etc, as well as version information, as long as these types of information can be obtained from the respective system’s SQL database. The information extracted by the database scanner is stored in the migration-center database and can be processed, transformed, validated and imported just like any other type of scanned information.
Note that the Database Scanner can extract content from a database stored in BLOB/CLOB fields. Alternatively, the content files corresponding to the objects can be specified by the user during the transformation process via the mc_content_location system attribute.
Depending on the way the content is stored, it may be necessary to extract the content to the filesystem first by other means before migration-center can process it. For the mc_content_location system attribute any of the available transformation functions can be used, so it is easy to generate a value resembling a path pointing to the location of the object’s content file, which a migration-center importer can use to import the content. A good practice would be to export content files to the filesystem using the object’s unique identifier as the filename, and then build the path information based on the known path and the objects unique identifier. This location would need to be accessible to the Job Server running the import which will migrate the content to the new target system.
Database Scanner Configuration
Configuring the JDBC driver
The types of database management systems supported are not limited to any particular vendors or brands, as access happens via JDBC, hence any SQL compliant database having a (compatible) JDBC adapter available can be accessed and queried.
Before using the database scanner, the appropriate JDBC driver needs to be installed and configured for the Job Server. To configure a JDBC driver all required Java jar files and Java library paths need to be added to the "... \migration-center Server Components \jdbc.conf" file. The file can be edited with any text editor and contains all information needed for configuration as each setting is described in the file itself.
The JDBC driver for Oracle 11g is delivered with the Database scanner already preconfigured and ready to use.
Scanner configuration
To create a new database Scanner job, specify the respective adapter type in the Scanner Properties window – from the list of available connectors, “Database” must be selected. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type, in this case the Database Scanner.
The Properties window of a scanner can be accessed by double-clicking a scanner in the list or selecting the Properties button or entry from the toolbar or context menu.
Common scanner parameters
Database scanner parameters
Configurations of queries in the “queryFile”
The “queryFile” is an xml file containing a series of SQL queries that will be run sequentially by the scanner for extracting information from the database.
There are two ways to scan objects with metadata:
Using one query that contains both the unique identifier of the rows/objects which need to be exported and the names of the columns that represent the object's metadata.
Using at least two of the queries: one is for selecting the unique identifier of the rows/objects which need to be exported as, and the second query to extract the metadata for each row/object returned by the previous query. Note: If the main query has metadata, those won't be taken into consideration.
If objects have multiple versions and the related information that need to be extracted (i.e. a row/object is a version of another row/object), some additional queries are required.
The number of queries for extracting metadata for objects or versions is not limited. Multiple queries can be specified to gather and extract various information related to objects.
Example 1.1: a simple file with one query
Example 1.2: a simple file with two queries
The xml file is composed for <query> elements that contain the queries that will be run by scanner for extracting objects and metadata. The <query> element may have the following attributes (some of them mandatory):
Example 2: a more complex query file
The scanner will validate the queries configuration file against the following xml schema:
Delta migration
If the parameter “scanUpdates” is not checked, the scanner will only scan new objects or new versions of existing objects and it will ignore all objects that were previously scanned, i.e. that already exist in the MC database (based on their ID in source system).
If the parameter “scanUpdates” is checked, the scanner will scan new objects or new versions of existing objects and it will detect if it should scan existing objects as updates. If an object will be scanned as an update depends on several factors:
If there are no attributes specified in the “deltaFields” parameter, the scanner will scan every object that already exists in the MC database as an update.
If one or multiple attributes are specified in the “deltaFields” parameter, the scanner will scan an object that was previously scanned as an update only if a value of a delta field in the source database is different than the corresponding value in the MC database. If all values (of all fields defined in “deltaFields”) in the source database match the values in the MC database, then the object will not be scanned as an update, it will just be ignored.
The field names in the “deltaFields” are case sensitive so you should define them as they are scanned by the scanner.
Scanning BLOB/CLOB content
Since version 3.2.8 Update 2 of migration-center the database connector supports extracting BLOB or CLOB content from databases. In order for this feature to work the main-content or version-content query type must be specified in the query file.
The structure of the queries is as follows:
The column for BLOB/CLOB is mandatory and it must have the alias BLOB_CONTENT or CLOB_CONTENT so the scanner is able to process the value correctly and extract the content.
The columns [nameOfFile] and [nameOfExtenstion] are optional. In case they are set, the scanner will use the values to build the name of the file where the content will be exported. The scanner avoids overwriting exiting files by adapting the file name to be unique in the export folder. Also, the characters that area not allowed in the file name will be replaced. Filename and extensions are also saved as source attributes so they can be used in the transformation engine. For avoiding name conflicts with the other attributes, the user can set aliases for these columns. If the columns [nameOfFile] and [nameOfExtenstion] are missing the scanner will use the value of the id in source system as a filename.
Multiple contents for a single object are allowed. If the query returns multiple rows the content returned by every row is exported. All the paths will be stored in the source attribute “mc_content_location” and the path from the last returned row is set as primary content in the column “content_location”.
Example 3: scanning BLOB content
Scanning Excel files
Although it is still possible to use the database scanner to scan Excel files, we recommend to use our CSV & Excel scanner instead. The CSV & Excel scanner is faster and there are no tweaks necessary to get the JDBC-ODBC bridge working in Java later 1.7.
Prerequisites and Configuration
To scan excel files MS Office or at least Excel ODBC drivers need to be installed on the Job Server machine. To install the Excel ODBC driver one of the easiest ways is to download and install the 32bit version of the Microsoft Access Database Engine 2010 Redistributable that includes the necessary drivers and which can be downloaded from Microsoft’s website.
The connectionURL for should have the following format:
jdbc:odbc:DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};DBQ=
PATH-TO-EXCEL-FILE
;ReadOnly=1
Note that all extensions (*.xls, *.xlsx, *.xlsm, *.xlsb) are required.
Example 4: XML Queries for scanning Excel files
Solving Null values for some fields
When a column in the sheet contains mixed data (numbers and strings) the ODBC driver uses the first few rows to detect the column type. If the first values are numbers, the ODBC driver tries to handle all the subsequent values as numbers as well and therefore the values that are strings (not numbers) will be ignored, resulting in a “null” value being scanned by migration-center.
To avoid this problem, the parameter “IMEX=1” must be set in the metadata query:
Last updated