SharePoint Scanner
Introduction
The SharePoint Scanner extracts documents, list items, folders, lists/libraries and their related information from Microsoft SharePoint sites.
Supports Microsoft SharePoint 2007/2010/2013/2016 documents, list items, folders and lists/libraries
Extract content (documents) and metadata
Extract versions
Exclude specified content types
Exclude specified file types
Exclude specified columns (attributes)
Calculate checksum during scan to be used later for validating the imported content (in combination with importers supporting this feature)
The SharePoint Scanner is implemented mainly as a SharePoint Solution running on the SharePoint Server, with the Job Server part managing communication between migration-center and the SharePoint component.
SharePoint Scanners can be created, configured, started and monitored through migration-center Client, while the corresponding processes are executed by migration-center Job Server and the migration-center SharePoint Scanner respectively.
Scanner is the term used in migration-center for an input adapter. Using a scanner such as the SharePoint Scanner to extract data that needs processing in migration-center is the first step in a migration project, thus scan also refers to the process used to input data to migration-center.
Scanners and importers work as jobs that can be run at any time, and can even be executed repeatedly. For every run a detailed history and log file are created. Multiple scanner and import jobs can be created or run at a time, each being defined by a unique name, a set of configuration parameters and a description (optional).
Known Issues
The SharePoint Online Scanner might receive a timeout error from SharePoint Online when scanning libraries with more than 5000 documents (#52865)
Installation
The migration-center SharePoint Scanner requires installing an additional, separate component from the main product components. The migration-center SharePoint Scanner is a SharePoint Solution which manages the scan (extraction) process from Microsoft SharePoint Server. This component will need to be installed and deployed manually on the machine hosting the Microsoft SharePoint Server. The required steps are detailed in this chapter.
To install the main product components consult the migration-center Installation Guide document.
To install the migration-center SharePoint Scanner, read on.
Requirements
The migration-center SharePoint Scanner is implemented as a SharePoint Solution, a functionality supported only with Microsoft SharePoint Server 2007 or newer.
Since the migration-center SharePoint Scanner Solution must be installed on the same machine as Microsoft SharePoint Server, the supported Windows operating systems are the same as those supported by the respective Microsoft SharePoint Server version (2007-2016). Please consult the documentation for Microsoft SharePoint Server 2007-2016 for more information regarding supported operating systems and system requirements.
Administrative rights are required for performing the required uninstallation, installation and deployment procedures described in this chapter.
Installing migration-center Server Components
Follow the standard installation procedure described in the Installation Guide to install the migration-center Server Components containing Job Server and corresponding part of the SharePoint Scanner.
Installing the SharePoint Solution part of the SharePoint Scanner
1. Connect to the SharePoint Server (log in directly or via Remote Desktop); in a farm, any server should be suited for this purpose.
2. Copy the McScanner.wsp file from <migration-center Server Components installation folder>/lib/mc-sharepoint-scanner/Sharepoint <SPVersion> to a location on the SharePoint Server
3. Open an administrative Command Prompt
4. Navigate to C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\<Hive Folder>\BIN (the hive folder is 12 for SharePoint 2007, 14 for 2010, 15 for 2013 and 16 for 2016)
5. Use the STSADM tool to install the SharePoint Solution part of the SharePoint Scanner:
STSADM -o addsolution -filename <path to the file copied at step 2>\McScanner.wsp
For SharePoint 2010, 2013 and 2016 an alternative installation using PowerShell is possible and can be used if preferred:
1. Connect to the SharePoint Server (log in directly or via Remote Desktop); in a farm, any server should be suited for this purpose.
2. Copy the McScanner.wsp file from <migration-center Server Components installation folder>/lib/mc-sharepoint-scanner/Sharepoint <SPVersion> to a location on the SharePoint Server
3. Open the SharePoint Management Shell from the Start menu
4. Use the following PowerShell command to install the SharePoint Solution part of the SharePoint Scanner:
Add-SPSolution <path to the file copied at step 2>\McScanner.wsp
The output should look like this:
Name             SolutionId                              Deployed
mcscanner.wsp    f905025e-3de7-44c9-828a-f7b12f726bc1    False
Deploying the SharePoint Scanner solution
Having installed the SharePoint Solution, it is now time to deploy it. Due to differences in the management interfaces of the various SharePoint versions, the procedure differs slightly depending on the version used. Follow the steps below corresponding to the targeted SharePoint version:
SharePoint 2007:
Open SharePoint Central Administration
Go to Operations
Under Global Configuration, click Solution Management
Click McScanner.wsp and follow the instructions to deploy the solution.
SharePoint 2010, 2013 and 2016:
Open SharePoint Central Administration
Go to System Settings
Under Farm Management, click Manage Farm Solutions
Click McScanner.wsp and follow the instructions to deploy the solution
Verify that the solution works correctly after deployment by opening the following URL in a web browser:
http://<your sharepoint farm>/_vti_bin/McScanner.asmx?wsdl
If the output looks like the picture below, deployment was successful and the SharePoint Scanner is working.
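The same check can be scripted, for example to verify several farms. The following Python sketch is an illustration only: the helper names are hypothetical, it assumes the endpoint returns a WSDL 1.1 document (as ASMX web services do), and the fetch helper uses plain urllib, which does not perform Windows (NTLM/Kerberos) authentication. If your farm requires Windows authentication, save the page from the browser and pass its content to is_wsdl_document instead.

```python
import urllib.request
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace of the <definitions> root element.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def scanner_wsdl_url(farm_url: str) -> str:
    """Build the McScanner WSDL URL from a farm base URL (hypothetical helper)."""
    return farm_url.rstrip("/") + "/_vti_bin/McScanner.asmx?wsdl"


def is_wsdl_document(body) -> bool:
    """Return True if body (str or bytes) parses as a WSDL 1.1 <definitions> document."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == "{%s}definitions" % WSDL_NS


def check_deployment(farm_url: str, timeout: float = 30.0) -> bool:
    """Fetch the WSDL without authentication.

    Raises urllib.error.HTTPError (e.g. 401) if the farm requires Windows authentication.
    """
    with urllib.request.urlopen(scanner_wsdl_url(farm_url), timeout=timeout) as resp:
        return is_wsdl_document(resp.read())
```

An HTML error page or a login page parses to a different root element (or not at all), so is_wsdl_document returns False for it.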
Configuration
Configure Java keystore for using a secure SharePoint connection
Since the SharePoint Scanner is split between two components (the mc Job Server part running in Java, and the SharePoint Solution part running on IIS), these two components need some additional configuration to communicate with one another if the SharePoint site is configured to run over HTTPS using the SSL protocol.
In this case, the issuer of the server's SSL certificate must be set as a trusted certification authority in the JVM used by the Job Server, so that the Job Server component of the SharePoint Scanner trusts and can connect to the secure SharePoint site.
Follow the steps below to register the certification authority with the JVM:
1. Export the certificate of the issuing certification authority as a .cer file
2. Transfer the file to the machine running the Job Server
3. Open a command prompt
4. Import the certificate file into the Java keystore using the following command. Replace the JAVA_HOME placeholder with the actual path; the command is one single line. The keystore path ..\lib\security\cacerts is relative to JAVA_HOME\bin, so run the command from that folder:
JAVA_HOME\bin\keytool -import -alias <an alias of your choice, e.g. SP2013> -keystore ..\lib\security\cacerts -file <full path and name of the certificate file from step 2>
5. Enter “changeit” (the default keystore password) when asked for the password to the keystore
6. The information contained in the certificate is displayed. Verify that it is correct and describes the certification authority that issued the SSL certificate used by the secure SharePoint connection.
7. Type “y” when prompted “Trust this certificate?”
8. “Certificate was added to keystore” is displayed, confirming that the CA from the certificate is now trusted by Java.
9. Restart the Job Server
Repeat the above steps on every machine if you have multiple Job Servers running the SharePoint Scanner.
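For several Job Servers, step 4 can be scripted. A minimal Python sketch, assuming a Windows Job Server (hence ntpath) and a Java runtime whose cacerts file is JAVA_HOME\lib\security\cacerts, which is what the relative path above resolves to; in a JDK 8 installation the file is typically under JAVA_HOME\jre\lib\security instead. The helper names are hypothetical. The command stays interactive, so keytool still asks for the password and the trust confirmation as in steps 5 to 7.

```python
import ntpath
import subprocess


def keytool_import_args(java_home: str, alias: str, cert_file: str) -> list:
    """Build the keytool command from step 4 as an argument list (hypothetical helper).

    The keystore path is made absolute (JAVA_HOME\\lib\\security\\cacerts), so the
    command no longer depends on the current working directory.
    """
    keytool = ntpath.join(java_home, "bin", "keytool")
    keystore = ntpath.join(java_home, "lib", "security", "cacerts")
    return [keytool, "-import", "-alias", alias, "-keystore", keystore, "-file", cert_file]


def import_certificate(java_home: str, alias: str, cert_file: str) -> None:
    """Run keytool interactively; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(keytool_import_args(java_home, alias, cert_file), check=True)
```

Passing an argument list rather than a single command string avoids quoting problems with paths that contain spaces. The Job Server still has to be restarted afterwards (step 9).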
SharePoint Scanner Properties
To create a new SharePoint Scanner, create a new scanner and select SharePoint from the Adapter Type drop-down. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type. Mandatory parameters are marked with an *.
The Properties of an existing scanner can be accessed after creating the scanner by double-clicking the scanner in the list or selecting the Properties button/menu item from the toolbar/context menu. A description is always displayed at the bottom of the window for the selected parameter.
Multiple scanners can be created for scanning different locations, provided each scanner has a unique name.
Common scanner parameters
Name (mandatory): Enter a unique name for this scanner.
Adapter type (mandatory): Select SharePoint from the list of available adapters.
Location (mandatory): Select the Job Server location where this job should be run. Job Servers are defined in the Jobserver window. If no Job Server has been created yet, migration-center will prompt the user to define a Job Server location when saving the scanner.
Description (optional): Enter a description for this job.
SharePoint Scanner parameters
webserviceUrl* (mandatory): The URL of the SharePoint Scanner component installed on the SharePoint Server. Also see chapter 3 Configuration for more information. Example: http://<sharepointfarm>/<site>/
Username* (mandatory): The SharePoint user on whose behalf the scan process is executed. This user also needs access to the temporary storage location where the scanned objects are saved (see parameter exportLocation below) and should be a SharePoint Administrator. Example: sharepoint.corporate.domain\spadmin
Password* (mandatory): Password of the user specified above.
includeLibraries* (mandatory): List of document libraries the adapter should scan. Multiple values can be entered, separated by the “|” character. At least one valid document library must be specified.
query: SharePoint CAML query for a fine-grained selection of the objects to scan. Cannot be combined with excludeContentTypes.
excludeContentTypes: Content types to exclude when scanning the document libraries specified above. Multiple values can be entered, separated by the “|” character.
excludeFileExtensions: File types to exclude when scanning the document libraries specified above. Multiple values can be entered, separated by the “|” character.
excludeAttributes: Columns (attributes) to exclude when scanning the document libraries specified above. Multiple values can be entered, separated by the “|” character.
includeInternalAttributes: List of internal SharePoint attributes to be scanned.
scanDocuments: If enabled, the scanner processes all documents it encounters in the configured path.
scanListItems: If enabled, the scanner processes all list items it encounters in the configured path.
scanFolders: If enabled, the scanner processes all folders it encounters in the configured path.
scanLists: If enabled, the scanner processes all lists/libraries it encounters in the configured path.
scanSubsites: If enabled, the scanner also processes the data of the subsites.
scanPermissions: If enabled, the permissions of every scanned item, regardless of its type, are saved in the migration-center database for further use in the migration process.
scanVersionsAsSingleObjects: If enabled, the scanner processes each version tree as a single object in migration-center. This object contains the content and metadata of the latest version, plus links to the content of the other versions in the mc_previous_version_content_paths attribute.
computeChecksum: If enabled, the scanner calculates a checksum for every content file it scans. During import, these checksums can be compared against a second checksum computed for the imported documents. If the checksums differ, the content has been corrupted or otherwise altered; the affected document is then rolled back and set to the “import error” status in migration-center.
hashAlgorithm: Specifies the algorithm used if the computeChecksum parameter is checked. Supported algorithms: MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512. The default algorithm is MD5.
Note that SHA-224
hashEncoding: Specifies the encoding used if the computeChecksum parameter is checked. Supported encodings: HEX and Base64. The default is HEX.
exportLocation* (mandatory): Folder path where the exported object content is temporarily saved. The path is resolved on the SharePoint server, so the account specified above needs access to it. The export location can be either a local folder on the SharePoint server or a network share (recommended).
loggingLevel* (mandatory): Sets the verbosity of the log file. Values:
1 - logs only errors during scan
2 - the default value; reports all warnings and errors
3 - logs all successfully performed operations in addition to any warnings or errors
4 - logs all events (for debugging only; use only if instructed by fme product support, since it generates a very large amount of output. Do not use in production)
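To illustrate what computeChecksum, hashAlgorithm and hashEncoding control, the following Python sketch computes a checksum and compares a scan-time value with one recomputed at import time. It is not the scanner's implementation: the mapping of the parameter values to hashlib names and the lowercase hexadecimal output are assumptions, which is why the comparison helper ignores case for HEX values. The function names are hypothetical.

```python
import base64
import hashlib
import io

# hashAlgorithm values listed above, mapped to hashlib algorithm names (assumed mapping).
ALGORITHMS = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-224": "sha224",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def compute_checksum(data, algorithm: str = "MD5", encoding: str = "HEX",
                     chunk_size: int = 1 << 20) -> str:
    """Hash bytes or a binary stream chunk by chunk and encode the digest as HEX or Base64."""
    try:
        hasher = hashlib.new(ALGORITHMS[algorithm.upper()])
    except KeyError:
        raise ValueError(f"unsupported hashAlgorithm: {algorithm}") from None
    stream = io.BytesIO(data) if isinstance(data, (bytes, bytearray)) else data
    # Read in chunks so large documents are not loaded into memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    digest = hasher.digest()
    if encoding.upper() == "HEX":
        return digest.hex()
    if encoding.upper() == "BASE64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError(f"unsupported hashEncoding: {encoding}")


def checksums_match(scanned: str, imported_data, algorithm: str = "MD5",
                    encoding: str = "HEX") -> bool:
    """Compare the checksum recorded at scan time with one recomputed at import time."""
    recomputed = compute_checksum(imported_data, algorithm, encoding)
    if encoding.upper() == "HEX":
        return scanned.lower() == recomputed.lower()
    return scanned == recomputed  # Base64 is case-sensitive
```

Both sides must use the same algorithm and encoding; a HEX value and a Base64 value of the same digest never compare equal.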
Export location requirements
When scanning from SharePoint, the “exportLocation” parameter must be set.
For ensuring proper functionality of the content export there are a few considerations to keep in mind:
Regarding the path: The export location should ideally be a UNC path that points to a shared location (e.g., \\server\fileshare). If a local system path is used (e.g., D:\Temp), the path refers to the SharePoint Server where the WSP solution is running, NOT to the Job Server machine.
Regarding the credentials: To access the specified file share, the SharePoint Scanner uses the credentials provided in the scanner configuration. Therefore, the user configured for the migration (e.g., sharepoint\mcuser) must have write permissions on the file share.
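A configuration check could flag local paths before a scan is started, since these are resolved on the SharePoint Server. The following Python sketch (a hypothetical helper, using the standard ntpath module so it behaves the same on any OS) classifies an exportLocation value; it does not check permissions, which still have to be granted to the configured user.

```python
import ntpath


def classify_export_location(path: str) -> str:
    r"""Return 'unc', 'local' or 'relative' for an exportLocation value (hypothetical helper).

    'unc'      -> \\server\share\... (recommended; a shared location)
    'local'    -> D:\Temp (a folder on the SharePoint Server, not on the Job Server)
    'relative' -> no drive letter or share, which is unlikely to be intended
    """
    drive, _ = ntpath.splitdrive(path.strip())
    if drive[:2] in ("\\\\", "//"):
        return "unc"
    if len(drive) == 2 and drive[1] == ":":
        return "local"
    return "relative"
```

A configuration tool could, for example, show a warning for 'local' values reminding the user that the folder must exist on the SharePoint Server.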
Scanning using the CAML query
Starting with version 3.3 of migration-center, the SharePoint Scanner can use SharePoint CAML queries to filter which objects are scanned. Based on the entered query, the scanner scans documents, folders and list items in the lists/libraries specified in the parameter “includeLibraries”. If the parameter “includeLibraries” contains *, the query applies to all lists/libraries within the site.
The following example shows a simple CAML query for scanning the contents of the "Level1" folder inside the "TestLib" library alongside all its subfolders:
<Where>
  <BeginsWith>
    <FieldRef Name='FileDirRef'/>
    <Value Type='Text'>/sites/mc/TestLib/Level1</Value>
  </BeginsWith>
</Where>
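When such queries are generated for many folders, building them with an XML library avoids quoting and escaping mistakes (for example a folder name containing &). A minimal Python sketch producing the same query as above; the function name is hypothetical, and the output uses double-quoted attributes, which are equally valid XML. Note that BeginsWith is a plain prefix match, so the example path would also match a sibling folder such as /sites/mc/TestLib/Level10.

```python
import xml.etree.ElementTree as ET


def folder_subtree_query(folder_url: str) -> str:
    """Build a CAML <Where> clause matching items whose parent folder path starts with folder_url."""
    where = ET.Element("Where")
    begins_with = ET.SubElement(where, "BeginsWith")
    ET.SubElement(begins_with, "FieldRef", Name="FileDirRef")
    value = ET.SubElement(begins_with, "Value", Type="Text")
    value.text = folder_url  # special characters such as & are escaped automatically
    return ET.tostring(where, encoding="unicode")
```

The resulting string can be pasted into the query parameter of the scanner.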
For details on how to form CAML queries for each version of SharePoint please consult the official Microsoft MSDN documentation.
When using the CAML query parameter “query” the “excludeContentTypes” parameter must be empty. Otherwise the scanner will fail to start with an error message.
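The filter rules described in this chapter (multiple values separated by “|”, at least one entry in includeLibraries, and an empty excludeContentTypes when query is set) can be checked before the scanner is started. A Python sketch with hypothetical helper names; it only mirrors the documented rules and is not the scanner's own validation. Trimming spaces around each value is an assumption.

```python
def split_multi_value(raw) -> list:
    """Split a '|'-separated parameter value, dropping blanks and surrounding spaces (assumed)."""
    if not raw:
        return []
    return [part.strip() for part in raw.split("|") if part.strip()]


def validate_scan_filters(params: dict) -> list:
    """Return error messages for the documented filter rules (an empty list means valid)."""
    errors = []
    if not split_multi_value(params.get("includeLibraries")):
        errors.append("includeLibraries must name at least one list/library (or *)")
    query = (params.get("query") or "").strip()
    if query and split_multi_value(params.get("excludeContentTypes")):
        errors.append("excludeContentTypes must be empty when query is used")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a configuration tool report all issues at once.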
History, Reports, Logs
A complete history is available for any SharePoint Scanner job from its -History- window, which is accessible through the [History] button/menu entry on the toolbar/context menu. The -History- window displays a list of all runs of the selected job together with additional information, such as the number of processed objects, the start and end time, and the status.
Double clicking an entry or clicking the Open button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
Version information of the migration-center Server Components the job was run with
The parameters the job was run with
Execution Summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime.
Log files generated by the SharePoint Scanner can be found in the Server Components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components 3.6\logs
The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
Additional logs are generated on the server side by the SharePoint Solution part of the SharePoint Scanner. The location of this log file can be configured in the file C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\CONFIG\Migration_Center_SP_Scanner.config (path shown for the SharePoint 2013 hive folder 15; adjust the hive folder for other SharePoint versions).
Open the file with a text editor and edit the line below to configure the path to the log file
<file type="log4net.Util.PatternString" value="C:\MC\logs\%property{LogFileName}" />
Only change the folder part of the path; do not change the file name (the %property{LogFileName} part)!
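When the log folder has to be changed on several servers, the edit can be scripted. A minimal Python sketch, assuming the config file contains exactly one <file type="log4net.Util.PatternString" .../> element as shown above; the function name is hypothetical. It edits the text with a regular expression so the rest of the file keeps its formatting, replaces only the folder and keeps the %property{LogFileName} part.

```python
import re

# Matches the log4net <file> element shown above and captures its value attribute.
_FILE_ELEMENT = re.compile(
    r'(<file\s+type="log4net\.Util\.PatternString"\s+value=")([^"]*)(")'
)
_FILE_NAME = "%property{LogFileName}"


def set_log_folder(config_text: str, folder: str) -> str:
    """Replace only the folder part of the log file path, keeping %property{LogFileName}."""
    def replace(match):
        if not match.group(2).endswith(_FILE_NAME):
            raise ValueError("unexpected log file value: " + match.group(2))
        new_value = folder.rstrip("\\/") + "\\" + _FILE_NAME
        return match.group(1) + new_value + match.group(3)

    new_text, count = _FILE_ELEMENT.subn(replace, config_text)
    if count != 1:
        raise ValueError(f"expected one log4net <file> element, found {count}")
    return new_text
```

Read and write the config file with its original encoding, and keep a backup copy before overwriting it.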