SharePoint Online Scanner
Introduction
The SharePoint Online Scanner allows extracting documents, folders and their related information from Microsoft SharePoint Online libraries.
SharePoint Online Scanners can be created, configured, started and monitored through migration-center Client, while the corresponding processes are executed by migration-center Job Server and the migration-center SharePoint Scanner respectively.
Scanner is the term used in migration-center for an input adapter. Using a scanner such as the SharePoint Scanner to extract data that needs processing in migration-center is the first step in a migration project, thus scan also refers to the process used to input data to migration-center.
Scanners and importers work as jobs that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created. Multiple scanner and import jobs can be created or run at a time, each being defined by a unique name, a set of configuration parameters and a description (optional).
Installation
To install the main product components, consult the migration-center Installation Guide document.
The migration-center SharePoint Online scanner requires installing an additional component besides the main product components.
This additional component needs the .NET Framework 4.7.2 installed and it’s designed to run as a Windows service and must be installed on all machines where the a Job Server is installed.
To install this additional component, it is necessary to run an installation file, which is located within the
SharePoint folder of your Job Server install location, which is by default C:\Program Files (x86)\fme AG\migration-center Server Components <Version>\lib\mc-sharepointonline-scanner\CSOM_Service\install.
To install the service run the install.bat file using administrative privileges. You will need to start it manually for the first time, afterwards the service is configured to start automatically at system startup.
To uninstall the service run the uninstall.bat file using administrative privileges.
The CSOM service must be run with the same user as the Job Server service so that it has the same access to the export location.
SharePoint Online Scanner Properties
To create a new SharePoint Online Scanner, create a new scanner and select SharePoint Online from the Adapter Type drop-down. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type. Mandatory parameters are marked with an *.
The Properties of an existing scanner can be accessed after creating the scanner by double-clicking the scanner in the list or by selecting the Properties button/menu item from the toolbar/context menu. A description is always displayed at the bottom of the window for the selected parameter.
Multiple scanners can be created for scanning different locations, provided each scanner has a unique name.
Common scanner parameters
Configuration parameters
Values
Name
Enter a unique name for this scanner
Mandatory
Adapter type
Select SharePoint Online from the list of available adapters
Mandatory
Location
Select the Job Server location where this job should be run. Job Servers are defined in the Jobserver window. If no Job Server has been created by the user to this point, migration-center will prompt the user to define a Job Server Location when saving the Scanner.
Mandatory
Description
Enter a description for this job (optional)
SharePoint Scanner parameters
The configuration parameters available for the SharePoint Scanner are described below:
Configuration parameters
Values
siteURL*
This is the URL to the SharePoint site that will be scanned.
Example: https://<sharepointonlinefarm>/<site>/
Mandatory
username*
The SharePoint Online user on whose behalf the scan process will be executed.
Should be a SharePoint Online Administrator.
Example: admin@company.onmicrosoft.com
Mandatory
password*
Password of the user specified above
Mandatory
proxyServer
The name or IP of the proxy server.
proxyPort
The port of the proxy server.
proxyUsername
The username if required by the proxy server.
proxyPassword
The password for the proxy username.
camlQuery
CAML statement that will be used to retrieve the ids of objects that will be scanned.
In case of setting this parameter the parameters excludeListAndLibraries, includeListAndLibraries, scanSubsites, excludeSubsites must not be set.
excludeListsAndLibraries
The list of libraries and lists path to be excluded from scanning.
includeListsAndLibraries
List of Lists and Libraries the adapter should scan. Multiple values can be entered and separated with the “,” character.
excludeSubsites
The list of subsites path to be excluded from scanning.
Multiple values can be entered and separated with the “,” character.
excludeContentTypes
The list of content types to be excluded from scanning.
Multiple values can be entered and separated with the “,” character.
excludeFolders
The list of folders to be excluded from scanning. All the folders with the specified name from the site/subsite/library/list depending of scanner configuration will be ignored by the scanner. To exclude a specific folder, it is necessary to specify the full path.
Multiple values can be entered and separated with the “,” character.
Example: folder1 then all the folders with the folder1 name from the site/subistes/library/list will be excluded.
<Some_Library>/<Test_folder>/folder1 the scanner will exclude just the folder1 that is in the Test_folder.
includeFolders
List of folders the adapter should scan. All the folders with the specified name from the site/subsite/library/list depending of scanner configuration will be scanned. To scan a specific folder, it is necessary to specify the full path.
The values of the parameter “excludeFolders” will be ignored if this attribute contains values.
Multiple values can be entered and separated with the “,” character.
Example: folder1 then all the folders with the folder1 name from the site/subistes/library/list will be scanned.
<Some_Library>/<Test_folder>/folder1 the scanner will scan just the folder1 that is in the Test_folder.
scanSubsites
Flag indicting if the objects from subsites will be scanned.
scanDocuments
Flag indicting if the documents scanned will be added as migration-center objects.
scanFolders
Flag indicting if the folders scanned will be added as migration center objects.
includedAttributes
The internal attributes that will be scanned even if the value is null
scanLatestVersionOnly
Flag indicating if just the latest version of a document will be scanned.
computeChecksum
If enabled the scanner calculates a checksum for every content it scans. These checksums can be used during import to compare against a second checksum computed during import of the documents. If the checksums differ, it means the content has been corrupted or otherwise altered, causing the affected document to be rolled back and transitioned to the “import error” status in migration-center.
hashAlgorithm
Specifies the algorithm to be used if the computeChecksum parameter is checked. Supported algorithms: MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.
Default algorithm is MD5.
hashEncoding
The encoding type which will be used for checksum computation. Supported encoding types are HEX, Base32, Base64.
Default encoding is HEX.
exportLocation*
Folder path. The location where the exported object content should be temporary saved. It can be a local folder on the same machine with the Job Server or a shared folder on the network. This folder must exist prior to launching the scanner and must have write permissions. migration-center will not create this folder automatically. If the folder cannot be found an appropriate error will be raised and logged. This path must be accessible by both scanner and importer so if they are running on different machines, it should be a shared folder.
Mandatory
numberOfThreads
Maximum number of concurrent threads.
Default is 10 and maximum allowed is 20.
loggingLevel*
Sets the verbosity of the log file.
Values:
1 - logs only errors during scan
2 - is the default value reporting all warnings and errors
3 - logs all successfully performed operations in addition to any warnings or errors
4 - logs all events (for debugging only, use only if instructed by fme product support since it generates a very large amount of output. Do not use in production)
Mandatory
Scanning using the CAML query
The SharePoint Online scanner can use SharePoint CAML queries for filtering which objects are to be scanned. Based on the entered query, the scanner scans documents and folders in the lists/libraries.
For details on how to form CAML queries for each version of SharePoint please consult the official Microsoft MSDN documentation.
When using the CAML query parameter “query”, the parameters "excludeListAndLibraries", "includeListAndLibraries", "scanSubsites", "excludeSubsites", "excludeFolders", "includeFolders" must not be set. Otherwise the scanner will fail to start with an error message.
Scanning permissions
The SharePoint Online scanner can extract permission information for documents and folders. Note that only unique permissions are extracted. Permissions inherited from parent objects are not extracted by the scanner.
Additional configuration settings
There is a configuration file for additional settings regarding the SharePoint Online Scanner. Located under the …/lib/mc-sharepointonline-scanner/ folder in the Job Server install location it has the following properties that can be set:
Configuration property
Values
excluded_file_extensions
List of file extensions that will be ignored by the scanner.
Default: .aspx|.webpart|.dwp|.master|.preview.
excluded_attributes
List of internal attributes that will be ignored by the scanner.
initialization_timeout
Amount of time in milliseconds to wait before the scanner throws a timeout error during the initialization phase.
Default: 21600000 ms
History, Reports, Logs
A complete history is available for any SharePoint Scanner job from the respective items – History - window. It is accessible through the [History] button/menu entry on the toolbar/context menu. The -History- window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and ending time and the status.
Double clicking an entry or clicking the Open button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:
Version information of the migration-center Server Components the job was run with
The parameters the job was run with
Execution Summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime.
Log files generated by the SharePoint Scanner can be found in the Server Components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs
The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.
Additional logs
An additional log file is generated by the SharePoint Online Scanner.
The location of this log file is in the same folder as the regular SharePoint Online scanner log files with the name: mc-sharepointonline-scanner.log.
Last updated