SharePoint Online Scanner

Introduction

The SharePoint Online Scanner allows extracting documents, folders and their related information from Microsoft SharePoint Online libraries.

SharePoint Online Scanners can be created, configured, started and monitored through migration-center Client, while the corresponding processes are executed by migration-center Job Server and the migration-center SharePoint Scanner respectively.

Scanner is the term used in migration-center for an input adapter. Using a scanner such as the SharePoint Scanner to extract data that needs processing in migration-center is the first step in a migration project, thus scan also refers to the process used to input data to migration-center.

Scanners and importers work as jobs that can be run at any time and can even be executed repeatedly. For every run a detailed history and log file are created. Multiple scanner and import jobs can be created or run at a time, each being defined by a unique name, a set of configuration parameters and a description (optional).

Important note: Starting with version 3.15 Update 2 the SharePoint Online Scanner only supports the app-only authentication method! If you cannot use app-only authentication for any reason, please do not upgrade to 3.15 Update 2 or a later version.

Known issues & limitations

  • CAML query not working on SPO Scanner (#59597)

  • SPO Scanner might receive timeout error from SharePoint Online when scanning libraries with more than 5000 documents (#52865)

Installation

To install the main product components, consult the migration-center Installation Guide document.

The migration-center SharePoint Online scanner requires installing an additional component besides the main product components.

This additional component requires .NET Framework 4.7.2. It is designed to run as a Windows service and must be installed on every machine where a Job Server is installed.

To install this additional component, run the installation file located in the SharePoint folder of your Job Server install location. By default this is C:\Program Files (x86)\fme AG\migration-center Server Components <Version>\lib\mc-sharepointonline-scanner\CSOM_Service\install.

To install the service, run the install.bat file with administrative privileges. You need to start the service manually the first time; afterwards it is configured to start automatically at system startup.

To uninstall the service run the uninstall.bat file using administrative privileges.

The CSOM service must be run with the same user as the Job Server service so that it has the same access to the export location.

The app-only principal authentication used by the scanner calls the following HTTPS endpoints. Please ensure that the job server machine has access to those endpoints:

  • <tenant name>.sharepoint.com:443

  • accounts.accesscontrol.windows.net:443
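As a minimal sketch of how this could be checked from the Job Server machine, the following Python snippet builds the two endpoints listed above for a given tenant name and tries to open a TCP connection to each. The function names and the tenant name "contoso" are illustrative assumptions and not part of migration-center. A successful connection only shows that the port is reachable; it does not prove that authentication will succeed.

```python
import socket


def required_endpoints(tenant_name):
    """Return the (host, port) pairs listed above for the given tenant name."""
    return [
        (f"{tenant_name}.sharepoint.com", 443),
        ("accounts.accesscontrol.windows.net", 443),
    ]


def is_reachable(host, port, timeout=5.0):
    """Try to open a TCP connection; return True on success, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_endpoints(tenant_name, probe=is_reachable):
    """Map each 'host:port' string to the result of the probe function."""
    return {
        f"{host}:{port}": probe(host, port)
        for host, port in required_endpoints(tenant_name)
    }
```

Calling check_endpoints("contoso") on the Job Server machine returns a dictionary that shows which endpoint, if any, is blocked by a firewall or proxy.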

Preparation for app-only principal authentication

The scanner supports app-only principal authentication for connecting to SharePoint Online. App-only principal authentication comes in two flavors: Azure AD app-only principal authentication and SharePoint app-only principal authentication.

Azure AD app-only authentication requires full control access for the migration-center application on your SharePoint Online tenant. This includes full control on ALL site collections of your tenant.

If you want to restrict the access of the migration-center application to certain site collections or sites, you can use SharePoint app-only authentication.

Azure AD app-only principal authentication

The migration-center SharePoint Online scanner supports Azure AD app-only authentication. This is the authentication method for background processes accessing SharePoint Online recommended by Microsoft. When using SharePoint Online you can define applications in Azure AD and these applications can be granted permissions to your SharePoint Online tenant.

Please follow these steps to set up your migration-center application in Azure AD.

The information in this chapter is based on the following Microsoft guidelines: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azuread

Step 1: Create a self-signed certificate for your migration-center Azure AD application

In Azure AD when doing App-Only you typically use a certificate to request access: anyone having the certificate and its private key can use the app and the permissions granted to the app. The below steps walk you through the setup of this model.

You are now ready to configure the Azure AD Application for invoking SharePoint Online with an App-Only access token. To do that, you must create and configure a self-signed X.509 certificate, which will be used to authenticate your migration-center Application against Azure AD when requesting the App-Only access token. You can create the certificate either with the makecert.exe tool that is available in the Windows SDK or with a provided PowerShell script that does not depend on makecert. Using the PowerShell script is the preferred method and is explained in this chapter.

It's important that you run the below scripts with Administrator privileges.

To create a self-signed certificate, run the following script, which you can find in the <job server folder>\lib\mc-spo-batch-importer\scripts folder:

.\Create-SelfSignedCertificate.ps1 -CommonName "MyCompanyName" -StartDate 2020-07-01 -EndDate 2022-06-30

The dates are provided in ISO date format: YYYY-MM-dd

You will be asked to give a password to encrypt your private key, and both the .PFX file and .CER file will be exported to the current folder.

Save the password of the private key as you’ll need it later.
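Before running the script, it can help to check that the two dates are well formed and in the right order. The following Python sketch does that for the -StartDate and -EndDate values shown above. The function names are illustrative assumptions, and the check is independent of the PowerShell script itself.

```python
from datetime import datetime


def parse_certificate_date(text):
    """Parse a date written as YYYY-MM-dd; raise ValueError if it does not match."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def validity_days(start_text, end_text):
    """Return the number of days between StartDate and EndDate.

    Raises ValueError if a date is malformed or EndDate is not after StartDate.
    """
    start = parse_certificate_date(start_text)
    end = parse_certificate_date(end_text)
    if end <= start:
        raise ValueError("EndDate must be later than StartDate")
    return (end - start).days
```

For the dates used in the example command, validity_days("2020-07-01", "2022-06-30") returns 729, which is a quick way to confirm that the certificate covers the planned migration time frame.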

Step 2: Register the migration-center Azure AD application

The next step is to register an Azure AD application in the Azure Active Directory tenant that is linked to your Office 365 tenant. To do that, open the Office 365 Admin Center (https://admin.microsoft.com) with the account of a user who is a member of the Tenant Global Admins group. Click the "Azure Active Directory" link under the "Admin centers" group in the left-side tree view of the Office 365 Admin Center. The Microsoft Azure portal (https://portal.azure.com/) opens in a new browser tab. If this is the first time you access the Azure portal with your account, you will have to register a new Azure subscription, providing some information and a credit card for any payment need. Working with Azure AD and registering an Office 365 application are free capabilities, so you will not be charged for them. Once you have access to the Azure portal, select the "Azure Active Directory" section and choose the option "App registrations". See the next figure for further details.

In the "App registrations" tab you will find the list of Azure AD applications registered in your tenant. Click the "New registration" button in the upper left part of the blade. Next, provide a name for your application, e.g. “migration-center” and click on "Register" at the bottom of the blade.

Once the application has been created copy the "Application (client) ID" as you’ll need it later.

Step 3: Configure necessary permissions for the migration-center application

Now click on "API permissions" in the left menu bar and click on the "Add a permission" button. A new blade will appear. Here you choose the permissions that are required by migration-center. Choose the following:

  • Microsoft APIs

    • SharePoint

      • Application permissions

        • Sites

          • Sites.FullControl.All

        • TermStore

          • TermStore.Read.All

        • User

          • User.Read.All

    • Graph

      • Application permissions

        • Sites

          • Sites.FullControl.All

Click on the blue "Add permissions" button at the bottom to add the permissions to your application. The "Application permissions" are those granted to the migration-center application when running as App Only.

Step 4: Uploading the self-signed certificate

Next step is “connecting” the certificate you created earlier to the application. Click on "Certificates & secrets" in the left menu bar. Click on the "Upload certificate" button, select the .CER file you generated earlier and click on "Add" to upload it.

Step 5: Grant admin consent

The “Sites.FullControl.All” application permission requires admin consent in a tenant before it can be used. To grant it, click on "API permissions" in the left menu again. At the bottom you will see a section "Grant consent". Click on the "Grant admin consent for" button and confirm the action by clicking on the "Yes" button that appears at the top.

Final Step: Setting the necessary parameters in the scanner

In order to use Azure AD app-only principal authentication with the SharePoint Online scanner you need to fill in the following scanner parameters with the information you gathered in the steps above:

Configuration parameters

Values

appClientId

The ID of the migration-center Azure AD application.

appCertificatePath

The full path to the certificate .PFX file, which you have generated when setting up the Azure AD application.

appCertificatePassword

The password to read the certificate specified in appCertificatePath.

SharePoint app-only principal authentication

SharePoint app-only authentication allows you to grant fine-grained access permissions on your SharePoint Online tenant for the migration-center application.

The information in this chapter is based on the following guidelines from Microsoft:

https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azuread

https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs

https://docs.microsoft.com/en-us/sharepoint/dev/sp-add-ins/add-in-permissions-in-sharepoint

Step 1: Create a self-signed certificate for your migration-center Azure AD application

In Azure AD when doing App-Only you typically use a certificate to request access: anyone having the certificate and its private key can use the app and the permissions granted to the app. The below steps walk you through the setup of this model.

You are now ready to configure the Azure AD Application for invoking SharePoint Online with an App-Only access token. To do that, you must create and configure a self-signed X.509 certificate, which will be used to authenticate your migration-center Application against Azure AD when requesting the App-Only access token. You can create the certificate either with the makecert.exe tool that is available in the Windows SDK or with a provided PowerShell script that does not depend on makecert. Using the PowerShell script is the preferred method and is explained in this chapter.

It's important that you run the below scripts with Administrator privileges.

To create a self-signed certificate, run the following script, which you can find in the <job server folder>\lib\mc-spo-batch-importer\scripts folder:

.\Create-SelfSignedCertificate.ps1 -CommonName "MyCompanyName" -StartDate 2020-07-01 -EndDate 2022-06-30

The dates are provided in ISO date format: YYYY-MM-dd

You will be asked to give a password to encrypt your private key, and both the .PFX file and .CER file will be exported to the current folder.

Save the password of the private key as you’ll need it later.

Step 2: Register the migration-center Azure AD application

The next step is to register an Azure AD application in the Azure Active Directory tenant that is linked to your Office 365 tenant. To do that, open the Office 365 Admin Center (https://admin.microsoft.com) with the account of a user who is a member of the Tenant Global Admins group. Click the "Azure Active Directory" link under the "Admin centers" group in the left-side tree view of the Office 365 Admin Center. The Microsoft Azure portal (https://portal.azure.com/) opens in a new browser tab. If this is the first time you access the Azure portal with your account, you will have to register a new Azure subscription, providing some information and a credit card for any payment need. Working with Azure AD and registering an Office 365 application are free capabilities, so you will not be charged for them. Once you have access to the Azure portal, select the "Azure Active Directory" section and choose the option "App registrations". See the next figure for further details.

In the "App registrations" tab you will find the list of Azure AD applications registered in your tenant. Click the "New registration" button in the upper left part of the blade. Next, provide a name for your application, e.g. “migration-center” and click on "Register" at the bottom of the blade.

Once the application has been created copy the "Application (client) ID" as you’ll need it later.

Step 3: Uploading the self-signed certificate and generating a secret key

Next step is “connecting” the certificate you created earlier to the application. Click on "Certificates & secrets" in the left menu bar. Click on the "Upload certificate" button, select the .CER file you generated earlier and click on "Add" to upload it.

After that, you need to create a secret key. Click on “New client secret” to generate a new secret key. Give it an appropriate description, e.g. “migration-center” and choose an expiration period that matches your migration project time frame. Click on “Add” to create the key.

Store the retrieved information (client id and client secret) since you'll need it later! Please safeguard the created client id/secret combination as if it were your administrator account. Anyone with this client id/secret can read and update all data in your SharePoint Online environment!

Step 4: Granting permissions to the app-only principal

Next step is granting permissions to the newly created principal in SharePoint Online.

If you want to grant tenant scoped permissions, this can only be done via the “appinv.aspx” page on the tenant administration site. If your tenant administration site is https://contoso-admin.sharepoint.com, you can reach this page via https://contoso-admin.sharepoint.com/_layouts/15/appinv.aspx.

If you want to grant site collection scoped permissions, open the “appinv.aspx” on the specific site collection, e.g. https://contoso.sharepoint.com/sites/mysite/_layouts/15/appinv.aspx.
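The two URL patterns above can be sketched as small Python helpers. The function names are illustrative assumptions, and the "<tenant name>-admin" host pattern is inferred from the contoso example above rather than stated as a general rule in this guide.

```python
def tenant_admin_site(tenant_name):
    """Build the tenant administration site URL, following the contoso example."""
    return f"https://{tenant_name}-admin.sharepoint.com"


def appinv_url(site_url):
    """Append the appinv.aspx page path to a tenant admin or site collection URL."""
    return site_url.rstrip("/") + "/_layouts/15/appinv.aspx"
```

Passing the tenant administration site yields the page for tenant scoped permissions, while passing a site collection URL such as https://contoso.sharepoint.com/sites/mysite yields the page for site collection scoped permissions.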

Once the page is loaded, add your client id and look up the created principal by pressing the "Lookup" button.

Please enter “www.migration-center.com” in field “App Domain” and “https://www.migration-center.com” in field “Redirect URL”.

To grant permissions, you'll need to provide the permission XML that describes the needed permissions. The migration-center application will always need the “FullControl” permission. Use the following permission XML for granting tenant scoped permissions:

<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
  <AppPermissionRequest Scope="http://sharepoint/taxonomy" Right="Read" />
</AppPermissionRequests>

Use this permission XML for granting site collection scoped permissions:

<AppPermissionRequests AllowAppOnlyPolicy="true">
  <AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
  <AppPermissionRequest Scope="http://sharepoint/taxonomy" Right="Read" />
</AppPermissionRequests>

When you click on “Create” you'll be presented with a permission consent dialog. Press “Trust It” to grant the permissions.

Please safeguard the created client id/secret combination as if it were your administrator account. Anyone with this client id/secret can read and update all data in your SharePoint Online environment!
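The two permission XML documents in this step differ only in the content scope. As a minimal sketch, the following Python snippet generates either variant with the standard library so that the XML is always well formed when pasted into the appinv.aspx page. The function name and the "tenant"/"sitecollection" keys are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Content scopes taken from the two permission XML documents in this step.
CONTENT_SCOPES = {
    "tenant": "http://sharepoint/content/tenant",
    "sitecollection": "http://sharepoint/content/sitecollection",
}


def permission_xml(scope):
    """Return the permission XML for 'tenant' or 'sitecollection' scope."""
    root = ET.Element("AppPermissionRequests", {"AllowAppOnlyPolicy": "true"})
    # The migration-center application always needs FullControl on the content scope.
    ET.SubElement(
        root,
        "AppPermissionRequest",
        {"Scope": CONTENT_SCOPES[scope], "Right": "FullControl"},
    )
    # Read access to the taxonomy is part of both variants.
    ET.SubElement(
        root,
        "AppPermissionRequest",
        {"Scope": "http://sharepoint/taxonomy", "Right": "Read"},
    )
    return ET.tostring(root, encoding="unicode")
```

An unknown scope key raises a KeyError, which prevents accidentally pasting a request with a mistyped scope URL.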

Final Step: Setting the necessary parameters in the scanner

In order to use SharePoint app-only principal authentication with the SharePoint Online scanner you need to fill in the following scanner parameters with the information you gathered in the steps above:

Configuration parameters

Values

appClientId

The ID of the SharePoint application you have created.

appClientSecret

The client secret, which you have generated when setting up the SharePoint application.
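The two authentication flavors use different parameter combinations: appClientId plus the certificate parameters for Azure AD app-only, and appClientId plus appClientSecret for SharePoint app-only. The following Python sketch shows one way to check a parameter set before saving the scanner. It is an illustration only; the function name is hypothetical, and treating a mix of certificate and secret parameters as an error is an assumption, since this guide does not state how the scanner reacts in that case.

```python
def detect_auth_flavor(params):
    """Return the authentication flavor implied by the filled-in parameters."""

    def filled(name):
        value = params.get(name)
        return value is not None and str(value).strip() != ""

    if not filled("appClientId"):
        raise ValueError("appClientId must be set for both authentication flavors")

    cert_path = filled("appCertificatePath")
    cert_password = filled("appCertificatePassword")
    secret = filled("appClientSecret")

    if cert_path and cert_password and not secret:
        return "Azure AD app-only"
    if secret and not (cert_path or cert_password):
        return "SharePoint app-only"
    # Assumption: incomplete or mixed settings are rejected rather than guessed.
    raise ValueError("set either the certificate path and password, or the client secret")
```

The client ID in the usage below is the example value from the parameter description in this guide; the certificate path and password are placeholders.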

SharePoint Online Scanner Properties

To create a new SharePoint Online Scanner, create a new scanner and select SharePoint Online from the Adapter Type drop-down. Once the adapter type has been selected, the Parameters list will be populated with the parameters specific to the selected adapter type. Mandatory parameters are marked with an *.

The Properties of an existing scanner can be accessed after creating the scanner by double-clicking the scanner in the list or by selecting the Properties button/menu item from the toolbar/context menu. A description is always displayed at the bottom of the window for the selected parameter.

Multiple scanners can be created for scanning different locations, provided each scanner has a unique name.

Common scanner parameters

Configuration parameters

Values

Name*

Enter a unique name for this scanner

Mandatory

Adapter type*

Select SharePoint Online from the list of available adapters

Mandatory

Location*

Select the Job Server location where this job should be run. Job Servers are defined in the Jobserver window. If no Job Server has been created by the user to this point, migration-center will prompt the user to define a Job Server Location when saving the Scanner.

Mandatory

Description

Enter a description for this job (optional)

SharePoint Scanner parameters

The configuration parameters available for the SharePoint Scanner are described below:

Configuration parameters

Values

tenantName*

The name of your SharePoint Online Tenant

Example: Contoso

Mandatory

There are several websites that explain how to determine a SharePoint Online tenant name, e.g. https://morgantechspace.com/2019/07/how-to-find-your-tenant-name-in-office-365.html

tenantURL*

The URL of your SharePoint Online Tenant

Example: https://contoso.sharepoint.com

Mandatory

siteName*

The path to your target site collection.

Example: /sites/My Site

Mandatory

appClientId*

The ID of either the migration-center Azure AD application or the SharePoint application.

Example: ab187da0-c04d-4f82-9f43-51f41c0a3bf0

Mandatory

appCertificatePath

The full path to the certificate .PFX file, which you have generated when setting up the Azure AD application.

Example: D:\migration-center\config\azure-ad-app-cert.pfx

Mandatory for Azure AD app

appCertificatePassword

The password to read the certificate specified in appCertificatePath.

Mandatory for Azure AD app

appClientSecret

The client secret, which you have generated when setting up the SharePoint application (SharePoint app-only principal authentication).

Mandatory for SharePoint app

proxyServer

The name or IP of the proxy server.

proxyPort

The port of the proxy server.

proxyUsername

The username if required by the proxy server.

proxyPassword

The password for the proxy username.

camlQuery

CAML statement that will be used to retrieve the ids of objects that will be scanned.

If this parameter is set, the parameters excludeListsAndLibraries, includeListsAndLibraries, scanSubsites and excludeSubsites must not be set.

excludeListsAndLibraries

The list of list and library paths to be excluded from scanning.

includeListsAndLibraries

List of Lists and Libraries the adapter should scan. Multiple values can be entered and separated with the “,” character.

excludeSubsites

The list of subsite paths to be excluded from scanning.

Multiple values can be entered and separated with the “,” character.

excludeContentTypes

The list of content types to be excluded from scanning.

Multiple values can be entered and separated with the “,” character.

excludeFolders

The list of folders to be excluded from scanning. All folders with the specified name in the site/subsite/library/list, depending on the scanner configuration, will be ignored by the scanner. To exclude a specific folder, specify its full path.

Multiple values can be entered and separated with the “,” character.

Example: If folder1 is specified, all folders named folder1 in the site/subsites/library/list will be excluded.

If <Some_Library>/<Test_folder>/folder1 is specified, the scanner will exclude only the folder1 located in Test_folder.

includeFolders

List of folders the adapter should scan. All folders with the specified name in the site/subsite/library/list, depending on the scanner configuration, will be scanned. To scan a specific folder, specify its full path.

The values of the parameter “excludeFolders” will be ignored if this parameter contains values.

Multiple values can be entered and separated with the “,” character.

Example: If folder1 is specified, all folders named folder1 in the site/subsites/library/list will be scanned.

If <Some_Library>/<Test_folder>/folder1 is specified, the scanner will scan only the folder1 located in Test_folder.

scanSubsites

Flag indicating if the objects from subsites will be scanned.

scanDocuments

Flag indicating if the documents scanned will be added as migration-center objects.

scanFolders

Flag indicating if the folders scanned will be added as migration-center objects.

includeAttributes

The internal attributes that will be scanned even if the value is null

scanLatestVersionOnly

Flag indicating if just the latest version of a document will be scanned.

computeChecksum

If enabled the scanner calculates a checksum for every content it scans. These checksums can be used during import to compare against a second checksum computed during import of the documents. If the checksums differ, it means the content has been corrupted or otherwise altered, causing the affected document to be rolled back and transitioned to the “import error” status in migration-center.

hashAlgorithm

Specifies the algorithm to be used if the "computeChecksum" parameter is checked. Supported algorithms: MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.

Default algorithm is MD5.

hashEncoding

The encoding type which will be used for checksum computation. Supported encoding types are HEX, Base32, Base64.

Default encoding is HEX.

exportLocation*

Folder path. The location where the exported object content should be temporarily saved. It can be a local folder on the same machine as the Job Server or a shared folder on the network. This folder must exist prior to launching the scanner and must have write permissions. migration-center will not create this folder automatically. If the folder cannot be found, an appropriate error will be raised and logged. This path must be accessible by both the scanner and the importer, so if they are running on different machines, it should be a shared folder.

Mandatory

numberOfThreads

Maximum number of concurrent threads.

Default is 10 and maximum allowed is 20.

loggingLevel*

Sets the verbosity of the log file.

Values:

1 - logs only errors during scan

2 - logs all warnings and errors (default value)

3 - logs all successfully performed operations in addition to any warnings or errors

4 - logs all events (for debugging only; use only if instructed by fme product support, since it generates a very large amount of output. Do not use in production)

Mandatory
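To illustrate what the computeChecksum, hashAlgorithm and hashEncoding parameters describe, the following Python sketch computes a checksum over file content with each supported algorithm and encoding. It is not the scanner's implementation. The function names are hypothetical, and details such as lower-case hexadecimal output are assumptions, so compare against values produced by migration-center before relying on them.

```python
import base64
import hashlib

# Maps the algorithm names from the hashAlgorithm parameter to hashlib names.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-224": "sha224",
    "SHA-256": "sha256",
    "SHA-384": "sha384",
    "SHA-512": "sha512",
}


def read_chunks(path, chunk_size=65536):
    """Yield the content of a file in chunks so large files need little memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def compute_checksum(chunks, hash_algorithm="MD5", hash_encoding="HEX"):
    """Hash an iterable of byte chunks and encode the digest as text."""
    digest = hashlib.new(HASHLIB_NAMES[hash_algorithm])
    for chunk in chunks:
        digest.update(chunk)
    raw = digest.digest()
    encoders = {
        "HEX": lambda data: data.hex(),  # lower case is an assumption
        "Base32": lambda data: base64.b32encode(data).decode("ascii"),
        "Base64": lambda data: base64.b64encode(data).decode("ascii"),
    }
    return encoders[hash_encoding](raw)
```

A file in the export location could then be checked with compute_checksum(read_chunks(path), "SHA-256", "Base64"), where path is the location of the exported content.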

Scanning using the CAML query

The SharePoint Online scanner can use SharePoint CAML queries for filtering which objects are to be scanned. Based on the entered query, the scanner scans documents and folders in the lists/libraries.

The following example shows a simple CAML query for scanning the contents of the "Level1" folder inside the "TestLib" library alongside all its subfolders:

<Where>
  <BeginsWith>
    <FieldRef Name='FileDirRef'/>
    <Value Type='Text'>/sites/mc/TestLib/Level1</Value>
  </BeginsWith>
</Where>
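When the folder path comes from user input, building the query programmatically avoids broken XML if the path contains characters such as "&". The following Python sketch produces a query with the same structure as the example above. The function name is an illustrative assumption, and the output uses double quotes around attribute values, which is equivalent XML.

```python
import xml.etree.ElementTree as ET


def folder_caml_query(server_relative_folder):
    """Build a <Where> clause matching items whose FileDirRef begins with the folder."""
    where = ET.Element("Where")
    begins_with = ET.SubElement(where, "BeginsWith")
    ET.SubElement(begins_with, "FieldRef", {"Name": "FileDirRef"})
    value = ET.SubElement(begins_with, "Value", {"Type": "Text"})
    # ElementTree escapes special characters in the text when serializing.
    value.text = server_relative_folder
    return ET.tostring(where, encoding="unicode")
```

Calling folder_caml_query("/sites/mc/TestLib/Level1") returns a query string that can be pasted into the camlQuery parameter.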

For details on how to form CAML queries for each version of SharePoint please consult the official Microsoft MSDN documentation.

When using the “camlQuery” parameter, the parameters "excludeListsAndLibraries", "includeListsAndLibraries", "scanSubsites", "excludeSubsites", "excludeFolders" and "includeFolders" must not be set. Otherwise the scanner will fail to start with an error message.
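This rule can be checked before starting a scan. The following Python sketch lists the parameters that conflict with a CAML query, using the parameter names as they appear in the parameter table. The function names are hypothetical, and treating an empty string, None or False as "not set" is an assumption about how the scanner interprets values.

```python
# Parameters that must not be set together with camlQuery.
CONFLICTING_PARAMETERS = (
    "excludeListsAndLibraries",
    "includeListsAndLibraries",
    "scanSubsites",
    "excludeSubsites",
    "excludeFolders",
    "includeFolders",
)


def is_set(value):
    """Assumption: None, False and blank strings count as 'not set'."""
    if value is None or value is False:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def caml_conflicts(params):
    """Return the names of parameters that conflict with a configured camlQuery."""
    if not is_set(params.get("camlQuery")):
        return []
    return [name for name in CONFLICTING_PARAMETERS if is_set(params.get(name))]
```

An empty result means the configuration respects the rule; any returned names should be cleared before the scanner is started.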

Scanning permissions

The SharePoint Online scanner can extract permission information for documents and folders. Note that only unique permissions are extracted. Permissions inherited from parent objects are not extracted by the scanner.

Additional configuration settings

There is a configuration file for additional settings regarding the SharePoint Online Scanner. It is located in the …/lib/mc-sharepointonline-scanner/ folder of the Job Server install location and has the following properties that can be set:

Configuration property

Values

excluded_file_extensions

List of file extensions that will be ignored by the scanner.

Default: .aspx|.webpart|.dwp|.master|.preview.

excluded_attributes

List of attributes that will be ignored by the scanner. Use "|" as a delimiter when specifying more than one attribute.

initialization_timeout

Amount of time in milliseconds to wait before the scanner throws a timeout error during the initialization phase.

Default: 21600000 ms
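The following Python sketch illustrates how the values in this table can be read: it splits a "|"-delimited list, tests a file name against the excluded extensions and converts the timeout from milliseconds. The function names are hypothetical, and the case-insensitive extension comparison is an assumption that this guide does not confirm.

```python
from datetime import timedelta


def parse_pipe_list(value):
    """Split a '|'-delimited property value into a list, dropping empty entries."""
    return [item.strip() for item in value.split("|") if item.strip()]


def is_excluded_file(file_name, excluded_extensions):
    """Return True if the file name ends with one of the excluded extensions."""
    lowered = file_name.lower()  # assumption: comparison ignores case
    return any(lowered.endswith(ext.lower()) for ext in excluded_extensions)


def timeout_from_ms(milliseconds):
    """Convert a millisecond value such as initialization_timeout to a timedelta."""
    return timedelta(milliseconds=milliseconds)
```

With the default values, timeout_from_ms(21600000) equals six hours, and a file named Home.aspx is matched by the default extension list.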

History, Reports, Logs

A complete history is available for any SharePoint Scanner job from the respective item's History window. It is accessible through the [History] button/menu entry on the toolbar/context menu. The History window displays a list of all runs for the selected job together with additional information, such as the number of processed objects, the start and end time and the status.

Double clicking an entry or clicking the Open button on the toolbar opens the log file created by that run. The log file contains more information about the run of the selected job:

  • Version information of the migration-center Server Components the job was run with

  • The parameters the job was run with

  • Execution Summary that contains the total number of objects processed, the number of documents and folders scanned or imported, the count of warnings and errors that occurred during runtime.

Log files generated by the SharePoint Scanner can be found in the Server Components installation folder of the machine where the job was run, e.g. …\fme AG\migration-center Server Components <Version>\logs

The amount of information written to the log files depends on the setting specified in the ‘loggingLevel’ start parameter for the respective job.

Additional logs

The SharePoint Online Scanner generates an additional log file named mc-sharepointonline-scanner.log.

This log file is located in the same folder as the regular SharePoint Online scanner log files.