Migrating to SharePoint Online - More detail
The Clockwork SharePoint Cloud Migration Tool uses the Microsoft SharePoint Online Migration API to move documents, versions, and metadata into SharePoint Online. Using the API is Microsoft's recommended approach to migration. It works as follows:
- Connect to the back-end SQL database of your DM system and pull out a set of data for each migration run. It's a little more involved than that (you need some SQL knowledge to set up the "Views"), but we can help get you going
- Import the data into the staging database
- Run a series of validation checks
- Upload the files to the staging location in Azure (this is the Microsoft-recommended approach)
- Build XML for a set of migration "jobs" and upload it to Azure. The XML contains all the metadata, and the work is split into multiple "jobs" so they can run concurrently (up to 1,500 at a time)
- Launch some of the jobs and monitor them, keeping a constant number of jobs running until all are done
- Check for errors and then validate that everything was migrated
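The steps above amount to a simple control loop: launch jobs up to a concurrency limit, poll for completions, and top up until the queue is empty. A minimal sketch in Python (the `launch` and `poll` callbacks here are hypothetical stand-ins, not the real Migration API calls):

```python
from collections import deque

MAX_CONCURRENT = 10  # illustrative; the API's official cap is 1,500 jobs

def run_migration(jobs, launch, poll):
    """Keep a constant number of jobs in flight until all are done."""
    queue, running, done = deque(jobs), set(), []
    while queue or running:
        while queue and len(running) < MAX_CONCURRENT:   # top up to the limit
            running.add(launch(queue.popleft()))
        finished = {j for j in running if poll(j)}       # poll for completions
        running -= finished
        done.extend(finished)
    return done

# Simulated run: each "job" completes on its second poll.
poll_counts = {}
def fake_poll(job):
    poll_counts[job] = poll_counts.get(job, 0) + 1
    return poll_counts[job] >= 2

completed = run_migration(range(25), launch=lambda j: j, poll=fake_poll)
```

In the real tool the launch and poll steps go through the Migration API; the point of the loop is simply that a fixed number of jobs is always running, so throughput stays steady.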
A migration run is a selection of data from the source DM system. It might contain just a few hundred documents or a few hundred thousand. The Clockwork Tool imports the metadata for the selected documents into a local SQLite database, which is then used for monitoring job progress.
There are two ways to populate the staging database:
- Create views in SQL Server and import them with "Get Data"
- Alternatively, fill the SQLite database with data from some other system
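If you pre-populate the staging database yourself, the mechanics are ordinary SQLite. A minimal sketch (the table and column names here are illustrative, not the Tool's actual schema):

```python
import sqlite3

# Illustrative staging schema -- the Tool's real schema will differ.
conn = sqlite3.connect(":memory:")  # use a file path for a real staging DB
conn.execute("""
    CREATE TABLE staging_documents (
        doc_id      INTEGER PRIMARY KEY,
        source_path TEXT NOT NULL,   -- UNC or local path to the source file
        target_path TEXT NOT NULL,   -- destination path in SharePoint Online
        created_by  TEXT,
        modified_by TEXT,
        version     TEXT
    )
""")
rows = [
    (1, r"\\fileserver\dm\0001.docx", "Finance/Reports/Q1.docx", "jsmith", "jsmith", "1.0"),
    (2, r"\\fileserver\dm\0002.docx", "Finance/Reports/Q2.docx", "jsmith", "adoe",   "2.0"),
]
conn.executemany("INSERT INTO staging_documents VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM staging_documents").fetchone()[0]
```

Any system that can emit rows of this general shape (source path, target path, version metadata) can feed a migration run.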
The Tool runs validation checks and, where possible, rectifies issues with:
- source file existence
- illegal characters
- name length
- duplication (after any name adjustments for illegal characters)
- prior existence in SharePoint
- existence of the users assigned to the version created-by and modified-by metadata
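The illegal-character, name-length, and duplication checks interact: two distinct source names can collide after cleaning. A minimal sketch of that logic (the character set and length limit below are assumptions for illustration; check Microsoft's current documentation for the exact rules):

```python
import re

# Characters SharePoint Online does not allow in file or folder names
# (assumed set for this sketch; verify against Microsoft's current list).
ILLEGAL = r'"*:<>?/\|'

def clean_name(name, replacement="_"):
    """Replace illegal characters; the replacement character is a policy choice."""
    return re.sub("[%s]" % re.escape(ILLEGAL), replacement, name).strip().rstrip(".")

def validate(names, max_len=128):  # illustrative limit, not SharePoint's actual one
    cleaned = [clean_name(n) for n in names]
    problems, seen = [], {}
    for original, fixed in zip(names, cleaned):
        if len(fixed) > max_len:
            problems.append((original, "name too long"))
        if fixed.lower() in seen:
            # e.g. "a:b.txt" and "a*b.txt" both clean to "a_b.txt"
            problems.append((original, "duplicate after cleaning"))
        seen[fixed.lower()] = original
    return cleaned, problems

cleaned, problems = validate(["a:b.txt", "a*b.txt", "report.docx"])
```

This is why the duplication check runs *after* name adjustment, as noted above: the collision only appears once illegal characters have been replaced.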
The Migrator can be used to create:
- simple folders
- folders with content type and metadata (creator and last editor are set)
- document sets (with content type and metadata)
- as folders and document sets are "list items", they are specified for the Migrator in the same way as documents
- if the path specified for a document includes folders that don’t exist, the Migrator will create them as simple folders
- libraries, lists & sites: these can be created in several ways
- with a few basic details
- from a definition file
- by calling out to your own creation scripts
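The automatic creation of missing parent folders boils down to walking each document's target path and collecting the ancestors that don't yet exist, parents before children. A sketch of that idea (`folders_to_create` is a hypothetical helper mirroring the behaviour described, not the Tool's actual code):

```python
def folders_to_create(doc_paths, existing=()):
    """Given document target paths, list every ancestor folder that must be
    created, ordered so parents always come before their children."""
    existing = set(existing)
    needed = []
    for path in doc_paths:
        parts = path.split("/")[:-1]  # drop the file name
        for i in range(1, len(parts) + 1):
            folder = "/".join(parts[:i])
            if folder not in existing:
                existing.add(folder)
                needed.append(folder)
    return needed

needed = folders_to_create(["Finance/Reports/Q1.docx", "Finance/Plans/2024.docx"])
```

Each folder appears exactly once, so shared ancestors (like `Finance` here) are only created on first encounter.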
Batch and upload
Microsoft Migration API jobs work most effectively with batches of around 250 MB, but hundreds of such jobs can run simultaneously (the official limit is 1,500, but in practice this depends on a number of factors).
The Clockwork Migrator groups the documents into batches and uploads the document metadata (as XML) to your Azure storage. If the documents haven’t already been uploaded to Azure the Migrator uploads them as well.
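Grouping documents into ~250 MB batches can be done with a simple greedy pass over the document list. A minimal sketch (the batching strategy here is an illustrative first-fit approach, not necessarily the Migrator's exact algorithm):

```python
TARGET_BATCH_BYTES = 250 * 1024 * 1024  # ~250 MB, per the guidance above

def make_batches(docs, limit=TARGET_BATCH_BYTES):
    """Greedily group (name, size_bytes) pairs into batches of up to `limit` bytes."""
    batches, current, current_size = [], [], 0
    for name, size in docs:
        if current and current_size + size > limit:
            batches.append(current)          # close the full batch
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches

MB = 1024 * 1024
batches = make_batches([("a.docx", 100 * MB), ("b.docx", 100 * MB), ("c.docx", 100 * MB)])
```

Greedy batching keeps each job near the sweet spot without needing to solve an exact packing problem; an oversized single file simply becomes its own batch.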
There are three ways to upload files to Azure:
- Using the Clockwork Migrator: slowest, best suited for testing and smaller runs
- Using Microsoft’s Azure Storage Explorer: fastest upload
- Using Microsoft’s Azure Data Box: best if you have many terabytes of data, or your internet upload speed is slow
Launch and monitor
The Tool launches migration jobs and monitors them to maintain a consistent number of concurrently running jobs.
Once individual migration jobs complete, the Tool checks that the files actually arrived in SharePoint and notes any discrepancies.
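The post-job check is essentially a set comparison between what was sent and what SharePoint reports as arrived. A minimal sketch (`find_discrepancies` is a hypothetical helper illustrating the idea):

```python
def find_discrepancies(sent, arrived):
    """Compare the documents sent in a job with those reported in SharePoint."""
    sent, arrived = set(sent), set(arrived)
    return {
        "missing": sorted(sent - arrived),     # sent but never arrived
        "unexpected": sorted(arrived - sent),  # arrived but not part of the job
    }

report = find_discrepancies(sent=["a.docx", "b.docx"], arrived=["a.docx"])
```

Anything in `missing` is a candidate for re-running or manual follow-up, which is how the occasional concurrency errors mentioned later can be rectified.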
What you need
- The Migration Tool is a Windows application, so it can run on any Windows computer with .NET Framework 4.7.2. This is often a Windows 10 PC but can also be a Windows Server.
- A user account with sufficient rights to all areas of the SharePoint Online tenancy being migrated to, and at least read access to the tenancy home (https://<<YourTenancy>>.sharepoint.com).
- Access to a storage account in your Azure tenancy. Note that the Azure tenancy should be in the same geo-location as your SharePoint Online tenancy.
- If the tool is to pull data directly from a document management database in SQL Server then read access and an OLEDB connection string are required.
- Note: the Tool works off an SQLite staging database, which it can populate from SQL Server views or which you can pre-populate using your own techniques.
- If the documents are being accessed from a file share or drive letter, access is needed to read all files, and a UNC (or local drive) file path.
Metadata types supported
- String, number, date, choice
- Managed metadata
The built-in Clockwork "API" allows you to do a number of things:
- Instead of having the tool provision structure (create sites, libraries, folders and document sets), it can call your own structure creation scripts
- It can also let you retrieve source documents from other systems using your own scripts.
There are several factors that can affect the overall speed:
- Customer internet speed can affect both the time it takes to validate documents, and also the time it takes to upload documents to Azure.
  - Documents can be uploaded in advance, ready for the launch of the migration jobs. Azure Storage Explorer is a great tool for this, and if you have too much data (many terabytes) you can look at Microsoft's Azure Data Box
- the Microsoft Migration API runs in your SharePoint Online tenancy as a background job, and it backs off whenever the tenancy is under load. This means that during working hours (in the time zone of your SharePoint Online datacentre) performance is usually very slow; however, after hours Microsoft expects speeds of around 250 GB/day to be possible
- preparation of the migration run data can also take time, as it depends on SQL Server speed as well as the capabilities of the PC used to run the Migration Tool (where the SQLite database lives).
Why we use the SharePoint Online Migration API
SharePoint Online is a shared environment, and your tenancy is one of thousands using the same shared infrastructure. Microsoft continually monitors overall resource usage and adjusts how much resource each tenancy has available at any given time.
Microsoft developed the SharePoint Online Migration API as a way to manage the intense resource needs of migrations. Migrations that use it can be given as much resource as is currently available, and throttled back instantly whenever system load gets a bit high.
Microsoft very strongly encourage all migrations to use it, and the net result is that migrations actually go much faster!
Files generally should not be migrated to the root of libraries. This is because the Migration API first uploads files to the root of the target library, where it processes their metadata before moving them to the target destination. Files in the root of the library will prevent documents of the same name from being uploaded.
Note: When the Migration API uploads a file to the root of the library it hides it so you don’t see it.
Similarly, because of the migration jobs running concurrently, there may be occasional errors if files of the same name (with different destination folders) happen to be uploaded at exactly the same time. However, the API error logs indicate if this happens so these occasional issues can be rectified.