SharePoint 2010 – some developer gotchas when migrating a lot of documents

These are a few of the gotchas that came up when I was writing a project in C# to migrate over a million documents into SharePoint 2010. If you aren’t the site designer then it is worth making sure the designers have considered these points (if relevant), otherwise you may end up cursing SharePoint unfairly.

  • Documents and folders – where are they?
  • SharePoint database table access – a warranty issue
  • Complex security rules in a library can be a BAD idea
  • SharePoint Server 2010 capacity management: Software boundaries and limits
  • Files, versions, etc
    • Create date, last updated date, author and editor
    • Metadata promotion and demotion
    • ListItem.Update, SystemUpdate, UpdateOverwriteVersion
    • Library version control settings seem to affect behaviour of some methods

Documents and folders – where are they?

In many enterprise CMS solutions documents are abstracted, so the same document can appear to live in multiple locations depending on the navigation approach used. For example a user might include HR documents relating to themselves or their staff in their own workspace, while HR staff see the same documents in their usual place in the HR hierarchy. Similarly documents or folders relating to a project might appear to be in the workspaces of all the participants. In every case, wherever someone opens a document from, the CMS goes off and retrieves the single copy of the document.

SharePoint documents are located in a library. Users can create links to documents (or folders) in their own workspaces, but the link is not the real document. If you try to edit the metadata for the link you are just changing metadata for the link. You have to go to the document's real location to get at the document's metadata. For users of an enterprise CMS this is a bit of a comedown, and one of the reasons a number of Content Management companies have produced modules for SharePoint. But those modules don’t always do everything you want.

SharePoint database table access – a warranty issue

The EULA you accepted actually prohibits accessing the database tables behind the scenes and, I am told, doing so also invalidates any warranty. The correct way to get at the data is through the SharePoint object model (on the server) or the web services. This makes sense, as table definitions, views, etc. could be changed by service pack updates.

OK, so why have I mentioned this? When large numbers of documents are involved, one of the standard approaches to migrating between CMSs is to rule out web services (way too slow) and compare the speed of using the object model against moving documents directly and updating the relevant tables. The direct route is often 2 to 10 times faster, which matters if you are talking weeks to migrate everything. But as this invalidates the warranty it is a nonstarter without permission from Microsoft.

Complex security rules in a library can be a BAD idea

We had a lot of libraries planned, and some of them, especially in areas like HR documents, would need all sorts of complex security rules. We started to find that security roles would fail to stick, even though the code reported success. The following article would have been useful if we’d discovered it early on 😉: “SharePoint performance degradation with a large number of unique security scopes in lists”. It covers the performance degradation of Microsoft Office SharePoint Server 2007 and SharePoint 2010 when a large number of unique security scopes are set on folders or documents in a list.

From the article:

SharePoint 2010: When a greater number of unique security scopes than the value of the List Query Size Threshold (default is 5,000) set to the web application are created for folders or documents in a list, there’s a significant performance degradation in SharePoint operations that badly affects end users operations(rendering content) as well as SharePoint activities like indexing SharePoint Content. After the number of unique security scopes exceeds the value of the List Query Size Threshold, SharePoint uses a code path that requires additional SQL round trips to analyze the scopes before rendering a view.

SharePoint 2010: Control the number of unique security scopes for a list to be less than the value of the List Query Size Threshold.

Note: A scope is the security boundary for a securable object and any of its children that do not have a separate security boundary defined. A scope contains an Access Control List (ACL), but unlike NTFS ACLs, a scope can include security principals that are specific to SharePoint Server. The members of an ACL for a scope can include Windows users, user accounts other than Windows users (such as forms-based accounts), Active Directory groups, or SharePoint groups.
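To get a rough feel for how close a library is to that limit, something like the following sketch could be run on the server. It assumes that every file or folder with broken permission inheritance accounts for one unique scope, and it takes the threshold as a parameter (5,000 is the default quoted above) rather than reading it from the web application. The class and method names are my own, and enumerating every item is itself slow on a big library, so treat it as a one-off diagnostic and not something to leave in migration code.

```csharp
using System;
using Microsoft.SharePoint;

// Hypothetical helper, not part of SharePoint: counts objects in a list
// that have their own permissions instead of inheriting them.
static class ScopeAudit
{
    public static int CountUniqueScopes(SPList list)
    {
        int count = 0;

        // Documents and items in all folders of the list.
        foreach (SPListItem item in list.Items)
        {
            if (item.HasUniqueRoleAssignments) count++;
        }

        // Folder items are exposed separately through list.Folders.
        foreach (SPListItem folder in list.Folders)
        {
            if (folder.HasUniqueRoleAssignments) count++;
        }

        return count;
    }

    // threshold: the List Query Size Threshold for the web application
    // (5,000 by default according to the article quoted above).
    public static bool IsBelowThreshold(SPList list, int threshold)
    {
        return CountUniqueScopes(list) < threshold;
    }
}
```

A call such as ScopeAudit.IsBelowThreshold(library, 5000) on a test copy of the library gives an early warning before the security design is rolled out everywhere.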

SharePoint Server 2010 capacity management: Software boundaries and limits

A great list to have is Microsoft’s “SharePoint Server 2010 capacity management: Software boundaries and limits”. Give it a read and make sure your site designers have considered these points.

Files, versions, etc

Create date, last updated date, author and editor

When migrating documents from another system into SharePoint you want the create date, last updated date, author, and editor to be the same as they were in the old system. In theory this shouldn’t be a problem: when you add a file with SPWeb.Files.Add you can set these parameters. But they didn’t stick for us. They kept defaulting to the current date / time and the user doing the adding. There is also “fun” when you migrate multiple versions.
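For reference, this is roughly what the call looks like when you use the SPFileCollection.Add overload that accepts the users and dates. It is a minimal sketch, assuming the source file is on disk and that the SPUser objects for the original author and editor have already been resolved on the target site. The helper name and its parameters are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.SharePoint;

static class MigrationUpload
{
    // Hypothetical helper: uploads one file and passes the original
    // author, editor and dates to SharePoint at creation time.
    public static SPFile AddWithOriginalStamps(
        SPWeb web, string targetUrl, string sourcePath,
        SPUser author, SPUser editor,
        DateTime created, DateTime modified)
    {
        byte[] content = File.ReadAllBytes(sourcePath);

        // Overload: Add(url, bytes, createdBy, modifiedBy,
        //               timeCreated, timeLastModified)
        return web.Files.Add(targetUrl, content, author, editor,
                             created, modified);
    }
}
```

If the values do not stick (as happened to us), the returned SPFile gives you file.Item, which is the SPListItem you can correct afterwards.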

When your item has been added you can use its SPListItem (listItem here) to set the fields

  • listItem["Modified"], listItem["Created"], listItem["Editor"], and listItem["Author"]
  • and update them with listItem.SystemUpdate() or listItem.UpdateOverwriteVersion() (more on these below)

As each document version has its own values for these settings, you need to do this each time you add a version. The dates for an earlier version are kept with the properties of that version. Note that the same applies to the listItem["Title"] field.
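Put together, the correction step might look like the sketch below. It assumes an English-language site where the fields are reachable as "Created", "Modified", "Author", "Editor" and "Title", and it wraps the users in SPFieldUserValue. The method name and parameters are illustrative only, and whether UpdateOverwriteVersion() or SystemUpdate(false) is the right call for your library is something to confirm on a test library.

```csharp
using System;
using Microsoft.SharePoint;

static class MigrationStamps
{
    // Hypothetical helper: re-applies the source system's values to the
    // current version of an item that has just been added.
    public static void StampItem(
        SPWeb web, SPListItem listItem,
        SPUser author, SPUser editor,
        DateTime created, DateTime modified, string title)
    {
        listItem["Created"] = created;
        listItem["Modified"] = modified;
        listItem["Author"] = new SPFieldUserValue(web, author.ID, author.LoginName);
        listItem["Editor"] = new SPFieldUserValue(web, editor.ID, editor.LoginName);
        listItem["Title"] = title;

        // Saves the changes into the current version without
        // creating another version.
        listItem.UpdateOverwriteVersion();
    }
}
```

Calling this once per migrated version, straight after the file for that version has been added, keeps each version's dates and users with that version.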

Metadata promotion and demotion

This is one of the interesting features of SharePoint. When you check in an Office document, document properties that match a content type field get mapped to that content type field (provided the data is valid). Once the document is part of SharePoint, if you work on the document the properties reflect the content type in SharePoint and all the selection lists etc for the properties link to the lists in SharePoint.

This makes managing the metadata quite simple, but it creates a bit of a problem when migrating documents with multiple versions. If you have several versions in your source system you want to add each of those documents as a separate version. The first version is OK. The fun starts when you check it out so you can add a new document as the next version. The next version is a file from the source system, so it doesn’t carry the metadata updates that were applied to version 1, and SharePoint is updated with whatever metadata the file does have. You therefore have to tidy up the metadata each time you add a document.

ListItem.Update, SystemUpdate, UpdateOverwriteVersion

Generally you update metadata through the SPListItem object.

  • Update() seemed the “fastest”, or at least returned control to our code soonest. However it updates date stamps, editor and versions. The call itself is synchronous, but work that runs afterwards is not necessarily so (after-events such as ItemUpdated are asynchronous by default), so things may not have settled before your code tries to correct metadata or add another version, leading to sometimes ‘interesting’ results
  • In a document migration you want to keep the dates and authors as they were in the old system, so you have to use SystemUpdate(false) to make them stick; SystemUpdate(true) increments the version
  • UpdateOverwriteVersion() saves into the current version without creating a new one, which is useful if you have a version and want to replace the file as well as the metadata.
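One way to keep the choice explicit in migration code is to route every save through a single helper, as in this sketch. The SaveMode enum and the Save method are inventions for illustration. The comments describe the documented behaviour of each call, which, as noted in the next section, did not always match what we saw under every versioning setting.

```csharp
using System;
using Microsoft.SharePoint;

// Hypothetical enum naming the update styles discussed above.
enum SaveMode
{
    Normal,               // Update()
    KeepStamps,           // SystemUpdate(false)
    KeepStampsNewVersion, // SystemUpdate(true)
    OverwriteVersion      // UpdateOverwriteVersion()
}

static class ItemSaver
{
    public static void Save(SPListItem item, SaveMode mode)
    {
        switch (mode)
        {
            case SaveMode.Normal:
                // Changes Modified / Editor and adds a version
                // when versioning is enabled.
                item.Update();
                break;
            case SaveMode.KeepStamps:
                // Leaves Modified / Editor alone, no new version.
                item.SystemUpdate(false);
                break;
            case SaveMode.KeepStampsNewVersion:
                // Leaves Modified / Editor alone, increments the version.
                item.SystemUpdate(true);
                break;
            case SaveMode.OverwriteVersion:
                // Saves into the current version, no new version.
                item.UpdateOverwriteVersion();
                break;
            default:
                throw new ArgumentOutOfRangeException("mode");
        }
    }
}
```

Having one place to switch between the methods also makes it easier to experiment when a library's versioning settings produce odd results.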

Library version control settings seem to affect behaviour of some methods

Sometimes we got different behaviours from the various methods (adding files, changing metadata) depending on the version control settings for the library. Because I never got to the bottom of this I’ll only suggest that if you are getting odd behaviours try changing the versioning for your test library and see if you get the same effects.
