SharePoint 2007 - Setting up Content Deployment Jobs

Wednesday, 27 February 2008 22:38 by RanjanBanerji

In my previous post on content deployment I talked about how content deployment jobs are not as smart as they are made out to be. A little more testing helped me determine when jobs are smart and when they are not.

Let's start by defining smart. Microsoft claims that when content is deployed for a path, SharePoint will remember that fact and, on subsequent incremental deployments, will only deploy content that is new, edited, or deleted from that point on. This is not exactly true, as my previous post shows. But here is a recap and some additional information.

Let's start by creating a site collection with a structure as follows.

Root

--------S1

--------S2

--------S3

--------S4

--------S5

--------S6

----------------S6.1

------------------------S6.1.1

------------------------S6.1.2

------------------------S6.1.3

----------------S6.2

----------------S6.3

 

Content Deployment Scenarios (Please note, all content deployments in this case are incremental, i.e., only new, edited, or deleted content is deployed).

Scenario 1

  1. Deploy S6.1.1 (Branch) as job 1.
  2. Now deploy S6.1 (Branch) as job 2. SharePoint will redeploy all content from S6.1.1 in job 2 even though you just deployed it under job 1, i.e., not that smart.

Scenario 2

  1. Deploy S6.1 (Branch) as job 1.
  2. Now deploy S6.1.1 as job 2. SharePoint will only deploy content from S6.1.1 that has been added, edited, or deleted, i.e., smart; it remembers everything pushed under job 1.

Scenario 3

  1. Deploy S6.1 (Branch) as job 1.
  2. Deploy S6.2 (Branch) as job 2.
  3. Deploy S6.1 (Branch), S6.2 (Branch), and S6.3 (Branch) as job 3. Job 3 will deploy all of S6.3 but only the edits, additions, and deletions to S6.1 and S6.2, i.e., smart.

So it appears SharePoint's content deployment is smart as long as a job contains sites that are children or siblings of a site that is already deployed. But deploy a site and then its parent site, and the smartness is gone.
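As an aside, my understanding is that the "memory" in these scenarios is a change token handed out by the underlying Content Migration API after an export. If you are curious what that looks like in code, here is a minimal sketch using SPExport directly (the URL and folder below are placeholders, and this is the raw API, not the jobs GUI):

using System;
using Microsoft.SharePoint.Deployment;

class IncrementalExportSketch
{
    static void Main()
    {
        SPExportSettings settings = new SPExportSettings();
        settings.SiteUrl = "http://authoring/sites/source"; // placeholder source URL
        settings.FileLocation = @"C:\Deploy\Export";        // placeholder dump folder
        settings.BaseFileName = "S6_1";

        // First run: a full export.
        settings.ExportMethod = SPExportMethodType.ExportAll;
        new SPExport(settings).Run();

        // SharePoint hands back a change token marking "everything up to
        // this point". Persist it somewhere; the next run passes it back
        // in to export only additions, edits, and deletions since then.
        string token = settings.CurrentChangeToken;
        Console.WriteLine("Change token after export: " + token);

        // Next run (sketched): incremental export relative to the token.
        // settings.ExportMethod = SPExportMethodType.ExportChanges;
        // settings.ExportChangeToken = token;
    }
}

My guess is that the parent/child weirdness above comes from each deployed scope keeping its own token, so a job for S6.1 knows nothing about the token stored for S6.1.1.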

Why is this important?

Deploying content for a very large site collection under bad network conditions may require that the first deployment of each site be done as a separate job. This can result in tens, or even hundreds, of jobs to get your initial content across. From that point on you want to reduce the total number of jobs, since you are then only deploying new content, changes, and deletions.

Why reduce the number of jobs for new content? Well, SharePoint often issues a nasty error (a save conflict) if two jobs complete at the same time. It's not easy to schedule a large number of jobs in a manner that guarantees there will never be a save conflict error, and SharePoint offers no way to trigger a job based on the completion of another job. So your best bet for reliable deployments is to reduce the number of jobs.
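Since there is no built-in way to chain jobs, the closest workaround I can think of is to trigger them yourself from code, one after the other. Here is a rough sketch using the ContentDeploymentJob class; the job names are made up, the jobs are assumed to already exist, and I am assuming Run() executes the job synchronously so that two jobs never overlap:

using Microsoft.SharePoint.Publishing.Administration;

class RunJobsInSequence
{
    static void Main()
    {
        // Placeholder names of jobs already created in Central Administration.
        string[] jobNames = { "Deploy S6.1", "Deploy S6.2", "Deploy Root" };

        foreach (string name in jobNames)
        {
            foreach (ContentDeploymentJob job in ContentDeploymentJob.GetAllJobs())
            {
                if (job.Name == name)
                {
                    // Assumption: Run() does not return until the job
                    // finishes, so the next job only starts afterwards.
                    job.Run();
                }
            }
        }
    }
}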

Aha! And the moment you try to reduce the number of jobs by creating consolidated jobs after your initial deployment, you may find that SharePoint is redeploying everything if your consolidation followed the pattern of scenario 1 above. And why is redeploying bad? Well, that will be addressed in my next post.

So plan how you create your smaller jobs and how you consolidate them later.

 

SharePoint 2007 Content Deployment: Deploying Large Applications

Friday, 15 February 2008 09:33 by RanjanBanerji

There is much to be said about SharePoint 2007's Content Deployment feature. But I will rant about that later. The good part is that if you spend two weeks of your life understanding the Content Migration API, you will figure out how to make it work. Kinda!

If you have a large database and wish to deploy it to another application, you will possibly encounter the following issues, over and above the errors you cause yourself because there is no documentation on Content Deployment. Specifically, you may encounter failures in the transport phase due to:

  • Timeouts
  • Network glitches

When these occur, SharePoint does not have a "please fix your network or timeout issues and then click continue" option. Nope, it has a much better feature: your entire deployment is wiped clean. So an 8-hour export is lost to a 5-second network burp. So you decide to break down your Content Deployment into smaller jobs, right? Well, there are issues with that approach too:

Tyler Butler on the SharePoint Products and Technologies blog (http://blogs.msdn.com/sharepoint/archive/2006/05/02/588140.aspx) talks about how Content Deployment is smart: it remembers what content has been deployed in the past and will automatically deploy only new, changed, or deleted information when configured to do so.

Well, Tyler is right, but only kind of. In my tests I observed some deviations from the stated behaviour (or from my understanding of what the behaviour should have been).

Imagine a web site/site collection with a structure like:

Root

--------S1

--------S2

--------S3

--------S4

--------S5

--------S6

----------------S6.1

------------------------S6.1.1

------------------------S6.1.2

------------------------S6.1.3

----------------S6.2

----------------S6.3

My Observations

  • If you deploy any set of nodes, you can create a second job or reuse the same job to do incremental deployments. SharePoint Content Deployment is smart and will only deploy changes.
  • If you deploy S6.1 and all its branches and then create another job to do an incremental to S6 or to Root, the incremental will ignore the prior deployment to S6.1 and will push absolutely everything.

For some reason, Microsoft decided that a deployment job will check all other jobs to see if any of them deploys the exact same sites. What they decided not to do is check all sites in one job against all sites in all other jobs for a given path to determine what should or should not be deployed. Adding a few seconds of processing to do this would have saved many people a lot of time and effort.

Why do I see Microsoft's approach as a bug or a problem rather than a feature? Well, we started off by saying we need to deploy a large application. So imagine that your large application has S6.1 (Branch) as a huge node, as in many GB of data. So does S3. So you say: hmm, since deploying the entire site collection fails (due to the network issues that you face), let's create 3 jobs:

  • Job1: Deploys S6.1 and all its branches. You run it and it works.
  • Job2: Deploys S3. You run it and it works.
  • Job3: Deploys the entire site collection (only changed, new, and deleted content). You run it. It runs, but without considering Job1 and Job2, and therefore it re-deploys everything and runs the risk of failure due to our assumed network issues.


Now you may say that Job3 can be created by selecting all nodes except S6.1 and S3. You are correct. But my example is, well, an example. What if you had hundreds of sites? Creating Job3 could become a pain. And each time a new site is added, someone will have to modify the jobs. SharePoint was meant to be easy to use, right? Am I missing something?

Another reason I needed Job3 to work as a site collection deployment is that once the initial data is deployed using Job1, Job2, and Job3, I can use only Job3 from that point on for all future incremental deployments. This takes away the headache of scheduling multiple jobs so that they do not overlap. SharePoint has no way to schedule a job conditioned on the completion of another job, and I have observed too many errors when multiple jobs run concurrently. So we have another problem.

How does one push a large application?  Here are the options that I have come up with so far:

Options

  1. Create a set of jobs that will do the initial push and the subsequent incrementals. The problem with this approach is scheduling the incremental jobs so that they do not overlap while running, for that will possibly cause an error. In fact, when any two jobs complete at the same time, SharePoint throws an error.
  2. Create a job that will push the entire site collection and somehow get the job to work through the SharePoint “out of the box” GUI.  Taking a SharePoint CD to Lourdes, sacrificing a lamb, denouncing evolution, and sticking needles into voodoo dolls representing Microsoft designers may help.
  3. Create a job using the SharePoint “out of the box” GUI but trigger it from code using the Content Migration API. See the example here: http://msdn2.microsoft.com/en-us/library/aa981161.aspx
    1. Do an export
    2. Get the files to the destination using a USB drive or any other mechanism.
    3. Do an import
    4. Hope that the job recognizes that a deployment occurred (due to the change token set in code) and that the incremental deployments via the SharePoint “out of the box” GUI will therefore work.

Option 3 is probably your best bet, but I have to admit I have never tried it. All my current work started with trying the first option, i.e., several smaller jobs. But this option has its own set of problems, which I will talk about in future posts.
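For what it is worth, here is roughly what steps 1 and 3 of option 3 look like with the Content Migration API, following the MSDN example linked above. Treat it as a sketch, not a tested recipe: the URLs, folder, and file name are placeholders, and whether RetainObjectIdentity is the right setting for your situation is something you would have to verify.

using Microsoft.SharePoint.Deployment;

class ExportImportSketch
{
    static void Main()
    {
        // Step 1: export on the source farm.
        SPExportSettings exportSettings = new SPExportSettings();
        exportSettings.SiteUrl = "http://authoring/sites/source"; // placeholder
        exportSettings.FileLocation = @"C:\Deploy\Export";        // placeholder
        exportSettings.BaseFileName = "FullSite";
        exportSettings.ExportMethod = SPExportMethodType.ExportAll;
        new SPExport(exportSettings).Run();

        // Step 2: move the exported .cmp files to the destination farm
        // by USB drive, file copy, or whatever survives your network.

        // Step 3: import on the destination farm.
        SPImportSettings importSettings = new SPImportSettings();
        importSettings.SiteUrl = "http://production/sites/target"; // placeholder
        importSettings.FileLocation = @"C:\Deploy\Export";
        importSettings.BaseFileName = "FullSite";

        // Assumption: keeping the source GUIDs is what lets later
        // incremental deployments match objects up on the destination.
        importSettings.RetainObjectIdentity = true;

        new SPImport(importSettings).Run();
    }
}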

 
