Monday, February 9, 2009

How SharePoint Found Its Content Database

Good Question

I was talking a client through a SharePoint web.config change a couple of weeks ago and he ended up asking a dynamite question.

How does the web application know where its content database is at?

You might say something like "it gets it from the config database", but that would beg another question, "how does it know where the config database is?". In reality, we're curious about how a web application orients itself to its environment when it comes time to respond to a request.

You'll notice that SharePoint web.configs don't have any connectionString elements in them that point to a particular content or configuration database, and SharePoint installs don't make any changes to parent web.configs or machine.configs. So how does this web application orient itself at runtime?

Lets Reflect, Literally

If you think about it, there's a few clues to where this code "ought to be". Pretend for a minute that you were tasked with authoring some piece of functionality that let a web application figure out where its database was at runtime. There's some typical places you'd hook in (besides the web.config). You'd know that this piece of functionality would need to be populated in time to serve requests who would need to know where the content db is at.

You might think about putting it in global.asax Application_OnStart or some http module event that's relatively early in the ASP.NET pipeline.

Lets look at how a typical SharePoint request gets resolved and see if we can't flush out some details. This is the path I took to piece this answer together.

  1. A request comes in and gets handled by IIS. The request matches either a port number or a host header on a web site in the web server.
  2. The matching SharePoint web site has wild card ISAPI mappings that pass all request to ASP.NET. Remember that in a typical ASP.NET web site there mappings for known ASP.NET file extensions (.aspx, .asmx, .ascx, etc...) to the ASP.NET ISAPI filter. When IIS gets a request that ends with the given file extension, it knows that this is an ASP.NET request and hands it off to the aspnet_isapi.dll ISAPI. In a SharePoint IIS web site, we pass everything to ASP.NET. This includes .gifs, .jpegs, .PSDs etc... SharePoint gets a chance to fulfill every requests that gets picked up by the site. Below is a screen cap the wild card mapping present in all SharePoint sites. This is created for you automatically when you provision a new web application. If you were to remove it, your SharePoint instance would no longer be able to serve non ASP.NET files out of the content database. Wildcard mapping that sends ALL requests to the .NET frameowrk and eventually to our SharePoint web application.
  3. Once the request gets handed to ASP.NET, its fair game for the web application to start doing some magic. In fact, it had better figure it out soon cause odds are this request is going to want something from the content database before the page life cycle hits. Because we're talking about the ASP.NET request pipeline (and early in it), it's a reasonable guess to assume an http module could be in play. If we look in the web.config under the <httpModules> section we find that SharePoint has registered an HttpModule named "SPRequest", it's fulfilled by code found in the Microsoft.SharePoint.ApplicationRuntime namespace by a class called SPRequestModule.The SPRequest HttpModule.
  4. If you reflect on SPRequestModule and look into BeginRequestHandler (one of the earlier events in the ASP.NET pipeline), it's obfuscated, but you can read it anyway. BeginRequestHandler makes a call to SPFarm.Local which ends up calling SPFarm.FindLocal in an effort to find the farm that this machine belongs to.
  5. SPFindLocal finally calls into SPConfigurationDatabase.Local which runs to the registry looking for HKLM\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Secure\ConfigDB. This is where your "connection string" for the current farm lives.
  6. This is also how SharePoint finds the config database, from there it simply takes the current request and uses it to find the content database (if any) associated with the given request. After all this has been figured out, a lot of it gets cached.
  7. Finally BeginRequestHandler starts through the complex task of mapping your request to a bunch of content in the web applications content database. The regular Page Life Cycle kicks off and at the end of the day we end up with an HttpResponse built up by SharePoint.

Thought that someone else might enjoy the answer. This is kind of a neat piece of functionality too, it allows the psconfig tool to move this machine from farm to farm without changing any web.config files, so long as you can tell it where it should remap the registry key to. And with that, there's a little less magic in the world.



Marcus Bell said...

Nice sleuthing. Another interesting article. Kepp 'em coming, my man!!!!!

SharePoint Hosting said...

its too good and you can find more about ShairPoint

Anonymous said...

Superb! In-depth, accurate and interesting .... and something I've often wondered about! Keep up the good work!