System.OutOfMemoryException in SiteData During Search Crawl

Wednesday, 9 January 2008 13:58 by RanjanBanerji

This is a problem that had been eluding me for a while, two if not three months. Each time we ran a search crawl we would get an OutOfMemoryException on certain pages, and for the longest time we could not resolve the issue.

This is a log of what happened and what I and a colleague, Tim Clark, think is the issue. The reason I am telling the entire story is that the reader may pick up several other SharePoint 2007 pointers along the way.

For starters, we used to get the System.OutOfMemoryException only on our production servers, never on our development servers, which were a bunch of virtual servers. That part was quickly resolved. One glance at the crawl logs showed that most pages were not being crawled at all because the web server was timing out while serving them. Once I increased the timeout value, we started getting the exceptions there too. So now the playing field was even: I could get the error in all of our environments.

Then I did what everyone does: start googling for the error. I got just one hit, on some SharePoint forum, with no responses to the person's question. So that did not help.

So then the analysis started. We were getting the exception when the crawler hit certain very large lists. These lists were large in two ways: a large number of items (8,000 to 15,000) and items/documents that were large in size. The next step, therefore, was to break these lists into smaller ones. Using the Microsoft guideline that lists should have fewer than 2,000 items (even though that guideline is about performance, not OutOfMemoryExceptions), I started cutting the big lists into smaller lists of 2,000 items each. No help. I still got the errors.
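To give a flavor of what that splitting looked like, here is a minimal sketch (not the exact code we used; the site URL, list names, and batch size are illustrative, and read-only fields and attachments are simply skipped rather than handled properly):

using System;
using Microsoft.SharePoint;

class ListSplitter
{
    static void Main(string[] args)
    {
        using (SPSite site = new SPSite("http://myserver/sites/mysite")) // illustrative URL
        using (SPWeb web = site.OpenWeb())
        {
            SPList source = web.Lists["Big List"];   // illustrative list name
            int batchSize = 2000;
            int batchNumber = 0;
            SPList target = null;

            for (int i = 0; i < source.ItemCount; i++)
            {
                // Start a new target list every batchSize items.
                if (i % batchSize == 0)
                {
                    batchNumber++;
                    Guid id = web.Lists.Add("Big List Part " + batchNumber,
                        "", SPListTemplateType.GenericList);
                    target = web.Lists[id];
                }

                SPListItem sourceItem = source.Items[i];
                SPListItem newItem = target.Items.Add();
                foreach (SPField field in sourceItem.Fields)
                {
                    // Skip read-only/computed fields (ID, Created, Modified, ...)
                    // and attachments; a real script would need more field handling.
                    if (!field.ReadOnlyField && field.InternalName != "Attachments")
                        newItem[field.InternalName] = sourceItem[field.InternalName];
                }
                newItem.Update();
            }
        }
    }
}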

I then cut the lists down to just 400 items each, and that did not help either. Now I was in a bit of a bind. So the next conclusion was that the error was occurring when the indexer went through the documents and images, since these were large files.

The one problem I had while looking at the verbose ULS logs was that while I could see the exception being thrown, I could not tell which server was throwing it. Was it the machine doing the crawling (the indexer) or the web server being crawled? And what exactly is the data that is returned to the indexer?

These questions led to getting some support from Microsoft directly and to some more detailed logging, this time of every web call being made to IIS. Now I got some interesting data. The exception was being thrown by the web server, not the indexer, and it was thrown by SiteData.GetContent().

Now I had something to go on. In the meantime, with Microsoft's help, we managed to debug the w3wp process and send Microsoft the dump file. The finding was heap fragmentation. Hmmmm, now why would that happen?

We tried various tricks and configuration settings to choke down the crawler so that the number of requests made to the machine being crawled was reduced, thinking this might prevent heap fragmentation. The theory of the moment was that, for whatever reason, the .NET Framework or the SharePoint ISAPI DLL was not releasing memory in a timely manner.

While all this was going on, I was also reading up as much as I could on how most SharePoint farms were set up. One distinct difference between ours and most others was that we were on a 32-bit environment. Each time I raised this with Microsoft and with all the SharePoint folks I worked with, I was told that there was no reason to believe the error was a result of using a 32-bit environment versus a 64-bit one. I disagreed, but more about that later.

So next I started loading SharePoint assemblies into Reflector to see what's in them, more specifically what is going on in SiteData.GetContent(). The code looked quite normal: it just opens the SPWeb and then gets data for the list. One interesting observation was that it returns XML with data about the list. So none of the images, documents, etc. were being returned to the indexer, which meant the memory fragmentation was not occurring because of the handling of large image files or documents.

I then created a web part that would invoke SiteData.GetContent() for the list that was giving the exception. This is when the findings started to become interesting. Tim Clark, who does not have a blog I can point to, started helping me by analyzing the problem further.
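For the curious, this is roughly what that web part did. It is a sketch, not the exact code: it assumes a Visual Studio web reference named SiteDataWS generated from /_vti_bin/SiteData.asmx, the GetContent parameter list is the SiteData.asmx contract as I remember it, and the URL, list ID, and ObjectType choice are placeholders/assumptions to verify against your own proxy:

using System;
using System.Net;
using System.Web.UI;
using System.Web.UI.WebControls.WebParts;

public class SiteDataTestPart : WebPart
{
    protected override void RenderContents(HtmlTextWriter writer)
    {
        try
        {
            // SiteDataWS is an assumed name for a web reference generated from
            // http://myserver/sites/mysite/_vti_bin/SiteData.asmx.
            SiteDataWS.SiteData service = new SiteDataWS.SiteData();
            service.Url = "http://myserver/sites/mysite/_vti_bin/SiteData.asmx"; // illustrative URL
            service.Credentials = CredentialCache.DefaultCredentials;

            string lastItemIdOnPage = null;
            // "{GUID-OF-BAD-LIST}" stands in for the ID of the list that blows up.
            // The ObjectType/parameter combination is my assumption for pulling
            // the item data of a list; the crawler's exact calls may differ.
            string xml = service.GetContent(
                SiteDataWS.ObjectType.Folder,
                "{GUID-OF-BAD-LIST}",
                null,
                null,
                true,    // retrieveChildItems
                false,   // securityOnly
                ref lastItemIdOnPage);

            writer.Write("GetContent returned " + xml.Length + " characters");
        }
        catch (Exception ex)
        {
            writer.Write("GetContent failed: " + ex.Message);
        }
    }
}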

I wanted to see what would happen if we had an application that made the same calls the crawler did, or something similar. So for starters he created a console application that went through each list in each site, making the same calls that SiteData.GetContent() does. Hmmmm, no OutOfMemoryException. Now this was interesting.
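Here is a minimal sketch of what such a console walker might look like (not Tim's exact code; the URL is illustrative). The important part is that we, not SharePoint, own and dispose every SPSite and SPWeb:

using System;
using Microsoft.SharePoint;

class CrawlSimulator
{
    static void Main(string[] args)
    {
        using (SPSite site = new SPSite("http://myserver/sites/mysite")) // illustrative URL
        {
            foreach (SPWeb web in site.AllWebs)
            {
                try
                {
                    foreach (SPList list in web.Lists)
                    {
                        // Read every field of every item, much like the XML that
                        // SiteData.GetContent() hands back to the indexer.
                        foreach (SPListItem item in list.Items)
                        {
                            foreach (SPField field in item.Fields)
                            {
                                object value = item[field.Id];
                            }
                        }
                        Console.WriteLine(web.Url + " / " + list.Title + ": " +
                            list.ItemCount + " items read");
                    }
                }
                finally
                {
                    web.Dispose();   // we, not SharePoint, release each SPWeb
                }
            }
        }
    }
}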

Tim then made some interesting observations.

  • If we did an IIS reset and invoked a web part with a call to SiteData.GetContent() on one of the bad lists, we would not get the OutOfMemoryException.
  • If after an IIS reset we ran the crawler, we got the OutOfMemoryException. From that point on, loading the web part with a call to SiteData.GetContent() would also result in the OutOfMemoryException. However, the console application making the same calls still would not.

This was interesting. Also, looking at perfmon, it appeared that we were getting the OutOfMemoryException even when the machine had plenty of memory to spare.
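If you want to watch this yourself while reproducing the problem, a small sketch like the following will log the worker process memory every few seconds. The counter names are standard Windows/.NET counters; the instance name "w3wp" assumes a single worker process (with several application pools it becomes w3wp#1 and so on):

using System;
using System.Diagnostics;
using System.Threading;

class MemoryWatch
{
    static void Main(string[] args)
    {
        // Total private memory of the worker process.
        PerformanceCounter privateBytes =
            new PerformanceCounter("Process", "Private Bytes", "w3wp");
        // Managed (GC) heap size inside the same process.
        PerformanceCounter managedHeap =
            new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", "w3wp");

        while (true)
        {
            Console.WriteLine("{0:T}  Private Bytes: {1:N0}  Managed Heaps: {2:N0}",
                DateTime.Now,
                privateBytes.NextValue(),
                managedHeap.NextValue());
            Thread.Sleep(5000);
        }
    }
}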

So what was causing this to happen? Microsoft's analysis of the w3wp dump file said there was heap fragmentation, and it appeared to be specific to the w3wp worker process. Further analysis of the data from the SiteData.GetContent() calls showed that this method returned huge XML files, and for none other than the lists in question. They were not huge because these lists had images or documents; they were huge because the lists were of content types that had a large number of Site Columns with large amounts of data in them. Some of the XMLs were 15 to 20 MB.
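A rough sketch of the kind of check that flags such lists (assuming an illustrative site URL) is to report, for each list, how many fields its content types carry and roughly how much character data a full read of its items adds up to:

using System;
using Microsoft.SharePoint;

class ListSchemaReport
{
    static void Main(string[] args)
    {
        using (SPSite site = new SPSite("http://myserver/sites/mysite")) // illustrative URL
        using (SPWeb web = site.OpenWeb())
        {
            foreach (SPList list in web.Lists)
            {
                // Widest content type attached to this list.
                int maxFields = 0;
                foreach (SPContentType ct in list.ContentTypes)
                    maxFields = Math.Max(maxFields, ct.Fields.Count);

                // Crude estimate of how much field data a full read returns.
                long chars = 0;
                foreach (SPListItem item in list.Items)
                    foreach (SPField field in item.Fields)
                    {
                        object value = item[field.Id];
                        if (value != null)
                            chars += value.ToString().Length;
                    }

                Console.WriteLine("{0}: {1} items, up to {2} fields per content type, ~{3:N0} chars of field data",
                    list.Title, list.ItemCount, maxFields, chars);
            }
        }
    }
}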

Our guess (and it's just a guess) is that for some reason the SharePoint ISAPI DLL or some other process is not releasing/disposing objects in time. The reason we suspect this is that the same or equivalent code can be called from a console application where we, not SharePoint, get an SPWeb object, get all sites, lists, etc., Dispose the objects ourselves, and get no OutOfMemoryException.

All this is interesting, but what can one do to solve it? Well, there are two possible options so far:

  1. Remember the 32 vs. 64-bit question I raised? My guess is that with 64 bit, the much larger address space makes it harder for fragmentation to exhaust memory. This is not fixing the problem; it is simply making the problem harder to reach. Incidentally, we have since received a document from Microsoft recommending a 64-bit environment. So we went out and purchased a few cheap 64-bit desktops, set up our farm, and ran a search crawl. No OutOfMemoryExceptions. Wooo hooooo!
  2. Since the memory/heap fragmentation was occurring only in the w3wp process, I figured why not recycle the worker process in the application pool. After some initial adjustment we set the process to recycle when it reached a certain amount of private (physical) memory or virtual memory; a hedged sketch of setting these limits follows below. We ran the search crawl. No OutOfMemoryExceptions. This approach, however, requires a lot of tweaking of the recycle thresholds for physical versus virtual memory, and changes in the data can break it. Not the best solution, but it may work for some.
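For what it is worth, here is a sketch of setting those recycling limits on the IIS 6 application pool through the metabase. The pool name and the thresholds are illustrative, and the property names PeriodicRestartPrivateMemory and PeriodicRestartMemory (both in KB, for private and virtual memory respectively) are the IIS 6 metabase names as I recall them, so verify them before relying on this:

using System;
using System.DirectoryServices;

class RecycleConfig
{
    static void Main(string[] args)
    {
        // Illustrative application pool name; substitute your SharePoint pool.
        using (DirectoryEntry pool = new DirectoryEntry(
            "IIS://localhost/W3SVC/AppPools/SharePoint - 80"))
        {
            // Recycle at roughly 800 MB of private memory or 1.5 GB of virtual memory.
            pool.Properties["PeriodicRestartPrivateMemory"].Value = 800000;
            pool.Properties["PeriodicRestartMemory"].Value = 1500000;
            pool.CommitChanges();
        }
        Console.WriteLine("Recycling limits updated.");
    }
}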

What is the cause of the exception? I still do not know. But I think we have found two workarounds so far. Also, as I mentioned in an earlier post, look into your data. One of the reasons we were getting this error is that we had lists that contained huge amounts of data. Lists are meant to be lists, not essays or books. If that is the case, put your data in a document.

Categories:   SharePoint