Thursday, January 22, 2009

SharePoint: Searching With Managed Properties

Crawled and Managed Properties

When the SharePoint 2007 crawler indexes a piece of content, it finds not only a bunch of text to index but also a tonne of metadata about the document. These pieces of metadata are referred to as crawled properties. Consider an Office 2007 Word document (.docx). When the crawler opens up the document, it finds a plethora of information about the file: the file extension, who authored it, when it was created, and the actual text in the document itself. These are just some of the many crawled properties captured about any given piece of content. Which fields actually get captured is a function of the content and of the IFilter that helped the crawler dissect the document.

Here's a reduced screen cap of some of the crawled properties captured for Office documents.

[Figure: Crawled properties (for Office documents) captured by the SharePoint crawler]

You can sift through all the properties your crawler is capturing by going to the Shared Services Provider, clicking Search Settings, then Metadata Property Mappings, and then Crawled Properties. If you click the Office category, you'll get a list that's much longer than the abridged one in the screen cap. Whenever the crawler discovers a new property, it adds it to the crawled property list maintained in the Shared Services Provider. If the newly discovered property is of type text, it is automatically added to the search index.
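To make the discovery behaviour concrete, here is a toy model of that crawled property list. It is a minimal sketch, not SharePoint's actual implementation or object model. The class names, the `observe` method, and the property name `ExampleCreatedDate` are hypothetical. Entries are keyed by (category, name) because the same name can appear under more than one category. The "text goes into the index" rule is simply the behaviour described in the paragraph above.

```python
from dataclasses import dataclass, field


@dataclass
class CrawledProperty:
    """One entry in a hypothetical model of the SSP's crawled property list."""
    category: str            # e.g. "Office" or "SharePoint"
    name: str                # e.g. "ows_Title"
    data_type: str           # e.g. "Text" or "DateTime"
    included_in_index: bool  # whether its values are searchable


@dataclass
class CrawledPropertyList:
    """Toy stand-in for the list kept by the Shared Services Provider (not a real API)."""
    known: dict = field(default_factory=dict)

    def observe(self, category: str, name: str, data_type: str) -> bool:
        """Record a property seen during a crawl; return True only if it is new."""
        key = (category, name)
        if key in self.known:
            return False  # already discovered on an earlier crawl
        # Per the post: newly discovered text properties go into the index automatically.
        self.known[key] = CrawledProperty(category, name, data_type, data_type == "Text")
        return True


crawled = CrawledPropertyList()
first = crawled.observe("SharePoint", "ows_Title", "Text")
again = crawled.observe("SharePoint", "ows_Title", "Text")
dated = crawled.observe("Office", "ExampleCreatedDate", "DateTime")  # hypothetical name
```

Seeing `ows_Title` a second time adds nothing. The date property is recorded, but under this model it is not flagged for the index.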

It's important to note that a crawled property can be reused in many managed properties, and a managed property can be mapped from many crawled properties (a many-to-many relationship). In fact, this is how we harvest value from the crawled/managed property marriage: by using them together, we can construct extremely powerful and expressive search queries.
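One way to picture the many-to-many relationship is as a dictionary from managed property to crawled properties, plus its inverse. The sketch below is illustrative only. Title's sources are the ones listed for Title later in this post. `ExampleName` is a hypothetical second managed property, included only to show one crawled property being reused.

```python
from collections import defaultdict

# Managed property -> crawled properties it is mapped from.
# Title's list comes from this post; ExampleName is hypothetical.
MAPPINGS = {
    "Title": ["Mail:5", "People:PreferredName", "Basic:displaytitle", "ows_Title"],
    "ExampleName": ["People:PreferredName"],
}


def crawled_to_managed(mappings):
    """Invert the mappings: crawled property -> managed properties that reuse it."""
    reverse = defaultdict(list)
    for managed, sources in mappings.items():
        for crawled in sources:
            reverse[crawled].append(managed)
    return dict(reverse)


reverse = crawled_to_managed(MAPPINGS)
# Crawled properties that feed more than one managed property.
shared = sorted(c for c, users in reverse.items() if len(users) > 1)
```

Reading the dictionary one way shows the "many crawled per managed" side, and reading the inverse shows the "many managed per crawled" side.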

Managed properties are labels (some ship out of the box, others you create yourself) that map one or more crawled properties to a searchable term. They are available for basic search, advanced search, and defining scopes. An example of a search expression that uses a managed property is:

Title:"Tyler"

When I search for Title:"Tyler" (read as [ManagedProperty]:"[SearchTerm]"), I'm really searching all the crawled properties that are mapped to the managed property Title for the text "Tyler".

That is, the search expression Title:"Tyler" will bring back all the search results that contain the text "Tyler" in at least one of the following crawled properties: [Mail:5(Text), People:PreferredName(Text), Basic:displaytitle(Text), ows_Title(Text)]. In fact, if you feel that the crawled properties mapped to Title aren't sufficient, you can change the mappings however you want. Managed properties are there for you to explore and tweak.
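The "at least one mapped crawled property" rule can be sketched as a simple filter. This is a simplified model, not how the SharePoint index actually evaluates queries. Word matching here is a crude regex tokenizer standing in for SharePoint's real word breaking. The item keys and their crawled values are hypothetical, and `ows_Body` is included only as an example of a crawled property that is not mapped to Title.

```python
import re

# Crawled properties mapped to Title, as listed in this post.
TITLE_SOURCES = ["Mail:5", "People:PreferredName", "Basic:displaytitle", "ows_Title"]


def tokens(text):
    """Lower-cased word tokens; a crude stand-in for real word breaking."""
    return re.findall(r"\w+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    hay, needle = tokens(text), tokens(phrase)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def matches(item, crawled_sources, phrase):
    """Managed property match: the phrase occurs in at least one mapped crawled property."""
    return any(contains_phrase(item.get(src, ""), phrase) for src in crawled_sources)


# Hypothetical crawled values for three indexed items.
items = {
    "docs/plan.docx": {"Basic:displaytitle": "Project plan by Tyler"},
    "people/tyler": {"People:PreferredName": "Tyler Holmes"},
    "lists/tasks/1": {"ows_Title": "Review budget", "ows_Body": "Ask Tyler"},
}

hits = [key for key, props in items.items() if matches(props, TITLE_SOURCES, "Tyler")]
```

The task item is not returned. "Tyler" appears only in `ows_Body`, which is not one of Title's mapped crawled properties.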

Creating a Managed Property

You don't really have a lot of control over crawled properties (you simply get what the indexer picks up). You do, however, get a fair bit of say over which managed properties exist and which crawled properties are mapped to them. It's worth mentioning that you don't always need to install a new IFilter to get more crawled properties to show up. Every time it does a full crawl, the indexer will also pick up custom site columns that you've added to your lists and libraries. If a site column has never been seen before (i.e., you just created it), a new entry will be added to the crawled properties list.

A good example of this would be creating a new site column (say MyCustomField) on some list. The next time the indexer does a full crawl, the new crawled property will show up (named ows_MyCustomField). You can then open Metadata Property Mappings (from the Shared Services Provider's Search Settings) and create a new managed property (say TylersCustomField) that users can use to search exactly that site column. They could, for example, search for TylersCustomField:"SomeValue". You can even use that managed property to narrow or widen the results governed by any search scope.

[Figure: Creating a managed property using a crawled property discovered from a custom data type]
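The naming and mapping steps in that example can be sketched in a few lines. This only models the steps described above; it is not a SharePoint API. The "ows_" + column name rule is taken from the example. It is an assumption that it holds as-is only for simple internal names without spaces or special characters. The helper function names are hypothetical.

```python
def crawled_property_for(site_column):
    """Name the crawler gives a custom site column, per the example above.

    ASSUMPTION: the column's internal name has no spaces or special characters.
    """
    return f"ows_{site_column}"


def map_managed_property(mappings, managed_name, crawled_names):
    """Create (or extend) a managed property that searches the given crawled properties."""
    sources = mappings.setdefault(managed_name, [])
    for name in crawled_names:
        if name not in sources:  # avoid duplicate mappings
            sources.append(name)
    return mappings


def restriction(managed_name, value):
    """Build a query term in the [ManagedProperty]:"[SearchTerm]" form."""
    return f'{managed_name}:"{value}"'


mappings = {}
crawled = crawled_property_for("MyCustomField")
map_managed_property(mappings, "TylersCustomField", [crawled])
query = restriction("TylersCustomField", "SomeValue")
```

After these three steps, `query` is the same TylersCustomField:"SomeValue" search the paragraph describes.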

When you start to digest the utility of managed and crawled properties, it becomes apparent that they're responsible for a huge part of SharePoint search's expressiveness. They empower you to help users author some pretty functional searches: not just the kind that find all the documents with a certain file extension (fileextension:"doc"), but searches that look only at highly custom properties within niche types of content.

Consider the following out-of-the-box managed properties, which let you search for people by their assistant's name (Assistant:"Mary") or by their responsibilities (Responsibilities:"Sales"), or find files by filename (filename:"my document"). Managed properties can also be combined in a single search (fileextension:"doc" filename:"my document").
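A combined search like the last one can be modelled by parsing each Property:"value" pair and filtering items. This is a sketch under stated assumptions, not SharePoint's query engine.

- **Combining rule (assumed):** restrictions on different properties must all match (AND), and repeated restrictions on the same property are alternatives (OR). This follows my recollection of the documented keyword syntax and should be checked against it.
- **Matching:** word-based, using a regex tokenizer, with property names compared case-insensitively.
- **Data:** the file values are hypothetical.
- **Scope:** free-text terms without a property are ignored.

```python
import re
from collections import defaultdict

RESTRICTION = re.compile(r'(\w+):"([^"]*)"')  # matches Property:"value"


def tokens(text):
    """Lower-cased word tokens; a crude stand-in for real word breaking."""
    return re.findall(r"\w+", text.lower())


def has_phrase(text, phrase):
    """True if the words of `phrase` appear consecutively in `text`."""
    hay, needle = tokens(text), tokens(phrase)
    n = len(needle)
    return n > 0 and any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


def parse_restrictions(query):
    """Group the query's Property:"value" pairs by lower-cased property name."""
    grouped = defaultdict(list)
    for prop, value in RESTRICTION.findall(query):
        grouped[prop.lower()].append(value)
    return dict(grouped)


def item_matches(item, grouped):
    """ASSUMED semantics: AND across properties, OR across values for one property."""
    values = {k.lower(): v for k, v in item.items()}
    return all(
        any(has_phrase(values.get(prop, ""), wanted) for wanted in options)
        for prop, options in grouped.items()
    )


# Hypothetical managed property values for three files.
files = {
    "a": {"FileExtension": "doc", "Filename": "my document.doc"},
    "b": {"FileExtension": "docx", "Filename": "my document.docx"},
    "c": {"FileExtension": "doc", "Filename": "notes.doc"},
}

combined = parse_restrictions('fileextension:"doc" filename:"my document"')
either = parse_restrictions('fileextension:"doc" fileextension:"docx"')
combined_hits = [k for k, item in files.items() if item_matches(item, combined)]
either_hits = [k for k, item in files.items() if item_matches(item, either)]
```

Matching whole words rather than substrings is a deliberate choice here. Without it, fileextension:"doc" would wrongly match "docx".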

Stop by the Managed Properties pages in the Shared Services Provider and you'll be impressed. Once you start to wrap your head around managed properties, you'll realize how easy it is to leverage this out-of-the-box functionality for your customers. You may even start to feel like SharePoint is finally making your job easier.

Best,
Tyler

4 comments:

Ali Raza said...

I have one document library with some custom columns that are lookups to custom lists, and it contains hundreds of documents stored with these properties. But these properties are not appearing in the crawled properties. I have run a full crawl many times, but they still don't appear. I have also set up a scope for the document library URL; it crawls the content (the documents), but the crawled properties don't show up. Can you please help me with this? I am pissed off about this issue and I have to fix it.

Thanks in advance

certxg said...

Good stuff.

Also, you mentioned that:
"When the crawler opens up the document, it finds a plethora of information about the file; the file extension, who authored it, when it was created, and the actual text in the document itself."

I was trying to verify that the actual text appears in one of the crawled properties, but was not able to. Could you shed more light on how we could get at the actual text through the crawled properties?

I went down to its SQL database but wasn't able to find the table where the actual text is stored after the crawl.

Thanks,

certxg

Tyler Holmes said...

Hey certxg,

I've never actually gone on the treasure hunt to find the actual text pulled from some document. If it did exist anywhere it would probably be in the index, not in the search database.

Remember that the search database is a lot more likely to hold a reverse B-tree to help users quickly associate a search expression with a URL that contains related content. There's not a lot of value (that I can think of) in keeping the entire document there.

Happy Hunting.

Best,
Tyler

Tep Vuthy said...

I have questions:

When I search for my crawled property with "All Categories" selected, I always see both "ows_MyProperty" and "MyProperty". I find that "ows_MyProperty" is in the "SharePoint" category, while "MyProperty" is in the "Office" category.

My problem is that I often get multiple values in every row of my resulting DataTable.

When I map the managed property only to "ows_MyProperty", no multiple values are returned in the records of the resulting DataTable.

So my question is: how do we know exactly which property should be used?
What is the difference between the "SharePoint" and "Office" categories?