Out of the box, SharePoint will index many types of content, including a lot of popular file formats (.ppt, .docx, .doc, .xlsx, .xls, etc.). A list of these file types can be found here. You'll notice that a pretty popular file type (.pdf) is NOT in this list. For SharePoint search to be able to index .pdfs, you need to install a PDF IFilter (index filter), which lets the indexing service read the contents of PDF files when it builds the search index. There are a few PDF IFilters available from different vendors, but the most popular is probably the one from Adobe. You can download the Adobe PDF IFilter here.
- Stop the IIS Admin Service: Start->Run->Services.msc->Locate the IIS Admin Service and stop it.
- Download the Adobe PDF IFilter and install it on your indexing server.
- Copy this GIF, or any icon of your choosing, to "C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES". The file name needs to match the one used in the DocIcon.xml mapping (pdficon.gif in the next step).
- Edit the DocIcon.xml file at "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML" and add the following text under the <ByExtension> tag: <Mapping Key="pdf" Value="pdficon.gif"/>
- Recycle the application pool of the Shared Service Provider OR do an IISRESET if you're lazy.
- Open the SSP Admin site (Central Administration->SharedServices1). Click on Search Settings->File Types->New File Type and add a pdf file type.
- Perform a full crawl on content sources. Search Settings->Content Sources and Crawl Schedules, click on the Content Source you want to perform a full crawl on. Check the Start Full Crawl check box at the bottom and then click OK. Wait for the crawl to finish.
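If you'd rather script the DocIcon.xml edit than do it by hand, here is a minimal Python sketch of that one step. The function name add_pdf_mapping is my own invention, and it assumes the file has a root element with a <ByExtension> child, as described above. Note that ElementTree drops comments and the XML declaration when it writes the text back out, so back up the original file first and treat this as an illustration rather than a tested deployment script.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml, key="pdf", value="pdficon.gif"):
    """Return DOCICON.XML text with a <Mapping> for `key` under <ByExtension>.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    root = ET.fromstring(docicon_xml)
    by_extension = root.find("ByExtension")
    if by_extension is None:
        raise ValueError("no <ByExtension> element found")

    # Leave the text untouched if this extension is already mapped.
    for mapping in by_extension.findall("Mapping"):
        if mapping.get("Key", "").lower() == key.lower():
            return docicon_xml

    ET.SubElement(by_extension, "Mapping", {"Key": key, "Value": value})
    return ET.tostring(root, encoding="unicode")


# Usage with a tiny stand-in for the real file contents.
sample = (
    "<DocIcons><ByProgID/><ByExtension>"
    '<Mapping Key="doc" Value="icdoc.gif"/>'
    "</ByExtension><Default/></DocIcons>"
)
print(add_pdf_mapping(sample))
```

Running it twice on the same text is harmless, since the second call finds the existing pdf mapping and returns the input unchanged.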
There have been a couple of machines I've done this on where I've had to manually register the PDF IFilter dll (regsvr32 "C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll") and then recycle the SSP site before the icon would show up. Usually in these cases I also had to do another full crawl after manually registering the PDFFILT.dll. It's probably a good idea to define a new content source with a small set of content (a couple of small pdfs in a document library) for testing.
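If you have several index servers to fix up, the manual registration can be wrapped in a small script. This is only a sketch: the function names are made up for illustration, the /s switch is regsvr32's silent mode, and the path is the one from my install, so adjust it to wherever the IFilter landed on your box. It has to run on the Windows server itself, from an elevated prompt.

```python
import subprocess

# Path from the install described above; yours may differ.
PDFFILT_DLL = r"C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll"


def regsvr32_command(dll_path, silent=True):
    """Build the regsvr32 argument list for registering `dll_path`."""
    cmd = ["regsvr32"]
    if silent:
        cmd.append("/s")  # suppress the confirmation dialog
    cmd.append(dll_path)
    return cmd


def register_ifilter(dll_path=PDFFILT_DLL):
    """Run regsvr32 and return its exit code (0 means success)."""
    return subprocess.run(regsvr32_command(dll_path), check=False).returncode
```

After register_ifilter() returns 0 you still need to recycle the SSP site and kick off another full crawl, same as when doing it by hand.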
I also sometimes get errors when indexing large PDFs.
This is because the default maximum document size is 16 MB. You can increase it with the instructions found here.
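To get a feel for how many documents would hit that limit, you can scan for big PDFs before raising it. The sketch below assumes the files are also sitting on a file share or local folder (it does not talk to SharePoint), and the function name and the plain "size greater than 16 MB" comparison are my own assumptions about how to approximate the crawler's check.

```python
import os

DEFAULT_LIMIT_MB = 16  # default maximum document size mentioned above


def find_oversized_pdfs(root_dir, limit_mb=DEFAULT_LIMIT_MB):
    """Return (path, size_in_bytes) for every PDF under root_dir above the limit.

    Hypothetical helper; results are sorted largest first.
    """
    limit_bytes = limit_mb * 1024 * 1024
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

The largest file in the result gives you a rough idea of how far the limit would need to go up.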
Hope that helps.