Thursday, April 10, 2008

Walkthrough: Installing Adobe (v6) PDF IFilter for SharePoint 2007 (MOSS/WSS)

Purpose

Out of the box, SharePoint will index many types of content, including a lot of popular file formats (.ppt, .docx, .doc, .xlsx, .xls, etc.). A list of these file types can be found here. You'll notice that a pretty popular file type (.pdf) is NOT in this list. For SharePoint search to be able to index PDFs, you need to install a PDF IFilter (Index Filter) that helps the indexing service process PDF files when building a search index. There are a few PDF IFilters available from different vendors, but the most popular is probably the one from Adobe. You can download the Adobe PDF IFilter here.

Walkthrough

  1. Stop the IIS Admin Service: Start->Run->Services.msc->Locate the IIS Admin Service and stop it.
  2. Download the Adobe PDF IFilter and install it on your indexing server.
  3. Copy this GIF (pdficon) or any icon of your choosing to "C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES"
  4. Edit the DocIcon.xml file at "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\Template\Xml\DOCICON.XML" and add the following line inside the <ByExtension> element:
    <Mapping Key="pdf" Value="pdficon.gif"/>
  5. Recycle the application pool of the Shared Service Provider OR do an IISRESET if you're lazy.
  6. Open the SSP Admin site (Central Administration->SharedServices1). Click on Search Settings->File Types->New File Type and add a pdf file type.
  7. Perform a full crawl on your content sources: go to Search Settings->Content Sources and Crawl Schedules and click on the content source you want to crawl. Check the Start Full Crawl check box at the bottom and then click OK. Wait for the crawl to finish.

You're done! PDF content should start showing up in searches now!
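
If you'd rather script step 4 than hand-edit the file, something like the following Python sketch could work. It assumes DOCICON.XML has a <ByExtension> element containing <Mapping> entries, as in the walkthrough; the function name add_pdf_mapping and the sample XML are made up for illustration and are not the real file contents. Note that ElementTree drops XML comments and the XML declaration when it re-serializes, so back up the original file before overwriting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for DOCICON.XML (not the real file).
SAMPLE_DOCICON = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""


def add_pdf_mapping(docicon_xml, key="pdf", value="pdficon.gif"):
    """Return the XML text with a Mapping for `key` under <ByExtension>.

    If a mapping for the extension already exists, its Value is updated
    instead of adding a duplicate entry.
    """
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")

    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == key.lower():
            mapping.set("Value", value)
            break
    else:
        # No existing entry for this extension, so append a new one.
        ET.SubElement(by_ext, "Mapping", {"Key": key, "Value": value})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_pdf_mapping(SAMPLE_DOCICON))
```

To use it against the real file you would read DOCICON.XML into a string, pass it through add_pdf_mapping, and write the result back, then recycle the application pool as in step 5.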

Troubleshooting

There have been a couple of machines I've done this on where I had to manually register the PDF IFilter dll (regsvr32 "C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll") and then recycle the SSP site before the icon would show up. Usually in these cases I also had to do another full crawl after manually registering PDFFILT.dll. It's probably a good idea to define a new content source with a small set of content (a couple of small PDFs in a document library) for testing.
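
If you end up doing the manual registration on several servers, a small wrapper can save some typing. This is only a sketch: the helper names build_register_command and register_ifilter are my own, the DLL path is the Adobe v6 default mentioned above (yours may differ), and /s is regsvr32's silent switch. It has to run from an elevated prompt on the indexing server.

```python
import os
import subprocess
import sys

# Default install location from the post; adjust if you installed elsewhere.
DEFAULT_DLL = r"C:\Program Files\Adobe\PDF IFilter 6.0\PDFFILT.dll"


def build_register_command(dll_path=DEFAULT_DLL, silent=True):
    """Build the regsvr32 command line as an argument list."""
    cmd = ["regsvr32"]
    if silent:
        cmd.append("/s")  # suppress the confirmation dialog
    cmd.append(dll_path)
    return cmd


def register_ifilter(dll_path=DEFAULT_DLL):
    """Run regsvr32 on the IFilter DLL and return its exit code."""
    if sys.platform != "win32":
        raise RuntimeError("regsvr32 is only available on Windows")
    if not os.path.isfile(dll_path):
        raise FileNotFoundError(dll_path)
    return subprocess.run(build_register_command(dll_path), check=False).returncode
```

A zero exit code from register_ifilter() means regsvr32 reported success; you would still recycle the SSP site and start another full crawl afterwards.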

I also sometimes get errors of the following form when indexing large PDFs:

The file reached the maximum download limit. Check that the full text of the document can be meaningfully crawled.

This is because the maximum document size is 16 MB by default. You can increase it by following the instructions found here.
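
Before raising the limit, it can help to know which PDFs are actually over it. Here's a rough Python sketch that walks a folder (for example a file share you crawl) and lists the PDFs larger than a given size. The function name find_oversized_pdfs is my own invention, and the 16 MB default simply mirrors the limit mentioned above.

```python
import os

DEFAULT_LIMIT_MB = 16  # default maximum document size mentioned in the post


def find_oversized_pdfs(root_dir, limit_mb=DEFAULT_LIMIT_MB):
    """Return (path, size_in_bytes) for every PDF under root_dir above the limit.

    Results are sorted largest first.
    """
    limit_bytes = limit_mb * 1024 * 1024
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

The largest size in the result gives you a sensible lower bound for the new limit.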

Hope that helps.

Best,
Tyler

8 comments:

Sanket said...

Hey,

After installing Adobe iFilter 6.0, I am still getting an error on Outlook message files that have a PDF embedded in them.

error : The filtering process has been terminated.

I already tried the Foxit filter, but with it I am getting more errors in my crawl log.

I have 32-bit MOSS 2007.

Do you have any idea why I am still getting this error on message files after installing the Adobe iFilter?

Thanks,
Sanket

Tyler Holmes said...

Are normal PDFs being indexed? I'd start by creating some smaller test content sources that involve PDFs and ensuring that you can crawl those first. After that I'd slowly start adding other types of PDF content (like ZIPs with embedded PDFs and emails with embedded PDFs). Hope that helps!

Best,
Tyler

Sanket said...

Thanks for the reply.

I already tried including only a PDF in a new content source, but it is still not indexed and gives me the error "The filtering process has been terminated".

I installed the Adobe iFilter and the Foxit filter.

Thanks,
Sanket

Tyler Holmes said...

Hey Sanket,

If you're still getting the problem while indexing a single content source that's say a fileshare with a single .pdf in it, then the problem is well isolated.

I would check the event viewer after the exception is thrown to get a better idea of the error.

Best,
Tyler

theCascadian said...

Tyler,

I installed the Adobe iFilter on my WSS 3.0 / SSE 2008 server instance and all appears to be working well; I see the PDF icons and all, but it doesn't appear to be crawling the contents of the PDFs, just the doc properties. In a prior dev instance with the same setup, I was able to get it to crawl and return results from the PDF contents. Any tips or red flags?

Thanks!

Tim said...

Thanks for your post; it led me in the right direction. I dug a little deeper and found the documentation that got installed along with the x64 iFilter. At the bottom of the documentation, there were three links to Adobe PDFs. The one that worked for me was at this URL: http://www.adobe.com/special/acrobat/configuring_pdf_ifilter_for_ms_sharepoint_2007.pdf

praveen said...

Tyler


I installed the Adobe v6.0 iFilter (PDF) successfully. We are able to search for PDF documents but not able to search the content in the PDF documents. We have been working on this issue for a long time; please help me with this.