Saturday, April 16, 2011

SharePoint 2007: Adding PDF Support Step-by-Step

Introduction

SharePoint Server 2007 does not support searching PDF content by default [1].  If you want to search the contents of PDF files saved to a document folder, you need to install and configure Adobe PDF support.  This procedure walks you through that process, step-by-step.  The walkthrough was performed on Microsoft SharePoint Server 2007 Enterprise, hosted on Windows Server 2003 Enterprise Edition.  It follows procedures similar to those discussed elsewhere [9, 10, 23], but describes and references each step more thoroughly.  It proceeds in six steps: 1) install the Adobe PDF IFilter v6.0; 2) add an Adobe extension registry entry; 3) add the Adobe icon to Windows SharePoint Services; 4) add the Adobe PDF extension to the list of Managed File Types; 5) restart search services; and 6) verify PDF content is indexed.  All references used in this walkthrough are listed in the References section, below.  Good luck!

Procedure

Step 1: Install the Adobe PDF IFilter v6.0

Download the Adobe PDF IFilter v6.0 [2] and follow the installation instructions.  It is best to choose the default installation directory.  Once the installation completes, you'll find a new program folder.


The installed IFilter is named PDFFILT.dll.


Step 2: Add an Adobe PDF Extension Registry Entry

The next step is to add a registry entry for the Adobe extension if it doesn't already exist [1].  Open the registry editor.  Make a full registry backup.  Then navigate to the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\GUID\Gather\Search\Extensions\ExtensionList
Check for an entry for pdf.  If you can't find it, you'll need to create it: check the list of extensions for the largest name (this will likely be 37); right-click ExtensionList; from the popup menu, point to New, and then select String Value; for the Name, enter a value equal to the largest current name + 1 (likely 38); for the value data, enter pdf; then click OK.
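The naming rule above (largest existing name + 1) can be sketched in a few lines of Python.  This is only an illustration: the function name `plan_pdf_entry` and the idea of passing the ExtensionList values in as a dictionary are assumptions made for this example, and nothing here touches the registry.

```python
def plan_pdf_entry(entries):
    """Decide which ExtensionList value to add for pdf, if any.

    entries: dict mapping value name -> value data, as seen under
             ExtensionList (hypothetical input shape for this sketch).
    Returns (name, data) for the value to create, or None if pdf
    is already listed.
    """
    # Nothing to do if some entry already holds "pdf".
    if any(data.strip().lower() == "pdf" for data in entries.values()):
        return None

    # Only numeric names take part in the numbering; "(Default)" is ignored.
    numeric_names = [int(name) for name in entries if name.isdecimal()]

    # Largest current name + 1 (assumes numbering starts at 1 when empty).
    next_name = max(numeric_names, default=0) + 1
    return str(next_name), "pdf"


# Example: the largest existing name is 37, so the new value is named 38.
print(plan_pdf_entry({"1": "doc", "36": "xls", "37": "xml"}))
```

If the function returns None, the pdf entry is already present and no change is needed.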


Check to make sure that you have the following registry keys.  These should have been configured during the installation of Adobe PDF IFilter v6.0 [1]:
  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
    • Name: Default
      • Type: REG_MULTI_SZ
      • Data: {8315BA54-B69F-4275-AE11-31CB6359EB09}
  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\Filters\.pdf
    • Name: Default
      • Type: REG_SZ
      • Data: (value not set)
    • Name: Extension
      • Type: REG_SZ
      • Data: pdf
    • Name: FileTypeBucket
      • Type: REG_DWORD
      • Data: 0x00000001 (1)
    • Name: MimeTypes
      • Type: REG_SZ
      • Data: application/pdf
You should see the following:


If these registry keys don't exist, rerun the Adobe PDF IFilter v6.0 installation.
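The check described above can also be written down as a small comparison routine.  The sketch below is hedged: it assumes you have already read the values into a dictionary by some means (for example with Python's standard winreg module on the server, which returns REG_MULTI_SZ data as a list of strings and REG_DWORD data as an integer), and the name `find_mismatches` is made up for this example.  Key names are given relative to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0.

```python
# Expected values, taken from the list above.  The empty name "" stands
# for the key's Default value.
EXPECTED = {
    r"Search\Setup\ContentIndexCommon\Filters\Extension\.pdf": {
        "": ["{8315BA54-B69F-4275-AE11-31CB6359EB09}"],  # REG_MULTI_SZ
    },
    r"Search\Setup\Filters\.pdf": {
        "Extension": "pdf",              # REG_SZ
        "FileTypeBucket": 1,             # REG_DWORD
        "MimeTypes": "application/pdf",  # REG_SZ
    },
}


def find_mismatches(actual):
    """Compare values read from the registry against EXPECTED.

    actual: dict of relative key name -> {value name: data}
            (hypothetical input shape for this sketch).
    Returns a list of human-readable problems; an empty list means
    everything matches.
    """
    problems = []
    for key, values in EXPECTED.items():
        found = actual.get(key)
        if found is None:
            problems.append(f"missing key: {key}")
            continue
        for name, want in values.items():
            got = found.get(name)
            if got != want:
                label = name or "(Default)"
                problems.append(
                    f"{key} [{label}]: expected {want!r}, found {got!r}"
                )
    return problems
```

An empty result suggests the IFilter installation registered itself as expected; any reported problem is a reason to rerun the installer.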

Step 3: Add the Adobe Icon to Windows SharePoint Services

The next step is to add the Adobe icon to Windows SharePoint Services [14-16].  Download the small (17x17 pixel) version of the PDF icon [3].  Save the icon file to the following directory on your MOSS 2007 server:
C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES
Navigate to the following directory:
C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\XML
Look for the DOCICON.XML file in this directory.


Open this file, and then edit it by adding an entry for pdf.
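For readers who prefer to see the edit spelled out, here is a minimal Python sketch of it.  It assumes the usual DOCICON.XML layout, in which Mapping elements with Key and Value attributes sit inside a ByExtension element, and it assumes the icon was saved as pdficon_small.gif; that file name and the function name `add_pdf_mapping` are illustrative only, so substitute the name of the icon file you actually saved.  The sketch works on the XML text, not on the file itself; back up DOCICON.XML before changing it.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(docicon_xml, icon_file="pdficon_small.gif"):
    """Return DOCICON.XML text with a pdf Mapping added under ByExtension.

    docicon_xml: contents of DOCICON.XML as a string.
    icon_file:   icon file name in TEMPLATE\\IMAGES (assumed name).
    """
    root = ET.fromstring(docicon_xml)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no ByExtension element found")

    # Leave the document untouched if pdf is already mapped.
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return docicon_xml

    ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": icon_file})
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for the real file, just to show the effect.
sample = (
    "<DocIcons><ByExtension>"
    '<Mapping Key="doc" Value="icdoc.gif"/>'
    "</ByExtension></DocIcons>"
)
print(add_pdf_mapping(sample))
```

Editing the file by hand in a text editor achieves the same result; the point is only that the new entry is a single Mapping element keyed on pdf.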


Restart Internet Information Services.  This causes the docicon.xml file to be reread by the SharePoint Server service.

Step 4: Add the Adobe PDF extension to the list of Managed File Types

The next step is to add the Adobe PDF extension to the list of managed file types so that PDF file properties and content are indexed and searchable [17-19].  Navigate to the Search Administration page.  On the left, under Crawling, look for the File types link.


Click this link.  This takes you to the Manage File Types page.  Click New File Type at the top of the list.


Enter PDF, and then click OK.  You'll see a new file type added to the list.


Step 5: Restart Search Services

Next, stop and start the Windows SharePoint Services Search Service and the Office SharePoint Server Search Service.
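If you would rather script the restart than use the Services console, the following Python sketch shows one way to do it with the Windows net command.  It assumes the two services are registered under the short names spsearch (Windows SharePoint Services Search) and osearch (Office SharePoint Server Search); confirm the names on your own server before running anything.  The function names are made up for this example.

```python
import subprocess

# Assumed short service names; verify them on your server.
SEARCH_SERVICES = ("spsearch", "osearch")


def restart_commands(services=SEARCH_SERVICES):
    """Build the net stop / net start commands, stops first."""
    stops = [["net", "stop", name] for name in services]
    starts = [["net", "start", name] for name in services]
    return stops + starts


def restart_search_services(run=subprocess.run):
    """Run each command in order; raises if any command fails."""
    for command in restart_commands():
        run(command, check=True)


# Show the commands without running them.
for command in restart_commands():
    print(" ".join(command))
```

Calling restart_search_services() from an elevated prompt on the SharePoint server would execute the commands; the loop above only prints them.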


Log on to the Shared Services Provider associated with these target search services.  Perform full crawls of all content sources.

Step 6: Verify PDF Content is Indexed

Now, check to make sure that the contents of PDFs are indexed and searchable.  You can do this by checking the crawl logs.  First, go to the Shared Services page, and then go to the Crawl Log page.


Then, click on one of the host name links to view its crawl log.  Crawled PDFs should be listed without any warnings or errors.


Now, perform a simple keyword search for a term that you know is in one or more of the PDF documents currently uploaded to your SharePoint Server instance. 


This completes the step-by-step walkthrough of installing and configuring Adobe PDF support for your SharePoint Server 2007 search service.  The next section discusses a few lessons learned.  Happy computing!

Lessons-learned

Adobe Acrobat Reader X

This version of the Adobe Acrobat Reader does not include an iFilter dynamic link library.  Attempts to add PDF support to SharePoint Server 2007/2010 instances will fail when using Adobe Acrobat Reader X [4, 5].  Don't waste your time with it.

Adobe Acrobat Reader 9.X

The online literature indicates that the Adobe iFilter has been bundled with Adobe Acrobat Reader since version 7.0.5, and that you only need to install the bundle on your SharePoint Server instance to get PDF support.  However, my own efforts to implement PDF support in SharePoint Server using version 9.4 (the last version prior to 10) failed.  Others have also had difficulty getting the iFilter bundled with Adobe Acrobat Reader to work correctly [20-25].

Others have had success using these versions [7-9].  I followed the steps they describe, but without success.  I also checked my SharePoint Server configuration against the appropriate Microsoft Knowledge Base articles [1, 11], but still without success.  Though Search recognized that there were PDFs and noted them in the crawl log, it could not index their contents.


I continued to review the online literature looking for clues.  One source provided a rather complex procedure for enabling the bundled ifilter to work, requiring additional registry entry changes [26].  Given these ambiguous results, and lacking the time to perform further research on this issue, I decided to press ahead with an approach that had achieved successful results in the past, and this is the procedure presented above, in this walkthrough.

I uninstalled Adobe Reader 9.4 from my SharePoint Server 2007 instance, and then performed a fresh installation of Adobe PDF IFilter v6.0.  After working through all of the various ancillary configuration steps, I was finally able to have my SharePoint Server 2007 instance recognize and index PDF content.

At this point, I don't know where the fault lies.  Possibly, I missed something somewhere.  In any case, I have found an approach that works, and works consistently over multiple installations.  I have documented my efforts here in order to help others who may encounter similar issues.

References
  1. No Adobe PDF documents are returned in the search results when you search a Windows SharePoint Services 3.0 Web site, KB927675, Microsoft Support, 5/14/2007
  2. Adobe PDF IFilter v6.0, Adobe Support, Downloads, Acrobat
  3. Use of Adobe icons and web logos, Permissions and trademark guidelines, About Adobe
  4. Do shell extensions work in Reader X?, General Questions, Protected Mode FAQ, Adobe Developer Connection
  5. Adobe Reader X problem - no search contents filter
  6. FTP directory /pub/adobe/reader/win/9.x/9.4.0/en_US at ftp.adobe.com
  7. Adobe Reader 9 Available – Works Fine with SharePoint, Derek Goodridge, Worker Thread Blog, 7/18/2008
  8. And now for something completely different-- Searching PDFs, or Using Adobe's PDF IFilter with WSS 3.0 sp1, ServerGrrl, January 5, 2008
  9. Configuring MOSS 2007 to search pdf documents - install and configure pdf ifilters, Musings on SharePoint 2010, 2/6/2008
  10. Walkthrough: Installing Adobe (v6) PDF iFilter for SharePoint 2007 (Moss/WSS), Tyler Holmes, System.What, 4/10/2008
  11. Adobe Reader files cannot be found after you add the .pdf file type to the list of crawled file types in SharePoint Server 2007, KB928619, Microsoft Support, 5/14/2007
  12. SharePoint 2007 and Adobe PDF, Joining Dots, The Old Joining Dots Blog, 5/9/2007
  13. Adobe Reader 9.4.0
  14. How to add an icon to Windows SharePoint Services to represent Adobe PDF documents that are stored in document libraries, KB837849, Microsoft Support, 11/30/2007
  15. DOCICON.XML, Microsoft TechNet Library
  16. Understanding DocIcon.xml, Microsoft TechNet Library
  17. Manage file types (Office SharePoint Server), Microsoft TechNet Library
  18. File types and IFilter reference (Office SharePoint Server), Microsoft TechNet Library, 9/11/2008
  19. About IFilters (Office SharePoint Server 2007), Microsoft TechNet Library, 4/16/2009
  20. PDF files not getting crawled, Microsoft TechNet Social Forums, Enterprise Search, 11/19/2009
  21. PDF files are not getting crawled, Microsoft MSDN Social Forums, 2009
  22. MOSS 2007, 32 bit - Not all PDF files indexed, FoxITSoftware, 6/12/2008
  23. Indexing and Searching PDFs in MOSS 2007, Aidan Garnish blog, 9/19/2007
  24. Search server is not indexing the content of pdf files, Microsoft MSDN Social Forums, Search Server Installation, Configuration and Administration, 2/12/2009
  25. SharePoint 2007 and PDF indexing, Steven Van de Craen's Blog, 9/19/2007
  26. Index PDF documents on SharePoint using Adobe PDF IFilter 9, Harold van de Kamp's Blog, 10/2/2008
Notes
  • The PDF iFilter installed by Adobe Acrobat 9.X is AcroRdIF.dll, and it is installed to directory C:\Program Files\Adobe\Reader 9.0\Reader by default.
  • Adobe Acrobat Reader X (10) does not contain an iFilter [4].
  • The Adobe PDF IFilter v6.0 comes with an excellent installation and troubleshooting guide that even covers SharePoint topics.  Look for its Readme.htm in the root folder, after you've installed it.

1 comment:

Anonymous said...

Al, thank you so much for your post. I went through so many of the same steps already and racking my brain trying to find out what wasn't working. I didn't see any mention anywhere about Acrobat X not installing the iFilter. So I uninstalled Acrobat X and installed the ifilter v6.0 and it is working now!!! What a life saver:)