Full-Text PDF Search

Jon Morisi shows how to use Full-Text Search to read PDF files:

Faced with this very issue, I decided to set up a local SQL Server Full-Text Search.
Some of the cool things Full-Text Search will give you, over and above a standard search, include the following:

  • One or more specific words or phrases (simple term)
  • A word or a phrase where the words begin with specified text (prefix term)
  • Inflectional forms of a specific word (generation term)
  • A word or phrase close to another word or phrase (proximity term)
  • Synonymous forms of a specific word (thesaurus)
  • Words or phrases using weighted values (weighted term)
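The term types in the list above correspond to search conditions that SQL Server's CONTAINS predicate accepts. As a rough sketch (not taken from the original post), the Python helpers below build one condition string per term type. The helper names, the dbo.Documents table, and the Content column are hypothetical and exist only for illustration; rejecting embedded double quotes is a simplifying assumption made here to avoid guessing at escaping rules.

```python
def _phrase(term: str) -> str:
    """Wrap a word or phrase in double quotes for a full-text condition.

    Assumption for this sketch: terms containing a double quote are
    rejected rather than escaped.
    """
    if not term or '"' in term:
        raise ValueError("term must be non-empty and contain no double quotes")
    return f'"{term}"'


def simple_term(term: str) -> str:
    # One specific word or phrase.
    return _phrase(term)


def prefix_term(prefix: str) -> str:
    # Words beginning with the given text: the asterisk goes inside the quotes.
    if not prefix or '"' in prefix:
        raise ValueError("prefix must be non-empty and contain no double quotes")
    return f'"{prefix}*"'


def generation_term(term: str) -> str:
    # Inflectional forms (for example drive, drives, drove, driving).
    return f"FORMSOF(INFLECTIONAL, {_phrase(term)})"


def thesaurus_term(term: str) -> str:
    # Synonyms, as defined in the configured thesaurus file.
    return f"FORMSOF(THESAURUS, {_phrase(term)})"


def proximity_term(first: str, second: str) -> str:
    # A word or phrase close to another word or phrase.
    return f"{_phrase(first)} NEAR {_phrase(second)}"


def weighted_term(weights: dict[str, float]) -> str:
    # Terms with relative weights between 0.0 and 1.0.
    parts = []
    for term, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weights must be between 0.0 and 1.0")
        parts.append(f"{_phrase(term)} WEIGHT({weight:g})")
    return "ISABOUT(" + ", ".join(parts) + ")"


# Hypothetical table and column; the condition is passed as a query parameter
# rather than concatenated into the SQL text.
QUERY = "SELECT DocumentName FROM dbo.Documents WHERE CONTAINS(Content, ?)"
condition = generation_term("drive")
```

The resulting string would be handed to a database driver as the parameter for QUERY. Note that weights only influence ranking, so a weighted term is mainly useful with CONTAINSTABLE, which returns a rank for each match.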
In order to get started with the setup, it’s important to know that the Full-Text Search architecture relies on filters for searching various file types. This is important for this example because the PDF filter is not installed by default. So, for starters, we need to go download and install the PDF iFilter (PDFFilter64Setup.msi).

Up until I read this blog post, I had no idea that full-text search could index PDFs, so that’s very interesting.

