PDF Search With Page Numbers

Kevin Feasel

2016-06-28

Search

Jon Morisi has a solution for how to get page numbers for results back from PDFs when using Full-Text Search:

In my last blog post, Setting up Full-Text Search for PDF files, I detailed how to get things set up. If you tried this, you may have noticed that although the searches worked, what you got back was a file name. This isn’t so helpful if your document is an all-encompassing 538 pages. So, how do we get a page number back? The best I’ve come up with so far is to split the 538 pages into 538 documents and load / search on those.

My first Google search on how to split a PDF into pages came back with http://www.splitpdf.com/, so I went ahead and used that. I’m sure there is a way to do this through Acrobat, or even to roll your own split functionality via the API.

It’s not a particularly pretty solution, but it does work, and that’s important.
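
If you would rather roll your own split than use a website, here is a minimal Python sketch of the same idea. It is not Jon's method. The `_p007`-style naming convention, the helper names, and the use of the third-party pypdf library are all assumptions made for illustration. The useful part is that the page number is baked into each file name, so a full-text hit on a per-page file can be mapped back to a page in the original document.

```python
import re
from pathlib import Path

# Hypothetical naming convention (an assumption): <stem>_p<zero-padded page>.pdf
PAGE_SUFFIX = re.compile(r"^(?P<stem>.+)_p(?P<page>\d+)\.pdf$", re.IGNORECASE)


def page_file_name(source_name: str, page: int, total_pages: int) -> str:
    """Build the per-page file name, zero-padded so files sort in page order."""
    width = len(str(total_pages))
    return f"{Path(source_name).stem}_p{page:0{width}d}.pdf"


def parse_page_file_name(file_name: str) -> tuple[str, int]:
    """Recover (original document name, page number) from a per-page file name."""
    match = PAGE_SUFFIX.match(Path(file_name).name)
    if match is None:
        raise ValueError(f"not a per-page file name: {file_name!r}")
    return f"{match['stem']}.pdf", int(match["page"])


def split_pdf(source: Path, out_dir: Path) -> list[Path]:
    """Write one single-page PDF per page of `source`. Requires pypdf."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    reader = PdfReader(str(source))
    total = len(reader.pages)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copy just this one page into a new document
        target = out_dir / page_file_name(source.name, number, total)
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

With this convention, page 7 of a 538-page `manual.pdf` becomes `manual_p007.pdf`, and when Full-Text Search returns that file name, `parse_page_file_name` turns it back into the original document name and page 7.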
