In my last blog post, Setting up Full-Text Search for PDF files, I detailed how to get things set up. If you tried it, you may have noticed that although the searches worked, all you got back was a file name. That isn’t much help if your document is an all-encompassing 538 pages. So, how do we get a page number back? The best I’ve come up with so far is to split the 538 pages into 538 single-page documents, then load and search on those.
My first Google search on how to split a PDF into pages came back with http://www.splitpdf.com/, so I went ahead and used that. I’m sure there is a way to do this through Acrobat, or you could even roll your own split functionality via the API.
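If you would rather script the split than use a website, here is a rough Python sketch of the idea. It is not what I actually used. It assumes the third-party pypdf library is installed, and the `_page_0042.pdf` naming scheme and the function names are all made up for illustration. The point is that if the page number is baked into each file name, a search hit that returns a file name can be turned back into a page number.

```python
import re
from pathlib import Path

# Hypothetical naming scheme (an assumption): <original name>_page_0042.pdf
PAGE_FILE_PATTERN = re.compile(r"^(?P<stem>.+)_page_(?P<page>\d+)\.pdf$", re.IGNORECASE)


def page_file_name(stem, page_number):
    """Build the file name for one page, zero-padded so the files sort in order."""
    return f"{stem}_page_{page_number:04d}.pdf"


def page_number_from_hit(file_name):
    """Recover the page number from a file name returned by the search."""
    match = PAGE_FILE_PATTERN.match(Path(file_name).name)
    if match is None:
        raise ValueError(f"not a per-page file name: {file_name!r}")
    return int(match.group("page"))


def split_pdf(source, out_dir):
    """Write every page of `source` to its own single-page PDF in `out_dir`."""
    # Assumption: pypdf is installed (pip install pypdf). Imported here so the
    # naming helpers above still work without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source)
    stem = Path(source).stem
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for index, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        out_path = target / page_file_name(stem, index)
        with open(out_path, "wb") as handle:
            writer.write(handle)
        written.append(str(out_path))
    return written
```

Calling `split_pdf("manual.pdf", "pages")` on a 538-page document should leave 538 files in `pages`, and `page_number_from_hit` turns whatever file name the search returns back into the page you need to open.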
It’s not a particularly pretty solution, but it does work, and that’s important.