Build Your Own Arcade Controls Forum
Topic started by: sdweim85 on April 27, 2013, 08:36:07 pm
-
I don't need to know exactly how this search was built, just what tools were used, so I can look up tutorials and learn how to replicate it. If I had to guess, it's a query against a SQL database, but it searches OCRed (text-searchable) PDFs. How did the OCRed text in the PDF files get into the database? The search doesn't find exactly where the word appears on the page; it just finds the page.
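Nobody in the thread confirms how that site's search actually works, but one common way to get the behavior described (a hit tells you the page, not the spot on the page) is a word index in a SQL database with one row per word per page. The sketch below uses Python's built-in `sqlite3`. It assumes the OCR text of each single-page PDF has already been pulled out (for example with Poppler's `pdftotext`); the table names, file paths, and the `build_index`/`search` helpers are all hypothetical.

```python
import re
import sqlite3

# Hypothetical schema: one row per single-page PDF, plus an inverted index
# mapping each lowercase word to the pages it appears on.
SCHEMA = """
CREATE TABLE pages (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
CREATE TABLE words (word TEXT NOT NULL, page_id INTEGER NOT NULL REFERENCES pages(id));
CREATE INDEX idx_words_word ON words(word);
"""


def build_index(pages):
    """pages: dict mapping a single-page PDF path to the OCR text of that page."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for path, text in pages.items():
        page_id = db.execute("INSERT INTO pages (path) VALUES (?)", (path,)).lastrowid
        # Store each distinct word once per page. Word positions are not kept,
        # so a hit only tells you which page the word is on.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        db.executemany(
            "INSERT INTO words (word, page_id) VALUES (?, ?)",
            [(w, page_id) for w in words],
        )
    db.commit()
    return db


def search(db, term):
    """Return the paths of all pages containing `term` as a whole word."""
    rows = db.execute(
        "SELECT p.path FROM words w JOIN pages p ON p.id = w.page_id "
        "WHERE w.word = ? ORDER BY p.path",
        (term.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_index({
        "2009/12-21-2009/12-21-2009-Page1.pdf": "Town council approves budget",
        "2009/12-21-2009/12-21-2009-Page2.pdf": "Council meeting notes",
    })
    print(search(db, "council"))
```

Because the lookup is an exact match on whole words, a query like "n" returns nothing even though the letter appears in the text.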
Also, what does he use to go from page to page? I'd assume it's just a script that opens the next single-page PDF in the folder. When I make my websites, I just have the user open the entire 50-100 page PDF, which takes forever. Having the PDFs broken down into single pages is much more efficient. The files are organized like so:
Union reporter newspaper
  2009
    12-21-2009
      12-21-2009-Page1.pdf
      12-21-2009-Page2.pdf
      ...etc.
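If the pages follow that naming scheme, "next page" and "previous page" links only need to bump the page number in the file name and check that the file exists. This is a minimal sketch under that assumption; `page_path`, `neighbor`, and the regular expression are hypothetical, and on a website you would map the returned paths to URLs rather than open files directly.

```python
import re
import tempfile
from pathlib import Path

# Assumed layout, taken from the folder structure above:
#   <root>/<YYYY>/<MM-DD-YYYY>/<MM-DD-YYYY>-Page<N>.pdf
PAGE_RE = re.compile(r"^(?P<issue>\d{2}-\d{2}-\d{4})-Page(?P<num>\d+)\.pdf$")


def page_path(root, issue, num):
    """Build the path of page `num` of the issue dated `issue` (MM-DD-YYYY)."""
    year = issue[-4:]
    return Path(root) / year / issue / f"{issue}-Page{num}.pdf"


def neighbor(path, step):
    """Return the previous (step=-1) or next (step=+1) page, or None if there isn't one."""
    path = Path(path)
    m = PAGE_RE.match(path.name)
    if m is None:
        raise ValueError(f"not a single-page file: {path.name}")
    num = int(m["num"]) + step
    if num < 1:
        return None
    candidate = path.with_name(f"{m['issue']}-Page{num}.pdf")
    return candidate if candidate.exists() else None


if __name__ == "__main__":
    # Build a throwaway two-page issue and walk through it.
    with tempfile.TemporaryDirectory() as root:
        for n in (1, 2):
            p = page_path(root, "12-21-2009", n)
            p.parent.mkdir(parents=True, exist_ok=True)
            p.touch()
        first = page_path(root, "12-21-2009", 1)
        print(neighbor(first, +1).name)
        print(neighbor(first, -1))
```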
http://www.thejointlibrary.org/archives/search.html
-
"Your Search for taco did not match any documents"
"Your Search for <graphic expletive for persons of minority> did not match any documents"
I'd have to say this search engine doesn't search much. :-\
"Your Search for n did not match any documents"
Well, it doesn't search for partial strings either.
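The "n" result is the difference between whole-word and substring matching. A small sketch of the two behaviors (the function names are illustrative only):

```python
import re


def whole_word_match(text, term):
    """True only if `term` appears as a complete word (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def substring_match(text, term):
    """True if `term` appears anywhere, even inside another word."""
    return term.lower() in text.lower()
```

For the text "Union Reporter", `substring_match(..., "n")` is true because of the "n" in "Union", while `whole_word_match(..., "n")` is false. A search built on a whole-word index behaves like the second one.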
It's not hard to have a program search for a string. PDFs that have ACTUAL TEXT in them store that text as, well, text; it's not an image or anything. It's a lot like an RTF document: the text has formatting code around it to display it at a particular size, position, font, etc.
Therefore searching the text is easy. Once you decode the text formatting, you can tell where in the PDF it is.
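To make the "decode the text formatting" step concrete: inside a PDF page, text sits in a content stream between `BT` and `ET`, with operators such as `Td` (move the text position) and `Tj` (show a string). The sketch below pulls strings and their positions out of a tiny uncompressed stream. It is heavily simplified and labeled as such: it assumes no compression, only `BT`/`Td`/`Tj`, literal strings without escaped parentheses, and it ignores fonts, `Tm`, `TJ`, and the graphics transformation. Real files usually need a proper PDF library.

```python
import re

# Simplified tokenizer for three text operators (assumption: nothing else matters).
TOKEN_RE = re.compile(
    r"(?P<bt>\bBT\b)"
    r"|(?<![\w.])(?P<x>-?\d+(?:\.\d+)?)\s+(?P<y>-?\d+(?:\.\d+)?)\s+Td\b"
    r"|\((?P<text>[^()]*)\)\s*Tj\b"
)


def text_with_positions(stream):
    """Return (x, y, text) for each string shown with Tj."""
    x = y = 0.0
    found = []
    for m in TOKEN_RE.finditer(stream):
        if m["bt"]:
            # BT starts a text object; the position resets to the origin.
            x = y = 0.0
        elif m["text"] is not None:
            found.append((x, y, m["text"]))
        else:
            # Td offsets are relative to the start of the current line.
            x += float(m["x"])
            y += float(m["y"])
    return found


if __name__ == "__main__":
    stream = "BT /F1 12 Tf 72 712 Td (Union Reporter) Tj 0 -14 Td (December 21, 2009) Tj ET"
    print(text_with_positions(stream))
```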
As for THESE PDFs, they are still images. The text (I would imagine) is manually entered (or OCR'd) into a database, or into a hidden text layer inside the PDF, and that's where the search happens.
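For OCRed scans, a common arrangement is to keep the page image and draw the recognized words on top using text rendering mode 3 (`3 Tr`, neither fill nor stroke), so the text is searchable and selectable but invisible. This sketch only builds such a page's content stream; the resource names (`/Im0`, `/F1`) and the `(x, y, size, text)` word format are assumptions, and a complete PDF would also need the image and font objects those names refer to.

```python
def ocr_overlay_stream(image_name, page_w, page_h, words):
    """Build a content stream that paints a scanned image and overlays invisible OCR text.

    words: list of (x, y, size, text) tuples in PDF points (hypothetical OCR output format).
    """

    def escape(s):
        # Escape characters that are special inside PDF literal strings.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    ops = [
        "q",
        f"{page_w} 0 0 {page_h} 0 0 cm",  # scale the unit-square image up to the page size
        f"/{image_name} Do",              # paint the scanned page image
        "Q",
        "BT",
        "3 Tr",                           # rendering mode 3: text is invisible but still searchable
    ]
    for x, y, size, text in words:
        ops.append(f"/F1 {size} Tf")
        ops.append(f"1 0 0 1 {x} {y} Tm")  # place the word where the OCR engine found it
        ops.append(f"({escape(text)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```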
If you really want to get into the nitty-gritty, contact the people who run the website and ask them. I'm sure there is an administrator somewhere who set this all up (help AT theJointLibrary DOT org).
-
lol, I know the guy who makes the websites; he is contracted by our company. I make similar websites, such as http://www.digifind-it.com/easthanover/home.php, that have a MUCH better search.
My solution is to keep using the same search I've been using, but break the PDFs down into single-page OCRed PDFs. It'll do the same thing, but the pages will load much faster for users. That is, unless there's some sort of script that can take a multipage PDF and display it one page at a time.
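Splitting can be scripted. The sketch below assumes the third-party `pypdf` package (`pip install pypdf`) and reuses the `-Page<N>.pdf` naming from the folder layout earlier in the thread; `split_pdf` and `page_filename` are hypothetical helper names. Each page is copied with its own content and resources, so an OCR text layer stored in the page should come along with it.

```python
from pathlib import Path


def page_filename(stem, num):
    """Name page `num` like the existing archive: <stem>-Page<num>.pdf."""
    return f"{stem}-Page{num}.pdf"


def split_pdf(src, out_dir):
    """Write each page of `src` to its own single-page PDF in `out_dir`."""
    # Imported here so page_filename can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(src)
    written = []
    for num, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        target = out_dir / page_filename(src.stem, num)
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

For example, `split_pdf("12-21-2009.pdf", "2009/12-21-2009")` would produce `12-21-2009-Page1.pdf`, `12-21-2009-Page2.pdf`, and so on. If you'd rather keep the multipage files, a browser-side viewer such as Mozilla's PDF.js renders pages one at a time, although the whole file may still have to download unless the server and PDF support incremental loading.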