Author Topic: How was this search engine created? (Read 1456 times)

sdweim85 · « **on:** April 27, 2013, 08:36:07 pm »

I don't need to know exactly how this search was made, just what tools were used, so I can look up tutorials and learn how to emulate it. If I had to guess, it'd be a search query against a database like SQL, but it searches OCRed (text-searchable) PDFs. How did the search find the OCRed text in the PDF files for the database? The search itself doesn't find exactly where the word in question is on the page; it just finds the page.

Also, what does he use to go from page to page? I'd have to assume it's just a script that opens the next single-page PDF in the folder. When I make my websites, I just have the user open the entire 50-100 page PDF, which takes forever. Having the PDFs broken down into single pages is so much more efficient. It is organized like so:

Union reporter newspaper
  2009
    12-21-2009
      12-21-2009-Page1.pdf
      12-21-2009-Page2.pdf
      .....etc

http://www.thejointlibrary.org/archives/search.html
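For the page-to-page part, here is a minimal sketch of what such a script might do, assuming the file names really follow the `<date>-Page<N>.pdf` pattern from the listing above. The function name `neighbor_page` is made up for illustration, and this is only a guess at the approach, not how that site is actually built.

```python
import re
from typing import Optional

# Assumed naming scheme, copied from the folder listing: <date>-Page<N>.pdf
PAGE_NAME = re.compile(r"^(?P<prefix>.+-Page)(?P<num>\d+)\.pdf$")


def neighbor_page(filename: str, step: int = 1) -> Optional[str]:
    """Return the file name `step` pages away, or None if it cannot exist."""
    match = PAGE_NAME.match(filename)
    if match is None:
        return None  # not a page file in the assumed scheme
    target = int(match.group("num")) + step
    if target < 1:
        return None  # there is no page before Page1
    return f"{match.group('prefix')}{target}.pdf"


print(neighbor_page("12-21-2009-Page1.pdf"))      # next page
print(neighbor_page("12-21-2009-Page2.pdf", -1))  # previous page
```

The function only computes the name. A real "next" button would still have to check that the file exists in the folder to know when it has reached the last page.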

lilshawn · « **Reply #1 on:** May 01, 2013, 11:09:21 am »

"Your Search for taco did not match any documents"

"Your Search for <graphic expletive for persons of minority> did not match any documents"

I'd have to say this search engine doesn't search much. :-\

"Your Search for n did not match any documents"

Well, it doesn't search for partial strings either.

It's not hard to have a program search for a string. PDFs that have ACTUAL TEXT in them have the text stored as, well, text. It's not an image or anything. It's much like an RTF document: the text has formatting code around it to display it in a particular size, position, font, etc.

Therefore searching text is easy. Once you decode the text formatting, you can tell where in the PDF it is.

As for THESE PDFs, they are still images. The text (I would imagine) is manually entered (or OCR'd) into a database inside the PDF, which is where the search happens.
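To make the "find the page, not the spot on the page" behavior concrete, here is a rough sketch of a whole-word index keyed by page file. It assumes the OCR text for each page has already been pulled out into a plain string. The page texts and the names `tokenize`, `build_index`, and `search` are invented for the example, and nothing here is confirmed to be what that site uses. It would also explain why a search for "n" finds nothing: only complete words are indexed.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

WORD = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into whole words."""
    return WORD.findall(text.lower())


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of page files that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for filename, text in pages.items():
        for word in tokenize(text):
            index[word].add(filename)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical OCR output for two single-page PDFs.
pages = {
    "12-21-2009-Page1.pdf": "Town council approves new library budget",
    "12-21-2009-Page2.pdf": "Library hours extended for the new year",
}
index = build_index(pages)
print(search(index, "council"))  # only the first page
print(search(index, "n"))        # no partial-word matches
```

Swapping the dictionary for a database table with one row per (word, page) pair would give the same behavior at a larger scale.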

If you really want to get into the nitty-gritty, contact the people who run the website and ask them. I'm sure there is an administrator somewhere who set this all up. (help AT theJointLibrary DOT org)

sdweim85 · « **Reply #2 on:** May 01, 2013, 12:28:48 pm »

lol I know the guy who makes the websites; he is contracted by our company. I make similar websites that have a MUCH better search, such as http://www.digifind-it.com/easthanover/home.php

My solution is to just use the same search I've been using, but break the PDFs down into single-page OCRed PDFs. It'll do the same thing, but the pages will load faster for users. Unless there's some sort of script that can take a multipage PDF and let it be viewed one page at a time.
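For the splitting step, one possible approach is sketched below. It assumes the third-party `pypdf` package is installed (`pip install pypdf`) and reuses the `<date>-Page<N>.pdf` naming from the first post. The helper names and paths are hypothetical. It is worth opening one of the resulting pages and confirming the OCR text is still searchable before converting a whole archive.

```python
from pathlib import Path
from typing import List


def page_filename(date_label: str, number: int) -> str:
    """Build a name like 12-21-2009-Page3.pdf (scheme taken from the thread)."""
    return f"{date_label}-Page{number}.pdf"


def split_pdf(source: Path, out_dir: Path, date_label: str) -> List[Path]:
    """Write every page of `source` to its own PDF inside `out_dir`."""
    # Third-party dependency, imported here so the naming helper works without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(source))
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copy this single page into a new document
        target = out_dir / page_filename(date_label, number)
        with target.open("wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

A call such as `split_pdf(Path("12-21-2009.pdf"), Path("2009/12-21-2009"), "12-21-2009")` would then fill the date folder with one file per page.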

