
Topic: How was this search engine created?


sdweim85

How was this search engine created?
« on: April 27, 2013, 08:36:07 pm »
I don't need to know exactly how this search was built, just what tools were used, so I can look up tutorials and learn how to replicate it. If I had to guess, it's a query against a SQL database, but it searches OCRed (text-searchable) PDFs. How did the search get the OCRed text out of the PDF files and into the database? The search itself doesn't find exactly where the word appears on the page; it just finds the page.
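Nobody in the thread knows what the site actually runs, so this is only a guess at the general approach described here: a minimal Python sketch, assuming the OCR text has already been extracted from each single-page PDF and stored as one row per page. The table name `pages`, its columns, the helper names and the sample text are all hypothetical. Because the database only records which page a piece of text came from, a hit gives you the page, not the word's position on it.

```python
import sqlite3

# Hypothetical sample data: one row per single-page PDF, with its OCR text
# already extracted (the extraction step itself is not shown here).
PAGES = [
    ("Union reporter/2009/12-21-2009/12-21-2009-Page1.pdf", "Town council approves new budget"),
    ("Union reporter/2009/12-21-2009/12-21-2009-Page2.pdf", "Holiday parade draws record crowd"),
]


def build_index(pages):
    """Create an in-memory table mapping each page file to its OCR text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO pages (path, body) VALUES (?, ?)", pages)
    return conn


def search(conn, term):
    """Return paths of pages whose text contains term (LIKE is case-insensitive for ASCII)."""
    # Escape LIKE wildcards so a search for "100%" looks for a literal percent sign.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT path FROM pages WHERE body LIKE ? ESCAPE '\\' ORDER BY path",
        (f"%{escaped}%",),
    )
    return [path for (path,) in rows]


conn = build_index(PAGES)
print(search(conn, "parade"))  # ['Union reporter/2009/12-21-2009/12-21-2009-Page2.pdf']
```

A real site would more likely use a full-text index (MySQL FULLTEXT, SQLite FTS, Lucene/Solr, etc.) than a plain `LIKE` scan, but the page-level result works the same way.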

Also, what does he use to go from page to page? I'd assume it's just a script that opens the next single-page PDF in the folder. When I make my websites, I just have the user open the entire 50-100 page PDF, which takes forever. Having the PDFs broken down into single pages is much more efficient. The files are organized like so:

Union reporter newspaper
    2009
        12-21-2009
            12-21-2009-Page1.pdf
            12-21-2009-Page2.pdf
            ...etc.
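If the pages really are separate files named like this, "next page" can be as simple as rewriting the page number in the file name. A minimal sketch of that guess, assuming the `-PageN.pdf` naming shown in the folder tree; the function name and the sample folder contents are hypothetical, and in a real site the `existing` set would come from listing the issue folder (e.g. `os.listdir`).

```python
import re

# Matches names like "12-21-2009-Page1.pdf" (the naming pattern in the folder tree).
PAGE_RE = re.compile(r"^(?P<issue>.+)-Page(?P<num>\d+)\.pdf$")


def neighbor_page(filename, offset, existing):
    """Return the file `offset` pages away from `filename`, or None if it doesn't exist.

    `existing` is the set of page files in the issue folder (hypothetical input).
    """
    match = PAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"not a single-page file name: {filename}")
    target = f"{match['issue']}-Page{int(match['num']) + offset}.pdf"
    return target if target in existing else None


folder = {"12-21-2009-Page1.pdf", "12-21-2009-Page2.pdf", "12-21-2009-Page3.pdf"}
print(neighbor_page("12-21-2009-Page1.pdf", 1, folder))  # 12-21-2009-Page2.pdf
print(neighbor_page("12-21-2009-Page3.pdf", 1, folder))  # None
```

Returning `None` at either end lets the page hide its "Next"/"Previous" link instead of linking to a file that isn't there.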

http://www.thejointlibrary.org/archives/search.html

lilshawn

Re: How was this search engine created?
« Reply #1 on: May 01, 2013, 11:09:21 am »
"Your Search for taco did not match any documents"

"Your Search for <graphic expletive for persons of minority> did not match any documents"


I'd have to say this search engine doesn't search much.  :-\


"Your Search for n did not match any documents"

Well, it doesn't search for partial strings either.
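The difference here is whole-word matching versus substring matching. A minimal sketch of both using Python's `re` module; the helper names and the sample page text are made up, and this is not necessarily how the site's search works.

```python
import re


def whole_word_match(text, term):
    """True only if term appears as a complete word (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def partial_match(text, term):
    """True if term appears anywhere, even inside a longer word (case-insensitive)."""
    return term.lower() in text.lower()


page = "Union Reporter, December 21, 2009"
print(whole_word_match(page, "n"))      # False
print(partial_match(page, "n"))         # True
print(whole_word_match(page, "union"))  # True
```

A whole-word search explains why a single letter like "n" finds nothing: no page contains "n" as a word on its own.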

It's not hard to have a program search for a string. PDFs that have ACTUAL TEXT in them store the text as, well, text. It's not an image or anything. It works much like an RTF document: the text has formatting code around it to display it in a particular size, position, font, etc.

Therefore searching the text is easy. Once you decode the text formatting, you can tell where in the PDF it is.
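To make "decode the formatting and you know where the text is" concrete, here is a toy Python sketch that reads the text-showing (`Tj`) and text-positioning (`Td`) operators out of a made-up, uncompressed PDF content stream. It is an illustration only: real files usually compress their streams and may use `TJ` arrays, `Tm`/`cm` matrices and custom font encodings, none of which this handles, so in practice you'd let a PDF library do the extraction. `STREAM`, `text_runs` and `find_word` are hypothetical names.

```python
import re

# A tiny, made-up, *uncompressed* content stream. Real PDFs usually compress
# these streams and use more operators than this sketch understands.
STREAM = """BT
/F1 12 Tf
72 712 Td
(Union Reporter) Tj
0 -14 Td
(Holiday parade draws record crowd) Tj
ET"""

TOKEN_RE = re.compile(
    r"(?P<bt>\bBT\b)"                                            # begin text object
    r"|(?P<tx>-?\d+(?:\.\d+)?)\s+(?P<ty>-?\d+(?:\.\d+)?)\s+Td\b"  # move text position
    r"|\((?P<text>(?:[^()\\]|\\.)*)\)\s*Tj\b"                    # show a literal string
)


def text_runs(stream):
    """Yield (x, y, text) for each Tj string, tracking position from Td moves.

    String escapes are not decoded, and the page's cm transform is ignored.
    """
    x = y = 0.0
    for m in TOKEN_RE.finditer(stream):
        if m["bt"]:
            x = y = 0.0  # BT resets the text matrix, so the position returns to the origin
        elif m["tx"] is not None:
            x += float(m["tx"])  # Td offsets are relative to the start of the current line
            y += float(m["ty"])
        else:
            yield x, y, m["text"]


def find_word(stream, word):
    """Return the (x, y) positions of every text run containing word."""
    return [(x, y) for x, y, text in text_runs(stream) if word.lower() in text.lower()]


print(find_word(STREAM, "parade"))  # [(72.0, 698.0)]
```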

As for THESE PDFs, they are still images; the text (I would imagine) is manually entered (or OCR'd) and stored as a searchable text layer inside the PDF, and that's where the search happens.

If you really want to get into the nitty-gritty, contact the people who run the website and ask them. I'm sure there is an administrator somewhere who set this all up. (help AT theJointLibrary DOT org)

sdweim85

Re: How was this search engine created?
« Reply #2 on: May 01, 2013, 12:28:48 pm »
lol, I know the guy who makes the websites; he's contracted by our company. I make similar websites with a MUCH better search, such as http://www.digifind-it.com/easthanover/home.php

My solution is to keep using the same search I've been using, but break the PDFs down into single-page OCRed PDFs. It'll do the same thing, but the pages will load faster for users. That is, unless there's some sort of script that can take a multipage PDF and let it be viewed one page at a time.
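Splitting can be automated as a one-off batch job. A minimal sketch, assuming the third-party `pypdf` package (`pip install pypdf`) and the `<issue>-PageN.pdf` naming from the folder tree in the first post; `split_pdf` and `page_filename` are hypothetical helper names. Copying a page copies its content stream, so an existing OCR text layer should carry over to the single-page file, though it's worth spot-checking a few outputs in your search.

```python
from pathlib import Path


def page_filename(issue, page_number):
    """Build a name like '12-21-2009-Page1.pdf' from an issue name and 1-based page number."""
    return f"{issue}-Page{page_number}.pdf"


def split_pdf(src, out_dir):
    """Write each page of src as its own single-page PDF in out_dir; return the new paths.

    Assumes pypdf is installed; the import is inside the function so the
    naming helper above can be used without it.
    """
    from pypdf import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(str(src))
    written = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)  # copies the page, including any OCR text in its content
        target = out_dir / page_filename(src.stem, number)
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

Usage (hypothetical paths): `split_pdf("Union reporter/2009/12-21-2009.pdf", "Union reporter/2009/12-21-2009")` would produce `12-21-2009-Page1.pdf`, `12-21-2009-Page2.pdf`, and so on in the issue folder.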