Sunday, April 15, 2012

The Search Engine

The search engine is now complete, although I still have to improve it so that it responds to multi-word queries. For now, it handles only single-word queries. I have made the code available at https://github.com/dileep98490/A-simple-Search-Engine-in-Python along with a README file. Until the previous post, I had been using a custom get_page(url) function that returned the source of only a few specific webpages. I have changed get_page(url) so that it returns the source of any URL you pass to it. Below is the modified get_page(url):

import urllib  # Python 2; in Python 3 this function lives at urllib.request.urlopen

def get_page(url):
    # Return the source of the webpage at the given url,
    # or an empty string if the page cannot be fetched.
    try:
        f = urllib.urlopen(url)
        page = f.read()
        f.close()
        return page
    except Exception:
        return ""
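For readers on Python 3, where urllib.urlopen no longer exists, the same idea can be sketched as follows. This is only a rough equivalent, not the code in the repository: the timeout parameter and the UTF-8 decoding with replacement of bad bytes are assumptions added for illustration.

```python
from urllib.request import urlopen


def get_page(url, timeout=10):
    """Return the page source as text, or "" if the page cannot be fetched."""
    try:
        # urlopen works as a context manager, so the connection is closed for us.
        with urlopen(url, timeout=timeout) as response:
            # read() returns bytes in Python 3; decoding as UTF-8 is an assumption.
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        # Same behaviour as the original: any failure yields an empty page.
        return ""
```

As in the original, a URL that cannot be opened simply yields an empty string, so the crawler can keep going without special error handling.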

I have also added an input mechanism so that any page can be given as the seed page. The query is taken as input as well, along with the depth, which here means the maximum number of links to be checked. The full source code is available in the GitHub repository mentioned above, and the output of a sample run is shown further below.
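The input handling and the page limit described above can be sketched roughly like this. It is an illustration in Python 3, not the code from the repository: read_inputs, crawl, get_all_links, the regular expression used to find links, and the breadth-first visiting order are all hypothetical choices. The fetch argument stands for any function with the same shape as get_page(url).

```python
import re


def get_all_links(page):
    # Hypothetical helper: collect absolute http(s) links from href attributes.
    return re.findall(r'href="(https?://[^"]+)"', page)


def read_inputs(ask=input):
    # Prompts copied from the sample run; `ask` can be swapped out for testing.
    seed = ask("Enter the seed page\n").strip()
    query = ask("Enter What you want to search\n").strip()
    depth = int(ask("Enter the depth you wanna go\n"))
    return seed, query, depth


def crawl(seed, max_pages, fetch):
    """Visit at most max_pages pages starting from seed; return {url: source}."""
    to_crawl = [seed]
    crawled = {}
    while to_crawl and len(crawled) < max_pages:
        url = to_crawl.pop(0)  # oldest link first (breadth-first, an assumption)
        if url in crawled:
            continue
        page = fetch(url)
        crawled[url] = page
        for link in get_all_links(page):
            if link not in crawled and link not in to_crawl:
                to_crawl.append(link)
    return crawled
```

Passing the fetch function in as an argument keeps the crawl loop independent of the network, which makes it easy to try out on a handful of made-up pages before pointing it at a real site.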



Enter the seed page
http://opencvuser.blogspot.com
Enter What you want to search
is
Enter the depth you wanna go
5

Started crawling, presently at depth..
4
3
2
1
0

Printing the results as is with page rank

http://opencvuser.blogspot.com --> 0.05
https://skydrive.live.com/redir.aspx?cid=124ec5b5bc117437&resid=124EC5B5BC117437!161&parid=124EC5B5BC117437!103&authkey=!AMnmS6xJcSrXSyg --> 0.0514285714286

After Sorting the results by page rank

1.      https://skydrive.live.com/redir.aspx?cid=124ec5b5bc117437&resid=124EC5B5BC117437!161&parid=124EC5B5BC117437!103&authkey=!AMnmS6xJcSrXSyg

2.      http://opencvuser.blogspot.com

The two results are printed first unsorted and then sorted by page rank, highest first. I set the depth to 5, so the program crawled 5 pages, and 2 of them contain the keyword "is".
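The last step, looking up a single keyword and ordering the matching pages by rank, can be sketched as below. The URLs and rank values are taken from the sample run; the function names lookup and sort_by_rank and the dictionary layout of the index and the ranks are assumptions made for illustration, not necessarily how the repository code is organised.

```python
def lookup(index, keyword):
    # Return the list of URLs recorded for a single keyword (empty if unseen).
    return index.get(keyword, [])


def sort_by_rank(urls, ranks):
    # Highest page rank first; URLs without a rank are treated as 0.0.
    return sorted(urls, key=lambda url: ranks.get(url, 0.0), reverse=True)


BLOG = "http://opencvuser.blogspot.com"
SKYDRIVE = ("https://skydrive.live.com/redir.aspx?cid=124ec5b5bc117437"
            "&resid=124EC5B5BC117437!161&parid=124EC5B5BC117437!103"
            "&authkey=!AMnmS6xJcSrXSyg")

# Example data mirroring the sample run: both pages contain the word "is".
index = {"is": [BLOG, SKYDRIVE]}
ranks = {BLOG: 0.05, SKYDRIVE: 0.0514285714286}

results = sort_by_rank(lookup(index, "is"), ranks)
for position, url in enumerate(results, start=1):
    print("%d.\t%s" % (position, url))
```

Because the SkyDrive link has the slightly higher rank (0.0514 against 0.05), it moves ahead of the seed page after sorting, which matches the order shown in the run.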

2 comments:

  1. Hello Kumar.
    I ran this code up to the post previous to this one.
    But now, when I pass my URL to this final code, it doesn't work properly. I need your help.
    Thank you.

  2. Hey, I tried executing your code, but after showing the depths, the console window vanishes. It does not show any output. Can you help me out?
    Thank you.
    my email: jay.rakshe@hotmail.com
