Using a hash table speeds up the search engine's lookups drastically, because a keyword can be looked up directly instead of by iterating over every entry. Python has built-in dictionaries for this purpose. A dictionary key can be a string, a single character, or a number. So we can store our keywords as keys, with the list of URLs in which each keyword appears as the value. Below is the modified code section from my previous post. I have commented wherever necessary. The comments for this post start with 'Hash'.
    # get_page, get_all_links and union are used below but not shown in this section.

    def Look_up(index, keyword):
        # Given an index, find the keyword in it and return its list of links
        if keyword in index:  # Hash: Direct lookup, no need to iterate
            return index[keyword]
        return []

    # The format of an element in the index is <keyword>, [<list of urls that contain the keyword>]
    def add_to_index(index, url, keyword):
        if keyword in index:
            if url not in index[keyword]:  # Hash: To get rid of redundant urls
                index[keyword].append(url)
            return
        index[keyword] = [url]  # Hash: A new hash entry

    def add_page_to_index(index, url, content):
        # Add the content of the webpage to the index
        for i in content.split():
            add_to_index(index, url, i)

    def Crawl_web(seed):
        # The website to act as the seed page is given as input
        tocrawl = [seed]
        crawled = []
        index = {}  # Hash: Dictionary initialization
        while tocrawl:
            p = tocrawl.pop()
            # To avoid looping: if a page is already crawled and is linked again
            # by some other page we are crawling, we need not crawl it again
            if p not in crawled:
                c = get_page(p)
                add_page_to_index(index, p, c)
                union(tocrawl, get_all_links(c))
                # As soon as a link is crawled it is appended to crawled. When all
                # the links are done, crawled contains every link we have seen
                crawled.append(p)
        return crawled, index  # Returns the list of crawled links and the index

    crawled, index = Crawl_web('http://xkcd.com/353')
    # print(index)
    print(Look_up(index, "is"))  # Searching for the keyword "is"
The output is
http://xkcd.com/353
http://xkcd.com/554
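To see why the dictionary helps, here is a minimal, self-contained sketch that compares the two lookup styles. It is not the code from this post: the `PAGES` data, the `example.com` URLs, and the helper names (`build_list_index`, `lookup_list`, `build_dict_index`, `lookup_dict`) are all made up for illustration, and the list-based layout is only an assumption about how an index without hashing might be stored.

```python
# Hypothetical in-memory "web": url -> page text (made-up data, no network access).
PAGES = {
    "http://example.com/a": "python is fun",
    "http://example.com/b": "hashing is fast",
    "http://example.com/c": "python dictionaries use hashing",
}

def build_list_index(pages):
    """Assumed non-hashed layout: a list of [keyword, [urls]] entries."""
    index = []
    for url, content in pages.items():
        for word in content.split():
            for entry in index:          # scan every entry to find the keyword
                if entry[0] == word:
                    if url not in entry[1]:
                        entry[1].append(url)
                    break
            else:                        # keyword not found: add a new entry
                index.append([word, [url]])
    return index

def lookup_list(index, keyword):
    """Linear scan: may touch every entry before finding the keyword."""
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return []

def build_dict_index(pages):
    """Hashed layout: keyword -> list of urls."""
    index = {}
    for url, content in pages.items():
        for word in content.split():
            urls = index.setdefault(word, [])  # direct lookup by key
            if url not in urls:
                urls.append(url)
    return index

def lookup_dict(index, keyword):
    """Direct lookup: no iteration over the other keywords."""
    return index.get(keyword, [])

list_index = build_list_index(PAGES)
dict_index = build_dict_index(PAGES)
print(lookup_list(list_index, "is"))
print(lookup_dict(dict_index, "is"))
```

Both lookups return the same URLs. The difference is that `lookup_list` may have to walk through the whole index, while `lookup_dict` goes straight to the keyword, which matters more and more as the number of indexed keywords grows.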