BotSeer Beta
The "Crawler Search" searches more than 15,000 web crawlers and 2,000 User-Agent strings. The results page of a crawler search shows the variations of a crawler's name as written by webmasters, the bias toward each name, and the geographic distribution of the IP addresses of the User-Agent strings that contain the crawler name. For example: http://botseer.ist.psu.edu/botsearch.jsp?query=googlebot

The "robots.txt" search field searches the robots.txt files that BotSeer has harvested from the Web, and it supports fielded search: a user can restrict a query to one field of robots.txt by adding a search modifier. BotSeer supports the following query modifiers in robots.txt search:
- "botname" searches the User-Agent field in robots.txt files.
- "disallow", "allow" and "comments" search the corresponding fields in robots.txt files.
- "site" searches the URL where the robots.txt file is located.
If no search modifier is given, the system searches the "botname", "comments" and "site" fields together.

By default, results are ranked by the PageRank score of the homepage of the website that hosts the robots.txt file. Documents with the same PageRank score are then ranked by the tf-idf score computed by Lucene. When searching for a named robot, a user can instead choose to rank the results by that robot's bias statistics.

The "SourceCode" search field searches the source code and documentation of open source crawlers harvested from the Web. The results page shows an abstract of each document, a link to the cached document, and a link to the homepage of the original open source project.

The "Google" search sends the user's query to Google with the modifiers "inurl:robots.txt filetype:txt" appended, which restricts the results to robots.txt files in Google's index. It is provided for comparison with BotSeer's "robots.txt" search.
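The Crawler Search described above reports the different ways webmasters spell a crawler's name. The sketch below is only an assumption about how such variations might be counted, not BotSeer's code: the hypothetical helper `name_variations` treats any User-agent value that contains the query (compared case-insensitively) as a variation, then counts each exact spelling separately.

```python
from collections import Counter


def name_variations(user_agent_values: list[str], query: str) -> Counter:
    """Count each distinct spelling of a crawler name that matches `query`.

    Hypothetical helper: the case-insensitive substring test is an
    assumption, not BotSeer's documented matching rule.
    """
    q = query.lower()
    # Keep the original spelling so "Googlebot" and "googlebot" are counted separately.
    return Counter(v.strip() for v in user_agent_values if q in v.lower())


# Illustrative User-agent values collected from several robots.txt files.
values = ["Googlebot", "googlebot", "Googlebot-Image", "Slurp", "googlebot"]
print(name_variations(values, "googlebot").most_common())
# [('googlebot', 2), ('Googlebot', 1), ('Googlebot-Image', 1)]
```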
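Fielded robots.txt search, as described above, presupposes that each harvested file has been split into fields. A minimal sketch, assuming that User-agent lines feed "botname", Disallow and Allow lines feed the fields of the same name, and any text after "#" feeds "comments"; the helper `extract_fields` is hypothetical. The "site" field comes from the file's URL rather than its contents, so it is not extracted here.

```python
def extract_fields(robots_txt: str) -> dict[str, list[str]]:
    """Split robots.txt text into botname/disallow/allow/comments lists (sketch)."""
    fields: dict[str, list[str]] = {"botname": [], "disallow": [], "allow": [], "comments": []}
    for raw in robots_txt.splitlines():
        # Everything after "#" is a comment; keep it for the "comments" field.
        line, hash_sign, comment = raw.partition("#")
        if hash_sign and comment.strip():
            fields["comments"].append(comment.strip())
        # The remaining text should look like "Field: value"; skip anything else.
        key, colon, value = line.partition(":")
        if not colon:
            continue
        key = key.strip().lower()
        if key == "user-agent":
            fields["botname"].append(value.strip())
        elif key in ("disallow", "allow"):
            fields[key].append(value.strip())
    return fields


sample = """# Keep crawlers out of scripts
User-agent: Googlebot
Disallow: /cgi-bin/
Allow: /cgi-bin/public/

User-agent: *
Disallow: /tmp/  # scratch space
"""
print(extract_fields(sample))
```

Lines with other keys (for example Sitemap) are simply ignored in this sketch.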
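The help text above does not show how a modifier is written inside a query. Assuming a Lucene-style `modifier:term` syntax (an assumption, not confirmed by BotSeer), the rule that an unmodified term searches "botname", "comments" and "site" together could be sketched as follows; `parse_query`, `MODIFIERS` and `DEFAULT_FIELDS` are illustrative names.

```python
MODIFIERS = {"botname", "disallow", "allow", "comments", "site"}
# Fields searched together when a term carries no modifier.
DEFAULT_FIELDS = ("botname", "comments", "site")


def parse_query(query: str) -> list[tuple[str, str]]:
    """Turn a query string into (field, term) pairs (hypothetical syntax)."""
    pairs: list[tuple[str, str]] = []
    for token in query.split():
        # Split only on the first ":" so "site:http://x" keeps its URL intact.
        field, colon, term = token.partition(":")
        if colon and field.lower() in MODIFIERS and term:
            pairs.append((field.lower(), term))
        else:
            # No recognised modifier: search the default fields with the whole token.
            pairs.extend((f, token) for f in DEFAULT_FIELDS)
    return pairs


print(parse_query("botname:googlebot disallow:/cgi-bin/ archive"))
```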
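The default ordering described above (PageRank of the site's homepage first, Lucene's tf-idf score as the tie-breaker) amounts to a two-key descending sort. The `Hit` record and the scores below are hypothetical, chosen only to show the tie-break.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str         # location of the robots.txt file
    pagerank: float  # PageRank of the site's homepage (hypothetical values)
    tfidf: float     # Lucene relevance score for the query (hypothetical values)


def rank(hits: list[Hit]) -> list[Hit]:
    # Compare PageRank first; equal PageRank falls back to tf-idf. Both descending.
    return sorted(hits, key=lambda h: (h.pagerank, h.tfidf), reverse=True)


hits = [
    Hit("http://a.example/robots.txt", 7.0, 0.4),
    Hit("http://b.example/robots.txt", 9.0, 0.1),
    Hit("http://c.example/robots.txt", 7.0, 0.9),
]
print([h.url for h in rank(hits)])
# ['http://b.example/robots.txt', 'http://c.example/robots.txt', 'http://a.example/robots.txt']
```

Site b wins on PageRank alone; a and c tie on PageRank, so c's higher tf-idf score places it ahead of a.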
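The "Google" comparison search described above only rewrites the query before handing it to Google. A sketch, assuming the public https://www.google.com/search endpoint and its `q` parameter; `google_robots_url` is a hypothetical helper, not part of BotSeer.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"  # assumed endpoint


def google_robots_url(query: str) -> str:
    """Append the robots.txt restriction to the query and build a search URL."""
    q = f"{query} inurl:robots.txt filetype:txt"
    # urlencode escapes spaces as "+" and ":" as "%3A".
    return GOOGLE_SEARCH + "?" + urlencode({"q": q})


print(google_robots_url("googlebot"))
# https://www.google.com/search?q=googlebot+inurl%3Arobots.txt+filetype%3Atxt
```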
For more information on robots.txt, see http://robotstxt.org
© 2007-2008 BotSeer
Hosted by Penn State's College of Information Sciences and Technology