BotSeer Beta


The "Crawler Search" field searches more than 15,000 web crawlers and 2,000 User-Agent strings. The results page shows the variations of each crawler name as described by webmasters, the bias toward each name, and the geographical distribution of the IP addresses associated with the User-Agent strings that contain the crawler name.
Example: http://botseer.ist.psu.edu/botsearch.jsp?query=googlebot

The "robots.txt" search field searches the robots.txt files that BotSeer has harvested from the Web and supports fielded search. A user can restrict the search to a particular field of robots.txt by adding a search modifier such as "botname", "disallow", "allow", "comments" or "site". "botname" searches the User-Agent field; "disallow", "allow" and "comments" search the corresponding fields; "site" searches the URL where the robots.txt file is located. If no search modifier is given, the system searches "botname", "comments" and "site" together.

By default, results are ranked by the PageRank score of the homepage of the website that hosts the robots.txt file. Documents with the same PageRank score are then ranked by the TF-IDF score computed by Lucene. When searching for a named robot, a user can also choose to rank the results by the bias statistics of that robot.
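
The default ordering described above (PageRank first, TF-IDF as the tie-breaker) can be sketched as a two-key sort. This is only an illustration: the Hit class, its field names, and the sample scores are hypothetical, and the tfidf value merely stands in for the score Lucene would compute.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    site: str        # URL of the site hosting the robots.txt file (hypothetical field)
    pagerank: float  # PageRank score of the site's homepage (hypothetical field)
    tfidf: float     # stand-in for the TF-IDF score computed by Lucene


def rank_hits(hits):
    """Order hits by PageRank (highest first); break ties by TF-IDF (highest first)."""
    return sorted(hits, key=lambda h: (-h.pagerank, -h.tfidf))


# Made-up scores, for illustration only.
hits = [
    Hit("a.example.com", 5, 0.2),
    Hit("b.example.com", 7, 0.1),
    Hit("c.example.com", 5, 0.9),
]
for hit in rank_hits(hits):
    print(hit.site, hit.pagerank, hit.tfidf)
```

In this sketch the TF-IDF score only matters among hits whose PageRank scores are equal, which matches the tie-breaking rule stated above.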

The "SourceCode" search field searches the source code and documentation of open source crawlers harvested from the Web. The results page shows an abstract of each document, a link to the cached document, and a link to the homepage of the original open source project.

The "Google" search sends the user's query to Google with the modifier "inurl:robots.txt filetype:txt" added, which restricts the search to robots.txt files in Google's index. It is provided as a comparison with BotSeer's own "robots.txt" search.
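
As a small sketch of how such a comparison query could be assembled, the function below adds the modifier to a user query and encodes it into a Google search URL. The function name is hypothetical, and the use of the standard https://www.google.com/search?q=... form is an assumption; how BotSeer itself builds the link is not stated here.

```python
from urllib.parse import urlencode

# Modifier quoted in the text above.
GOOGLE_MODIFIER = "inurl:robots.txt filetype:txt"


def google_comparison_url(query: str) -> str:
    """Build a Google search URL for `query` restricted to robots.txt files."""
    full_query = f"{query} {GOOGLE_MODIFIER}"
    # Assumption: Google's usual search endpoint with the query in the `q` parameter.
    return "https://www.google.com/search?" + urlencode({"q": full_query})


print(google_comparison_url("googlebot"))
```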

BotSeer supports the following query modifiers in robots.txt search:
Searching a specific robot
The modifier botname:keyword returns all websites whose robots.txt files mention the given robot name.

Example query: botname:googlebot


Site modifier
The modifier site:keyword limits the search to a specific subdatabase of BotSeer, which makes it possible to search within a particular domain. For example, site:gov limits the search to government websites.

Example query: site:gov


Checking the favorability of a robot
With the modifier favor:robot, BotSeer returns all websites that favor the robot. The modifier disfavor:robot works the same way for websites that disfavor it.

Example queries:

  • favor:googlebot
  • disfavor:googlebot

Searching specific rules
The modifier disallow:directory searches for all websites that disallow the specified directory. The modifier allow:directory works the same way for allowed directories.

Example queries:

  • allow:mail
  • disallow:mail
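
As a rough illustration of what a rule search has to match, the sketch below collects the Disallow and Allow paths from a robots.txt file and checks whether any of them contains a keyword. The function names, the sample file, and the plain substring match are assumptions made for illustration; BotSeer's own matching is done through its Lucene index and may tokenize paths differently.

```python
def rule_paths(robots_txt: str, field: str) -> list[str]:
    """Collect the values of every `field:` line (e.g. "disallow") in a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()       # drop trailing comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == field.lower():
            paths.append(value.strip())
    return paths


def matches_rule(robots_txt: str, field: str, keyword: str) -> bool:
    """True if any `field` path contains `keyword` (case-insensitive substring match)."""
    return any(keyword.lower() in path.lower() for path in rule_paths(robots_txt, field))


# Made-up robots.txt content, for illustration only.
sample = """User-agent: *
Disallow: /mail/
Allow: /public/   # open area
"""
print(matches_rule(sample, "disallow", "mail"))
print(matches_rule(sample, "allow", "mail"))
```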

Searching special fields in robots.txt files
The modifier contain:keyword accepts two keywords, "delay" and "universal". The query contain:delay returns all websites that use Crawl-Delay rules, and contain:universal returns all websites that have User-Agent: * rules.
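
The two conditions behind contain:delay and contain:universal can be sketched as simple checks on the text of a robots.txt file. The function name and the regular expressions are assumptions for illustration, not BotSeer's actual implementation.

```python
import re

# A Crawl-Delay rule anywhere in the file (field names are matched case-insensitively).
CRAWL_DELAY = re.compile(r"^[ \t]*crawl-delay[ \t]*:", re.IGNORECASE | re.MULTILINE)
# A "User-Agent: *" line, optionally followed by a comment.
UNIVERSAL = re.compile(
    r"^[ \t]*user-agent[ \t]*:[ \t]*\*[ \t]*(#.*)?$", re.IGNORECASE | re.MULTILINE
)


def contains(robots_txt: str, keyword: str) -> bool:
    """Evaluate a contain:keyword condition against the text of one robots.txt file."""
    if keyword == "delay":
        return bool(CRAWL_DELAY.search(robots_txt))
    if keyword == "universal":
        return bool(UNIVERSAL.search(robots_txt))
    raise ValueError(f"unsupported contain keyword: {keyword!r}")


# Made-up robots.txt content, for illustration only.
sample = "User-agent: *\nCrawl-delay: 10\nDisallow: /private/\n"
print(contains(sample, "delay"), contains(sample, "universal"))
```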

Combining the query modifiers
Query modifiers can be combined to perform advanced searches.

Example query: site:com favor:googlebot contain:delay
This query searches for all .com websites that favor googlebot and use Crawl-Delay rules.
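
A combined query can be thought of as a list of modifier:keyword pairs plus any bare keywords. The sketch below splits a query string that way; the function name and the whitespace-based tokenization are assumptions, and the set of modifier names is simply the ones mentioned on this page.

```python
# Modifier names mentioned on this page.
KNOWN_MODIFIERS = {
    "botname", "site", "favor", "disfavor",
    "disallow", "allow", "comments", "contain",
}


def parse_query(query: str):
    """Split a query into (modifier, keyword) pairs and bare keywords."""
    fielded, bare = [], []
    for token in query.split():
        name, sep, value = token.partition(":")
        if sep and value and name.lower() in KNOWN_MODIFIERS:
            fielded.append((name.lower(), value))
        else:
            bare.append(token)  # no recognized modifier: treat as a plain keyword
    return fielded, bare


print(parse_query("site:com favor:googlebot contain:delay"))
```

Under this reading, each pair narrows the result set further, and a bare keyword would be searched in "botname", "comments" and "site" together, as described for queries without a modifier.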



For more information on robots.txt, see:

http://robotstxt.org



© 2007-2008 BotSeer

Hosted by Penn State's College of Information Sciences and Technology

Search powered by Lucene