User agent

From Wikipedia, the free encyclopedia.

A user agent is the client application used with a particular network protocol; the phrase is most commonly used in reference to those that access the World Wide Web. Web user agents range from web browsers and search engine crawlers ("spiders") to screen readers and braille browsers used by people with disabilities.

When Internet users visit a website, a text string is generally sent to identify the user agent to the server. This string forms part of the HTTP request, in a header prefixed with User-Agent: (HTTP header names are case-insensitive, so User-agent: is equivalent), and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.
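As a minimal sketch of where this string travels, the Python snippet below prepares an HTTP request with an explicit User-Agent header using the standard library's urllib.request.Request. The bot name ExampleBot and its contact URL are hypothetical placeholders in the style of the bot strings listed later; nothing is sent over the network.

```python
from urllib.request import Request

# Hypothetical bot identity: the name, version and contact URL are
# placeholders, not a real crawler.
BOT_UA = "ExampleBot/0.1 (+http://www.example.com/bot.html)"

def build_request(url: str, user_agent: str = BOT_UA) -> Request:
    """Prepare (but do not send) an HTTP request carrying a User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.example.com/")
# urllib normalises header names with str.capitalize(), so the header is
# stored under "User-agent"; HTTP itself treats field names case-insensitively.
print(req.get_header("User-agent"))  # ExampleBot/0.1 (+http://www.example.com/bot.html)
```

Passing req to urllib.request.urlopen would actually send the request; if no User-Agent is supplied, urllib falls back to its own default string of the form Python-urllib/<version>.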

The user-agent string is one of the criteria by which crawlers can be excluded from certain pages or parts of a website using the "robots exclusion standard" (robots.txt). This allows webmasters who feel that certain parts of their website should not be included in the data gathered by a particular crawler, or that a particular crawler is using up too much bandwidth, to request that the crawler not visit those pages.
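The matching of robots.txt rules against a user-agent string can be illustrated with Python's standard urllib.robotparser module. The sketch below assumes a made-up robots.txt that bars Googlebot from /private/ and all other crawlers from /tmp/; the two bot strings come from the examples later in this article, and the www.example.com URLs are placeholders. Compliance is up to the crawler: the file only states the webmaster's request.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (assumed, not taken from a real site):
# Googlebot is kept out of /private/, every other crawler out of /tmp/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# can_fetch() compares rules against the product name before the first "/",
# so full user-agent strings can be passed unchanged.
googlebot = "Googlebot/2.1 (+http://www.google.com/bot.html)"
msnbot = "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

print(rp.can_fetch(googlebot, "http://www.example.com/private/a.html"))  # False
print(rp.can_fetch(googlebot, "http://www.example.com/tmp/a.html"))      # True
print(rp.can_fetch(msnbot, "http://www.example.com/tmp/a.html"))         # False
print(rp.can_fetch(msnbot, "http://www.example.com/private/a.html"))     # True
```

A crawler that matches a specific User-agent: group follows only that group, which is why Googlebot may still fetch /tmp/ in this example.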

User agent spoofing

At various points in its history, use of the Web has been dominated by one browser to the extent that many websites are designed to work with that particular browser, rather than according to standards from bodies such as the W3C and IETF. Such sites often include "browser sniffing" code, which alters the content sent out depending on the User-Agent string received. This can mean that less popular browsers are not sent complex content, even though they might be able to handle it correctly, or, in extreme cases, are refused all content. As a result, various browsers "cloak" or "spoof" this string, identifying themselves as something else to such detection code; often, the browser's real identity is then included later in the string.
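The following Python sketch shows, under simplifying assumptions, why spoofing defeats naive sniffing code. naive_sniff is a hypothetical function that looks only for the MSIE token; the two Safari strings are taken from the examples later in this article.

```python
def naive_sniff(user_agent: str) -> str:
    """Hypothetical server-side sniffing: only strings that mention MSIE get
    the full page; every other browser is sent a reduced version."""
    if "MSIE" in user_agent:
        return "full page"
    return "reduced page"

safari = ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) "
          "AppleWebKit/124 (KHTML, like Gecko) Safari/125")
safari_cloaked = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)"

print(naive_sniff(safari))          # reduced page
print(naive_sniff(safari_cloaked))  # full page: the spoofed string passes the check
```

The same browser receives different content purely because of the string it sends, which is exactly what cloaking exploits.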

The earliest example of this is Internet Explorer's use of a User-Agent string beginning "Mozilla/<version> (compatible; MSIE <version>...", in order to receive content intended for Netscape Navigator, its main rival at the time of its development. It should be stressed that this is not a reference to the open-source Mozilla browser, which was developed much later, but to the original codename for Navigator, which was also the name of the Netscape company mascot. This format of User-Agent string has since been copied by other user agents, partly because Explorer, in turn, came to dominate.

More recently, with Internet Explorer becoming by far the dominant browser, rivals such as Opera and Safari implemented systems whereby the user could select a false User-Agent string to send, such as that of a recent version of Explorer. Some, like Safari, duplicate the User-Agent string they are trying to spoof exactly. Others, like Opera, duplicate the User-Agent string but add the name of their own browser to the end. In Opera's case, this leads to a string containing three names and versions: first, the user agent claims to be "Mozilla" (i.e. Netscape Navigator); then, "MSIE" (Internet Explorer); and finally, the actual browser, such as "Opera".
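Because the real name is appended after the borrowed ones, code that wants the actual browser has to test the most specific tokens first. guess_browser below is a hypothetical heuristic sketch, not a complete or authoritative parser; its token list is an assumption that covers only the browsers in this article's examples.

```python
# Ordered from most to least specific. A cloaked browser keeps its real name
# later in the string, so "Opera" must be tested before "MSIE", and every
# named browser before the generic "Gecko" and the leading "Mozilla/".
TOKENS = [
    ("Opera", "Opera"),
    ("Konqueror", "Konqueror"),
    ("Safari", "Safari"),
    ("Firefox", "Firefox"),
    ("Netscape", "Netscape"),
    ("MSIE", "Internet Explorer"),
    ("Gecko", "Mozilla (Gecko)"),
]

def guess_browser(user_agent: str) -> str:
    """Return the first matching browser name, scanning TOKENS in order."""
    for token, name in TOKENS:
        if token in user_agent:
            return name
    if user_agent.startswith("Mozilla/"):
        # Only the Navigator-era "Mozilla" token is left.
        return "Netscape Navigator or compatible"
    return "unknown"

opera_cloaked = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.03 [en]"
print(guess_browser(opera_cloaked))  # Opera
```

A browser that copies another's string exactly, as Safari does, cannot be told apart this way: its cloaked string is reported as Internet Explorer.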

As of 2004, more websites are standards-compliant than at any earlier point in the history of the Web. However, outdated JavaScript that effectively locks out browsers other than Explorer or Navigator is still in use, especially on smaller, non-corporate websites. This is often blamed on voodoo programming: copying and pasting older code without understanding what effect it will have on the website.

Example user-agent strings

Browsers:

  • Internet Explorer 5.5 on Windows 2000: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
  • Internet Explorer 6.0 in MSN on Windows 98: Mozilla/4.0 (compatible; MSIE 6.0; MSN 2.5; Windows 98)
  • Konqueror 3.1 (French): Mozilla/5.0 (compatible; Konqueror/3.1; Linux 2.4.22-10mdk; X11; i686; fr, fr_FR)
  • Mozilla 1.6 on Linux: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
  • Mozilla Firefox 1.0 on Windows XP: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041210 Firefox/1.0
  • Netscape 4.8 on Windows XP: Mozilla/4.8 [en] (Windows NT 5.0; U)
  • Netscape 7 on Sun Solaris 8: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.0.1) Gecko/20020920 Netscape/7.0
  • Opera 6.03 on Windows 2000, cloaked as MSIE: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 6.03 [en]
  • Opera 7.23 on Windows 98: Opera/7.23 (Windows 98; U) [en]
  • Opera 8.00 on Windows XP: Opera/8.00 (Windows NT 5.1; U; en)
  • Safari v125 on Mac OS X: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/124 (KHTML, like Gecko) Safari/125
  • Safari v125 on Mac OS X, cloaked as MSIE: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2)

Bots:

  • Crawler for Ask Jeeves/Teoma: Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
  • Googlebot: Googlebot/2.1 (+http://www.google.com/bot.html)
  • Grub: Mozilla/4.0 (compatible; grub-client-1.4.3; Crawl your own stuff with http://grub.org)
  • MSN bot: msnbot/0.11 (+http://search.msn.com/msnbot.htm)
  • wget: Wget/1.9
  • Yahoo! Slurp: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
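Several of the bot strings above follow the convention of embedding a contact URL. A minimal Python sketch for extracting such a URL is shown below; contact_url is a hypothetical helper, and its regular expression assumes, for simplicity, that a URL ends at whitespace, a semicolon or a closing parenthesis, as in these examples.

```python
import re
from typing import Optional

# Simplifying assumption: a URL ends at whitespace, ";" or ")".
URL_RE = re.compile(r"https?://[^\s;)]+")

def contact_url(user_agent: str) -> Optional[str]:
    """Return the first http(s) URL found in a user-agent string, or None."""
    match = URL_RE.search(user_agent)
    return match.group(0) if match else None

print(contact_url("Googlebot/2.1 (+http://www.google.com/bot.html)"))  # http://www.google.com/bot.html
print(contact_url("Wget/1.9"))                                         # None
```

Strings such as Wget/1.9 or the Ask Jeeves/Teoma crawler carry no URL, and the pattern does not look for e-mail addresses, which some bots include instead.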
