
World Wide Web

From Wikipedia, the free encyclopedia

The Web redirects here. For the 1942 comic book character, see The Web (comics).
For the world's first web browser, see WorldWideWeb.

The World Wide Web ("WWW" or simply the "Web") is a global information space which people can read from and write to via a wide variety of Internet-connected devices, such as computers, personal digital assistants, cellular phones and telephone kiosks. The World Wide Web is also available (sometimes only partially) through digital television services, which present Web content on television screens.

The term is often mistakenly used as a synonym for the Internet itself, but the Web is actually a service that operates over the Internet, just as e-mail does. The Web is the complete set of documents residing on all Internet servers that use the HTTP protocol, accessible to users via a simple point-and-click system.


Basic terms

The World Wide Web is the combination of four basic ideas:

  • Hypertext, that is, the ability, in a computer environment, to move from one part of a document to another, or from one document to another, through internal connections among these documents (called "hyperlinks");
  • Resource Identifiers, that is, the ability, on a computer network, to locate a particular resource (computer, document or other resource) on the network through a unique identifier;
  • The Client-server model of computing, in which client software or a client computer makes requests of server software or a server computer, which provides the client with resources or services, such as data or files; and
  • Markup language, in which characters or codes embedded in text indicate to a computer how to print or display the text, e.g. in italics, bold type or a particular font.

On the World Wide Web, a client program called a web browser retrieves information resources, such as web pages and other computer files, from web servers using their network addresses, and displays them, typically on a computer monitor, using a markup language that determines the details of the display. One can then follow hyperlinks in each page to other resources on the World Wide Web whose locations are provided by these hyperlinks. It is also possible, for example by filling in and submitting web forms, to send information back to the server in order to interact with it. The act of following hyperlinks is often called "browsing" or "surfing" the Web. Web pages are often arranged in collections of related material called "websites."
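As a rough illustration of this cycle (a minimal sketch only, not how any real browser is implemented), the following TypeScript fragment requests a page from an illustrative address, pulls the hyperlinks out of the returned markup and follows the first one. It assumes Node.js 18 or later for its built-in fetch.

  // A minimal sketch, not a real browser: fetch a page, find the hyperlinks
  // in its markup, and follow the first one. Requires Node.js 18+ (global fetch);
  // the starting address is illustrative.
  async function browse(startUrl: string): Promise<void> {
    // The client asks the web server for the resource named by the URL.
    const response = await fetch(startUrl);
    const html = await response.text();

    // Hyperlinks appear in the markup as href attributes. A real browser
    // parses the HTML fully; a regular expression is enough here.
    const links = [...html.matchAll(/href="([^"]+)"/g)].map((m) => m[1]);
    console.log(`Fetched ${startUrl} (${html.length} characters of HTML)`);
    console.log("Hyperlinks found:", links.slice(0, 5));

    // "Browsing" is simply repeating the request for one of those links,
    // resolved against the current page's address.
    if (links.length > 0) {
      const next = new URL(links[0], startUrl).toString();
      console.log("Following the first hyperlink to:", next);
      await fetch(next);
    }
  }

  browse("http://example.org/");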

The phrase "surfing the Internet" was first popularized in print by Jean Armour Polly, a librarian, in an article called Surfing the INTERNET, published in the Wilson Library Bulletin in June 1992. Although Polly may have developed the phrase independently, slightly earlier uses of similar terms have been found on Usenet from 1991 and 1992, and some recollections claim that it was also used verbally in the hacker community for a couple of years before that. Polly is famous as "NetMom" in the history of the Internet.

For more information on the distinction between the World Wide Web and the Internet itself (the two are sometimes confused in everyday use), see Dark internet, where this is discussed in more detail.

Although the English word worldwide is normally written as one word (without a space or hyphen), the proper name World Wide Web and abbreviation WWW are now well-established even in formal English. The earliest references to the Web called it the WorldWideWeb (an example of computer programmers' fondness for intercaps) or the World-Wide Web (with a hyphen, this version of the name is the closest to normal English usage).

Ironically, the abbreviation "WWW" is somewhat impractical as it contains three times as many syllables as the full term "World Wide Web", and thus takes longer to say; however it is easier to type.

How the Web works

When a viewer wants to access a web page or other "resource" on the World Wide Web, he or she normally begins either by typing the URL of the page into a web browser, or by following a hypertext link to that page or resource. The first step, behind the scenes, is for the server-name part of the URL to be resolved into an IP address by the global, distributed Internet database known as the Domain Name System (DNS).
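The following TypeScript sketch illustrates this resolution step using Node.js's built-in dns module; the URL is purely illustrative.

  // Resolve the server-name part of a URL into an IP address via DNS.
  // Uses Node.js's built-in dns module; the URL is illustrative.
  import { promises as dns } from "node:dns";

  async function resolveHost(url: string): Promise<void> {
    const hostname = new URL(url).hostname;          // e.g. "example.org"
    const { address, family } = await dns.lookup(hostname);
    console.log(`${hostname} resolves to ${address} (IPv${family})`);
  }

  resolveHost("http://example.org/index.html");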

The next step is for an HTTP request to be sent to the web server working at that IP address for the page required. In the case of a typical web page, the HTML text, graphics and any other files that form a part of the page will be requested and returned to the client (the web browser) in quick succession.
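At its simplest, such a request is a few lines of text sent over a TCP connection. The sketch below, using Node.js's net module with an illustrative host and path, shows roughly what a browser sends and prints whatever the server returns; real browsers add many refinements such as redirects, compression and persistent connections.

  // An HTTP/1.1 GET request written directly over a TCP connection, using
  // Node.js's net module. Host and path are illustrative; the raw response
  // (status line, headers, then the HTML) is printed as it arrives.
  import { connect } from "node:net";

  const socket = connect(80, "example.org", () => {
    socket.write(
      "GET /index.html HTTP/1.1\r\n" +
      "Host: example.org\r\n" +
      "Connection: close\r\n" +
      "\r\n"
    );
  });

  socket.on("data", (chunk) => process.stdout.write(chunk));
  socket.on("end", () => console.log("\n-- server closed the connection --"));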

The web browser's job is then to render the page as described by the HTML, CSS and other files received, incorporating the images, links and other resources as necessary. This produces the on-screen 'page' that the viewer sees.

Most web pages will themselves contain hyperlinks to other relevant and informative pages and perhaps to downloads, source documents, definitions and other web resources.

Such a collection of useful, related resources, interconnected via hypertext links, is what has been dubbed a 'web' of information. Making it available on the Internet produced what Tim Berners-Lee first called the World Wide Web in the early 1990s [1] [2].

Caching

If the user returns to a page fairly soon, it is likely that the data will not be retrieved from the source web server again. By default, browsers cache web resources on the local hard drive; on a return visit, the browser sends an HTTP request that asks for the data only if it has been updated since the last download. If it has not been, the cached version is reused in the rendering step.
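A conditional request of this kind might look like the following TypeScript sketch (assuming Node.js 18+ for the global fetch; the URL and the remembered date are illustrative). If the server replies "304 Not Modified", the cached copy is reused; otherwise the fresh body replaces it.

  // Revalidate a cached copy with a conditional GET. Requires Node.js 18+
  // (global fetch); the URL and the remembered Last-Modified date are illustrative.
  async function revalidate(url: string, lastModified: string): Promise<void> {
    const response = await fetch(url, {
      headers: { "If-Modified-Since": lastModified },
    });

    if (response.status === 304) {
      console.log("Not modified: reuse the cached copy.");
    } else {
      const fresh = await response.text();
      console.log(`Changed: re-downloaded ${fresh.length} characters and refreshed the cache.`);
    }
  }

  // The date would normally be the Last-Modified value stored alongside the cached copy.
  revalidate("http://example.org/style.css", "Tue, 01 Aug 2006 00:00:00 GMT");

Browsers also honour expiry information such as the Expires and Cache-Control headers, so a sufficiently fresh copy may be reused without any request being sent at all.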

This is particularly valuable in reducing the amount of web traffic on the Internet. The decision about expiry is made independently for each resource (image, stylesheet, JavaScript file, etc., as well as for the HTML itself). Thus, even on sites with highly dynamic content, many of the basic resources are supplied only once per session or less. It is worthwhile for a web site designer to collect all the CSS and JavaScript into a few site-wide files, so that they can be downloaded once into users' caches, reducing page download times and demands on the server.

There are other components of the Internet that can cache web content. The most common in practice are caching proxies built into corporate and academic firewalls, which cache web resources requested by one user for the benefit of all.

Apart from the facilities built into web servers that can ascertain when physical files have been updated, it is possible for designers of dynamically generated web pages to control the HTTP headers sent back to requesting users, so that pages that should not be cached (for example, Internet banking and news pages) are not.

This also helps in understanding the difference between the HTTP GET and POST verbs: data requested with a GET may be cached, if other conditions are met, whereas data obtained after POSTing information to the server usually will not be.
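The sketch below shows, in TypeScript using Node.js's built-in http module, how a dynamically generated site might set such headers; the port, paths and response bodies are all illustrative.

  // A toy dynamic web server that controls caching via HTTP headers,
  // using Node.js's built-in http module. Port, paths and bodies are illustrative.
  import { createServer } from "node:http";

  createServer((req, res) => {
    if (req.method === "GET" && req.url === "/headlines") {
      // A news-style page: tell browsers and proxies not to keep a copy.
      res.setHeader("Cache-Control", "no-store");
      res.setHeader("Content-Type", "text/html");
      res.end("<html><body>Latest headlines, regenerated on every request.</body></html>");
    } else if (req.method === "GET" && req.url === "/site.css") {
      // A site-wide stylesheet: allow it to be cached for a day.
      res.setHeader("Cache-Control", "public, max-age=86400");
      res.setHeader("Content-Type", "text/css");
      res.end("body { font-family: sans-serif; }");
    } else if (req.method === "POST") {
      // Responses to POSTed form data are generally not cached at all.
      res.setHeader("Cache-Control", "no-store");
      res.end("Form received.");
    } else {
      res.statusCode = 404;
      res.end("Not found");
    }
  }).listen(8080);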

Origins

This NeXTcube used by Berners-Lee at CERN became the first Web server.

The underlying ideas of the Web can be traced as far back as 1980, when Tim Berners-Lee, a Briton, built ENQUIRE (referring to Enquire Within Upon Everything, a book he recalled from his youth). While it was rather different from the Web we use today, it contained many of the same core ideas (and even some of the ideas of Berners-Lee's next project after the WWW, the Semantic Web).

In March 1989, Tim Berners-Lee wrote Information Management: A Proposal, which referenced ENQUIRE and described a more elaborate information management system. With help from Robert Cailliau, he published a more formal proposal for the World Wide Web on November 12, 1990. He began implementing those ideas immediately, on a recently acquired NeXT workstation.

By Christmas 1990, Berners-Lee had built all the tools necessary for a working Web [3]: the first Web browser (which was a Web editor as well), the first Web server and the first Web pages which described the project itself.

On August 6, 1991, he posted a short summary of the World Wide Web project on the alt.hypertext newsgroup. This date also marked the debut of the Web as a publicly available service on the Internet.

The crucial underlying concept of hypertext originated with older projects from the 1960s, such as Ted Nelson's Project Xanadu and Douglas Engelbart's oN-Line System (NLS). Both Nelson and Engelbart were in turn inspired by Vannevar Bush's microfilm-based "memex," which was described in the 1945 essay "As We May Think".

Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both technical communities, but when no one took up his invitation, he finally tackled the project himself. In the process, he developed a system of globally unique identifiers for resources on the Web and elsewhere: the Uniform Resource Identifier.

The World Wide Web had a number of differences from other hypertext systems that were then available:

  • The WWW required only unidirectional links rather than bidirectional ones. This made it possible for someone to link to another resource without action by the owner of that resource. It also significantly reduced the difficulty of implementing Web servers and browsers (in comparison to earlier systems), but in turn presented the chronic problem of broken links.
  • Unlike certain applications, such as HyperCard, the World Wide Web was non-proprietary, making it possible to develop servers and clients independently and to add extensions without licensing restrictions.

On April 30, 1993, CERN announced that the World Wide Web would be free to anyone, with no fees due. Coming two months after the announcement that the Gopher protocol was no longer free to use, this produced a rapid shift away from Gopher and towards the Web.

The World Wide Web finally gained critical mass with the 1993 release of the graphical Mosaic web browser, developed by Marc Andreessen and colleagues at the National Center for Supercomputing Applications (NCSA). Prior to the release of Mosaic, graphics were not commonly mixed with text in web pages, and the Web was less popular than older protocols in use over the Internet, such as the Gopher protocol and Wide Area Information Server (WAIS). Mosaic's graphical user interface allowed the Web to become by far the most popular Internet service.

Web standards

At its core, the Web is made up of three standards:

  • the Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Web, such as Web pages;
  • the HyperText Transfer Protocol (HTTP), which specifies how the browser and server communicate with each other; and
  • the HyperText Markup Language (HTML), used to define the structure and content of hypertext documents.

Berners-Lee now heads the World Wide Web Consortium (W3C), which develops and maintains these and other standards that enable computers on the Web to effectively store and communicate different forms of information.
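As a small illustration of the URI's role as a universal referencing system, the URL class built into browsers and Node.js splits a web address into its components: which protocol to speak, which server to contact, and which resource to ask for on that server. The address used here is illustrative.

  // The built-in URL class (in browsers and Node.js) splits a web address
  // into its components; the address is illustrative.
  const uri = new URL("http://example.org/wiki/World_Wide_Web?action=view#History");

  console.log(uri.protocol);   // "http:"                 which protocol to speak (HTTP)
  console.log(uri.hostname);   // "example.org"           which server to contact
  console.log(uri.pathname);   // "/wiki/World_Wide_Web"  which resource on that server
  console.log(uri.search);     // "?action=view"          extra parameters for the server
  console.log(uri.hash);       // "#History"              a fragment within the document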

Java and JavaScript

Another significant advance in the technology was Sun Microsystems' Java programming language. It initially enabled web servers to embed small programs (called applets) directly into the information being served; these applets would run on the end-user's computer, allowing faster and richer user interaction. Eventually, Java came to be more widely used as a tool for generating complex server-side content as it is requested. It never gained as much acceptance as Sun had hoped as a platform for client-side applets, for a variety of reasons, including lack of integration with other content (applets were confined to small boxes within the rendered page) and the poor performance (particularly the start-up delays) of Java virtual machines on the PC hardware of the time.

JavaScript, however, is a scripting language that was developed for web pages. The standardised version is ECMAScript. Although its name is similar to Java, JavaScript was developed by Netscape, not Sun Microsystems, and apart from a syntax derived from the C programming language it has almost nothing to do with Java. Like Java, JavaScript is object-oriented, but like C++ and unlike Java, it allows mixed code, both object-oriented and procedural. In conjunction with the Document Object Model (DOM), JavaScript has become a much more powerful language than its creators originally envisioned. Its usage is sometimes described by the term Dynamic HTML (DHTML), to emphasise a shift away from static HTML pages.
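A minimal sketch of this kind of Dynamic HTML, written in TypeScript for a browser environment, might look like the following; the element ids "greeting" and "toggle" are assumed to exist in the page and are purely illustrative.

  // Script in the page reacts to a user event and rewrites part of the
  // document through the DOM, with no round trip to the server. The element
  // ids are assumed to exist in the page and are illustrative.
  const greeting = document.getElementById("greeting");
  const toggle = document.getElementById("toggle");

  if (greeting && toggle) {
    toggle.addEventListener("click", () => {
      greeting.textContent = "The page just changed itself.";
      greeting.style.fontWeight = "bold";
    });
  }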

Ajax (Asynchronous JavaScript and XML) is a JavaScript-based technology that may have a significant effect on the development of the World Wide Web. By allowing only the part of a page that needs updating to be refreshed, rather than the whole page, Ajax makes such updates much faster and more efficient. Ajax is seen as an important aspect of Web 2.0. Examples of Ajax techniques currently in use can be seen in Gmail, Google Maps and elsewhere.
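The following TypeScript sketch shows the general Ajax pattern of updating one part of a page in the background. It uses the later fetch API and JSON rather than the original XMLHttpRequest object and XML, but the idea is the same; the endpoint "/inbox-count", the element id "unread" and the JSON shape are all illustrative.

  // Fetch a small piece of data in the background and update only one
  // element of the page. The endpoint, element id and JSON shape are illustrative.
  async function refreshUnreadCount(): Promise<void> {
    const response = await fetch("/inbox-count");            // asynchronous request
    const data: { unread: number } = await response.json();  // e.g. { "unread": 3 }

    const badge = document.getElementById("unread");
    if (badge) {
      badge.textContent = String(data.unread);                // only this element changes
    }
  }

  // Poll every 30 seconds; the rest of the page is never reloaded.
  setInterval(refreshUnreadCount, 30_000);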

Sociological implications

The Web, as it stands today, has allowed global interpersonal exchange on a scale unprecedented in human history. People separated by vast distances, or even large amounts of time, can use the Web to exchange — or even mutually develop — their most intimate and extensive thoughts, or alternately their most casual attitudes and spirits. Emotional experiences, political ideas, cultural customs, musical idioms, business advice, artwork, photographs, literature, can all be shared and disseminated digitally with less individual investment than ever before in human history. Although the existence and use of the Web relies upon material technology, which comes with its own disadvantages, its information does not use physical resources in the way that libraries or the printing press have. Therefore, propagation of information via the Web (via the Internet, in turn) is not constrained by movement of physical volumes, or by manual or material copying of information. And by virtue of being digital, the information of the Web can be searched more easily and efficiently than any library or physical volume, and vastly more quickly than a person could retrieve information about the world by way of physical travel or by way of mail, telephone, telegraph, or any other communicative medium.

The Web is the most far-reaching and extensive medium of personal exchange to appear on Earth. It has probably allowed many of its users to interact with many more groups of people, dispersed around the planet in time and space, than is possible when limited by physical contact or even when limited by every other existing medium of communication combined.

Because the Web is global in scale, some have suggested that it will nurture mutual understanding on a global scale. With such massive potential for social exchange, the Web has the potential to nurture empathy and symbiosis, but it also has the potential to incite belligerence on a global scale, or even to empower demagogues and repressive regimes in ways that were historically impossible to achieve.

Publishing web pages

The Web is available to individuals outside the mass media. In order to "publish" a web page, one does not have to go through a publisher or other media institution, and potential readers can be found in all corners of the globe.

Unlike books and documents, hypertext does not have a linear order from beginning to end. It is not broken down into the hierarchy of chapters, sections, subsections, etc.

Many different kinds of information are now available on the Web, and it has become easier for those who wish to learn about other societies, cultures and peoples. When travelling in a foreign country or a remote town, one might be able to find some information about the place on the Web, especially if the place is in one of the developed countries. Local newspapers, government publications and other materials are easier to access, and therefore the variety of information obtainable with the same effort may be said to have increased for users of the Internet.

Although some websites are available in multiple languages, many are in the local language only. Also, not all software supports all special characters or right-to-left languages. These factors challenge the notion that the World Wide Web will bring unity to the world.

The increased opportunity to publish materials is certainly observable in the countless personal pages, as well as pages by families, small shops, etc., facilitated by the emergence of free web hosting services.

Statistics

According to a 2001 study [4], there were more than 550 billion documents on the Web, mostly in the "invisible Web". A 2002 survey of 2,024 million web pages [5] determined that by far the most Web content was in English (56.4%); next were pages in German (7.7%), French (5.6%) and Japanese (4.9%). A more recent study [6], which used web searches in 75 different languages to sample the Web, determined that there were over 11.5 billion web pages in the publicly indexable Web as of January 2005.

Speed issues

Frustration over congestion issues in the Internet infrastructure and the high latency that results in slow browsing has led to an alternative name for the World Wide Web: the World Wide Wait. Speeding up the Internet is an ongoing discussion over the use of peering and QoS technologies. Other solutions to reduce the World Wide Wait can be found on W3C.

Academic conferences

The major academic event covering the WWW is the World Wide Web series of conferences, promoted by IW3C2. There is a list with links to all conferences in the series.

"www" in website names

There is no technical reason for a website's name to start with "www"; this is a common convention, just as many organizations once had their main public gopher site at gopher.wherever.edu and still have their public ftp servers at ftp.name.gov for example. Indeed, the first Web server was at info.cern.ch. Some organizations extend this convention by using the prefixes "www2", "www3", "www4", etc., for multiple related websites. Some browsers will automatically try adding "www." to the beginning, and possibly ".com" to the end, of typed URIs if a web page isn't found without them. With the Internet Explorer and Mozilla Firefox browsers, pressing the Control and Enter keys simultaneously will prefix 'www' and suffix '.com' to whatever has been typed into the address box.

no-www

The no-www initiative holds that the "www" prefix is an extraneous addition to a URL, and advocates its deprecation. At its website (no-www.org), the initiative offers validation for websites that are accessible without the "www" prefix, as well as advice for those who would like to make their websites so accessible.

Pronunciation of "www"

Most English-speaking people pronounce the 9-syllable letter sequence www used in some domain names for websites as "double U, double U, double U" despite shorter options like "triple double U", "triple dub" or even "World Wide Web" being available.

Some languages do not have the letter w in their alphabet (for example, Italian), which leads some people to pronounce www as "vou, vou, vou." In some languages (such as Czech, Finnish and Hungarian) the w is substituted by a v, so Czechs pronounce www as "veh, veh, veh" rather than the correct but much longer pronunciation "dvojité veh, dvojité veh, dvojité veh;" the same applies to Finnish, where the correct pronunciation would be "kaksoisvee, kaksoisvee, kaksoisvee". Similarly in Hungarian it is pronounced "vé, vé, vé" instead of "duplavé, duplavé, duplavé". Also in Norwegian, and similarly in Swedish and Danish: Instead of the correct "dobbel-ve, dobbel-ve, dobbel-ve" it is pronounced "ve, ve, ve". The pronunciation of "ve" instead of "dobbel-ve" is also used in other abbreviations. Several other languages (e.g. German, Dutch, Afrikaans etc.) simply pronounce the letter W as a single syllable, so this problem doesn't occur. In French "trois double-vé" is probably the most common pronunciation among non-geeks (geeks prefer the faster veu-veu-veu) in a way similar to Spanish, where www is often pronounced as "triple doble ve" or "triple uve doble" instead of "doble ve, doble ve, doble ve" or "uve doble, uve doble, uve doble".

In English pronunciation, saying the full words "World Wide Web" takes one-third as many syllables as saying the initialism "www". According to Berners-Lee, others mentioned this fact as a reason to choose a different name, but he persisted.

"[The World Wide Web is] the only thing I know of whose shortened form — www — takes three times longer to say than what it's short for."
Douglas Adams, The Independent on Sunday, 1999

Another way of saying "www" is w³, or "double-u to the power of three" ("power" because the 3 in w³ is superscripted). This is the basis of the name W3C; the original logo had a superscripted 3. However, this usage is uncommon. One further rendering is "all the double-Us".

In New Zealand and occasionally in Australia, "www" is often pronounced "dub-dub-dub". This is widely accepted (for example its use in TV commercials appears standard) and is more concise than some other renditions in English.

In the Southern United States the two syllable pronunciation of the letter w "dub-ya" is often used, resulting in "dub-ya dub-ya dub-ya", even when spoken by persons who would normally use the "standard English" three syllable pronunciation for a single letter w.

In the US, some uncommon pronunciations include "Triple dub," "Trip dub," and simply "wuh wuh wuh".

In Chinese, "World Wide Web" is commonly translated as wàn wéi wǎng (万维网), which retains both the three w's and the meaning of the original term, since wàn wéi wǎng means "ten-thousand-dimensional net" (or web).

Standards

The following is a cursory list of the documents that define the World Wide Web's three core standards:

  • RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, published by the IETF;
  • RFC 2616, Hypertext Transfer Protocol (HTTP/1.1), published by the IETF; and
  • the HTML 4.01 Specification, published as a W3C Recommendation.

See also

References

External links
