HTML

From Wikipedia, the free encyclopedia.



In computing, HyperText Markup Language (HTML) is a markup language designed for the creation of web pages and other information viewable in a browser. HTML is used to structure information, denoting certain text as headings, paragraphs, lists and so on, and can be used to define the semantics of a document.

Originally defined by Tim Berners-Lee and further developed by the IETF with a simplified SGML syntax, HTML is now an international standard (ISO/IEC 15445:2000). Later HTML specifications are maintained by the World Wide Web Consortium (W3C).

Early versions of HTML were defined with looser syntactic rules, which helped its adoption by those unfamiliar with web publishing. Web browsers commonly made assumptions about the author's intent and proceeded with rendering the page. Over time, the trend in the official standards has been toward an increasingly strict language syntax; however, browsers still render pages that are far from valid HTML. The current version of the HTML specification is XHTML 1.0, which is very similar to the earlier HTML 4.01 that it replaces. The change from HTML to XHTML applies the stricter rules of XML to HTML to make it easier to process and maintain.

Introduction

HTML is a form of markup oriented toward the construction of single-page text documents that are rendered by specialized software called HTML user agents, the most common example of which is a web browser. HTML provides a means by which the document's content can be annotated with various kinds of metadata and rendering hints. The rendering cues may range from minor text decorations, such as specifying that a certain word be underlined or that an image be inserted, to sophisticated imagemaps and form definitions. The metadata may include information about the document's title and author; structural information such as headings, paragraphs, and lists; and information that allows the document to be linked to other documents to form a hypertext web.

HTML is a text-based format that is designed to be both readable and editable by humans using a text editor. However, writing and updating a large number of pages by hand in this way is time-consuming, requires a good knowledge of HTML, and can make consistency difficult to maintain. Visual HTML editors such as Macromedia Dreamweaver, Adobe GoLive, or Microsoft FrontPage allow the creation of web pages to be treated much like word processor documents. The code generated by these programs can be of poor quality, although the open-source visual HTML editor Nvu generates code of high quality.

HTML can also be generated on the fly by a server-side scripting system such as Perl, PHP, JSP, or ASP. Many web applications, such as content management systems, wikis, and web forums, generate HTML pages in this way.
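As a rough illustration of generating HTML on the fly, the following Python sketch builds a small page from ordinary program data. The function name render_page and the sample page structure are hypothetical and chosen only for this example; the one essential point is that dynamic text is escaped before being placed into the markup.

```python
from html import escape


def render_page(title, items):
    """Build a small HTML 4.01 Strict page from plain Python data.

    render_page is a hypothetical helper written for this illustration.
    """
    # Escape all dynamic text so characters such as < and & are not read as markup.
    list_items = "\n".join(f"      <li>{escape(item)}</li>" for item in items)
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
        '                      "http://www.w3.org/TR/html4/strict.dtd">\n'
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        "    <ul>\n"
        f"{list_items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )
```

Calling render_page("Recent changes", ["HTML", "Tom & Jerry"]) returns the page as a string, with the ampersand written as &amp;amp; in the output. Real systems usually rely on a template engine rather than string concatenation, but the principle is the same.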

Version history of the standard

There is no official HTML 1.0 specification because there were multiple informal HTML standards at the time, although some people consider the initial edition provided by Tim Berners-Lee to be the definitive HTML 1.0. That version did not include an IMG element type. The first formal specification was therefore given the version number 2.0 in order to distinguish it from these unofficial "standards". Work on a successor to HTML, then called "HTML+", began in late 1993; it was designed originally to be "A superset of HTML…which will allow a gradual rollover from the previous format of HTML". Work on HTML+ continued, but it never became a standard.

The HTML 3.0 standard was proposed by the newly formed W3C in March 1995, and provided many new capabilities such as support for tables, text flow around figures and the display of complex math elements. Even though it was designed to be compatible with HTML 2.0, it was too complex at the time to be implemented, and when the draft expired in September 1995 work in this direction was discontinued due to lack of browser support. HTML 3.1 was never officially proposed, and the next standard proposal was HTML 3.2 (code-named "Wilbur"), which dropped the majority of the new features in HTML 3.0 and instead adopted many browser-specific element types and attributes which had been created for the Netscape and Mosaic web browsers. Support for math as proposed by HTML 3.0 finally came about years later with a different standard, MathML.

HTML 4.0 likewise adopted many browser-specific element types and attributes, but at the same time began to try to "clean up" the standard by marking some of them as deprecated, and suggesting they not be used.

Minor editorial revisions to the HTML 4.0 specification were published as HTML 4.01.

The most common extension for files containing HTML is .html; however, older operating systems, such as DOS, limit file extensions to three letters, so the .htm extension is also used. Although perhaps less common now, the shorter form is still widely supported by current software.

Markup element types

Below are the kinds of markup element types in HTML.

  • Structural markup. Describes the purpose of text. For example,
<h2>Golf</h2>
directs the browser to render "Golf" as a second-level heading, similar to "Markup element types" at the start of this section. Structural markup does not denote any specific rendering, but most web browsers have standardised on how elements should be formatted. For example, by default, headings like these will appear in large, bold text. Further styling should be done with Cascading Style Sheets (CSS).
  • Presentational markup. Describes the appearance of the text, regardless of its function. For example,
<b>boldface</b>
will render "boldface" in bold text. In the majority of cases, using presentational markup is inappropriate, and presentation should be controlled by using CSS. In the case of both <b>bold</b> and <i>italic</i> there are elements which usually have an equivalent visual rendering but are more semantic in nature, namely <strong>strong emphasis</strong> and <em>emphasis</em> respectively. It is easier to see how an aural user agent should interpret the latter two elements.
  • Hypertext markup. Links parts of the document to other documents. For example,
<a href="http://wikipedia.org/">Wikipedia</a>
will render the word Wikipedia as a hyperlink to the specified URL.
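The three kinds of markup can be told apart mechanically. The sketch below uses Python's standard html.parser module to sort the start tags of a fragment into these groups. The sets STRUCTURAL and PRESENTATIONAL are illustrative, deliberately incomplete assumptions made for this example, not lists taken from any specification.

```python
from html.parser import HTMLParser

# Illustrative, deliberately incomplete groupings (assumptions of this sketch).
STRUCTURAL = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li", "strong", "em"}
PRESENTATIONAL = {"b", "i", "font"}


class MarkupClassifier(HTMLParser):
    """Collect the start tags of a document by kind of markup."""

    def __init__(self):
        super().__init__()
        self.found = {"structural": [], "presentational": [], "hypertext": []}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.found["hypertext"].append(href)
        elif tag in STRUCTURAL:
            self.found["structural"].append(tag)
        elif tag in PRESENTATIONAL:
            self.found["presentational"].append(tag)


def classify(html_text):
    """Return a dict mapping each kind of markup to what was found."""
    parser = MarkupClassifier()
    parser.feed(html_text)
    parser.close()
    return parser.found
```

Applied to the three examples above, classify reports h2 as structural, b as presentational, and the Wikipedia URL as a hypertext link.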

The Document Type Definition

In order to specify which version of the HTML standard they conform to, all HTML documents should start with a Document Type Declaration (informally, a "DOCTYPE"), which makes reference to a Document Type Definition (DTD). For example:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">

This declaration asserts that the document conforms to the Strict DTD of HTML 4.01, which is purely structural, leaving formatting to Cascading Style Sheets. In some cases, the presence or absence of an appropriate DOCTYPE may influence how a web browser will display the page.

In addition to the Strict DTD, HTML 4.01 provides Transitional and Frameset DTDs. The Transitional DTD was intended to gradually phase in the changes made in the Strict DTD, while the Frameset DTD was intended for those documents which contained frames.
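A program can read the declaration to find out which of the three HTML 4.01 DTDs a document claims to follow. The following Python sketch does so with the standard html.parser module; the names DoctypeFinder and html_401_variant are hypothetical, and the check is a simple match on the public identifier rather than a full SGML parse.

```python
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 DTDs.
HTML_401_DTDS = {
    "-//W3C//DTD HTML 4.01//EN": "Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "Transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "Frameset",
}


class DoctypeFinder(HTMLParser):
    """Remember the first DOCTYPE declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.declaration = None

    def handle_decl(self, decl):
        # Called with the text between "<!" and ">".
        if self.declaration is None and decl.lower().startswith("doctype"):
            self.declaration = decl


def html_401_variant(document):
    """Return "Strict", "Transitional", "Frameset", or None if unrecognised."""
    finder = DoctypeFinder()
    finder.feed(document)
    finder.close()
    if finder.declaration is None:
        return None
    for public_id, variant in HTML_401_DTDS.items():
        # Match the quoted identifier so that one identifier cannot match another.
        if f'"{public_id}"' in finder.declaration:
            return variant
    return None
```

For the example declaration shown earlier in this section, html_401_variant returns "Strict"; for a document with no DOCTYPE it returns None.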

Separation of style and content

Efforts of the web development community have led to new thinking about how a web document should be written; XHTML epitomizes this effort. Standards stress the use of markup that suggests the structure of the document, such as headings, paragraphs, block quoted text, and tables, instead of markup that is written for visual purposes only, such as <font>, <b> (bold), and <i> (italics). Some of these elements are not permitted in certain varieties of HTML, such as HTML 4.01 Strict. CSS provides a way to separate the HTML structure from the content's presentation by keeping all code dealing with presentation in a CSS file. See separation of style and content.

Serving HTML

The World Wide Web primarily uses HTTP to serve HTML documents to users. For this to work correctly, the document must be described correctly: the necessary metadata includes the MIME type (typically "text/html", although other choices include "application/xhtml+xml") and the character encoding (see Character encodings in HTML).
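In HTTP, both pieces of metadata are normally carried in the Content-Type response header. The following Python sketch shows one way to build and read such a header and to send it from a minimal handler based on the standard http.server module. The helper names, the sample page, and the choice of UTF-8 are assumptions made for this example.

```python
from email.message import Message
from http.server import BaseHTTPRequestHandler

# Hypothetical sample page used only by this sketch.
PAGE = "<html><head><title>Hello</title></head><body><p>Hello</p></body></html>"


def content_type_header(mime_type="text/html", charset="utf-8"):
    """Return a Content-Type header value naming the MIME type and encoding."""
    return f"{mime_type}; charset={charset}"


def parse_content_type(value):
    """Split a Content-Type value into (MIME type, charset or None), lower-cased."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


class HtmlHandler(BaseHTTPRequestHandler):
    """Answer every GET request with PAGE, described as UTF-8 encoded text/html."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", content_type_header("text/html", "utf-8"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try the handler, pass it to http.server.HTTPServer together with an address such as ("localhost", 8000) and call serve_forever(); the port number is arbitrary.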

HTML Email

HTML is also used in email messages. Many email clients include a GUI HTML editor for composing emails and a rendering engine for displaying them once received. The use of HTML in email is controversial for a variety of reasons. The main benefit is the ability to decorate an email with presentational attributes (bold headings, etc.). However, the drawbacks include:

  • the recipient may not have an email client that can display HTML
  • the message is larger, because heavy formatting takes up much more space than the plain-text equivalent; this is made slightly worse by the fact that, for compatibility, most clients send a plain-text version as well
  • overuse of formatting (at one stage there was a craze for making letterheads in HTML and sending them as part of every email)
  • potential security issues arising from deceiving the recipient into accepting an email as being from an authoritative source (such as a bank) when this is not the case; this is related to phishing scams
  • potential security issues inherent in simply rendering a complex format like HTML

For these reasons, many mailing lists deliberately block HTML email, either stripping out the HTML part to leave just the plain-text part or rejecting the entire message.
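The plain-text fallback and the stripping behaviour described above can be sketched with Python's standard email package. The function names and header values below are hypothetical and chosen only for this example; real mailing list software applies considerably more checks.

```python
from email.message import EmailMessage


def build_message(plain_text, html_text):
    """Create a multipart/alternative message with plain-text and HTML parts."""
    msg = EmailMessage()
    # Hypothetical header values, for illustration only.
    msg["Subject"] = "Example"
    msg["From"] = "sender@example.org"
    msg["To"] = "list@example.org"
    msg.set_content(plain_text)                     # text/plain part
    msg.add_alternative(html_text, subtype="html")  # text/html part
    return msg


def plain_text_only(msg):
    """Return the plain-text body, or None if the message has no such part.

    A mailing list could forward the returned text when it is present and
    reject the whole message when the result is None.
    """
    part = msg.get_body(preferencelist=("plain",))
    return None if part is None else part.get_content()
```

A message built with build_message carries both versions, so a client that cannot display HTML can fall back to the plain text, while plain_text_only recovers just that part.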

