HTML

http | declaration | syntax | escaping | elements | attributes | selectors | properties | events | dom | versions

HTTP

An HTTP message consists of

  • a CRLF terminated first line
  • zero or more CRLF terminated message headers
  • a CRLF terminated blank line
  • an optional message body
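
The structure above can be sketched in Python. This is a minimal illustration, not a full parser: the function name parse_message is hypothetical, header line folding is ignored, and the body is returned as raw bytes without looking at Content-Length.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (first_line, headers, body)."""
    # The blank line (CRLF CRLF) separates the head from the optional body.
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("missing blank line after the headers")
    lines = head.decode("iso-8859-1").split("\r\n")
    first_line = lines[0]
    # Keep headers as a list of pairs because a header name may repeat.
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return first_line, headers, body


raw = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"not found"
)
first_line, headers, body = parse_message(raw)
```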

If the message is a request, the first line consists of the method, the Request-URI, and the HTTP version. Here is an example:

GET /reader/view HTTP/1.1

nc is a convenient command for sending raw HTTP requests to a server:

$ printf 'GET / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' | nc www.google.com 80

One can also use telnet to make an HTTP request:

$ telnet
telnet> open google.com 80
GET / HTTP/1.1
Host: google.com

^]
telnet> close
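
A raw request can also be sent from Python with the standard socket module. The sketch below is one possible equivalent of the nc and telnet sessions; the helper names build_request and fetch_raw are hypothetical, and the Host and Connection headers are added on the assumption that the server expects a well-formed HTTP/1.1 request and should close the connection after responding.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal GET request; every line ends with CRLF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so recv() sees EOF
        "",                   # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> bytes:
    """Send the request over a TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch_raw("www.google.com") needs network access; the first line of the returned bytes is the status line of the response.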

If the message is a response, the first line consists of the HTTP version, the status code, and the reason phrase. Here is an example:

HTTP/1.1 404 Not Found
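
The two kinds of first line can be told apart by whether the line begins with the HTTP version. A small sketch, assuming single spaces between the fields; the function name parse_first_line and the dictionary keys are hypothetical:

```python
def parse_first_line(line: str) -> dict:
    """Classify a first line as a request line or a status line."""
    if line.startswith("HTTP/"):
        # Status line: version, status code, reason phrase (may contain spaces).
        parts = line.split(" ", 2)
        return {
            "kind": "response",
            "version": parts[0],
            "status": int(parts[1]),
            "reason": parts[2] if len(parts) > 2 else "",
        }
    # Request line: method, Request-URI, version.
    method, uri, version = line.split(" ")
    return {"kind": "request", "method": method, "uri": uri, "version": version}


request = parse_first_line("GET /reader/view HTTP/1.1")
response = parse_first_line("HTTP/1.1 404 Not Found")
```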

The methods are

  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • OPTIONS
  • TRACE
  • CONNECT

A HEAD response is supposed to be the same as the GET response with the message body omitted.

POST is used to add data to an existing resource. RFC 2616 describes it as follows:

The actual function performed by the POST method is determined by the
server and is usually dependent on the Request-URI. The posted entity
is subordinate to that URI in the same way that a file is subordinate
to a directory containing it, a news article is subordinate to a
newsgroup to which it is posted, or a record is subordinate to a
database.

PUT should be used to create or replace a resource. From RFC 2616:

If the Request-URI refers to an already
existing resource, the enclosed entity SHOULD be considered as a
modified version of the one residing on the origin server. If the
Request-URI does not point to an existing resource, and that URI is
capable of being defined as a new resource by the requesting user
agent, the origin server can create the resource with that URI.

OPTIONS lets a client ask which communication options are supported. The server describes them in the response headers, for example by listing the supported methods in the Allow header. If the Request-URI is *, the request applies to the server as a whole rather than to a specific resource. Otherwise it applies to the resource specified.

TRACE causes the server to echo the request.

Regarding CONNECT, RFC 2616 says:

This specification reserves the method name CONNECT for use with a
proxy that can dynamically switch to being a tunnel (e.g. SSL
tunneling).

Declaration

How to declare an HTML5 document with a style sheet and some JavaScript:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>title</title>
    <link rel="stylesheet" href="style.css">
    <script src="script.js"></script>
  </head>
  <body>
    <!-- page content -->
  </body>
</html>

Syntax

A tag starts with '<' and ends with '>'.

An element usually consists of two tags around a block of text. The block of text can usually contain sub-elements.

In a start tag, the tag name immediately follows the '<'. In an end tag, the name immediately follows '</'. The start and end tag of an element have the same name.

Start tags can have attributes. These appear between the tag name and the closing '>'. They are separated from each other and the tag name by whitespace. Each attribute can consist of an attribute name and attribute value separated by an equals sign, or just a name. Whitespace is permitted around the equals sign. Attribute values can be single quote delimited or double quote delimited. An attribute value must be quoted if it contains a space or any of the characters "'`=<>

Tag names and attribute names are not case-sensitive.

Comments are of the form <!-- a comment -->.
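
These rules can be observed with the HTMLParser class from the Python standard library. The sketch below is an illustration only; the class name TagLogger and the sample markup are made up. It records start tags, end tags, and comments, and shows that names are reported in lower case, that whitespace around the equals sign is accepted, and that a name-only attribute has no value.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start tags, end tags, and comments in the order they are seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for name-only attributes.
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data))


parser = TagLogger()
parser.feed('<P CLASS = "note" hidden title=\'hi\'>text</P><!-- a comment -->')
parser.close()
events = parser.events
```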

Escaping

Character data.

Attribute values.
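
For the first two cases, a minimal sketch using html.escape from the Python standard library. The sample string is made up. With quote=False only &, <, and > are replaced, which is enough for character data; with quote=True the quote characters are replaced as well, which is needed inside a quoted attribute value.

```python
from html import escape

text = 'Tom & "Jerry" <3'

# Character data: replace &, <, and > only.
as_character_data = escape(text, quote=False)

# Attribute value: also replace double and single quotes.
as_attribute_value = escape(text, quote=True)

element = f'<p title="{as_attribute_value}">{as_character_data}</p>'
```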

CSS escaping.

JavaScript escaping.

Elements

Omission rules may have caveats (e.g. the html start tag cannot be omitted if the content starts with a comment).

Content may have ordering rules (e.g. head before body in html).

Content may have cardinality (e.g. title at most once in head).

A global attribute may have special semantics on a particular element (e.g. the title attribute on link).

TEXT: Text that is not inter-element whitespace.

TRANSPARENT: same content restrictions as parent

root element
element  category  tag omission  content    additional attributes
html     none      o o           head body  manifest
document metadata
element  category   tag omission  content          additional attributes
head     none       o o           %metadata
title    %metadata  - -           TEXT
base     %metadata  - f           empty            href target
link     %metadata  - f           empty            href crossorigin rel media hreflang type sizes
meta     %metadata  - f           empty            charset http-equiv name content
style    %metadata  - -           depends on type  media type
scripting
element   category                                      tag omission  content                                         additional attributes
script    %metadata %flow %phrasing %script-supporting  - -           depends on src and type                         src type charset async defer crossorigin
noscript  %metadata %flow %phrasing                     - -           in head: link style meta; in body: TRANSPARENT
template  %metadata %flow %phrasing %script-supporting  - -           complicated
sections
element category tag omission content additional attributes
body
section
nav
article
aside
h1
h2
h3
h4
h5
h6
header
footer
address
main
grouping content
element category tag omission content additional attributes
p
hr
pre
blockquote
ol
ul
li
dl
dt
dd
figure
figcaption
div
text-level semantics
element category tag omission content additional attributes
a
em
strong
small
s
cite
q
dfn
abbr
date
time
code
var
samp
kbd
sub
sup
i
b
u
mark
ruby
rt
rp
bdi
bdo
span
br
wbr
edits
element category tag omission content additional attributes
ins
del
embedded content
element category tag omission content additional attributes
img - f empty
iframe
embed
object
param
video
audio
source
track
canvas
map
area
svg
math
tabular data
element category tag omission content additional attributes
table
caption
colgroup
col
tbody
thead
tfoot
tr
td
th
forms
element category tag omission content additional attributes
form
fieldset
legend
label
input
button
select
datalist
optgroup
option
textarea
keygen
output
progress
meter
interactive elements
element category tag omission content additional attributes
details
summary
menuitem
menu

Attributes

These attributes can be used on all elements:

accesskey
class
contenteditable
dir
draggable
dropzone
hidden
id
lang
spellcheck
style
tabindex
title
translate

Selectors

                    dom                            css selector   xpath
by id               getElementById('foo')          #foo           //*[@id='foo']
by class            getElementsByClassName('foo')  .foo           //*[@class='foo']
by tag              getElementsByTagName('div')    div            //div
by attribute                                       [title]        //*[@title]
by attribute value                                 [title="foo"]  //*[@title='foo']
union                                              h1, h2         //h1 | //h2
child                                              .foo > li      //*[@class='foo']/li
descendant                                         .foo td        //*[@class='foo']//td
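
Most of the XPath column can be tried with xml.etree.ElementTree from the Python standard library. This is a sketch under several assumptions: the sample document is made up and must be well-formed XML, ElementTree implements only a subset of XPath (the union operator is not supported), and its paths are written relative to the root, so each one starts with a dot. Note also that @class='foo' compares the whole attribute string, whereas the CSS selector .foo matches any element whose class list contains foo.

```python
import xml.etree.ElementTree as ET

DOC = """
<html>
  <body>
    <h1 id="foo" title="greeting">Hello</h1>
    <ul class="foo">
      <li>one</li>
      <li>two</li>
    </ul>
    <table class="foo">
      <tr><td>a</td><td>b</td></tr>
    </table>
    <div title="foo">box</div>
  </body>
</html>
"""

root = ET.fromstring(DOC)


def select(path: str) -> list:
    """Return the tag names of the elements matched by an ElementTree path."""
    return [element.tag for element in root.findall(path)]


by_id = select(".//*[@id='foo']")
by_class = select(".//*[@class='foo']")
by_tag = select(".//div")
by_attribute = select(".//*[@title]")
by_attribute_value = select(".//*[@title='foo']")
child = select(".//*[@class='foo']/li")
descendant = select(".//*[@class='foo']//td")
```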

Properties

Events

DOM

Versions

html 1.0 | html 2.0 | css1 | html 3.2 | html 4.01 | css2 | css3 | html5

root element
element html 1.0 html 2.0 html 3.2 html 4.01 html5
html X X X X
document metadata
element html 1.0 html 2.0 html 3.2 html 4.01 html5
head X X X X
title X X X X X
base X X X X
link X X X X
meta X X X X
style X X X
scripting
element html 1.0 html 2.0 html 3.2 html 4.01 html5
script X X X
noscript X X
template X
sections
element html 1.0 html 2.0 html 3.2 html 4.01 html5
body X X X X
section X
nav X
article X
aside X
h1 X X X X X
h2 X X X X X
h3 X X X X X
h4 X X X X X
h5 X X X X X
h6 X X X X X
header X
footer X
address X X X X X
main X
grouping content
element html 1.0 html 2.0 html 3.2 html 4.01 html5
p X X X X X
hr X X X X
pre X X X X
blockquote X X X X
ol X X X X
ul X X X X X
li X X X X X
dl X X X X X
dt X X X X X
dd X X X X X
figure X
figcaption X
div X X
text-level semantics
element html 1.0 html 2.0 html 3.2 html 4.01 html5
a X X X X X
em X X X X
strong X X X X
small X X
s D X
cite X X X X
q X X
dfn X X
abbr X X
date X
time X
code X X X X
var X X X X
samp X X X X
kbd X X X X
sub X X
sup X X
i X X X X
b X X X X
u D X
mark X
ruby X
rt X
rp X
bdi X
bdo X X
span X X
br X X X X
wbr X
edits
element html 1.0 html 2.0 html 3.2 html 4.01 html5
ins X X
del X X
embedded content
element html 1.0 html 2.0 html 3.2 html 4.01 html5
img X X X X
iframe X X
embed X
object X X
param X X X
video X
audio X
source X
track X
canvas X
map X X X X
area X X X X
svg X
math X
tabular data
element html 1.0 html 2.0 html 3.2 html 4.01 html5
table X X X X
caption X X X X
colgroup X X X X
col X X X X
tbody X X X X
thead X X X X
tfoot X X X X
tr X X X X
td X X X X
th X X X X
forms
element html 1.0 html 2.0 html 3.2 html 4.01 html5
form X X X X
fieldset X X
legend X X
label X X
input X X X X
button X X
select X X X X
datalist X
optgroup X X
option X X X X
textarea X X X X
keygen X
output X
progress X
meter X
interactive elements
element html 1.0 html 2.0 html 3.2 html 4.01 html5
details X
summary X
menuitem X
menu X X X D X
obsolete elements
element html 1.0 html 2.0 html 3.2 html 4.01 html5
acronym X
applet X D
basefont X D
big X
blink R
center X D
dir X X X D
font X D
frame F
frameset F
isindex X X X D
listing X X D
marquee R
nextid X X
noframes F
plaintext X X D
strike D
tt X X X
xmp X X D

D: deprecated
F: in separate frameset doctype
R: rejected

html 1.0

HTML Tags 1992

Berners-Lee had a working web browser and server by 1990. He first described some of the HTML elements in 1991 and the elements were described completely in 1992.

The 1992 document mentions that tags are not case sensitive and that closing tags are sometimes optional. The HTML entities &lt; &gt; and &amp; are defined.

html 2.0

RFC 1866: Hypertext Markup Language - 2.0
RFC 1867: Form-based File Upload in HTML
RFC 1942: HTML Tables
RFC 1980: Client-Side Image Maps

The HTML 2.0 standard was described by Berners-Lee in 1995. An SGML DTD was used to document the markup. The 2.0 specification adds HTML entities for non-ASCII characters in the ISO 8859-1 character set. It mentions that user agents should ignore elements with unrecognized tags. The HTML 2.0 standard introduced font tags: B, I, TT and phrase tags: CITE, CODE, EM, KBD, SAMP, STRONG, VAR.

The HTML 2.0 standard was also published by the IETF in November 1995 as RFC 1866, and RFC 1867 (form-based file upload) appeared the same month. The IETF published RFC 1942 and RFC 1980 in 1996. These RFCs covered tables and client-side image maps respectively and introduced elements not in the 1995 document by Berners-Lee.

css1

In December 1996 the W3C published CSS Level 1. It provided control over font, foreground and background color, and horizontal and vertical alignment. It also gave control over borders and the spacing outside (margin) and inside (padding) the border.

In order for a stylesheet to be used by an HTML page several additions had to be made which would not be incorporated into the HTML standard until 3.2 or 4.0.

The first change was the addition of the type attribute to the LINK element. This was added to the HTML standard in version 4.0.

<link rel=stylesheet type="text/css" href="http://foo.com/bar.css">

The next change was the addition of the STYLE element. The 3.2 standard defines it to contain CDATA content. A STYLE element can contain the styles directly or it can load them from an external document via the @import keyword.

@import url(http://style.com/basic);
H1 { color: blue }

The final change was the addition of the id, class, and style attributes to all elements. These were added to the HTML standard in version 4.0. The style attribute permits a style to be put on an element directly. The id and class attributes give stylesheets more fine-grained control over the elements to which they apply properties. Per the HTML 4.0 specification the value assigned to an id attribute must be unique in the document.

html 3.2

HTML 3.2 Reference Specification

In January 1997 the W3C published the HTML 3.2 standard. Two tags which had already been implemented by browsers, the Netscape BLINK tag and the Internet Explorer MARQUEE tag, were rejected. Four of the tags which were introduced in 3.2 would be deprecated in 4.0:

tag status
applet deprecated in 4.0
basefont deprecated in 4.0
blink rejected
center deprecated in 4.0
font deprecated in 4.0
marquee rejected
param
script
style

The elements listing, plaintext, and xmp from HTML 2.0 were marked as deprecated.

html 4.01

HTML 4.01 Specification

In December 1997 the W3C published the HTML 4.0 standard. It defines three document types: strict, transitional, and frameset. The HTML 4.01 standard was published in December 1999, and XHTML 1.0 became a recommendation in January 2000. Errata to the HTML 4.01 standard were last published in 2001.

css2

CSS Level 2 was first published in 1998. CSS Level 2 Revision 1 became a candidate recommendation in 2007. CSS2 added absolute, relative, and fixed positioning. For positioned elements it permitted the specification of stacking order with the z-index property.

Landmark Browsers

Standards are nice, but ultimately what matters is the capabilities of the browsers that people actually use.

The first browser was WorldWideWeb, which was implemented on the NeXT computer and never ported. The original web was intended to support editing documents in place, like a wiki. On the NeXT the NEXTID element was used to make that possible. Browsers written for other operating systems could not implement the functionality, however.

Mosaic (1993) was the first browser which could display images inline with text. The WorldWideWeb browser initially displayed them in separate windows. Mosaic was available for X Windows, Macintosh, Windows, and Amiga.

CGI and Forms circa 1993. Was Mosaic the first?

Netscape Navigator was the most popular browser from soon after its introduction in 1994 to 1998. It introduced the presentational elements that were added in HTML 3.2 and deprecated in favor of CSS in HTML 4.0. The following features all first appeared in Navigator:

feature date navigator version
HTTPS
cookies 1994 0.9beta
javascript 1995 2.0B3
frames 1996 2.0

Internet Explorer 5, released in 1999, introduced favicon.ico. In the Microsoft implementation the browser simply requested the resource favicon.ico. In 2007 the W3C recommended that the LINK element be used:

<link rel="icon" 
      type="image/png" 
      href="http://example.com/myicon.png">

Internet Explorer 5.0 was released in 1999. It was faster than Navigator 4. It had 50% market share by 2000 and 80% market share when IE6 was released in 2001. It was available on Macintosh and Solaris, unlike later versions of IE. The Macintosh version was said to be the first browser to implement 99% of CSS1. IE 5.5 (2000) upgraded the CSS capabilities of the Windows version.

Unfortunately, IE 5 departed from CSS1 when it comes to the box model. The specification states that margin, border, and padding should not be part of an element's height and width. IE 5 includes the padding and border in the element's height and width, however. The behavior was fixed in IE 6, but by that time a large number of web pages had been written which conformed to the incorrect behavior. As a result IE 6 would sometimes render a page in a "quirks mode" which used the IE 5 box model. Whether IE 6 uses "quirks mode" depends on the DOCTYPE declaration.
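
The difference between the two box models is simple arithmetic. The sketch below assumes the same padding and border width on the left and right sides; the function names and the model labels "css1" and "ie5" are hypothetical.

```python
def border_box_width(width: float, padding: float, border: float, model: str) -> float:
    """Width measured between the outer edges of the left and right borders."""
    if model == "css1":
        # CSS1: the declared width covers the content only.
        return width + 2 * padding + 2 * border
    if model == "ie5":
        # IE 5: the declared width already includes padding and border.
        return width
    raise ValueError(f"unknown box model: {model}")


def content_width(width: float, padding: float, border: float, model: str) -> float:
    """Width left for the content itself."""
    if model == "css1":
        return width
    if model == "ie5":
        return width - 2 * padding - 2 * border
    raise ValueError(f"unknown box model: {model}")


# A box declared with width 200, padding 10, and border 5:
css1_total = border_box_width(200, 10, 5, "css1")    # 230
ie5_total = border_box_width(200, 10, 5, "ie5")      # 200
css1_content = content_width(200, 10, 5, "css1")     # 200
ie5_content = content_width(200, 10, 5, "ie5")       # 170
```

The same declaration therefore renders 30 units narrower under the IE 5 model, which is why pages tuned for one model look wrong under the other.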

IE 5 could make HTTP requests and use the content in the currently rendered page without a reload. The library was called XMLHTTP in IE 5 and IE 6. The Mozilla foundation added the functionality to Gecko as part of the XMLHttpRequest library in 2002. Other browsers adopted XMLHttpRequest including IE 7 in 2006.

css3

The CSS3 specification is divided into modules. Three of these modules became recommendations in 2011: "Color", "Namespaces", and "Selectors Level 3". One module became a recommendation in 2012: "Media Queries".

html5

HTML5

Between 2002 and 2006, work was done on XHTML 2.0, but no recommendation was ever made.

The WHATWG began the work that became HTML5 in 2004, and the W3C published its first HTML5 working draft in 2008. It became a "candidate recommendation" in 2012.
