(X)HTML Tutorial

Introduction

Note: This web site is in its first draft form. Its content is now complete, but it may undergo further refinement.

HTML stands for Hypertext Markup Language. Markup is an old technology; it is just a system of codes (that is, a language) added to a text to indicate how it should be formatted or displayed. HTML is a markup language specifically designed for indicating how text (or other elements) should be displayed on web pages. HTML is primarily a descriptive coding language, rather than a procedural one; it does not issue commands for the computer to perform step-by-step tasks. It uses a system of tags, notations enclosed in angle brackets, which can be interpreted by the web browser. For instance, the tag <b> indicates that the browser should render the text in bold. One of the most significant features of HTML is its ability to indicate that some text elements are links to other web pages, so that clicking these elements causes the browser to navigate to the designated pages. This ability to link between pages (or to different parts of the same page) is what is meant by hypertext.
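
As a rough illustration of the two ideas above (a formatting tag and a hypertext link), here is a small fragment written in XHTML style. Only <b> is mentioned in the text so far; the <p> (paragraph) and <a> (link) tags and the address www.example.com are assumptions introduced for this sketch.

```xml
<!-- A minimal sketch; www.example.com is a placeholder address -->
<p>
  This sentence contains <b>bold text</b> and a
  <a href="http://www.example.com/page2.html">link to another page</a>.
</p>
```

A browser would typically show the words "bold text" in bold and present "link to another page" as a clickable link.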

The origins of HTML are in another markup system, Standard Generalized Markup Language (SGML), which was developed starting in the 1960s to aid in the sharing of electronic documents in government and a variety of industries. HTML is really an application of SGML that was adopted for use in web pages. It has gone through a number of versions since it was first developed around 1990. From the mid-1990s, its development took place under the auspices of the World Wide Web Consortium (W3C).

Originally, HTML was intended to be a semantic markup language, in which the meaning of the encoded data is emphasised, rather than its appearance. However, as web browsers became more and more capable, the list of HTML tags grew, and many of them (like the <b> tag mentioned above) referred only to appearance. In addition, browsers implemented these tags haphazardly, so that different browsers might display the coded data differently or ignore the tags altogether. In recent years, there has been an effort by the makers of web browsers to be more standards-compliant. However, there are now millions of web pages on the internet which use tags that don't meet these standards. As new browsers are produced, there is no guarantee that they will be able to read these old web pages. In response to these issues, there has been an increasing push to code according to standards, to use as little non-semantic tagging as possible, and to use a separate style sheet language, most commonly Cascading Style Sheets (CSS), for providing information to the browser about the style or appearance of the data.

Independently, in the 1990s, a new markup language was developed for use on the internet. Extensible Markup Language (XML) is another descendant of SGML, but it was intended to be entirely semantic. In other words, it contains no tags like <b>, but it can contain tags like <name>, which indicate that the data is a name, but not how a name should appear. In addition, the author of an XML file has the ability to invent his or her own tags to suit the data. This is what is meant by "extensible"; the markup language can be extended, which is particularly useful for storing and accessing information. XML was also intended to be browser independent. Since XML markup provides no information about the appearance of the data, it does not suffer from the inconsistencies of implementation or obsolete codes that plague HTML. However, XML does rely on other technologies (like CSS) to be integrated into web pages.
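
To make the idea of invented, purely semantic tags more concrete, here is a small sketch of what an XML file might look like. Apart from <name>, every tag name and all of the data below are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical tags: they say what each item is, not how it should look -->
<person>
  <name>Jane Smith</name>
  <occupation>Editor</occupation>
</person>
```

Nothing in this fragment tells a browser whether the name should be bold, large, or coloured; that decision is left to another technology.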

In order for XML-coded data to be displayed in a web page, it must be transformed into HTML, so that the browser can interpret it. Luckily, the syntax (coding conventions) of HTML and that of XML are very similar, reflecting their common origins. In the year 2000, the W3C released standards for an Extensible Hypertext Markup Language (XHTML). XHTML is basically HTML that strictly follows the coding practices of XML. The differences are slight. For instance, HTML is not case sensitive: <b> and <B> are the same. XML is case sensitive, so, in XHTML, tag names must always be written in lowercase. Other common features of XML and XHTML are discussed elsewhere in this tutorial.
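
The case rule can be sketched as follows. The <p> tag and the sample sentence are assumptions added for this example; the point is only that XHTML expects lowercase tag names and an opening and closing tag that match exactly.

```xml
<!-- Accepted by HTML, but not by XHTML: uppercase tag names -->
<!-- <B>important</B> -->

<!-- XHTML style: lowercase tag names, opening and closing tags match -->
<p>This word is <b>important</b>.</p>
```

Because XML is case sensitive, a pair such as <b> and </B> would be treated as two different tags, and the fragment would no longer be well formed.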

Since the differences between HTML and XHTML are so slight, they are often referred to collectively as (X)HTML. Unless otherwise stated, this tutorial assumes XHTML coding conventions.

How to Use This Tutorial

This tutorial is designed to introduce you to the concepts relevant to coding web pages and the issues that arise from these concepts, particularly in terms of how textual meaning is represented in digital format. Although it is intended as a teach-yourself (X)HTML tutorial, the actual (X)HTML codes for creating individual effects on web pages are introduced very gradually in small pieces. This makes learning (X)HTML much easier and more manageable, but it can also be frustrating, as you will frequently be told that the methods will be described in greater detail later in the tutorial. This is especially true where the W3C recommends that CSS be used. Don't be discouraged. Keep reading, and you will gradually build up the vocabulary to make sophisticated web pages.