(X)HTML Tutorial

How Web Pages Work

A web page is a simple text file which also contains markup tags that describe how the text should be formatted on screen. The web page is stored on a computer known as a web server (server, for short). In order for the web page to be displayed on that computer or another computer, it must be accessed and interpreted by a specially designed program called the client software (client, for short). If the client software is on a computer other than the server, that computer is often also called the client. A web browser is a type of client software that is able to request web page code from a server over the internet, interpret the markup, and display it on the screen.
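To make the idea of "text plus markup tags" concrete, here is a minimal sketch in Python using the standard library's `html.parser` module. The page content in `PAGE` and the class name `MarkupSplitter` are invented for illustration. Real browsers do far more than this, so treat it as a rough picture of the first step a client takes when it interprets a page, not as how any particular browser works.

```python
from html.parser import HTMLParser

# A hypothetical, minimal web page: ordinary text plus markup tags.
PAGE = (
    "<html><body><h1>Hello</h1>"
    "<p>This is <em>plain text</em> with markup.</p>"
    "</body></html>"
)


class MarkupSplitter(HTMLParser):
    """Collect tag names and text separately (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []   # names of the opening tags, in document order
        self.text = []   # pieces of text found between the tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


splitter = MarkupSplitter()
splitter.feed(PAGE)
splitter.close()

print("Tags found:", splitter.tags)
print("Text found:", "".join(splitter.text))
```

The tags tell the client how to format the text; the text itself is what the reader ultimately sees on screen.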

In order to request a web page, a browser must follow a network protocol, a set of rules for how data should be transferred. One of the easiest network protocols to understand is File Transfer Protocol (FTP). In practical terms, an FTP client requests from the server an exact copy of a file and saves it on the client (or vice versa). Although FTP is not used for directly accessing the markup of web pages, FTP programs are an important part of web publishing, since web pages designed on a PC must be placed on a web server to be accessible from the internet.
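The file-copying behaviour described above can be sketched with Python's standard `ftplib` module. The function names `upload_page` and `download_page`, and all of their parameters (host, user name, password, file names), are hypothetical placeholders. The sketch assumes an FTP server that accepts a plain user name and password login, which may not match how your own web host is set up.

```python
from ftplib import FTP


def upload_page(host, user, password, local_path, remote_name):
    """Copy a local file to the server (hypothetical helper)."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        # Open in binary mode so the server receives an exact copy.
        with open(local_path, "rb") as source:
            ftp.storbinary(f"STOR {remote_name}", source)


def download_page(host, user, password, remote_name, local_path):
    """Copy a file from the server to the client (hypothetical helper)."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_path, "wb") as target:
            # retrbinary passes each received chunk to target.write.
            ftp.retrbinary(f"RETR {remote_name}", target.write)
```

Nothing is transferred until one of the functions is called with the details of a real server, for example `upload_page("ftp.example.com", "me", "secret", "index.html", "index.html")`, where every argument is a made-up value.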

Web browsers use a different network protocol called Hypertext Transfer Protocol (HTTP). To see how browsers request resources from servers, it is easiest to use a concrete example. Most browsers have a bar at the top of the screen which is called something like the "address bar". You enter the address of the network resource you are requesting in this bar. The address must be formatted in a certain way. It begins with an identification of the protocol being used (HTTP), followed by the location of the server (called the domain name), then the location on the server where the web page is stored, and finally the name of the web page itself. The address, given in this format, is known as a Uniform Resource Locator, or URL. Let's look at the URL of this page as an example, with a template to help identify the various parts of the URL:

protocol://domain/server-location/filename

http://www.csun.edu/~sk36711/WWW/tutorials/publishing.html

A few notes on each section:

  1. The protocol is followed by a colon and two forward slashes. You will sometimes see a variant of the "http" protocol called "https" (the s stands for "secure").
  2. The domain name is effectively an alias for the server's numeric Internet Protocol (IP) address, which is much harder to remember. In the example above, the domain name is for the server of California State University, Northridge.
  3. Following the domain name is the path to the file on the server. Each forward slash indicates a directory or folder. So the file "publishing.html" is inside the "tutorials" folder, which is inside the "WWW" folder, which is inside the folder called "~sk36711".
  4. The file names of web pages most commonly, but not always, end in .html, which makes them easy to identify as web pages.
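The same breakdown can be tried out in code. The following is a minimal sketch using Python's standard `urllib.parse` and `posixpath` modules. The function name `describe_url` and the dictionary keys are chosen here only to mirror the template above; they are not part of any standard.

```python
import posixpath
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the four parts named in the template (illustrative)."""
    parts = urlsplit(url)
    # Separate the directory path from the final file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "server-location": directory,
        "filename": filename,
        # Each forward slash in the path marks one level of folder nesting.
        "folders": directory.strip("/").split("/"),
    }


info = describe_url("http://www.csun.edu/~sk36711/WWW/tutorials/publishing.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

The `folders` entry lists the directories from the outermost to the innermost, which matches note 3 when read in reverse: the file is in "tutorials", inside "WWW", inside "~sk36711".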