Basic Web Page Structure

Reading Pages 81-85

At the time HTML was developed, there were a number of ways to mark up text. In this context, to mark up text means to put marks in it that say what it is supposed to look like. The term comes from the old days when handwritten text was marked up by an editor to tell the typesetter what to do. There were proprietary systems like Microsoft Word or other commercial word processors. There were free systems like LaTeX or troff. What they had in common was that they were very large and complex. What was needed was something that was simple and free.

Large publishing companies had handled this problem through the development of SGML, the Standard Generalized Markup Language. Rather than a markup system itself, it is a way to design markup systems. The author creates a Document Type Definition, or DTD, that describes all the elements of the markup language. Other pieces of software can then read the DTD and know what to do with the document.

The intent here was to separate the structure and content of the document from the appearance of the document. That is, you would mark bits of the text as section or chapter headers, but you wouldn't indicate that chapter titles should be in 16 point type using a Bazooka font, centered and green. This is important, because maybe the hardcover version of the book will be printed in color, but the paperback won't and also won't use the fancy fonts. HTML kind of mixes these two ideas together. Style sheets, which we will see later, help separate them a bit more. So, HTML is a markup language designed for web documents that was built using an SGML DTD.

The DTD lists all the elements, called tags, of the markup language and their relationships to each other. It also specifies which attributes, or options, are allowed on a tag. So let's start by looking at a very simple web page and examining its tags.


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<html>
<head>
	<title>Hello, World!</title>
</head>

<body>

<h1 align="center">
Hello, World!
</h1>


</body>
</html>

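When the browser displays this page, "Hello, World!" appears in the title bar of the window and again in the page itself as a large, bold, centered heading.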

While this is pretty simple, it shows the main features of HTML. There are many more tags than these, but all tags have some things in common. Most tags come in pairs, a start tag and an end tag. Every tag starts with a '<' and ends with a '>'. The start tag has the name of the tag immediately after the <. The end tag has a '/' just after the < and before the name. So, in the example above, for the html tag, the start tag is <html> and the end tag is </html>. Some tags have properties or attributes on them, like the h1 tag in the example.

Attributes all have the form

name="value"

There can be several attributes on a tag, and they are separated by spaces. They all come between the tag name and the >. The quotes are not needed if the value is a single word or number, but I like them as they make it clearer where the value begins and ends. Some tags don't need to have the end tag specified, but most do.
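
As a sketch of how several attributes fit on one tag, here is the h1 tag from the example with two more attributes added. The id and title attributes and their values are my own illustration and are not part of the example page. The second heading shows the unquoted form, which works here because center is a single word.

```html
<!-- Three attributes, separated by spaces, between the tag name and the > -->
<h1 align="center" id="greeting" title="A friendly greeting">
Hello, World!
</h1>

<!-- The same align attribute without quotes around its one-word value -->
<h1 align=center>
Hello, World!
</h1>
```

The title value contains spaces, so it has to be quoted.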

Between the start and end tags there can be text possibly mixed with other tags. These are the contents of the tag and everything between the start and end is controlled by that tag.
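
To make the idea of contents concrete, here is a small sketch of tags inside tags. The p (paragraph), b (bold) and i (italic) tags have not been introduced yet and are used here only for illustration.

```html
<!-- Everything between <p> and </p> is the contents of the p tag -->
<p>
This sentence has <b>bold text and <i>bold italic text</i></b> in it.
</p>
```

The i tag sits entirely inside the b tag, so its text is both bold and italic. Tags should be closed in the reverse of the order they were opened, so that each one is fully inside the tag that contains it.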

Let's start at the top and work our way through the tags in the example.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
This one is special: notice the '!' just after the <. It tells the browser some details about the version of HTML used in the document, in this case version 4.0 Transitional. The EN at the end says that the DTD itself is written in English; it does not describe the language of the page. This line was inserted by the HTML editor I used. It is formally correct to include it, but browsers will display the page without it.

The next tag is
<html>
All HTML documents need this tag. It just indicates that the document is written in HTML and the browser should display it like HTML. The last line in the file will be the
</html>
end tag.

The
<head>
tag indicates the start of information about the page. For now, the only thing that goes in here is the <title> tag. Later we will see some other tags that can be used here.

The title tag tells the browser what to display in the title bar of the browser window.
<title>Hello, World!</title>
Notice that the start tag, the title text and the end tag are all on the same line. They don't have to be. The browser doesn't care about extra spaces or even blank lines between tags, so you can spread out the HTML markup to make it easier to read. Within the text itself, runs of spaces and line breaks are treated as a single space, and the browser wraps the text to fit the window. It doesn't matter where you put the new lines. Also note that the title tags are inside the head tags.
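
As a small illustration, the two headings below should look the same in the browser even though they are laid out very differently in the file. This is only a sketch using the h1 tag from the example page.

```html
<!-- Compact: everything on one line -->
<h1 align="center">Hello, World!</h1>

<!-- Spread out: extra spaces, a line break inside the text, and blank lines -->
<h1 align="center">

    Hello,
    World!

</h1>
```

In the second heading, the line break and indentation between the two words are displayed as a single space.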

The body tag marks the start of the interesting part of the document.
<body>
Everything between the body start and end tags is what you will see in the browser window. There are a number of useful attributes to the body tag that we will see later.


<h1 align="center"> Hello, World! </h1>
This is the text that will be displayed in the browser window. The h1 tag marks text that is a first level heading. This means that it will be displayed in a larger font and in bold. It may look slightly different in different browsers because their default fonts may differ. Here the tag has an attribute. The align attribute controls the horizontal alignment of the text. In this example, we specified center, so the text is centered in the line.
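
Here is a sketch of other values the align attribute can take. The left and right values and the heading text are not used in the example page. In the HTML 4 Transitional DTD, align on a heading accepts left, center, right and justify.

```html
<!-- Each heading is placed according to its align value -->
<h1 align="left">Hello from the left</h1>
<h1 align="center">Hello from the center</h1>
<h1 align="right">Hello from the right</h1>
```

If align is left off, browsers normally put the heading on the left. This attribute is about appearance rather than structure, which is the kind of thing style sheets are meant to take over.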

Finally we have the closing tags. All HTML documents should end with
</body> </html>

This document, while simple, has the same basic structure that all web documents have. There are many other tags that can be used, with many attributes. We can also control the default behavior of the tags and, using JavaScript and style sheets, we can make the page's appearance change after it has been loaded into the browser.