Now that we know how to name machines and connect to them, how do we find something on that machine? The solution is to use a Uniform Resource Locator (URL). A URL is the name of a resource somewhere on the web. A resource could be a file, usually an HTML document, but it could also be an application like ftp or telnet, a file in another format, or a picture, sound or movie. As the essence of the web is the links between documents, we will spend some time studying the syntax and semantics of URLs.
Every URL has several parts. The first is the service. This specifies what kind of service we want from the server machine. For example, if we are requesting a document to be displayed in the browser, we use the http service. http, as you will recall, is the HyperText Transfer Protocol. Other services include ftp, telnet, net news and email. The service tells the server what to do to fulfill your request. In the case of the http service, the server looks at the rest of the URL to figure out the location of the file you are looking for. For other services, like ftp, it might start up an application.
After the service name, there is a separator of some sort. The actual separator varies with the service.
Service | Separator |
---|---|
http | :// |
ftp | :// |
news | : |
mailto | : |
telnet | :// |
The next part of the URL depends on the service. For http, this is the machine or host name, the same as we saw in the discussion on names. You could also use the IP address if you know it. You can also specify the port number. This is usually unneeded, as each service has a default port number and that is what most servers use. But in case they don't, the port number comes after the host name with a ':' (colon) character between them. So far, our URL would look like this.
http://www.xnet.com
This is a pretty common form of URL. If we put this in the browser, we would get the default page for the xnet server. Many web sites are accessed this way. This gets you the index page for the site and you can navigate from there.
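As a small sketch of the parts described so far, Python's standard urllib.parse module can split a URL into its service, host name and port. The host name is the one used above; the port number 8080 and the helper name describe are made up for illustration.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return the (service, host, port) pieces of a URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port,
    # which means the service's default port will be used.
    return parts.scheme, parts.hostname, parts.port


# The second URL adds a hypothetical, non-default port after a colon.
for url in ("http://www.xnet.com", "http://www.xnet.com:8080"):
    print(describe(url))
```

When no port is written in the URL, the port comes back as None, which matches the idea that the default port for the service is used.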
We have figured out how to contact the host where the web page we want is located. The next part of the URL specifies the location of the document on the host. So now, we have to discuss pathnames.
Files on the computer's disk are organized in a hierarchical structure called a tree. The top of the tree is called the root. Part of the reason software developers get paid well is because we call the top of a tree the root and thus make the whole discipline really confusing to most people.
On a Windows machine, the top of the tree is the disk drive, usually
represented by a letter, like C.
On my machine the structure of the class web site looks partly like this
The green boxes are directories or folders and the blue are files.
The directory tree on my machine that leads to this looks like.
A path is the list of directories we have to go through on our way from
one place in the tree to another.
A full path is the list starting at the root.
Each of the directories in the list is separated by a backslash on Windows
and a slash on Unix-based systems.
So the full pathname for the assignment one page is
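To make the separator difference concrete, here is a minimal sketch using Python's pathlib. The directory and file names (web, assignments, assign1.html) are hypothetical stand-ins, not the actual layout of the class web site.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical directory and file names, for illustration only.
parts = ["web", "assignments", "assign1.html"]

# On Windows the full path starts at the drive and uses backslashes.
windows_path = PureWindowsPath("C:\\", *parts)

# On Unix-based systems the full path starts at the root "/" and uses slashes.
unix_path = PurePosixPath("/", *parts)

print(windows_path)
print(unix_path)
```

The "pure" path classes only manipulate names, so this runs the same way on any operating system and never touches the disk.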
We can also use relative pathnames. These describe a route from one part of the tree to another. As we wander around in the tree, at any given time we are in some directory. We may at times want to use files that are in another directory. We could give full path names to the other files, but this causes problems if we ever move the tree as a whole. For example, if we move the web site from drive C to drive D, all the full pathnames would have to be changed.
In relative pathnames, the folder directly above us has a nickname of '..' or dot dot. So we can go up in the tree without mentioning the name of the upper folder. This way, if we move things as a whole, the relative pathnames don't change. They only change if we move things around in the tree, not if we move the tree. So, if we are in the notes directory and want to use a picture from the images directory, like we did in the section on images, we do it with relative path names like this.
We first notice that the images and notes folders are both part of the web folder. So to get from one to the other, we have to first go up out of the notes folder into the web folder. In relative pathnames, going up means using '..'. Then we have to go down again into the images folder and finally name the file we want.
The relative pathname of the jelly picture from the notes directory is
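The up-then-down route can be sketched with Python's posixpath module. The notes and images folders under web come from the text; the leading /web location and the file name jelly.gif are assumptions made for this example.

```python
import posixpath

# Assumed layout: /web/notes and /web/images are siblings under web.
# The file name jelly.gif is a guess used only for illustration.
current_dir = "/web/notes"
target = "/web/images/jelly.gif"

# Work out the route from the notes directory to the picture:
# one step up ('..'), then down into images.
relative = posixpath.relpath(target, start=current_dir)
print(relative)

# Going the other way: follow the relative path from the notes directory
# and confirm that we arrive at the same file.
resolved = posixpath.normpath(posixpath.join(current_dir, relative))
print(resolved)
```

If the whole web folder were moved somewhere else, current_dir and target would both change, but the relative route between them would stay the same.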
Use relative pathnames when you can. They not only don't have to be changed when the tree moves, they are also quicker to write, especially compared with giving a whole URL.
Putting all this together, we combine the machine name stuff with the pathname stuff to get this,
On the web server, my web site (everything under web) is located in a directory called public_html. This is a convention to allow a little shorthand, so I don't have to put a full path in the URL. When the server sees the ~mtnr2, it knows it is to look up the location of my home directory as a full path. It then appends the folder public_html to that, and the rest of the URL is appended to that. So after all is done, the full path it gets is something like
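The lookup described above can be sketched in a few lines of Python. This is only an illustration of the convention, not how any particular web server implements it. The function name expand_user_url_path, the /home location for home directories, and the notes/urls.html file are all hypothetical.

```python
import posixpath


def expand_user_url_path(url_path, home_root="/home"):
    """Map a URL path like /~user/rest to <home_root>/user/public_html/rest.

    home_root is an assumption; real servers look up each user's
    home directory rather than guessing it.
    """
    if not url_path.startswith("/~"):
        raise ValueError("expected a path starting with /~username")
    # Split "/~mtnr2/notes/urls.html" into "mtnr2" and "notes/urls.html".
    user, _, rest = url_path[2:].partition("/")
    # Home directory, then public_html, then the rest of the URL.
    return posixpath.join(home_root, user, "public_html", rest)


print(expand_user_url_path("/~mtnr2/notes/urls.html"))
```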
The anchor, or <a>, tag is the way that links are written into an HTML page. There are two variations on the anchor tag. They can be the source or the destination of the link. As the source, they are wrapped around the text (or other object) that the user clicks on. As the destination, they allow a person to link into a specific part of the document.
First the destination version.
Anywhere in the document that you want someone to be able to link to,
you can put an anchor tag.
This version of the tag has a very simple form
<a name="anchor">
It is often written without a closing tag, although strictly speaking HTML requires a matching </a>.
We will see in the source version how we use the name.
The source tag is a little more complicated. The href attribute specifies the URL of the web page we are linking to. It can be any URL using any service. You can also give the link a name that can be used as the destination of some other link. The target attribute is the name of the frame or window where the web page will appear. The text or other object (images are common) that will be highlighted is between the <a> and the </a> tags.
This example shows some variations on links.
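As a hedged illustration, the sketch below holds a few made-up anchor tags in a Python string and uses the standard html.parser module to list their attributes. The markup is written for this sketch and is not the original example from these notes; the file names, the anchor name top and the frame name main are assumptions. The last link uses the usual #name suffix on the URL to jump to a named destination anchor.

```python
from html.parser import HTMLParser

# Sample markup written for this sketch; URLs and names are made up.
SAMPLE = """
<a name="top"></a>
<a href="http://www.xnet.com">The xnet home page</a>
<a href="../images/jelly.gif">A picture</a>
<a href="notes.html#top" target="main">Top of the notes</a>
"""


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag that is fed to the parser."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self.anchors.append(dict(attrs))


collector = AnchorCollector()
collector.feed(SAMPLE)

for attrs in collector.anchors:
    # An anchor with href is the source of a link; one with only
    # a name is a destination.
    kind = "source" if "href" in attrs else "destination"
    print(kind, attrs)
```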