Common Gateway Interface: CGI

As the web grew, developers needed a way for web pages to interact with other programs on the server machine. The Common Gateway Interface (CGI) was developed to do this. CGI is not part of HTTP itself. HTTP is the protocol that carries web page requests to the web server, and CGI is a separate standard that defines how the web server hands a request to an external program and collects that program's output. CGI allows the web page developer to ask the web server to run programs and return their results to the web browser. Information can be sent from the page back to the server and passed to the requested program.

We have already seen this in the action attribute of the form tag. The request to run a program comes in the form of a URL. This URL ends in the name of a program rather than an HTML file. Information can be passed to the program in two ways, selected by the form's method attribute: get and post.

With the get method, the data from the web page is appended to the URL after the program name, with a question mark separating the two. With the post method, nothing is appended to the URL. Instead, when the program is started, the web page data is delivered on its standard input, just as if it had been typed at the keyboard. The two methods also differ in how much data they can carry: because get data is part of the URL, it is subject to the URL length limits that browsers and servers impose, while post data is not limited in this way.
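A short sketch can make the difference concrete. It is written in Python rather than Perl purely for illustration, and it uses the standard library function urllib.parse.urlencode, which applies the encoding described in the next paragraph. The field values are the sample submission used later in this section; the variable names are our own.

```python
from urllib.parse import urlencode

# Form fields as (name, value) pairs, in the order they appear on the form.
fields = [
    ("box1", "kent's test-form;"),
    ("rad2", "radio2"),
    ("select1", "opt2"),
    ("secret", "hidden-data"),
]

# Both methods use the same encoded name=value&name=value string.
encoded = urlencode(fields)

program_url = "http://www.archie-perkins.com/teaching/devry/comp313/web/formtest.pl"

# get: the data is appended to the program's URL after a question mark.
get_url = program_url + "?" + encoded

# post: the URL is unchanged; the data travels in the request body,
# which the web server hands to the program on its standard input.
post_url = program_url
post_body = encoded.encode("ascii")

print(get_url)
print(post_url)
print(post_body)
```

Running it prints the get URL, which matches the URL shown later for this form, followed by the unchanged post URL and the body a post request would carry.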

In either case, the data consists of a series of name-value pairs. Each named element of a form has an associated value; the name and value are joined by an equals sign (=), and the pairs are separated by ampersands (&). Special characters are encoded because they either have a special meaning in a URL (such as &, =, and ?) or cannot appear in one at all. Spaces are replaced with plus signs. Other special characters are replaced with a percent sign followed by the character's ASCII code written as two hexadecimal digits.
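The encoding rules can be written out by hand. The following Python sketch is an illustration, not a production encoder: it assumes text is converted to UTF-8 bytes first (browsers actually use the page's character encoding), and it leaves only letters, digits, and the marks - _ . unencoded, a conservative choice that differs slightly from what some browsers and libraries do with characters such as ~ and *. The names encode_value and encode_form are hypothetical.

```python
import string

# Characters passed through unchanged (a conservative, assumed set).
UNRESERVED = set(string.ascii_letters + string.digits + "-_.")


def encode_value(text):
    """Encode one name or value: space -> '+', any other special
    character -> '%' followed by two hex digits per byte."""
    out = []
    for byte in text.encode("utf-8"):  # assumption: UTF-8 text
        ch = chr(byte)
        if ch == " ":
            out.append("+")
        elif ch in UNRESERVED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def encode_form(pairs):
    """Join encoded name=value pairs with ampersands."""
    return "&".join(encode_value(n) + "=" + encode_value(v) for n, v in pairs)


print(encode_value("kent's test-form;"))  # kent%27s+test-form%3B
```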

<form name="form1" method="get" action="../formtest.pl">
  <input name="box1" type="text">
  <table border="1">
    <tr>
      <td> radio 1<input type="radio" name="rad2" value="radio1"> </td>
      <td> radio 2<input type="radio" name="rad2" value="radio2"> </td>
      <td> radio 3<input type="radio" name="rad2" value="radio3"> </td>
    </tr>
  </table>
  <select name="select1">
    <option value="opt1"> option 1</option>
    <option value="opt2"> option 2</option>
    <option value="opt3"> option 3</option>
  </select>
  <input type="hidden" name="secret" value="hidden-data">
  <input type="submit" value="Submit">
</form>

I made these selections on the form:

Name      Value
box1      kent's test-form;
rad2      middle button selected
select1   second option selected

When we submit the form with these selections, the web browser sends the following URL to the web server.

http://www.archie-perkins.com/teaching/devry/comp313/web/formtest.pl?box1=kent%27s+test-form%3B&rad2=radio2&select1=opt2&secret=hidden-data
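
Let's look at the pieces of this URL. The first part,
http://www.archie-perkins.com/teaching/devry/comp313/web/formtest.pl
is the full URL of the program named in the action attribute. The form gave only the relative path ../formtest.pl, so the browser resolved it against the address of the page containing the form.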
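Splitting a URL at the question mark is a job for a URL parser. This Python sketch uses urllib.parse.urlsplit to separate the program URL from the data, and urllib.parse.urljoin to show how a relative action such as ../formtest.pl resolves. The address of the page holding the form is a hypothetical example, since the original page location is not given.

```python
from urllib.parse import urlsplit, urljoin

url = ("http://www.archie-perkins.com/teaching/devry/comp313/web/formtest.pl"
       "?box1=kent%27s+test-form%3B&rad2=radio2&select1=opt2&secret=hidden-data")

# Split the URL into scheme, host, path, and query components.
parts = urlsplit(url)
program = f"{parts.scheme}://{parts.netloc}{parts.path}"  # the program's URL
query = parts.query                                       # the data after '?'

# Hypothetical location of the page containing the form.
page_url = "http://www.archie-perkins.com/teaching/devry/comp313/web/examples/form.html"
# Resolve the form's relative action attribute against that page.
resolved = urljoin(page_url, "../formtest.pl")

print(program)
print(query)
print(resolved)
```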

After this is a question mark (?), which separates the program's URL from the data. After the question mark comes the encoded form data. It contains four name-value pairs, separated by ampersands:

Name      Value
box1      kent%27s+test-form%3B
rad2      radio2
select1   opt2
secret    hidden-data
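Splitting the data into pairs takes two steps: split on ampersands, then split each piece at its first equals sign. The sketch below (split_pairs is a hypothetical helper name) does this before any decoding, which matters: an ampersand typed by the user arrives as %26, so it cannot be confused with a separator until the value is decoded.

```python
def split_pairs(query):
    """Split an encoded query string into (name, value) pairs.
    Names and values are still encoded at this stage."""
    pairs = []
    for chunk in query.split("&"):
        if not chunk:
            continue  # tolerate a stray or trailing '&'
        name, _, value = chunk.partition("=")
        pairs.append((name, value))
    return pairs


query = "box1=kent%27s+test-form%3B&rad2=radio2&select1=opt2&secret=hidden-data"
for name, value in split_pairs(query):
    print(name, value)
```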

Each named form element that has a value contributes one name-value pair; the submit button has no name attribute, so it is not sent. The text in the text box is encoded as described above: the apostrophe (') becomes %27, where 27 is the character's ASCII code in hexadecimal, the space becomes a plus sign, and the semicolon (;) at the end becomes %3B. All the radio buttons are named rad2, and only the checked one is sent, carrying the value attribute of the button I selected. Similarly, I chose the second option, so the value of the select1 field is opt2, the value attribute of that option tag, not the text "option 2" shown between the start and end tags. Hidden fields are always passed along, which makes them useful for sending extra data to the CGI program without the user having to enter it.
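Decoding reverses the steps just described. This hand-written Python decoder is a sketch that assumes UTF-8 text: it turns + into a space, turns each %XX into the byte with that hexadecimal code, and leaves a malformed % sequence as it is. In practice, Python's urllib.parse.unquote_plus does the same job; decode_value is a hypothetical name.

```python
import string

HEX = set(string.hexdigits)


def decode_value(text):
    """Undo form encoding: '+' -> space, '%XX' -> byte 0xXX."""
    data = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            data.append(ord(" "))
            i += 1
        elif ch == "%" and i + 2 < len(text) and text[i + 1] in HEX and text[i + 2] in HEX:
            data.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Ordinary character, or a '%' not followed by two hex digits.
            data.extend(ch.encode("utf-8"))
            i += 1
    return data.decode("utf-8", errors="replace")  # assumption: UTF-8


print(decode_value("kent%27s+test-form%3B"))  # kent's test-form;
```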

When the web server receives this URL, it starts the program indicated. With the get method, the server stores the data string in the QUERY_STRING environment variable, which the program can read. With the post method, the program must read the data from its standard input; the server sets the CONTENT_LENGTH environment variable to the number of bytes to read. In either case, the program must break the string into name-value pairs, decode them, and then process them. Fortunately, libraries of subroutines that make this easier exist for most common languages. We will look at some examples in Perl, which has long been one of the most widely used languages for CGI programming.
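For comparison before the Perl examples, here is a minimal sketch of the same steps in Python. It checks REQUEST_METHOD, reads QUERY_STRING or CONTENT_LENGTH bytes of standard input as described above, and lets urllib.parse.parse_qsl split and decode the pairs. (Python's older cgi module was removed in Python 3.13, so it is not used here.) read_form_data is a hypothetical helper name, and the script simply echoes the pairs back as plain text.

```python
#!/usr/bin/env python3
import os
import sys
from urllib.parse import parse_qsl


def read_form_data(environ, stdin):
    """Return the raw encoded form string for a get or post request.
    stdin is a binary stream (sys.stdin.buffer in a real CGI run)."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The server tells us how many bytes of request body to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii", errors="replace")
    # get: the server puts everything after '?' in QUERY_STRING.
    return environ.get("QUERY_STRING", "")


def main():
    raw = read_form_data(os.environ, sys.stdin.buffer)
    pairs = parse_qsl(raw, keep_blank_values=True)  # split and decode
    # A CGI program writes its headers, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    for name, value in pairs:
        print(f"{name} = {value}")


if __name__ == "__main__":
    main()
```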