Chapter 3: CGI Environment Variables
Environment variables are a series of hidden values that the web server sends to every CGI program you run. Your program can parse them and use the data they send. Environment variables are stored in a hash named %ENV:
Key | Value |
DOCUMENT_ROOT | The root directory of your server |
HTTP_COOKIE | The visitor's cookie, if one is set |
HTTP_HOST | The hostname of the page being attempted |
HTTP_REFERER | The URL of the page that called your program |
HTTP_USER_AGENT | The browser type of the visitor |
HTTPS | "on" if the program is being called through a secure server |
PATH | The system path your server is running under |
QUERY_STRING | The query string (see GET, below) |
REMOTE_ADDR | The IP address of the visitor |
REMOTE_HOST | The hostname of the visitor (if your server has reverse-name-lookups on; otherwise this is the IP address again) |
REMOTE_PORT | The port on the visitor's machine that the connection to the web server comes from |
REMOTE_USER | The visitor's username (for .htaccess-protected pages) |
REQUEST_METHOD | GET or POST |
REQUEST_URI | The interpreted pathname of the requested document or CGI (relative to the document root) |
SCRIPT_FILENAME | The full pathname of the current CGI |
SCRIPT_NAME | The interpreted pathname of the current CGI (relative to the document root) |
SERVER_ADMIN | The email address for your server's webmaster |
SERVER_NAME | Your server's fully qualified domain name (e.g. www.cgi101.com) |
SERVER_PORT | The port number your server is listening on |
SERVER_SOFTWARE | The server software you're using (e.g. Apache 1.3) |
Some servers set other environment variables as well; check your server documentation for more information. Notice that some environment variables give information about your server, and will never change (such as SERVER_NAME and SERVER_ADMIN), while others give information about the visitor, and will be different every time someone accesses the program.
Not all environment variables get set. REMOTE_USER is only set for pages in a directory or subdirectory that's password-protected via a .htaccess file. (See Chapter 20 to learn how to password protect a directory.) And even then, REMOTE_USER will be the username as it appears in the .htaccess file; it's not the person's email address. There is no reliable way to get a person's email address, short of asking them for it with a web form.
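Because some variables may be missing, it can help to check before you print. Here is a minimal sketch, assuming a helper named env_value (a made-up name, not part of Perl or CGI.pm) that falls back to a default string when the server did not set the variable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# env_value is a hypothetical helper, not part of Perl or CGI.pm.
# It returns the named environment variable, or the fallback string
# when the variable is unset or empty.
sub env_value {
    my ($name, $fallback) = @_;
    my $value = $ENV{$name};
    return (defined $value && length $value) ? $value : $fallback;
}

# The fallback strings here are arbitrary placeholders.
print "User = ",   env_value('REMOTE_USER',  '(not logged in)'), "\n";
print "Caller = ", env_value('HTTP_REFERER', '(none)'),          "\n";
```

With the -w flag turned on, printing an undefined value produces an "uninitialized value" warning, so a check like this also keeps your error log quieter.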
You can print the environment variables the same way you would any hash value:
print "Caller = $ENV{HTTP_REFERER}\n";
Let's try printing some environment variables. Start a new file named env.cgi:
Program 3-1: env.cgi - Print Environment Variables Program

#!/usr/bin/perl -wT
use strict;
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);

print header;
print start_html("Environment");

foreach my $key (sort(keys(%ENV))) {
    print "$key = $ENV{$key}<br>\n";
}

print end_html;
Source code: http://www.cgi101.com/book/ch3/env-cgi.html
Working example: http://www.cgi101.com/book/ch3/env.cgi
Save the file, chmod 755 env.cgi, then try it in your web browser. Compare the environment variables displayed with the table at the start of this chapter. Notice which values show information about your server and CGI program, and which ones give away information about you (such as your browser type, computer operating system, and IP address).
Let's look at several ways to use some of this data.
Referring Page
When you click on a hyperlink on a web page, you're being referred to another page. The web server for the receiving page keeps track of the referring page, and you can access the URL for that page via the HTTP_REFERER environment variable. Here's an example:
Program 3-2: refer.cgi - HTTP Referer Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Referring Page");
print "Welcome, I see you've just come from $ENV{HTTP_REFERER}!<p>\n";
print end_html;
Source code: http://www.cgi101.com/book/ch3/refer-cgi.html
Working example: http://www.cgi101.com/book/ch3/ (click on refer.cgi)
Remember, HTTP_REFERER only gets set when a visitor actually clicks on a link to your page. If they type the URL directly (or use a bookmarked URL), then HTTP_REFERER is blank. To properly test your program, create an HTML page with a link to refer.cgi, then click on the link.
HTTP_REFERER is not a foolproof method of determining what page is accessing your program. It can easily be forged.
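Since the referer can be blank or forged, it is worth treating it like any other visitor-supplied data before printing it into a page. The following is only a sketch: referer_greeting and escape_html are made-up helper names, and the escaping covers just the four characters shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# escape_html is a hypothetical helper: it replaces the characters that
# have special meaning in HTML so they display as plain text.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, before other entities are added
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# referer_greeting is a hypothetical helper: it returns a generic greeting
# when no referer was sent, and an escaped one otherwise.
sub referer_greeting {
    my ($referer) = @_;
    return "Welcome!" unless defined $referer && length $referer;
    return "Welcome, I see you've just come from " . escape_html($referer) . "!";
}

print referer_greeting($ENV{HTTP_REFERER}), "\n";
```

Run from the command line (where no referer is set), this prints the generic greeting.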
Remote Host Name, and Hostname Lookups
You've probably seen web pages that greet you with a message like "Hello, visitor from (yourhost)!", where (yourhost) is the hostname or IP address you're currently logged in with. This is a pretty easy thing to do because your IP address is stored in the %ENV hash.
If your web server is configured to do hostname lookups, then you can access the visitor's actual hostname from the $ENV{REMOTE_HOST} value. Servers often don't do hostname lookups automatically, though, because it slows down the server. Since $ENV{REMOTE_ADDR} contains the visitor's IP address, you can reverse-lookup the hostname from the IP address using the Socket module in Perl. As with CGI.pm, you have to use the Socket module:
use Socket;
(There is no need to add qw(:standard) for the Socket module.)
The Socket module offers numerous functions for socket programming (most of which are beyond the scope of this book). We're only interested in the reverse-IP lookup for now, though. Here's how to do the reverse lookup:
my $ip = "209.189.198.102";
my $hostname = gethostbyaddr(inet_aton($ip), AF_INET);
There are actually two functions being called here: gethostbyaddr and inet_aton. gethostbyaddr is a built-in Perl function that returns the hostname for a particular IP address. However, it requires the IP address be passed to it in a packed 4-byte format. The Socket module's inet_aton function does this for you.
Let's try it in a CGI program. Start a new file called rhost.cgi, and enter the following code:
Program 3-3: rhost.cgi - Remote Host Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Socket;

print header;
print start_html("Remote Host");

my $hostname = gethostbyaddr(inet_aton($ENV{REMOTE_ADDR}), AF_INET);
print "Welcome, visitor from $hostname!<p>\n";

print end_html;
Source code: http://www.cgi101.com/book/ch3/rhost-cgi.html
Working example: http://www.cgi101.com/book/ch3/rhost.cgi
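Not every IP address has a hostname registered for it, and in that case gethostbyaddr returns an undefined value. One possible way to handle that is sketched below. The helper name host_or_ip is made up for this example, and it assumes IPv4 dotted-quad addresses only; anything else is returned unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;    # provides inet_aton and AF_INET

# host_or_ip is a hypothetical helper. It returns the hostname for a
# dotted-quad IPv4 address, or the input itself when the input is not
# a dotted quad or when the reverse lookup finds nothing.
sub host_or_ip {
    my ($ip) = @_;
    return '' unless defined $ip;

    # Only attempt the lookup on something shaped like 1.2.3.4
    return $ip unless $ip =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;

    my $packed = inet_aton($ip);
    return $ip unless defined $packed;

    my $hostname = gethostbyaddr($packed, AF_INET);
    return defined $hostname ? $hostname : $ip;
}

my $visitor = host_or_ip($ENV{REMOTE_ADDR});
print "Welcome, visitor from $visitor!\n" if length $visitor;
```

Falling back to the IP address means the greeting always has something to show, even when the lookup fails.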
Detecting Browser Type
The HTTP_USER_AGENT environment variable contains a string identifying the browser (or "user agent") accessing the page. Unfortunately there is no standard (yet) for user agent strings, so you will see a vast assortment of different strings. Here's a sampling of some:
DoCoMo/1.0/P502i/c10 (Google CHTML Proxy/1.0)
Firefly/1.0 (compatible; Mozilla 4.0; MSIE 5.5)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Mozilla/3.0 (compatible)
Mozilla/4.0 (compatible; MSIE 4.01; MSIECrawler; Windows 95)
Mozilla/4.0 (compatible; MSIE 5.0; MSN 2.5; AOL 8.0; Windows 98; DigExt)
Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; Hotbar 4.1.7.0)
Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; DigExt)
Mozilla/4.0 WebTV/2.6 (compatible; MSIE 4.0)
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.2) Gecko/20020924 AOL/7.0
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/85 (KHTML, like Gecko) Safari/85
Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718
Mozilla/5.0 (compatible; Konqueror/3.0-rc3; i686 Linux; 20020913)
NetNewsWire/1.0 (Mac OS X; Pro; http://ranchero.com/netnewswire/)
Opera/6.0 (Windows 98; U) [en]
Opera/7.10 (Linux 2.4.19 i686; U) [en]
Scooter/3.3
As you can see, sometimes the user agent string reveals what type of browser and computer the visitor is using, and sometimes it doesn't. Some of these aren't even browsers at all, like the search engine robots (Googlebot, Inktomi and Scooter) and an RSS reader (NetNewsWire). You should be careful about writing programs (and websites) that do browser detection. It's one thing to collect browser info for logging purposes; it's quite another to design your entire site exclusively for a certain browser. Visitors will be annoyed if they can't access your site because you think they have the "wrong" browser.
That said, here's an example of how to detect the browser type. This program uses Perl's index function to see if a particular substring (such as "MSIE") exists in the HTTP_USER_AGENT string. index is used like so:
index(string, substring);
It returns a numeric value indicating where in the string the substring appears, or -1 if the substring does not appear in the string. We use an if/else block in this program to see if the index is greater than -1.
Program 3-4: browser.cgi - Browser Detection Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Browser Detect");

my($ua) = $ENV{HTTP_USER_AGENT};

print "User-agent: $ua<p>\n";
if (index($ua, "MSIE") > -1) {
    print "Your browser is Internet Explorer.<p>\n";
} elsif (index($ua, "Netscape") > -1) {
    print "Your browser is Netscape.<p>\n";
} elsif (index($ua, "Safari") > -1) {
    print "Your browser is Safari.<p>\n";
} elsif (index($ua, "Opera") > -1) {
    print "Your browser is Opera.<p>\n";
} elsif (index($ua, "Mozilla") > -1) {
    print "Your browser is probably Mozilla.<p>\n";
} else {
    print "I give up, I can't tell what browser you're using!<p>\n";
}

print end_html;
Source code: http://www.cgi101.com/book/ch3/browser-cgi.html
Working example: http://www.cgi101.com/book/ch3/browser.cgi
If you have several different browsers installed on your computer, try testing the program with each of them.
We'll look more at if/else blocks in Chapter 5.
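The same index test can also be driven from a list instead of a chain of elsif branches. This is just an alternative sketch, not the book's program: detect_browser and @BROWSERS are names invented for the example, and the labels are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ordered list of [substring, label] pairs. Order matters: most browsers
# include "Mozilla" somewhere in their string, so it is checked last.
my @BROWSERS = (
    [ 'MSIE',     'Internet Explorer' ],
    [ 'Netscape', 'Netscape' ],
    [ 'Safari',   'Safari' ],
    [ 'Opera',    'Opera' ],
    [ 'Mozilla',  'Mozilla (probably)' ],
);

# detect_browser is a hypothetical helper. It returns the label of the
# first substring found in the user agent string, or 'unknown'.
sub detect_browser {
    my ($ua) = @_;
    $ua = '' unless defined $ua;
    foreach my $pair (@BROWSERS) {
        my ($substring, $label) = @$pair;
        return $label if index($ua, $substring) > -1;
    }
    return 'unknown';
}

print detect_browser('Opera/7.10 (Linux 2.4.19 i686; U) [en]'), "\n";   # Opera
print detect_browser('Scooter/3.3'), "\n";                              # unknown
```

Adding another browser then means adding one line to the list rather than another elsif branch.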
A Simple Form Using GET
There are two ways to send data from a web form to a CGI program: GET and POST. These methods determine how the form data is sent to the server.
With the GET method, the input values from the form are sent as part of the URL and saved in the QUERY_STRING environment variable. With the POST method, data is sent as an input stream to the program. We'll cover POST in the next chapter, but for now, let's look at GET.
You can set the QUERY_STRING value in a number of ways. For example, here are a number of direct links to the env.cgi program:
http://www.cgi101.com/book/ch3/env.cgi?test1
http://www.cgi101.com/book/ch3/env.cgi?test2
http://www.cgi101.com/book/ch3/env.cgi?test3
Try opening each of these in your web browser. Notice that the value for QUERY_STRING is set to whatever appears after the question mark in the URL itself. In the above examples, it's set to "test1", "test2", and "test3" respectively.
You can also process simple forms using the GET method. Start a new HTML document called envform.html, and enter this form:
Program 3-5: envform.html - Simple HTML Form Using GET

<html><head><title>Test Form</title></head>
<body>
<form action="env.cgi" method="GET">
Enter some text here:
<input type="text" name="sample_text" size=30>
<input type="submit"><p>
</form>
</body></html>
Working example: http://www.cgi101.com/book/ch3/envform.html
Save the form and upload it to your website. Remember you may need to change the path to env.cgi depending on your server; if your CGI programs live in a "cgi-bin" directory then you should use action="cgi-bin/env.cgi".
Bring up the form in your browser, then type something into the input field and hit return. You'll notice that the value for QUERY_STRING now looks like this:
sample_text=whatever+you+typed
The string to the left of the equals sign is the name of the form field. The string to the right is whatever you typed into the input box. Notice that any spaces in the string you typed have been replaced with a +. Similarly, various punctuation and other special non-alphanumeric characters have been replaced with a %-code. This is called URL-encoding, and it happens with data submitted through either GET or POST methods.
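To see what the browser is doing, here is a rough sketch of URL-encoding written in Perl. The function name url_encode is made up for this example, and it is a simplification that assumes plain ASCII text; real browsers follow slightly more detailed rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_encode is a hypothetical helper that mimics form URL-encoding
# for plain ASCII text.
sub url_encode {
    my ($text) = @_;
    # Replace everything except letters, digits, "-", "_", "." and spaces
    # with a percent sign followed by the two-digit hex character code.
    $text =~ s/([^A-Za-z0-9\-_. ])/sprintf("%%%02X", ord($1))/ge;
    # Spaces become plus signs.
    $text =~ tr/ /+/;
    return $text;
}

print url_encode('whatever you typed'), "\n";   # whatever+you+typed
print url_encode('Tom & Jerry?'), "\n";         # Tom+%26+Jerry%3F
```

Notice that a literal plus sign typed by the visitor is turned into %2B, which is how the receiving program can tell it apart from an encoded space.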
You can send multiple input data values with GET:
<form action="env.cgi" method="GET">
First Name: <input type="text" name="fname" size=30><p>
Last Name: <input type="text" name="lname" size=30><p>
<input type="submit">
</form>
This will be passed to the env.cgi program as follows:
$ENV{QUERY_STRING} = "fname=joe&lname=smith"
The two form values are separated by an ampersand (&). You can divide the query string with Perl's split function:
my @values = split(/&/,$ENV{QUERY_STRING});
split lets you break up a string into a list of strings, splitting on a specific character. In this case, we've split on the "&" character. This gives us an array named @values containing two elements: ("fname=joe", "lname=smith"). We can further split each string on the "=" character using a foreach loop:
foreach my $i (@values) {
    my($fieldname, $data) = split(/=/, $i);
    print "$fieldname = $data<br>\n";
}
This prints out the field names and the data entered into each field in the form. It does not do URL-decoding, however. A better way to parse QUERY_STRING variables is with CGI.pm.
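If you're curious what the decoding step involves, the two splits can be combined with a small decoder. This is only an illustrative sketch with made-up function names (url_decode and parse_query); it assumes plain ASCII data, and when a field name repeats it keeps only the last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_decode is a hypothetical helper: it turns "+" back into a space
# and each %XX code back into the character it stands for.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# parse_query is a hypothetical helper: it splits a query string on "&",
# then each pair on the first "=", and returns a hash of decoded values.
sub parse_query {
    my ($query) = @_;
    my %fields;
    return %fields unless defined $query;
    foreach my $pair (split /&/, $query) {
        my ($name, $data) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $data = '' unless defined $data;
        $fields{ url_decode($name) } = url_decode($data);
    }
    return %fields;
}

my %form = parse_query('fname=joe&lname=smith&note=hello+there%21');
foreach my $name (sort keys %form) {
    print "$name = $form{$name}\n";
}
```

The decoding happens after the splitting, so an encoded "&" or "=" inside a value (%26 or %3D) is not mistaken for a separator.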
Using CGI.pm to Parse the Query String
If you're sending more than one value in the query string, it's best to use CGI.pm to parse it. This requires that your query string be of the form:
fieldname1=value1
For multiple values, it should look like this:
fieldname1=value1&fieldname2=value2&fieldname3=value3
This will be the case if you are using a form, but if you're typing the URL directly then you need to be sure to use a fieldname, an equals sign, then the field value.
CGI.pm provides these values to you automatically with the param function:
param('fieldname');
This returns the value entered in the fieldname field. It also does the URL-decoding for you, so you get the exact string that was typed in the form field.
You can get a list of all the fieldnames used in the form by calling param with no arguments:
my @fieldnames = param();
param is NOT a Variable
param is a function call. You can't do this:
print "$p = param($p)<br>\n";
If you want to print the value of param($p), you can print it by itself:
print param($p);
Or call param outside of the double-quoted strings:
print "$p = ", param($p), "<br>\n";
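To see the difference for yourself without a web server, you can try a small experiment. In the sketch below, param is a stand-in written for this example (it is not CGI.pm's param), and the field names and values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for CGI.pm's param(), hard-coded so this sketch runs from
# the command line. The form data here is invented for the example.
my %fake_form = (fname => 'joe', lname => 'smith');
sub param {
    my ($name) = @_;
    return $fake_form{$name};
}

my $p = 'fname';

# Inside the quotes, only $p is interpolated; "param(" and ")" are
# treated as ordinary text.
my $wrong = "$p = param($p)<br>\n";

# Outside the quotes, param($p) is an actual function call.
my $right = "$p = " . param($p) . "<br>\n";

print $wrong;   # fname = param(fname)<br>
print $right;   # fname = joe<br>
```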
You won't be able to use param('fieldname') inside a here-document. You may find it easier to assign the form values to individual variables:
my $firstname = param('firstname');
my $lastname = param('lastname');
Another way would be to assign every form value to a hash:
my(%form);
foreach my $p (param()) {
    $form{$p} = param($p);
}
You can achieve the same result by using CGI.pm's Vars function:
use CGI qw(:standard Vars);
my %form = Vars();
The Vars function is not part of the "standard" set of CGI.pm functions, so it must be included specifically in the use statement.
Either way, after storing the field values in the %form hash, you can refer to the individual field names by using $form{'fieldname'}. (This will not work if you have a form with multiple fields having the same field name.)
Let's try it now. Create a new form called getform.html:
Program 3-6: getform.html - Another HTML Form Using GET

<html><head><title>Test Form</title></head>
<body>
<form action="get.cgi" method="GET">
First Name: <input type="text" name="firstname" size=30><br>
Last Name: <input type="text" name="lastname" size=30><br>
<input type="submit"><p>
</form>
</body></html>
Working example: http://www.cgi101.com/book/ch3/getform.html
Save and upload it to your webserver, then bring up the form in your web browser.
Now create the CGI program called get.cgi:
Program 3-7: get.cgi - Form Processing Program Using GET

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Get Form");

my %form;
foreach my $p (param()) {
    $form{$p} = param($p);
    print "$p = $form{$p}<br>\n";
}

print end_html;
Source code: http://www.cgi101.com/book/ch3/get-cgi.html
Save and chmod 755 get.cgi. Now fill out the form in your browser and press submit. If you encounter errors, refer back to Chapter 1 for debugging.
Take a look at the full URL of get.cgi after you press submit. You should see all of your form field names and the data you typed in as part of the URL. This is one reason why GET is not the best method for handling forms; it isn't secure.
GET is NOT Secure
GET is not a secure method of sending data. Don't use it for forms that send password info, credit card data or other sensitive information. Since the data is passed through as part of the URL, it'll show up in the web server's logfile (complete with all the data). Server logfiles are often readable by other users on the system. URL history is also saved in the browser and can be viewed by anyone with access to the computer. Private information should always be sent with the POST method, which we'll cover in the next chapter. (And if you're asking visitors to send sensitive information like credit card numbers, you should also be using a secure server in addition to the POST method.)
There may also be limits to how much data can be sent with GET. While the HTTP protocol doesn't specify a limit to the length of a URL, certain web browsers and/or servers may.
Despite this, the GET method is often the best choice for certain types of applications. For example, if you have a database of articles, each with a unique article ID, you would probably want a single article.cgi program to serve up the articles. With the article ID passed in by the GET method, the program would simply look at the query string to figure out which article to display:
<a href="article.cgi?id=22">Article Name</a>
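Because anyone can edit the URL, a program like this should check the ID before using it. Here is one possible sketch, not the article.cgi program itself: article_id is a made-up helper name, and the rule that an ID is one to nine digits is an assumption chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# article_id is a hypothetical helper. It returns the numeric article ID
# from a query string such as "id=22", or undef when the ID is missing
# or contains anything other than digits.
sub article_id {
    my ($query) = @_;
    return unless defined $query;
    # Accept "id=" followed by 1 to 9 digits, alone or among other fields.
    if ($query =~ /(?:^|&)id=(\d{1,9})(?:&|$)/) {
        return $1;
    }
    return;
}

foreach my $query ('id=22', 'id=22abc', 'page=2&id=7') {
    my $id = article_id($query);
    print "$query -> ", (defined $id ? "article $id" : "rejected"), "\n";
}
```

Accepting only digits means a mangled or malicious query string is rejected before it ever reaches the code that looks up the article.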
We'll be revisiting that idea later in the book. For now, let's move on to Chapter 4 where we'll see how to process forms using the POST method.