This is the full text for Chapter 4 of CGI Programming 101.

Chapter 4: Processing Forms and Sending Mail

Most forms you create will send their data using the POST method. POST keeps the form data out of the URL, so it doesn't end up in browser history, bookmarks, or server logs the way GET data can. (The data itself is not encrypted unless the page is served over HTTPS.) You can also send more data with POST. In addition, your browser, web server, or proxy server may cache GET queries, but posted data is resent each time.

Your web browser, when sending form data, encodes the data being sent. Alphanumeric characters are sent as themselves; spaces are converted to plus signs (+); other characters (tabs, quotes, and so on) are converted to "%HH": a percent sign followed by two hexadecimal digits representing the ASCII code of the character. This is called URL encoding.
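
To make the encoding rules concrete, here is a minimal sketch of an encoder in Perl. The url_encode subroutine is a hypothetical helper written for this illustration (it is not part of CGI.pm), and it is simplified: real browsers leave a few punctuation characters, such as hyphens and periods, unencoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# url_encode is a hypothetical helper, for illustration only.
sub url_encode {
    my ($text) = @_;
    # Replace everything except letters, digits and spaces with %HH,
    # where HH is the character's code in hexadecimal.
    $text =~ s/([^A-Za-z0-9 ])/sprintf("%%%02X", ord($1))/eg;
    # Spaces become plus signs.
    $text =~ tr/ /+/;
    return $text;
}

print &url_encode(qq{Tom & "Jerry"}), "\n";   # prints Tom+%26+%22Jerry%22
```

The ampersand becomes %26 and each double quote becomes %22, which is why a value containing & or = can't be confused with the separators between form fields.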

In order to do anything useful with the data, your program must decode it. Fortunately the CGI.pm module does this work for you. You access the decoded form values the same way you did with GET:

$value = param('fieldname');

So you already know how to process forms! You can try it now by changing your getform.html form to method="POST" (rather than method="GET"). You'll see that it works identically whether you use GET or POST. Even though the data is sent differently, CGI.pm handles it for you automatically.

The Old Way of Decoding Form Data

Before CGI.pm was bundled with Perl, CGI programmers had to write their own form-parsing code. If you read some older CGI books (including the first edition of this book), or if you're debugging old code, you'll probably encounter the old way of decoding form data. Here's what it looks like:

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $FORM{$name} = $value;
}

This code block reads the posted form data from standard input, loops through the fieldname=value fields in the form, and uses the pack function to do URL-decoding. Then it stores each fieldname/value pair in a hash called %FORM.
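
If you're trying to understand old code like this, it can help to watch it work on a known input. The sketch below runs the same loop over a hardcoded sample string instead of reading from standard input; the sample data is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A made-up POST body, hardcoded here instead of being read from STDIN.
my $buffer = 'name=Jane+Doe&comments=Hi%21+Nice+site%2E';

my %FORM;
foreach my $pair (split(/&/, $buffer)) {
    my ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;                                            # plus signs back to spaces
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;  # %HH back to characters
    $FORM{$name} = $value;
}

foreach my $name (sort keys %FORM) {
    print "$name = $FORM{$name}\n";
}
# prints:
# comments = Hi! Nice site.
# name = Jane Doe
```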

This code is deprecated and should be avoided; use CGI.pm instead. If you want to upgrade an old program that uses the above code block, you can replace it with this:

my %FORM;
foreach my $field (param()) {
    $FORM{$field} = param($field);
}

Or you could use the Vars function:

use CGI qw(:standard Vars);
my %FORM = Vars();

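Either method will replace the old form-parsing code, although keep in mind that neither handles forms that have multiple fields with the same name: the loop keeps only one value per field name, and Vars packs all of the values into a single string separated by null characters ("\0"). We'll look at how to handle those in the next chapter.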

Guestbook Form

One of the first CGI programs you're likely to want to add to your website is a guestbook program, so let's start writing one. First create your HTML form. The actual fields can be up to you, but a bare minimum might look like this:

<form action="post.cgi" method="POST">
Your Name: <input type="text" name="name"><br>
Email Address: <input type="text" name="email"><br>
Comments:<br>
<textarea name="comments" rows="5" cols="60"></textarea><br>
<input type="submit" value="Send">
</form>

Source code: http://www.cgi101.com/book/ch4/guestbook1.html

(Stylistically it's better NOT to include a "reset" button on forms like this. It's unlikely the visitor will want to erase what they've typed, and more likely they'll accidentally hit "reset" instead of "send", which can be an aggravating experience. They may not bother to re-fill the form in such cases.)

Now you need to create post.cgi. This is nearly identical to the get.cgi from last chapter, so you may just want to copy that program and make changes:

Program 4-1: post.cgi - Form Processing Program Using POST

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Thank You");
print h2("Thank You");

my %form;
foreach my $p (param()) {
    $form{$p} = param($p);
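    # escape the name and value so form input can't inject HTML into the page
    print escapeHTML($p), " = ", escapeHTML($form{$p}), "<br>\n";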
}
print end_html;

Source code: http://www.cgi101.com/book/ch4/post-cgi.html

Working example: http://www.cgi101.com/book/ch4/form.html

Test your program by entering some data into the fields, and pressing "send" when finished. Notice that the data is not sent in the URL this time, as it was with the GET example.

Of course, this form doesn't actually DO anything with the data, which doesn't make it much of a guestbook. Let's see how to send the data in e-mail.

Sending Mail

There are several ways to send mail. We'll be using the sendmail program for these examples. If you're using a non-Unix system (or a Unix without sendmail installed), there are a number of third-party Perl modules that you can use to achieve the same effect. See http://search.cpan.org/ (search for "sendmail") for a list of platform-independent mailers, and Chapter 14 for examples of how to install third-party modules. If you're using ActivePerl on Windows, visit http://www.cgi101.com/book/ch4/ for a link to more information about sending mail from Windows.

Before you can write your form-to-mail CGI program, you'll need to figure out where the sendmail program is installed on your webserver. (For cgi101.com, it's in /usr/sbin/sendmail. If you're not sure where it is, try doing which sendmail or whereis sendmail; usually one of these two commands will yield the correct location.)

Since we're using the -T flag for taint checking, the first thing you need to do before connecting to sendmail is set the PATH environment variable:

$ENV{PATH} = "/usr/sbin";

The path should be the directory where sendmail is located; if sendmail is in /usr/sbin/sendmail, then $ENV{PATH} should be "/usr/sbin". If it's in /var/lib/sendmail, then $ENV{PATH} should be "/var/lib".
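
One way to keep the two settings in step is to store the sendmail location once and derive the PATH from it. This is only a sketch: the location shown is an assumption, so substitute the path on your own server. The delete line follows a suggestion in perldoc perlsec for making the environment safer under taint mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Assumed location; change this to wherever sendmail lives on your server.
my $sendmail = '/usr/sbin/sendmail';

# The PATH is the directory portion of the sendmail location.
$ENV{PATH} = dirname($sendmail);

# perlsec also suggests removing these variables under taint mode.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

print "PATH is now $ENV{PATH}\n";   # prints PATH is now /usr/sbin
```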

Next you open a pipe to the sendmail program:

open (MAIL, "|/usr/sbin/sendmail -t -oi") or 
    die "Can't fork for sendmail: $!\n";

The pipe (which is indicated by the | character) causes all of the output printed to that filehandle (MAIL) to be fed directly to the /usr/sbin/sendmail program as if it were standard input to that program. Several flags are also passed to sendmail:

-t     Read the message for recipients. To:, Cc:, and Bcc: lines will be scanned for recipient addresses.
-oi    Ignore dots alone on lines by themselves in incoming messages.

The -t flag tells sendmail to look at the message headers to determine who the mail is being sent to. You'll have to print all of the message headers yourself:

my $recipient = '[email protected]';

print MAIL "From: sender\@cgi101.com\n";
print MAIL "To: $recipient\n";
print MAIL "Subject: Guestbook Form\n\n";

Remember that you can safely put an @-sign inside a single-quoted string, like '[email protected]', or you can escape the @-sign in double-quoted strings by using a backslash ("sender\@cgi101.com").

The message headers are complete when you print a single blank line following the header lines. We've accomplished this by printing two newlines at the end of the subject header:

print MAIL "Subject: Guestbook Form\n\n";

After that, you can print the body of your message.
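
To see the layout of a complete message without actually sending one, you can build it as a string and print it to the screen. This is a rough sketch: build_message is a hypothetical helper, and the example.com addresses are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# build_message is a hypothetical helper, for illustration only.
sub build_message {
    my ($from, $to, $subject, $body) = @_;
    return "From: $from\n"
         . "To: $to\n"
         . "Subject: $subject\n"
         . "\n"              # the blank line that ends the headers
         . "$body\n";
}

my $message = &build_message('nobody@example.com', 'webmaster@example.com',
                             'Guestbook Form', 'Great site!');
print $message;
```

Everything above the blank line is treated as a header; everything below it is the body of the message.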

Let's try it. Start a new file named guestbook.cgi, and edit it as follows. You don't need to include the comments in the following code; they are just there to show you what's happening.

Program 4-2: guestbook.cgi - Guestbook Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Results");

# Set the PATH environment variable to the same path
# where sendmail is located:

$ENV{PATH} = "/usr/sbin";

# open the pipe to sendmail
open (MAIL, "|/usr/sbin/sendmail -oi -t") or 
    &dienice("Can't fork for sendmail: $!\n");

# change this to your own e-mail address
my $recipient = '[email protected]';

# Start printing the mail headers
# You must specify who it's to, or it won't be delivered:

print MAIL "To: $recipient\n";

# From should probably be the webserver.

print MAIL "From: nobody\@cgi101.com\n";

# print a subject line so you know it's from your form cgi.

print MAIL "Subject: Form Data\n\n";

# Now print the body of your mail message.
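foreach my $p (param()) {
    # fetch the value in scalar context (one value per field);
    # newer versions of CGI.pm warn if param() is called in list context
    my $value = param($p);
    print MAIL "$p = $value\n";
}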

# Be sure to close the MAIL input stream so that the
# message actually gets mailed.

close(MAIL);

# Now print a thank-you page 

print <<EndHTML;
<h2>Thank You</h2>
<p>Thank you for writing!</p>
<p>Return to our <a href="index.html">home page</a>.</p>
EndHTML

print end_html;

# The dienice subroutine handles errors.

sub dienice {
    my($errmsg) = @_;
    print "<h2>Error</h2>\n";
    print "<p>$errmsg</p>\n";
    print end_html;
    exit;
}

Source code: http://www.cgi101.com/book/ch4/guestbook-cgi.html

Save and chmod the file, then modify your guestbook.html form so that the action points to guestbook.cgi:

<form action="guestbook.cgi" method="POST">

Try testing the form. If the program runs successfully, you'll get e-mail in a few moments with the results of your post. (Remember to change $recipient to your email address!)

Subroutines

In the guestbook program we used a new structure: a subroutine called "dienice." A subroutine is a user-defined function. You've already used functions like param and start_html from the CGI.pm module, and built-in functions like shift and pop. You can also define your own custom functions.

In the mail program, the dienice subroutine is only called if the program can't open the pipe to sendmail. Rather than aborting and giving you a server error (or worse, NO error), you want your program to give you some useful data about what went wrong; dienice does that, by printing the error message and closing HTML tags, and exiting the program. We'll be using the dienice subroutine throughout the rest of the book, as a generic catch-all error-handler.

Subroutines are useful for isolating blocks of code that are reused frequently in your program. The structure of a subroutine is as follows:

sub subname {
    # your code here
}

The subroutine block starts with the word sub, followed by the name of the subroutine. The code for the subroutine is then enclosed in curly braces { }.

Subroutines can be placed anywhere in your program, though for readability it's usually best to put them at the end, after the main program code.

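To invoke a subroutine, enter the subroutine name followed by parentheses containing an optional list of arguments:

subname();
subname(arguments);

(The parentheses matter: a bare subname; is rejected as a bareword under use strict unless the subroutine has already been defined earlier in the file.)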

You may prefix the subroutine name with an &-sign:

&subname();
&subname(arguments);

The &-sign is optional. However, we'll be using this syntax throughout the book to differentiate calls to subroutines we've written ourselves. Calls to built-in functions or functions provided by external modules will not have this sign. (Keep the parentheses even when there are no arguments: &subname; with no parentheses passes the caller's current @_ array along to the subroutine.)

Here is an example of a call to a subroutine named "mysub" with three arguments:

&mysub($arg1, "whatever", 23);

The arguments are passed to the subroutine in the special Perl array @_. You can then assign the elements of that array to special temporary variables, like so:

sub mysub {
    my($arg1, $arg2, $arg3) = @_;
    # your code here
}

In this example, the my function limits the scope of $arg1, $arg2 and $arg3 to the mysub subroutine. This keeps your temporary variables visible only to the subroutine itself (where they're actually needed and used), rather than to the entire program (where they're not needed). This also means if you change one of the variables inside your subroutine, the value of the original variable won't change (unless it's a reference, which we'll look at next).
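
Here is a small sketch of that scoping behavior. The add_shipping subroutine and its values are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# add_shipping is a hypothetical example subroutine.
sub add_shipping {
    my ($price, $shipping) = @_;
    $price = $price + $shipping;   # changes only the subroutine's own copy
    return $price;
}

my $price = 20;
my $total = &add_shipping($price, 5);

print "price = $price, total = $total\n";   # prints price = 20, total = 25
```

The $price inside the subroutine is a separate variable from the $price in the main program, so the original value is still 20 after the call.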

Passing Arrays and Hashes to Subroutines

When passing an array (or a hash) to a subroutine, the array is expanded into a list of its values. This might be okay if the array is the only argument:

&subname(@array1);

However if you have multiple arguments, you're going to run into problems:

&subname(@array1, $item2, $item3);

sub subname {
    my(@ary, $arg2, $arg3) = @_;
}

In this example, all of the arguments (including $item2 and $item3) are stored in @ary, and $arg2 and $arg3 are undefined. In order to pass the array or hash properly to the subroutine, you need to pass it as a reference, by prefixing the @ (or %) by a backslash:

&subname(\@array1, $item2, \%hash1);

sub subname {
    my($arrayref, $arg2, $hashref) = @_;
}

Now $arrayref is a reference to @array1, $arg2 is whatever the value of $item2 is, and $hashref is a reference to %hash1. To access individual elements of an array reference, instead of using $arrayref[1], you use $arrayref->[1]. Similarly with a hash reference you use $hashref->{key} instead of $hashref{key}.

A reference is a pointer to the original variable. If you change the value of an element of an array reference, you're changing the original array's values.

Optionally you could dereference the array inside your subroutine by doing:

my @localary = @{$arrayref};

A hash is dereferenced like so:

my %localhash = %{$hashref};

Assigning a dereferenced array (or hash) to a new variable makes a copy that is local to your subroutine, so you can change the values of @localary or %localhash without altering the original variables.
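
The difference between working through the reference and working on a copy can be sketched like this. The subroutine names and data are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Works through the reference: the caller's array is changed.
sub add_to_original {
    my ($arrayref) = @_;
    push @{$arrayref}, 'cherry';
}

# Works on a dereferenced copy: the caller's array is untouched.
sub add_to_copy {
    my ($arrayref) = @_;
    my @localary = @{$arrayref};
    push @localary, 'cherry';
    return scalar(@localary);   # number of items in the copy
}

my @fruit = ('apple', 'banana');

my $copy_size = &add_to_copy(\@fruit);
print "after add_to_copy: ", scalar(@fruit), " items (copy had $copy_size)\n";
# prints after add_to_copy: 2 items (copy had 3)

&add_to_original(\@fruit);
print "after add_to_original: @fruit\n";
# prints after add_to_original: apple banana cherry
```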

You can find out a lot more about references by reading perldoc perlref and perldoc perlreftut (the Perl reference tutorial).

Subroutine Return Values

Subroutines can return a value:

sub subname {
    # your code here
    return $somevalue;
}

If you omit the return statement, then the value returned by the subroutine is the value of the last expression executed in that routine.

If you want to save the return value, be sure to assign it to a variable:

my $result = &subname(arguments);

Subroutines can also return a list:

sub subname {
    # your code here
    return $value1, $value2, 'foo';
}

Which can then be assigned to a list of variables:

my ($x, $y, $z) = &subname(arguments);

Or an array:

my @x = &subname(arguments);
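
As a quick sketch of both forms, here is a made-up subroutine named min_max that returns two values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# min_max is a hypothetical example: it returns the smallest and
# largest of the numbers passed to it.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

# Assign the returned list to a list of variables...
my ($min, $max) = &min_max(7, 2, 9, 4);
print "min = $min, max = $max\n";           # prints min = 2, max = 9

# ...or to an array.
my @both = &min_max(7, 2, 9, 4);
print scalar(@both), " values returned\n";  # prints 2 values returned
```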

Return vs. Exit

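You'll notice that our dienice subroutine does not return a value at all, but rather calls the exit function. The difference matters: return leaves only the subroutine and hands control back to the code that called it, while exit causes the entire program to terminate immediately.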

Sendmail Subroutine

Here is an example of the mail-sending code in a compact subroutine:

sub sendmail {
    my ($from, $to, $subject, $message) = @_;
    $ENV{PATH} = "/usr/sbin";
    open (MAIL, "|/usr/sbin/sendmail -oi -t") or 
        &dienice("Can't fork for sendmail: $!\n");
    print MAIL "To: $to\n";
    print MAIL "From: $from\n";
    print MAIL "Subject: $subject\n\n";
    print MAIL "$message\n";
    close(MAIL);
}
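
To use a subroutine like this, you build the message body first and then pass everything in with a single call. The sketch below shows one way to do that; the %form data and the example.com addresses are placeholders, and the call itself is commented out so the example runs on its own without sending any mail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up form data standing in for the values you'd get from param().
my %form = (
    name     => 'Jane Doe',
    comments => 'Great site!',
);

# Build the message body, one "field = value" line per form field.
my $message = '';
foreach my $field (sort keys %form) {
    $message .= "$field = $form{$field}\n";
}

print $message;
# prints:
# comments = Great site!
# name = Jane Doe

# With the sendmail and dienice subroutines defined in your program,
# the call would look like this (placeholder addresses):
# &sendmail('nobody@example.com', 'webmaster@example.com', 'Form Data', $message);
```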

Sending Mail to More Than One Recipient

If you want to send mail to more than one email address, just add the desired addresses to the $recipient line, separated by commas:

my $recipient = '[email protected], [email protected], [email protected]';

Defending Against Spammers

When building form-to-mail programs, you need to take precautions to prevent spammers from hijacking your programs to send unwanted e-mail to other recipients. They can do this by writing their own form (or program) to send data to your CGI. If your program prints any of the form fields as mail headers without checking them first, the spammer can insert their own mail headers (and even their own message). The end result: your program becomes a relay for spammers.

The primary defense against this is to not allow the form to specify ANY of the mail headers (such as the From, To, or Subject headers). Note that in our guestbook program, the From, To and Subject headers were all hardcoded in the program.

Of course, it would be nice to have the "From" header show the poster's e-mail address. You could allow this if you validate it first, verifying that it's really an e-mail address and doesn't contain any extra headers. You can validate e-mail addresses by using a regular expression pattern match, which we'll cover in Chapter 13, or by using the Email::Valid module, which we'll look at in Chapter 14.
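
Until then, here is a rough sketch of the kind of check involved. The safe_from_address subroutine is a hypothetical helper, and its pattern is deliberately loose: it is not a complete e-mail validator. It only insists on a something@something.something shape with no whitespace, which is enough to reject a value that tries to smuggle in extra header lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# safe_from_address is a hypothetical helper, for illustration only.
# It returns the address if it looks plausible, or undef if it doesn't.
sub safe_from_address {
    my ($address) = @_;
    return unless defined $address;
    # \A and \z anchor the match to the very start and end of the string,
    # and \s excludes spaces, tabs, carriage returns and newlines.
    if ($address =~ /\A([^\s\@]+\@[^\s\@]+\.[^\s\@]+)\z/) {
        return $1;
    }
    return;
}

my $good = &safe_from_address('jane@example.com');
my $bad  = &safe_from_address("jane\@example.com\nBcc: victim\@example.org");

print defined $good ? "accepted: $good\n" : "rejected\n";   # prints accepted: jane@example.com
print defined $bad  ? "accepted: $bad\n"  : "rejected\n";   # prints rejected
```

If the check fails, the safest fallback is to keep using a hardcoded From address and put the visitor's address in the body of the message instead.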