This is the full text for Chapter 6 of CGI Programming 101. For source code and links from this chapter, click here.

Chapter 6: Reading and Writing Data Files

As you start to program more advanced CGI applications, you'll want to store data so you can use it later. Maybe you have a guestbook program and want to keep a log of the names and email addresses of visitors, or a page counter that must update a counter file, or a program that scans a flat-file database and draws info from it to generate a page. You can do this by reading and writing data files (often called file I/O).

File Permissions

Most web servers run with very limited permissions; this protects the server (and the system it's running on) from malicious attacks by users or web visitors. On Unix systems, the web process runs under its own userid, typically the "web" or "nobody" user. Unfortunately this means the server doesn't have permission to create files in your directory. In order to write to a data file, you must usually make the file (or the directory where the file will be created) world-writable — or at least writable by the web process userid. In Unix a file can be made world-writable using the chmod command:

chmod 666 myfile.dat

To set a directory world-writable, you'd do:

chmod 777 directoryname

See Appendix A for a chart of the various chmod permissions.

Unfortunately, if the file is world-writable, it can be written to (or even deleted) by other users on the system. You should be very cautious about creating world-writable files in your web space, and you should never create a world-writable directory there. (An attacker could use this to install their own CGI programs there.) If you must have a world-writable directory, either use /tmp (on Unix), or a directory outside of your web space. For example if your web pages are in /home/you/public_html, set up your writable files and directories in /home/you.

A much better solution is to configure the server to run your programs with your userid. Some examples of this are CGIwrap (platform independent) and suEXEC (for Apache/Unix). Both of these force CGI programs on the web server to run under the program owner's userid and permissions. Obviously if your CGI program is running with your userid, it will be able to create, read and write files in your directory without needing the files to be world-writable.

The Apache web server also allows the webmaster to define what user and group the server runs under. If you have your own domain, ask your webmaster to set up your domain to run under your own userid and group permissions.

Permissions are less of a problem if you only want to read a file. If you set the file permissions so that it is group- and world-readable, your CGI programs can then safely read from that file. Use caution, though; if your program can read the file, so can the webserver, and if the file is in your webspace, someone can type the direct URL and view the contents of the file. Be sure not to put sensitive data in a publicly readable file.

Opening Files

Reading and writing files is done by opening a file and associating it with a filehandle. This is done with the statement:

open(filehandle,filename);

The filename may be prefixed with a >, which means to overwrite anything that's in the file now, or with a >>, which means to append to the bottom of the existing file. If both > and >> are omitted, the file is opened for reading only. Here are some examples:

open(INF,"out.txt");            # opens out.txt for reading
open(OUTF,">out.txt");          # opens out.txt for overwriting
open(OUTF,">>out.txt");         # opens out.txt for appending
open(FH, "+<out.txt");          # opens existing file out.txt for reading AND writing

The filehandles in these cases are INF, OUTF and FH. You can use just about any name for the filehandle.

Also, a warning: your web server might do strange things with the path your programs run under, so it's possible you'll have to use the full path to the file (such as /home/you/public_html/somedata.txt), rather than just the filename. This is generally not the case with the Apache web server, but some other servers behave differently. Try opening files with just the filename first (provided the file is in the same directory as your CGI program), and if it doesn't work, then use the full path.

One problem with the above code is that it doesn't check the return value of open to ensure the file was really opened. open returns nonzero upon success, or undef (which is a false value) otherwise. The safe way to open a file is as follows:

open(OUTF,">outdata.txt") or &dienice("Can't open outdata.txt for writing: $!");

This uses the "dienice" subroutine we wrote in Chapter 4 to display an error message and exit if the file can't be opened. You should do this for all file opens, because if you don't, your CGI program will continue running even if the file isn't open, and you could end up losing data. It can be quite frustrating to realize you've had a survey running for several weeks while no data was being saved to the output file.

The $! in the above example is a special Perl variable that stores the error code returned by the failed open statement. Printing it may help you figure out why the open failed.

Guestbook Form with File Write

Let's try this by modifying the guestbook program you wrote in Chapter 4. The program already sends you e-mail with the information; we're going to have it write its data to a file as well.

First you'll need to create the output file and make it writable, because your CGI program probably can't create new files in your directory. If you're using Unix, log into the Unix shell, cd to the directory where your guestbook program is located, and type the following:

touch guestbook.txt
chmod 622 guestbook.txt

The Unix touch command, in this case, creates a new, empty file called "guestbook.txt". (If the file already exists, touch simply updates the last-modified timestamp of the file.) The chmod 622 command makes the file read/write for you (the owner), and write-only for everyone else.

If you don't have Unix shell access (or you aren't using a Unix system), you should create or upload an empty file called guestbook.txt in the directory where your guestbook.cgi program is located, then adjust the file permissions on it using your FTP program.

Now you'll need to modify guestbook.cgi to write to the file:

Program 6-1: guestbook.cgi - Guestbook Program With File Write

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;

print header;
print start_html("Results");

# first print the mail message...

$ENV{PATH} = "/usr/sbin";
open (MAIL, "|/usr/sbin/sendmail -oi -t -odq") or 
   &dienice("Can't fork for sendmail: $!\n");
print MAIL "To: recipient\@cgi101.com\n";
print MAIL "From: nobody\@cgi101.com\n";
print MAIL "Subject: Form Data\n\n";
foreach my $p (param()) {
    print MAIL "$p = ", param($p), "\n";
}
close(MAIL);

# now write (append) to the file

open(OUT, ">>guestbook.txt") or &dienice("Couldn't open output file: $!");
foreach my $p (param()) {
    print OUT param($p), "|";
}
print OUT "\n";
close(OUT);

print <<EndHTML;
<h2>Thank You</h2>
<p>Thank you for writing!</p>
<p>Return to our <a href="index.html">home page</a>.</p>
EndHTML

print end_html;

sub dienice {
    my($errmsg) = @_;
    print "<h2>Error</h2>\n";
    print "<p>$errmsg</p>\n";
    print end_html;
    exit;
}

Source code: http://www.cgi101.com/book/ch6/guestbook-cgi.html

Working example: http://www.cgi101.com/book/ch6/guestbook.html

Now go back to your browser and fill out the guestbook form again. If your CGI program runs without any errors, you should see data added to the guestbook.txt file. The resulting file will show the submitted form data in pipe-separated form:

Someone|[email protected]|comments here

Ideally you'll have one line of data (or record) for each form that is filled out. This is what's called a flat-file database.

Unfortunately if the visitor enters multiple lines in the comments field, you'll end up with multiple lines in the data file. To remove the newlines, you should substitute newline characters (\n) as well as hard returns (\r). Perl has powerful pattern matching and replacement capabilities; it can match the most complex patterns in a string using regular expressions (see Chapter 13). The basic syntax for substitution is:

$mystring =~ s/pattern/replacement/;

This command substitutes "pattern" for "replacement" in the scalar variable $mystring. Notice the operator is a =~ (an equals sign followed by a tilde); this is Perl's binding operator and indicates a regular expression pattern match/substitution/replacement is about to follow.

Here is how to replace the end-of-line characters in your guestbook program:

foreach my $p (param()) {
    my $value = param($p);
    $value =~ s/\n/ /g;     # replace newlines with spaces
    $value =~ s/\r//g;      # remove hard returns
    print OUT "$p = $value,";
}

Go ahead and change your program, then test it again in your browser. View the guestbook.txt file in your browser or in a text editor and observe the results.

File Locking

CGI processes on a Unix web server can run simultaneously, and if two programs try to open and write the same file at the same time, the file may be erased, and you'll lose all of your data. To prevent this, you need to lock the files you are writing to. There are two types of file locks:

A shared lock allows more than one program (or other process) to access the file at the same time. A program should use a shared lock when reading from a file.
An exclusive lock allows only one program or process to access the file while the lock is held. A program should use an exclusive lock when writing to a file.

File locking is accomplished in Perl using the Fcntl module (which is part of the standard library), and the flock function. The use statement is like CGI.pm's:

use Fcntl qw(:flock);

The Fcntl module provides symbolic values (like abbreviations) representing the correct lock numbers for the flock function, but you must specify :flock in the use statement in order for Fctnl to export those values. The values are as follows:

LOCK_SH	shared lock
LOCK_EX	exclusive lock
LOCK_NB	non-blocking lock
LOCK_UN	unlock

These abbreviations can then be passed to flock. The flock function takes two arguments: the filehandle and the lock type, which is typically a number. The number may vary depending on what operating system you are using, so it's best to use the symbolic values provided by Fcntl. A file is locked after you open it (because the filehandle doesn't exist before you open the file):

open(FH, "filename") or &dienice("Can"t open file: $!");
flock(FH, LOCK_SH);

The lock will be released automatically when you close the file or when the program finishes.

Keep in mind that file locking is only effective if all of the programs that read and write to that file also use flock. Programs that don't will ignore the locks held by other processes.

Since flock may force your CGI program to wait for another process to finish writing to a file, you should also reset the file pointer, using the seek function:

seek(filehandle, offset, whence);

offset is the number of bytes to move the pointer, relative to whence, which is one of the following:

0	beginning of file
1	current file position
2	end of file

So seek(OUTF,0,2) repositions the pointer to the end of the file. If you were reading the file instead of writing to it, you'd want to do seek(OUTF,0,0) to reset the pointer to the beginning of the file.

The Fcntl module also provides symbolic values for the seek pointers:

SEEK_SET	beginning of file
SEEK_CUR	current file position
SEEK_END	end of file

To use these, add :seek to the use Fcntl statement:

use Fcntl qw(:flock :seek);

Now you can use seek(OUTF,0,SEEK_END) to reset the file pointer to the end of the file, or seek(OUTF,0,SEEK_SET) to reset it to the beginning of the file.

Closing Files

When you're finished writing to a file, it's best to close the file, like so:

close(filehandle);

Files are automatically closed when your program ends. File locks are released when the file is closed, so it is not necessary to actually unlock the file before closing it. (In fact, releasing the lock before the file is closed can be dangerous and cause you to lose data.)

Reading Files

There are two ways you can handle reading data from a file: you can either read one line at a time, or read the entire file into an array. Here's an example:

open(FH,"guestbook.txt") or &dienice("Can't open guestbook.txt: $!");

my $a = <FH>;     # reads one line from the file into 
                    # the scalar $a
my @b = <FH>;     # reads the ENTIRE FILE into array @b

close(FH);      # closes the file

If you were to use this code in your program, you'd end up with the first line of guestbook.txt being stored in $a, and the remainder of the file in array @b (with each element of @b containing one line of data from the file). The actual read occurs with <filehandle>; the amount of data read depends on the type of variable you save it into.

The following section of code shows how to read the entire file into an array, then loop through each element of the array to print out each line:

open(FH,"guestbook.txt") or &dienice("Can"t open guestbook.txt: $!");
my @ary = <FH>;
close(FH);

foreach my $line (@ary) {
    print $line;
}

This code minimizes the amount of time the file is actually open. The drawback is it causes your CGI program to consume as much memory as the size of the file. Obviously for very large files that's not a good idea; if your program consumes more memory than the machine has available, it could crash the whole machine (or at the very least make things extremely slow). To process data from a very large file, it's better to use a while loop to read one line at a time:

open(FH,"guestbook.txt") or &dienice("Can"t open guestbook.txt: $!");
while (my $line = <FH>) {
    print $line;
}
close(FH);

Poll Program

Let's try another example: a web poll. You've probably seen them on various news sites. A basic poll consists of one question and several potential answers (as radio buttons); you pick one of the answers, vote, then see the poll results on the next page.

Start by creating the poll HTML form. Use whatever question and answer set you wish.

Program 6-2: poll.html - Poll HTML Form

<form action="poll.cgi" method="POST">
Which was your favorite <i>Lord of the Rings</i> film?<br>
<input type="radio" name="pick" value="fotr">The Fellowship of the Ring<br>
<input type="radio" name="pick" value="ttt">The Two Towers<br>
<input type="radio" name="pick" value="rotk">Return of the King<br>
<input type="radio" name="pick" value="none">I didn't watch them<br>
<input type="submit" value="Vote">
</form>
<a href="results.cgi">View Results</a><br>

Working example: http://www.cgi101.com/book/ch6/poll.html

In this example we're using abbreviations for the radio button values. Our CGI program will translate the abbreviations appropriately.

Now the voting CGI program will write the result to a file. Rather than having this program analyze the results, we'll simply use a redirect to bounce the viewer to a third program (results.cgi). That way you won't need to write the results code twice.

Here is how the voting program (poll.cgi) should look:

Program 6-3: poll.cgi - Poll Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Fcntl qw(:flock :seek);

my $outfile = "poll.out";

# only record the vote if they actually picked something
if (param('pick')) {
   open(OUT, ">>$outfile") or &dienice("Couldn't open $outfile: $!");
   flock(OUT, LOCK_EX);      # set an exclusive lock 
   seek(OUT, 0, SEEK_END);   # then seek the end of file
   print OUT param('pick'),"\n";
   close(OUT);
} else {
# this is optional, but if they didn't vote, you might 
# want to tell them about it...
   &dienice("You didn't pick anything!");
}

# redirect to the results.cgi. 
# (Change to your own URL...)
print redirect("http://cgi101.com/book/ch6/results.cgi");

sub dienice {
    my($msg) = @_;
    print header;
    print start_html("Error");
    print h2("Error");
    print $msg;
    print end_html;
    exit;
}

Source code: http://www.cgi101.com/book/ch6/poll-cgi.html

Finally results.cgi reads the file where the votes are stored, totals the overall votes as well as the votes for each choice, and displays them in table format.

Program 6-4: results.cgi - Poll Results Program

#!/usr/bin/perl -wT
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use strict;
use Fcntl qw(:flock :seek);

my $outfile = "poll.out";

print header;
print start_html("Results");

# open the file for reading
open(IN, "$outfile") or &dienice("Couldn't open $outfile: $!");
# set a shared lock
flock(IN, LOCK_SH); 
# then seek the beginning of the file
seek(IN, 0, SEEK_SET);

# declare the totals variables
my($total_votes, %results);
# initialize all of the counts to zero:
foreach my $i ("fotr", "ttt", "rotk", "none") {
   $results{$i} = 0;
}

# now read the file one line at a time:
while (my $rec = <IN>) {
   chomp($rec);
   $total_votes = $total_votes + 1;
   $results{$rec} = $results{$rec} + 1;
}
close(IN);

# now display a summary:
print <<End;
<b>Which was your favorite <i>Lord of the Rings</i> film?
</b><br>
<table border=0 width=50%>
<tr>
  <td>The Fellowship of the Ring</td>
  <td>$results{fotr} votes</td>
</tr>
<tr>
  <td>The Two Towers</td>
  <td>$results{ttt} votes</td>
</tr>
<tr>
  <td>Return of the King</td>
  <td>$results{rotk} votes</td>
</tr>
<tr>
  <td>didn't watch them</td>
  <td>$results{none} votes</td>
</tr>
</table>
<p>
$total_votes votes total
</p>
End

print end_html;

sub dienice {
    my($msg) = @_;
    print h2("Error");
    print $msg;
    print end_html;
    exit;
}

Source code: http://www.cgi101.com/book/ch6/results-cgi.html

Working example: http://www.cgi101.com/book/ch6/results.cgi

The results program only shows the total number of votes. You may also want to calculate the percentages and display a bar-graph for each vote relative to the overall total. We'll look at how to calculate percentages in the next chapter.