Chapter 1: Getting Started
Our programming language of choice for this book is Perl. Perl is a simple, easy to learn language, yet powerful enough to accomplish very difficult and complex tasks. It is widely available, and is probably already installed on your Unix server. You don't need to compile your Perl programs; you simply write your code, save the file, and run it (or have the web server run it). The program itself is a simple text file; the Perl interpreter does all the work. The advantage to this is you can move your program with little or no changes to any machine with a Perl interpreter. The disadvantage is you won't discover any bugs in your program until you run it.
You can write and edit your CGI programs (which are often called scripts) either on your local machine or in the Unix shell. If you're using Unix, try pico — it's a very simple, easy to use text editor. Just type pico filename to create or edit a file. Type man pico for more information and help using pico. If you're not familiar with the Unix shell, see Appendix A for a Unix tutorial and command reference.
You can also use a text editor on your local machine and upload the finished programs to the web server. You should either use a plain text editor, such as Notepad (PC) or BBEdit (Mac), or a programming-specific editor that provides some error- and syntax-checking for you. Visit /book/editors.html for a list of some editors you can use to write your CGI programs.
If you use a text editor, be sure to turn off special characters such as smartquotes. CGI files must be ordinary text.
Once you've written your program, you'll need to upload it to the web server (unless you're using pico and writing it on the server already). You can use any FTP or SCP (secure copy) program to upload your files; a list of some popular FTP and SCP programs can be found at /book/connect/.
It is imperative that you upload your CGI programs as plain text (ASCII) files, and not binary. If you upload your program as a binary file, it may come across with a lot of control characters at the end of the lines, and these will cause errors in your program. You can save yourself a lot of time and grief by just uploading everything as text (unless you're uploading pictures, such as GIFs or JPEGs, or other true binary data). HTML and Perl CGI programs are not binary; they are plain text.
Once your program is uploaded to the web server, you'll want to be sure to move it to your cgi-bin (or public_html directory — wherever your ISP has told you to put your CGI programs). Then you'll also need to change the permissions on the file so that it is "executable" (or runnable) by the system. The Unix shell command for this is:
chmod 755 filename
This sets the file permissions so that you can read, write, and execute the file, and all other users (including the webserver) can read and execute it. See Appendix A for a full description of chmod and its options.
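To see what the digits in 755 stand for, here is a minimal Perl sketch. It assumes a Unix-like system and uses a throwaway temporary file in place of a real CGI program; the helper names describe_digit and make_executable are made up for illustration. Perl's built-in chmod takes the same octal digits as the shell command.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: turn one octal permission digit into words.
sub describe_digit {
    my ($digit) = @_;
    my @rights;
    push @rights, 'read'    if $digit & 4;
    push @rights, 'write'   if $digit & 2;
    push @rights, 'execute' if $digit & 1;
    return join ', ', @rights;
}

# Hypothetical helper: set a file to mode 755 and return the mode read back.
sub make_executable {
    my ($filename) = @_;
    # The leading 0 marks 0755 as octal, matching the digits used in the shell.
    chmod 0755, $filename or die "chmod failed for $filename: $!";
    # stat returns the mode in field 2; keep only the permission bits.
    my $mode = (stat $filename)[2] & 07777;
    return $mode;
}

# A temporary file stands in for an uploaded CGI program.
my ($fh, $filename) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
close $fh;

my $mode = make_executable($filename);
printf "mode:   %04o\n", $mode;
printf "owner:  %s\n", describe_digit(($mode >> 6) & 7);
printf "group:  %s\n", describe_digit(($mode >> 3) & 7);
printf "others: %s\n", describe_digit($mode & 7);
```

The first digit (7) applies to the owner, the second (5) to the file's group, and the third (5) to everyone else, which is where the webserver usually falls.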
Most FTP and SCP programs allow you to change file permissions; if you use your FTP client to do this, you'll want to be sure that the file is readable and executable by everyone, and writable only by the owner (you).
One final note: Perl code is case-sensitive, as are Unix commands and filenames. Please keep this in mind as you write your first programs, because in Unix "perl" is not the same as "PERL".
What Is The Unix Shell?
It's a command-line interface to the Unix machine — somewhat like DOS. You have to use a Telnet or SSH (secure shell) program to connect to the shell; see /class/connect.html for a list of some Telnet and SSH programs you can download. Once you're logged in, you can use shell commands to move around, change file permissions, edit files, create directories, move files, and much more.
If you're using a Unix system to learn CGI, you may want to stop here and look at Appendix A to familiarize yourself with the various shell commands. Download a Telnet or SSH program and log in to your shell account, then try out some of the commands so you feel comfortable navigating in the shell.
Throughout the rest of this book you'll see Unix shell commands listed in bold to set them apart from HTML and CGI code. If you're using a Windows server, you can ignore most of the shell commands, as they don't apply.
Basics of a Perl Program
You should already be familiar with HTML, and so you know that certain things are necessary in the structure of an HTML document, such as the <head> and <body> tags, and that other tags like links and images have a certain allowed syntax. Perl is very similar; it has a clearly defined syntax, and if you follow those syntax rules, you can write Perl as easily as you do HTML.
The first line of your program should look like this:
#!/usr/bin/perl -wT
The first part of this line, #!, indicates that this is a script. The next part, /usr/bin/perl, is the location (or path) of the Perl interpreter. If you aren't sure where Perl lives on your system, try typing which perl or whereis perl in the shell. If the system can find it, it will tell you the full path name to the Perl interpreter. That path is what you should put in the above statement. (If you're using ActivePerl on Windows, the path should be /perl/bin/perl instead.)
The final part contains optional flags for the Perl interpreter. Warnings are enabled by the -w flag. Special user input taint checking is enabled by the -T flag. We'll go into taint checks and program security later, but for now it's good to get in the habit of using both of these flags in all of your programs.
You'll put the text of your program after the above line.
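As a rough illustration of what warnings buy you, the sketch below triggers one of the most common ones: using a variable that was never given a value. It relies on the use warnings pragma, which enables warnings for the current file much as -w does, so it can be run directly from the command line. The names sloppy_greeting and warnings_from are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;   # enables warnings for this file, much as the -w flag does

# Hypothetical routine with a bug: $name is never assigned a value.
sub sloppy_greeting {
    my $name;
    return "Hello, " . $name;
}

# Hypothetical helper: run some code and collect the warnings it produces.
sub warnings_from {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };
    $code->();
    return @caught;
}

print "Perl warned: $_" for warnings_from(\&sloppy_greeting);
```

Without warnings enabled, the program would quietly produce "Hello, " and the bug could go unnoticed.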
Basics of a CGI Program
A CGI is simply a program that is called by the webserver, in response to some action by a web visitor. This might be something simple like a page counter, or a complex form-handler. Shopping carts and e-commerce sites are driven by CGI programs. So are ad banners; they keep track of who has seen and clicked on an ad.
CGI programs may be written in any programming language; we're just using Perl because it's fairly easy to learn. If you're already an expert in some other language and are just reading to get the basics, here it is: if you're writing a CGI that's going to generate an HTML page, you must include this statement somewhere in the program before you print out anything else:
print "Content-type: text/html\n\n";
This is a content-type header that tells the receiving web browser what sort of data it is about to receive — in this case, an HTML document. If you forget to include it, or if you print something else before printing this header, you'll get an "Internal Server Error" when you try to access the CGI program.
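The two newlines matter: the first ends the header line, and the second produces the blank line that separates headers from the page itself. Here is a small sketch of that layout; the helper names cgi_response and split_response are hypothetical and exist only to make the structure visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a complete response (header, blank line, body).
sub cgi_response {
    my ($body) = @_;
    my $header = "Content-type: text/html\n\n";   # second \n is the blank line
    return $header . $body;
}

# Hypothetical helper: split a response at the first blank line.
sub split_response {
    my ($response) = @_;
    my ($headers, $body) = split /\n\n/, $response, 2;
    return ($headers, $body);
}

my $response = cgi_response("<p>Hello</p>\n");
my ($headers, $body) = split_response($response);

print "Headers: $headers\n";
print "Body: $body";
```

If anything is printed ahead of the header, the first part of the split is no longer a valid header line, which is the situation that leads to the "Internal Server Error".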
Your First CGI Program
Now let's try writing a simple CGI program. Enter the following lines into a new file, and name it "first.cgi". Note that even though the lines appear indented on this page, you do not have to indent them in your file. The first line (#!/usr/bin/perl) should start in column 1. The subsequent lines can start in any column.
Program 1-1: first.cgi - Hello World Program
#!/usr/bin/perl -wT
print "Content-type: text/html\n\n";
print "Hello, world!\n";
Source code: /book/ch1/first-cgi.html
Working example: /book/ch1/first.cgi
Save (or upload) the file into your web directory, then chmod 755 first.cgi to change the file permissions (or use your FTP program to change them). You will have to do this every time you create a new program; however, if you're editing an existing program, the permissions will remain the same and shouldn't need to be changed again.
Now go to your web browser and type the direct URL for your new CGI. For example:
/book/ch1/first.cgi
Your actual URL will depend on your ISP. If you have an account on cgi101, your URL is:
/~youruserid/first.cgi
You should see a web page with "Hello, world!" on it. (If you get a "Page Not Found" error, you have the URL wrong. If you get an "Internal Server Error", see the "Debugging Your Programs" section at the end of this chapter.)
Let's try another example. Start a new file (or if you prefer, edit your existing first.cgi) and add some additional print statements. It's up to your program to print out all of the HTML you want to display in the visitor's browser, so you'll have to include print statements for every HTML tag:
Program 1-2: second.cgi - Hello World Program 2
#!/usr/bin/perl -wT
print "Content-type: text/html\n\n";
print "<html><head><title>Hello World</title></head>\n";
print "<body>\n";
print "<h2>Hello, world!</h2>\n";
print "</body></html>\n";
Source code: /book/ch1/second-cgi.html
Working example: /book/ch1/second.cgi
Save this file, adjust the file permissions if necessary, and view it in your web browser. This time you should see "Hello, world!" displayed in an H2-size HTML header.
Now not only have you learned to write your first CGI program, you've also learned your first Perl statement, the print function:
print "somestring";
This function will write out any string, variable, or combinations thereof to the current output channel. In the case of your CGI program, the current output is being printed to the visitor's browser.
The \n you printed at the end of each string is the newline character. Newlines are not required, but they will make your program's output easier to read.
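The following sketch shows print writing a plain string, a string built from a variable, and a list of several items. Variables are covered properly later in the book; the names $visitor and greeting here are just placeholders for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a greeting line for the given name.
sub greeting {
    my ($name) = @_;
    return "Hello, $name!\n";   # double quotes expand $name and \n
}

my $visitor = "world";

print "Content-type: text/html\n\n";
print greeting($visitor);                  # a string built from a variable
print "Visitor: ", $visitor, "\n";         # several items in one print
print 'Single quotes keep $visitor and \n literal', "\n";
```

Note the difference in the last line: inside single quotes, neither the variable nor the \n is expanded, so a separate "\n" is needed to end the line.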
You can write multiple lines of text without using multiple print statements by using the here-document syntax:
print <<endmarker;
line1
line2
line3
etc.
endmarker
You can use any word or phrase for the end marker (you'll see an example next where we use "EndOfHTML" as the marker); just be sure that the closing marker matches the opening marker exactly (it is case-sensitive), and also that the closing marker is on a line by itself, with no spaces before or after the marker.
Let's try it in a CGI program:
Program 1-3: third.cgi - Hello World Program, with here-doc
#!/usr/bin/perl -wT
print "Content-type: text/html\n\n";
print <<EndOfHTML;
<html><head><title>Test Page</title></head>
<body>
<h2>Hello, world!</h2>
</body></html>
EndOfHTML
Source code: /book/ch1/third-cgi.html
Working example: /book/ch1/third.cgi
When a closing here-document marker is on the last line of the file, be sure you have a line break after the marker. If the end-of-file mark is on the same line as the here-doc marker, you'll get an error when you run your program.
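A here-document behaves like a double-quoted string, so variables inside it are filled in. The sketch below uses that to reuse one block of HTML with different titles; the page function is a hypothetical helper, not something from the programs above. Note that the closing EndOfHTML marker starts in column 1 even though the surrounding code is indented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a small HTML page with the given title.
sub page {
    my ($title) = @_;
    return <<EndOfHTML;
<html><head><title>$title</title></head>
<body>
<h2>$title</h2>
</body></html>
EndOfHTML
}

print "Content-type: text/html\n\n";
print page("Test Page");
```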
The CGI.pm Module
Perl offers a powerful feature to programmers: add-on modules. These are collections of pre-written code that you can use to do all kinds of tasks. You can save yourself the time and trouble of reinventing the wheel by using these modules.
Some modules are included as part of the Perl distribution; these are called standard library modules and don't have to be installed. If you have Perl, you already have the standard library modules.
There are also many other modules available that are not part of the standard library. These are typically listed on the Comprehensive Perl Archive Network (CPAN), which you can search on the web at http://search.cpan.org.
The CGI.pm module was part of the standard library from Perl version 5.004 through 5.20. It was removed from the core distribution in Perl 5.22, so on newer versions you may need to install it from CPAN. CGI.pm has a number of useful functions and features for writing CGI programs, and its use is preferred by the Perl community. We'll be using it frequently throughout the book.
Let's see how to use a module in your CGI program. First you have to actually include the module via the use command. This goes after the #!/usr/bin/perl line and before any other code:
use CGI qw(:standard);
Note we're not doing use CGI.pm but rather use CGI. The .pm is implied in the use statement. The qw(:standard) part of this line indicates that we're importing the "standard" set of functions from CGI.pm.
Now you can call the various module functions by typing the function name followed by any arguments:
functionname(arguments)
If you aren't passing any arguments to the function, you can omit the parentheses.
A function is a piece of code that performs a specific task; it may also be called a subroutine or a method. Functions may accept optional arguments (also called parameters), which are values (strings, numbers, and other variables) passed into the function for it to use. The CGI.pm module has many functions; for now we'll start by using these three:
header;
start_html;
end_html;
The header function prints out the "Content-type" header. With no arguments, the type is assumed to be "text/html". start_html prints out the <html>, <head>, <title> and <body> tags. It also accepts optional arguments. If you call start_html with only a single string argument, it's assumed to be the page title. For example:
print start_html("Hello World");
will print out the following*:
<html>
<head>
<title>Hello World</title>
</head>
<body>
You can also set the page colors and background image with start_html:
print start_html(-title=>"Hello World", -bgcolor=>"#cccccc", -text=>"#999999", -background=>"bgimage.jpg");
Notice that with multiple arguments, you have to specify the name of each argument with -title=>, -bgcolor=>, etc. This example generates the same HTML as above, only the body tag indicates the page colors and background image:
<body bgcolor="#cccccc" text="#999999" background="bgimage.jpg">
The end_html function prints out the closing HTML tags:
</body>
</html>
So, as you can see, using CGI.pm in your CGI programs will save you some typing. (It also has more important uses, which we'll get into later on.)
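To make the idea of functions and named arguments concrete, here is a simplified sketch of hand-written stand-ins for the three functions. These are not CGI.pm's actual code, and the real functions emit more than this; the names my_header, my_start_html and my_end_html are hypothetical. The sketch only shows how a function can accept either a single string or a list of -name => value pairs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for header: the content-type line plus a blank line.
sub my_header {
    return "Content-type: text/html\n\n";
}

# Hypothetical stand-in for start_html. A single argument is taken as the
# page title; otherwise the arguments are read as -name => value pairs.
sub my_start_html {
    my %arg = @_ == 1 ? (-title => $_[0]) : @_;
    my $title = defined $arg{-title} ? $arg{-title} : 'Untitled Document';

    # Add only the body attributes that were actually passed in.
    my $body = '<body';
    for my $name (qw(bgcolor text background)) {
        my $value = $arg{"-$name"};
        $body .= qq( $name="$value") if defined $value;
    }
    $body .= '>';

    return "<html>\n<head>\n<title>$title</title>\n</head>\n$body\n";
}

# Hypothetical stand-in for end_html: the closing tags.
sub my_end_html {
    return "</body>\n</html>\n";
}

print my_header();
print my_start_html(-title => "Hello World", -bgcolor => "#cccccc");
print "<h2>Hello, world!</h2>\n";
print my_end_html();
```

Calling my_start_html("Hello World") with one plain string gives the same page without any body attributes.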
The Other Way To Use CGI.pm or "There's More Than One Way To Do Things In Perl"
As you learn Perl you'll discover there are often many different ways to accomplish the same task. CGI.pm exemplifies this; it can be used in two different ways. The first way you've learned already: function-oriented style. Here you must specify qw(:standard) in the use statement:
use CGI qw(:standard);
print header;
print start_html("Hello World");
The other way is object-oriented style, where you create an object (or instance of the module) and use that to call the various functions of CGI.pm:
use CGI; # don't need qw(:standard)
$cgi = CGI->new; # ($cgi is now the object)
print $cgi->header; # function call: $obj->function
print $cgi->start_html("Hello World");
Which style you use is up to you. The examples in this book use the function-oriented style, but feel free to use whichever style you're comfortable with.
Let's try using CGI.pm in an actual program now. Start a new file and enter these lines:
Program 1-4: fourth.cgi - Hello World Program, using CGI.pm
#!/usr/bin/perl -wT
use CGI qw(:standard);
print header;
print start_html("Hello World");
print "<h2>Hello, world!</h2>\n";
print end_html;
Source code: /book/ch1/fourth-cgi.html
Working example: /book/ch1/fourth.cgi
Be sure to change the file permissions (chmod 755 fourth.cgi), then test it out in your browser.
CGI.pm also has a number of functions that serve as HTML shortcuts. For instance:
print h2("Hello, world!");
will print an H2-sized header tag. You can find a list of all the CGI.pm functions by typing perldoc CGI in the shell, or visiting http://perldoc.perl.org/ and entering "CGI.pm" in the search box.
Documenting Your Programs
Documentation can be embedded in a program using comments. A comment in Perl is preceded by the # sign; anything appearing after the # is a comment:
#!/usr/bin/perl -wT
use CGI qw(:standard);
# This is a comment
# So is this
#
# Comments are useful for telling the reader
# what's happening. This is important if you
# write code that someone else will have to
# maintain later.
print header; # here's a comment. print the header
print start_html("Hello World");
print "<h2>Hello, world!</h2>\n";
print end_html; # print the footer
# the end.
Source code: /book/ch1/fifth-cgi.html
Working example: /book/ch1/fifth.cgi
You'll notice the first line (#!/usr/bin/perl) is a comment, but it's a special kind of comment. On Unix, it indicates what program to use to run the rest of the script.
There are several situations in Perl where an #-sign is not treated as a comment. These depend on specific syntax, and we'll look at them later in the book.
Any line that starts with an #-sign is a comment, and you can also put comments at the end of a line of Perl code (as we did in the above example on the header and end_html lines). Even though comments will only be seen by someone reading the source code of your program, it's a good idea to add comments to your code explaining what's going on. Well-documented programs are much easier to understand and maintain than programs with no documentation.
Debugging Your Programs
A number of problems can happen with your CGI programs, and unfortunately the default response of the webserver when it encounters an error (the "Internal Server Error") is not very useful for figuring out what happened.
If you see the code for the actual Perl program instead of the desired output page from your program, this probably means that your web server isn't properly configured to run CGI programs. You'll need to ask your webmaster how to run CGI programs on your server. And if you ARE the webmaster, check your server's documentation to see how to enable CGI programs.
If you get an Internal Server Error, there's either a permissions problem with the file (did you remember to chmod 755 the file?) or a bug in your program. A good first step in debugging is to use the CGI::Carp module in your program:
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
This causes all warnings and fatal error messages to be echoed in your browser window. You'll want to remove this line after you're finished developing and debugging your programs, because Carp errors can give away important security info to potential hackers.
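If CGI::Carp isn't available, the same basic idea can be approximated by hand: trap the fatal error and send it to the browser behind a valid header. The sketch below is an illustration under that assumption, not how CGI::Carp itself is implemented; escape_html, error_page and do_work are hypothetical names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: escape characters the browser would read as HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: wrap an error message in a complete CGI response.
sub error_page {
    my ($message) = @_;
    return "Content-type: text/html\n\n"
         . "<h2>Software error</h2>\n"
         . "<pre>" . escape_html($message) . "</pre>\n";
}

# Hypothetical routine standing in for the real work of the program.
sub do_work {
    die "could not open data file\n";
}

# eval traps the fatal error so the program can still send a valid response.
if (eval { do_work(); 1 }) {
    print "Content-type: text/html\n\n<p>All good.</p>\n";
} else {
    print error_page($@);
}
```

As with CGI::Carp, this kind of output belongs in development only, since error text can reveal details about your server.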
If you're using the Carp module and are still seeing the "Internal Server Error", you can further test your program from the command line in the Unix shell. This will check the syntax of your program without actually running it:
perl -cwT fourth.cgi
If your program contains syntax errors, Perl will report them:
% perl -cwT fourth.cgi
syntax error at fourth.cgi line 5, near "print"
fourth.cgi had compilation errors.
This tells you there's a problem on or around line 5; make sure you didn't forget a closing semicolon on the previous line, and check for any other typos. Also be sure you saved and uploaded the file as text; hidden control characters or smartquotes can cause syntax errors, too.
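To reproduce this kind of report on purpose, the sketch below writes a small program with a missing semicolon to a temporary file and runs the current Perl interpreter on it with -cw. The helper name check_source is hypothetical, the sketch assumes a Unix-like shell for the 2>&1 redirection, and the exact wording of the messages can vary between Perl versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: save some Perl source to a temporary file, run
# "perl -cw" on it, and return whatever the syntax check reported.
sub check_source {
    my ($source) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.cgi', UNLINK => 1);
    print $fh $source;
    close $fh or die "close failed: $!";

    my $perl = $^X;   # path of the perl that is running this program
    # perl -c writes its messages to STDERR, so 2>&1 is needed to capture them.
    my $report = qx{"$perl" -cw "$path" 2>&1};
    return $report;
}

# The first print statement is missing its closing semicolon.
my $broken = <<'END_BROKEN';
print "Content-type: text/html\n\n"
print "Hello, world!\n";
END_BROKEN

print check_source($broken);
```

Adding the missing semicolon and running the check again should produce a "syntax OK" message instead.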
Another way to get more info about the error is to look at the webserver log files. Usually this will show you the same information that the CGI::Carp module does, but it's good to know where the server logs are located, and how to look at them. Some usual locations are /usr/local/etc/httpd/logs/error_log, or /var/log/apache2/error_log. Ask your ISP if you aren't sure of the location. In the Unix shell, you can use the tail command to view the end of the log file:
tail /var/log/apache2/error_log
The last line of the file should be your error message (although if you're using a shared webserver like an ISP, there will be other users' errors in the file as well). Here are some example errors from the error log:
[Fri Jan 16 02:06:10 2004] access to /home/book/ch1/test.cgi failed for 205.188.198.46, reason: malformed header from script.
In string, @yahoo now must be written as \@yahoo at /home/book/ch1/test.cgi line 331, near "@yahoo"
Execution of /home/book/ch1/test.cgi aborted due to compilation errors.
[Fri Jan 16 10:04:31 2004] access to /home/book/ch1/test.cgi failed for 204.87.75.235, reason: Premature end of script headers
A "malformed header" or "premature end of script headers" can either mean that you printed something before printing the "Content-type: text/html" line, or your program died. An error usually appears in the log indicating where the program died, as well.
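On a shared server, your own errors can be buried among other users' entries. As a rough sketch, the Perl program below pulls out only the most recent log lines that mention a particular script. The helper name recent_errors is hypothetical, and the sketch reads a temporary stand-in file filled with the sample entries shown above, rather than a real log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the last $how_many log lines mentioning $script_name.
sub recent_errors {
    my ($log_path, $script_name, $how_many) = @_;
    open my $log, '<', $log_path or die "Cannot open $log_path: $!";
    my @matches = grep { index($_, $script_name) >= 0 } <$log>;
    close $log;
    # Drop older matches so only the most recent ones remain.
    splice @matches, 0, @matches - $how_many if @matches > $how_many;
    return @matches;
}

# Stand-in log file so the sketch runs anywhere. On a real server you would
# pass the actual log location, such as /var/log/apache2/error_log.
my ($fh, $log_path) = tempfile(SUFFIX => '.log', UNLINK => 1);
print $fh <<'END_LOG';
[Fri Jan 16 02:06:10 2004] access to /home/book/ch1/test.cgi failed for 205.188.198.46, reason: malformed header from script.
In string, @yahoo now must be written as \@yahoo at /home/book/ch1/test.cgi line 331, near "@yahoo"
Execution of /home/book/ch1/test.cgi aborted due to compilation errors.
[Fri Jan 16 10:04:31 2004] access to /home/book/ch1/test.cgi failed for 204.87.75.235, reason: Premature end of script headers
END_LOG
close $fh or die "close failed: $!";

print recent_errors($log_path, 'test.cgi', 2);
```

This plays the same role as tail, with the added step of filtering by script name.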