A painless introduction to CGI

We aren't making movies full of special effects; this CGI is the Common Gateway Interface. It's a way to write scripts to produce dynamic web pages.

A CGI script is just like any other script except that it's launched by a web server like Apache (actually the web server launches a dispatcher, like its mod_cgi, and that launches the script). The script is launched with an environment that has certain characteristics:

  • it has environment variables that carry the HTTP header and other related information,
  • its standard input receives the HTTP request's body,
  • its standard output is the HTTP response.

(HTTP uses the RFC822 message format, which is the same as email: essentially some headers followed by a blank line and a body.)
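The headers, blank line and body structure can be seen in a minimal script. This is only a sketch: the function name hello_response is made up for illustration, and it assumes the server accepts plain newlines as header line endings, which is the usual case.

```shell
#!/bin/bash
# Minimal CGI response: headers, a blank line, then the body.
# hello_response is a hypothetical name used only in this sketch.
hello_response() {
  # Header section: tell the client what kind of body follows.
  echo "Content-Type: text/plain"
  # The blank line separates the headers from the body.
  echo
  # Body: everything from here on is sent to the client as the page.
  echo "Hello from CGI"
}

hello_response
```

Everything the script writes to standard output becomes the response, so the order matters: headers first, then the blank line, then the body.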

Various languages can be used to write CGI scripts. We'll start out using Bash because it's available on just about any Unix-like server that you're ever likely to encounter. It also gives you a raw insight into what's going on (other languages might augment the basic environment by expanding query parameters, for example).

Bash and CGI

The environment that a Bash CGI script runs in contains environment variables that give it a lot of information. Here is an example:

SERVER_SIGNATURE=  
HTTP_USER_AGENT=Mozilla/5.0 (X11; Linux x86_64; rv:32.0) Gecko/20100101 Firefox/32.0  
SERVER_PORT=80  
HTTP_HOST=ccgi.username.plus.com  
DOCUMENT_ROOT=/services/webpages/c/c/ccgi.username.plus.com/public  
SCRIPT_FILENAME=/services/webpages/c/c/ccgi.username.plus.com/cgi-bin/test.cgi  
GDFONTPATH=/services/share/fonts  
REQUEST_URI=/cgi-bin/test.cgi  
SCRIPT_NAME=/cgi-bin/test.cgi  
REMOTE_HOST=username.plus.com  
SERVER_DOMAIN=ccgi.username.plus.com  
SCRIPT_URI=http://ccgi.username.plus.com/cgi-bin/test.cgi  
HTTP_CONNECTION=keep-alive  
REMOTE_PORT=59613  
PATH=/usr/bin:/bin:/usr/sbin:/sbin  
MvCONFIG_LIBRARY=/services/websoftware/miva/Empresa/cgi-bin/libmivaconfig.so  
SCRIPT_URL=/cgi-bin/test.cgi  
PWD=/services/webpages/util/c/c/ccgi.username.plus.com/cgi-bin  
SERVER_ADMIN=or webmaster  
HTTP_ACCEPT_LANGUAGE=en-gb,en;q=0.7,es;q=0.3  
HTTP_DNT=1  
HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8  
REMOTE_ADDR=aaa.bbb.ccc.ddd  
SHLVL=1  
SERVER_NAME=ccgi.username.plus.com  
SERVER_SOFTWARE=Apache  
QUERY_STRING=  
SERVER_ADDR=eee.fff.ggg.hhh  
GATEWAY_INTERFACE=CGI/1.1  
SERVER_PROTOCOL=HTTP/1.1  
HTTP_ACCEPT_ENCODING=gzip, deflate  
REQUEST_METHOD=GET  
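
A listing like the one above can be produced by a script that simply prints its own environment. The sketch below assumes the env command is on the script's PATH; dump_env is a made-up name.

```shell
#!/bin/bash
# Print the CGI environment as a plain-text page.
# dump_env is a hypothetical name used only in this sketch.
dump_env() {
  echo "Content-Type: text/plain"
  echo
  # env prints every exported variable as NAME=value, one per line.
  env
}

dump_env
```

Leaving a script like this on a public server exposes configuration details, so it is best removed after use.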

Parameters: GET vs POST

An HTTP request can send data in various ways, the main ones being GET and POST. Their main difference is how they convey parameters: a GET request sends them in the URL's query string, which the server passes to the script in QUERY_STRING, whereas a POST request sends them in the body. Their format, however, is the same: it's defined in the HTML specification as application/x-www-form-urlencoded.

The specifications allow for other formats but forms post in this format by default.

The environment also contains CONTENT_LENGTH, which is the size of the body in bytes: it is normally unset or zero for a GET and non-zero if a POST contains parameters.
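
CONTENT_LENGTH can be used to read exactly the body and nothing more. The following is a sketch under assumptions: read_body is a made-up helper name, and head is assumed to support the -c option (it does on GNU and BSD systems).

```shell
#!/bin/bash
# Print exactly CONTENT_LENGTH bytes of standard input, or nothing when
# CONTENT_LENGTH is unset, empty, zero or not a number.
# read_body is a hypothetical name used only in this sketch.
read_body() {
  local length=${CONTENT_LENGTH:-0}
  # The test fails quietly if length is not a positive integer.
  if [ "$length" -gt 0 ] 2>/dev/null; then
    head -c "$length"
  fi
}
```

A script would typically capture the result with body=$(read_body) before doing any parsing.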

The encoded parameters come from either the QUERY_STRING or standard input stream:

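# A POST body usually has no trailing newline, so read may return non-zero
# even though it has filled the variable; -r keeps backslashes intact.
case "$REQUEST_METHOD" in
  GET)  encoded_params="$QUERY_STRING" ;;
  POST) read -r encoded_params         ;;
esac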

Similar code can then be used to extract parameters for both methods:

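# Split on '=' and '&' into a flat array. Using read -a avoids the pathname
# expansion that an unquoted $encoded_params would undergo (a literal * is
# not percent-encoded by browsers).
IFS='=&' read -r -a params <<< "$encoded_params"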

This places them into an array in a (key,value,key,value...) sequence. An alternative with Bash version 4 is to further process this into an associative array:

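# decode is assumed to be a URL-decoding helper defined elsewhere.
declare -A params_associative
for ((i=0;i<${#params[@]};i+=2))
do
  # Assign directly: passing request data through eval would let a client
  # run arbitrary commands on the server.
  params_associative["${params[i]}"]="$(decode "${params[i+1]}")"
done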
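
The loop above calls a decode helper that is not shown. One possible implementation, offered as a sketch rather than the definitive one, reverses the two substitutions made by application/x-www-form-urlencoded: '+' for a space and %XX for other bytes.

```shell
#!/bin/bash
# URL-decode a single application/x-www-form-urlencoded value.
decode() {
  # '+' stands for a space in this encoding.
  local data=${1//+/ }
  # Rewrite each %XX as \xXX and let printf's %b expand the escapes.
  printf '%b' "${data//%/\\x}"
}
```

For example, decode 'Hello+World%21' prints Hello World!. Note that %b also expands any literal backslash escapes present in the input, so this sketch assumes the client has percent-encoded backslashes, as browsers do.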

The above describes the default encoding mechanism (application/x-www-form-urlencoded), but a different encoding has to be used if the submitting form uploads files. This is multipart/form-data, a MIME encoding that is much more complicated. See this post for a detailed explanation, but decoding MIME is probably beyond what a Bash CGI script can usefully do.

Alternatives and other examples are referenced below.

CGI Pre-processors

A CGI Pre-processor performs the decoding of parameters so that the CGI script receives them as environment variables. One example is Un-CGI.
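
What a pre-processor does can be sketched in Bash itself. This is not Un-CGI's actual code: export_params is a made-up name, and the WWW_ prefix on the variable names is an assumption chosen for this illustration.

```shell
#!/bin/bash
# Decode an urlencoded parameter string and export each parameter as an
# environment variable. export_params and the WWW_ prefix are assumptions
# made for this sketch.
export_params() {
  local -a pairs
  local i name value
  # Split on '=' and '&' into key, value, key, value...
  IFS='=&' read -r -a pairs <<< "$1"
  for ((i = 0; i < ${#pairs[@]}; i += 2)); do
    # Keep only characters that are valid in a variable name.
    name=${pairs[i]//[^A-Za-z0-9_]/_}
    # Decode '+' and %XX escapes in the value.
    value=${pairs[i+1]//+/ }
    value=$(printf '%b' "${value//%/\\x}")
    export "WWW_${name}=${value}"
  done
}
```

A wrapper could call export_params "$QUERY_STRING" and then exec the real script, which would find a form field called name in $WWW_name.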

Template Engines

Template engines allow embedding one language inside another and are typically used to embed code inside HTML documents. The most basic example is Server-side Includes (SSI).

Running a template engine involves installing it either as a module in the web server or as an interpreter (named in a shebang line) invoked via CGI.

Haserl

Haserl is the Html And Shell Embedded Runtime Language and offers "dynamic web content in 20K" (available from the AUR).

It allows embedded Bash (or Lua) to be written in an HTML file by enclosing it between <% and %> in a similar fashion to other languages (think ERB for Bash).

Mod-Ruby

mod-ruby is vapourware but the idea is to allow the web server to serve pages containing embedded Ruby (ERB) in a similar fashion to how PHP pages are served.

Other Apache modules

The templating idea, implemented via a mod_xxx Apache module, is a well-trodden path.

Finally, FastCGI (mod-fastcgi) provides a language-independent way to improve the performance of CGI scripts by removing the need to start a new process for each request.