Writing Doctor HTML Modules

by Thomas Tongue. Last Modified November 6th, 2001.


1.0 Introduction

Starting with Doctor HTML version 6, the testing framework for producing reports is fully extensible through the use of custom modules. This new feature allows site license owners to integrate their own tests into the Doctor HTML program without modifying the software distribution. New modules will also be available from Imagiware, either as upgrades or new features.

2.0 Framework

Before you can write a custom module for Doctor HTML, you need a basic understanding of Perl and of the program's framework. The heart of the Doctor HTML system is the report generator, which is used for both single page and site analysis. The report generator scans the cgi-bin/modules directory, loads the module code, and determines which tests will be performed based on HTML form or command-line input. When you are ready to install or test a new module, place the code in the cgi-bin/modules directory.
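The scan-and-load step described above can be pictured with a short sketch. This is not the report generator's actual code: the temporary directory standing in for cgi-bin/modules, the .pl file extension, the two tiny module files, and the use of Perl's do FILE are all assumptions made for illustration. Only $MODULE_NAME and %MODULE_REGISTRY come from this document.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Package globals that module files fill in (names taken from this document).
our ($MODULE_NAME, %MODULE_REGISTRY);

# Stand-in for cgi-bin/modules: a temporary directory holding two tiny,
# hypothetical module files.
my $dir = tempdir(CLEANUP => 1);
my %files = (
  'alpha.pl' => q{$MODULE_NAME="Alpha Test"; $MODULE_REGISTRY{$MODULE_NAME}="DoAlpha"; 1;},
  'beta.pl'  => q{$MODULE_NAME="Beta Test"; $MODULE_REGISTRY{$MODULE_NAME}="DoBeta"; 1;},
);
for my $name (keys %files) {
  open(my $fh, '>', File::Spec->catfile($dir, $name)) or die "write $name: $!";
  print {$fh} $files{$name};
  close($fh);
}

# Scan the directory and evaluate each file so its definitions are registered.
opendir(my $dh, $dir) or die "Cannot open $dir: $!";
for my $file (sort grep { /\.pl$/ } readdir($dh)) {
  my $path = File::Spec->catfile($dir, $file);
  defined(do $path) or die "Could not load $file: " . ($@ || $!);
}
closedir($dh);

print join(", ", sort keys %MODULE_REGISTRY), "\n";
```

Note that after the loop $MODULE_NAME holds only the name of the last file loaded, which is why a module should not rely on that variable outside its definition header.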

To make a module that will be recognized by the report generator, and produce output within the reports, you need to work with a number of global hashes. Some of the hashes, such as %MODULE_REGISTRY, determine how the code within a module will be run. Others, such as %MODULE_DESC, provide information about the module for use on help screens. The global hashes can be considered in two groups: Definitions and Input/Output.

2.1 Definitions

The definitions group usually appears at the top of the module code, and provides some information on how the module should be used. The first definition is usually $MODULE_NAME, which is used to make the other definitions when the module is loaded. The $MODULE_NAME is the key for the other variable definitions, so it's important to choose a unique value. The value given in $MODULE_NAME is also used as the title for the test in the selection form and the report, so something short yet descriptive would be a good choice.

An entry in %MODULE_REGISTRY is needed to indicate what subroutine (defined in this module) should be called when a test is to be performed. Here is an example from the 'Font Support' module provided with Doctor HTML:

                # MODULE_REGISTRY is used to determine what modules are
                # available, and what is the primary subroutine that needs to
                # be run in order to get its report out in %TEST_OUTPUT.
$MODULE_REGISTRY{$MODULE_NAME}="DoFonts";

A subroutine called "DoFonts" is defined later in the module, and will be executed (if selected) when the report is generated.
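As a rough illustration of how a name stored in %MODULE_REGISTRY can become a subroutine call, here is a minimal sketch. The dispatch loop is an assumption about how a report generator might do it, not Doctor HTML's own code, and the DoFonts body is a placeholder. Passing the module name as the first argument mirrors what the sample module in section 2.3 expects.

```perl
use strict;
use warnings;

# Stand-ins for Doctor HTML's globals; in a real module they already exist.
our (%MODULE_REGISTRY, %TEST_SUMMARY);

$MODULE_REGISTRY{"Font Support"} = "DoFonts";

# Placeholder body; the real DoFonts performs the actual font checks.
sub DoFonts {
  my ($module_name) = @_;
  $TEST_SUMMARY{$module_name} = "DoFonts was called.";
}

# Look up each registered subroutine by name and call it,
# passing the module name as the first argument.
for my $name (sort keys %MODULE_REGISTRY) {
  my $handler = main->can($MODULE_REGISTRY{$name})
    or die "No subroutine named $MODULE_REGISTRY{$name}";
  $handler->($name);
}

print "$TEST_SUMMARY{'Font Support'}\n";
```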

To simplify adding new tests to the program, the module is required to describe itself, so that the information can be automatically added to the help pages. This eliminates the need to edit the documentation provided with Doctor HTML in order to add a well-integrated function. The %MODULE_SHORT_DESC hash holds a short description of the test, which is used when a full explanation would be too long (e.g., invoking the --help option on the command line report generator gives you the text found in %MODULE_SHORT_DESC). The %MODULE_DESC hash provides the full description of the test, which appears on the explanation pages linked off the test selection form. Here is an example of these definitions:

                # MODULE_SHORT_DESC is a short, single line description of the
                # test that will be performed by this module. It's used in
                # various built pages. It also appears in the command line
                # version when --help is invoked.
$MODULE_SHORT_DESC{$MODULE_NAME}="All fonts specified on the page are checked ".
                                 "for compatibility with default fonts on ".
                                 "Windows, Macintosh, and Unix computers.";

                # MODULE_DESC contains the text that will be used to describe
                # the test to the user. It's mainly for dynamically generating
                # help pages.
$MODULE_DESC{$MODULE_NAME}=qq[

The Font Support test checks all font faces specified on the Web page.
These can come either from FONT tags or from cascading style sheets.
This module looks to see if at least one of the fonts listed in each
instance is available as a default font on Windows, Macintosh, and
Unix computers.  Making sure the font list includes options for each
platform prevents the use of unexpected fonts on different computers
and gives the designer greater control over the visual layout of the page.

];

To help determine if the module's code should be invoked, the module must define a form variable in %MODULE_FORM_VARIABLE to be used in the test selection form, and a command line tag in %MODULE_TAG for direct use from a command shell. Like the choice of module name, these values should be unique. However, if you want to have a set of modules that always run as a group, using the same values for each module is one way to achieve that effect. Here is the sample from the 'Font Support' module:

                # MODULE_TAG is the command line flag that is used to trigger
                # the Doctor HTML program to run this module's test.
$MODULE_TAG{$MODULE_NAME}="-fonts";

                # MODULE_FORM_VARIABLE is the variable in the HTML form that
                # will determine if the single page and multi page versions of
                # Doctor HTML will run this module's test.
$MODULE_FORM_VARIABLE{$MODULE_NAME}="Fonts";
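The following sketch shows one way the tag and form variable could be used to decide which modules run, and why giving two modules the same values makes them run as a group. The helpers selected_by_flags and selected_by_form and the "Sample Companion" module are hypothetical; the real selection logic lives inside the report generator and may differ.

```perl
use strict;
use warnings;

# Stand-ins for Doctor HTML's globals. "Sample Companion" is a hypothetical
# second module that deliberately shares the sample module's tag and variable.
our %MODULE_TAG = (
  "Font Support"     => "-fonts",
  "Sample Module"    => "-mysample",
  "Sample Companion" => "-mysample",
);
our %MODULE_FORM_VARIABLE = (
  "Font Support"     => "Fonts",
  "Sample Module"    => "mysample",
  "Sample Companion" => "mysample",
);

# Hypothetical helper: modules whose command line flag was given.
sub selected_by_flags {
  my (@args) = @_;
  my %given = map { $_ => 1 } @args;
  my @names = sort grep { $given{ $MODULE_TAG{$_} } } keys %MODULE_TAG;
  return @names;
}

# Hypothetical helper: modules whose form variable has a non-null value.
sub selected_by_form {
  my (%form) = @_;
  my @names = sort grep {
    my $value = $form{ $MODULE_FORM_VARIABLE{$_} };
    defined $value && length $value;
  } keys %MODULE_FORM_VARIABLE;
  return @names;
}

print join(", ", selected_by_flags("-mysample")), "\n";
print join(", ", selected_by_form(Fonts => "on")), "\n";
```

Because both sample modules answer to "-mysample", a single flag selects the pair, while "Fonts" selects only the Font Support module.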

Finally, we need to indicate where the module's report output should appear with respect to other modules. The default is for the test results to be printed in declining order of test scores (we'll get to that in a moment). However, many users have requested that the order in which tests are presented be kept fixed, regardless of results. To accommodate this wish, and to break ties when two test scores are equal, we must define %MODULE_ORDER. Below are the current order values for the modules provided with Doctor HTML:

Module Name                      $MODULE_ORDER
HTML Parse                       50
Document Structure               75
Verify Hyperlinks                100
Spelling                         200
Image Syntax                     250
Image Analysis                   300
Meta Tags                        450
Table Analysis                   500
Form Structure                   600
Format HTML                      700
Squish HTML                      800
Frames Expansion                 850
Display Cookies                  950
Browser Support                  2500
Font Support                     2600
Show HTML Hierarchy              4000
Form Output Test                 5000
Show Doctor HTML Performance     8000
OLD Document Structure           9075
OLD Table Analysis               9500
OLD Form Structure               9600

In addition to the required definitions above, there are several optional hashes that can be used to tune the usage of a module. The %MODULE_EXPERIMENTAL and %MODULE_DEPRECATED hashes are used to determine if the code should be flagged as experimental or deprecated respectively. This is useful if you want to warn your users that a specific module may not be suitable for general use. You can also configure Doctor HTML not to show experimental or deprecated modules in the test selection screen if so desired (see installation notes for details).
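To make the effect of these two flags concrete, here is a small sketch of how a selection screen might filter the registered modules. The visible_modules helper, its two switches, the "DoOldTables" subroutine name, and the choice to mark "OLD Table Analysis" as deprecated are all assumptions for illustration; the real configuration options are described in the installation notes.

```perl
use strict;
use warnings;

# Stand-ins for Doctor HTML's globals, filled with illustrative values.
our %MODULE_REGISTRY = (
  "Font Support"       => "DoFonts",
  "Sample Module"      => "mySample",
  "OLD Table Analysis" => "DoOldTables",   # hypothetical subroutine name
);
our %MODULE_EXPERIMENTAL = ("Sample Module" => 1);
our %MODULE_DEPRECATED   = ("OLD Table Analysis" => 1);

# Hypothetical helper: list the modules to offer on the selection screen.
sub visible_modules {
  my ($show_experimental, $show_deprecated) = @_;
  my @names = sort grep {
    my $hidden_experimental = $MODULE_EXPERIMENTAL{$_} && !$show_experimental;
    my $hidden_deprecated   = $MODULE_DEPRECATED{$_}   && !$show_deprecated;
    !$hidden_experimental && !$hidden_deprecated;
  } keys %MODULE_REGISTRY;
  return @names;
}

print join(", ", visible_modules(0, 0)), "\n";
print join(", ", visible_modules(1, 0)), "\n";
```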

Below is a complete sample header from the 'Font Support' module provided with Doctor HTML:

                # MODULE_NAME is the name of the module as it will be used to id
                # various module properties. This name should be descriptive and
                # unique. It will be used to build the forms for the single and
                # multi-page version of the program. NOTE: do not rely on this
                # variable ANYWHERE except in the initial module definition,
                # since it will be overwritten/undefined after the next
                # module is loaded.
$MODULE_NAME="Font Support";

                # MODULE_REGISTRY is used to determine what modules are
                # available, and what is the primary subroutine that needs to
                # be run in order to get its report out in %TEST_OUTPUT.
$MODULE_REGISTRY{$MODULE_NAME}="DoFonts";

                # MODULE_SHORT_DESC is a short, single line description of the
                # test that will be performed by this module. It's used in
                # various built pages. It also appears in the command line
                # version when --help is invoked.
$MODULE_SHORT_DESC{$MODULE_NAME}="All fonts specified on the page are checked ".
                                 "for compatibility with default fonts on ".
                                 "Windows, Macintosh, and Unix computers.";

                # MODULE_DESC contains the text that will be used to describe
                # the test to the user. It's mainly for dynamically generating
                # help pages.
$MODULE_DESC{$MODULE_NAME}=qq[

The Font Support test checks all font faces specified on the Web page.
These can come either from FONT tags or from cascading style sheets.
This module looks to see if at least one of the fonts listed in each
instance is available as a default font on Windows, Macintosh, and
Unix computers.  Making sure the font list includes options for each
platform prevents the use of unexpected fonts on different computers
and gives the designer greater control over the visual layout of the page.

];
                # MODULE_TAG is the command line flag that is used to trigger
                # the Doctor HTML program to run this module's test.
$MODULE_TAG{$MODULE_NAME}="-fonts";

                # MODULE_FORM_VARIABLE is the variable in the HTML form that
                # will determine if the single page and multi page versions of
                # Doctor HTML will run this module's test.
$MODULE_FORM_VARIABLE{$MODULE_NAME}="Fonts";

                # MODULE_ORDER determines which order the tests appear on the
                # report output. The output is sorted in ascending order.
$MODULE_ORDER{$MODULE_NAME}=2600;

                # MODULE_EXPERIMENTAL determines whether the module should be
                # selected by default in test installations, and whether the
                # item appears at ALL in the public editions, regardless of the
                # module's presence in the distribution.
$MODULE_EXPERIMENTAL{$MODULE_NAME}=0;

2.2 Input/Output

In order for different sections of Doctor HTML to pass information into and out of the modules, a number of global hashes have been made available. All modules have access to the parsed document structure contained in the @HTML array of hashes. Each array element contains a hash of information on an individual tag, including what line it was found on in the document. A standard method of working through the array is:

foreach $i ( 0 .. $#HTML ) {
  $tag = ${$HTML[$i]}{"tag"};
  $text = ${$HTML[$i]}{"text"};
  $line = ${$HTML[$i]}{"line"};
  $isClose = ${$HTML[$i]}{"isClose"};
  %attributes = %{${$HTML[$i]}{"attributes"}};
  # ... examine the tag here ...
}

The %attributes hash defined above contains the key=value pairs found within the tag. As an example, a tag like:

<a href="http://www.imagiware.com" target="home">

would have the following pairs defined in %attributes:

$attributes{'href'}   = "http://www.imagiware.com"
$attributes{'target'} = "home"
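Putting the loop and the attribute hash together, a test that collects every hyperlink target could look roughly like the sketch below. The @HTML contents here are a hand-built stand-in; in particular, storing the tag as a bare lowercase name ("a", "img") is an assumption about the parser's output, which is why the comparison is made case-insensitively.

```perl
use strict;
use warnings;

# Hand-built stand-in for the parsed document that Doctor HTML provides.
our @HTML = (
  { tag => "a", text => "", line => 3, isClose => 0,
    attributes => { href => "http://www.imagiware.com", target => "home" } },
  { tag => "a", text => "", line => 3, isClose => 1, attributes => {} },
  { tag => "img", text => "", line => 5, isClose => 0,
    attributes => { src => "logo.gif" } },
);

my @links;
foreach my $i ( 0 .. $#HTML ) {
  my $entry = $HTML[$i];
  next if $entry->{isClose};                # skip closing tags
  next unless lc($entry->{tag}) eq "a";     # only anchors are of interest
  my %attributes = %{ $entry->{attributes} };
  next unless defined $attributes{href};    # named anchors have no href
  push @links, "line $entry->{line}: $attributes{href}";
}

print "$_\n" for @links;
```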

To get the test results out of the module and into the final report, the test's HTML must be stored in %TEST_OUTPUT under the $MODULE_NAME key. The HTML placed in %TEST_OUTPUT for each module will be wrapped by a table in the final report, with the test's $MODULE_NAME as the heading.

The summary results of the test are passed through the %TEST_SUMMARY hash, again keyed off $MODULE_NAME. The summary should be a brief statement of the test's results. There should always be some value added to %TEST_SUMMARY for each module, even if it simply indicates that no errors were found.

As mentioned in section 2.1, the test results are ordered by a score assigned by the module. The module stores the score in %TEST_SCORE under the $MODULE_NAME key, as the sample module in section 2.3 does. The method of calculating the score is up to the module designer, as long as the score ranges from 0 to 4. Here are some guidelines for assigning scores:

Score  Interpretation
0      No errors. Everything is fine.
1      A few errors / Minor / Might be OK
2      Many errors / Cause for concern / Probably worth looking at
3      Numerous errors / Serious / Should be checked
4      Choked with errors / MAJOR / Must be fixed
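One simple way to arrive at a score in the 0 to 4 range is to map the error count through a few thresholds. The sketch below does that; the score_for_errors helper and its cut-off values are invented for illustration, and each module is free to weigh its findings differently. The "Sample Module" key and the error count are likewise made up.

```perl
use strict;
use warnings;

# Stand-ins for Doctor HTML's globals.
our (%TEST_SCORE, %TEST_ERRORS);

# Hypothetical helper: translate an error count into a 0..4 score.
# The thresholds are arbitrary examples, not values used by Doctor HTML.
sub score_for_errors {
  my ($num_errors) = @_;
  return 0 if $num_errors == 0;    # No errors
  return 1 if $num_errors <= 2;    # A few errors
  return 2 if $num_errors <= 5;    # Many errors
  return 3 if $num_errors <= 10;   # Numerous errors
  return 4;                        # Choked with errors
}

my $module_name = "Sample Module";
my $numErr      = 7;               # made-up result of a test run

$TEST_ERRORS{$module_name} = $numErr;
$TEST_SCORE{$module_name}  = score_for_errors($numErr);

print "Score for $numErr errors: $TEST_SCORE{$module_name}\n";
```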

In addition to the score, the test should report the total number of errors that it detected in %TEST_ERRORS. The number of errors is reported in the summary report, and is also used in conjunction with the score to determine the order in which the test results appear.
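The ordering rules can be sketched as a Perl sort. The exact way Doctor HTML combines score and error count is not spelled out in this document, so the comparison below (higher score first, then more errors first, then lower %MODULE_ORDER first) is a guess for illustration only. The scores and error counts are made up; the order values come from the table in section 2.1.

```perl
use strict;
use warnings;

# Made-up results for four of the standard modules.
our %TEST_SCORE   = ("HTML Parse" => 0, "Spelling" => 3,
                     "Meta Tags" => 3, "Font Support" => 3);
our %TEST_ERRORS  = ("HTML Parse" => 0, "Spelling" => 12,
                     "Meta Tags" => 4, "Font Support" => 4);
our %MODULE_ORDER = ("HTML Parse" => 50, "Spelling" => 200,
                     "Meta Tags" => 450, "Font Support" => 2600);

# Assumed default: worst results first, %MODULE_ORDER breaks remaining ties.
my @by_score = sort {
  $TEST_SCORE{$b} <=> $TEST_SCORE{$a}
    || $TEST_ERRORS{$b} <=> $TEST_ERRORS{$a}
    || $MODULE_ORDER{$a} <=> $MODULE_ORDER{$b}
} keys %TEST_SCORE;

# Fixed presentation: ascending %MODULE_ORDER, regardless of results.
my @fixed = sort { $MODULE_ORDER{$a} <=> $MODULE_ORDER{$b} } keys %MODULE_ORDER;

print "By score: ", join(", ", @by_score), "\n";
print "Fixed:    ", join(", ", @fixed), "\n";
```

With these numbers, "Meta Tags" and "Font Support" tie on both score and error count, so their %MODULE_ORDER values decide which one is printed first.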

Finally, in order for pages to be summarized in the Site Doctor, this information must be preserved in a string within the report. To do this, add the following line to the end of your test module's main subroutine:

$TEST_STATUS_INFO{$MODULE_NAME}=$MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                qq[_ERRORS=$numErr ].
                                $MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                qq[_SCORE=$score ];

where $numErr is the number of errors from the test, and $score is the score that was assigned.
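To see what the resulting string looks like, the sketch below builds it for a module whose form variable is "mysample" and then reads the two numbers back out. The values 3 and 1 are made up, and the regular expression is only a guess at how a consumer such as the Site Doctor could recover the fields; its real parsing code is not shown in this document.

```perl
use strict;
use warnings;

# Stand-ins for Doctor HTML's globals.
our ($MODULE_NAME, %MODULE_FORM_VARIABLE, %TEST_STATUS_INFO);

$MODULE_NAME = "Sample Module";
$MODULE_FORM_VARIABLE{$MODULE_NAME} = "mysample";

my ($numErr, $score) = (3, 1);     # made-up test results

$TEST_STATUS_INFO{$MODULE_NAME}=$MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                qq[_ERRORS=$numErr ].
                                $MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                qq[_SCORE=$score ];

# The string is a space-separated list of NAME=VALUE pairs.
print "[$TEST_STATUS_INFO{$MODULE_NAME}]\n";

# Hypothetical consumer: pull the pairs back into a hash.
my %fields = $TEST_STATUS_INFO{$MODULE_NAME} =~ /(\w+)=(\d+)/g;
print "errors=$fields{mysample_ERRORS} score=$fields{mysample_SCORE}\n";
```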

2.3 A Sample module

Pulling together the information above, we can now look at a sample module that counts how many tags are found in the document. This may not be a terribly useful test, but it covers all the requirements to be a working module in Doctor HTML:

# Doctor HTML Module: A Sample Module
#    This module is loaded by the Doctor HTML report generator. The main
#    subroutine (mySample) will be executed if the form variable "mysample"
#    has a non-null value in the submitting form, or if the program is
#    called on the command line with the "-mysample" flag.
#
#    All we're going to do is count how many HTML tags were found.
#
#    Module Version 0.1
#    Started 11/5/2001
#    Written by Thomas Tongue

$MODULE_NAME="Sample Module";                   # Name of the module.

$MODULE_REGISTRY{$MODULE_NAME}="mySample";      # Subroutine to exec if test is
                                                # needed.

$MODULE_TAG{$MODULE_NAME}="-mysample";          # Command line flag

$MODULE_FORM_VARIABLE{$MODULE_NAME}="mysample"; # HTML form variable

                                                # Short description of what
                                                # this module does.
$MODULE_SHORT_DESC{$MODULE_NAME}="A check to see how many HTML tags are in the page.";

                                                # The full description of what
                                                # this module does.
$MODULE_DESC{$MODULE_NAME}=qq[
This Sample Module is used to figure out how many HTML tags are in the page.
];
                                                # Where we want the results
                                                # relative to other tests.
$MODULE_ORDER{$MODULE_NAME}=2100;

$MODULE_EXPERIMENTAL{$MODULE_NAME}=1;           # Experimental? You bet.

##############################################################################
# mySample
#   This is the main subroutine of this sample module. It counts how many
#   tags are in the @HTML array.

sub mySample {
  local($MODULE_NAME)=$_[0];
  my($open_tags,$close_tags)=(0,0);      # Start both counters at zero.
  foreach my $i ( 0 .. $#HTML ) {
                                         # The standard fields of each entry.
                                         # Only $isClose is needed here.
    my $tag = ${$HTML[$i]}{"tag"};
    my $text = ${$HTML[$i]}{"text"};
    my $line = ${$HTML[$i]}{"line"};
    my $isClose = ${$HTML[$i]}{"isClose"};
    my %attributes = %{${$HTML[$i]}{"attributes"}};
    if ($isClose) {
      $close_tags++;
    } else {
      $open_tags++;
    }
  }
  my $num_tags=$close_tags+$open_tags;
  my $numErr=0;                          # Counting tags never finds errors,
  my $score=0;                           # so the score is always 0.
  $TEST_OUTPUT{$MODULE_NAME}=qq[There were ].$num_tags.
                             qq[ tags in the document, ].$open_tags.
                             qq[ open tags and ].$close_tags.qq[ close tags.];
                                         # Use the count, not $#HTML, which
                                         # is the last index (count minus 1).
  $TEST_SUMMARY{$MODULE_NAME}=qq[Found ].$num_tags.qq[ tags in the document.];
  $TEST_SCORE{$MODULE_NAME}=$score;
  $TEST_ERRORS{$MODULE_NAME}=$numErr;
  $TEST_STATUS_INFO{$MODULE_NAME}=$MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                  qq[_ERRORS=$numErr ].
                                  $MODULE_FORM_VARIABLE{$MODULE_NAME}.
                                  qq[_SCORE=$score ];
}
#
############################################################################

3.0 Closing Remarks

We've covered the basics of writing a Doctor HTML module. There are quite a few additional topics that could be covered in this document, such as layout and style sheet recommendations, but for now those are left as an exercise for the reader. The style sheet for the reports is found in report.css, and is well commented. Examining the report output should give you a fairly good idea of how we've structured the individual report tables (it's not rocket science after all).

If you're looking for additional examples of modules and how they work, have a look at the source in the modules directory. The code has been written so that it's easy to read, and you can even make copies of the modules to experiment with. Please let us know if you have any questions.

Good Luck!


Doctor HTML is Copyright 1995-2001 by Imagiware, Inc.