This tutorial assumes...
- ...that ActivePerl
build 623 or greater is installed on your system.
ActivePerl is a free distribution of the core Perl language.
See Komodo's Installation Guide for
configuration instructions.
- ...that you have a connection to the Internet.
- ...that you are interested in Perl. You don't need to have
previous knowledge of Perl; the tutorial will walk you through
a simple program and suggest some resources for further
information.
You have exported a number of email messages to a text file.
You want to extract the name of the sender and the contents of
each email, and convert them to XML format. You intend to eventually
transform the XML to HTML using XSLT. To create an XML file
from a text source file, you will use a Perl program that parses
the data and places it within XML tags. In this tutorial you
will:
- Install a Perl module for parsing text files containing
comma-separated values.
- Open the Perl Tutorial Project and associated files.
- Analyze parse.pl, the Perl program included in the Tutorial
Project.
- Generate output by running the program.
- Debug the program using the Komodo debugger.
One of the great strengths of Perl is the wealth of free
modules available for extending the core Perl distribution.
ActivePerl includes the Perl Package Manager (PPM), which makes it
easy to browse, download and update Perl modules from module
repositories on the internet. These modules are added to the core
ActivePerl installation.
The Text::CSV_XS Perl module is necessary for this tutorial.
To install it using PPM:
- Open the Run Command dialog box. Select Tools|Run
Command.
- In the Run field, enter the command:
ppm install Text::CSV_XS
- Click the Run button to run the command.
PPM connects to the default repository, downloads the necessary
files and installs them.
- PPM can be run directly from the command line with the
ppm command. Enter ppm help for more
information on command-line options.
- By default, PPM accesses the Perl Package repository at
http://ppm.activestate.com.
The ActiveState repository contains binary versions of most
packages available from CPAN, the Comprehensive
Perl Archive Network.
- More information about PPM is available on
ASPN. PPM documentation is also included with your
ActivePerl distribution.
- On Linux systems where ActivePerl has been installed by the
super-user (i.e. root), most users will not have
permission to install packages with PPM. Run ppm
as root at the command line to install packages.
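After running PPM, it can be useful to confirm that the module actually loads. The following is a minimal sketch, not part of the tutorial project; the helper name module_version is hypothetical and the script uses only core Perl features:

```perl
use strict;
use warnings;

# Return the installed version of a module, or undef if it cannot be loaded.
# (module_version is a hypothetical helper written for this sketch.)
sub module_version {
    my ($module) = @_;
    # Turn "Text::CSV_XS" into the file name "Text/CSV_XS.pm" that require() expects.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

my $version = module_version('Text::CSV_XS');
print defined $version
    ? "Text::CSV_XS $version is installed\n"
    : "Text::CSV_XS is not installed; run: ppm install Text::CSV_XS\n";
```

The script can be run from the Run Command dialog box or from the command line.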
Perl Pointer: It is also possible to
install Perl modules without PPM using the CPAN shell. See
the CPAN FAQ for more information.
On the Start Page under Tutorials and
Documentation, click Perl Tutorial, or open
the perl_tutorial.kpf file from the
samples/perl_tutorials subdirectory of Komodo's user data directory.
The tutorial project will open in the Projects sidebar.
On the Projects sidebar,
double-click the files parse.pl, mailexport.xml
and mailexport.txt.
These files will open in the Editor Pane; a tab at the top of the
pane displays their names.
- mailexport.txt: This file was generated by
exporting the contents of an email folder (using the email
program's own Export function) to a comma-separated text file.
Notice that the keys to the file contents are listed on the
first line. The Perl program will use this line as a reference
when parsing the email messages.
- parse.pl: This is the Perl program that
will parse mailexport.txt and generate mailexport.xml.
- mailexport.xml: This file was generated by
parse.pl, using mailexport.txt as input. When you run parse.pl
(in Generating Output),
this file will be regenerated.
In this step, you will examine the Perl program on a
line-by-line basis. Ensure that Line Numbers are enabled in
Komodo (View|View Line Numbers). Ensure that the
file "parse.pl" is displayed in the Komodo Editor Pane.
Line 1 - Shebang Line
- Komodo analyzes this line for hints about what language the
file contains
- warning messages are enabled with the "-w" switch
Komodo Tip: Notice that syntax elements
are displayed in different colors. You can adjust the
display options for language elements in the Preferences
dialog box.
Lines 2 to 4 - External Modules
- these lines load external Perl modules used by the
program
- Perl module files have a ".pm" extension; "use strict" uses
the "strict.pm" module, part of the core Perl distribution
- "use Text::CSV_XS" refers to the module installed in Step
One
Lines 6 to 7 - Open Files
- input and output files are opened; if the output file does
not exist, it is created
- scalar variables, indicated by the "$" symbol, store the
open file handles
- "strict" mode (enabled by loading "strict.pm" in line 2)
requires that variables be declared using the format "my
$variable"
Perl Pointer: Scalar variables store
"single" items; their symbol ("$") is shaped like an "s",
for "scalar".
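The following sketch shows one common way to open an input and an output file into scalars declared with "my". It is not the code from parse.pl; the file paths point at a temporary directory and the variable names are assumptions chosen to match the description above:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory so the sketch does not touch real data.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/mailexport.txt";   # hypothetical input file
my $out_path = "$dir/mailexport.xml";   # hypothetical output file

# Create a small input file for the demonstration.
open(my $seed, '>', $in_path) or die "Can't create $in_path: $!";
print $seed "From,Subject\n";
close($seed) or die "Can't write $in_path: $!";

# "my" declares each scalar, as "use strict" requires.
# Mode '<' reads an existing file; mode '>' creates or truncates the file.
open(my $in,  '<', $in_path)  or die "Can't read $in_path: $!";
open(my $out, '>', $out_path) or die "Can't write $out_path: $!";

print "input handle opened\n" if defined fileno($in);
print "output file created\n" if -e $out_path;
```

Removing either "my" makes the program fail to compile under "use strict", which is the check described above.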
Lines 9 to 13 - Print the Header to the Output File
- "<<" is a "here document" indicator that defines the
string to be printed
- the text "EOT" is arbitrary and user-defined, and
defines the beginning and end of the string
- the second EOT on line 13 indicates the end of
output
- lines 10 and 11 are data that will be printed to the output
file
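A here document can be tried in isolation. This sketch prints to an in-memory file handle so the result is easy to inspect; the XML declaration and the <EMAIL> root element are illustrative assumptions, not necessarily the header that parse.pl writes:

```perl
use strict;
use warnings;

# Print to an in-memory file handle so the result can be inspected;
# a real program would print to its output file instead.
my $buffer = '';
open(my $out, '>', \$buffer) or die "Can't open in-memory file: $!";

# Everything between the two EOT markers is printed as written.
# The marker name is arbitrary; the closing marker must stand alone on its line.
print $out <<EOT;
<?xml version="1.0"?>
<EMAIL>
EOT

close($out) or die "Can't close in-memory file: $!";
print $buffer;
```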
Lines 15 to 16 - Assign Method Call to Scalar Variable
- the result of the method call "new" is assigned to the
scalar variable $csv
- the method "new" is contained in the module
Text::CSV_XS
- "({binary => 1})" tells the method to treat
the data as binary
Perl Pointer: Good Perl code is
liberally annotated with comments (indicated by the "#"
symbol).
Lines 18 to 19 - Method "getline"
- the method "getline" is contained in the module
Text::CSV_XS, referenced in the $csv scalar variable
- "getline" reads the first line of mailexport.txt
(referenced in the $in variable), parses the line into fields,
and returns a reference to the resulting array, which is assigned
to the $fields variable
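The two steps above, creating the parser object and reading the header line, can be sketched as follows. The sample data is invented for illustration, and the module is loaded at run time so the sketch reports a missing installation instead of failing to compile:

```perl
use strict;
use warnings;

# Text::CSV_XS is not part of core Perl; load it at run time so this
# sketch can report a missing module instead of failing to compile.
my $have_csv = eval { require Text::CSV_XS; 1 };

my @names;
if ($have_csv) {
    # Hypothetical sample data standing in for mailexport.txt.
    my $data = qq{From,Subject,Body\n"Ann","Hello","Line one"\n};
    open(my $in, '<', \$data) or die "Can't open in-memory file: $!";

    # binary => 1 allows quoted fields to contain binary characters,
    # such as embedded line feeds.
    my $csv = Text::CSV_XS->new({ binary => 1 });

    # getline() reads one line and returns a reference to an array of fields.
    my $fields = $csv->getline($in);
    @names = @$fields;
    print "Field names: @names\n";
    close($in);
}
else {
    print "Text::CSV_XS is not installed\n";
}
```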
Line 21 - "while" Loop
- the "while" statement is conditional
- the condition is "1", so the program endlessly repeats
the loop because the condition is always met
- the logic for breaking out of the loop is on line
25
- the loop is enclosed in braces; the opening brace is on
line 21, the closing brace on line 51
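The pattern of an always-true "while" condition combined with an explicit exit can be sketched on its own. The records below are invented sample data held in an array; parse.pl reads them from the input file instead:

```perl
use strict;
use warnings;

# Hypothetical records standing in for lines parsed from the input file.
my @queue = (['Ann', 'Hello'], ['Bob', 'Re: Hello']);
my $processed = 0;

while (1) {                      # the condition is always true
    # Take the next record, or an empty array reference when none remain.
    my $record = shift(@queue) || [];

    # An empty array is false in boolean context, so this exits the loop.
    last unless @$record;

    print "Sender: $record->[0]\n";
    $processed++;
}

print "Processed $processed records\n";
```

Without the "last" statement the loop would never terminate, which is why the exit test sits at the top of the loop body.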
Komodo Tip: Click on the minus symbol to
the left of line 21. The entire section of nested code will
be collapsed. This is Code Folding.
Komodo Tip: Click the mouse pointer on
line 21. Notice that the opening brace changes to a bold
red font. The closing brace on line 51 is displayed the
same way.
Lines 22 to 25 - Extracting a Line of Input Data
- the "getline" function extracts one line of data from the
input file and places it in the $record scalar variable
- if "getline" returns an empty array, the input file has
been fully processed and the program exits the loop and
proceeds to line 52
Perl Pointer: Array variables store
lists of items indexed by number; their symbol ("@") is
shaped like an "a", for "array".
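Because "getline" hands back a reference to an array rather than the array itself, the program writes "@$record" to reach the items. A short sketch with invented values shows the difference between an array and a reference to it:

```perl
use strict;
use warnings;

my @record = ('Ann', 'Hello', 'Line one');   # an array holds an ordered list
my $ref    = \@record;                        # a scalar holding a reference to it

print "First item: $record[0]\n";            # index the array directly
print "Same item:  $ref->[0]\n";             # index through the reference
print "Item count: ", scalar(@$ref), "\n";   # @$ref dereferences the whole array

# An empty array is false in boolean context.
my $empty = [];
print "Empty array is false\n" unless @$empty;
```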
Lines 27 to 31 - "foreach"
- "foreach" cycles through the elements stored in the
@$record array
- the regular expressions on lines 29 and 30 find the
characters "<" and "&", and replace them with their
character entities, "&lt;" and "&amp;" ("<" and "&" are reserved
characters in XML)
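The escaping step can be sketched with a small invented record. This is an illustration of the technique rather than a copy of parse.pl; note that "&" is handled first so that the "&" inside a freshly written entity is not escaped a second time:

```perl
use strict;
use warnings;

# Hypothetical record; parse.pl obtains this from getline().
my $record = ['Ann <ann@example.com>', 'Q&A', 'a < b'];

foreach (@$record) {
    # Inside foreach, $_ is an alias for each element, so s/// edits in place.
    s/&/&amp;/g;   # escape "&" first so later entities are not escaped twice
    s/</&lt;/g;
}

print "$_\n" for @$record;
```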
Lines 33 to 35 - Hash Slice
- line 35 combines the @$record array with the field
reference generated in line 19
Perl Pointer: Hash variables are
indicated by the symbol "%", and store lists of items
indexed by string.
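A hash slice pairs a list of keys with a list of values in a single assignment. The sketch below uses invented field names and values; in parse.pl the keys come from the header line and the values from the current record:

```perl
use strict;
use warnings;

my $fields = ['From', 'Subject', 'Body'];       # names from the header line
my $record = ['Ann',  'Hello',   'Line one'];   # values from one data line

# A hash slice assigns several keys at once: the Nth field name
# becomes the key for the Nth value.
my %record;
@record{@$fields} = @$record;

print "From:    $record{From}\n";
print "Subject: $record{Subject}\n";
```

After the assignment, each value can be looked up by its field name instead of by its position in the line.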
Lines 37 to 50 - Writing Data to the Output File
- one line at a time, lines from the input file are processed
and written to the output file
- portions of the data line (stored in the $record scalar
variable) are extracted based on the corresponding text in the
field reference (the first line in the input file, stored in
the $fields variable)
Line 51 - Closing the Processing Loop
- at line 51, processing will loop back to the opening brace
on line 21
- the logic to exit the loop is on line 25
Lines 52 to 54 - Ending the Program
- line 52 prints the closing tag to the XML file
- line 53 closes the output file or, if it cannot, fails with
the error "Can't write mailexport.xml"
- line 54 closes the input file (it is not necessary to check
the status when closing the input file because this only fails
if the program contains a logic error)
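Checking the result of "close" on an output file matters because buffered data is written out at that point. The following sketch writes to a temporary file; the path and the closing tag are assumptions for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mailexport.xml";   # hypothetical output location

open(my $out, '>', $path) or die "Can't write $path: $!";
print $out "</EMAIL>\n";            # hypothetical closing tag

# Buffered data is written out when the handle is closed, so a full disk
# or similar write error may only surface here. Check the result.
close($out) or die "Can't write $path: $!";

print "Wrote ", -s $path, " bytes\n";
```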
To start, you will simply generate the output by running the
program through the debugger without setting any breakpoints.
- Clear the contents of mailexport.xml: Click
on the "mailexport.xml" tab in the Editor Pane. Delete the
contents of the file; you will regenerate it in the next step.
Save the file.
- Run the Debugger: Click on the "parse.pl"
tab in the editor. From the menu, select
Debug|Go/Continue. In the Debugging
Options dialog box,
click OK to accept the defaults.
- View the contents of mailexport.xml: Click
on the "mailexport.xml" tab in the editor. Komodo informs you
that the file has changed. Click OK to reload
the file.
In this step you'll add breakpoints to the program and "debug"
it. Adding breakpoints lets you run the program in chunks,
making it possible to watch variables and view output as it is
generated. Before you begin, ensure that line numbering is
enabled in Komodo (View|View Line Numbers).
- Set a breakpoint: On the "parse.pl" tab,
click in the grey margin immediately to the left of the code on
line 9 of the program. This will set a breakpoint, indicated by
a red circle.
- Run the Debugger: Select
Debug|Go/Continue. In the Debugging
Options dialog box, click OK to accept the
defaults. The debugger will process the program until it
encounters the first breakpoint.
Komodo Tip: Debugger commands can be
accessed from the Debug menu, by shortcut keys, or from the
Debug Toolbar. For a summary of debugger commands, see
Debugger Command List.
- Watch the debug process: A yellow arrow on
the breakpoint indicates the position at which the debugger has
halted. Click on the "mailexport.xml" tab. Komodo informs you
that the file has changed. Click OK to reload
the file.
- View variables: In the Bottom Pane, see
the Debug tab. The variables "$in" and "$out"
appear in the Locals tab.
- Line 9 - Step In: Select
Debug|Step In. "Step In" is a debugger command
that causes the debugger to execute the current line and then
stop at the next processing line (notice that the lines between
9 and 13 are raw output indicated by "here" document
markers).
- Line 16 - Step In: On line 16, the
processing transfers to the module Text::CSV_XS. Komodo opens
the file CSV_XS.pm and stops the debugger at the active line in
the module.
- Line 61 - Step Out: Select
Debug|Step Out. The Step Out command will make
the debugger execute the function in Text::CSV_XS and pause at
the next line of processing, which is back in parse.pl on line
19.
- Line 19 - Step Over: Select
Debug|Step Over. The debugger will process the
function in line 19 without opening the module containing the
"getline" function.
Komodo Tip: What do the debugger
commands do?
- Step In: Executes the current line of code and
pauses at the next line to be executed, entering any
function or method that the current line calls.
- Step Over: Executes the current line of code.
If the line of code calls a function or method, the
function or method is executed in the background and the
debugger pauses at the line that follows the original
line.
- Step Out: When the debugger is within a
function or method, Step Out executes the rest of that code
without stepping through it line by line. The
debugger stops on the line of code following the
function or method call in the calling program.
- Line 22 - Set Another Breakpoint: After
the debugger stops at line 21, click in the grey margin
immediately to the left of the code on line 22 to set another
breakpoint.
Perl Pointer: The Perl debugger will not
break on certain parts of control structures, such as lines
containing only braces ( { } ).
With Perl 5.6 and earlier, the debugger will also not break
at the start of while, until,
for, or foreach statements.
- Line 22 - Step Out: It appears that
nothing happened. However, the debugger actually completed one
iteration of the "while loop" (from lines 21 to 51). To see how
this works, set another breakpoint at line 37, and Step Out
again. The debugger will stop at line 37. On the Debug Session
tab, look at the data assigned to the $record variable. Then
Step Out, and notice that $record is no longer displayed, and
the debugger is back on line 21. Step Out again, and look at
the $record variable - it now contains data from the next
record in the input file.
- Line 37 - Stop the Debugger: Select
Debug|Stop to stop the Komodo debugger.
Perl Pointer: Did you notice that output
wasn't written to mailexport.xml after every iteration of
the while loop?
This is because Perl maintains an internal buffer for
writing to files. You can set the buffer to "autoflush"
using the special Perl variable "$|".
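Setting "$|" affects only the currently selected output handle, which is STDOUT unless the program changes it. For a specific file handle, the autoflush method from the core IO::Handle module sets the same flag. The sketch below writes to a temporary file at an assumed path and checks the file size before the handle is closed:

```perl
use strict;
use warnings;
use IO::Handle;                     # provides the autoflush() method
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mailexport.xml";   # hypothetical output location

open(my $out, '>', $path) or die "Can't write $path: $!";

# "$| = 1" only affects the currently selected handle (STDOUT by default).
# For a specific file handle, autoflush() sets the same flag.
$out->autoflush(1);

print $out "<EMAIL>\n";
my $size_before_close = -s $path;   # already on disk because of autoflush
print "Bytes on disk before close: $size_before_close\n";

close($out) or die "Can't write $path: $!";
```

With autoflush enabled, reloading mailexport.xml in Komodo after each loop iteration would show the output as it is produced, at the cost of more frequent disk writes.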
ASPN, the ActiveState Programmer Network
ASPN, the ActiveState
Programmer Network, provides extensive resources for Perl
programmers:
- Free downloads of ActivePerl,
ActiveState's Perl distribution
- Searchable Perl documentation
- Trial versions of Perl tools, like the
Perl Dev Kit and Visual Perl
- The Rx Cookbook, a collaborative library
of regular expressions for Perl
Documentation
There is a wealth of documentation available for Perl. The
first source for language documentation is the Perl distribution
installed on your system. To access the documentation contained
in the Perl distribution, use the following commands:
- Open the Run Command dialog box
(Tools|Run Command), and then type
perldoc perldoc. A description of the "perldoc"
command will be displayed on your screen. Perldoc is used to
navigate the documentation contained in your Perl
distribution.
Tutorials and Reference Sites
There are many Perl tutorials and beginner Perl sites on the
Internet, such as: