Subsetting with Perl
To subset a data file using Perl, you will need to know three things:
- The location of perl.exe on your system
- The location of the data file you are subsetting
- The record layout for the original data file
A basic subsetting script consists of the following parts:
The Invocation
| #!c:\perl5\perl | # on this particular PC |
| #!/usr/bin/perl | # on most UNIX systems |
| This line MUST begin in the first column of the first line of the program. | |
The Input Statement
while (<>) {
    # body of program here
}
This statement reads the records in the input file one at a time and processes each one according to the instructions written between the two curly brackets. To reduce the editing required to run a program, name the input and output files on the command line rather than inside the script. You can then use the same program for any number of similarly structured files without editing the program itself.
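As a minimal sketch of this pattern, the script below copies every record that passes a placeholder test to standard output. The script name `subset.pl`, the file names, and the helper `keep_record` are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/perl
# Hypothetical invocation (script and file names are placeholders):
#   perl subset.pl input.dat > output.dat
# The input file is named on the command line and the output is
# redirected by the shell, so the script itself never names a file.
use strict;
use warnings;

# Placeholder selection rule (an assumption): keep every non-blank record.
sub keep_record {
    my ($record) = @_;
    return $record =~ /\S/ ? 1 : 0;
}

# <> reads the files named on the command line (or standard input)
# one record at a time; the current record is held in $_.
while (<>) {
    print if keep_record($_);
}
```

Because no file name appears in the script, running it against a different file only requires changing the command line.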
The Body of the Program
The body is a combination of 'if', 'while' and 'print' statements that specify which records to select and which strings to print.
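A body of this kind might look like the sketch below. The record layout (year, province code, age), the selection rule, and the function name `select_fields` are all assumptions made up for the example; a real script would take its positions from the data file's record layout.

```perl
use strict;
use warnings;

# Hypothetical layout (1-based columns, as in a data dictionary):
#   columns 1-4  survey year
#   columns 5-6  province code
#   columns 7-9  age
sub select_fields {
    my ($record) = @_;
    my $province = substr($record, 4, 2);   # column 5, length 2
    my $age      = substr($record, 6, 3);   # column 7, length 3

    if ($province eq 'ON' && $age >= 65) {
        # Print only the year and age fields of the selected records.
        return substr($record, 0, 4) . ' ' . $age . "\n";
    }
    return;    # record not selected
}

while (<>) {
    my $line = select_fields($_);
    print $line if defined $line;
}
```

The 'if' test decides which records are kept, and the value built from `substr` decides which strings are printed for each kept record.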
The End
exit 0;
Perl 5 will provide this if you forget, but it's a good habit to provide it yourself.
The Basic Conventions in Perl
- Activity definition occurs with curly brackets '{ }'
- Each piece of an activity ends with a semicolon ';'
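- Perl counts array elements and string positions from zero, while data dictionaries usually count from 1. Either set the array base to 1 or subtract 1 from each start position; recent Perl releases no longer allow the base to be changed, so subtracting 1 is the safer choice
- A substring is defined by its record identifier, start position, and length
- Other than the invocation line there is no formal structure in Perl. However, consistent structure makes a program easier to read.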
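Because `substr` counts positions from zero, one way to keep a script in step with a data dictionary that counts from 1 is to do the subtraction in one place. The sketch below assumes a hypothetical helper named `field` and a made-up sample record.

```perl
use strict;
use warnings;

# Hypothetical helper: $start is the 1-based column taken from the
# data dictionary; substr itself counts from 0, hence the "- 1".
sub field {
    my ($record, $start, $length) = @_;
    return substr($record, $start - 1, $length);
}

my $record   = "1998ON067";            # made-up sample record
my $year     = field($record, 1, 4);   # columns 1-4
my $province = field($record, 5, 2);   # columns 5-6
print "$year $province\n";
```

With this approach the start positions in the script can be copied directly from the record layout.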
Coding Definitions
| { } | Beginning and ending of an activity/process |
| ; | End of a specific step |
| > | Numeric 'greater than' |
| gt | Alphabetic 'greater than' |
| < | Numeric 'less than' |
| lt | Alphabetic 'less than' |
| == | Numeric 'equal' |
| eq | Alphabetic 'equal' |
| >= | Numeric 'greater than or equal to' |
| ge | Alphabetic 'greater than or equal to' |
| <= | Numeric 'less than or equal to' |
| le | Alphabetic 'less than or equal to' |
| || | Boolean OR |
| && | Boolean AND |
| "%s" | String format (used in printf statements) |
| $_ | Input line |
| if | Logical 'if' statement |
| else | Logical 'else' statement |
| elsif | Logical 'else if' statement |
| print | Print line (or the following string in double quotes) |
| printf | Print field |
| substr | Substring of input record $_ as defined by the arguments that follow |
| (x,y,z) | Where x = record name, y = start position and z = length |
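The difference between the numeric and the alphabetic operators matters because fields read from a file arrive as text. The sketch below uses made-up values to show the same pair compared both ways; the function name `compare_both` is hypothetical.

```perl
use strict;
use warnings;

# Compare two values numerically and alphabetically and report both results.
sub compare_both {
    my ($x, $y) = @_;

    # Numeric comparison: the values are treated as numbers.
    my $numeric = ($x > $y) ? 'true' : 'false';

    # Alphabetic comparison: the values are compared character by character.
    my $string = ($x gt $y) ? 'true' : 'false';

    # sprintf takes the same format codes as printf; %s inserts a string.
    return sprintf("numeric: %s, string: %s\n", $numeric, $string);
}

# Made-up values: 10 is numerically greater than 9, but the string "10"
# sorts before "9" because the character "1" precedes "9".
print compare_both("10", "9");
```

For fields that hold numbers, such as an age, the numeric operators are usually the ones wanted; for codes and names, the alphabetic ones.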
