Subsetting with Perl

To subset a data file using Perl you will need to know three things:

A basic subsetting script consists of the following parts:

The Invocation

#!c:\perl5\perl # on this particular pc
#!/usr/bin/perl # on most UNIX systems
Use whichever line matches your system. It MUST begin in the first column of the first line of the program.

The Input Statement

while (<>) {

    # body of program here

}

This statement reads the records in the input file one at a time and processes each one according to the instructions written between the two curly brackets. To reduce the amount of editing required to run a program, name the input and output files on the command line rather than inside the program. You can then use the same program on any number of similarly structured files without editing the program itself.
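As a minimal sketch of the input loop in use, the script below copies every non-blank record to the output. The script name subset.pl, the file names input.dat and output.dat, and the keep_record rule are all hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
# subset.pl (hypothetical name): copy selected records from input to output.
use strict;
use warnings;

# Hypothetical selection rule: keep any record that is not blank.
sub keep_record {
    my ($record) = @_;
    return $record =~ /\S/ ? 1 : 0;
}

# <> reads each file named on the command line, one record (line) at a time.
# If no file is named, it reads standard input instead.
while (<>) {
    # $_ holds the current record, including its line ending.
    print if keep_record($_);
}
```

Because the script names no files itself, the same program can be run on different files from the command line, for example:

perl subset.pl input.dat > output.dat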

The Body of the Program

The body is a combination of 'if', 'while' and 'print' statements that specify which records to select and which strings to print from them.
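The following sketch shows what such a body might look like for a fixed-width file. The record layout (a two-character region code, then a three-digit age), the region value "35" and the age cut-off are assumptions made up for this example, as is the helper name select_fields.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption for illustration):
#   positions 0-1   two-character region code
#   positions 2-4   three-digit age
#   positions 5 on  rest of the record
sub select_fields {
    my ($record) = @_;
    chomp $record;
    return if length($record) < 5;        # too short to hold both fields

    my $region = substr($record, 0, 2);   # 2 characters starting at position 0
    my $age    = substr($record, 2, 3);   # 3 characters starting at position 2

    # Select the record only if the region code is "35" (alphabetic test)
    # and the age is 65 or more (numeric test).
    if ($region eq '35' && $age >= 65) {
        return "$region $age\n";
    }
    return;                               # record not selected
}

while (<>) {
    my $line = select_fields($_);
    print $line if defined $line;         # print only the selected strings
}
```

The 'if' statement decides which records to select, and the 'print' statement writes only the chosen fields rather than the whole record.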

The End

exit 0;

Perl 5 will supply this if you forget, but it's a good habit to provide it yourself.

The Basic Conventions in Perl

Coding Definitions

{ }       Beginning and end of a block (an activity or process)
;         End of a statement (a specific step)
>         Numeric 'greater than'
gt        Alphabetic 'greater than'
<         Numeric 'less than'
lt        Alphabetic 'less than'
==        Numeric 'equal'
eq        Alphabetic 'equal'
>=        Numeric 'greater than or equal to'
ge        Alphabetic 'greater than or equal to'
<=        Numeric 'less than or equal to'
le        Alphabetic 'less than or equal to'
||        Boolean OR
&&        Boolean AND
"%s"      String placeholder (used in printf statements)
$_        Current input line
if        Logical 'if' statement
else      Logical 'else' statement
elsif     Logical 'else if' statement
print     Print line (or the following string in double quotes)
printf    Print formatted fields
substr    Substring of the input record $_, as defined by the following
(x,y,z)   where x = record name, y = start position (counted from 0) and z = length
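
A short sketch of why the numeric and alphabetic operators are listed separately, and of how substr and "%s" work together. The sample record and variable names are invented for illustration.

```perl
use strict;
use warnings;

# The same two values compare differently as numbers and as strings.
my $numeric_result = (10 > 9)      ? 'true' : 'false';  # numeric: 10 is greater than 9
my $string_result  = ('10' gt '9') ? 'true' : 'false';  # alphabetic: "1" sorts before "9"

# Each "%s" is replaced by the next string in the list.
printf "%s %s\n", $numeric_result, $string_result;      # prints "true false"

# substr(x, y, z): x = record, y = start position (from 0), z = length.
my $record = 'ONT1996 0042';                            # invented sample record
my $year   = substr($record, 3, 4);                     # 4 characters from position 3
printf "%s\n", $year;                                   # prints "1996"
```

Use the numeric operators for fields that hold quantities and the alphabetic operators for fields that hold codes or names.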