Perl pattern matching and extraction

Perl has regular expression operators for identifying patterns. The operator

          /regular expression/

returns true or false depending on whether the regular expression matches the contents of $_. For example

         if (/perl/) {
               print "String contains perl as a substring";
         }
         if (/(Sat|Sun)day/) {
               print "Weekend day....";
         }

The effect is rather like the grep command. To use this operator on other variables you would write:

          $variable =~ /regular expression/;
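
As a minimal sketch of matching against a named variable rather than $_, the following applies the weekend pattern from above with the `=~' operator. The variable $day and the helper is_weekend are hypothetical names chosen for illustration; the negated form `!~' is shown as well.

```perl
use strict;
use warnings;

# Hypothetical sample data; any scalar can stand on the left of =~.
my $day = "Saturday";

# Returns 1 if the string names a weekend day, 0 otherwise.
sub is_weekend {
    my ($text) = @_;
    return $text =~ /(Sat|Sun)day/ ? 1 : 0;
}

print "$day is a weekend day\n" if is_weekend($day);

# The negated form !~ is true when the pattern does NOT match.
print "Monday is not a weekend day\n" if "Monday" !~ /(Sat|Sun)day/;
```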


Regular expressions can contain parenthetic sub-expressions, e.g.

       if (/(Sat|Sun)day(..)th(.*)/) {
          $first = $1;
          $second = $2;
          $third = $3;
       }

in which case Perl places the text matched by each such sub-expression in the variables $1, $2 etc.
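
The sketch below shows the capture variables on a concrete string. It assumes a slightly different pattern from the one above (a space is allowed after "day"), and parse_day is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split a date-like string into the pieces captured by the parentheses.
# Returns an empty list when the pattern does not match.
sub parse_day {
    my ($text) = @_;
    if ($text =~ /(Sat|Sun)day (..)th(.*)/) {
        return ($1, $2, $3);
    }
    return ();
}

my ($first, $second, $third) = parse_day("Saturday 14th of March");
print "first=$first second=$second third=$third\n";
```

The capture variables are only meaningful after a successful match, which is why the helper tests the match before reading them.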

Perl string replace and searching

The `sed'-like function for replacing all occurrences of a string is easily implemented in Perl using

           while (<INPUT>) {
                s/$search/$replace/g;
                print OUTPUT;
           }

This example replaces the string inside the default variable. To replace in a general variable we use the operator `=~', with syntax:

           $variable =~ s/search/replace/;
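
A minimal sketch of substitution on a named variable follows. The helper replace_all is hypothetical; it also assumes the search text should be taken literally, so it wraps it in \Q...\E, which quotes any regular expression metacharacters.

```perl
use strict;
use warnings;

# Return a copy of $text with every occurrence of $search replaced.
# \Q...\E quotes regex metacharacters so $search is taken literally.
sub replace_all {
    my ($text, $search, $replace) = @_;
    $text =~ s/\Q$search\E/$replace/g;
    return $text;
}

my $line = "the cat sat on the mat";
print replace_all($line, "the", "a"), "\n";
```

Without the trailing g modifier only the first occurrence on the line would be replaced.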


Here is an example of this operator in use. The following program searches for a string in several files and replaces it, which is very useful for making a change globally in a group of files. The program is called 'file-replace'.

        # Look through files for the find string and change it to the
        # new string in all files.

        # Define a temporary file and check it doesn't exist
        $outputfile = "tmpmarkfind";
        unlink $outputfile;

        # Check command line for list of files
        if ($#ARGV < 0) {
             die "Syntax: file-replace [file list]\n";
        }
        print "Enter the string you want to find (Don't use quotes):\n\n:";
        $findstring = <STDIN>;
        print "Enter the string you want to replace with (Don't use quotes):\n\n:";
        $replacestring = <STDIN>;
        chomp $findstring;
        chomp $replacestring;
        print "\nFind: $findstring\n";
        print "Replace: $replacestring\n";
        print "\nConfirm (y/n) ";
        $y = <STDIN>;
        chomp $y;
        if ($y ne "y") {
              die "Aborted -- nothing done.\n";
        } else {
              print "Use CTRL-C to interrupt...\n";
        }

        # Now shift default array @ARGV to get arguments 1 by 1
        while ($file = shift) {
           if ($file eq "file-replace") {
                 print "Findmark will not operate on itself!";
                 next;
           }
           # Save existing mode of file for later
           ($dev, $ino, $mode) = stat($file);
           open(INPUT, $file) || warn "Couldn't open $file\n";
           open(OUTPUT, "> $outputfile") || warn "Can't open tmp";
           $notify = 1;
           while (<INPUT>) {
                 if (/$findstring/ && $notify) {
                      print "Fixing $file...\n";
                      $notify = 0;
                 }
                 s/$findstring/$replacestring/g;
                 print OUTPUT;
           }
           close(INPUT);
           close(OUTPUT);

           # If nothing went wrong (if outfile not empty)
           # move temp file to original and reset the file mode saved above
           if (! -z $outputfile) {
              rename($outputfile, $file);
              chmod($mode & 07777, $file);
           } else {
              print "Warning: file empty!\n";
           }
        }
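
The core of the program, replacing a string in one file through a temporary file, can also be sketched as a reusable subroutine. This is only an illustration under stated assumptions: replace_in_file is a hypothetical name, the find string is treated as a regular expression (as in the program above), and the temporary file is created with the standard File::Temp module instead of a fixed name.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Replace every match of $find in $file with $replace.
# Returns the number of lines that were changed.
# replace_in_file is a hypothetical helper, not part of file-replace.
sub replace_in_file {
    my ($file, $find, $replace) = @_;
    open(my $in, '<', $file) or die "Couldn't open $file: $!\n";
    my $mode = (stat $in)[2] & 07777;            # save permissions for later
    # Temporary file in the same directory, so rename stays on one filesystem
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    my $changed = 0;
    while (my $line = <$in>) {
        $changed++ if $line =~ s/$find/$replace/g;
        print {$out} $line;
    }
    close($in);
    close($out) or die "Couldn't write $tmp: $!\n";
    rename($tmp, $file) or die "Couldn't rename $tmp to $file: $!\n";
    chmod($mode, $file);
    return $changed;
}
```

A call such as replace_in_file("notes.txt", "colour", "color") would rewrite that (hypothetical) file and report how many lines were touched. Dying on errors, rather than warning and carrying on, is a design choice made here so that a failed open can never lead to the original file being overwritten.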

Perl regular expressions


regex '.*'

- prints every line (matches everything)


regex '.'

- all lines except empty ones

(. matches any single character except newline, including white-space)


regex '[a-z]'

- matches any line containing lowercase


regex '[^a-z]'

- matches any line containing something which is not lowercase a-z


regex '[A-Za-z]'

- matches any line containing letters of any kind


regex '[0-9]'

- matches any line containing digits


regex '#.*'

- line containing a hash symbol followed by anything


regex '^#.*'

- line starting with hash symbol (first char)


regex ';\n'

- match line ending in a semi-colon
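
One way to try these patterns out is to count how many lines of a small sample each one matches. The sketch below uses hypothetical sample lines and a hypothetical helper count_matches; the lines keep their trailing newline so that a pattern such as ';\n' has something to match.

```perl
use strict;
use warnings;

# Hypothetical sample lines; each keeps its trailing newline.
my @lines = ("# a comment\n", "x = 1;\n", "   \n", "\n", "HELLO\n", "say #hi\n");

# Return how many of the given lines match the pattern.
sub count_matches {
    my ($pattern, @input) = @_;
    return scalar grep { /$pattern/ } @input;
}

for my $re ('.*', '.', '[a-z]', '[^a-z]', '[A-Za-z]', '[0-9]', '#.*', '^#.*', ';\n') {
    printf "%-10s matches %d of %d lines\n", $re, count_matches($re, @lines), scalar @lines;
}
```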

Example: convert mail to WWW pages

Here is an example program which you could use to automatically turn a mail message of the form

From: Newswire

To: Mail2html

Subject: Nothing happened

On the 13th February at kl. 09:30 nothing happened. No footprints were found leading to the scene of a terrible murder, no evidence of a struggle .... etc etc

into an html-file for the world wide web. The program works by extracting the message body and subject from the mail and writing html-commands around these to make a web page. The subject field of the mail becomes the title. The other headers get skipped, since the script searches for lines containing the sequence "colon-space" or `: '. A regular expression is used for this.

        # Make HTML from mail
        &BeginWebPage();
        &ReadNewMail();
        &EndWebPage();

        sub BeginWebPage {
            print "<HTML>\n";
            print "<BODY>\n";
        }

        sub EndWebPage {
            print "</BODY>\n";
            print "</HTML>\n";
        }

        sub ReadNewMail {
           while (<>) {
              if (/Subject:/) {  # Search for subject line
                  # Extract subject text...
                  chomp;
                  ($left, $right) = split(":", $_, 2);
                  print "<H1> $right </H1>\n";
                  next;
              } elsif (/.*: .*/) {  # Search for - anything: anything
                  next;             # skip other headers
              }
              print;                # everything else is message body
           }
        }
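
The header-skipping logic can be tried without a mail file by feeding it a list of lines. This is a sketch only: mail_to_html is a hypothetical name, and it assumes a slightly stricter header test (a single word followed by colon-space at the start of the line) so that body text is less likely to be mistaken for a header.

```perl
use strict;
use warnings;

# Turn the lines of a mail message into a list of HTML lines.
sub mail_to_html {
    my (@mail) = @_;
    my @html = ("<HTML>", "<BODY>");
    for my $line (@mail) {
        chomp(my $text = $line);
        if ($text =~ /^Subject:\s*(.*)/) {
            push @html, "<H1>$1</H1>";   # subject becomes the title
        } elsif ($text =~ /^\S+: /) {
            next;                        # any other "Name: value" header
        } else {
            push @html, $text;           # message body
        }
    }
    return (@html, "</BODY>", "</HTML>");
}

my @message = (
    "From: Newswire\n",
    "Subject: Nothing happened\n",
    "\n",
    "On the 13th February at kl. 09:30 nothing happened.\n",
);
print "$_\n" for mail_to_html(@message);
```

Note that the time 09:30 in the body contains a colon but no colon-space, so the body line is kept.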

Generate perl web pages / WWW

     The following program scans through the password database and builds a standardized html-page for each user it finds there. It fills in the name of the user in each case. Note the use of the `<<' operator for extended input, already used in the context of the shell (see the section on pipes and redirection). This allows us to format a whole passage of text, inserting variables at strategic places, and avoid having to call print over many lines.

      # Build a default home page for each user in /etc/passwd
      $true = 1;
      $false = 0;

      # First build an associative array of users and full names
      setpwent();
      while ($true) {
           ($name,$passwd,$uid,$gid,$quota,$comment,$fullname) = getpwent;
           last if ($name eq "");
           $FullName{$name} = $fullname;
           print "$name - $FullName{$name}\n";
      }
      print "\n";

      # Now make a unique filename for each page and open a file
      foreach $user (sort keys(%FullName)) {
           next if ($user eq "");
           print "Making page for $user\n";
           $outputfile = "$user.html";
           open(OUT, "> $outputfile") || die "Can't open $outputfile\n";
           &MakePage;
           close(OUT);
      }

      # Note: in the real script ENDMARKER must start in the first column
      sub MakePage {
          print OUT <<ENDMARKER;
                     <TITLE>$FullName{$user}'s Home Page</TITLE>
                     <H1>$FullName{$user}'s Home Page</H1>
                     Hi welcome to my home page. In case you hadn't got it yet my name is: $FullName{$user}
                     I study at <a href="">Ontario,Canada</a>
      ENDMARKER
      }
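
The `<<' construct on its own can be sketched as follows. The helper make_page and the sample name are hypothetical. The variable is written as ${fullname} because in an interpolated string Perl would otherwise read $fullname's as a package-qualified name; the hash lookups in the program above do not have this problem.

```perl
use strict;
use warnings;

# Build the HTML for one user's page as a single string.
# Everything up to the line containing only ENDMARKER is the text,
# with variables interpolated as in a double-quoted string.
sub make_page {
    my ($fullname) = @_;
    return <<"ENDMARKER";
<TITLE>${fullname}'s Home Page</TITLE>
<H1>${fullname}'s Home Page</H1>
Hi, welcome to my home page. My name is $fullname.
ENDMARKER
}

print make_page("Mark Example");
```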