Perl pattern matching and extraction

Perl has regular expression operators for identifying patterns. The operator

          /regular expression/

returns true or false depending on whether the regular expression matches the contents of $_. For example:

         if (/perl/)
         {
               print "String contains perl as a substring";
         }
         if (/(Sat|Sun)day/)
         {
               print "Weekend day....";
         }

The effect is rather like the grep command. To use this operator on other variables you would write:

         $variable=~/regexp/
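
As a small sketch of the two forms side by side (the sample strings, the variable $day and the helper is_weekend are made up for illustration):

```perl
use strict;
use warnings;

# Return 1 if the string names a weekend day, 0 otherwise.
sub is_weekend {
    my ($day) = @_;
    return $day =~ /(Sat|Sun)day/ ? 1 : 0;
}

# Matching against the default variable $_ needs no =~ at all.
for ("Saturday", "Monday") {
    print "$_ is a weekend day\n" if /(Sat|Sun)day/;
}

# Matching against a named variable uses =~ ; the operator !~ negates the test.
my $day = "Sunday";
print "$day matches\n"         if $day =~ /(Sat|Sun)day/;
print "$day is not a Monday\n" if $day !~ /Mon/;
```

The companion operator !~ is true when the pattern does not match, which saves wrapping the test in a negation.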

Regular expressions can contain parenthesized sub-expressions, e.g.

       if (/(Sat|Sun)day(..)th(.*)/)
       {
          $first = $1;
          $second = $2;
          $third = $3;
       }

in which case Perl places the text matched by each sub-expression in the variables $1, $2, etc.
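
A minimal sketch of the same pattern applied to a named variable. The helper parse_day and the input string are invented for this illustration. Note that $1, $2 and $3 are only meaningful after a successful match, so the code checks the match before using them:

```perl
use strict;
use warnings;

# Split a string such as "Sunday12th, cloudy" into its captured parts.
# Returns an empty list when the pattern does not match.
sub parse_day {
    my ($text) = @_;
    if ($text =~ /(Sat|Sun)day(..)th(.*)/) {
        return ($1, $2, $3);
    }
    return ();
}

my ($first, $second, $third) = parse_day("Sunday12th, cloudy");
print "first='$first' second='$second' third='$third'\n";
# prints: first='Sun' second='12' third=', cloudy'
```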

Perl string replace and searching

The `sed'-like function for replacing all occurrences of a string is easily implemented in Perl using

           while (<INPUT>)
           {
                s/$search/$replace/g;
                print OUTPUT;
           }
          

This example replaces the string inside the default variable $_. To replace in a general variable we use the operator `=~', with syntax:

           $variable=~s/search/replace/
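
To make the difference between a single and a global replacement concrete, here is a short sketch on a made-up string. It also shows that s/// returns the number of substitutions it performed:

```perl
use strict;
use warnings;

my $text = "the cat sat on the mat";

# Without /g only the first occurrence is replaced.
(my $once = $text) =~ s/at/og/;

# With /g every occurrence is replaced; s/// returns the number of
# substitutions it made.
my $all   = $text;
my $count = ($all =~ s/at/og/g);

print "$once\n";                   # the cog sat on the mat
print "$all ($count changes)\n";   # the cog sog on the mog (3 changes)
```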

Here is an example of this operator in use. The following program searches for and replaces a string in several files, which is useful for making a global change in a group of files. The program is called 'file-replace'.

        #!/local/bin/perl
        #Look through files for find string and change to new string
        #in all files.
        #Define a temporary file and check it doesn't exist
        $outputfile = "tmpmarkfind";
        unlink $outputfile;
        #Check command line for list of files
        if($#ARGV<0)
        {
             die "Syntax:file-replace [file list]\n";
        }
        print "Enter the string you want to find (Don't use quotes):\n\n:";
        $findstring=<STDIN>;
        chop $findstring;
        print "Enter the string you want to replace with (Don't use quotes):\n\n:";
        $replacestring=<STDIN>;
        chop $replacestring;
        print "\nFind: $findstring\n";
        print "Replace: $replacestring\n";
        print "\nConfirm (y/n) ";
        $y=<STDIN>;
        chop $y;
        if($y ne "y")
        {
              die "Aborted -- nothing done.\n";
        }
        else
        {
              print "Use CTRL-C to interrupt...\n";
        }
        #Now shift default array @ARGV to get arguments 1 by 1
        while ($file=shift)
        {
           if($file eq "file-replace")
           {
                 print "file-replace will not operate on itself!\n";
                 next;
           }
           #Save existing mode of file for later
           ($dev,$ino,$mode)=stat($file);
           open(INPUT,$file)|| warn "Couldn't open $file\n"; 
           open(OUTPUT,"> $outputfile")|| warn "Can't open tmp";
           $notify = 1;
           while (<INPUT>)
           {
                 if(/$findstring/&& $notify)
                 {
                      print "Fixing $file...\n";
                      $notify = 0;
                 }
                 s/$findstring/$replacestring/g;
                 print OUTPUT;
           }
           close (INPUT);
           close (OUTPUT);

           #If nothing went wrong (if outfile not empty)
           #move temp file to original and reset the file mode saved above
           if(!-z $outputfile)
           {
              rename($outputfile,$file);
              chmod($mode,$file);
           }
           else
           {
              print "Warning: file empty! \n";
           }
        }
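
One point to keep in mind with file-replace: the find string is interpolated into the pattern, so characters such as `.' or `*' typed by the user act as regular expression operators. If a purely literal match is wanted, the \Q...\E escape can be wrapped around the variable. The sketch below uses a hypothetical helper, replace_literal, which is not part of the program above:

```perl
use strict;
use warnings;

# Replace every literal occurrence of $find in $line with $replace.
# \Q...\E quotes regex metacharacters in the interpolated variable.
sub replace_literal {
    my ($line, $find, $replace) = @_;
    $line =~ s/\Q$find\E/$replace/g;
    return $line;
}

print replace_literal("cost: 1.5 or 145", "1.5", "2.0"), "\n";
# prints: cost: 2.0 or 145
```

Without \Q...\E the pattern 1.5 would also match "145", because the dot stands for any character.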

Perl regular expression

regex '.*'
- matches every line (the pattern matches anything, even an empty line)

regex '.'
- matches all lines except empty ones (. matches any single character other than newline, white-space included)

regex '[a-z]'
- matches any line containing a lowercase letter

regex '[^a-z]'
- matches any line containing something which is not lowercase a-z

regex '[A-Za-z]'
- matches any line containing letters of any kind

regex '[0-9]'
- matches any line containing a digit

regex '#.*'
- matches a line containing a hash symbol followed by anything

regex '^#.*'
- matches a line starting with a hash symbol (first char)

regex ';\n'
- matches a line ending in a semi-colon
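
The patterns in this list can be tried out with a few lines of Perl. The following sketch counts how many of a handful of sample lines each pattern matches; the sample lines and the helper count_matches are invented for the purpose:

```perl
use strict;
use warnings;

# Count how many of the given lines match a pattern.
sub count_matches {
    my ($pattern, @lines) = @_;
    return scalar grep { /$pattern/ } @lines;
}

# Five sample lines: a comment, a statement, an empty line,
# a line of blanks and a statement with a trailing comment.
my @lines = ("# a comment\n", "print 42;\n", "\n", "   \n", "x = 1 # note\n");

for my $pattern ('.*', '.', '[a-z]', '[0-9]', '#.*', '^#.*', ';\n') {
    printf "%-6s matches %d of %d lines\n",
        $pattern, count_matches($pattern, @lines), scalar @lines;
}
```

The line of blanks is matched by '.' while the empty line is not, which is the distinction noted in the list.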

Example: convert mail to WWW pages

Here is an example program which you could use to automatically turn a mail message of the form

From: Newswire
To: Mail2html
Subject: Nothing happened

On the 13th February at 09:30 nothing happened. No footprints were found leading to the scene of a terrible murder, no evidence of a struggle .... etc etc

into an HTML file for the World Wide Web. The program works by extracting the message body and subject from the mail and writing HTML commands around these to make a web page. The subject field of the mail becomes the title. The other headers get skipped, since the script searches for lines containing the sequence "colon-space" or `: '. A regular expression is used for this.

        #!/local/bin/perl
        #Make HTML from mail
        &BeginWebPage();
        &ReadNewMail();
        &EndWebPage();
        sub BeginWebPage
        {
            print "<HTML>\n";
            print "<BODY>\n";
        }
        sub EndWebPage
        {
            print "</BODY>\n";
            print "</HTML>\n";
        }
        sub ReadNewMail
        {
           while (<>)
           {
              if (/Subject:/) # Search for subject line
              {
                  # Extract subject text...
                  chop;
                  ($left,$right) = split(":",$_,2);
                  print "<H1> $right </H1>\n";
                  next;
              }
              elsif (/.*: .*/)  # Search for - anything: anything
              {
                  next;         # skip other headers
              }
              print;            # everything else is message body
           }
         }
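
As an alternative to split, the subject text can be picked out with a parenthesized sub-expression, which also keeps any colons that occur inside the subject itself. This is only a sketch; the helper name subject_of is hypothetical:

```perl
use strict;
use warnings;

# Return the text after "Subject:" or undef if the line is not a
# subject header. The trailing newline is dropped because . does not
# match a newline.
sub subject_of {
    my ($line) = @_;
    return $line =~ /^Subject:\s*(.*)/ ? $1 : undef;
}

my $subject = subject_of("Subject: Re: Nothing happened\n");
print "<H1> $subject </H1>\n" if defined $subject;
# prints: <H1> Re: Nothing happened </H1>
```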

Generate perl web pages / WWW

The following program scans through the password database and builds a standardized HTML page for each user it finds there. It fills in the name of the user in each case. Note the use of the `<<' operator for extended input, already used in the context of the shell (see the section on pipes and redirection). This allows us to format a whole passage of text, inserting variables at strategic places, and avoid having to call print over many lines.

      #!/local/bin/perl
      #build a default home page for each user in/etc/passwd
      $true = 1;
      $false = 0;
      #First build an associative array of users and full names
      setpwent();
      while($true)
      {
          ($name,$passwd,$uid,$gid,$quota,$comment,$fullname) = getpwent;
          last if($name eq "");
          $FullName{$name} = $fullname;
          print "$name - $FullName{$name}\n";
      }
       print "\n";
       # Now make a unique filename for each page and open a file
       foreach $user (sort keys(%FullName))
       {
           next if ($user eq "");
           print "Making page for $user\n";
           $outputfile = "$user.html";
           open (OUT,"> $outputfile") || die "Can't open $outputfile\n";
           &MakePage;
           close(OUT);
       }
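       sub MakePage
       {
       # Everything up to ENDMARKER is printed with the variables filled in.
       # In the real file the end marker must stand alone at the very
       # start of its line (column 0).
       print OUT <<ENDMARKER;
       <HTML>
          <HEAD>
             <TITLE>$FullName{$user}'s Home Page</TITLE>
          </HEAD>
          <BODY>
             <H1>$FullName{$user}'s Home Page</H1>
             Hi welcome to my home page. In case you hadn't got it yet my name is:
             $FullName{$user}...
             I study at <a href="http://www.abctutorial.com">Ontario,Canada</a>
          </BODY>
       </HTML>
       ENDMARKER
       }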
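
The `<<' construct can be tried on its own without touching the password database. The sketch below returns the filled-in text instead of printing it to a file; the helper make_page and the sample name are assumptions made for the illustration. A plain scalar followed directly by an apostrophe is written as ${fullname} here, since in many Perl versions $fullname's would be read as a variable in another package. The hash element $FullName{$user} in the program above does not have this problem, because the subscript ends the variable name.

```perl
use strict;
use warnings;

# Build a tiny page for one user. Variables inside the here-document
# are interpolated just as in a double-quoted string.
sub make_page {
    my ($fullname) = @_;
    return <<"ENDMARKER";
<HTML>
<HEAD><TITLE>${fullname}'s Home Page</TITLE></HEAD>
<BODY><H1>${fullname}'s Home Page</H1></BODY>
</HTML>
ENDMARKER
}

print make_page("Jane Doe");
```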