Matching characters

hewarn has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to find certain lines in several files and print them to one file. These lines all have one common feature: they all contain the word VERSION or Version. One of the characters *, ; or # may be placed before the word, and the word may also be preceded by 1, 4 or 8 spaces (with one of those characters in front or nothing), like

"        VERSION:     0.11 12-Jul-2002  HWe"
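The line format described above can be written as a single regular expression: an optional *, ; or #, then optionally exactly 1, 4 or 8 spaces, then VERSION or Version. A minimal sketch, assuming the word must follow that prefix directly at the start of the line (the sample lines are made up for illustration):

```perl
use strict;
use warnings;

# Optional prefix character (* ; #), then optionally exactly 8, 4 or 1
# spaces, then VERSION or Version, anchored at the start of the line.
my $version_line = qr/^[*;#]?(?: {8}| {4}| )?(?:VERSION|Version)/;

# Made-up sample lines
my @lines = (
    "        VERSION:     0.11 12-Jul-2002  HWe",    # 8 spaces
    "#Version: 0.12",
    "*    VERSION: 0.13",                            # * plus 4 spaces
    ";VERSION: 0.14",
    "   VERSION: 0.15",                              # 3 spaces: rejected
    "no version here",
);

my @matched = grep { $_ =~ $version_line } @lines;
print "$_\n" for @matched;
```

Because the pattern is anchored with ^, a line with any other amount of leading whitespace is rejected; drop the space group in favour of \s* if that rule turns out to be too strict.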

I would like to find an elegant string-matching solution that finds all these lines. At the moment I am using something like this:

# ......
my $vers_1 = "VERSION         : ";
my $vers1  = "VERSION";
my $vers3  = "#Version";
my $vers5  = "*Version";
# . ..

# \Q...\E keeps regex metacharacters such as "*" from being interpreted
if (   $rivi =~ /^\Q$vers_1\E/
    || $rivi =~ /^\Q$vers1\E/
    || $rivi =~ /^\Q$vers3\E/
    # ......
   )
{
    $rivi =~ tr/://d;    # To remove ":" marks
    $rivi =~ tr/;//d;    # To remove ";" marks
    $teksti = $teksti . $rivi;
    open( my $t, '>>', 'desc.txt' ) or die "Cannot open desc.txt: $!";
    print $t $teksti;
    close $t;
    $teksti = "";
}

# ........
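For comparison, the same loop could open desc.txt once, rather than once per matching line, and use one pattern instead of several prefix variables. A sketch under these assumptions: the input files are passed on the command line, the output file is desc.txt as above, any amount of leading whitespace is accepted (looser than exactly 1, 4 or 8 spaces), and the helper name collect_version_lines and the script name are hypothetical.

```perl
use strict;
use warnings;

# One pattern for all accepted prefixes: optional *, ; or #, then any
# leading whitespace, then VERSION or Version at the start of the line.
my $version_line = qr/^[*;#]?\s*(?:VERSION|Version)/;

# Hypothetical helper: copy matching lines from $in to $out,
# deleting ":" and ";" characters on the way.
sub collect_version_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        next unless $line =~ $version_line;
        $line =~ tr/:;//d;    # delete every ":" and ";"
        print {$out} $line;
    }
    return;
}

# Usage (hypothetical script name): perl collect_versions.pl a.txt b.txt
if (@ARGV) {
    open my $out, '>>', 'desc.txt' or die "Cannot open desc.txt: $!";
    for my $file (@ARGV) {
        open my $in, '<', $file or die "Cannot open $file: $!";
        collect_version_lines($in, $out);
        close $in;
    }
    close $out or die "Cannot close desc.txt: $!";
}
```

Passing filehandles into the helper keeps the matching logic separate from file handling, so it can also be fed from an in-memory string when experimenting.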

Most of my problems are caused by lines that have 4 or 8 spaces before the search word, as in my example. Problem number two is how to remove everything but the version number (here 0.11) from that line. I manage to remove most of the rest, but occasionally "HWe" stays in the line, leaving "0.11 HWe". Also, when a name written as "Firstname Lastname" or "/Firstname Lastname" follows the date, sometimes only "Firstname" is removed. The substitutions I use are:

# ....
my $m0 = "";
$rivi =~ s/\d+-\w+-\d+/$m0/i;    # for dates such as 12-Jul-2002
$rivi =~ s/$cv\w+/$m0/;          # NEW trial for /Name after the date
$rivi =~ s/\s+\w+\s+\w+/$m0/;
$rivi =~ s/\s+\w+\s+\w+/$m0/;
# .....
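One way around the leftover "HWe" and the partially removed names is to capture the version number instead of deleting everything around it. A minimal sketch, assuming the version is a dotted number (such as 0.11) that follows the word VERSION/Version and an optional colon; the helper name version_of and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the dotted version number that follows
# VERSION/Version (optionally with spaces and a colon), or undef.
sub version_of {
    my ($line) = @_;
    return $line =~ /VERSION\s*:?\s*(\d+(?:\.\d+)+)/i ? $1 : undef;
}

# Made-up sample lines with dates and names after the version
my @lines = (
    "        VERSION:     0.11 12-Jul-2002  HWe",
    "#Version: 0.12 12-Jul-2002 Firstname Lastname",
    "*VERSION         : 1.4 01-Aug-2002 /Firstname Lastname",
);

for my $line (@lines) {
    my $version = version_of($line);
    print defined $version ? "$version\n" : "no version: $line\n";
}
```

Whatever follows the number (date, initials, one or two names) is simply never captured, so no substitution has to anticipate its shape.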

Does anyone have any ideas? I have learned a lot from reading the Perl Monks FAQs during my first 7 months of using Perl.

BR,
Hewarn

Re: Matching characters
by PodMaster (Abbot) on Aug 05, 2002 at 07:50 UTC

Most problems are caused by lines that have 4 or 8 spaces before the search word, as in my example. Problem number two is how to remove everything but the version number (here 0.11) from that line,

You're looking for a match, not a substitution. Here's how I'd do it (even if I really have no clue what you're trying to do; perhaps you ought to investigate perlfunc:split, and perhaps define the data you're processing better).

use strict;
use warnings;
my @LINES = (
"        VERSION:     0.11 12-Jul-2002  HWe",
" aaaa#VERSION:     0.11 12-Jul-2002  HWe",
" aaaa;VERSION:     0.11 12-Jul-2002  HWe", );

for my $L ( @LINES ) {
    print "We got VERSION $1\n"
      if $L =~ m{
          VERSION: # literal
          \s+      # followed by 1 or more space
          (\S+)    #  non-space 1 or more, captured to $1
          \s       # followed by a space
      }ix;         # ignore case, use extended patterns
}

eval q{
    require YAPE::Regex::Explain; 
    print "\n", 
    YAPE::Regex::Explain->new(
        qr[VERSION: \s+ (\S+) \s]ix
    )->explain;
};
print $@ if $@;
__END__
# ran as `perl oy>>oy'


We got VERSION 0.11
We got VERSION 0.11
We got VERSION 0.11

The regular expression:

(?ix-ms:VERSION: \s+ (\S+) \s)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?ix-ms:                 group, but do not capture (case-insensitive)
                         (disregarding whitespace and comments) (with
                         ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  VERSION:                 'VERSION:'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
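The pattern above needs "VERSION:" with the colon directly after the word, and a whitespace character after the version. The question also mentions lines laid out as "VERSION         : " (spaces before the colon). A variant that tolerates that layout, a missing colon, and a version at the very end of the line might look like this; it is a sketch, and the sample lines are made up.

```perl
use strict;
use warnings;

# Made-up sample lines; the second uses the "VERSION         : " layout
my @LINES = (
    "        VERSION:     0.11 12-Jul-2002  HWe",
    "VERSION         : 0.12 12-Jul-2002  HWe",
    "#Version: 0.13",
);

my @versions;
for my $L (@LINES) {
    if ($L =~ m{
            VERSION     # the literal word, case-insensitive via the i flag
            \s* :?      # optional whitespace, then an optional colon
            \s+         # at least one whitespace character
            (\S+)       # the version, captured
            (?:\s|\z)   # followed by whitespace or the end of the string
        }ix)
    {
        push @versions, $1;
        print "We got VERSION $1\n";
    }
}
```

Here \z replaces the trailing \s as an alternative, so "#Version: 0.13" with nothing after the number still matches.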
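Following the pointer to perlfunc:split: split ' ', $line splits on runs of whitespace and ignores leading whitespace, so the 1, 4 or 8 leading spaces stop mattering. A minimal sketch, assuming the version is the first whitespace-separated field that looks like a dotted number such as 0.11 (the helper name version_via_split is hypothetical). Lines should still be selected with a VERSION match first, since any dotted number would qualify here.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace (leading whitespace ignored)
# and return the first field that looks like a dotted version number.
sub version_via_split {
    my ($line) = @_;
    my ($version) = grep { /^\d+(?:\.\d+)+$/ } split ' ', $line;
    return $version;
}

for my $line (
    "        VERSION:     0.11 12-Jul-2002  HWe",
    " aaaa#VERSION:     0.11 12-Jul-2002  HWe",
    "VERSION         : 0.12 12-Jul-2002 Firstname Lastname",
) {
    my $v = version_via_split($line);
    print defined $v ? "We got VERSION $v\n" : "no version found\n";
}
```

The date 12-Jul-2002 and the names are separate fields that fail the dotted-number test, so they are skipped without any substitution.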

____________________________________________________
** The Third rule of perl club is a statement of fact: pod is sexy.
