hewarn has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to find certain lines in several files and print them to one file. These lines have one feature in common: they all contain the word VERSION or Version. The word may be preceded by one of the characters *, ; or #, and by 1, 4 or 8 spaces (with or without one of those characters), for example:
" VERSION: 0.11 12-Jul-2002 HWe"
I would like to find an elegant string-matching solution that finds all these lines. At the moment I am using something like:

    ......
    my $vers_1 = "VERSION : ";
    my $vers1 = "VERSION";
    my $vers3 = "#Version";
    $vers5 = "*Version";
    . ..
    if($rivi =~ /^$vers_1/ || $rivi =~ /^$vers1/ || $rivi =~ /^$vers3/ ......
    {
        $rivi =~ tr/:/ /d;    # To remove ":" marks
        $rivi =~ tr/;//d;     # To remove ";" marks
        $teksti = $teksti.$rivi;
        open (T,">>desc.txt");
        print T $teksti;
        close T;
        $teksti = "";
    }
    ........

Most of the problems are caused by lines that have 4 or 8 spaces before the searched word, as in my example. The second problem is how to remove everything but the version number (here 0.11) from the line: I can strip most of it, but HWe occasionally stays in the line, giving 0.11 HWe. Also, when the name is written as Firstname LAstname or /Firstname LAstname after the date, sometimes only Firstname is removed. The substitutions I have tried are:

    ....
    $m0="";
    $rivi=~ s/\d+-\w+-\d+/$m0/i;
    # for /Name type of date
    $rivi=~ s/$cv\w+/$m0/;
    # NEW trial for /Name type of date
    $rivi=~ s/\s+\w+\s+\w+/$m0/;
    $rivi=~ s/\s+\w+\s+\w+/$m0/;
    .....

Does anyone have any ideas? I have gained a lot from reading the Perl Monks FAQs during my first 7 months of using Perl.

BR, Hewarn

Replies are listed 'Best First'.
Re: Matching characters
by PodMaster (Abbot) on Aug 05, 2002 at 07:50 UTC
    Most of the problems are caused by lines that have 4 or 8 spaces before the searched word, as in my example. The second problem is how to remove everything but the version number (here 0.11) from the line,
    You're looking for a match, not a substitution. Here's how I'd do it (though I really have no clue what you're trying to do; perhaps you ought to investigate perlfunc:split, and perhaps define the data you're processing better).
        use strict;
        use warnings;

        my @LINES = (
            " VERSION: 0.11 12-Jul-2002 HWe",
            " aaaa#VERSION: 0.11 12-Jul-2002 HWe",
            " aaaa;VERSION: 0.11 12-Jul-2002 HWe",
        );

        for my $L ( @LINES ) {
            print "We got VERSION $1\n"
                if $L =~ m{
                    VERSION:    # literal
                    \s+         # followed by 1 or more space
                    (\S+)       # non-space 1 or more, captured to $1
                    \s          # followed by a space
                }ix;            # ignore case, use extended patterns
        }

        eval q{
            require YAPE::Regex::Explain;
            print "\n",
                YAPE::Regex::Explain->new(
                    qr[VERSION: \s+ (\S+) \s]ix
                )->explain;
        };
        print $@ if $@;

        __END__
        # ran as `perl oy>>oy'
        We got VERSION 0.11
        We got VERSION 0.11
        We got VERSION 0.11

        The regular expression:

        (?ix-ms:VERSION: \s+ (\S+) \s)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?ix-ms:                 group, but do not capture (case-insensitive)
                                 (disregarding whitespace and comments) (with
                                 ^ and $ matching normally) (with . not
                                 matching \n):
        ----------------------------------------------------------------------
          VERSION:                 'VERSION:'
        ----------------------------------------------------------------------
          \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                   more times (matching the most amount
                                   possible))
        ----------------------------------------------------------------------
          (                        group and capture to \1:
        ----------------------------------------------------------------------
            \S+                      non-whitespace (all but \n, \r, \t, \f,
                                     and " ") (1 or more times (matching the
                                     most amount possible))
        ----------------------------------------------------------------------
          )                        end of \1
        ----------------------------------------------------------------------
          \s                       whitespace (\n, \r, \t, \f, and " ")
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

    ____________________________________________________
    ** The Third rule of perl club is a statement of fact: pod is sexy.
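
As a sketch of how the match-and-capture idea in the reply might be adapted to the line formats described in the question (an optional *, ; or # marker, any amount of whitespace, then VERSION or Version, with or without a colon), something like the following could work. The helper name extract_version, the sample lines, the anchoring at the start of the line, and the assumption that a version number consists of dot-separated digits are all illustrative and not taken from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the version number
# found on a version line, or undef in scalar context when the line does
# not look like a version line.
sub extract_version {
    my ($line) = @_;
    if ( $line =~ m{
            ^ \s*              # optional leading whitespace
            [*;\#]?            # optional marker: one of * ; or a hash sign
            \s*                # optional whitespace after the marker
            version            # the keyword, any case because of the i flag
            \s* :? \s*         # optional colon with optional spaces around it
            ( \d+ (?: \. \d+ )* )   # assumed format: dot-separated digits
        }ix )
    {
        return $1;
    }
    return;
}

# Made-up sample lines covering the variants described in the question.
my @lines = (
    " VERSION: 0.11 12-Jul-2002 HWe",
    "#Version 1.2 05-Aug-2002 /Firstname LAstname",
    "*        VERSION : 2.0.1",
    "this line mentions no release",
);

for my $line (@lines) {
    my $version = extract_version($line);
    print "$version\n" if defined $version;
}
```

Because only the version number is captured, the date and the name after it never need to be stripped with separate substitutions. If the keyword may also appear after other text on the line, as in the reply's " aaaa#VERSION" samples, the leading ^ anchor would have to be dropped.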