PerlMonks  

Regular expression

by PugSA (Beadle)
on Sep 16, 2008 at 13:17 UTC ([id://711676])

PugSA has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I posted in the chatterbox but thought I'd post a more accurate picture of what I am trying to accomplish. Thank you for your time.

I am trying to match CPOSTA_1221170039_C1_F1 in $record, but the match includes the whole path.

This is what it matches => 49\backup\rbgmst02_dd2\stu1\CPOSTA_1221170039_C1_F1

How do I get it not to include the \, and, for interest's sake, why does it match the \ at all?

I have tried [^\\], but then it does not match anything.

$record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

if($record =~ m|FRAG|)
{
    if($record =~ m|([A-z0-9]+[_][0-9]+[_]C[0-9]+[_][A-z0-9]+)|)
    {
        #print $2."\n";
        push(@netbck_list,$1);
        print $1."\n";
    }
    #if($record =~ m|FRAG(.*?)\/([A-z0-9_-]+[_][0-9]+[_]C[0-9]+[_][A-z0-9]+)|)
    else
    {
        #print "FRAG RECORD NOT MATCHED => $record \n";
    }
} #if($record =~ m|FRAG|)

Re: Regular expression
by Fletch (Bishop) on Sep 16, 2008 at 13:23 UTC

    Rather than trying to roll your own there's the core File::Basename module . . .
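A minimal sketch of that route, assuming the record layout from the question (the path is the ninth whitespace-separated field and uses backslash separators). On Solaris, basename() applies Unix rules and would return the whole string unchanged, since it contains no forward slash. File::Basename's fileparse_set_fstype('MSWin32') switches the module to Windows-style parsing; because that setting is module-wide, the sketch restores the previous value afterwards.

```perl
use strict;
use warnings;
use File::Basename qw(basename fileparse_set_fstype);

# Sample record from the question; the path is the 9th whitespace-separated field.
my $record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my $path = ( split ' ', $record )[8];

# Parse the path with Windows rules (backslash separators) even on Unix.
# fileparse_set_fstype returns the previous type, so it can be restored.
my $old_type = fileparse_set_fstype('MSWin32');
my $basename = basename($path);
fileparse_set_fstype($old_type);    # back to the platform default

print "$basename\n";
```

If paths can contain spaces, the split on whitespace needs the extra care johngg describes further down.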

    The cake is a lie.

Re: Regular expression
by johngg (Canon) on Sep 16, 2008 at 14:36 UTC
    Corion and shmem have both suggested an approach using a split on whitespace. You would need to take precautions if the path can include spaces. This can be done by isolating the path field using three-argument splits from either end of the record.

    use strict;
    use warnings;

    my @records = (
        qq{FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0},
        qq{FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\New Folder\\Space File rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0},
    );

    foreach my $record ( @records )
    {
        my @leftFields = split m{\s+}, $record, 9;
        my $path = (
            reverse
            map { $_ = reverse }
            split m{\s+}, reverse( $leftFields[ 8 ] ), 18
        )[ 0 ];
        my $basename = ( split m{\\}, $path )[ -1 ];
        print qq{$path\n $basename\n};
    }

    The output,

    \\172.20.13.49\backup\rbgmst02_dd2\stu1\CPOSTA_1221170039_C1_F1
     CPOSTA_1221170039_C1_F1
    \\172.20.13.49\backup\rbgmst02_dd2\New Folder\Space File
     Space File

    I hope this is of interest.

    Cheers,

    JohnGG

Re: Regular expression
by Corion (Patriarch) on Sep 16, 2008 at 13:27 UTC

    Also in the chatterbox, I pointed out File::Basename and File::Spec to you. Both will only work for your filenames if you're on Win32. If you're trying this on a non-Windows platform, the alternative mentioned there, /([^\\]+)$/, will still extract the filename part, with the minor change of using whitespace instead of end of line as the anchor.
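To illustrate the anchor point, here is a sketch that runs both anchors against the full record from the question. Against the whole line, $ anchors at the end of the record, so /([^\\]+)$/ captures everything after the last backslash, trailing fields included. Requiring whitespace after the name stops at the filename instead, assuming (as in the sample) that more fields always follow the path. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

# End-of-line anchor: grabs the filename *and* every field after it.
my ($to_eol) = $record =~ /([^\\]+)$/;

# Whitespace anchor: a backslash, then non-backslash, non-space characters,
# then whitespace. Only the last path component is followed by whitespace.
my ($name) = $record =~ /\\([^\\\s]+)\s/;

print "end-of-line anchor: '$to_eol'\n";
print "whitespace anchor:  '$name'\n";
```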

    Personally, I would approach your log parsing task by splitting on whitespace and then looking closer at the filename part:

    my @entries = split " ", $record;
    if ($entries[0] eq 'FRAG') {
        warn "File is $entries[8]";
        $entries[8] =~ /([^\\]+)$/
            or die "Malformed file entry: '$entries[8]'";
        warn "Basename is '$1'";
    };

    Update: Fixed bad typo that rendered the RE useless.

      I'm on a Solaris machine and tried ([^\\+]), but I can't match it at the end of the line; that's why I posted the code here. I'll try the split. Thank you.

        You could use File::Basename, split and tr:

        use File::Basename;

        $record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

        if($record =~ m|FRAG|) {
            my $str = (split " ", $record)[8];
            $str =~ tr[\\][/];
            print basename($str),"\n";
        }
        __END__
        CPOSTA_1221170039_C1_F1

        Alternative:

        $record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

        if($record =~ m|FRAG|) {
            $record =~ /(?<=\\)(\w+)(?:\s|$)/;
            print "$1\n";
        }
        __END__
        CPOSTA_1221170039_C1_F1

        See perlre.

Re: Regular expression
by pjotrik (Friar) on Sep 16, 2008 at 13:45 UTC
    A-z
    That's why: in ASCII there are several non-alphabetic characters between 'Z' and 'a', namely [, \, ], ^, _ and `, so the backslash is inside that range. [A-Za-z0-9] or [[:alnum:]] will match only alphanumeric characters.
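A quick check of this point, as a sketch: it lists the ASCII characters that sit between 'Z' and 'a' (and are therefore inside [A-z]), then runs the question's regex unchanged and with only the character classes replaced. The variable names are illustrative.

```perl
use strict;
use warnings;

# Characters strictly between 'Z' (0x5A) and 'a' (0x61) in ASCII;
# every one of them is matched by [A-z].
my $between = join ' ', map { chr($_) } (ord('Z') + 1) .. (ord('a') - 1);
print "Inside A-z but not letters: $between\n";

my $record = "FRAG 1 1 2000000 0 0 0 0 \\\\172.20.13.49\\backup\\rbgmst02_dd2\\stu1\\CPOSTA_1221170039_C1_F1 rbgmst02 65536 0 0 -1 0 *NULL* 1222379639 1 65537 0 0 0 0 0 0 0";

# The question's pattern: [A-z0-9] also accepts '\' and '_',
# so the leftmost match starts at "49\backup...".
my ($broken) = $record =~ m|([A-z0-9]+[_][0-9]+[_]C[0-9]+[_][A-z0-9]+)|;

# Letters and digits only: a backslash can no longer be part of the match.
my ($fixed) = $record =~ m|([A-Za-z0-9]+[_][0-9]+[_]C[0-9]+[_][A-Za-z0-9]+)|;

# POSIX class form; equivalent to the line above for ASCII input.
my ($posix) = $record =~ m|([[:alnum:]]+[_][0-9]+[_]C[0-9]+[_][[:alnum:]]+)|;

print "broken: $broken\nfixed:  $fixed\nposix:  $posix\n";
```

Note that _ is also in that range, which is why the broken pattern happily runs across rbgmst02_dd2 as well.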

    Update: When you're not sure why your regex doesn't work, use re 'debug';

Re: Regular expression
by Grey Fox (Chaplain) on Sep 16, 2008 at 14:38 UTC

Node Type: perlquestion [id://711676]
Approved by Arunbear