vek has asked for the wisdom of the Perl Monks concerning the following question:

Regular expressions are not my strong point so perhaps a monk or two could take pity on me.

I've got an error string: INVALID ORDER-NO 4546090, the string is sometimes prepended with a directory path & filename: /base-dir/dir/foo.txt INVALID ORDER-NO 4546090. If the string does contain the directory-path/filename, I want to remove said directory path & filename and end up with just INVALID ORDER-NO 4546090. I used the following regular expression:
s/^[\/\S+?]*//g;
Which appears to work until I come across an error string without the directory path (i.e. INVALID ORDER-NO 4546090). I end up with this string: ORDER-NO 4546090. I'm sure this is something very simple I'm overlooking.

TIA!

Re: A poor bloke and his regex...
by dws (Chancellor) on Mar 01, 2002 at 17:56 UTC
    I want to remove said directory path & filename and end up with just INVALID ORDER-NO 4546090.

    Two approaches. One might fit for you.

    Approach 1 - nuke leading paths

    s#^/[^ ]+ ##;

    Approach 2 - isolate only the part you want

    $_ = $1 if m/(INVALID ORDER-NO \d+)/;
Re: A poor bloke and his regex...
by grep (Monsignor) on Mar 01, 2002 at 17:53 UTC
    I think this should work for you (WARNING: I did it really quickly):
    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        s/^\/[\S]+\s//;
        print "$_\n";
    }

    __DATA__
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    /base-dir/dir/foo.txt INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    This assumes your error msg will never start with /.

    UPDATE: I knew I did this too fast:
  • I had a useless character class
  • didn't chomp (THX tadman)

    So, rewritten:
    while (<DATA>) {
        chomp;
        s/^\/\S+\s//;
        print "$_\n";
    }


    grep
    grep> grep clue /home/users/*
      Cheers grep, works like a charm!

      UPDATE: Thanks to all who replied, all good suggestions BTW.
Re: A poor bloke and his regex...
by tadman (Prior) on Mar 01, 2002 at 18:00 UTC
    grep has a quick solution which will probably work, but there's one thing to keep in mind. Filenames can have spaces in them, and this can break things. Maybe you want to search for the error message, like so:
    my ($error) = /(INVALID.*)/;
    Which should grab any error which begins with the text "INVALID".

    This, of course, presupposes that you do not have a file called "INVALID.FILE" or somesuch. Without modifying the reporting routine, you're forced to have some faith.

    On the other hand, if you can be certain of the errors you are trying to trap, and their relevant formatting, you should probably try and extract the error rather than remove the filename. Give this a shot:
    my ($error) = /(INVALID ORDER-NO \d+)$/;
    Make sure you've chomped accordingly.
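As a rough illustration of the point about spaces in filenames, here is a minimal sketch that extracts the message instead of deleting the path. The helper name extract_error and the sample lines are made up for this example; only the pattern comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: pull out the known error message instead of
# trying to delete whatever precedes it.
sub extract_error {
    my ($line) = @_;
    chomp $line;
    my ($error) = $line =~ /(INVALID ORDER-NO \d+)$/;
    return $error;    # undef when the line holds no such message
}

# Made-up sample lines; the third path contains spaces.
my @lines = (
    "INVALID ORDER-NO 4546090\n",
    "/base-dir/dir/foo.txt INVALID ORDER-NO 4546090\n",
    "/base dir/my file.txt INVALID ORDER-NO 4546090\n",
);

for my $line (@lines) {
    my $error = extract_error($line);
    print defined $error ? "$error\n" : "no match\n";
}
```

By contrast, a path-stripping substitution such as s/^\/\S+\s// stops at the first space, so the third sample line would keep the tail of its path (dir/my file.txt) in front of the message.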
Re: A poor bloke and his regex...
by dmmiller2k (Chaplain) on Mar 01, 2002 at 18:09 UTC

    Question: does the error string ALWAYS begin specifically with 'INVALID ORDER ...', or are there other error messages, too? If other kinds of errors may appear (with or without a leading /path/and/filename), then I would recommend grep's solution -- remove the offending path, leaving the rest.

    Of course, if the ONLY error message you will get ALWAYS starts with 'INVALID ORDER', then the solution can be made even less prone to mismatches (just take 'INVALID ORDER' and anything that follows it):

    s/^.*?(INVALID ORDER.*)$/$1/;

    dmm

    If you GIVE a man a fish you feed him for a day
    But,
    TEACH him to fish and you feed him for a lifetime
Re: A poor bloke and his regex...
by dmouille (Novice) on Mar 01, 2002 at 18:56 UTC
    Just to add a bit. Your original regex didn't work because you included the first slash as part of a character class. The regex would match a string beginning with a slash or any (other) non-whitespace character, up to the first whitespace. This is why INVALID was being removed when you didn't want it to be. By moving the slash outside the character class, the regex only matches when the string begins with a slash.
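To make that difference concrete, here is a small sketch that runs both patterns over both forms of the message. The sub names strip_original and strip_fixed are invented for the example; the first pattern is the one from the question and the second is the one from grep's update.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern from the question. Inside the brackets,
# the slash, \S, + and ? are all just members of one character class, so
# this strips any leading run of non-whitespace characters, path or not.
sub strip_original {
    my ($msg) = @_;
    $msg =~ s/^[\/\S+?]*//g;
    return $msg;
}

# Hypothetical helper: the slash sits outside any class, so only a
# leading "/something " is stripped.
sub strip_fixed {
    my ($msg) = @_;
    $msg =~ s/^\/\S+\s//;
    return $msg;
}

for my $msg ('/base-dir/dir/foo.txt INVALID ORDER-NO 4546090',
             'INVALID ORDER-NO 4546090') {
    printf "original: [%s]  fixed: [%s]\n",
        strip_original($msg), strip_fixed($msg);
}
```

The brackets in the output make any leftover leading space visible, since the original pattern removes the first word but not the whitespace after it.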
Re: A poor bloke and his regex...
by simon.proctor (Vicar) on Mar 01, 2002 at 17:59 UTC
    I worked on the assumption that your path started with a /. The problem sounded like a reverse regex problem to me so that's how I solved it. The other methods are probably better but what the hey :).
    use strict;
    use warnings;

    my $string  = "/base-dir/dir/foo.txt INVALID ORDER-NO 4546090";
    my $string2 = "INVALID ORDER-NO 4546090";

    print $string,"\n";
    $string = join('',reverse(split('', $string)));
    $string =~ s/\s*?(\/|-|\w|\.|\d)+$//;
    $string = join('',reverse(split('', $string)));
    print $string,"\n\n";

    print $string2,"\n";
    $string2 = join('',reverse(split('', $string2)));
    $string2 =~ s/\s*?\/(-|\w|\.|\d)+$//;
    $string2 = join('',reverse(split('', $string2)));
    print $string2;

    This produced:
    H:\HTML_T~1\perl>perl regex.pl
    /base-dir/dir/foo.txt INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090

    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
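Note that the listing uses a slightly different substitution for each of the two strings. As a sketch only, the same reverse-and-trim idea can be written with one pattern that handles both cases. It assumes the path starts with a / and contains only word characters, dots, dashes and slashes; the sub name strip_path_reversed is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, trim the path (now a suffix
# that ends in "/"), and reverse it back.
sub strip_path_reversed {
    my ($msg) = @_;
    my $rev = reverse $msg;         # scalar context reverses the characters
    $rev =~ s/\s+[\w.\/-]*\/$//;    # whitespace, path characters, final "/"
    return scalar reverse $rev;
}

for my $msg ('/base-dir/dir/foo.txt INVALID ORDER-NO 4546090',
             'INVALID ORDER-NO 4546090') {
    print strip_path_reversed($msg), "\n";
}
```

Requiring the reversed string to end in a slash is what keeps a message without a path from being touched.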
    HTH,

    Simon
Re: A poor bloke and his regex...
by Anonymous Monk on Mar 01, 2002 at 19:55 UTC
    use strict;
    use warnings;

    while (<DATA>) {
        chomp;
        my ($dir, $crap) = split ' ', $_, 2;
        printf "%s\n", $dir eq 'INVALID' ? $dir.' '.$crap : $crap;
    }

    __DATA__
    /base-dir/dir/foo.txt INVALID ORDER-NO 4546090
    /base-dir/dir/foo.txt INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090
    INVALID ORDER-NO 4546090