Brian268 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to understand regular expressions. In the following, only one works (merilyn14); everything else comes out in FoxyM and I'm not sure why. Can you help?

The print statements give an example of what I'm expecting.

#!/usr/bin/perl

@Files = (
    "filename=merilyn14.jpg.jpeg",
    "filename=003_Merilyn23.jpg.jpeg",
    "filename=890FoxyM.jpg.jpeg",
    "filename=006.jpg.jpeg"
);

foreach my $file (@Files) {
    if ($file =~ m/filename=(\d.*?)([a-zA-Z].*?)\.j/i) {
        $nFold = $2;
        $nFile = $1 . $2 . ".jpg";
        print "FoxyM New fold = $nFold file = $nFile\n";
    }
    elsif ($file =~ m/filename=([a-zA-Z].*?)(\d.*)\.j/i) {
        $nFold = $1;
        $nFile = $1 . $2 . ".jpg";
        print "merilyn14 New fold = $nFold file = $nFile\n";
    }
    elsif ($file =~ m/filename=(\d.*?)_([a-zA-Z].*)(\d.*?)\.j/i) {
        $nFold = $1 . "_" . $2;
        $nFile = $1 . "_" . $2 . $3 . ".jpg";
        print "003_Merilyn23 New fold = $nFold file = $nFile\n";
    }
    elsif ($file =~ m/filename=(\d.*?)\.j/i) {
        $nFold = "FileNum";
        $nFile = $1 . ".jpg";
        print "006 New fold = $nFold file = $nFile\n";
    }
}

Re: RegEx Help
by BrowserUk (Patriarch) on Dec 17, 2011 at 12:34 UTC
    m/filename=(\d.*?)([a-zA-Z].*?)\.j/i)

    I think that you probably want \d*? and [a-zA-Z]*?. Note the absence of the '.'.

    I.e., you are applying the quantifier (*?) to the '.', not to the character class you want to match.
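
    A minimal sketch of the difference, run against two of the sample names from the question. The "tight" pattern is an assumption on my part: it uses + on the classes rather than the *? suggested above, because a class that may match zero times would still let 006 through with an empty second group. The helper name captures is hypothetical.

```perl
use strict;
use warnings;

# Return the captured groups, or an empty list when the pattern fails.
sub captures {
    my ($re, $string) = @_;
    my @groups = $string =~ $re;
    return @groups;
}

# Pattern from the question: the lazy *? quantifies the '.', so each group
# is "one digit (or letter), then as little of anything as possible".
my $loose = qr/filename=(\d.*?)([a-zA-Z].*?)\.j/i;

# Assumed fix: quantify the character classes themselves.
my $tight = qr/filename=(\d+)([a-zA-Z]+)\.j/i;

for my $name ('filename=890FoxyM.jpg.jpeg', 'filename=006.jpg.jpeg') {
    for my $pair ([ loose => $loose ], [ tight => $tight ]) {
        my ($label, $re) = @$pair;
        my @got    = captures($re, $name);
        my $result = @got ? join(' | ', map { "'$_'" } @got) : 'no match';
        print "$label  $name  =>  $result\n";
    }
}
```

    With the loose pattern, filename=006.jpg.jpeg yields the groups '006.' and 'jpg', which is why it lands in the FoxyM branch. The tightened pattern rejects that name and still captures '890' and 'FoxyM' from the name it was meant for.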


Re: RegEx Help
by ww (Archbishop) on Dec 17, 2011 at 21:10 UTC

    For starters, Perl and CPAN will help you solve your puzzle:

    use feature 'say';
    use YAPE::Regex::Explain;
    say YAPE::Regex::Explain->new( qr{/(\d.*?)/i} )->explain;
    prints, in relevant part:
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d                       digits (0-9)
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1

    In other words, at lines 11 and 24, you'll be better off with a simple "\d+" to match a run of digits, before moving on to other possibilities.

    Semi-OT NOTE: YAPE::Regex's use of the plural "digits" for a single "\d" seems misleading to me. Yes, of course the next character, matched by the dot, could be a digit, but that is obscured by the use of the plural in the explanation of \d.

    Here's a very loose extrapolation/translation of YAPE::Regex's output in language I think is actually more precise:

    ----------------------------------------
      \d     one digit
    ----------------------------------------
      .*?    anything (except a newline), with as few as zero
             instances (times) of anything
             ...and absolutely no more than (in the case of your
             sample data) 'one additional digit'.
    ----------------------------------------

    When the best tools you know how to use come up short, you can use a few print statements for debugging many problems -- t'ain't necessarily the best way, but you might find illumination were you to insert just two such:

    1. as the first line of the main if -- (print "\t I'm in if \n";)
      ...and
    2. another as a new line in the first elsif -- (print "\t I'm in the first elsif \n";).
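
    The same idea can be sketched without touching the original script: walk the question's four patterns in order and report which one fires first for each sample name. The table @branches and the helper branch_taken are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# The four patterns from the question, in the order the if/elsif chain tests them.
my @branches = (
    [ 'if'           => qr/filename=(\d.*?)([a-zA-Z].*?)\.j/i ],
    [ 'first elsif'  => qr/filename=([a-zA-Z].*?)(\d.*)\.j/i ],
    [ 'second elsif' => qr/filename=(\d.*?)_([a-zA-Z].*)(\d.*?)\.j/i ],
    [ 'third elsif'  => qr/filename=(\d.*?)\.j/i ],
);

# Return the label of the first branch whose pattern matches, or 'none'.
sub branch_taken {
    my ($file) = @_;
    for my $branch (@branches) {
        my ($label, $re) = @$branch;
        return $label if $file =~ $re;
    }
    return 'none';
}

for my $file (
    'filename=merilyn14.jpg.jpeg',
    'filename=003_Merilyn23.jpg.jpeg',
    'filename=890FoxyM.jpg.jpeg',
    'filename=006.jpg.jpeg',
) {
    print "\t I'm in the ", branch_taken($file), " for $file\n";
}
```

    Every name that starts with a digit is caught by the main if, so the second and third elsif are never reached with this data; only merilyn14 gets as far as the first elsif.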

    Obligatory observation: you're not using strict and warnings. You should, at least until you know Perl so well that their assistance is no longer needed to write code that (usually) does what you want and expect... and so well that you know when you can save yourself the line(s) of code.

    In case you're wondering at the much-belated and possibly-redundant post, this node has been a work in progress for many hours -- interspersed with Fire calls (2 each), SWMBO's decision (1 each) that this was the day to select and cut an Xmas tree, and loading/delivering a cord of wood (1 each, $350/cord) with a friend... Probably not much point in saying more; maybe not even in this, at this late date.

      Unfortunately \d is not always [0-9]. Under Unicode it matches far more than 10 code points. If you are writing validation logic, it is better to use [0-9] instead of \d.
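
      A small sketch of that point, assuming a character string containing U+0663 (ARABIC-INDIC DIGIT THREE). The helper names are hypothetical; code points are printed rather than the characters themselves to avoid output-encoding warnings.

```perl
use strict;
use warnings;

# True when the string is a single character matched by \d, which follows
# Unicode rules for characters above the ASCII range.
sub is_backslash_d { return $_[0] =~ /\A\d\z/ ? 1 : 0 }

# True only for one of the ten ASCII digits.
sub is_ascii_digit { return $_[0] =~ /\A[0-9]\z/ ? 1 : 0 }

for my $char ('7', "\x{0663}", 'x') {
    printf "U+%04X  \\d=%d  [0-9]=%d\n",
        ord($char), is_backslash_d($char), is_ascii_digit($char);
}
```

      The Arabic-Indic digit passes \d but not [0-9]. On Perl 5.14 or later, the /a modifier is another way to restrict \d to ASCII.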
        Good point. ++
Re: RegEx Help
by pvaldes (Chaplain) on Dec 17, 2011 at 18:35 UTC
    @Files =( "filename=merilyn14.jpg.jpeg", "filename=...)
    $file = "filename=merilyn14.jpg.jpeg"

    Mmmh... That seems to me a strange way to name a file, and a source of potential trouble for your script in the future. "filename=merilyn14.jpg.jpeg" is not a file, nor even a filename. I would fix this if I were you.

    $file = "merilyn14.jpg.jpeg"

    more questions

    print "FoxyM New fold = $nFold file = $nFile\n";

    Maybe you will prefer this form, which is probably less ambiguous and more readable...

    print "FoxyM New fold = ", $nFold, " file = ",$nFile, "\n";

    And

    foreach my $file (@Files) {
        if ($file =~ m/filename=(\d.*?)([a-zA-Z].*?)\.j/i)

    ... which can be written, matching against $_ instead of a named loop variable, as:

    foreach (@Files) {
        if (/^filename=(\d.*?)([a-zA-Z].*?)\.j/)

    You don't need /i in the match if you are also using a-zA-Z.

      ++ for the rest, but "wrong" to your final comment:

      The /i has no impact with the character class ... but may very well have to do with the "j". If OP was concerned about the various combinations of .jpg, .JPG, .Jpeg, .JPEG and so on, then the /i is, in fact, very much needed.
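
      To make that concrete, here is a sketch with a simplified, hypothetical pattern (letters, then digits, then an extension starting with "j") and an upper-case extension that is not in the OP's data. Only the /i differs between the two patterns.

```perl
use strict;
use warnings;

# Simplified stand-ins for the OP's patterns; they differ only in the /i flag.
my $case_sensitive   = qr/filename=([a-zA-Z]+)(\d+)\.j/;
my $case_insensitive = qr/filename=([a-zA-Z]+)(\d+)\.j/i;

# Return 1 if the string matches the pattern, 0 otherwise.
sub matches {
    my ($re, $string) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $name ('filename=merilyn14.jpg', 'filename=merilyn14.JPG') {
    printf "%-24s without /i: %d   with /i: %d\n",
        $name, matches($case_sensitive, $name), matches($case_insensitive, $name);
}
```

      The character class already covers both cases, so the captures are unaffected; it is the literal "j" after the dot that fails against ".JPG" when /i is dropped.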

      Updated to clarify the last sentence above

        I didn't see the extension. Well done, smart note...

        But, although you are pointing in the right direction, in this particular case it could be a false problem...

        Remember, this is what we are really feeding to the script.

        @Files = (
            "filename=merilyn14.jpg.jpeg",
            "filename=003_Merilyn23.jpg.jpeg",
            "filename=890FoxyM.jpg.jpeg",
            "filename=006.jpg.jpeg"
        );

        At least on some operating systems, "filename=merilyn14.jpg.JPEG" will never be evaluated or modified by the regexp, because it is not in the list the foreach loop walks, so be aware of this point. We can safely drop /i here if we are on Linux, because it is our duty to check beforehand what we put in @Files, and even if there were one or several JPG entries we should change only the regexp for those matches. There is no point in having a regexp search for something that we know will never match.

        You say "/i has no impact with the character class". In this case that is true, as the class is [A-Za-z], but it is not true in general:
        perl -Mre=debug -e'/[a-z]/i'
        Compiling REx "[a-z]"
        Final program:
           1: ANYOF[A-Za-z][] (12)
          12: END (0)
        stclass ANYOF[A-Za-z][] minlen 1
        Freeing REx: "[a-z]"
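
        The same effect can be checked from the matching side. This is only a sketch; class_matches is a hypothetical helper that tests a single character against [a-z] with or without /i.

```perl
use strict;
use warnings;

# Return 1 if $char matches the class [a-z], optionally under /i; 0 otherwise.
sub class_matches {
    my ($char, $ignore_case) = @_;
    my $re = $ignore_case ? qr/[a-z]/i : qr/[a-z]/;
    return $char =~ $re ? 1 : 0;
}

for my $char ('a', 'A', '5') {
    printf "%s  without /i: %d   with /i: %d\n",
        $char, class_matches($char, 0), class_matches($char, 1);
}
```

        An upper-case 'A' is rejected by the bare class and accepted once /i is added, which is what the ANYOF[A-Za-z] in the debug output reflects.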
Re: RegEx Help
by TJPride (Pilgrim) on Dec 17, 2011 at 16:11 UTC
    If I were you, I'd also standardize the naming format and lowercase everything while I was at it, but maybe that's just me.

    use strict;
    use warnings;

    my @files = qw|
        filename=merilyn14.jpg.jpeg
        filename=003_Merilyn23.jpg.jpeg
        filename=890FoxyM.jpg.jpeg
        filename=006.jpg.jpeg
    |;

    my ($file, $folder);

    for (@files) {
        next if !s/^filename=//;
        s/(\.jpe?g)+$//;
        $file = "$_.jpg";
        s/^\d+_?//;
        s/_?\d+$//;
        $folder = $_ || 'FileNum';
        print "$file -> $folder\n";
    }

    __DATA__
    OUTPUT:
    merilyn14.jpg -> merilyn
    003_Merilyn23.jpg -> Merilyn
    890FoxyM.jpg -> FoxyM
    006.jpg -> FileNum