submersible_toaster has asked for the wisdom of the Perl Monks concerning the following question:

Mellow Funks,
My problem (no sniggering up the back there), in this case, is to build a regular expression to match and extract some significant digits from some strings, given a mask with a placeholder. For example:

numberedfiles.@.tif   # mask with placeholder.
foreground.001.tif
foreground.002.tif    # etc ad nauseam.

Of course, I am limiting myself for the moment to strings that are dot-separated, and to be honest the code I have works nicely and allows me to extract the significant digits. But how might I extend this to include a placeholder for the extension (in this case the image format)? It becomes more tricky. I am posting what has dribbled out my ears today in the hope that some of you lateral thinkers can widen the crack that floods my brain with light.


#!/usr/bin/perl -w
use strict;

my $mask    = shift @ARGV;
my @segment = split /\./, $mask;
my @re = map {
    my $r;
    if ($_ eq '@') { $r = '(\d+)' }
    else           { $r = '\w+' }
    $r
} @segment;
my $re = '^' . join('\.', @re) . '$';

my $name = shift @ARGV;
my ($digits) = $name =~ /$re/;
print $re, $/;
print $digits, $/;

thanks.
I can't believe it's not psellchecked

Replies are listed 'Best First'.
Re: Building regexp from a 'mask' string of placeholders.
by tachyon (Chancellor) on Feb 10, 2003 at 06:40 UTC

    The logic of what you are trying to do eludes me, can you be more specific? Why not just pass your script an appropriate regex string:

    $_ = 'some.567.jpg';
    m/^[^\.]*\.(\d+)\.(\w+)$/;
    print "$1 $2"

    This will capture the digits and extension into $1 and $2 but I fail to see how one might extrapolate that from a string like 'numberedfiles.@.tif' as this does not give sufficient detail of what is required. It seems to me that by the time you develop a pseudo language to describe what you want you might as well just use the Perl RE language - after all that is what it is designed for.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Good point, sorry for skimming the specifics.
      As much fun as it would doubtless be to try to teach 3d animators perl regex syntax, it's hard enough getting them to consistently name their files; as you see, I have graciously been afforded the '.' as a separator in filenames. So the final objective is a script that, given...

      ./myscript.pl /nfs/bigdisk/renders/monsterrender.cin.@ 50 920
      ...is then able to run some verification on each file in the sequence, perform a few operations, etc. The mechanics of this I already have working, but of course they're for my use, so I asked the 3d team for their preferred CLI, which I am now trying to accommodate.
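      For anyone following along, the mask-plus-frame-range CLI above might be expanded into filenames roughly like this. It is only a sketch: frame_files is a made-up name, and the zero-padding width is an assumption (defaulting to 4 here), since the thread never says how the frames are padded.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a mask such as 'monsterrender.cin.@' plus a
# first and last frame into the list of expected filenames.
# $pad (zero-padding width) is an assumption; the thread does not state it.
sub frame_files {
    my ($mask, $first, $last, $pad) = @_;
    $pad = 4 unless defined $pad;
    die "mask needs exactly one '\@'\n" unless ($mask =~ tr/@//) == 1;
    return map {
        my $frame = sprintf '%0*d', $pad, $_;    # e.g. 50 becomes 0050
        (my $file = $mask) =~ s/\@/$frame/;      # drop the digits into the mask
        $file;
    } $first .. $last;
}

# Usage: ./myscript.pl /path/to/name.cin.@ 50 920
my ($mask, $first, $last) = @ARGV;
if (defined $last) {
    for my $file (frame_files($mask, $first, $last)) {
        my $status = -e $file ? 'ok' : 'missing';
        print "$status  $file\n";
    }
}
```

      Building the names from the mask avoids a regex altogether when the frame range is known up front; the regex only becomes necessary when going the other way, from a filename back to its frame number.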


      I can't believe it's not psellchecked

        You asked them for an interface that would seem to not have enough flexibility.... Why not give them a syntax like:

        # & = words & digits (alphanumeric)
        # @ = digits
        # # = word chars ie [a-zA-Z_]
        # . = literal . used to separate parts of interest
        # Anything else is taken to be literal in meaning
        my $str = "some-file.123.tif";
        my $find_str = qq!some-#.@.&!;
        print "Here is our filename: $str\nHere is the interface string: $find_str\n";
        my @bits = split '\.', $find_str;
        for (@bits) {
            # escape punctuation only; a backslashed digit would become a backreference
            s/([^&@#\w])/\\$1/g;
            s/&/\\w*/g;
            s/@/\\d*/g;
            s/#/[a-zA-Z_]*/g;
            $_ = "($_)";
        }
        my $re = join "\\.", @bits;
        print "Here is the RE: m/$re/";
        $re = qr/$re/;
        my (@matches) = $str =~ m/^$re$/;
        $" = ', ';
        print "\nAnd we got: @matches";

        __DATA__
        Here is our filename: some-file.123.tif
        Here is the interface string: some-#.@.&
        Here is the RE: m/(some\-[a-zA-Z_]*)\.(\d*)\.(\w*)/
        And we got: some-file, 123, tif

        The syntax is pretty basic: just three chars for word, digit and alphanumeric, plus . for the separator. All other chars are literal. This gives you a lot of power to match subsets of filenames and should take no more than 2 minutes to learn....
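        The same three-token syntax could also be wrapped in a sub, which makes it easier to reuse and test. What follows is a sketch rather than a drop-in: mask_to_regex and %TOKEN are names invented here, and literals are escaped with quotemeta one character at a time, so that digits in the mask are matched literally.

```perl
use strict;
use warnings;

# Placeholder characters from the post above and the regex each stands for.
my %TOKEN = (
    '&' => '\w*',          # alphanumeric
    '@' => '\d*',          # digits
    '#' => '[a-zA-Z_]*',   # letters and underscore
);

# Hypothetical helper (the name is mine): turn a mask into a compiled,
# anchored regex with one capture group per dot-separated part.
sub mask_to_regex {
    my ($mask) = @_;
    my @parts;
    for my $part (split /\./, $mask) {
        my $re = '';
        for my $char (split //, $part) {
            # placeholders expand, everything else is matched literally
            $re .= exists $TOKEN{$char} ? $TOKEN{$char} : quotemeta $char;
        }
        push @parts, "($re)";
    }
    my $joined = join '\.', @parts;
    return qr/^$joined$/;
}

# The sample filename and mask are made up for illustration.
my @bits = 'shot2-comp.0123.tif' =~ mask_to_regex('shot2-#.@.&');
print join(', ', @bits), "\n";
```

        Returning a qr// object means the caller can apply it to a whole directory listing without rebuilding the pattern for each name.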

        Update

        Changed " token to # so you don't have to escape it in the shell as pointed out by waswas-fng

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Building regexp from a 'mask' string of placeholders.
by parv (Parson) on Feb 10, 2003 at 08:06 UTC

    Dear Toaster,

    Could you rephrase the problem? I cannot tell whether there is one. Did you want to capture the digits matched by \d+?

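    One thing to be aware of: a scalar assignment, $digits = $name =~ /$re/, stores only the match's return code, not the digits. Your ($digits) = $name =~ /$re/ already puts the match in list context, so it does capture what \d+ matched. If you want to make the list context explicit, parenthesise the match too...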

    my ($digits) = ($name =~ m/$re/);

    In addition, since you are not adding anything to m// in...

    my ($digits) = $name =~ /$re/;

    ...compile the $re before use (for a speed gain)...

    my $re = join( '\.' , @re );
    $re = qr/^ $re $/x;

    my $name = shift @ARGV;
    #my ($digits) = $name =~ /$re/;
    my ($digits) = ($name =~ /$re/);
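
    A quick way to check what each assignment form actually stores is to run them side by side. The pattern and filename below are samples chosen for illustration. A bare scalar on the left receives the match's true/false result, while a parenthesised ($var) on the left puts the match in list context and receives the first capture, with or without parentheses around the right-hand side.

```perl
use strict;
use warnings;

# Sample pattern and filename, chosen only for illustration.
my $re   = qr/^ \w+ \. (\d+) \. \w+ $/x;
my $name = 'foreground.001.tif';

my $scalar        = $name =~ $re;      # scalar context: success flag
my ($list)        = $name =~ $re;      # list context: first capture
my ($list_parens) = ($name =~ $re);    # list context again, extra parentheses

print "scalar assignment:       $scalar\n";
print "list assignment:         $list\n";
print "list, parenthesised rhs: $list_parens\n";
```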
Re: Building regexp from a 'mask' string of placeholders.
by Hofmator (Curate) on Feb 10, 2003 at 10:22 UTC
    If I understand you correctly, you'd like to have a regex that extracts digits out of a filename pattern. This pattern contains normal characters matching \w as well as one occurrence of '@' - separated from the rest by '.'

    Then the following should do what you want:

    my $pattern = shift @ARGV;
    my $name    = shift @ARGV;

    # quotemeta is important to quote all the special
    # regex chars in the filename pattern
    $pattern = quotemeta $pattern;

    # the @ got escaped to '\@' so we have to look for
    # this sequence instead
    $pattern =~ s/\\@/(\\d+)/;
    die 'more than one "@" in pattern' if $pattern =~ tr/@//;

    my ($number) = $name =~ /^$pattern$/;
    print $number, $/;

    This just replaces the '@' in the string with the capturing '(\d+)' and then does the match. You don't even need the '.' as delimiters - as long as you can ensure that there are only non-digits left and right of the '@'.
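
    The same quotemeta trick might stretch to the extension placeholder the original question asks about. This is a sketch only: the '%' placeholder and the name mask_to_re are assumptions invented here, not part of anyone's interface in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: '@' captures the frame digits, '%' (an assumed,
# made-up placeholder) captures the extension.
sub mask_to_re {
    my ($mask) = @_;
    die qq{need exactly one "\@" in mask\n} unless ($mask =~ tr/@//) == 1;
    my $re = quotemeta $mask;     # '@' becomes '\@', '%' becomes '\%'
    $re =~ s/\\\@/(\\d+)/;        # frame number
    $re =~ s/\\%/(\\w+)/;         # extension, if the mask has a '%'
    return qr/^$re$/;
}

# Sample filenames, made up for illustration.
my $re = mask_to_re('foreground.@.%');
for my $name (qw(foreground.001.tif foreground.002.cin notes.txt)) {
    if (my ($frame, $ext) = $name =~ $re) {
        print "$name: frame $frame, extension $ext\n";
    }
    else {
        print "$name: no match\n";
    }
}
```

    Because the extension is captured rather than fixed, one mask can cover a sequence rendered to more than one image format.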

    -- Hofmator

Re: Building regexp from a 'mask' string of placeholders.
by submersible_toaster (Chaplain) on Feb 10, 2003 at 22:42 UTC

    Ack, it would seem the interface has changed (big surprise); if anything it's now even simpler and less flexible. <sigh>oh well</sigh> Thanks to everyone for helping out here. tachyon++ for that placeholder code. I'll not use it 'til I can understand it properly but, as I say, this could all have been an academic exercise in terms of the immediate problem.
    The only bad experience is that which you learn nothing from