Building regexp from a 'mask' string of placeholders.

submersible_toaster has asked for the wisdom of the Perl Monks concerning the following question:

Mellow Funks,
My problem (no sniggering up the back there), in this case, is to build a regular expression to match and extract some significant digits from some strings, given a mask with a placeholder. For example:

numberedfiles.@.tif # mask with placeholder.
foreground.001.tif
foreground.002.tif # etc., ad nauseam.

Of course, for the moment I am limiting myself to strings that are dot-separated. To be honest, the code I have works nicely and lets me extract the significant digits, but how might I extend it to include a placeholder for the extension (in this case, the image format)? That is where it becomes trickier. I am posting what has dribbled out of my ears today in the hope that some of you lateral thinkers can widen the crack that floods my brain with light.

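#!/usr/bin/perl -w
use strict;

# Arguments: the mask, then a file name, e.g. 'numberedfiles.@.tif' foreground.001.tif
my $mask = shift @ARGV;
my @segment = split /\./, $mask;

# '@' marks the digits to capture; any other segment matches a run of word chars.
my @re = map { $_ eq '@' ? '(\d+)' : '\w+' } @segment;

my $re = '^' . join('\.', @re) . '$';
my $name = shift @ARGV;
my ($digits) = $name =~ /$re/;
print $re, $/;
print defined $digits ? $digits : 'no match', $/;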
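One way to approach the extension question within the same split-on-dots scheme is sketched below. It assumes a second, hypothetical placeholder character, `%`, standing for the extension, and, unlike the script above, it matches every non-placeholder segment literally (via `quotemeta`) instead of as `\w+`. The sub name `mask_to_re` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: turn a dot-separated mask into an anchored, compiled regex.
#   '@' -> (\d+)  the frame digits (as in the original script)
#   '%' -> (\w+)  HYPOTHETICAL placeholder for the extension/format
#   anything else -> matched literally via quotemeta
sub mask_to_re {
    my ($mask) = @_;
    my %placeholder = ('@' => '(\d+)', '%' => '(\w+)');
    my @parts = map { exists $placeholder{$_} ? $placeholder{$_} : quotemeta $_ }
                split /\./, $mask;
    my $body = join '\.', @parts;
    return qr/^$body$/;
}

my $re = mask_to_re('foreground.@.%');
for my $name (qw(foreground.001.tif foreground.002.cin background.003.tif)) {
    if (my ($digits, $ext) = $name =~ $re) {
        print "$name: frame $digits, format $ext\n";
    } else {
        print "$name: no match\n";
    }
}
```

Treating the other segments literally means `background.003.tif` is rejected; switching `quotemeta $_` back to `'\w+'` restores the original "any word" behaviour.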

thanks.
I can't believe it's not psellchecked

Re: Building regexp from a 'mask' string of placeholders. by tachyon (Chancellor) on Feb 10, 2003 at 06:40 UTC
The logic of what you are trying to do eludes me; can you be more specific? Why not just pass your script an appropriate regex string:

`$_ = 'some.567.jpg'; m/^[^\.]*\.(\d+)\.(\w+)$/; print "$1 $2"`

This will capture the digits and extension into $1 and $2, but I fail to see how one might extrapolate that from a string like 'numberedfiles.@.tif', as it does not give sufficient detail of what is required. It seems to me that by the time you develop a pseudo-language to describe what you want, you might as well just use the Perl RE language; after all, that is what it is designed for.

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
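The one-liner reads `$1` and `$2` unconditionally. In Perl a failed match leaves the capture variables holding values from the last successful match, so a slightly more defensive sketch checks the match first. The sub name `parse_name` and the sample file names are made up for illustration.

```perl
use strict;
use warnings;

# tachyon's pattern: anything up to the first dot, then the digits,
# then a word-character extension. Returns (digits, extension),
# or an empty list when the name doesn't fit.
sub parse_name {
    my ($name) = @_;
    return $name =~ m/^[^.]*\.(\d+)\.(\w+)$/ ? ($1, $2) : ();
}

for my $name ('some.567.jpg', 'foreground.001.tif', 'no_digits.here.tif') {
    my @parts = parse_name($name);
    print @parts ? "$name -> digits=$parts[0] ext=$parts[1]\n"
                 : "$name -> no match\n";
}
```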
Re: Re: Building regexp from a 'mask' string of placeholders. by submersible_toaster (Chaplain) on Feb 10, 2003 at 06:47 UTC
Good point, sorry for skimming the specifics. As much fun as it would doubtless be to try to teach 3D animators Perl regex syntax, it's hard enough getting them to consistently name their files; as you see, I have graciously been afforded the '.' as a separator in filenames. So the final objective is a script that, given...

`./myscript.pl /nfs/bigdisk/renders/monsterrender.cin.@ 50 920`

...is then able to run some verification on each file in the sequence, perform a few operations, etc. The mechanics of this I already have working, but of course they're for my use, so I asked the 3D team for their preferred CLI, which I am now trying to accommodate.

I can't believe it's not psellchecked
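Assuming the two trailing numbers are an inclusive first and last frame, one way to turn that CLI into a file list is to build the regex from the mask and keep only names whose captured frame number falls in range. This is a sketch: `frames_in_range` is a hypothetical helper, the names would really come from `glob` or `readdir`, and the four-digit padding in the sample names is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: given a mask with one '@' frame placeholder and an
# inclusive frame range, return the matching names sorted by frame number.
sub frames_in_range {
    my ($mask, $first, $last, @names) = @_;
    (my $pattern = quotemeta $mask) =~ s/\\\@/(\\d+)/;   # '@' -> (\d+)
    my $re = qr/^$pattern$/;
    my @hits;
    for my $name (@names) {
        my ($frame) = $name =~ $re or next;   # skip names that don't fit
        push @hits, [$frame + 0, $name] if $frame >= $first && $frame <= $last;
    }
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @hits;
}

# Made-up, zero-padded names standing in for a real directory listing.
my @listing = map { sprintf 'monsterrender.cin.%04d', $_ } 40 .. 60;
print "$_\n" for frames_in_range('monsterrender.cin.@', 50, 920, @listing);
```

Comparing the frame numerically means the mask does not need to know the padding width.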
Re: Re: Re: Building regexp from a 'mask' string of placeholders. by tachyon (Chancellor) on Feb 10, 2003 at 07:35 UTC
You asked them for an interface that would seem not to have enough flexibility... Why not give them a syntax like this:

#!/usr/bin/perl -w
use strict;

# & = word chars and digits (alphanumeric)
# @ = digits
# # = letters and underscore, i.e. [a-zA-Z_]
# . = literal '.' used to separate the parts of interest
# Anything else is taken to be literal in meaning
my $str = 'some-file.123.tif';
my $find_str = 'some-#.@.&';
print "Here is our filename: $str\nHere is the interface string: $find_str\n";
my @bits = split /\./, $find_str;
for (@bits) {
    s/([^&\@#\w])/\\$1/g;   # escape literal non-word chars
    s/&/\\w+/g;
    s/\@/\\d+/g;
    s/#/[a-zA-Z_]+/g;
    $_ = "($_)";            # capture each dot-separated part
}
my $re = join '\.', @bits;
print "Here is the RE: m/$re/";
$re = qr/$re/;
my @matches = $str =~ m/^$re$/;
$" = ', ';
print "\nAnd we got: @matches\n";

__DATA__
Here is our filename: some-file.123.tif
Here is the interface string: some-#.@.&
Here is the RE: m/(some\-[a-zA-Z_]+)\.(\d+)\.(\w+)/
And we got: some-file, 123, tif

The syntax is pretty basic: just three characters for word, digit and alphanumeric, '.' for the separator, and all other characters are literal. This gives you a lot of power to match subsets of filenames and should take no more than 2 minutes to learn....

Update: Changed the " token to # so you don't have to escape it in the shell, as pointed out by waswas-fng.

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
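An alternative to the chain of substitutions is to translate the mask one character at a time through a lookup table, quoting every non-placeholder character with `quotemeta`, so the escaping and the placeholder expansion happen in a single `s///e`. This is a sketch of the same three-token syntax; `%token` and `mask_to_regex` are hypothetical names.

```perl
use strict;
use warnings;

# Placeholder characters and the regex fragment each one expands to.
my %token = (
    '&' => '\w+',        # alphanumeric (word chars and digits)
    '@' => '\d+',        # digits
    '#' => '[a-zA-Z_]+', # letters and underscore
);

sub mask_to_regex {
    my ($mask) = @_;
    my @parts = map {
        my $part = $_;
        # Placeholders map through %token; everything else is quoted.
        $part =~ s/(.)/exists $token{$1} ? $token{$1} : quotemeta($1)/ge;
        "($part)";                     # capture each dot-separated part
    } split /\./, $mask;
    my $body = join '\.', @parts;
    return qr/^$body$/;
}

my @matches = 'some-file.123.tif' =~ mask_to_regex('some-#.@.&');
print join(', ', @matches), "\n";    # prints "some-file, 123, tif"
```

Returning a compiled `qr//` object lets the caller build the regex once and reuse it across a whole frame sequence.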
Re: Re: Re: Re: Building regexp from a 'mask' string of placeholders. by waswas-fng (Curate) on Feb 12, 2003 at 05:17 UTC
Re: Re: Re: Re: Re: Building regexp from a 'mask' string of placeholders. by tachyon (Chancellor) on Feb 12, 2003 at 18:35 UTC
Re: Building regexp from a 'mask' string of placeholders. by parv (Parson) on Feb 10, 2003 at 08:06 UTC
Dear Toaster,

Could you rephrase the problem, as I cannot tell whether there is a problem? Did you want to capture the digits matched by `\d+`? You are capturing the number of matches (or the match return code) with `($digits) = $name =~ /$re/`, not the digits in the pattern. To capture the digits matched by `\d+`, try...

`my ($digits) = ($name =~ m/$re/);`

In addition, since you are not adding anything to `m//` in...

`my ($digits) = $name =~ /$re/;`

...compile `$re` before use (for a speed gain)...

my $re = join( '\.' , @re );
$re = qr/^ $re $/x;
my $name = shift @ARGV;
#my ($digits) = $name =~ /$re/;
my ($digits) = ($name =~ /$re/);
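Since operator precedence decides which context the match sees, a small test script is a cheap way to check how the spellings behave. In Perl, `=~` binds more tightly than `=`, so what matters is whether the left-hand side is a list (`my (...)`) or a scalar. The regex and file name here are sample values.

```perl
use strict;
use warnings;

my $re   = qr/^\w+\.(\d+)\.\w+$/;   # compiled once with qr//
my $name = 'foreground.001.tif';

# Both are list assignments: the match runs in list context
# and returns the captured digits.
my ($with_parens)    = ($name =~ $re);
my ($without_parens) = $name =~ $re;

# A scalar assignment gets the match's true/false result instead.
my $matched = $name =~ $re;

print "$with_parens $without_parens $matched\n";   # prints "001 001 1"
```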
Re: Building regexp from a 'mask' string of placeholders. by Hofmator (Curate) on Feb 10, 2003 at 10:22 UTC
If I understand you correctly, you'd like to have a regex that extracts digits out of a filename pattern. This pattern contains normal characters matching \w as well as one occurrence of '@', separated from the rest by '.'. Then the following should do what you want:

my $pattern = shift @ARGV;
my $name = shift @ARGV;
# quotemeta is important to quote all the special
# regex chars in the filename pattern
$pattern = quotemeta $pattern;
# the @ got escaped to '\@' so we have to look for
# this sequence instead
$pattern =~ s/\\@/(\\d+)/;
die 'more than one "@" in pattern' if $pattern =~ tr/@//;
my ($number) = $name =~ /^$pattern$/;
print $number, $/;

This just replaces the '@' in the string with the capturing '(\d+)' and then does the match. You don't even need the '.' as delimiters, as long as you can ensure that there are only non-digits to the left and right of the '@'.

-- Hofmator
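The same quotemeta-then-substitute idea, wrapped in a sub so it can be tested, shows the point about not needing dots: the mask below has an underscore next to the '@' instead of a separator dot. The sub name `digits_from` and the sample names are hypothetical.

```perl
use strict;
use warnings;

# Hofmator's approach as a sub: quote the whole mask, then swap the
# single escaped '@' for a capturing (\d+).
sub digits_from {
    my ($mask, $name) = @_;
    my $pattern = quotemeta $mask;
    $pattern =~ s/\\\@/(\\d+)/;
    die qq{more than one "\@" in mask\n} if $pattern =~ tr/@//;
    my ($number) = $name =~ /^$pattern$/;
    return $number;    # undef when the name doesn't fit the mask
}

# No dots needed around the '@', as long as its neighbours aren't digits.
for my $pair (['shot@_v.tif', 'shot0042_v.tif'], ['shot@_v.tif', 'shot0042.tif']) {
    my $n = digits_from(@$pair);
    print "$pair->[1]: ", (defined $n ? $n : 'no match'), "\n";
}
```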
Re: Building regexp from a 'mask' string of placeholders. by submersible_toaster (Chaplain) on Feb 10, 2003 at 22:42 UTC
Ack, it would seem the interface has changed (big surprise); if anything, it's now even simpler and less flexible. <sigh>oh well</sigh> Thanks to everyone for helping out here. tachyon++ for that placeholder code. I'll not use it 'til I understand it properly, but as I say, this could all have been an academic exercise in terms of the immediate problem.

The only bad experience is that which you learn nothing from

Update