RegEx Help

Brian268 has asked for the wisdom of the Perl Monks concerning the following question:

Re: RegEx Help by BrowserUk (Patriarch) on Dec 17, 2011 at 12:34 UTC
`m/filename=(\d.*?)([a-zA-Z].*?)\.j/i` I think that you probably want `\d*?` and `[a-zA-Z]*?`. Note the absence of the '.'. I.e., you are applying the quantifier (`*?`) to the '.', not to the character class you want to match.
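To see what BrowserUk means, here is a small sketch (not from the thread) that runs both patterns over the four sample names pvaldes quotes further down. It assumes the OP's pattern uses the lazy `*?` quantifier BrowserUk names; the helper `describe` is hypothetical. With the '.' in place, the dot happily swallows characters the class was meant to check: for `filename=006.jpg.jpeg` the first capture becomes `006.` and the second `jpg`.

```perl
use strict;
use warnings;

# The pattern as discussed, and the suggested fix with the
# quantifier moved from the '.' onto the class itself.
my $original  = qr/filename=(\d.*?)([a-zA-Z].*?)\.j/i;
my $suggested = qr/filename=(\d*?)([a-zA-Z]*?)\.j/i;

# Sample names as quoted later in the thread.
my @files = (
    "filename=merilyn14.jpg.jpeg",
    "filename=003_Merilyn23.jpg.jpeg",
    "filename=890FoxyM.jpg.jpeg",
    "filename=006.jpg.jpeg",
);

# Hypothetical helper: show both captures as "[first][second]",
# or "no match" when the pattern fails.
sub describe {
    my ($re, $str) = @_;
    return $str =~ $re ? "[$1][$2]" : "no match";
}

for my $file (@files) {
    printf "%-34s original: %-18s suggested: %s\n",
        $file, describe($original, $file), describe($suggested, $file);
}
```

Note that the suggested pattern no longer matches `003_Merilyn23`, because neither class accepts the underscore; whether that is acceptable depends on what the OP actually wants.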
Re: RegEx Help by ww (Archbishop) on Dec 17, 2011 at 21:10 UTC
For starters, Perl and CPAN will help you solve your puzzle:

    use feature 'say';
    use YAPE::Regex::Explain;
    say YAPE::Regex::Explain->new( qr{/(\d.*?)/i} )->explain;

prints, in relevant part:

    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d                       digits (0-9)
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1

In other words, at lines 11 and 24, you'll be better off with a simple "`\d+`" to deal with any number of digits, before moving on to other possibilities.

Semi-OT NOTE: YAPE::Regex::Explain's use of "digits" (plural!) for a single "`\d`" seems misleading to me. Yes, of course the next char, matched by the dot, could be a digit, but that is obscured by the use of the plural in the explanation of `\d`. Here's a very loose extrapolation/translation of YAPE::Regex::Explain's output in language I think is actually more precise:

    ----------------------------------------
    \d     one digit
    ----------------------------------------
    .*?    anything (except a newline), with as few as zero instances (times)
           of anything ...and absolutely no more than (in the case of your
           sample data) 'one additional digit'.
    ----------------------------------------

When the best tools you know how to use come up short, you can use a few print statements for debugging many problems. T'ain't necessarily the best way, but you might find illumination were you to insert just two such: one as the first line of the main `if` (`print "\t I'm in if \n";`) ...and another as a new line in the first `elsif` (`print "\t I'm in the first elsif \n";`).

Obligatory observation: you're not using `strict` and `warnings`.
You should, at least until you know Perl so well that their assistance is no longer needed to write code that (usually) does what you want and expect... and so well that you know when you can save yourself the line(s) of code.

In case you're wondering at the much-belated and possibly redundant post: this node has been a work in progress for many hours, interspersed with Fire calls (2 each), SWMBO's decision (1 each) that this was the day to select and cut an Xmas tree, and loading/delivering a cord of wood (1 each, $350/cord) with a friend... Probably not much point in saying more; maybe not even in this, at this late date.
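A minimal sketch of ww's `\d+` suggestion, assuming the goal is to grab the whole run of digits that follows `filename=`. The OP's lines 11 and 24 are not shown here, so `leading_digits` is a hypothetical stand-in; the sketch also turns on `strict` and `warnings` as recommended.

```perl
use strict;
use warnings;

my @Files = (
    "filename=merilyn14.jpg.jpeg",
    "filename=003_Merilyn23.jpg.jpeg",
    "filename=890FoxyM.jpg.jpeg",
    "filename=006.jpg.jpeg",
);

# Hypothetical helper: the whole run of digits right after "filename=",
# or undef when the name does not start with a digit.
sub leading_digits {
    my ($file) = @_;
    return $file =~ /^filename=(\d+)/ ? $1 : undef;
}

for my $file (@Files) {
    my $digits = leading_digits($file);
    if (defined $digits) { print "$file: digits '$digits'\n" }
    else                 { print "$file: no leading digits\n" }
}
```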
Re^2: RegEx Help by Anonymous Monk on Dec 20, 2011 at 14:54 UTC
Unfortunately, `\d` is not always `[0-9]`. Under Unicode rules it matches many more code points than those ten. If you are writing validation logic, it is better not to use `\d` and to use `[0-9]` instead.
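A short sketch of the difference, assuming Perl 5.14 or later for the `/a` modifier (which restricts `\d` to ASCII). The non-ASCII sample uses ARABIC-INDIC DIGIT THREE and TWO (U+0663, U+0662); the sub names are illustrative only. Anchoring with `\A...\z` rather than `^...$` also keeps a trailing newline from slipping through validation.

```perl
use v5.14;     # needed for the /a modifier; also enables strict
use warnings;

# \d follows Unicode rules: it matches any decimal digit (\p{Nd}).
sub loose_number  { return $_[0] =~ /\A\d+\z/    ? 1 : 0 }
# [0-9] accepts only the ten ASCII digits.
sub strict_number { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }
# With /a, \d is restricted to ASCII as well.
sub ascii_number  { return $_[0] =~ /\A\d+\z/a   ? 1 : 0 }

my %samples = (
    'ASCII "32"'      => "32",
    'Arabic-Indic 32' => "\x{0663}\x{0662}",
);

# Print only the ASCII labels, so no "Wide character" warning is raised.
for my $label (sort keys %samples) {
    my $s = $samples{$label};
    printf "%-16s \\d:%d  [0-9]:%d  \\d/a:%d\n",
        $label, loose_number($s), strict_number($s), ascii_number($s);
}
```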
Re^3: RegEx Help by ww (Archbishop) on Dec 20, 2011 at 17:21 UTC
Good point. ++
Re: RegEx Help by pvaldes (Chaplain) on Dec 17, 2011 at 18:35 UTC

    @Files =( "filename=merilyn14.jpg.jpeg", "filename=...)
    $file = "filename=merilyn14.jpg.jpeg"

Mmmh... That seems to me a strange way to name a file, and a source of potential trouble for your script in the future. "filename=merilyn14.jpg.jpeg" is not a file, nor even a filename; I would fix this if I were you:

`$file = "merilyn14.jpg.jpeg"`

More questions:

`print "FoxyM New fold = $nFold file = $nFile\n";`

Maybe you would prefer this form, probably less ambiguous and more readable:

`print "FoxyM New fold = ", $nFold, " file = ", $nFile, "\n";`

And

    foreach my $file (@Files) {
        if ($file =~ m/filename=(\d.*?)([a-zA-Z].*?)\.j/i)

... is the same as:

    foreach (@Files) {
        if (/^filename=(\d.*?)([a-zA-Z].*?)\.j/)

You don't need to use `/i` in the match if you are also using `a-zA-Z`.
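A sketch of two of pvaldes's suggestions, using the sample list quoted later in the thread: strip the `filename=` prefix so the elements are real file names, and rely on `$_` when the `foreach` has no loop variable. The names `@names` and `@numbered` are illustrative only.

```perl
use strict;
use warnings;

my @Files = (
    "filename=merilyn14.jpg.jpeg",
    "filename=003_Merilyn23.jpg.jpeg",
    "filename=890FoxyM.jpg.jpeg",
    "filename=006.jpg.jpeg",
);

# Copy each entry and drop the "filename=" prefix, so @names holds real
# file names while @Files itself is left unchanged.
my @names = map { (my $n = $_) =~ s/^filename=//; $n } @Files;

# With no loop variable, foreach aliases $_ to each element, and a bare
# match such as /^\d/ is applied to $_.
my @numbered;
foreach (@names) {
    push @numbered, $_ if /^\d/;
}

print "$_\n" for @numbered;
```

If the loop were written `foreach my $file (@names)`, a bare `/^\d/` would test `$_` instead of `$file`, so the named variable and the bare match should not be mixed.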
Re^2: RegEx Help by ww (Archbishop) on Dec 17, 2011 at 22:11 UTC
++ for the rest, but "wrong" on your final comment: the `/i` has no effect on the character class... but it may very well matter for the "j". If the OP was concerned about the various combinations of `.jpg`, `.JPG`, `.Jpeg`, `.JPEG` and so on, then the `/i` is, in fact, very much needed. (Updated to clarify the last sentence above.)
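To illustrate ww's point, a sketch with hypothetical file names: without `/i`, only the lowercase extension matches.

```perl
use strict;
use warnings;

# Hypothetical names mixing the extension spellings mentioned above.
my @names = ('a.jpg', 'b.JPG', 'c.Jpeg', 'd.JPEG', 'e.png');

my @case_sensitive   = grep { /\.jpe?g\z/  } @names;  # lowercase only
my @case_insensitive = grep { /\.jpe?g\z/i } @names;  # any case

print "without /i: @case_sensitive\n";
print "with /i:    @case_insensitive\n";
```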
Re^3: RegEx Help by pvaldes (Chaplain) on Dec 19, 2011 at 00:11 UTC
I didn't see the extension. Well done, smart note... But although you are pointing in the right direction, in this particular case it could be a non-problem. Remember, this is what we are really feeding to the script:

    @Files =( "filename=merilyn14.jpg.jpeg",
              "filename=003_Merilyn23.jpg.jpeg",
              "filename=890FoxyM.jpg.jpeg",
              "filename=006.jpg.jpeg" );

At least on some operating systems, "filename=merilyn14.jpg.JPEG" will never be evaluated or modified by the regexp, because it is not in the foreach loop, so be aware of this point. We can safely avoid `/i` here if we are on Linux, because it is our duty to check beforehand what we put in `@Files`, and even if it contained one or several JPG files we should change the regexp only for those matches. There is no point in having a regexp search for something that we know will never match.
Re^3: RegEx Help by Anonymous Monk on Dec 20, 2011 at 14:49 UTC
You say "/i has no impact with the character class". That is true in this case, as the class is `[A-Za-z]`, but it is not true in general:

    perl -Mre=debug -e'/[a-z]/i'
    Compiling REx "[a-z]"
    Final program:
       1: ANYOF[A-Za-z][] (12)
      12: END (0)
    stclass ANYOF[A-Za-z][] minlen 1
    Freeing REx: "[a-z]"
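A quick sketch (not part of the original reply) of the same behavior at run time, using ASCII characters only: with `/i`, the class `[a-z]` also accepts uppercase letters.

```perl
use strict;
use warnings;

my @chars = ('q', 'Q', '5');

# Without /i the class [a-z] is only the lowercase letters...
my @plain  = grep { /[a-z]/  } @chars;
# ...but with /i it behaves like [A-Za-z], as the re=debug output shows.
my @folded = grep { /[a-z]/i } @chars;

print "[a-z]:   @plain\n";
print "[a-z]/i: @folded\n";
```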
Re^4: RegEx Help by ww (Archbishop) on Dec 20, 2011 at 17:18 UTC
Re: RegEx Help by TJPride (Pilgrim) on Dec 17, 2011 at 16:11 UTC
If I were you, I'd also standardize the naming format and lowercase everything while I was at it, but maybe that's just me.

    use strict;
    use warnings;

    my @files = qw|
        filename=merilyn14.jpg.jpeg
        filename=003_Merilyn23.jpg.jpeg
        filename=890FoxyM.jpg.jpeg
        filename=006.jpg.jpeg
    |;

    my ($file, $folder);
    for (@files) {
        next if !s/^filename=//;
        s/(\.jpe?g)+$//;
        $file = "$_.jpg";
        s/^\d+_?//;
        s/_?\d+$//;
        $folder = $_ || 'FileNum';
        print "$file -> $folder\n";
    }

    __DATA__
    OUTPUT:
    merilyn14.jpg -> merilyn
    003_Merilyn23.jpg -> Merilyn
    890FoxyM.jpg -> FoxyM
    006.jpg -> FileNum
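One possible way to add the lowercasing TJPride mentions but does not implement, wrapped in a hypothetical `normalize` sub so it can be tested. The steps otherwise mirror his loop, with `/i` added to the extension match so `.JPG` and `.JPEG` are stripped too; the lowercase fallback name `filenum` is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (new file name, folder name), or an empty
# list when the "filename=" prefix is missing.
sub normalize {
    my ($name) = @_;
    return unless $name =~ s/^filename=//;
    $name =~ s/(\.jpe?g)+$//i;           # drop .jpg/.jpeg, any case
    my $file = lc "$name.jpg";
    (my $folder = $name) =~ s/^\d+_?//;  # leading digits and underscore
    $folder =~ s/_?\d+$//;               # trailing underscore and digits
    return ($file, lc($folder) || 'filenum');
}

for my $entry (qw(
    filename=merilyn14.jpg.jpeg
    filename=003_Merilyn23.jpg.jpeg
    filename=890FoxyM.jpg.jpeg
    filename=006.jpg.jpeg
)) {
    my ($file, $folder) = normalize($entry) or next;
    print "$file -> $folder\n";
}
```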