davies has asked for the wisdom of the Perl Monks concerning the following question:
Yesterday I wrote ~50 lines of Perl that solved a one-off problem. In among that was a group of simple regexes that I am sure could be combined or written better if only I knew how. So I'd like to learn more, if someone would be kind enough to go all Perlmonks on me or attempt a Socratic dialogue. The code I wrote is:
    my $plsname = $curdir;
    $plsname =~ s/^\Q$startdir\E//i;
    $plsname =~ s/\//_/g;
    $plsname =~ s/^_?//;
    $plsname .= '.pls';
with (e.g.) $curdir = 'Y:\Music\Schubert\Lieder\Terfel', $startdir = 'Y:\Music' and wanting 'Schubert_Lieder_Terfel.pls'
I tried to combine the second and fourth lines of the snippet, but while I think I could have solved that on my own, I realised that I was going to need multiple regexes and idleness set in. I suspect I should need only one regex, but my thinking processes are flawed.
What the rest of the code is doing is using File::Find to recurse through every subdirectory in the start directory and write a playlist file to another location. I have some 6,700 files in 300 directories. I now have a playlist for every directory, which is what I wanted.
Regards,
John Davies
Re: Combining regexes
by BrowserUk (Patriarch) on May 21, 2016 at 12:34 UTC
"I tried to combine the second and fourth lines of the snippet,"
You know that the start dir is going to be followed by a '\', so you could do $plsname =~ s/^\Q$startdir\E\\//i; and the fourth line becomes redundant.
You could combine the 5th line at the same time: $plsname =~ s[^\Q$startdir\E\\(.+)$][$1.pls]i
The /g on the third line makes that pretty much impossible to combine with the rest; and single character substitutions are better done with tr///.
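A minimal sketch of the two-statement version this reply describes, assuming the substitution and the tr/// are simply applied one after the other. The sub name playlist_name is hypothetical and not from the thread; the paths are the ones from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: build the playlist name from the two directories.
sub playlist_name {
    my ($curdir, $startdir) = @_;

    # Copy $curdir, then strip the start directory plus the backslash
    # that follows it, and append the extension in the same substitution.
    ( my $plsname = $curdir ) =~ s[^\Q$startdir\E\\(.+)$][$1.pls]i;

    # Single-character replacement: every backslash becomes an underscore.
    $plsname =~ tr[\\][_];

    return $plsname;
}

print playlist_name('Y:\Music\Schubert\Lieder\Terfel', 'Y:\Music'), "\n";
```

If $curdir does not start with $startdir followed by a backslash, the substitution does nothing and no extension is added, so a real script would probably want to check for that case.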
Is that better? Your call.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.
by davies (Monsignor) on May 21, 2016 at 15:52 UTC
I'm not following your order, which may be a mistake, but anyway...
"The /g on the third line makes that pretty much impossible to combine" is exactly the sort of guidance I was hoping to get.
"single character substitutions are better done with tr///" is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?
When I tried to sort out the backslash following the variable, I think I was trying to put it within the \Q\E part. Is this even possible?
Things like \Q\E, quotemeta and qr are an area where my searches have been pretty fruitless. Are they identical? Are there any good docs on them?
A point on which I was expecting correction was another construct I tried to use without success. I have seen (and cargo culted) something like my ($plsname) = $curdir =~ regex;. Would that just mean having different operations on the same number of lines, or is there a better reason why it would be inappropriate here?
Thanks for the help & regards,
John Davies
by BrowserUk (Patriarch) on May 21, 2016 at 16:26 UTC
"'single character substitutions are better done with tr///' is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?"
Um. Not sure how I first learned it; probably reading a post hereabouts a long time ago. The thing to note is that tr/// is dedicated to replacing single chars with other single chars, and builds a translation table at compile time. I.e. it does just that one thing and does all the preparation up front. On the other hand, s/// does all kinds of stuff and has to interpret both the input specification and the replacement at runtime, so it is less efficient for this purpose.
"When I tried to sort out the backslash following the variable, I think I was trying to put it within the \Q\E part. Is this even possible?"
Backslashes pretty much always have to be escaped -- you can get away with them unescaped in single quotes if they don't come in front of a '. If you put \ inside: \Q\\E, the second backslash escapes the third and the E is just an ordinary E. If you do \Q\\\E, the backslash ends up doubled in the results.
"Things like \Q\E, quotemeta and qr are an area where my searches have been pretty fruitless. Are they identical?"
\Q\E do the same as quotemeta, but only to that subset of the string or search term to which they are applied. qr// is a quite different animal that is documented in http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators; it is intended for building regex strings, but turns out not to be as useful as you'd think because what it builds gets re-interpreted if you include it as part of another qr// or m// or s///.
"A point on which I was expecting correction was another construct I tried to use without success. I have seen (and cargo culted) something like my ($plsname) = $curdir =~ regex;"
In the form you've posted that would assign (the first) capture group in the regex to $plsname; which isn't applicable here.
You could do ( my $plsname = $curdir ) =~ s/.../.../; which would do the assignment, then operate on the new variable; but it's much of a muchness.
I always find it hard to answer 'how would I have learnt that' questions, because I've long since forgotten when/how I learnt them.
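The three idioms discussed in this reply can be compared side by side. The following is only a sketch: the sub names are hypothetical, and the sample paths are the ones from the original question.

```perl
use strict;
use warnings;

# quotemeta() escapes every non-word character of the whole string
# (here 'Y:\Music' becomes Y\:\\Music) before it is used in a pattern.
sub strip_with_quotemeta {
    my ($path, $prefix) = @_;
    my $quoted = quotemeta $prefix;
    $path =~ s/^$quoted\\//i;
    return $path;
}

# \Q...\E applies the same escaping, but only to the part of the
# pattern between the two markers.
sub strip_with_qe {
    my ($path, $prefix) = @_;
    $path =~ s/^\Q$prefix\E\\//i;
    return $path;
}

# A match in list context returns its capture groups, so ($rest)
# receives whatever (.+) matched, or undef if the match failed.
sub relative_part {
    my ($curdir, $startdir) = @_;
    my ($rest) = $curdir =~ /^\Q$startdir\E\\(.+)$/i;
    return $rest;
}

# Copy-then-modify: the parentheses make the assignment happen first,
# and s/// then operates on the new variable.
sub copy_and_strip {
    my ($curdir, $startdir) = @_;
    ( my $plsname = $curdir ) =~ s/^\Q$startdir\E\\//i;
    return $plsname;
}

print relative_part('Y:\Music\Schubert\Lieder\Terfel', 'Y:\Music'), "\n";
```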
by AnomalousMonk (Archbishop) on May 21, 2016 at 17:17 UTC
Further to some of BrowserUk's comments ++above:
Give a man a fish: <%-{-{-{-<
by BrowserUk (Patriarch) on May 21, 2016 at 17:48 UTC
by AnomalousMonk (Archbishop) on May 21, 2016 at 17:58 UTC
by Marshall (Canon) on May 22, 2016 at 08:18 UTC
"'single character substitutions are better done with tr///' is not something I have seen documented anywhere. Is this a question of experience, or something I should have found for myself?"
Just adding on to BrowserUk's comments re: tr. Another big factor is that tr doesn't have to worry about the string getting longer! You can't substitute one character with two others. That means tr doesn't have any memory allocation worries (getting shorter is a whole different deal than getting longer). The net result of all of these simplifications is that tr runs like a rocket.
Update: This thread about tr got me thinking... I volunteer as a TA for a MASM (Microsoft Assembly) class at a local college. We are always thinking of new labs. A "tr" lab is likely to appear in the Fall 2016! The C version of tr is fast; the assembly language version will be really, really fast. And we can teach some other stuff along the way. Quoting my prof on the suggestion of a tr lab:
Implementing tr is a good exercise! It makes use of the optimized array instructions and it reinforces the idea that characters are just integers, which, as you recall from the last ASM class, some students had a hard time with when converting a digit to its ASCII character.
Thanks! Sometimes these Perl questions spawn other thoughts. Have no doubt that tr can be implemented very efficiently in ASM class 101. That is definitely not true of regex!
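The "translation table" idea from the tr discussion above can be illustrated in plain Perl. This is only a sketch of the concept, not how Perl implements tr/// internally; make_table and translate are hypothetical names, and the code assumes single-byte characters and $from and $to of equal length.

```perl
use strict;
use warnings;

# Build a 256-entry lookup table once: every byte maps to itself,
# except the bytes listed in $from, which map to the byte at the
# same position in $to. (Assumes $from and $to have equal length.)
sub make_table {
    my ($from, $to) = @_;
    my @table = (0 .. 255);
    my @src   = unpack 'C*', $from;
    my @dst   = unpack 'C*', $to;
    $table[ $src[$_] ] = $dst[$_] for 0 .. $#src;
    return \@table;
}

# Translate a byte string through the table. Each input byte yields
# exactly one output byte, so the result is never longer or shorter.
sub translate {
    my ($table, $string) = @_;
    return pack 'C*', map { $table->[$_] } unpack 'C*', $string;
}

my $table = make_table('\\', '_');
print translate($table, 'Schubert\\Lieder\\Terfel'), "\n";
```

Because characters are treated as integers indexing an array, the per-character work is a single lookup, which is the property the replies above credit for the speed of tr.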
Re: Combining regexes
by AnomalousMonk (Archbishop) on May 21, 2016 at 15:42 UTC
I didn't think the OPed code was really so terrible to begin with (once the $plsname =~ s/\//_/g; statement is fixed). But in an attempt to "one-liner-ize" the code, and building on BrowserUk's post (and using my preferred regex practices), here's this:
And is that really any better? Again, your call.
If you have Perl version 5.14+, it's possible to slightly simplify the s/// replacement executable code above to
I hope this hasn't gone too "all PerlMonks" on you, and includes at least a hint (update: but not a single drop!) of Socrates.
Update: For info on the /r modifier added in Perl version 5.14, see s/// in Regexp Quote-Like Operators and tr/// in Quote-Like Operators, both in perlop.
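A minimal sketch of a one-statement version using the /r modifier, assuming Perl 5.14 or later. It is pieced together from the pattern fragment and the chained-binding form quoted later in the thread and may differ from the code originally posted; the sub name playlist_name is hypothetical.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier on s/// and tr/// needs Perl 5.14+

sub playlist_name {
    my ($curdir, $startdir) = @_;

    # 1. s///r returns a copy with the prefix (and one optional
    #    backslash after it) removed; $curdir itself is untouched.
    # 2. tr///r returns a copy of that with backslashes turned
    #    into underscores.
    # 3. The extension is concatenated last.
    return $curdir =~ s{ \A (?i) \Q$startdir\E \\? }{}xmsr =~ tr{\\}{_}r . '.pls';
}

print playlist_name('Y:\Music\Schubert\Lieder\Terfel', 'Y:\Music'), "\n";
```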
by davies (Monsignor) on May 21, 2016 at 19:06 UTC
Much reading later - and thank you for all the references in your other post, only some of which I had read previously - I think I understand what you are doing. I do have some questions, though.
Am I right in thinking that the (?i) has the same effect as an i qualifier with xmsr? If not, what is the difference (and where is it documented, please)? If so, and I appreciate that you have stated "my own preferred practices" which exempts you from needing a reason, is there a reason why you prefer to have the qualifiers separately?
How come none of the spaces in { \A (?i) \Q$startdir\E \\? } are treated as matchable characters? I suspect it's one of your qualifiers, but I couldn't find it explained in any of the docs I read, unless it's part of x, and I haven't looked for the documentation for that beyond reading that it means "extended regexes".
I find your final assignment ($var = $var =~ regex =~ regex) rather confusing, as it seems to need to work from left to right at some times and from right to left at others. I know that joining things like map, grep & sort results in the expression working right to left. I don't remember seeing three assignment-like = signs in a Perl statement before, which may be part of my confusion. Is the order irrelevant, and if not, what are the rules for the direction of evaluation?
Thanks and regards,
John Davies
by AnomalousMonk (Archbishop) on May 21, 2016 at 22:22 UTC
"Am I right in thinking that the (?i) has the same effect as an i qualifier with xmsr?"
Yes, almost. In, e.g., s{ ... }{...}xmsi the /i modifier affects the regex match globally. A (?i) extended pattern only has effect from the point of its appearance in the regex to the end of the regex pattern scope, which may or may not be the end of the regex! (See Extended Patterns. I must apologize: I did not mention that extended patterns are only available from Perl version 5.10 onward; I assume you have this.)
I prefer the embedded (?i) form because it is more visible and because it gives more control: case sensitivity can be turned on and off at will in a regex, and its scope closely controlled. In general, "my own preferred practices" are that the qr// operator has only the /xms modifiers applied to the operator, and all other modifiers, e.g., (?i), scoped within the operator. This practice just generalizes to the m//xms and s///xms operators. The latter two operators can also take /g /e /r modifiers, which can only be applied to the operator as a whole.
"... none of the spaces in { ... } are treated as matchable characters? I suspect it's ... x ..."
Yes, exactly. Allowing whitespace not to be part of the pattern eases the strain on these old eyes, a great blessing after so many years toiling in the scriptorium. See /x in Modifiers.
"I find your final assignment ($var = $var =~ regex =~ regex) rather confusing, as it seems to need to work from left to right at some times and from right to left at others."
It might have been helpful if I had thrown in a few disambiguating parentheses. But you can always add your own with O and B::Deparse. The statement is evaluated in order of precedence (more or less inside-out in this case).
"Is the order irrelevant, and if not, what are the rules for the direction of evaluation?"
The order is very relevant, and is discussed in Operator Precedence and Associativity in perlop: see the =~ (binding) and . (concatenation) and = (assignment) operators.
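The evaluation order described in this reply can be made explicit with parentheses. This is a sketch only, assuming Perl 5.14+ for /r; the sub names are hypothetical, and the foo/bar pattern is an invented example added to show where (?i) takes effect.

```perl
use strict;
use warnings;
use 5.014;    # /r on s/// and tr/// needs Perl 5.14+

# The one-statement form with every grouping spelled out.
# =~ binds tighter than . and is left-associative; = comes last.
sub playlist_name_parenthesized {
    my ($curdir, $startdir) = @_;
    my $plsname = (
        (
            # Step 1: the leftmost binding runs first.
            ( $curdir =~ s{ \A (?i) \Q$startdir\E \\? }{}xmsr )
            # Step 2: its result is bound to tr///r.
            =~ tr{\\}{_}r
        )
        # Step 3: concatenation has lower precedence than =~.
        . '.pls'
    );    # Step 4: assignment happens last.
    return $plsname;
}

# (?i) only affects the pattern from the point where it appears:
# here "foo" is still case-sensitive, "bar" is not.
sub matches_late_i {
    my ($string) = @_;
    return $string =~ m{ \A foo (?i) bar \z }xms ? 1 : 0;
}

print playlist_name_parenthesized('Y:\Music\Schubert\Lieder\Terfel', 'Y:\Music'), "\n";
print matches_late_i('fooBAR'), matches_late_i('FOObar'), "\n";
```

Under /x the spaces inside the pattern are layout only, which is why none of them has to match anything in the input.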
Re: Combining regexes
by Laurent_R (Canon) on May 21, 2016 at 15:45 UTC
One possible way, demonstrated under the Perl debugger:
I don't know if this is any better. You could even do it in a single code line:
Again, this is shorter, but I do not claim that this is any better; you decide.
Re: Combining regexes
by haukex (Archbishop) on May 21, 2016 at 13:52 UTC
Hi davies,
One Way To Do It... without regexen ;-)
I used unixish paths but it works the same on Windows. The grep {length} may not be necessary, I just figured it was the closest equivalent to your s/^_?//;. Path::Class::Dir also has functions for searching that can replace File::Find.
Hope this helps,
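A regex-free sketch in the same spirit. It swaps in the core File::Spec family for Path::Class, which the reply mentions, so that it runs without installing anything; the sub name playlist_name and the sample paths are assumptions. File::Spec::Unix is used directly so the unixish paths behave the same on any platform, whereas a real script would normally call File::Spec and let it pick the platform's rules.

```perl
use strict;
use warnings;
use File::Spec::Unix;

sub playlist_name {
    my ($curdir, $startdir) = @_;

    # Path of $curdir relative to $startdir, e.g. Schubert/Lieder/Terfel.
    my $relative = File::Spec::Unix->abs2rel($curdir, $startdir);

    # Split into directory components and drop any empty ones.
    my @parts = grep { length } File::Spec::Unix->splitdir($relative);

    return join('_', @parts) . '.pls';
}

print playlist_name('/music/Schubert/Lieder/Terfel', '/music'), "\n";
```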
Re: Combining regexes
by AnomalousMonk (Archbishop) on May 22, 2016 at 23:01 UTC
Yet another one-liner. This is a bit tricky, and I can't say I'm particularly proud of it. It assumes $startdir always has a trailing \ (backslash) delimiter. Needs Perl version 5.14+ for the s/// substitution /r modifier.
But again, I must say I can't see anything really wrong with the basic approach of the OPed code.
Update: Even more perverse, but requires no /r modifier, so it can run under pre-5.14, and no /e modifier:
The Devil (but not Terfel) finds work for idle hands.
Update 2: Ok, slightly simpler, but needs captures:
Good luck maintaining this.