in reply to Re^4: example of 'm / / m' related example and compare to 'm / / s'
in thread PERL regex modifiers for m//

Sorry, but I respectfully disagree.

In my experience, matching start- and end-of-line is far more commonly needed than matching start- and end-of-string. The default behaviour is wrong practically every time anyone has to deal with multi-line data.

I'll bet you 100 hours of my time on any (on-line accessible) project of your choosing, that if we do a survey of the regex uses on this site, not only will most of them be targeted at single line strings, an overwhelming majority will be targeted at single line strings.

For the sake of putting a number on "overwhelming", let's say 10 single line uses to every one multi-line. I'd probably be quite happy to go to 20 to 1 if it would sway you into accepting the bet.

You might find a slightly reduced ratio if you searched CPAN, but I doubt it would be by much.

And once you squash the idea that matching against multi-line strings is the norm, giving away the heads-up that seeing those options explicitly stated should give the programmer, in favour of cargo-culting a 'throw it all in there cos it probably won't cause any problems' mandate, is a really bad idea in my book. Doing that in preference to asking the programmer to look up the documentation when they need it is dangerous.

Every time educationalists have tried to "simplify the learning process", by dumbing down, it has increased the pass rate but also wholly devalued it. There's no point in having more people pass if they don't understand how to apply what they've learnt.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: example of 'm / / m' related example and compare to 'm / / s'
by Anonymous Monk on Dec 04, 2011 at 07:23 UTC
    I'll bet you 100 hours of my time on any (on-line accessible) project of your choosing, that if we do a survey of the regex uses on this site, not only will most of them be targeted at single line strings, an overwhelming majority will be targeted at single line strings.

    No bet. I don't doubt that most regex uses are single-line oriented. I never claimed otherwise. What I said was that people far more often use ^ as start-of-line instead of start-of-string.

    But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).
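A minimal sketch of the behaviour being argued about, assuming a made-up three-line string (the variable names are illustrative, not from the thread). It compares the default ^ and $ with their /m versions on a multi-line string, then on the same data split into lines:

```perl
use strict;
use warnings;

# Hypothetical sample data: three lines in a single string.
my $text = "alpha\nbeta\ngamma\n";

# Default: ^ anchors only at the very start of the string,
# so "beta" on the second line is not found.
my $default_start = ($text =~ /^beta/)  ? 1 : 0;

# With /m: ^ also matches immediately after each embedded newline.
my $m_start       = ($text =~ /^beta/m) ? 1 : 0;

# Default: $ matches only at the end of the string (or just before a
# final trailing newline), so "alpha" at the end of line one is missed.
my $default_end   = ($text =~ /alpha$/)  ? 1 : 0;
my $m_end         = ($text =~ /alpha$/m) ? 1 : 0;

# The common line-by-line case: each chunk holds one line, so the
# default anchors and the /m anchors agree.
my @plain = grep { /^beta$/  } split /\n/, $text;
my @multi = grep { /^beta$/m } split /\n/, $text;

printf "default: ^beta=%d alpha\$=%d\n", $default_start, $default_end;
printf "with /m: ^beta=%d alpha\$=%d\n", $m_start, $m_end;
printf "line-by-line: plain=%d multi=%d\n", scalar @plain, scalar @multi;
```

On the whole string only the /m versions match; once the data is split into lines, both forms find the same single line.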

    And once you squash the idea that matching against multi-line strings is the norm, giving away the heads-up that seeing those options explicitly stated should give the programmer, in favour of cargo-culting a 'throw it all in there cos it probably won't cause any problems' mandate, is a really bad idea in my book. Doing that in preference to asking the programmer to look up the documentation when they need it is dangerous.

    Yes, that's fine for good programmers, such as yourself. But the problem is that most programmers don't know they need those options. They think regexes work as if /s and /m were already on.

    Every time educationalists have tried to "simplify the learning process", by dumbing down, it has increased the pass rate but also wholly devalued it. There's no point in having more people pass if they don't understand how to apply what they've learnt.

    This has nothing to do with simplifying any learning process. It has to do with making Perl work better (and, in particular, work better with the weaknesses and blind spots of human nature). I have argued that habitual use of /xms does that. You disagree. That's your right and privilege.

    However, the fact that Perl 6 has (the equivalent of) /s on by default, and also does away with /m by offering separate always-on start-of-line/end-of-line anchors suggests that I'm not alone in believing that permanent /ms is the more appropriate default.

    Damian

      But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).

      Without the options, any attempt to use a regex to match a multi-line string will fail early and obviously. With the options, you might get away without understanding what they do for a while, but eventually your misunderstanding will bite you. And instead of being immediately obvious, it will likely show up as a mysterious, difficult-to-debug transient failure.

      Personally, I'd much rather that I got bitten by my misunderstandings the first time, or the first few times, I tried to do something that exposed that misunderstanding, than have it come to light only when my cargo-culting mysteriously fails to match my actual requirements.
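One way the concern above might show up in practice, sketched with a hypothetical input string (this example is not from the thread). A validation pattern written with default anchors rejects a two-line input, while the same pattern with habitual /xms quietly accepts it. The \A and \z anchors are shown for comparison, since they mean start- and end-of-string regardless of flags:

```perl
use strict;
use warnings;

# Hypothetical untrusted input: a number followed by a second line.
my $input = "42\nDROP TABLE users";

# Default anchors: the whole string must be digits, so this is rejected.
my $default_ok = ($input =~ /^\d+$/)        ? 1 : 0;

# Habitual /xms: ^ and $ now bracket any single line, so the first
# line alone satisfies the pattern and the input is accepted.
my $xms_ok     = ($input =~ /^\d+$/xms)     ? 1 : 0;

# \A and \z always mean start- and end-of-string, whatever the flags.
my $strict_ok  = ($input =~ /\A \d+ \z/xms) ? 1 : 0;

print "default=$default_ok xms=$xms_ok A-z=$strict_ok\n";
```

Only the /xms version with ^ and $ accepts the two-line input, and nothing in the code signals that the meaning of the anchors has changed.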



      The start of some sanity?

        Without the options, any attempt to use a regex to match a multi-line string will fail early and obviously.

        In my own experience that "early and obviously" is...optimistic.

        Personally, I'd much rather that I got bitten by my misunderstandings the first time, or the first few times, I tried to do something that exposed that misunderstanding, than have it come to light only when my cargo-culting mysteriously fails to match my actual requirements.

        It's not cargo-cult when people choose to do it deliberately, as a mechanism to help prevent the types of mistakes they habitually make. It's like regularly using strict (incidentally, yet another default that has been changed in more recent versions of Perl).

        As long as the vast majority of Perl users I encounter unthinkingly describe /^foo/ as "foo at the start of a line" and /.*/ as "match any number of any character" (and describe them that way even when they actually know better), then I'm going to go on suggesting that people always use the regex flags that make their code work the way their brain thinks.
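A rough sketch of the gap between those two spoken descriptions and what the default flags do, assuming a made-up three-line log string (names are illustrative only). It counts "foo at the start of a line" with and without /m, and captures what /.*/ grabs with and without /s:

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines start with "foo".
my $log = "foo one\nfoo two\nbar three\n";

# "foo at the start of a line": in list context with /g, a pattern
# without capture groups returns one element per match.
my @default_starts = $log =~ /^foo/g;     # start of string only
my @line_starts    = $log =~ /^foo/mg;    # start of every line

# "any number of any character": by default . does not match a
# newline, so .* stops at the end of the first line.
my ($default_any) = $log =~ /(.*)/;
my ($any)         = $log =~ /(.*)/s;      # with /s, . matches newline too

printf "starts: default=%d, /m=%d\n", scalar @default_starts, scalar @line_starts;
printf "captured: default=%d chars, /s=%d chars\n",
    length $default_any, length $any;
```

With /m and /s the results match the spoken descriptions (two line starts, the whole string captured); with the defaults they do not.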

        I sincerely respect your right to disagree, and admire your determination to encourage people to better understand the actual meaning of the constructs they use.

        For myself, I'd rather adapt the language defaults to the way its users actually think than force its users to adapt their thinking to the way the language unfortunately defaults.

        Damian