in reply to Re^5: example of 'm / / m' related example and compare to 'm / / s'
in thread PERL regex modifiers for m//

I'll bet you 100 hours of my time on any (on-line accessible) project of your choosing, that if we do a survey of the regex uses on this site, not only will most of them be targeted at single line strings, an overwhelming majority will be targeted at single line strings.

No bet. I don't doubt that most regex uses are single-line oriented. I never claimed otherwise. What I said was that people far more often use ^ as start-of-line rather than as start-of-string.

But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).
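
A minimal sketch of the difference, using a made-up three-line string ($text and the variable names are purely illustrative):

```perl
use strict;
use warnings;

# Hypothetical sample data: three lines held in one string.
my $text = "alpha\nbeta\ngamma\n";

# Default: ^ anchors only at the very start of the string.
my @default_starts = $text =~ /^(\w+)/g;

# With /m: ^ anchors at the start of every line in the string.
my @line_starts = $text =~ /^(\w+)/mg;

# Default: $ matches only at the end of the string
# (or just before a final trailing newline).
my $default_ends = () = $text =~ /(\w+)$/g;

# With /m: $ also matches before every embedded newline.
my $line_ends = () = $text =~ /(\w+)$/mg;

print "default ^ found: @default_starts\n";
print "/m ^ found:      @line_starts\n";
print "default \$ count: $default_ends, /m \$ count: $line_ends\n";
```

On a one-line string the two versions behave identically, which is why the difference tends to go unnoticed until multi-line input turns up.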

And once you squash the idea that matching against multi-line strings is the norm, giving away the heads-up that seeing those options explicitly stated should give the programmer, in favour of cargo-culting a 'throw it all in there cos it probably won't cause any problems' mandate, is a really bad idea in my book. Doing that in preference to asking the programmer to look up the documentation when they need it is dangerous.

Yes, that's fine for good programmers, such as yourself. But the problem is that most programmers don't know they need those options. They think regexes work as if /s and /m were already on.

Every time educationalists have tried to "simplify the learning process", by dumbing down, it has increased the pass rate but also wholly devalued it. There's no point in having more people pass if they don't understand how to apply what they've learnt.

This has nothing to do with simplifying any learning process. It has to do with making Perl work better (and, in particular, work better with the weaknesses and blind spots of human nature). I have argued that habitual use of /xms does that. You disagree. That's your right and privilege.

However, the fact that Perl 6 has (the equivalent of) /s on by default, and also does away with /m by offering separate always-on start-of-line/end-of-line anchors suggests that I'm not alone in believing that permanent /ms is the more appropriate default.

Damian

Re^7: example of 'm / / m' related example and compare to 'm / / s'
by BrowserUk (Patriarch) on Dec 04, 2011 at 13:26 UTC
    But that's what makes the defaults of ^ and $ so unfortunate. Because they happen to work okay most of the time (i.e. line-by-line), they only bite people when those people attempt something less usual and more intrinsically difficult (such as multiline parsing).

    Without the options, any attempt to use a regex to match a multi-line string will fail early and obviously. With the options, you might get away without understanding what they do for a while, but eventually your misunderstanding will bite you. And instead of being immediately obvious, it will likely show up as a mysterious and difficult-to-debug transient failure.

    Personally, I'd much rather that I got bitten by my misunderstandings the first time, or the first few times, I tried to do something that exposed that misunderstanding, than have it come to light only when my cargo-culting mysteriously fails to match my actual requirements.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Without the options, any attempt to use a regex to match a multi-line string will fail early and obviously.
      In my own experience that "early and obviously" is...optimistic.
      Personally, I'd much rather that I got bitten by my misunderstandings the first time, or the first few times, I tried to do something that exposed that misunderstanding, than have it come to light only when my cargo-culting mysteriously fails to match my actual requirements.
      It's not cargo-cult when people choose to do it deliberately, as a mechanism to help prevent the types of mistakes they habitually make. It's like regularly using strict (incidentally, yet another default that has been changed in more recent versions of Perl).

      As long as the vast majority of Perl users I encounter unthinkingly describe /^foo/ as "foo at the start of a line" and /.*/ as "match any number of any character" (and describe them that way even when they actually know better), then I'm going to go on suggesting that people always use the regex flags that make their code work the way their brain thinks.
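
      To make that concrete, here is a small sketch with invented data ($text and the variable names are illustrative only) showing how each description holds only once the matching flag is on:

      ```perl
      use strict;
      use warnings;

      # Hypothetical two-line input.
      my $text = "foo one\nfoo two\n";

      # Without /s, . refuses to match a newline, so .* stops at the first line.
      my ($without_s) = $text =~ /(.*)/;

      # With /s, . matches any character at all, newlines included.
      my ($with_s) = $text =~ /(.*)/s;

      # Under /xms, ^ means "start of any line", so both lines are found.
      my @at_line_start = $text =~ / ^ foo \s+ (\w+) /xmsg;

      # \A still means "start of string", whatever flags are in force.
      my @at_string_start = $text =~ / \A foo \s+ (\w+) /xmsg;

      print "without /s: [$without_s]\n";
      print "line starts: @at_line_start; string start: @at_string_start\n";
      ```

      With the flags always on, ^ and . do what the spoken descriptions say, and \A (with \z) remains available for the start-of-string and end-of-string cases.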

      I sincerely respect your right to disagree, and admire your determination to encourage people to better understand the actual meaning of the constructs they use.

      For myself, I'd rather adapt the language defaults to the way its users actually think than force its users to adapt their thinking to the way the language unfortunately defaults.

      Damian

        It's like regularly using strict (incidentally, yet another default that has been changed in more recent versions of Perl).

        Firstly, I agree with the move to safe defaults. I may even have had some influence on changing the only mind that matters in that regard with strict. I'm not certain of course, but the change did come shortly after that discussion.

        I'd rather adapt the language defaults to the way its users actually think

        I'm in favour of that also. But the crux of our disagreement is whether your assessment of how they think is correct. And I believe it is not.

        As you've agreed that the vast majority of uses of regex are against single line strings, I find it strange that you don't see that when they describe /^foo/ as "foo at the start of the line", they are simply assuming -- with good cause -- that the 'start of the line' and the 'start of the string' are the same thing. Likewise for the other two. Because with single line strings, that is so.

        The fact that their wordy description isn't factually correct when dealing with multi-line strings doesn't change the fact that it is a good assumption for the majority of uses.

        And once they start dealing with multi-line strings -- if they ever do -- there are other things that must be taken into serious consideration in addition to those three. In my own work with multi-line strings, I've usually found that I need to make explicit provision for dealing with newlines. That is to say, I explicitly don't want '.'s to match through newlines; rather, I want to use embedded newlines to restrict the scope of preceding wildcards.
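
        A rough sketch of what I mean, with invented data ($config and the variable names are illustrative only):

        ```perl
        use strict;
        use warnings;

        # Hypothetical multi-line input: one "key = value" pair per line.
        my $config = "name = widget\ncolor = blue\n";

        # /m only: . cannot cross a newline, so the line end bounds the wildcard.
        my ($bounded) = $config =~ /^name\s*=\s*(.*)$/m;

        # /ms: the same pattern now lets greedy .* run on to the end of the string.
        my ($unbounded) = $config =~ /^name\s*=\s*(.*)$/ms;

        # With /s in force, the newline boundary has to be spelled out explicitly.
        my ($explicit) = $config =~ /^name\s*=\s*([^\n]*)$/ms;

        print "bounded:  [$bounded]\n";
        print "explicit: [$explicit]\n";
        ```

        Here the pattern that relies on the default '.' picks up just the one value, while the same pattern under /s swallows the rest of the string unless the wildcard is rewritten.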

        I can see we'll not agree here, but I still think that it is better for people to apply both /s & /m on the basis of need rather than as an 'it comes as recommended' cargo cult.

