in reply to Re^2: Truncating after the last period
in thread Truncating after the last period

And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.

I do not see anything conditional about the OP's spec: "remove all characters after the last period." Do you?

So,

die 'Bad data' unless $s =~ s[^.{1,400}\.\K.*$][];

It will also truncate a string in the middle of a character.

Is that really a possibility?

Cos if it is, it means perl's unicode handling must be even more broken than I thought.

I've just had a go at making it happen and failed, but maybe I'm just not clever enough.
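For anyone who wants to repeat the experiment, here is a minimal sketch of that kind of attempt. The test string is made up for illustration, and it assumes $s holds decoded characters and a perl new enough to support \K (5.10 or later).

```perl
use strict;
use warnings;

# Hypothetical test string: "e" + COMBINING ACUTE ACCENT (two code points,
# one user-perceived character), followed by text containing two periods.
my $s = "e\x{301}tude. One. Two";

die 'Bad data' unless $s =~ s[^.{1,400}\.\K.*$][];

# The cut lands immediately after a literal period, so the combining
# sequence earlier in the string is left whole.
print length($s), "\n";    # prints 12
```

In this case the combining sequence survives because the cut can only fall directly after a FULL STOP. It says nothing about how the 400 limit is counted, which is a separate question.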


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Truncating after the last period
by Jim (Curate) on Aug 22, 2011 at 20:05 UTC
    I do not see anything conditional about the OP's spec: "remove all characters after the last period." Do you?

    The spec doesn't assert there will necessarily be a period, either. I think my assumption is better than your assumption.

    Is that really a possibility? Cos if it is, it means perl's unicode handling must be even more broken than I thought.

    How is "perl's [sic] unicode [sic] handling" broken?

    AFAIK, Perl and PHP are the only programming languages that have a regular expression pattern to match true characters (i.e., graphemes) instead of only one to match code points. You just have to know it and use it.
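A small sketch of the distinction, assuming a decoded string and a hypothetical two-code-point example: the dot matches one code point at a time, while \X matches a whole extended grapheme cluster.

```perl
use strict;
use warnings;

# Hypothetical example: "e" followed by COMBINING ACUTE ACCENT.
# One user-perceived character, two code points.
my $s = "e\x{301}";

# Count matches by assigning the match list to an empty list in scalar context.
my $code_points = () = $s =~ /./gs;    # dot matches one code point each time
my $graphemes   = () = $s =~ /\X/g;    # \X matches one grapheme cluster each time

print "$code_points code points, $graphemes grapheme\n";
# prints "2 code points, 1 grapheme"
```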

      The spec doesn't assert there will necessarily be a period, either.

      Oh, but it did. It said "after the last period" not "after the last period if there is one".

      You assumed. I read what was there.

      Perl and PHP are the only programming languages that have a regular expression pattern to match true characters (i.e., graphemes) instead of code points. You just have to know it and use it.

      I'll take that bunch of evasive misdirection as an admission that: No. That cannot happen.

      And the only [sic] thing here is your overinflated sense of superiority.


        The OP specified 400 characters. You posted a purported solution that, among other problems, wrongly used the regular expression pattern to match any code point, not any character.
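To make the difference concrete, here is a rough sketch rather than a definitive fix. The input is invented for illustration, and swapping the dot for \X in the counted part is my assumption about what "400 characters" should mean. It still cuts directly after the period, so a combining mark attached to that period would be dropped.

```perl
use strict;
use warnings;

# Hypothetical input: 300 graphemes (each "e" + COMBINING ACUTE ACCENT,
# so 600 code points), then a period, then text to be removed.
my $s = ("e\x{301}" x 300) . ".tail";

# Code-point version from earlier in the thread: the period sits past
# the 400th code point, so the pattern does not match and nothing changes.
(my $by_code_point = $s) =~ s[^.{1,400}\.\K.*$][];
my $cp_matched = $by_code_point ne $s;

# Grapheme version: \X counts each "e" + accent pair once, so the
# period falls within the 400 limit and the tail is removed.
(my $by_grapheme = $s) =~ s[^\X{1,400}\.\K.*$][];

print $cp_matched ? "code points: truncated\n" : "code points: no match\n";
print length($by_grapheme), " code points kept by the grapheme version\n";
# prints "code points: no match" and then "601 code points kept by the grapheme version"
```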

        I used "[sic]" when I quoted you so no one would think I had made the mistake of not capitalizing Perl and Unicode, both of which are properly capitalized proper names.