Not correctly it won't.

#!/usr/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std );
use Modern::Perl;

my $string = ('X' x 400) . '.';
say length $string; # Prints 401

$string =~ s[^.{1,400}\.\K.*?$][];
say length $string; # Prints 401

It will also truncate a string in the middle of a character. And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.
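
The second point can be checked with a minimal sketch. The inputs `$with` and `$without` are hypothetical, and the pattern is the one under discussion, written with `/` delimiters. It only shows that a string with no FULL STOP is left untouched, because the substitution never matches:

```perl
use strict;
use warnings;

# Hypothetical inputs: one with a period in the first 400 characters, one with none.
my $with    = ('X' x 10) . '. trailing text';
my $without = 'X' x 500;

# s/// returns the number of substitutions made, which is false when nothing matched.
my $with_changed    = ($with    =~ s/^.{1,400}\.\K.*$//) ? 1 : 0;
my $without_changed = ($without =~ s/^.{1,400}\.\K.*$//) ? 1 : 0;

print "with period:    changed=$with_changed length=", length($with), "\n";
print "without period: changed=$without_changed length=", length($without), "\n";
```

The first string is cut back to the 11 characters up to and including the period; the second keeps all 500 characters.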

Re^3: Truncating after the last period
by BrowserUk (Patriarch) on Aug 22, 2011 at 18:47 UTC
    And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.

    I do not see anything conditional about the OP's spec: "remove all characters after the last period." Do you?

    So,

    die 'Bad data' unless $s =~ s[^.{1,400}\.\K.*$][];

    It will also truncate a string in the middle of a character.

    Is that really a possibility?

    Cos if it is, it means perl's unicode handling must be even more broken than I thought.

    I've just had a go at making it happen and failed, but maybe I'm just not clever enough.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I do not see anything conditional about the OP's spec: "remove all characters after the last period." Do you?

      The spec doesn't assert there will necessarily be a period, either. I think my assumption is better than your assumption.

      Is that really a possibility? Cos if it is, it means perl's unicode handling must be even more broken than I thought.

      How is "perl's [sic] unicode [sic] handling" broken?

      AFAIK, Perl and PHP are the only programming languages that have a regular expression pattern to match true characters (i.e., graphemes) instead of only one to match code points. You just have to know it and use it.
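
A minimal sketch of the difference, assuming the pattern meant here is `\X` (which matches one grapheme cluster), while `.` matches one code point. The sample string and variable names are only illustrative:

```perl
use strict;
use warnings;

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT:
# five code points, but four user-perceived characters.
my $str = "cafe\x{301}";

# Count matches by forcing list context on a global match.
my $code_points = () = $str =~ /./g;
my $graphemes   = () = $str =~ /\X/g;

# Keep at most four "characters" under each definition.
my ($by_code_point) = $str =~ /^(.{1,4})/;
my ($by_grapheme)   = $str =~ /^(\X{1,4})/;

print "code points: $code_points, graphemes: $graphemes\n";
print "length kept with .:  ", length($by_code_point), "\n";
print "length kept with \\X: ", length($by_grapheme), "\n";
```

With `.` the accent is separated from its base letter and dropped; with `\X` the base letter and its combining mark stay together.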

        The spec doesn't assert there will necessarily be a period, either.

        Oh, but it did. It said "after the last period" not "after the last period if there is one".

        You assumed. I read what was there.

        Perl and PHP are the only programming languages that have a regular expression pattern to match true characters (i.e., graphemes) instead of code points. You just have to know it and use it.

        I'll take that bunch of evasive misdirection as an admission that: No. That cannot happen.

        And the only [sic] thing here is your overinflated sense of superiority.

