in reply to Truncating after the last period

This will remove everything after the last period before the 400th character position:

$string =~ s[^.{1,400}\.\K.*?$][];
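
Below is a minimal sketch of the same approach written as a reusable subroutine. Two changes are assumptions of this sketch and are not part of the post. First, the quantifier is `{0, $max - 1}`, so the kept text, period included, never exceeds `$max` characters. Second, the `/s` flag lets `.` match newlines. Without `/s`, `.` stops at a newline and `$` matches only at the end of the string (or before a final newline), so any text with an embedded line break is left unchanged. `\K` requires Perl 5.10 or later. The name `truncate_at_last_period` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep everything up to and including the last period
# that falls within the first $max characters (code points), delete the
# rest, and return the result. Returns undef if no such period exists.
# Assumes $max >= 1.
sub truncate_at_last_period {
    my ($text, $max) = @_;
    my $before = $max - 1;    # room left for the period itself

    # /s: '.' also matches "\n", so multi-line text is handled.
    # \K: keep the matched prefix; only the text after it is replaced.
    return unless $text =~ s/^.{0,$before}\.\K.*//s;
    return $text;
}

my $text = "First sentence. Second one.\nThird line, no end";
print truncate_at_last_period($text, 400), "\n";    # First sentence. Second one.
```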

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Truncating after the last period
by Jim (Curate) on Aug 23, 2011 at 00:07 UTC

    Let's assume the text is in the ASCII character encoding and that there is at least one period among the first 400 characters in the text.

    The code you posted, and then posted again, has a defect in it. I demonstrated the defect to you in the complete, ready-to-run Perl script I posted. You haven't fixed the defect yet.

      Let's assume the text is in the ASCII character encoding and that there is at least one period among the first 400 characters in the text.... I demonstrated the defect to you in the complete, ready-to-run Perl script I posted.

      Twaddle! This code doesn't have a period within the first 400 characters:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use open qw( :encoding(UTF-8) :std );
      use Modern::Perl;
      my $string = ('X' x 400) . '.';
      say length $string;    # Prints 401
      $string =~ s[^.{1,400}\.\K.*?$][];
      say length $string;    # Prints 401

      The first 400 characters are all 'X's. Ergo, your code demonstrates exactly nothing!

      And I can safely assume the fact that you have dropped your \X stuff like a hot brick means that you've finally realised that that is a dead end also.

      So, I was right. Nothing more than PST.



        Your code still has the same defect in it.

        #!/usr/bin/perl -l
        use strict;
        use warnings;
        my $string = '.' x 400_000_000;
        print length $string;    # Prints 400000000
        $string =~ s[^.{1,400}\.\K.*?$][];
        print length $string;    # Prints 401

        It's so easily fixed it boggles the mind you still haven't figured out how to fix it.
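
Here is a scaled-down check of the defect described above. With `.{1,400}\.`, the period may be the 401st character, so up to 401 characters survive. The variant `.{0,399}\.` is one possible fix; it is an assumption of this sketch, not code posted in the thread. It keeps at most 400 characters and fails to match when no period falls within them. The long input uses 1,000 periods rather than 400,000,000, and the sub name `compare` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (length kept by the posted pattern, length kept by the
# {0,399} variant, or undef when the variant finds no period in range).
sub compare {
    my ($input) = @_;
    (my $old = $input) =~ s[^.{1,400}\.\K.*?$][];   # pattern as posted
    my $new = $input;
    my $ok  = $new =~ s[^.{0,399}\.\K.*][]s;         # period must be char 1..400
    return (length $old, $ok ? length $new : undef);
}

for my $input (('X' x 400) . '.', '.' x 1_000) {
    my ($old, $new) = compare($input);
    printf "posted: %d chars kept; variant: %s\n",
        $old, defined $new ? "$new chars kept" : 'no period in range';
}
```

For the first input, the posted pattern keeps all 401 characters, while the variant reports that there is no period in range. For the second input, the posted pattern keeps 401 characters and the variant keeps 400.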

Re^2: Truncating after the last period
by Jim (Curate) on Aug 22, 2011 at 18:13 UTC

    Not correctly it won't.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use open qw( :encoding(UTF-8) :std );
    use Modern::Perl;
    my $string = ('X' x 400) . '.';
    say length $string;    # Prints 401
    $string =~ s[^.{1,400}\.\K.*?$][];
    say length $string;    # Prints 401

    It will also truncate a string in the middle of a character. And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.
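
The following is a contrived illustration of the first claim, assuming the text has already been decoded to characters. `.` matches one code point. When the period that ends the kept text carries a combining mark, the substitution keeps the period and deletes the mark that belongs to it. U+0301 COMBINING ACUTE ACCENT is used here purely as an example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "." followed by U+0301 is one grapheme (one user-perceived character)
# made of two code points.
my $string = "Done.\x{0301} More text";

# The substitution exactly as posted in the thread.
(my $cut = $string) =~ s[^.{1,400}\.\K.*?$][];

# List the code points that survive: the accent attached to the period is gone.
print join(' ', map { sprintf 'U+%04X', ord } split //, $cut), "\n";
# U+0044 U+006F U+006E U+0065 U+002E
```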

      And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.

      I do not see anything conditional about the OP's spec: "remove all characters after the last period.". Do you?

      So,

      die 'Bad data' unless $s =~ s[^.{1,400}\.\K.*$][];
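
The `die ... unless` guard works because `s///` returns the number of substitutions it made (here 1), or an empty, false string when the pattern fails to match. The minimal sketch below traps the `die` with `eval` to show both outcomes. The sub name and the sample inputs are hypothetical, and the message ends in a newline only to keep the output tidy.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution posted above.
sub truncate_or_die {
    my ($s) = @_;
    # s/// returns 1 on a match, '' (false) otherwise.
    die "Bad data\n" unless $s =~ s[^.{1,400}\.\K.*$][];
    return $s;
}

for my $input ('One. Two. Three', 'no period at all') {
    my $result = eval { truncate_or_die($input) };
    print defined $result ? "kept: '$result'\n" : "error: $@";
}
```

For the first input this prints `kept: 'One. Two.'`, and for the second it prints `error: Bad data`.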

      It will also truncate a string in the middle of a character.

      Is that really a possibility?

      Cos if it is, it means perl's unicode handling must be even more broken than I thought.

      I've just had a go at making it happen and failed, but maybe I'm just not clever enough.


        I do not see anything conditional about the OP's spec: "remove all characters after the last period.". Do you?

        The spec doesn't assert there will necessarily be a period, either. I think my assumption is better than your assumption.

        Is that really a possibility? Cos if it is, it means perl's unicode handling must be even more broken than I thought.

        How is "perl's [sic] unicode [sic] handling" broken?

        AFAIK, Perl and PHP are the only programming languages that have a regular expression pattern, \X, to match true characters (i.e., graphemes) rather than only code points. You just have to know it and use it.
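
Here is a minimal sketch of what using `\X` might look like for this task. It assumes the text is already decoded to characters and that the limit is meant to count graphemes rather than code points. `\X` matches one extended grapheme cluster, so the cut always falls on a grapheme boundary. The lookbehind `(?<=\.)` accepts a cut only where the last code point kept is the period itself, so a period carrying a combining mark is not treated as a sentence end. The sub name `truncate_graphemes` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep at most $max graphemes, ending with the last
# grapheme whose final code point is a period; delete the rest.
# Returns undef if there is no such grapheme in range. Assumes $max >= 1.
sub truncate_graphemes {
    my ($text, $max) = @_;
    # \X{1,$max}: up to $max whole grapheme clusters (never a partial one).
    # (?<=\.): the code point just before the cut must be a FULL STOP.
    return unless $text =~ s/^\X{1,$max}(?<=\.)\K.*//s;
    return $text;
}

my $text = ("e\x{0301}" x 5) . '. end';      # 5 accented e's (10 code points), then ". end"
my $kept = truncate_graphemes($text, 6);    # 6 graphemes: 5 accented e's + "."
printf "%d code points kept\n", length $kept;    # 11
```

Counting graphemes means the six-grapheme limit here admits 11 code points. A code-point limit of 6 would have stopped partway through the accented letters and found no period.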