sdyates has asked for the wisdom of the Perl Monks concerning the following question:

Here is what I am looking to do: 1) Truncate a string after 400 characters 2) Remove any special characters at the end of the string 3) With the same string, remove all characters after the last period. It is number three that I am having trouble with. By truncating after 400 characters, the last sentence is cut in half, so I just want to remove it. How can I get Perl to truncate after the last period in a string? Thanks, Simon

Re: Truncating after the last period
by jwkrahn (Abbot) on Aug 22, 2011 at 17:05 UTC
    ( my $new_string = substr $string, 0, 400 ) =~ s/[^.]*\z//;
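The one-liner above does two things in a single statement. Unpacked into steps, it might look like the sketch below. The helper name `truncate_at_last_period` and the sample text are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper; the 400-character limit in the thread becomes $limit.
sub truncate_at_last_period {
    my ($string, $limit) = @_;

    # Step 1: keep at most $limit characters.
    my $new_string = substr $string, 0, $limit;

    # Step 2: delete the trailing run of non-period characters,
    # so the result ends at the last period inside the limit.
    $new_string =~ s/[^.]*\z//;

    return $new_string;
}

my $text = 'First sentence. Second sentence. Third sentence is cut';
print truncate_at_last_period($text, 40), "\n";
# prints "First sentence. Second sentence."
```

Note that when there is no period inside the limit, `[^.]*` matches the whole truncated string and the result is empty, which may or may not be what is wanted.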
Re: Truncating after the last period
by ikegami (Patriarch) on Aug 22, 2011 at 17:26 UTC
    If you're trying to wrap text, you might want Text::Wrap. Even if you're not, you might want to wrap the text then just keep the first line.
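A minimal sketch of the "wrap, then keep the first line" idea, assuming the core Text::Wrap module and made-up sample text. Text::Wrap keeps each line shorter than `$Text::Wrap::columns`, so 401 is used here to allow up to 400 characters.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Lines produced by wrap() are at most $columns - 1 characters long.
$Text::Wrap::columns = 401;

# Sample input (an assumption for illustration): 200 short words.
my $text = join ' ', ('word') x 200;

# Wrap with no indentation, then keep only the first line.
my $wrapped      = wrap('', '', $text);
my ($first_line) = split /\n/, $wrapped;

print length($first_line), "\n";
```

This breaks at whitespace rather than at sentence ends, so the first line can still end mid-sentence and would need a further step to cut back to the last period.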
Re: Truncating after the last period
by AR (Friar) on Aug 22, 2011 at 17:01 UTC

    Can you show us what you've tried? We can help you with any holes in your knowledge or point out any incorrect assumptions.

Re: Truncating after the last period
by BrowserUk (Patriarch) on Aug 22, 2011 at 17:06 UTC

    This will remove everything after the last period before the 400th character position:

    $string =~ s[^.{1,400}\.\K.*?$][];

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Let's assume the text is in the ASCII character encoding and that there is at least one period among the first 400 characters in the text.

      The code you posted, and then posted again, has a defect in it. I demonstrated the defect to you in the complete, ready-to-run Perl script I posted. You haven't fixed the defect yet.
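One way to read the defect under discussion, as a sketch rather than either poster's code: `.{1,400}\.` allows 400 characters and then a period, so the kept text can be 401 characters long. The variant with `.{0,399}` is an assumption about what "a period among the first 400 characters" should mean, and the variable names are illustrative.

```perl
use strict;
use warnings;

# 400 X's, a period at position 401, then more text.
my $string = ('X' x 400) . '. trailing text';

# Pattern from the thread: up to 400 characters, THEN a period.
(my $loose = $string) =~ s[^.{1,400}\.\K.*?$][];

# Variant: at most 399 characters before the period, so the
# period itself must fall within the first 400 characters.
(my $strict = $string) =~ s/\A.{0,399}\.\K.*\z//s;

print length($loose), "\n";    # 401: the kept text exceeds 400 characters
print length($strict), "\n";   # 415: no period in the first 400, so no match
```

In the second case the string is left untouched, so the caller still has to decide what to do when no period is found inside the limit.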

        Let's assume the text is in the ASCII character encoding and that there is at least one period among the first 400 characters in the text.... I demonstrated the defect to you in the complete, ready-to-run Perl script I posted.

        Twaddle! This code doesn't have a period within the first 400 characters:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use open qw( :encoding(UTF-8) :std );
        use Modern::Perl;

        my $string = ('X' x 400) . '.';
        say length $string;   # Prints 401
        $string =~ s[^.{1,400}\.\K.*?$][];
        say length $string;   # Prints 401

        The first 400 characters are all 'X's. Ergo, your code demonstrates exactly nothing!

        And I can safely assume the fact that you have dropped your \X stuff like a hot brick means that you've finally realised that that is a dead end also.

        So, I was right. Nothing more than PST.



      Not correctly it won't.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use open qw( :encoding(UTF-8) :std );
      use Modern::Perl;

      my $string = ('X' x 400) . '.';
      say length $string;   # Prints 401
      $string =~ s[^.{1,400}\.\K.*?$][];
      say length $string;   # Prints 401

      It will also truncate a string in the middle of a character. And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.
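A small sketch of what "in the middle of a character" can mean, assuming it refers to a user-perceived character (an extended grapheme cluster) made of several code points. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT, then more text.
my $string = "cafe\x{301} au lait";

# substr (like "." in a regex) counts code points, so cutting
# after 4 keeps the "e" but drops its combining accent.
my $by_code_point = substr $string, 0, 4;

# \X matches one extended grapheme cluster: a base character
# together with any combining marks that follow it.
my ($by_grapheme) = $string =~ /\A(\X{0,4})/;

print length($by_code_point), "\n";   # 4 code points: "cafe"
print length($by_grapheme), "\n";     # 5 code points: "cafe" plus the accent
```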

        And it won't truncate a string that doesn't have a FULL STOP (U+002E) in it.

        I do not see anything conditional about the OP's spec: "remove all characters after the last period". Do you?

        So,

        die 'Bad data' unless $s =~ s[^.{1,400}\.\K.*$][];

        It will also truncate a string in the middle of a character.

        Is that really a possibility?

        Cos if it is, it means Perl's Unicode handling must be even more broken than I thought.

        I've just had a go at making it happen and failed, but maybe I'm just not clever enough.


Re: Truncating after the last period
by Anonymous Monk on Aug 22, 2011 at 19:39 UTC
    "Remove all characters after the last period" can also be re-stated as "keep all characters up to a period," relying on the regex quantifier's default 'greedy' behavior to consume as many characters as it can. Keep whatever the regular expression captures; if it captures nothing, the string contains no period, so leave it unchanged.
      "Remove all characters after the last period" can also be re-stated as, "keep all characters up to a period,"

      Not if there can be two or more periods in the first 400 characters.
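Whether the greedy formulation copes with several periods can be checked with a short sketch. The sample string is made up, and this is only one possible reading of "keep all characters up to a period".

```perl
use strict;
use warnings;

my $string = 'One. Two. Three is cut of';

# Greedy .* first runs to the end of the string, then backtracks
# until \. can match, which happens at the LAST period.
# The /s flag lets . match newlines as well.
if ( $string =~ /\A(.*\.)/s ) {
    $string = $1;
}
# With no period at all the match fails and $string is unchanged.

print "$string\n";   # prints "One. Two."
```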

