porsche5k has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

The words "Significant Accounting Policies" are referenced multiple times in my document. I am interested in taking the last reference of "Significant Accounting Policies" and extracting the text that follows it, through the end of the document. Is there a Perl function that does this?

Short, simple example

_____________________

Wal-mart talks about its Significant Accounting Policies in its 10k.

Significant Accounting Policies are important for a firm.

Here is a list of the Significant Accounting Policies

1)Lifo

2)Depreciation

3)Expenses

As I said, I'm hoping to take from "1)Lifo..." down to "...Expenses". Thank you for your help!

Replies are listed 'Best First'.
Re: Take last instance of a string
by Corion (Patriarch) on Aug 30, 2016 at 17:25 UTC

    You haven't shown any of your preexisting code, so I'll just assume that the whole document is in a single scalar already. Then you can simply use a regular expression:

    my $string = <<STATEMENT;
    Wal-mart talks about its Significant Accounting Policies in its 10k.
    Significant Accounting Policies are important for a firm.
    Here is a list of the Significant Accounting Policies
    1)Lifo
    2)Depreciation
    3)Expenses
    STATEMENT

    # /s lets . match newlines, so the greedy first group can span lines
    if( $string =~ m/(.*)Significant\s+Accounting\s+Policies\b(.*)$/s ) {
        print "Found stuff: $2\n";
    } else {
        print "Match failed\n";
    };

    This relies on the first .* group being greedy and gobbling up as much as possible from the string, thus leaving only the last Significant Accounting Policies instance for the match.
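For what it's worth, here is a minimal sketch of the greedy-versus-lazy difference described above. The helper name `text_after` and the two-line sample string are made up for illustration, and the sketch assumes the /s modifier so that `.` can cross newlines:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text after the
# LAST occurrence of the phrase when $greedy is true, or after the FIRST
# occurrence when it is false. Returns undef if the phrase is absent.
sub text_after {
    my ($text, $greedy) = @_;
    my $phrase = qr/Significant\s+Accounting\s+Policies\b/;
    my $re = $greedy
        ? qr/\A.*$phrase(.*)\z/s     # greedy .* backs off only as far as needed
        : qr/\A.*?$phrase(.*)\z/s;   # lazy .*? stops at the first occurrence
    return $text =~ $re ? $1 : undef;
}

# Made-up sample data with two occurrences of the phrase.
my $doc = "A Significant Accounting Policies B\n"
        . "Significant Accounting Policies C\n";

print "greedy:     [", text_after($doc, 1), "]\n";
print "non-greedy: [", text_after($doc, 0), "]\n";
```

The greedy call should give only what follows the second occurrence, while the lazy call gives everything after the first one.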

Re: Take last instance of a string
by AnomalousMonk (Archbishop) on Aug 30, 2016 at 18:18 UTC

    Basically Corion's approach, but using the \K operator, available from Perl version 5.10 on:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use 5.010;
     ;;
     my $str =
       qq{Wal-mart talks about its Significant Accounting Policies in its 10k. \n} .
       qq{Significant Accounting Policies are important for a firm. \n} .
       qq{Here is a list of the Significant Accounting Policies \n} .
       qq{1)Lifo \n} .
       qq{2)Depreciation \n} .
       qq{3)Expenses \n}
       ;
     print qq{<<$str>> \n};
     ;;
     my $sap = qr{ Significant \s+ Accounting \s+ Policies }xms;
     ;;
     my ($end_sap) = $str =~ m{ .* $sap \K .* }xmsg;
     print qq{<<$end_sap>> \n};
    "
    <<Wal-mart talks about its Significant Accounting Policies in its 10k.
    Significant Accounting Policies are important for a firm.
    Here is a list of the Significant Accounting Policies
    1)Lifo
    2)Depreciation
    3)Expenses
    >>

    <<
    1)Lifo
    2)Depreciation
    3)Expenses
    >>

    Please see perlre, perlretut, and perlrequick.

    Update: On second thought, I think I would write the statement
        my ($end_sap) = $str =~ m{ .* $sap \K .* }xmsg;
    as something like
        my $sap_at_end = my ($end_sap) = $str =~ m{ .* $sap \K .* }xmsg;
    so that the success of the match will be separately captured. This allows a

    if ($sap_at_end) { do_something_with($end_sap); } else { ... }
    block to follow the match to let you handle any eventuality. I might also think about changing the $sap \K sequence to $sap \s* \K to consume any whitespace that might follow the $sap pattern before the "real" text begins.
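Put together as a script rather than a one-liner, the update might look roughly like this. It is only a sketch: the sub name `after_last_sap` and the sample text are assumptions for illustration. Note that the /g flag matters here, because in list context with no capture groups it makes the match return the matched text (everything after \K) instead of just 1.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later

# Hypothetical helper (not from the thread): returns a success flag and
# the text that follows the last occurrence of the phrase.
sub after_last_sap {
    my ($str) = @_;
    my $sap = qr{ Significant \s+ Accounting \s+ Policies }xms;

    # \s* before \K also discards whitespace right after the phrase.
    # The scalar on the left receives the number of items the match
    # returned: 1 on success, 0 on failure.
    my $found = my ($tail) = $str =~ m{ .* $sap \s* \K .* }xmsg;
    return ($found, $tail);
}

# Made-up sample document.
my $doc = "Intro to Significant Accounting Policies.\n"
        . "Here is a list of the Significant Accounting Policies \n"
        . "1)Lifo\n2)Depreciation\n3)Expenses\n";

my ($found, $rest) = after_last_sap($doc);
if ($found) {
    print "<<$rest>>\n";
}
else {
    print "phrase not found\n";
}
```

Keeping the flag separate from the extracted text means an empty tail (phrase at the very end of the document) can still be told apart from no match at all.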


    Give a man a fish:  <%-{-{-{-<

Re: Take last instance of a string
by perldigious (Priest) on Aug 30, 2016 at 18:37 UTC

    Haven't you posted a very similar request to this previously?

    Are you able to use those answers previously given to formulate an attempt at what you are trying to accomplish?

    I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
    I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
      I agree the requests are similar. I was just wondering if there is a function that takes the last instance of a string that is repeated multiple times. In the last example the words "statement of accounting policies" were only written once. In no way was I trying to waste anyone's time or get people to write code for me. Thank you for your time!

        No worries, it looks like hippo gave you such a function with rindex.
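For readers who have not seen that reply, a minimal sketch of how rindex could be used for this. The sub name `after_last` and the sample text are made up for illustration. rindex looks for a literal string, so unlike the regex answers it will not tolerate extra whitespace or a line break inside the phrase.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): text following the last
# literal occurrence of $phrase in $text, or undef if it never appears.
sub after_last {
    my ($text, $phrase) = @_;
    my $pos = rindex($text, $phrase);    # -1 means "not found"
    return undef if $pos < 0;
    return substr($text, $pos + length($phrase));
}

# Made-up sample document.
my $doc = "Wal-mart talks about its Significant Accounting Policies in its 10k.\n"
        . "Here is a list of the Significant Accounting Policies\n"
        . "1)Lifo\n2)Depreciation\n3)Expenses\n";

my $rest = after_last($doc, 'Significant Accounting Policies');
print defined $rest ? $rest : "phrase not found\n";
```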

        I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
        I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
Re: Take last instance of a string
by ww (Archbishop) on Aug 30, 2016 at 17:28 UTC

    What have you tried? What docs have you read?

    We're here to help you learn; NOT to write your code.

    In the interest of learning, however, see the Tutorials re regular expressions.

    Alternately (and a very bad plan, usually), capture all the desired phrases to the end of the file (into an array) and then select the last element of the array. (But, again, you'll need to study regexen.)
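One possible reading of the array idea, sketched with split rather than repeated captures. This is an interpretation, not necessarily what was meant above, and the sub name `last_chunk` is made up. It copies every piece of the document into an array just to keep the last one, which is part of why it is rarely the best plan.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split the text on the
# phrase and keep only the final piece. Returns undef when the phrase
# does not occur at all.
sub last_chunk {
    my ($text) = @_;
    # A limit of -1 keeps trailing empty fields, so a phrase at the very
    # end of the text yields an empty last chunk instead of vanishing.
    my @chunks = split /Significant\s+Accounting\s+Policies/, $text, -1;
    return undef if @chunks < 2;    # fewer than two pieces: no phrase seen
    return $chunks[-1];
}

# Made-up sample document.
my $doc = "Significant Accounting Policies are important for a firm.\n"
        . "Here is a list of the Significant Accounting Policies\n"
        . "1)Lifo\n2)Depreciation\n3)Expenses\n";

my $rest = last_chunk($doc);
print defined $rest ? $rest : "phrase not found\n";
```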


    Yet another gimmé request!
    Sorry, we expect SOPW to seek wisdom, not to ask us to do so for them.

      Thanks for the reply, I understand where you're coming from. I'm not looking for anyone to write my own code, I'm just wondering if there is a way to match on the last instance of a string.

Re: Take last instance of a string
by Discipulus (Canon) on Aug 31, 2016 at 07:49 UTC
    You had good replies, but if I can add something, I would start with a simpler approach: a linear one, accumulating results and eventually discarding them if unneeded. Consider:

    use strict;
    use warnings;

    my $trig = 'Significant Accounting Policies';
    my $res;
    while (<DATA>){
        if ($_ =~/$trig/i){
            $res = $_;
            next;
        }
        $res.=$_ if $res;
    }
    print $res;

    __DATA__
    Wal-mart talks about its Significant Accounting Policies in its 10k.
    Significant Accounting Policies are important for a firm.
    Here is a list of the Significant Accounting Policies
    1)Lifo
    2)Depreciation
    3)Expenses

    Or a similar approach but using a pointer:

    my $trig = 'Significant Accounting Policies';
    my @res;
    my $pointer;
    while (<DATA>){
        $res[$.-1] = $_;   # or just: push @res,$_;
        $pointer = $.-1 if /$trig/i;
    }
    print @res[ $pointer .. $#res ];

    HtH
    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Take last instance of a string
by choroba (Cardinal) on Aug 31, 2016 at 11:44 UTC
    Another solution, reading from a file line by line, and remembering the last position where the string was seen. After having read the whole file, seek back to the position and extract the text.

    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $FH, '<', shift or die $!;

    my $pos;
    while (<$FH>) {
        $pos = tell $FH if -1 != index $_, 'Significant Accounting Policies';
    }

    if (defined $pos) {
        seek $FH, $pos, 0;
        print while <$FH>;
    }

    Update: simplified, as you don't want to print the line containing the string.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,