Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a file:
abc [...] abc [...] abc [...] def [...] abc [...]
I need to pull out the block from the abc closest to the def (the third one in this case) through the def. I have searched the site, and haven't found any slick ways of doing this. This seems like a fairly straightforward problem that someone else has had to solve before. Thank you for your help.

Re: How do I match the closest of repeated strings?
by moritz (Cardinal) on Sep 29, 2008 at 08:33 UTC
    Something along these lines?
    m{
        .*          # match as late as possible
        (
            ^abc$   # the abc
            .*?     # everything up to
            ^def$   # the def
        )
    }smx

    Untested, and it will be confused if there are multiple def lines. In that case it helps to tokenize the input first (for example, by reading it line by line).
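The line-by-line idea mentioned above could look roughly like the following. This is only a sketch: `closest_block` is a hypothetical helper name, and it assumes the markers are lines containing exactly `abc` or `def`, with the input already split into chomped lines.

```perl
use strict;
use warnings;

# closest_block is a hypothetical helper: it scans the lines in order and
# returns the lines from the last "abc" line seen before the first "def"
# line, through that "def" line. Returns an empty list if there is none.
sub closest_block {
    my @lines = @_;
    my @block;
    for my $line (@lines) {
        if ( $line =~ /^abc$/ ) {
            @block = ($line);      # a newer abc replaces the older candidate
        }
        elsif (@block) {
            push @block, $line;
            return @block if $line =~ /^def$/;
        }
    }
    return;                        # no def found after an abc
}

my @input = ( 'abc', '1', 'abc', '2', 'abc', '3', 'def', '4', 'abc', '5' );
print join( "\n", closest_block(@input) ), "\n";
```

Because the scan stops at the first def that follows an abc, extra def lines later in the input do not affect the result.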

      Sorry for my reply to the response below. I did not have the initial .* to eliminate the first abc strings. I knew this was straightforward. Sometimes when you get a mental block about something, you just can't figure it out no matter how much you stare at it.

      Thanks to both of you.

Re: How do I match the closest of repeated strings?
by GrandFather (Saint) on Sep 29, 2008 at 08:42 UTC

    Why? Very often the specific application makes a big difference to choosing a good solution. Either the solution is fairly trivial (/.*abc(.*?)def/sm) or there is a bunch of context you are not telling us about.


    Perl reduces RSI - it saves typing
      I tried that (as it seemed like the obvious thing to do), but it didn't work. I got the match from the first abc up to the def.
        'xyz abc 123 abc pqr abc get this bit def ijk abc lmn' =~ /.*abc(.*?)def/sm;
        print ">$1<";

        Prints:

        > get this bit <

        which is what I understood you to want.

        Update: Ah, I see it did do what you want. ;)
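For anyone comparing the two behaviours discussed in this sub-thread, here is a small side-by-side sketch using the same sample string. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = 'xyz abc 123 abc pqr abc get this bit def ijk abc lmn';

# Without a leading .*, the match starts at the first abc.
my ($without) = $text =~ /abc(.*?)def/s;

# With a greedy leading .*, the engine first consumes the whole string and
# then backtracks, so it settles on the last abc that is still followed by def.
my ($with) = $text =~ /.*abc(.*?)def/s;

print ">$without<\n";
print ">$with<\n";
```

Note that if the string contained more than one def, the greedy `.*` would pick the last abc that has any def after it, which may not be the block you want.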


Re: How do I match the closest of repeated strings?
by ikegami (Patriarch) on Sep 29, 2008 at 09:37 UTC

    Alternative:

    m{ ( ^abc$ (?:(?!^def$).)* ^def$ ) }smx
      That will match everything from the first abc to the first def, not everything starting from the closest abc (which I believe is what the OP wants).

      But it can be adapted, of course:

      m{ ( ^abc$ (?:(?!^(?:def|abc)$).)* ^def$ ) }smx
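A quick way to try the adapted pattern is the sketch below. The sample text is invented, and it assumes abc and def each sit alone on a line. The `/g` flag is added here only to show that each def gets paired with its own nearest abc.

```perl
use strict;
use warnings;

# Invented sample: one marker or filler value per line.
my $text = join "\n", qw(abc 1 abc 2 abc 3 def 4 abc 5 def), '';

# Collect every abc..def block that contains no other abc or def line.
my @blocks = $text =~ m{ ( ^abc$ (?:(?!^(?:def|abc)$).)* ^def$ ) }smxg;

print "[$_]\n" for @blocks;
```

The negative lookahead stops the scan at any intervening abc line, so a match attempt that starts at an earlier abc fails and the engine moves on to the next one.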
Re: How do I match the closest of repeated strings?
by graff (Chancellor) on Sep 30, 2008 at 01:53 UTC
    Looks like the kind of game where 'paragraph-mode' input would be a good idea:
    $/ = "";  # input record separator set to empty string
    my $last_rec;
    while (<>) {  # each record is terminated by /\n{2,}/
        if ( /^def/ ) {
            print $last_rec.$_ if ( $last_rec );
        }
        else {
            $last_rec = $_;
        }
    }
    Check perlvar for more info on $/.

    UPDATE: Sorry, I probably misunderstood the problem. If blank lines are actually distributed as shown in the OP, then something like this would be needed in order to use "paragraph-based" input:

    $/ = "";
    my $last_prefix = my $last_target = '';
    while (<>) {
        if ( /^abc/ ) {
            print $last_prefix.$last_target if ( $last_target );
            $last_prefix = $_;
            $last_target = '';
        }
        elsif ( /^def/ ) {
            $last_target = $_;
        }
        elsif ( $last_target ) {
            $last_target .= $_;
        }
        elsif ( $last_prefix ) {
            $last_prefix .= $_;
        }
    }
    That's pretty icky, really -- very sorry. I suggest you just slurp the whole file and use one of the regex solutions from an earlier reply.
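The slurp-and-match approach suggested above might be sketched like this. An in-memory filehandle stands in for the real file, and the sample data is invented; the pattern is the adapted one from the earlier reply.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over sample data.
my $data = "abc\n1\nabc\n2\nabc\n3\ndef\n4\nabc\n5\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Slurp: with $/ undefined, a single read returns the whole file.
my $contents = do { local $/; <$fh> };
close $fh;

# Capture the abc..def block that has no other abc or def line inside it.
my ($block) = $contents =~ m{ ( ^abc$ (?:(?!^(?:def|abc)$).)* ^def$ ) }smx;

print "$block\n" if defined $block;
```

For a real file, replace `\$data` with the file name in the `open` call.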

    Another update: I still did not get the OP's intent -- but no point fixing this code, since other solutions are up there (I think). Never mind me.