moritz has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a problem for which I already have a solution, but I can't stop feeling that it can be solved much more elegant an perhaps more efficient as well.

I implemented search in IRC-Logs, and I want to return not only matching lines, but a few lines of context as well. Of course I don't want to display lines twice, and I want to hilight the lines that really match.

This is how my code looks like:

my $max = 21; # In the "real world" I obtain these line IDs from a db query, and do # the next step (generate @padded_overlap) on the fly my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; # lines of context on each side my @padded_overlap = map { $_ - $pad .. $_ + $pad } @nums; # guard against context from next/previous day shift @padded_overlap while ($padded_overlap[0] < 0); pop @padded_overlap while ($padded_overlap[-1] > $max); # remove duplicates my $prev = shift @padded_overlap; my @padded = ($prev); for (@padded_overlap) { if ($_ > $prev){ push @padded, $_; $prev = $_; } } # print the lines for (@padded){ if (@nums && $_ == $nums[0]){ print "Hilighted: $_\n"; shift @nums; } else { print "$_\n"; } }

$max is the index of the last line on the day that is displayed, thus in the first two loops only 2*$pad lines are removed at most.

What I hate particularly is that I iterate three times over the list(s). How can I decrease that?

Replies are listed 'Best First'.
Re: Padding search results with context
by almut (Canon) on Aug 13, 2007 at 20:43 UTC

    Not sure if it's better or just different, but you could do something like this:

    my @nums = (1, 5, 6, 8, 15, 20); my $max_idx = 21; my $pad = 1; my @hilite = @nums; my $hilite = shift @hilite; my $i = -1; for my $n (@nums) { for my $c ($n-$pad .. $n+$pad) { if ($c > $i) { last if $c > $max_idx; $i = $c; if ($i == $hilite) { print "I=$i\n"; $hilite = shift @hilite; } else { print " i=$i\n"; } } } }

    Output:

    i=0 I=1 i=2 i=4 I=5 I=6 i=7 I=8 i=9 i=14 I=15 i=16 i=19 I=20 i=21

    Update: changed context to $pad = 1 to keep the example output in line with the other replies (Apparently it was asking too much to figure out that I had originally set $pad = 2 ... — Or why the downvotes?)

      Thanks, I like your solution because it's short and only iterates once over @nums.
Re: Padding search results with context
by FunkyMonk (Chancellor) on Aug 13, 2007 at 20:22 UTC
    I'd use a fixed size buffer @buf with $paddding * 2 + 1 elements and always look at the middle line ($viewline) of the buffer. @interesting is a list of thing you're interested in.

    @lines contains the data. I've used a list of numbers, but it could be text.

    my @lines = 1 .. 20; my @interesting = ( 1, 3, 6, 10 , 20 ); my $padding = 2; my $bufsize = 2 * $padding + 1; my $viewline = $padding; my @buf = ('') x $bufsize; while ( @buf > 0 ) { my $line = shift @lines; # get some data shift @buf; # shorten buffer push @buf, $line # add new line if $line; # if there was one print "-->@buf\n" # if there's something to see && it's interestin +g if $buf[$viewline] && grep $buf[$viewline] eq $_, @interesting; }

    Output:

    --> 1 2 3 -->1 2 3 4 5 -->4 5 6 7 8 -->8 9 10 11 12 -->18 19 20

    update: added the output

      Thank you for your reply.

      Sadly it uses a list from the first to the last index, which could be some thousand apart for only two @interesting items, so I'll pick one of the other very good solutions.

Re: Padding search results with context
by SuicideJunkie (Vicar) on Aug 13, 2007 at 20:56 UTC

    You should be able to do it in one iteration...
    I would use a queue to store the context lines

    Start by inspecting each line.

    1. If the date has changed then flush the queue and set $postContext=0;
    2. Inspect for duplicates
      1. If the line is a duplicate (of the latest item seen?), then print out and flush the queue. Set $postContext = $maxPostContext;
      2. Otherwise, add the line to the queue.
    3. If ($postContext) then print and flush the (single) item from the queue, and $postContext--;
    4. If the queue is more than $maxPreContext, then remove a line from the head of the queue.
    Repeat until done.

    So, basically, the queue saves up a maximum of $maxPreContext lines of context from before the duplication, and you keep printing $maxPostContext lines for context after the duplication. If another duplicate appears quickly, you don't get multiple copies of the same context because the queue has not refilled.

Re: Padding search results with context
by johngg (Canon) on Aug 13, 2007 at 23:14 UTC
    I'd construct the @padded list in one go by combining the map with a grep each for the tasks of keeping things within range and removing duplicates. I'd also use a hash to keep track of which lines to highlight.

    use strict; use warnings; my @nums = (1, 5, 6, 8, 15, 20); my $max = 21; my $pad = 1; my %highlight; @highlight{@nums} = (1) x @nums; my @padded; { my %seen; @padded = grep { ! $seen{$_} ++ } grep { $_ >= 1 && $_ <= $max } map { $_ - $pad .. $_ + $pad } @nums; } print $highlight{$_} ? qq{Highlighted: $_\n} : qq{$_\n} for @padded;

    Produces

    Highlighted: 1 2 4 Highlighted: 5 Highlighted: 6 7 Highlighted: 8 9 14 Highlighted: 15 16 19 Highlighted: 20 21

    I hope this is of use.

    Cheers,

    JohnGG

Re: Padding search results with context
by dwm042 (Priest) on Aug 13, 2007 at 20:31 UTC
    From your original code, you can scan for duplicates and print at the same time.

    my $prev = -1 - $pad; for(@padded_overlap) { if ( $_ > $prev ) { $prev = $_; # print stuff here } }
    That eliminates one scan.
      Thanks - it's obvious (of course you use the list elements where you construct them), but sometimes I need somebody to point me to the obvious ;-)
        This "one pass" solution is obviously inspired by people like Grandfather, but this is a solution I can understand *^^*:

        #!/usr/bin/perl use warnings; use strict; use List::Util; package main; my $max = 21; my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; # lines of context on each side my $last = -1 - $pad; foreach ( @nums ) { my $start = List::Util::max 1, $_ - $pad, $last + 1; my $end = List::Util::min $max - 1 , $_ + $pad; next if ( $start > $end ); for ( my $i = $start; $i <= $end; $i++ ) { if ( $i == $_ ) { print "Highlighted $i\n"; } else { print "Line $i\n"; } } $last = $end; }
        The output is:

        C:\Code>perl mintest.pl Highlighted 1 Line 2 Line 4 Highlighted 5 Line 6 Line 7 Highlighted 8 Line 9 Line 14 Highlighted 15 Line 16 Line 19 Highlighted 20
Re: Padding search results with context
by GrandFather (Saint) on Aug 13, 2007 at 21:28 UTC

    Two loops - one to build the output list, and one to output it. Things get a little more interesting if there are a very large number of lines such that the implied slurp is not feasible.

    use strict; use warnings; use List::Util qw(min max); my $firstLine = 1; my $lastLine = 20; my @indexes = (1, 5, 6, 8, 15, 20); my $pad = 1; # lines of context on each side my @selLines; for my $centre (@indexes) { my $min = max ($centre - $pad, $firstLine); my $max = min ($centre + $pad, $lastLine); $_ == $centre ? $selLines[$_] = [$_, 1] : $selLines[$_] ||= $_ for $min .. $max; } defined ($_) && print "$_\n" for map {ref $_ ? "Highlight $_->[0]" : $_} @selLines;

    Prints:

    Highlight 1 2 4 Highlight 5 Highlight 6 7 Highlight 8 9 14 Highlight 15 16 19 Highlight 20

    Note the handling for "end effects"!

    Update: fixed bug


    DWIM is Perl's answer to Gödel
Re: Padding search results with context
by dogz007 (Scribe) on Aug 13, 2007 at 22:09 UTC
    In the interest of using only one iteration to produce the desired results, I rewrote my solution as follows. Note the use of two hashes: one to check for highlighted-ness, and one to check for already-printed-ness.

    #!/usr/bin/perl use strict; my ($min,$max) = (0,21); my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; my (%printed,%nums); map { $nums{$_}++ } @nums; for my $i (@nums) { for my $n ($i-$pad .. $i+$pad) { if ( ! $printed{$n} && $min <= $n && $n <= $max ) { print exists $nums{$n} ? "Highlighted: " : ""; print $n, "\n"; $printed{$n}++; } } }

    Prints the same as before, but unfortunately it has to iterate once to build %nums and again to do the rest. I've tried several ways, but couldn't get shorter than two iterations. It is, however, of O(N) instead of O(N^2). I also believe this solution is easier to follow than my last one.

      In fact your previous solution was O(N log N) for N = $pad*@nums because of the sort, not O(Nē) (or did I miss something)?

      However your current solution stores some amount of data twice in a hash where pointer is sufficient because @nums is already sorted.

Re: Padding search results with context
by dogz007 (Scribe) on Aug 13, 2007 at 20:54 UTC
    I've removed the need to search for duplicates by piling up the lines you want as keys in a hash. The print code is the same as yours.

    my ($min,$max) = (0,21); my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; my %line; map { $line{$_}++ for ($_-$pad..$_+$pad) } @nums; my @padded = sort { $a <=> $b } keys %line; # print the lines for (@padded){ if (@nums && $_ == $nums[0]){ print "Hilighted: $_\n"; shift @nums; } elsif ($min <= $_ && $_ <= $max) { print "$_\n"; } }

    Prints:

    0 Hilighted: 1 2 4 Hilighted: 5 Hilighted: 6 7 Hilighted: 8 9 14 Hilighted: 15 16 19 Hilighted: 20 21

    Update: Included check to prevent running outside the range of valid lines.