Padding search results with context

moritz has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Padding search results with context by almut (Canon) on Aug 13, 2007 at 20:43 UTC
Not sure if it's better or just different, but you could do something like this: `my @nums = (1, 5, 6, 8, 15, 20); my $max_idx = 21; my $pad = 1; my @hilite = @nums; my $hilite = shift @hilite; my $i = -1; for my $n (@nums) { for my $c ($n-$pad .. $n+$pad) { if ($c > $i) { last if $c > $max_idx; $i = $c; if ($i == $hilite) { print "I=$i\n"; $hilite = shift @hilite; } else { print " i=$i\n"; } } } }` [download] Output: `i=0 I=1 i=2 i=4 I=5 I=6 i=7 I=8 i=9 i=14 I=15 i=16 i=19 I=20 i=21` [download] Update: changed context to `$pad = 1` to keep the example output in line with the other replies (Apparently it was asking too much to figure out that I had originally set `$pad = 2` ... — Or why the downvotes?)	[reply] [d/l] [select]
Re^2: Padding search results with context by moritz (Cardinal) on Aug 14, 2007 at 06:55 UTC
Thanks, I like your solution because it's short and only iterates once over `@nums`. Perl 6 in German	[reply] [d/l]
Re: Padding search results with context by FunkyMonk (Bishop) on Aug 13, 2007 at 20:22 UTC
I'd use a fixed size buffer `@buf` with `$paddding * 2 + 1` elements and always look at the middle line (`$viewline`) of the buffer. `@interesting` is a list of thing you're interested in. `@lines` contains the data. I've used a list of numbers, but it could be text. `my @lines = 1 .. 20; my @interesting = ( 1, 3, 6, 10 , 20 ); my $padding = 2; my $bufsize = 2 * $padding + 1; my $viewline = $padding; my @buf = ('') x $bufsize; while ( @buf > 0 ) { my $line = shift @lines; # get some data shift @buf; # shorten buffer push @buf, $line # add new line if $line; # if there was one print "-->@buf\n" # if there's something to see && it's interestin +g if $buf[$viewline] && grep $buf[$viewline] eq $_, @interesting; }` [download] Output: `--> 1 2 3 -->1 2 3 4 5 -->4 5 6 7 8 -->8 9 10 11 12 -->18 19 20` [download] update: added the output	[reply] [d/l] [select]
Re^2: Padding search results with context by moritz (Cardinal) on Aug 14, 2007 at 06:51 UTC
Thank you for your reply. Sadly it uses a list from the first to the last index, which could be some thousand apart for only two `@interesting` items, so I'll pick one of the other very good solutions. Perl 6 in German	[reply] [d/l]
Re: Padding search results with context by SuicideJunkie (Vicar) on Aug 13, 2007 at 20:56 UTC
You should be able to do it in one iteration... I would use a queue to store the context lines Start by inspecting each line. If the date has changed then flush the queue and set $postContext=0; Inspect for duplicates If the line is a duplicate (of the latest item seen?), then print out and flush the queue. Set $postContext = $maxPostContext; Otherwise, add the line to the queue. If ($postContext) then print and flush the (single) item from the queue, and $postContext--; If the queue is more than $maxPreContext, then remove a line from the head of the queue. Repeat until done. So, basically, the queue saves up a maximum of $maxPreContext lines of context from before the duplication, and you keep printing $maxPostContext lines for context after the duplication. If another duplicate appears quickly, you don't get multiple copies of the same context because the queue has not refilled.	[reply]
Re: Padding search results with context by johngg (Canon) on Aug 13, 2007 at 23:14 UTC
I'd construct the `@padded` list in one go by combining the `map` with a `grep` each for the tasks of keeping things within range and removing duplicates. I'd also use a hash to keep track of which lines to highlight. `use strict; use warnings; my @nums = (1, 5, 6, 8, 15, 20); my $max = 21; my $pad = 1; my %highlight; @highlight{@nums} = (1) x @nums; my @padded; { my %seen; @padded = grep { ! $seen{$_} ++ } grep { $_ >= 1 && $_ <= $max } map { $_ - $pad .. $_ + $pad } @nums; } print $highlight{$_} ? qq{Highlighted: $_\n} : qq{$_\n} for @padded;` [download] Produces `Highlighted: 1 2 4 Highlighted: 5 Highlighted: 6 7 Highlighted: 8 9 14 Highlighted: 15 16 19 Highlighted: 20 21` [download] I hope this is of use. Cheers, JohnGG	[reply] [d/l] [select]
Re: Padding search results with context by GrandFather (Saint) on Aug 13, 2007 at 21:28 UTC
Two loops - one to build the output list, and one to output it. Things get a little more interesting if there are a very large number of lines such that the implied slurp is not feasible. `use strict; use warnings; use List::Util qw(min max); my $firstLine = 1; my $lastLine = 20; my @indexes = (1, 5, 6, 8, 15, 20); my $pad = 1; # lines of context on each side my @selLines; for my $centre (@indexes) { my $min = max ($centre - $pad, $firstLine); my $max = min ($centre + $pad, $lastLine); $_ == $centre ? $selLines[$_] = [$_, 1] : $selLines[$_] \|\|= $_ for $min .. $max; } defined ($_) && print "$_\n" for map {ref $_ ? "Highlight $_->[0]" : $_} @selLines;` [download] Prints: `Highlight 1 2 4 Highlight 5 Highlight 6 7 Highlight 8 9 14 Highlight 15 16 19 Highlight 20` [download] Note the handling for "end effects"! Update: fixed bug DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re: Padding search results with context by dogz007 (Scribe) on Aug 13, 2007 at 22:09 UTC
In the interest of using only one iteration to produce the desired results, I rewrote my solution as follows. Note the use of two hashes: one to check for highlighted-ness, and one to check for already-printed-ness. `#!/usr/bin/perl use strict; my ($min,$max) = (0,21); my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; my (%printed,%nums); map { $nums{$_}++ } @nums; for my $i (@nums) { for my $n ($i-$pad .. $i+$pad) { if ( ! $printed{$n} && $min <= $n && $n <= $max ) { print exists $nums{$n} ? "Highlighted: " : ""; print $n, "\n"; $printed{$n}++; } } }` [download] Prints the same as before, but unfortunately it has to iterate once to build `%nums` and again to do the rest. I've tried several ways, but couldn't get shorter than two iterations. It is, however, of O(N) instead of O(N^2). I also believe this solution is easier to follow than my last one.	[reply] [d/l] [select]
Re^2: Padding search results with context by moritz (Cardinal) on Aug 14, 2007 at 07:03 UTC
In fact your previous solution was O(N log N) for N = $pad*@nums because of the sort, not O(N�) (or did I miss something)? However your current solution stores some amount of data twice in a hash where pointer is sufficient because @nums is already sorted. Perl 6 in German	[reply]
Re: Padding search results with context by dwm042 (Priest) on Aug 13, 2007 at 20:31 UTC
From your original code, you can scan for duplicates and print at the same time. `my $prev = -1 - $pad; for(@padded_overlap) { if ( $_ > $prev ) { $prev = $_; # print stuff here } }` [download] That eliminates one scan.	[reply] [d/l]
Re^2: Padding search results with context by moritz (Cardinal) on Aug 14, 2007 at 06:53 UTC
Thanks - it's obvious (of course you use the list elements where you construct them), but sometimes I need somebody to point me to the obvious ;-) Perl 6 in German	[reply]
Re^3: Padding search results with context by dwm042 (Priest) on Aug 14, 2007 at 15:29 UTC
This "one pass" solution is obviously inspired by people like Grandfather, but this is a solution I can understand ^^: `#!/usr/bin/perl use warnings; use strict; use List::Util; package main; my $max = 21; my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; # lines of context on each side my $last = -1 - $pad; foreach ( @nums ) { my $start = List::Util::max 1, $_ - $pad, $last + 1; my $end = List::Util::min $max - 1 , $_ + $pad; next if ( $start > $end ); for ( my $i = $start; $i <= $end; $i++ ) { if ( $i == $_ ) { print "Highlighted $i\n"; } else { print "Line $i\n"; } } $last = $end; }` [download] The output is: `C:\Code>perl mintest.pl Highlighted 1 Line 2 Line 4 Highlighted 5 Line 6 Line 7 Highlighted 8 Line 9 Line 14 Highlighted 15 Line 16 Line 19 Highlighted 20` [download]	[reply] [d/l] [select]
Re: Padding search results with context by dogz007 (Scribe) on Aug 13, 2007 at 20:54 UTC
I've removed the need to search for duplicates by piling up the lines you want as keys in a hash. The print code is the same as yours. `my ($min,$max) = (0,21); my @nums = (1, 5, 6, 8, 15, 20); my $pad = 1; my %line; map { $line{$_}++ for ($_-$pad..$_+$pad) } @nums; my @padded = sort { $a <=> $b } keys %line; # print the lines for (@padded){ if (@nums && $_ == $nums[0]){ print "Hilighted: $_\n"; shift @nums; } elsif ($min <= $_ && $_ <= $max) { print "$_\n"; } }` [download] Prints: `0 Hilighted: 1 2 4 Hilighted: 5 Hilighted: 6 7 Hilighted: 8 9 14 Hilighted: 15 16 19 Hilighted: 20 21` [download] Update: Included check to prevent running outside the range of valid lines.	[reply] [d/l] [select]