I am experimenting with fast ways to search through data files. I need to find a search term in a file and also return the section of the file it came from.

One idea I came up with was to grep for the line numbers of the search term and grep for the line numbers of the section headers. With this information I hoped to use a sed command to output the header before the search term and the entire section (up to the line before the next section header).

The code below works, but it's horribly inefficient: the nested loops scan @headers linearly for every match. I need something like a binary search that, for each number in the @matches array, finds the two closest numbers in the @headers array (the nearest one at or below it and the nearest one above it).

Example:

Header at line 57
Search Match at line 62
Header at line 69
Output all lines from 57-68.

Thanks!

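sub searchData {
    # $fileName and $string are assumed to be set in the enclosing scope.
    # Collect the line numbers of every section header and every search hit.
    my @headers = `grep -F -n 'MatchHeader' $fileName | cut -f1 -d:`;
    my @matches = `grep -F -n '$string' $fileName | cut -f1 -d:`;
    chomp(@headers, @matches);

    foreach my $match (@matches) {
        my ($lower, $higher);            # reset for every match
        foreach my $head (@headers) {
            if ($head <= $match) {
                $lower = $head;          # closest header at or above the match so far
            } else {
                $higher = $head;         # first header after the match
                last;
            }
        }
        next unless defined $lower;      # match appears before the first header
        # Stop one line before the next header, or at end of file ('$') if there is none.
        my $end = defined $higher ? $higher - 1 : '$';
        system("sed -n '$lower,${end}p;${end}q' $fileName");
    }
}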
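
For reference, here is a minimal sketch of the binary search described above. It assumes @headers is already sorted in ascending order (which grep -n output is), and the names last_index_at_or_below and section_bounds are hypothetical helpers made up for this illustration, not part of the code above. The header list in the example is invented; only 57, 62 and 69 come from the example in the post.

```perl
use strict;
use warnings;

# Return the index of the last element of the sorted array @$sorted that is
# <= $target, or -1 if every element is greater than $target.
sub last_index_at_or_below {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    my $found = -1;
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] <= $target) {
            $found = $mid;       # candidate; keep looking to the right
            $lo    = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return $found;
}

# For one match line, return (first line, last line) of its section.
# The last line is undef when no header follows the match (section runs to EOF).
# Returns an empty list when the match precedes every header.
sub section_bounds {
    my ($headers, $match) = @_;
    my $i = last_index_at_or_below($headers, $match);
    return if $i < 0;
    my $last = $i < $#$headers ? $headers->[$i + 1] - 1 : undef;
    return ($headers->[$i], $last);
}

my @headers = (12, 57, 69, 104);   # made-up header line numbers
my ($first, $last) = section_bounds(\@headers, 62);
print "$first-$last\n";            # prints "57-68"
```

Each lookup takes O(log n) comparisons instead of a scan over all headers. Since @matches is sorted too, another option would be to walk both arrays once in step, which avoids the search entirely.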

In reply to search array for closest lower and higher number from another array by bigbot
