I am experimenting with fast ways to search through data files. I need to search a file for a term, and also return the section of the file it came from.
One idea I came up with was to grep for the line numbers of the search term, and grep for the line numbers of the section headers. With this information I hoped to use a sed command to output the header before the search term and the entire section (up to the line before the next section header).
The code below works, but it's horribly inefficient, since the nested loops scan the @headers array linearly for every match. I need something like a binary search that, for each number in the @matches array, finds the two closest numbers in the @headers array: the nearest header at or before the match, and the next header after it.
Example:
Header at line 57
Search Match at line 62
Header at line 69
Output all lines from 57-68.
Thanks!
sub searchData {
    my @headers = `grep -F -n \'MatchHeader\' $fileName | cut -f1 -d:`;
    my @matches = `grep -F -n \'$string\' $fileName | cut -f1 -d:`;
    my $match;
    my $head;
    my $higher;
    my $lower;
    foreach $match (@matches) {
        chomp ($match);
        foreach $head (@headers) {
            chomp ($head);
            if ($head <= $match) {
                $lower = $head;
            }
            elsif ($head > $match) {
                $higher = $head;
                last;
            }
        }
        $higher--;
        system("sed -n '$lower,$higher\p\;$higher'\q data_file");
    }
}
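
To make the question concrete, here is a minimal sketch of the kind of lookup I have in mind. It assumes @headers holds plain line numbers in ascending order (grep -n already emits them that way) and that they have been chomped. The names last_index_le and section_range are hypothetical, and the header list is made up around the example above.

```perl
use strict;
use warnings;

# Binary search: index of the last element in a sorted (ascending)
# array ref that is <= $target, or -1 if every element is greater.
sub last_index_le {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, $#{$sorted});
    my $found = -1;
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] <= $target) {
            $found = $mid;       # candidate; keep looking to the right
            $lo = $mid + 1;
        }
        else {
            $hi = $mid - 1;
        }
    }
    return $found;
}

# Map a match line number to the (first, last) lines of its section.
# Returns an empty list if the match precedes the first header.
# $last is undef if the match is in the final section (no next header).
sub section_range {
    my ($headers, $match) = @_;
    my $i = last_index_le($headers, $match);
    return if $i < 0;
    my $first = $headers->[$i];
    my $last  = $i < $#{$headers} ? $headers->[$i + 1] - 1 : undef;
    return ($first, $last);
}

# Hypothetical header positions; 57 and 69 come from the example above.
my @headers = (12, 57, 69, 120);
my ($first, $last) = section_range(\@headers, 62);
print "$first-$last\n";    # prints 57-68
```

Each lookup takes O(log n) comparisons in the number of headers instead of a full scan. The undef case for the last section would still need handling before building the sed range, for example by printing through to the end of the file.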