Hi everyone!
I tried what you all suggested and now my code looks like this:
open(HH, "<$PATHDATA/query04.txt");
chomp( my @query_arr = <HH> );
close HH;
open(XX, "<$PATHDATA/multisearch_final_sorted_3.txt");
open(DD, ">$PATHDATA/query_results.txt");
my $count=0;
while ( my $line = <XX> ) {
    if ($debug) {
        print MAGENTA, "LINEA:$line\n";
    }
    for my $query_el (@query_arr) {
        if ($debug) {
            print GREEN, "QUERY:$query_el\n";
        }
        if ($line =~ /^$query_el\]\[.*/i) {
            if ($debug) {
                print GREEN, "QUERY:$query_el\n";
                print MAGENTA, "MATCH:$line\n";
            }
            print DD "QUERY:$query_el\n";
            print DD "MATCH:$line\n";
            print DD "#################";
            $count ++;
        }
    }
}
print DD "Entrate trovate: $count";
close XX;
close DD;
exit 0;
Unfortunately, due to the large amount of data I have, I cannot understand if it's working, since it's not printing anything and it seems to take forever to finish.
Is there a way I can make it a little faster?
Every suggestion is really appreciated.
Thanks again,
Giu
You said:
Unfortunately, due to the large amount of data I have, I cannot understand if it's working...
How many lines in "query04.txt"? How many lines in "multisearch_final_sorted_3.txt"? If you create a test version of each file, containing just a few lines that should produce some output, does the script work correctly on those test files? (Hint: Allowing file names to be provided as command line args can help with testing.)
One way to try speeding things up is to create a single regex from your query file, by joining the lines with "|":
#!/usr/bin/perl
use strict;
use warnings;

my $PATHDATA = "."; # (you didn't say how this was being set)

# Take both file names from the command line, or fall back to the defaults.
my ( $query_list_file, $file_to_search ) = ( @ARGV == 2 )
    ? @ARGV
    : ( "$PATHDATA/query04.txt", "$PATHDATA/multisearch_final_sorted_3.txt" );

# The file names are used as given; the defaults already include $PATHDATA.
open( HH, "<", $query_list_file ) or die "$query_list_file: $!\n";
chomp( my @query_arr = <HH> );
close HH;

# Build one alternation from all the query lines.
my $query_regex = join( '|', @query_arr );

open( XX, "<", $file_to_search ) or die "$file_to_search: $!\n";
open( DD, ">", "$PATHDATA/query_results.txt" ) or die "$PATHDATA/query_results.txt: $!\n";
my $count = 0;
while ( <XX> ) {
    if ( /^($query_regex)\]/ ) {
        print DD "############\nQUERY: $1\nMATCH: $_\n";
        $count++;
    }
}
print DD "Entrate trovate: $count\n";
(In addition to allowing for other input files and using a single regex to check all matches, I also left out the "debug" stuff, rearranged the output format a little, and changed the "open" statements to use the 3-arg style.) UPDATED to add "or die ..." on each of the "open" statements -- that should be a habit.
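One caveat with joining the lines with "|": if the query lines are meant as literal strings rather than patterns (an assumption on my part, since the query file wasn't shown), any regex metacharacters in them should be escaped first. Here is a minimal sketch of that variation, using quotemeta and compiling the alternation once with qr. The sample queries and lines are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of query04.txt
my @query_arr = ( 'abc', 'a.c', 'x+y' );

# Escape regex metacharacters in each query, then compile the alternation once
my $alternation = join( '|', map { quotemeta($_) } @query_arr );
my $query_regex = qr/^($alternation)\]/;

# Hypothetical sample lines standing in for the file being searched
my @lines = ( 'abc][first', 'a.c][second', 'aXc][third', 'x+y][fourth' );

my @matched;
for my $line (@lines) {
    # $1 holds whichever query matched at the start of the line
    push @matched, $1 if $line =~ $query_regex;
}
print "Matched: @matched\n";
```

Without quotemeta, the query 'a.c' would also match the line starting with 'aXc', because "." matches any character. If the query lines really are patterns, skip the quotemeta step.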
If you still have a problem when using some small test files, post a complete and runnable script (like the one shown here) with the test data.
I'm not a master of regex's (yet?) but I think I see the problem...
if ($line =~ /^$query_el\]\[.*/i) {
Remember that $ has special meaning inside a regex (end of line). I *think* that you need to use the qr operator before the regex to evaluate the contents of $query_el before you perform the regex. Someone please correct me if I'm wrong here.
When you're not sure, you should try it yourself and find out, rather than posting a guess. As it happens, you are wrong in this case: perl interpolates the variable $query_el into the regex.
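A quick way to check this for yourself is a few lines like the following. This is only a sketch: the values of $query_el and $line are made up here and are not taken from the original files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical values; the real ones come from the two input files
my $query_el = 'foo';
my $line     = 'FOO][bar baz';

# $query_el is interpolated before the pattern is compiled, so the first
# match behaves the same as the second one with the value written out.
my $interpolated = ( $line =~ /^$query_el\]\[.*/i ) ? 1 : 0;
my $literal      = ( $line =~ /^foo\]\[.*/i )       ? 1 : 0;

print "interpolated: $interpolated, literal: $literal\n";
```

Both tests succeed, which shows that the "$" in front of a variable name is treated as the start of a variable, not as the end-of-line anchor.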