Sorry I was unclear. Here's some more info. I have a folder with several text files. In those text files I have lines that I'm trying to match and extract.
Here is an example of some of the lines from one of the files:
/search/detail/1164321
1.html
/rsearch/detail/1164327
1.html
/search/detail/1164639
1.html
/search/detail/1164903
1.html
/search/detail/1165763
1.html
/search/detail/1191549
1.html
/search/detail/1195169
1.html
/search/detail/1195781
1.html
/search/detail/1196405
1.html
/search/detail/1196439
My script reads two files, parse1.txt and parse2.txt, to get the strings to match.
They currently look like this:
Parse1
http
https
Parse2
.com
.gov
.edu
I'm trying to use this bit of code to match lines like '/search/detail/1196439', where before I was just looking to match valid web pages that started with http or https and ended with .com, .gov, or .edu. The problem is that the leading '/' is messing me up. Here's more of my code:
my $calls_dir2 = "$response/Bing/1Parsed/Html";
my $parsed_dir = "$response/Bing/1Parsed/Html2";
unless ( -d $parsed_dir ) {
make_path( $parsed_dir , { verbose => 1, mode => 0755 } );
}
open( my $fh2, '<', $parse1file ) or die $!;
chomp( my @parse_terms1 = <$fh2> );
close($fh2);
open( $fh2, '<', $parse2file ) or die $!;
print "parse1file=$parse1file\n";
print "parse2file=$parse2file\n";
for my $parse1 (@parse_terms1) {
seek( $fh2, 0, 0 );
while ( my $parse2 = <$fh2> ) {
chomp($parse2);
print "$parse1 $parse2\n";
my $wanted = $parse1 . $parse2;
my @files = glob "$calls_dir2/*.txt";
printf "Got %d files\n", scalar @files;
for my $file (@files) {
open my $in_fh, '<', $file or die "Cannot open $file: $!";
my $basename = fileparse($file);
my ($prefix) = $basename =~ /^(.{9})/;
my $rnumber = rand(1999);
print $prefix, "\n";
my @matches;
while (<$in_fh>) {
#push @matches, $_ if /^.*?(?:\b|_)$parse1(?:\b|_).*?(?:\b|_)$parse2(?:\b|_).*?$/m;
push @matches, $_ if /^.*?(?:|_)$parse1(?:|_).*?(?:|_)$parse2(?:|_).*?$/m;
#push @matches, $_ if m/^($parse1)$/i;
#push @matches, $_ if m/^'$parse1'$/i;
#m/^yes$/i
}
if ( scalar @matches ) {
make_path($parsed_dir);
open my $out_fh, '>', "$parsed_dir/${basename}.$wanted.$rnumber.txt"
or die $!;
$out_fh->autoflush(1);
print $out_fh $_ for @matches;
print "$out_fh \n";
close $out_fh;
}
}
}
}
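To make the goal concrete, here is a minimal sketch of the kind of match I'm after. It is not the code I'm currently running. It assumes each term from the parse files should be treated as literal text, so it wraps the terms in \Q...\E to keep the '/' and '.' characters from being read as regex syntax. The helper name build_line_regex and the sample array are made up for illustration, and passing an empty end term is just my assumption about how a "starts with only" match could be expressed.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching a whole line that starts
# with $start and ends with $end, both taken as literal strings.
sub build_line_regex {
    my ( $start, $end ) = @_;

    # \Q...\E applies quotemeta, so '/', '.', and similar characters
    # in the terms are matched literally.
    return qr/^\Q$start\E.*\Q$end\E$/;
}

# Sample lines copied from the example file above (already chomped).
my @lines = (
    '/search/detail/1164321',
    '1.html',
    '/rsearch/detail/1164327',
    '/search/detail/1196439',
);

# An empty end term means "anything may follow the start term".
my $re      = build_line_regex( '/search/detail/', '' );
my @matches = grep { $_ =~ $re } @lines;

print "$_\n" for @matches;
```

Because the pattern is anchored with ^, the '/rsearch/...' line and the '1.html' lines are skipped. The same helper should still cover the earlier case, for example build_line_regex('http', '.com').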
Please let me know if you have enough info now. If not, I'm more than happy to provide more. Thanks in advance for the assistance!