
Sorry, I was unclear. Here's some more info. I have a folder with several text files. In those text files are lines that I'm trying to match and extract. Here is an example of some of the lines from one of the files:

```
/search/detail/1164321 1.html
/rsearch/detail/1164327 1.html
/search/detail/1164639 1.html
/search/detail/1164903 1.html
/search/detail/1165763 1.html
/search/detail/1191549 1.html
/search/detail/1195169 1.html
/search/detail/1195781 1.html
/search/detail/1196405 1.html
/search/detail/1196439
```

My script reads the strings to match from two files, parse1.txt and parse2.txt. They currently look like this:

```
parse1.txt:
http
https

parse2.txt:
.com
.gov
.edu
```
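Since parse1.txt holds prefixes and parse2.txt holds suffixes, each pair can be compiled into one pattern up front. This is a minimal sketch, assuming the terms are meant to match literally: `\Q...\E` (the same escaping as `quotemeta`) is needed because the `.` in `.com` is a regex metacharacter that would otherwise match any character. The helper name `build_pair_regexes` and the `\S*?` "stay inside one token" choice are hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled regex per (prefix, suffix) pair.
# \Q...\E escapes metacharacters, so ".com" only matches a literal dot.
sub build_pair_regexes {
    my ( $prefixes, $suffixes ) = @_;
    my @regexes;
    for my $p (@$prefixes) {
        for my $s (@$suffixes) {
            # \S*? keeps the match inside one whitespace-delimited token
            push @regexes, qr/\Q$p\E\S*?\Q$s\E/;
        }
    }
    return @regexes;
}

my @prefixes = qw(http https);
my @suffixes = qw(.com .gov .edu);
my @regexes  = build_pair_regexes( \@prefixes, \@suffixes );

for my $line ( 'see https://example.gov for details', 'nothing at httpxcom here' ) {
    my $hit = grep { $line =~ $_ } @regexes;    # count of matching patterns
    print "$line => ", ( $hit ? "match" : "no match" ), "\n";
}
```

Compiling with `qr//` once avoids re-reading parse2.txt and re-interpolating the pattern for every line of every file.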

I'm trying to use the code below to match strings such as '/search/detail/1196439'. Previously I was only looking for valid web addresses that started with http or https and ended with .com, .gov or .edu. The problem is that the leading '/' is tripping me up. Here's more of my code:
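For what it's worth, a `/` coming from an interpolated variable is not treated as the pattern delimiter, and choosing a different delimiter such as `m{...}` means literal slashes need no escaping either. Because the sample data has many paths on one line, a global match in list context can return each path rather than the whole line. This is a minimal sketch, assuming the wanted items are the prefix `/search/detail/` followed by digits; the sub name `extract_paths` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every "<prefix><digits>" found on a line.
# m{...} as the delimiter means slashes in the pattern need no escaping,
# and \Q...\E makes the interpolated prefix match literally.
sub extract_paths {
    my ( $line, $prefix ) = @_;
    return $line =~ m{(\Q$prefix\E\d+)}g;    # list context: all captures
}

my $sample = '/search/detail/1164321 1.html /rsearch/detail/1164327 1.html '
           . '/search/detail/1196439';
print "$_\n" for extract_paths( $sample, '/search/detail/' );
```

Note that `/rsearch/detail/1164327` is not returned, since it does not contain the literal prefix `/search/detail/`.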

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(fileparse);
use IO::Handle;    # provides $fh->autoflush

# $response, $parse1file and $parse2file are set earlier in the script
my $calls_dir2 = "$response/Bing/1Parsed/Html";
my $parsed_dir = "$response/Bing/1Parsed/Html2";
unless ( -d $parsed_dir ) {
    make_path( $parsed_dir, { verbose => 1, mode => 0755 } );
}

open( my $fh2, '<', $parse1file ) or die $!;
chomp( my @parse_terms1 = <$fh2> );
close($fh2);

open( $fh2, '<', $parse2file ) or die $!;
print "parse1file=$parse1file\n";
print "parse2file=$parse2file\n";

for my $parse1 (@parse_terms1) {
    seek( $fh2, 0, 0 );    # rewind parse2.txt for each parse1 term
    while ( my $parse2 = <$fh2> ) {
        chomp($parse2);
        print "$parse1 $parse2\n";
        my $wanted = $parse1 . $parse2;
        my @files  = glob "$calls_dir2/*.txt";
        printf "Got %d files\n", scalar @files;
        for my $file (@files) {
            open my $in_fh, '<', $file or die "Cannot open $file: $!";
            my $basename = fileparse($file);
            my ($prefix) = $basename =~ /^(.{9})/;
            my $rnumber  = rand(1999);
            print $prefix, "\n";
            my @matches;
            while (<$in_fh>) {
                #push @matches, $_ if /^.*?(?:\b|_)$parse1(?:\b|_).*?(?:\b|_)$parse2(?:\b|_).*?$/m;
                push @matches, $_ if /^.*?(?:|_)$parse1(?:|_).*?(?:|_)$parse2(?:|_).*?$/m;
                #push @matches, $_ if m/^($parse1)$/i;
                #push @matches, $_ if m/^'$parse1'$/i;
                #m/^yes$/i
            }
            if ( scalar @matches ) {
                make_path($parsed_dir);
                open my $out_fh, '>', "$parsed_dir/${basename}.$wanted.$rnumber.txt"
                    or die $!;
                $out_fh->autoflush(1);
                print $out_fh $_ for @matches;
                print "$out_fh \n";
                close $out_fh;
            }
        }
    }
}
```

Please let me know if you have enough info now. If not, I'm more than happy to provide more. Thanks in advance for the assistance!

Re^3: regex question
by haukex (Archbishop) on Mar 01, 2017 at 08:56 UTC

    This was re-posted as a root node here. Normally I might consider one of the two for reaping, but nobody has replied to this node yet, so I'm just posting this notice.