Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks who always prove to be smarter than me! I have a folder with several text files. In those text files I have lines that I'm trying to match and extract. Here is an example of some of the lines from one of the files:

/search/detail/1164321 1.html /rsearch/detail/1164327 1.html /search/detail/1164639 1.html /search/detail/1164903 1.html /search/detail/1165763 1.html /search/detail/1191549 1.html /search/detail/1195169 1.html /search/detail/1195781 1.html /search/detail/1196405 1.html /search/detail/1196439
I have two files that my script references, parse1.txt and parse2.txt to get the strings to match. They currently look like this:

Parse1

http https
Parse2
.com .gov .edu
I'm trying to use this bit of code to match the '/search/detail/1196439' where before I was just looking to match valid webpages that started with http or https and ended with .com or .gov or .edu. The problem is that the leading '/' is messing me up. Here's my code:

my $calls_dir2 = "$response/Bing/1Parsed/Html";
my $parsed_dir = "$response/Bing/1Parsed/Html2";
unless ( -d $parsed_dir ) {
    make_path( $parsed_dir, { verbose => 1, mode => 0755 } );
}
open( my $fh2, '<', $parse1file ) or die $!;
chomp( my @parse_terms1 = <$fh2> );
close($fh2);
open( $fh2, '<', $parse2file ) or die $!;
print "parse1file=$parse1file\n";
print "parse2file=$parse2file\n";
for my $parse1 (@parse_terms1) {
    seek( $fh2, 0, 0 );
    while ( my $parse2 = <$fh2> ) {
        chomp($parse2);
        print "$parse1 $parse2\n";
        my $wanted = $parse1 . $parse2;
        my @files  = glob "$calls_dir2/*.txt";
        printf "Got %d files\n", scalar @files;
        for my $file (@files) {
            open my $in_fh, '<', $file;
            my $basename = fileparse($file);
            my ($prefix) = $basename =~ /^(.{9})/;
            my $rnumber  = rand(1999);
            print $prefix, "\n";
            my @matches;
            while (<$in_fh>) {
                #push @matches, $_ if /^.*?(?:\b|_)$parse1(?:\b|_).*?(?:\b|_)$parse2(?:\b|_).*?$/m;
                push @matches, $_ if /^.*?(?:|_)$parse1(?:|_).*?(?:|_)$parse2(?:|_).*?$/m;
                #push @matches, $_ if m/^($parse1)$/i;
                #push @matches, $_ if m/^'$parse1'$/i;
                #m/^yes$/i
            }
            if ( scalar @matches ) {
                make_path($parsed_dir);
                open my $out_fh, '>', "$parsed_dir/${basename}.$wanted.$rnumber.txt" or die $!;
                $out_fh->autoflush(1);
                print $out_fh $_ for @matches;
                print "$out_fh \n";
                close $out_fh;
            }
        }
    }
}

Please let me know if you have enough info now. If not, I'm more than happy to provide more. Thanks in advance for the assistance!

Replies are listed 'Best First'.
Re: Regex with two strings from files
by 1nickt (Canon) on Mar 01, 2017 at 01:31 UTC

    Please let me know if you have enough info now

    Please see SSCCE. As noted by haukex elsewhere today, "more != better". If your issue is with a regexp, please post code that shows the regexp, some sample input, and the desired output (in <code></code> tags). All the extra code here is unneeded and makes it harder to see what is going on, which reduces your chances of getting a speedy answer.

    The way forward always starts with a minimal test.
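    As a rough illustration of what such a minimal test could look like for this question, here is a sketch that uses three of the sample lines exactly as they appear in the post. The pattern is only my guess at what is wanted (extract a `/search/detail/<digits>` path), and treating the `/rsearch/...` line as a non-match is likewise an assumption.

```perl
use strict;
use warnings;

# Sample input: three lines copied as they appear in the question.
my @lines = (
    '/search/detail/1164321 1.html',
    '/rsearch/detail/1164327 1.html',
    '/search/detail/1196439',
);

# Assumed goal: capture a /search/detail/<digits> path wherever it occurs.
# Using m{...} means the slashes in the pattern need no escaping.
my @matched = map { m{(/search/detail/\d+)} ? $1 : () } @lines;

# Print what was extracted so it can be compared with the desired output.
print "$_\n" for @matched;
```

    Posting something of this size, plus the output you expected, is usually enough for a regex question.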

      Noted on the "more is NOT better" point and the positive suggestions. I guess it really is a simple regex question: how can I use a regex to match a string that is basically a file path containing several slashes? Thanks again!

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $s = 'this is the path/to/my/file what i want';
         ;;
         my $rx_ptf = qr{ \w+ (?: / \w+)+ }xms;
         ;;
         print qq{matched: '$1'} if $s =~ m{ \b ($rx_ptf) \b }xms;
        "
        matched: 'path/to/my/file'
        The  qr// m// s/// delimiters are (more or less) arbitrary. (Update: Please see Regexp Quote-Like Operators in perlop.)
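        To make the point about delimiters concrete, here is a small sketch applied to a path like the ones in the original question. The input string is made up for illustration; the three patterns are equivalent and differ only in the delimiter chosen.

```perl
use strict;
use warnings;

# Hypothetical input containing one of the paths from the question.
my $s = 'see /search/detail/1196439 for details';

# With the default / delimiter, every slash in the pattern must be escaped.
my ($escaped) = $s =~ /(\/search\/detail\/\d+)/;

# With {} or # as the delimiter, the slashes can be written literally.
my ($braces) = $s =~ m{(/search/detail/\d+)};
my ($hash)   = $s =~ m#(/search/detail/\d+)#;

print "$_\n" for $escaped, $braces, $hash;
```

        All three captures hold the same path; the second and third forms are simply easier to read when the pattern is full of slashes.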


        Give a man a fish:  <%-{-{-{-<

Re: Regex with two strings from files
by Marshall (Canon) on Mar 01, 2017 at 07:05 UTC
    I am very confused by your question. It is not at all clear what you are trying to do.

    Your code has glaring errors, e.g.

    for my $parse1 (@parse_terms1) {
        seek( $fh2, 0, 0 );
        while ( my $parse2 = <$fh2> ) {
    Never re-read the same file (in your code, the file handle $fh2 and the seek) more than once without an exceptional reason. You have many loops within loops in this section of code, the purpose of which is not at all clear.

    You do not explain what parse1 or parse2 are intended to do, or why they should even be separate files.

    It looks to me like you should use the files parse1 and parse2 to build a single regex that is then executed over a number of directories and/or files.
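    A minimal sketch of that idea follows, under several assumptions of mine: the two arrays stand in for the contents of parse1.txt and parse2.txt (each read once, up front), the `alternation` helper is a hypothetical name, the candidate strings are invented, and "starts with a parse1 term and ends with a parse2 term" is my reading of the original post.

```perl
use strict;
use warnings;

# Stand-ins for the terms in parse1.txt and parse2.txt (values from the post).
my @starts = qw(http https);
my @ends   = qw(.com .gov .edu);

# Hypothetical helper: join the terms into one alternation.
# quotemeta escapes regex metacharacters such as "." and "/".
sub alternation {
    my @terms = sort { length($b) <=> length($a) } @_;    # longest first
    return join '|', map { quotemeta($_) } @terms;
}

my $start_rx = alternation(@starts);
my $end_rx   = alternation(@ends);

# Compile the combined pattern once, outside any loop over files or lines.
my $rx = qr{^(?:$start_rx)\S*(?:$end_rx)$};

# Invented candidate lines to exercise the pattern.
my @candidates = (
    'https://example.gov',
    'http://example.com',
    '/search/detail/1196439',
    'ftp://example.edu',
);

my @hits = grep { $_ =~ $rx } @candidates;
print "$_\n" for @hits;
```

    With the pattern built once, the file handle for parse2 never needs a seek, and each data file is scanned a single time rather than once per pair of terms.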

    I do not see a simple example of what should "match" and what should "not match".

    Your best bet is to start over and explain in English what you are trying to do. It would be helpful to me if you could show some simple examples of the input contained within the files and what you intend to match or not match. I simply do not understand enough about the problem to be of further assistance.

Re: Regex with two strings from files
by choroba (Cardinal) on Mar 01, 2017 at 04:15 UTC
    Probably your previous question: regex question

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,