
Hello monks, who always prove to be smarter than me! I have a folder with several text files. In those text files I have lines that I'm trying to match and extract. Here is an example of some of the lines from one of the files:

/search/detail/1164321
1.html
/rsearch/detail/1164327
1.html
/search/detail/1164639
1.html
/search/detail/1164903
1.html
/search/detail/1165763
1.html
/search/detail/1191549
1.html
/search/detail/1195169
1.html
/search/detail/1195781
1.html
/search/detail/1196405
1.html
/search/detail/1196439

My script reads the strings to match from two files, parse1.txt and parse2.txt. They currently look like this:

Parse1

http
https

Parse2

.com
.gov
.edu

I'm trying to use this bit of code to match lines like '/search/detail/1196439'. Before, I was only looking to match valid web pages that started with http or https and ended with .com, .gov, or .edu. The problem is that the leading '/' is messing me up. Here's my code:

    my $calls_dir2 = "$response/Bing/1Parsed/Html";

    my $parsed_dir = "$response/Bing/1Parsed/Html2";
    unless ( -d $parsed_dir  ) {
        make_path( $parsed_dir , { verbose => 1, mode => 0755 } );
    }

    open( my $fh2, '<', $parse1file ) or die $!;
    chomp( my @parse_terms1 = <$fh2> );
    close($fh2);

    open( $fh2, '<', $parse2file ) or die $!;
    
    print "parse1file=$parse1file\n";
    print "parse2file=$parse2file\n";

    for my $parse1 (@parse_terms1) {
        seek( $fh2, 0, 0 );

        while ( my $parse2 = <$fh2> ) {
            chomp($parse2);
            print "$parse1 $parse2\n";

            my $wanted = $parse1 . $parse2;

            my @files = glob "$calls_dir2/*.txt";

            printf "Got %d files\n", scalar @files;

            for my $file (@files) {

                open my $in_fh, '<', $file or die "Cannot open $file: $!";
                my $basename = fileparse($file);
                my ($prefix) = $basename =~ /^(.{9})/;
                my $rnumber  = rand(1999);
                print $prefix, "\n";

                my @matches;
                while (<$in_fh>) {

                    #push @matches, $_ if /^.*?(?:\b|_)$parse1(?:\b|_).*?(?:\b|_)$parse2(?:\b|_).*?$/m;

                    push @matches, $_ if /^.*?(?:|_)$parse1(?:|_).*?(?:|_)$parse2(?:|_).*?$/m;
                    
                    #push @matches, $_ if m/^($parse1)$/i;
                    #push @matches, $_ if m/^'$parse1'$/i;
                    #m/^yes$/i
                    
                }

                if ( scalar @matches ) {
                    make_path($parsed_dir);
                    open my $out_fh, '>',
                        "$parsed_dir/${basename}.$wanted.$rnumber.txt"
                        or die $!;
                    $out_fh->autoflush(1);
                    print $out_fh $_ for @matches;
                    print "$out_fh \n";
                    close $out_fh;
                }
            }
        }
    }
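
For reference, here is a minimal sketch of the kind of match being attempted, assuming the terms from parse1.txt and parse2.txt are meant to be taken as literal text rather than as regex syntax. The helper name build_matcher and the two made-up URL lines are hypothetical and not part of the script above; the '/search/detail/' term is only an assumption about what might go into parse1.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build one regex that
# requires $first and then $second on the same line, both taken literally.
sub build_matcher {
    my ($first, $second) = @_;
    # \Q...\E escapes regex metacharacters such as '.' so they match literally.
    return qr/\Q$first\E.*?\Q$second\E/;
}

# Sample lines: two from the data shown above, plus two made-up URL-like lines.
my @lines = (
    '/search/detail/1164321',
    '1.html',
    'http://example.com',
    'httpXcom',
);

my $url_re    = build_matcher('http', '.com');
my $detail_re = build_matcher('/search/detail/', '');   # empty second term

my @url_hits    = grep { $_ =~ $url_re } @lines;
my @detail_hits = grep { $_ =~ $detail_re } @lines;

print "url: $_\n"    for @url_hits;
print "detail: $_\n" for @detail_hits;
```

Without the escaping, the '.' in '.com' matches any character, so a line such as 'httpXcom' would also match. Separately, a term containing '/' cannot be used as-is inside an output file name on systems where '/' is the path separator, which may be relevant to how $wanted is used above.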

Please let me know if you have enough info now. If not, I'm more than happy to provide more. Thanks in advance for the assistance!

In reply to Regex with two strings from files by Anonymous Monk
