
Sorry I was unclear. Here's some more info. I have a folder with several text files. In those text files there are lines that I'm trying to match and extract. Here is an example of some of the lines from one of the files:

/search/detail/1164321
1.html
/rsearch/detail/1164327
1.html
/search/detail/1164639
1.html
/search/detail/1164903
1.html
/search/detail/1165763
1.html
/search/detail/1191549
1.html
/search/detail/1195169
1.html
/search/detail/1195781
1.html
/search/detail/1196405
1.html
/search/detail/1196439
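For comparison, here is a minimal sketch that pulls the /search/detail/<number> lines out of data shaped like the sample above using one hard-coded pattern. It assumes the ID after /search/detail/ is always all digits, and the sample lines are copied in by hand rather than read from a file. Using m{...} as the delimiter means the '/' characters in the pattern need no escaping at all.

```perl
use strict;
use warnings;

# A few lines copied from the sample data (illustrative only).
my @lines = (
    "/search/detail/1164321\n",
    "1.html\n",
    "/rsearch/detail/1164327\n",
    "/search/detail/1196439\n",
);

# Keep the numeric ID from lines that are exactly /search/detail/<digits>.
# With m{...} as the delimiter, '/' is an ordinary literal character.
my @ids;
for my $line (@lines) {
    push @ids, $1 if $line =~ m{^/search/detail/(\d+)$};
}

print "$_\n" for @ids;    # prints 1164321 and 1196439
```

Note that the /rsearch/ line is skipped because the pattern is anchored at the start of the line.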

My script reads two files, parse1.txt and parse2.txt, to get the strings to match. They currently look like this:

Parse1

http
https

Parse2

.com
.gov
.edu
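A minimal sketch of how the earlier "starts with http/https, ends with .com/.gov/.edu" check could be written as a single regex built from the two term lists. It assumes parse1.txt terms belong at the start of a line and parse2.txt terms at the end, and it hard-codes the terms instead of reading the files. quotemeta escapes each term, so the '.' in '.com' matches a literal dot instead of any character, which interpolating $parse2 directly into a pattern does not do.

```perl
use strict;
use warnings;

# Terms as listed in parse1.txt and parse2.txt (hard-coded for illustration).
my @prefixes = ( 'http', 'https' );
my @suffixes = ( '.com', '.gov', '.edu' );

# Escape regex metacharacters in every term, then join them as alternatives.
my $prefix_re = join '|', map { quotemeta($_) } @prefixes;
my $suffix_re = join '|', map { quotemeta($_) } @suffixes;

# Prefix anchored at the start of the line, suffix anchored at the end.
my $url_re = qr/^(?:$prefix_re)\S*?(?:$suffix_re)$/;

for my $line ( 'https://example.com', 'http://example.org', 'ftp://site.gov' ) {
    printf "%-22s %s\n", $line, ( $line =~ $url_re ? 'match' : 'no match' );
}
```

Building one alternation up front also avoids re-reading parse2.txt with seek for every parse1 term.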

I'm trying to use the code below to match lines like '/search/detail/1196439'. Previously I only needed to match valid web pages that started with http or https and ended with .com, .gov or .edu. The problem is that the leading '/' is tripping me up. Here's more of my code:

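    use File::Basename qw(fileparse);    # provides fileparse()
    use File::Path     qw(make_path);    # provides make_path()

    my $calls_dir2 = "$response/Bing/1Parsed/Html";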
    
    my $parsed_dir = "$response/Bing/1Parsed/Html2";
    unless ( -d $parsed_dir  ) {
        make_path( $parsed_dir , { verbose => 1, mode => 0755 } );
    }

    open( my $fh2, '<', $parse1file ) or die $!;
    chomp( my @parse_terms1 = <$fh2> );
    close($fh2);

    open( $fh2, '<', $parse2file ) or die $!;
    
    print "parse1file=$parse1file\n";
    print "parse2file=$parse2file\n";

    for my $parse1 (@parse_terms1) {
        seek( $fh2, 0, 0 );

        while ( my $parse2 = <$fh2> ) {
            chomp($parse2);
            print "$parse1 $parse2\n";

            my $wanted = $parse1 . $parse2;

            my @files = glob "$calls_dir2/*.txt";

            printf "Got %d files\n", scalar @files;

            for my $file (@files) {

                open my $in_fh, '<', $file or die "Can't open $file: $!";
                my $basename = fileparse($file);
                my ($prefix) = $basename =~ /^(.{9})/;
                my $rnumber  = rand(1999);
                print $prefix, "\n";

                my @matches;
                while (<$in_fh>) {

                    #push @matches, $_ if /^.*?(?:\b|_)$parse1(?:\b|_).*?(?:\b|_)$parse2(?:\b|_).*?$/m;
                    
                    # \Q...\E makes the terms match literally, so the '.' in '.com' is not a wildcard
                    push @matches, $_ if /^.*?(?:|_)\Q$parse1\E(?:|_).*?(?:|_)\Q$parse2\E(?:|_).*?$/m;
                    
                    #push @matches, $_ if m/^($parse1)$/i;
                    #push @matches, $_ if m/^'$parse1'$/i;
                    #m/^yes$/i
                    
                }

                if ( scalar @matches ) {
                    make_path($parsed_dir);
                    my $out_file = "$parsed_dir/${basename}.$wanted.$rnumber.txt";
                    open my $out_fh, '>', $out_file
                        or die "Can't write $out_file: $!";
                    $out_fh->autoflush(1);
                    print $out_fh $_ for @matches;
                    print "$out_file\n";    # show which file was written
                    close $out_fh;
                }
            }
        }
    }
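One place a '/' in the search terms is likely to cause trouble in the code above is the output path: $wanted is interpolated into "$parsed_dir/${basename}.$wanted.$rnumber.txt", so if parse1.txt contains a term such as /search/detail/, each '/' is read as a directory separator and the open for writing would fail because those directories do not exist. A minimal sketch of one workaround, assuming it is acceptable to replace unsafe characters in the generated filename; safe_name and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: make a search term safe to use inside ONE filename
# component by replacing every character other than letters, digits,
# '.', '_' and '-' with '_'. In particular '/' becomes '_', so it can no
# longer be read as a directory separator.
sub safe_name {
    my ($term) = @_;
    ( my $name = $term ) =~ s/[^A-Za-z0-9._-]/_/g;
    return $name;
}

# Illustrative stand-ins for $parse1 . $parse2 and $basename.
my $wanted   = '/search/detail/' . '.html';
my $basename = 'page1.txt';

my $out_file = "Html2/${basename}." . safe_name($wanted) . '.txt';
print "$out_file\n";    # prints Html2/page1.txt._search_detail_.html.txt
```

The matching itself is unaffected; only the name of the output file changes.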

Please let me know if you have enough info now. If not, I'm more than happy to provide more. Thanks in advance for the assistance!

In reply to Re^2: regex question by Anonymous Monk
in thread regex question by Anonymous Monk
