WWq has asked for the wisdom of the Perl Monks concerning the following question:

I have two files, File1 and File2. I would like to compare lines between the two files and print the matches. However, I cannot find a way to match a specific string. I tried the code below, but it prints an unexpected result.

I would like to print File2 data (e.g. b05ldt10ud0e0) when it matches one of the names in File1 (e.g. ldt). For File1 entries containing an asterisk (e.g. b05can03*n0b5), the * can stand for any character. For a match, the File2 data must contain both the text before the * and the text after it. Thus, this entry will print b05can03un0b5. And printing must be follow the sequence of File1. Could anyone give me advice on this?

File1

--------------------------

ldt

b05dcc00

mny

b05can03*n0b5

b05mdd04*n9c9

File2

---------------------------

/* To start: b05afn10ud0b0 */

/* To start: b05dcc00ud0c0 */

/* To start: b05ldt10ud0e0 */

/* To start: b05dcc10ud0i0 */

/* To start: b05afn10ud0m0 */

/* To start: b05afn10ud0s0 */

/* To start: b05mny00ud0b5 */

/* To start: b05mny00ud0d3 */

/* To start: b05mdd04un9c9 */

/* To start: b05ahn00ud0j5 */

/* To start: b05mny00ud0m7 */

/* To start: b05can03un0b0 */

/* To start: b05can03un0b5 */

Coding:

-----------------------

my (@arr1,@arr2);
@arr1=<File1>;
@arr2=<File2>;
foreach my $line1 (@arr1) {
    foreach my $line2 (@arr2){
        if ($line1 =~ $line2 && $line1 !~ m/^\w+(\W)\w+(.*)/) {
            print "$line2\n";
        }
    }
}

expected result:

b05ldt10ud0e0

b05dcc00ud0c0

b05mny00ud0b5

b05mny00ud0d3

b05mny00ud0m7

b05can03un0b5

b05mdd04un9c9

Replies are listed 'Best First'.
Re: How to match specific character in a string?
by Eily (Monsignor) on Sep 10, 2013 at 16:08 UTC

    It might not be the fastest way, but the approach I find most logical is to first go through the first file and store each search pattern as a regex in an array (because order matters), and then go through the second file and find which lines match. To be able to use the first file to order the result, I "associate" each match with the corresponding search regex. The associative structure in Perl is the hash.

    use Data::Dumper;

    my @file1 = split "\n", <<_FILE1_; # I can't have two __DATA__ sections in the same file so here goes the first one
    ldt
    b05dcc00
    mny
    b05can03*n0b5
    b05mdd04*n9c9
    _FILE1_

    my @regexen; # Ordered list of search patterns
    for (@file1)
    {
        next unless m<\S>;  # skip the blank lines
        s<\*><\\w*>g;       # the * becomes \w* which means "any number of alphanum chars or _"
        push @regexen, $_;  # we push the pattern at the end of the list
    }

    my %result;
    while (<DATA>) # for each line of file 2
    {
        for $search (@regexen) # for each search pattern
        {
            push @{ $result{$search} }, $1 if /\b(\w*$search\w*)\b/; # we push the line at the end of the matches of $search if it matches
        }
    }

    # let's print the result !
    for $key (@regexen) # for each search pattern, in the right order
    {
        for $line (@{ $result{$key} }) # for each line this search pattern matched
        {
            print $line, "\n"; # we print it
        }
    }
    __DATA__
    /* To start: b05afn10ud0b0 */
    /* To start: b05dcc00ud0c0 */
    /* To start: b05ldt10ud0e0 */
    /* To start: b05dcc10ud0i0 */
    /* To start: b05afn10ud0m0 */
    /* To start: b05afn10ud0s0 */
    /* To start: b05mny00ud0b5 */
    /* To start: b05mny00ud0d3 */
    /* To start: b05mdd04un9c9 */
    /* To start: b05ahn00ud0j5 */
    /* To start: b05mny00ud0m7 */
    /* To start: b05can03un0b0 */
    /* To start: b05can03un0b5 */

    b05ldt10ud0e0
    b05dcc00ud0c0
    b05mny00ud0b5
    b05mny00ud0d3
    b05mny00ud0m7
    b05can03un0b5
    b05mdd04un9c9

    You should use Data::Dumper; and print Dumper \%result; to see what it's made of if you have trouble understanding what I did :)
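To make the shape of that structure concrete, here is a minimal sketch on a cut-down input: two search patterns and three lines taken from the question. The variable names mirror the reply above; setting `$Data::Dumper::Sortkeys` is an addition here, only to keep the dump order stable between runs.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump

# Cut-down sample: two search patterns and three lines from File2.
my @regexen = ('ldt', 'b05can03\w*n0b5');
my @lines   = (
    '/* To start: b05can03un0b5 */',
    '/* To start: b05ldt10ud0e0 */',
    '/* To start: b05afn10ud0b0 */',
);

# Hash of arrays: each pattern maps to the list of names it matched.
my %result;
for my $line (@lines) {
    for my $search (@regexen) {
        push @{ $result{$search} }, $1 if $line =~ /\b(\w*$search\w*)\b/;
    }
}

# Show the raw structure: one key per pattern that matched something.
print Dumper \%result;

# Walking the patterns in array order restores the File1 sequence,
# even though the lines were seen in File2 order.
my @ordered = map { @{ $result{$_} || [] } } @regexen;
print "$_\n" for @ordered;
```

The hash on its own has no useful order, which is why the patterns are kept in the `@regexen` array as well: the array supplies the sequence and the hash supplies the matches.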

Re: How to match specific character in a string?
by kcott (Archbishop) on Sep 11, 2013 at 03:46 UTC

    G'day WWq,

    Here's my take on a solution:

    $ perl -Mstrict -Mwarnings -le '
        my @arr1 = qw{ldt b05dcc00 mny b05can03*n0b5 b05mdd04*n9c9};
        my @arr2 = map { /: (\w+)/; $1 } (
            "/* To start: b05afn10ud0b0 */ ",
            "/* To start: b05dcc00ud0c0 */ ",
            "/* To start: b05ldt10ud0e0 */ ",
            "/* To start: b05dcc10ud0i0 */ ",
            "/* To start: b05afn10ud0m0 */ ",
            "/* To start: b05afn10ud0s0 */ ",
            "/* To start: b05mny00ud0b5 */ ",
            "/* To start: b05mny00ud0d3 */ ",
            "/* To start: b05mdd04un9c9 */ ",
            "/* To start: b05ahn00ud0j5 */ ",
            "/* To start: b05mny00ud0m7 */ ",
            "/* To start: b05can03un0b0 */ ",
            "/* To start: b05can03un0b5 */",
        );

        for my $re (map { s/[*]/.*?/; $_ } @arr1) {
            /$re/ && print for @arr2;
        }
    '
    b05ldt10ud0e0
    b05dcc00ud0c0
    b05mny00ud0b5
    b05mny00ud0d3
    b05mny00ud0m7
    b05can03un0b5
    b05mdd04un9c9

    I'll walk you through that, comparing my code with yours:

    1. @arr1: No change to your code.
    2. @arr2: Because of the requirement, "printing must be follow the sequence of File1" [sic], you'll need to run through this array multiple times. To avoid dealing with all that extra text ("/* To start:" and " */") on every iteration, I've filtered that out initially with a simple map. That's a minor change to your code.
    3. Outer loop (@arr1): I've added another simple map so we're now dealing with regular expression patterns. Again, a minor change to your code.
    4. Inner loop (@arr2): We've done most of the work by now so just print when there's a match. There didn't seem to be enough code left to warrant more than a single line. That's the biggest change to your code.
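One point the pattern-building step in item 3 leaves open is that any other regex metacharacter in a File1 entry (a dot, a bracket) would be treated as part of the pattern. Below is a minimal sketch of a stricter conversion, assuming entries contain only literal text plus `*` wildcards. The helper name `wildcard_to_regex` is hypothetical and not from this thread, and `\w*` is used for the wildcard as an assumption about what "any character" should cover.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a File1 entry into a compiled regex.
# Literal pieces are escaped with quotemeta; each * becomes \w*.
sub wildcard_to_regex {
    my ($entry) = @_;
    my $body = join '\w*', map { quotemeta($_) } split /[*]/, $entry, -1;
    return qr/$body/;
}

# A few names as they look after the "/* To start: ... */" text is stripped.
my @names = qw(b05can03un0b0 b05can03un0b5 b05mdd04un9c9 b05ldt10ud0e0);

# Outer loop over File1 entries keeps the output in File1 order.
my @matched;
for my $entry (qw(ldt b05can03*n0b5 b05mdd04*n9c9)) {
    my $re = wildcard_to_regex($entry);
    push @matched, grep { /$re/ } @names;
}

print "$_\n" for @matched;
```

The match is deliberately left unanchored, since plain entries such as ldt are meant to match anywhere inside a name.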

    -- Ken

Re: How to match specific character in a string?
by hankcoder (Scribe) on Sep 11, 2013 at 12:01 UTC

    I took another approach. I wouldn't put everything into a single regex line: that makes the code hard to debug later and limits how far it can be extended if needed.

    print "Content-Type: text/html; charset=utf-8\n\n";

    my (@arr1) = ( "ldt", "b05dcc00", "mny", "b05can03*n0b5", "b05mdd04*n9c9" );
    my (@arr2) = (
        "/* To start: b05afn10ud0b0 */",
        "/* To start: b05dcc00ud0c0 */",
        "/* To start: b05ldt10ud0e0 */",
        "/* To start: b05dcc10ud0i0 */",
        "/* To start: b05afn10ud0m0 */",
        "/* To start: b05afn10ud0s0 */",
        "/* To start: b05mny00ud0b5 */",
        "/* To start: b05mny00ud0d3 */",
        "/* To start: b05mdd04un9c9 */",
        "/* To start: b05ahn00ud0j5 */",
        "/* To start: b05mny00ud0m7 */",
        "/* To start: b05can03un0b0 */",
        "/* To start: b05can03un0b5 */ "
    );

    foreach my $line1 (@arr1)
    {
        chomp($line1);        # clear \n if data is from file
        $line1 =~ s/^\s+//;   # remove leading whitespace
        $line1 =~ s/\s+$//;   # remove trailing whitespace

        if ($line1 ne "")     # proceed only if something in $line1
        {
            print "Search [$line1]....<br>";

            foreach my $line2 (@arr2)
            {
                chomp($line2);        # clear \n if data is from file
                $line2 =~ s/^\s+//;   # remove leading whitespace
                $line2 =~ s/\s+$//;   # remove trailing whitespace

                my ($data) = ($line2 =~ m/\/\*(.*)\*\//si);    # extract anything within /* */
                my ($name, $value) = split( /\:/, $data );     # split on separator char :
                $value =~ s/^\s+//;   # remove leading whitespace
                $value =~ s/\s+$//;   # remove trailing whitespace

                if ($line1 =~ /\*/)   # check if $line1 contains the special char
                {
                    my ($part1, $part2) = split( /\*/, $line1 );
                    if ($value =~ m/$part1(.*)$part2/i)   # /i makes it case-insensitive
                    {
                        print "match $line2<br>";
                    }
                }
                else
                {
                    if ($value =~ m/$line1/i)
                    {
                        print "match $line2<br>";
                    }
                }
            } # // foreach $line2

            print "<br>"; # print empty line
        }
    } # // foreach $line1

    Output result

    Search [ldt]....
    match /* To start: b05ldt10ud0e0 */

    Search [b05dcc00]....
    match /* To start: b05dcc00ud0c0 */

    Search [mny]....
    match /* To start: b05mny00ud0b5 */
    match /* To start: b05mny00ud0d3 */
    match /* To start: b05mny00ud0m7 */

    Search [b05can03*n0b5]....
    match /* To start: b05can03un0b5 */

    Search [b05mdd04*n9c9]....
    match /* To start: b05mdd04un9c9 */
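The replies in this thread inline their data, while the comments in the code above anticipate input read from files. Below is a minimal sketch of the same head-and-tail check driven by filehandles and printing plain text. It assumes one entry per line in File1 and the `/* To start: name */` layout in File2. In-memory filehandles stand in for the real files so the example is self-contained, and the variable names are illustrative.

```perl
use strict;
use warnings;

# In-memory stand-ins for the two files (assumption: in real use,
# replace the scalar references with paths such as 'File1' and 'File2').
my $file1_text = <<'END_FILE1';
ldt

b05dcc00
b05can03*n0b5
END_FILE1

my $file2_text = <<'END_FILE2';
/* To start: b05dcc00ud0c0 */
/* To start: b05ldt10ud0e0 */
/* To start: b05can03un0b0 */
/* To start: b05can03un0b5 */
END_FILE2

open my $fh1, '<', \$file1_text or die "Cannot open File1: $!";
open my $fh2, '<', \$file2_text or die "Cannot open File2: $!";

# Read File2 once, keeping only the name after the colon.
my @values = map { /:\s*(\w+)/ ? $1 : () } <$fh2>;

my @found;
while (my $line1 = <$fh1>) {
    chomp $line1;
    $line1 =~ s/^\s+|\s+$//g;    # trim both ends
    next if $line1 eq '';        # skip blank lines

    # Split on the wildcard; an entry without * has an empty tail.
    my ($head, $tail) = split /\*/, $line1, 2;
    $tail = '' unless defined $tail;

    # \Q...\E keeps the head and tail literal inside the regex.
    push @found, grep { /\Q$head\E.*\Q$tail\E/i } @values;
}

print "$_\n" for @found;
```

Reading File2 into an array first means it is parsed only once, while the outer loop over File1 keeps the output in File1 order.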