natoikos has asked for the wisdom of the Perl Monks concerning the following question:

I have two tab-delimited files. I want to match 204508_a_st in one file to 204508_A_ST in the other.
open INDAT, "file1" or die "$!";
my @array;
my @probes;
while (<INDAT>){
    @array=split/\t/;
    open INCOMP, "file2" or die "$!";
    while (<INCOMP>){
        @probes=split/\t/;
        if ($array[0]=~/$probes[1]/i){
            print $probes[1];#or whatever else
        }
    }
    close INCOMP;
}
If I use the array entry $probes[1] it doesn't work, but if I assign the exact same value, 204508_A_ST, to a variable and use that instead, it does. Is there something I am missing about the $probes[1] array entry?

Replies are listed 'Best First'.
Re: beginner regex question
by vennirajan (Friar) on Dec 30, 2005 at 05:00 UTC
    Hi,

    You can also open the second file handle outside the first while loop. As written, the code opens and closes the file on every iteration, which adds unnecessary system load. To avoid that, open the file once outside the loop and move the file pointer back to the start after each pass.


    Try this,
    open INDAT, "file1" or die "$!";
    my @array;
    my @probes;
    open INCOMP, "file2" or die "$!";
    while (<INDAT>){
        @array=split/\t/;
        while (<INCOMP>){
            chomp ( $_ );
            @probes=split/\t/;
            if ($array[0]=~/$probes[1]/i){
                print $probes[1];#or whatever else
            }
        }
        seek INCOMP, 0 , 0;
    }
    close INCOMP;
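
    To see why the seek matters, here is a minimal sketch. It assumes an in-memory filehandle (Perl 5.8 or later) as a hypothetical stand-in for file2, and the sample rows are made up. Once a handle has been read to the end, a second read returns nothing until it is rewound.

```perl
use strict;
use warnings;

# Hypothetical stand-in for file2: an in-memory filehandle (Perl 5.8+).
my $data = "x\t204508_A_ST\ny\t204509_A_ST\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @first  = <$fh>;    # reads both lines and leaves the handle at EOF
my @second = <$fh>;    # nothing left to read, so this list is empty

seek $fh, 0, 0 or die "Cannot seek: $!";    # rewind to the start
my @third  = <$fh>;    # both lines are available again

printf "first=%d second=%d third=%d\n",
    scalar @first, scalar @second, scalar @third;
close $fh;
```

    The same thing happens with the inner while loop: without the rewind it only finds lines to read on the first pass through file1.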

    Regards,
    S.Venni Rajan.
    "A Flair For Excellence."
                    -- BK Systems.
Re: beginner regex question
by Ovid (Cardinal) on Dec 30, 2005 at 04:52 UTC

    Why do you keep reopening the second file? That's probably not doing what you want. (Update: yes, that is what you want. Silly me. I should have put in the seek that vennirajan listed.)

    Try the following debugging code:

    print "'$probes[1]'";
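
    As a rough sketch of what that debugging print can reveal, assume the ID is the last field on the line (the sample row below is made up). Without chomp, the field still carries the trailing newline, so the pattern asks for a newline that the other string does not have. A value typed into a variable by hand has no newline, which would explain why that version works.

```perl
use strict;
use warnings;

# Hypothetical line from file2; the ID is the last field, so it keeps the "\n".
my $line   = "x\t204508_A_ST\n";
my $target = "204508_a_st";

my @probes = split /\t/, $line;
print "'$probes[1]'\n";          # the closing quote lands on the next line
my $before = $target =~ /$probes[1]/i ? "match" : "no match";

chomp $line;                     # strip the trailing newline first
@probes = split /\t/, $line;
print "'$probes[1]'\n";          # now both quotes are on one line
my $after = $target =~ /$probes[1]/i ? "match" : "no match";

print "without chomp: $before\nwith chomp: $after\n";
```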

    Also, let's clean this up a bit (making the possibly false assumption that you want a tighter scope for the arrays since you overwrite them).

    my ($file1, $file2) = qw(file1 file2);
    open INDAT, "<", $file1 or die "Cannot open ($file1) for reading: $!";
    open INCOMP, "<", $file2 or die "Cannot open ($file2) for reading: $!";

    while (defined (my $line = <INDAT>)) {
        chomp $line; # remove newline
        my @array = split /\t/, $line;
        while (defined (my $line = <INCOMP>)) {
            chomp $line;
            my @probes = split /\t/, $line;

            # using \Q and \E to avoid having regex meta character problems
            # see "perldoc perlre"
            if ($array[0] =~ /\Q$probes[1]\E/i) {
                print "Found $probes[1]\n";
            }
        }
        # rewind file2 for the next line of file1 (the seek from the update above)
        seek INCOMP, 0, 0;
    }
    close INCOMP;
    close INDAT;

    That should get you closer to what you're looking for. Note that we only open each file once. We also remove the newline from each line to ensure that you're not accidentally trying to match that. Also, read perlre to see how \Q and \E work (the \E is actually not necessary in the above example).
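
    A small sketch of what \Q changes, using a made-up probe ID that contains a dot (the ID and the test strings are hypothetical, not from the original files). Unquoted, the dot matches any character; quoted, it only matches a literal dot.

```perl
use strict;
use warnings;

# Hypothetical probe ID containing a regex metacharacter (the dot).
my $probe = "AFFX.1_at";

# Unquoted, the dot matches any character, so a different ID also matches.
my $loose  = "AFFXx1_at" =~ /$probe/i     ? "match" : "no match";
# With \Q the dot is literal, so the different ID no longer matches.
my $strict = "AFFXx1_at" =~ /\Q$probe\E/i ? "match" : "no match";
# The real ID still matches, and /i keeps the comparison case-insensitive.
my $exact  = "affx.1_AT" =~ /\Q$probe\E/i ? "match" : "no match";

print "unquoted: $loose\nquoted: $strict\nquoted, real ID: $exact\n";
```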

    If you're using an older Perl, my three-argument open syntax won't work and you'll have to do this:

    open FH, "<$file1" ...
    # or
    open FH, $file1 ...

    Cheers,
    Ovid


Re: beginner regex question
by Samy_rio (Vicar) on Dec 30, 2005 at 04:41 UTC

    Hi natoikos, Use chomp after reading a line from the files.

    Try this,

    open INDAT, "file1" or die "$!";
    my @array;
    my @probes;
    while (<INDAT>){
        chomp($_);
        @array=split/\t/;
        open INCOMP, "file2" or die "$!";
        while (<INCOMP>){
            chomp($_);
            @probes=split/\t/;
            if ($array[0]=~/$probes[1]/i){
                print $probes[1];#or whatever else
            }
        }
        close INCOMP;
    }

    Regards,
    Velusamy R.


    eval"print uc\"\\c$_\""for split'','j)@,/6%@0%2,`e@3!-9v2)/@|6%,53!-9@2~j';

Re: beginner regex question
by hj4jc (Beadle) on Dec 30, 2005 at 21:32 UTC
    Hi natoikos,

    I don't know if this would be helpful for you or not,
    but after playing with your code and some example files,
    I wonder if you are searching for multiple matches of $array[0] in your $probes[1] list in file2,
    or after finding the first match of $array[0], you want to move on to the next $array[0] and continue searching.
    But I think the while loop quits running after finding the first match.
    Perhaps you want to use a different loop structure.
    Saints, please correct me if I am wrong.
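
    The two behaviours can be compared with a short sketch. It uses in-memory lists as hypothetical stand-ins for the two files, and the data is made up. The code as posted has no "last", so the inner loop tests every line of file2 for each ID; adding "last" is one way to stop at the first match and move on to the next ID.

```perl
use strict;
use warnings;

# Hypothetical data: IDs from file1 and column 2 of file2.
my @ids    = ("204508_a_st");
my @probes = ("204508_A_ST", "OTHER", "204508_A_ST");

my ($all, $first) = (0, 0);
for my $id (@ids) {
    # No "last": every entry from file2 is tested, so both matches count.
    for my $probe (@probes) {
        $all++ if $id =~ /\Q$probe\E/i;
    }
    # With "last": stop scanning file2 at the first match.
    for my $probe (@probes) {
        if ($id =~ /\Q$probe\E/i) {
            $first++;
            last;
        }
    }
}
print "all matches: $all, first only: $first\n";
```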