in reply to Parsing a tab delimited file

Hi, if you show us your current code, we'd probably be able to help.
cheers
davis
Is this going out live?
No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist

Re: Re: Parsing a tab delimited file
by snowy (Sexton) on May 09, 2002 at 11:33 UTC

    Here is the code I have for opening the file and parsing the tabs.
    @molecules and @locus are text files that have been loaded into arrays.

    my @locus_small = ();
    my $line;
    foreach $line (@locus) {
        my @tokens = split(/\t+/, $line);
        unless (scalar @tokens < 6) {
            push(@locus_small, "$tokens[0]\t$tokens[1]\t");
        }
    }

    Then I search and print @found, which is built from @locus_small, but what I want is the @locus lines that matched.

    So what I really want is to search only a few columns for a match, but print the whole row when there is one.

    foreach my $molecule (@molecules) {
        my @found = grep /\Q$molecule\E/i, @locus_small;
        if (@found) {
            print OUTDATA ($molecule, ": \n\t", join "\t", @found);
        }
    }
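A minimal sketch of the change snowy describes, assuming @locus holds whole tab-delimited lines and that only the first two columns should be searched: grep over @locus itself and build the two-column key on the fly inside the grep block, so @found ends up holding full rows instead of the trimmed @locus_small entries. The helper name find_rows and the sample rows are hypothetical, for illustration only; the case-insensitive substring match is kept from the original grep.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full rows of @$rows whose first two
# tab-separated columns contain $molecule (case-insensitive substring,
# as in the original grep). Rows with fewer than six fields are skipped.
sub find_rows {
    my ($molecule, $rows) = @_;
    return grep {
        my @tokens = split /\t+/, $_;
        @tokens >= 6
            && "$tokens[0]\t$tokens[1]" =~ /\Q$molecule\E/i;
    } @$rows;
}

# Made-up sample data.
my @locus = (
    "abcd\tghi\t1\t2\t3\t4",
    "xyz\tabc\t5\t6",              # fewer than six fields: skipped
    "efg\thij\t7\t8\t9\tabc",      # abc only in column 6: not searched
);

for my $molecule (qw(abc)) {
    my @found = find_rows($molecule, \@locus);
    print "$molecule:\n\t", join("\n\t", @found), "\n" if @found;
}
```

Because the match is a plain substring test, "abc" also matches the "abcd" row; anchoring with \b (as in the reply below) would make it whole-word only.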

      you're doing a lot of extra work in your code. my example below reads your set of loci into an array, creates a set of molecules, and prints the full locus record if a molecule is found in the first two tokens. records with fewer than six fields are skipped.

      i skip the interim array (@locus_small in your code) and use nested for loops instead of grep, because i think that makes more sense. the really tricky bit is ( @{[]} = split /\s/ ) < 6, but my comments should help everyone understand what i'm doing.
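The field-count trick can be checked on its own. This small sketch compares three ways of counting fields on a made-up line: the empty-list assignment, which fails because assigning split's result to a list gives split an implicit LIMIT of one more than the number of variables on the left (so `() = split` always yields 1); the anonymous-array assignment used in the reply; and an assignment to a named array. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "abcd ghi 1 2 3 4";

# Empty-list assignment: split gets an implicit LIMIT of 1 (one more than
# the zero variables on the left), so it returns a single field.
my $broken = () = split /\s/, $line;

# Anonymous-array assignment: no implicit LIMIT, and a list assignment in
# scalar context yields the number of elements on its right-hand side.
my $anon = ( @{[]} = split /\s/, $line );

# Named array: same idea, and the fields remain available afterwards.
my $named = my @fields = split /\s/, $line;

print "broken=$broken anon=$anon named=$named\n";    # broken=1 anon=6 named=6
```

The named-array form is arguably the easiest to read, and it keeps the fields around in case you want them for the first-two-fields test.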

      the main loop is effectively six lines of code, which should be all you need. oh, and yes, in my example i split on /\s/ (any single whitespace character, tab included) rather than on tabs specifically, because i'm too lazy to change my editor's settings to emit tabs instead of spaces ;-) enjoy!
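Since /\s/ also matches a tab, the split above works on genuinely tab-delimited lines, but it will also break a field that contains a space, and so will the /(.+?\s.+?)\s/ regex. A sketch for strictly tab-delimited records, assuming fields may contain spaces: split on /\t/ and take the first two fields by index. The helper first_two_fields and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first two fields of a tab-delimited
# record joined by a tab, or nothing if it has fewer than $min fields.
sub first_two_fields {
    my ($record, $min) = @_;
    # A LIMIT of -1 keeps trailing empty fields, so empty last columns
    # still count toward $min.
    my @fields = split /\t/, $record, -1;
    return if @fields < $min;
    return join "\t", @fields[0, 1];
}

# Made-up sample records; note the space inside the second field.
my @locii = (
    "abcd\tghi jkl\t1\t2\t3\t4",
    "xyz\tdef\t5",    # only three fields, so it is skipped
);

for my $molecule (qw(abcd def)) {
    for my $record (@locii) {
        my $key = first_two_fields($record, 6);
        next unless defined $key;
        # whole-word match against the first two fields only
        print "$molecule: $record\n" if $key =~ /\b\Q$molecule\E\b/;
    }
}
```

Splitting on /\t/ rather than /\t+/ (as in the original question) also keeps an empty column in its position, instead of letting consecutive tabs collapse and shift the later columns left.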

      #!/usr/bin/perl -w
      use strict;
      $|++;

      # create phony filehandle named OUTDATA (use STDOUT for debugging)
      *OUTDATA = *STDOUT;

      # create phony molecule list
      my @phony_molecule_list = qw( abc def abcd abc );

      # create hash of molecules, to avoid duplicates
      my %molecules;

      # populate hash of molecules (one key per distinct molecule)
      $molecules{$_}++ for @phony_molecule_list;

      # create list of loci, from DATA filehandle
      chomp( my @locii = <DATA> );

      # for each molecule, sorted by longest word first
      for my $molecule (sort { length $b <=> length $a } keys %molecules) {
        LOCUS: for (@locii) {
          # skip if fewer than six tokens.
          # i need to fake out split to get the number of fields. usually
          # you can force list context with () = ..., but this doesn't
          # work with split: assigning to a list gives split an implicit
          # LIMIT of one more than the number of variables, so () = split
          # always yields 1. so, i force split to return its output
          # to an anonymous array (list context), then evaluate the
          # assignment in scalar context to get the number of elements.
          next LOCUS if( ( @{[]} = split /\s/ ) < 6 );

          # get first two fields (assumes at least three fields exist)
          my($test4match) = ( /(.+?\s.+?)\s/ );

          # match a molecule (whole words only), and print the line
          if( $test4match =~ /\b\Q$molecule\E\b/ ) {
            print OUTDATA $_,$/;
          }
        }
      }

      __DATA__
      abcd ghi 1 2 3 4 xyx yxy a b c abc xyx z y x w efg def 5 6 7 8 abc c o deg abc 9 0 1 abd abc x x

      ~Particle *accelerates*