
So in addition to ignoring a tab after a comma, you also want to ignore tabs within quoted strings. This revises the original spec just a bit.

Do you really require both of these things? Or is the real requirement only the latter (ignore tabs within strings), and the original example just happened to have been derived from a string with a comma-tab?

Also, I get the output below, which differs from yours: there is no tab after 'ta'. (You also have a comma-tab, just before 'rug', that you do want to split on.)

474627 asidase ta sidase ala,"lpha-D- ctoside gtohydrolase","razyme","arazyme (enz Corp)","Melie","lagal","idase bta", rug 00103

Re^6: The best way to split tab delimited file
by Ratna_Ranjan (Novice) on Nov 23, 2009 at 19:12 UTC

    I want to ignore tabs within quoted strings only. This data just happened to have a comma before the next tab-separated column.

    Note: there is a tab after "asidase ta", so the output I got is correct.

      If you don't mind losing the tab within the quotes, pre-process the string to remove those tabs. Here I replaced the embedded tabs with spaces, then just split on tab:

      my $var = '474627 asidase ta sidase ala,"lpha-D- ctoside gtohydrolase","razyme","arazyme (enz Corp)","Melie","lagal","idase bta", rug 00103';
      my $tmp;
      $var =~ s{ ("[^"]+") }{ ($tmp = $1) =~ s/\t/ /g; $tmp }xge;
      my @each = split(/\t/, $var);
      for my $eachvar (@each) {
          print "$eachvar\n";
      }

      Update 1: Oops, I made a mistake in the pattern. The quotes belong inside the capture. (Was: "([^"]+)"; now: ("[^"]+").)
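To illustrate what Update 1 presumably fixes, here is a minimal sketch comparing the two patterns. The input string and the variable names ($input, $quotes_lost, $quotes_kept) are made up for this illustration and are not from the thread. With the quotes outside the capture, they are matched but never put back by the replacement, so they vanish from the result.

```perl
use strict;
use warnings;

# Hypothetical input: a tab sits inside the quoted string.
my $input = qq{a\t"b\tc"\td};

# Earlier pattern: the quotes are outside the capture, so the
# replacement (built from $1 only) silently drops them.
(my $quotes_lost = $input) =~ s{"([^"]+)"}{ (my $t = $1) =~ s/\t/ /g; $t }ge;

# Corrected pattern: the quotes are inside the capture and survive.
(my $quotes_kept = $input) =~ s{("[^"]+")}{ (my $t = $1) =~ s/\t/ /g; $t }ge;

print "$quotes_lost\n";
print "$quotes_kept\n";
```

In both cases the embedded tab becomes a space; only the corrected pattern leaves the quoted field looking like a quoted field.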

      Update 2: In response to a private message, here's a little better explanation of the pattern:

      # Using the s{}{} form of substitution.
      # Substitution supports several different delimiters, which helps
      # one avoid having to escape things (like '/') within the pattern.
      # The 'x' option means ignore whitespace, so that comments can be easily inserted.
      # The 'g' option is global, obviously.
      # The 'e' option says that the replacement part is a Perl expression.
      $var =~ s{
          ("[^"]+")    # Matches two quotes and the content between them.
                       # Capture the match for use in the replacement.
                       #
                       # Dissection of pattern:
                       #   ("[^"]+") = full pattern
                       #   (       ) = capture everything between parentheses.
                       #    "     "  = quotes at start and end of pattern.
                       #     [^"]+   = one or more non-quote characters
      }
      {
          # The replacement part is a Perl expression.
          # Original: ($tmp = $1) =~ s/\t/ /g;
          # is the same as the next 2 lines:
          $tmp = $1;          # Make a copy of the captured match.
          $tmp =~ s/\t/ /g;   # Replace tabs with spaces throughout the match.
          $tmp;               # Use resultant value for replacement.
      }xge;    # x = ignore white space and comments
               # g = global
               # e = expression
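If you do mind losing the tabs inside the quotes, one alternative is to skip the pre-processing and pull out the fields directly. The sketch below is not from the thread: split_tabs_outside_quotes is a hypothetical helper name, and the sample line uses made-up tab positions, since the real ones are not recoverable from the post. It treats a complete "..." string as a single unit, so any tabs inside it stay in the field.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a line on tabs,
# but leave tabs inside double-quoted strings alone.
sub split_tabs_outside_quotes {
    my ($line) = @_;
    my @fields;
    # A field is a run of complete quoted strings and single non-tab
    # characters. A stray, unbalanced quote falls through to the
    # second alternative and is treated as an ordinary character.
    while ($line =~ / \G ( (?: "[^"]*" | [^\t] )* ) (?: \t | \z ) /xgc) {
        push @fields, $1;
        last if pos($line) >= length $line;   # consumed the whole line
    }
    return @fields;
}

# Sample line with assumed tab positions (written as \t).
my $var = qq{474627\tasidase ta\tala,"lpha-D-\tctoside","razyme",\trug\t00103};
print "[$_]\n" for split_tabs_outside_quotes($var);
```

Like a plain split(/\t/, ...), this drops a single trailing empty field. Whether to keep or replace the embedded tabs afterwards is then a separate decision for each field.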
        Thanks a lot