
I want to ignore tabs only within the quoted strings. This data just happens to have a comma before the next tab-delimited column.

Note: there is a tab after "asidase ta", so the output I got is correct.

Re^7: The best way to split tab delimited file
by gmargo (Hermit) on Nov 23, 2009 at 19:20 UTC

    If you don't mind losing the tab within the quotes, pre-process the string to remove those tabs. Here I replaced the embedded tabs with spaces, then just split on tab:

    my $var='474627 asidase ta sidase ala,"lpha-D- ctoside gtohydrolase","razyme","arazyme (enz Corp)","Melie","lagal","idase bta", rug 00103';
    my $tmp;
    $var =~ s{ ("[^"]+") }{ ($tmp = $1) =~ s/\t/ /g; $tmp }xge;
    my @each=split(/\t/,$var);
    for my $eachvar(@each)
    {
        print "$eachvar\n";
    }
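Literal tabs are hard to see once a string has been posted, so here is a minimal self-contained sketch of the same idea with the tabs written as \t in a double-quoted string. The sample data and the helper name split_tabbed are made up for illustration and do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption): replace tabs inside
# double-quoted sections with spaces, then split on the tabs that remain.
sub split_tabbed {
    my ($line) = @_;
    $line =~ s{ ("[^"]+") }{ (my $tmp = $1) =~ s/\t/ /g; $tmp }xge;
    return split /\t/, $line;
}

# Made-up sample: three columns; the middle one has a tab inside quotes.
my $sample = "474627\tname,\"alpha\tbeta\"\t00103";
my @fields = split_tabbed($sample);
print "$_\n" for @fields;
```

With this sample the quoted tab becomes a space, so the line splits into three fields rather than four.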

    Update 1: Oops, I made a mistake in the pattern. The quotes belong inside the capture; otherwise the replacement drops them from the output. (Was: "([^"]+)". Now: ("[^"]+").)

    Update 2: In response to a private message, here's a little better explanation of the pattern:

    # Using s{}{} form of substitute.
    # Substitute supports using several different separator formats
    # which helps one avoid having to escape things (like '/') within the pattern.
    # The 'x' option means ignore whitespace so that comments can be easily inserted.
    # The 'g' option is global, obviously.
    # The 'e' option says that the replacement part of the pattern is a perl expression.
    $var =~ s{
                ("[^"]+")   # Matches two quotes and content between them.
                            # Capture the match for use in the replacement.
                            #
                            # Dissection of pattern:
                            #   ("[^"]+") = full pattern
                            #   (       ) = capture everything between parentheses.
                            #    "     "  = quotes at start and end of pattern.
                            #     [^"]+   = one or more non-quote characters
             }
             {
                            # The replacement part is a perl expression.
                            # Original: ($tmp = $1) =~ s/\t/ /g;
                            # is same as next 2 lines:
                $tmp = $1;          # Make a copy of the captured match.
                $tmp =~ s/\t/ /g;   # Replace tabs with spaces throughout the match.
                $tmp;               # Use resultant value for replacement.
             }xge;  # x = ignore white space and comments
                    # g = global
                    # e = expression
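If the embedded tabs need to survive the split, one possible variation (not part of the reply above) is to swap them for a placeholder, split, and then swap them back. This sketch assumes the NUL character never occurs in the data, and the helper name split_keep_tabs is hypothetical. It relies on the /r flag of tr, which needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption): split on tabs while
# keeping any tabs that sit inside double-quoted sections.
sub split_keep_tabs {
    my ($line) = @_;
    # Hide tabs inside quoted sections behind a NUL placeholder.
    # tr///r returns a modified copy and leaves $1 untouched.
    $line =~ s{ ("[^"]+") }{ $1 =~ tr/\t/\0/r }xge;
    # Split on the tabs that are left, then restore the hidden ones.
    return map { tr/\0/\t/r } split /\t/, $line;
}

# Made-up sample: the quoted tab stays inside the second column.
my @cols = split_keep_tabs("474627\tname,\"alpha\tbeta\"\t00103");
print scalar(@cols), " columns\n";
```

The trade-off against the space-replacement approach is an extra pass over each field, in exchange for leaving the quoted text byte-for-byte intact.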
      Thanks a lot