Re^6: The best way to split tab delimited file

Re^7: The best way to split tab delimited file by gmargo (Hermit) on Nov 23, 2009 at 19:20 UTC
If you don't mind losing the tabs within the quotes, pre-process the string to remove them. Here I replace the embedded tabs with spaces, then just split on tab:

    my $var='474627 asidase ta sidase ala,"lpha-D- ctoside gtohydrolase","razyme","arazyme (enz Corp)","Melie","lagal","idase bta", rug 00103';
    my $tmp;
    $var =~ s{ ("[^"]+") }{ ($tmp = $1) =~ s/\t/ /g; $tmp }xge;
    my @each=split(/\t/,$var);
    for my $eachvar(@each) {
        print "$eachvar\n";
    }

Update 1: Oops, I made a mistake in the pattern. The quotes belong on the inside of the capture. (Was: `"([^"]+)"`, Now: `("[^"]+")`.)

Update 2: In response to a private message, here's a little better explanation of the pattern:

    # Using s{}{} form of substitute.
    # Substitute supports using several different separator formats
    # which helps one avoid having to escape things (like '/') within the pattern.
    # The 'x' option means ignore whitespace so that comments can be easily inserted.
    # The 'g' option is global obviously.
    # The 'e' option says that the replacement part of the pattern is a perl expression.
    $var =~ s{
        ("[^"]+")   # Matches two quotes and content between them.
                    # Capture the match for use in the replacement.
                    #
                    # Dissection of pattern:
                    #   ("[^"]+")  = full pattern
                    #   (       )  = capture everything between parentheses.
                    #    "     "   = quotes at start and end of pattern.
                    #     [^"]+    = one or more non-quote characters
    }
    {
        # The replacement part is a perl expression.
        # Original: ($tmp = $1) =~ s/\t/ /g;
        # is same as next 2 lines:
        $tmp = $1;          # Make a copy of the captured match.
        $tmp =~ s/\t/ /g;   # Replace tabs with spaces throughout the match.
        $tmp;               # Use resultant value for replacement.
    }xge;   # x = ignore white space and comments
            # g = global
            # e = expression
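For anyone who wants to run this end to end, here is a minimal sketch of the same idea with the tabs written as explicit `\t` escapes, so they survive copy and paste. The sample record and the helper name `split_quoted_tsv` are made up for illustration and are not taken from the original data. The `tr/\t/ /r` form is an assumption on my part that you have Perl 5.14 or later; it is only a shorter spelling of the `$tmp` copy used above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): flatten tabs inside
# double-quoted fields to spaces, then split the record on the remaining tabs.
sub split_quoted_tsv {
    my ($line) = @_;
    # tr///r (Perl 5.14+) returns a modified copy and leaves $1 untouched.
    $line =~ s{ ("[^"]+") }{ $1 =~ tr/\t/ /r }xge;
    # A limit of -1 keeps trailing empty fields that split would otherwise drop.
    return split /\t/, $line, -1;
}

# Made-up sample record: the second field has a tab inside the quotes.
my $record = qq{474627\t"alpha\tbeta"\t"gamma"\t00103};
my @fields = split_quoted_tsv($record);
print "$_\n" for @fields;
```

With the sample record this should give four fields, the second being `"alpha beta"` with its quotes still attached. As with the original, embedded tabs are lost and escaped quotes inside a field are not handled, so treat it as a sketch rather than a general parser.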
Re^8: The best way to split tab delimited file by Ratna_Ranjan (Novice) on Nov 24, 2009 at 23:25 UTC
Thanks a lot