Yes, you have missed something vital. If either column may be blank at random and you are using spaces rather than "\t" tabs as the separator, you have invalid and unparsable data. Unless you have either fixed column widths or some defined separator structure, you are up the proverbial. Consider this:

A B C D E

You are chopping off leading spaces, which will move both C and E into col 1, but there is no way to assign either to a column unless you have a fixed width or, say, a tab separator. If the data is really this:

A\tB \tC D\t E\t

which is what it should be, you are fine. Just split on the "\t".
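
A minimal sketch of that split, assuming the file really is tab-separated with two columns. The sample records and variable names are invented for illustration, and the single-column output is only a guess at the end goal based on the thread title.

```perl
use strict;
use warnings;

# Invented sample records; either column may be blank
my @lines = ("alpha\tbeta", "\tgamma", "delta\t");

my @single;
for my $line (@lines) {
    # A limit of -1 keeps a trailing empty column instead of dropping it
    my ($left, $right) = split /\t/, $line, -1;
    # Collect only the non-blank cells, in reading order
    push @single, grep { length } ($left, $right);
}

print "$_\n" for @single;
```

Because the split is on "\t" alone, a blank left column stays a blank left column, so nothing gets shifted into the wrong position before the blanks are filtered out.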

Did you generate the data yourself? If not, virtually any programmer with half a brain would write column data like:

# first remove tabs from data and sub in 4 spaces
s/\t/    /g for @cols;
my $row = join "\t", @cols;
print SOMEFILE $row, "\n";

This gives you a file you can parse unambiguously, as each and every tab represents a column break. Thus if @cols = ( '', '', 'foo', 'bar', '' ) the resulting record will be "\t\tfoo\tbar\t". A split on "\t" will give back the original col fields regardless of the contents of @cols, provided you pass a limit of -1 (split /\t/, $row, -1), because a plain split drops trailing empty fields. The price you pay is that you can't allow tabs in your data. If you have to have tabs, you would generally substitute in some token (it must be very improbable in the data) on the way in and remove it on the way out.
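
One caveat worth checking for yourself: split drops trailing empty fields unless it is given a negative limit. A small sketch of the round trip, using the @cols example from above (the variable names @default and @all are mine):

```perl
use strict;
use warnings;

my @cols = ('', '', 'foo', 'bar', '');
my $row  = join "\t", @cols;          # "\t\tfoo\tbar\t"

my @default = split /\t/, $row;       # trailing empty field is dropped
my @all     = split /\t/, $row, -1;   # negative limit keeps every field

printf "default: %d fields, with -1: %d fields\n",
    scalar @default, scalar @all;
# prints "default: 4 fields, with -1: 5 fields"
```

Leading empty fields always survive, so it is only a blank last column that needs the -1.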

@cols = ( "foo", "\t", "bar" );
print "original '@cols' ", scalar @cols, "\n";
s/\t/<%tab%>/g for @cols;
$row = join "\t", @cols;
print "row '$row'\n";
@ret_cols = split "\t", $row;
s/<%tab%>/\t/g for @ret_cols;
print "retrieve '@ret_cols' ", scalar @ret_cols, "\n";

__DATA__
original 'foo bar' 3
row 'foo <%tab%> bar'
retrieve 'foo bar' 3

I suspect that you do not realise that the original programmer used "\t" as the col separator. When you use "\s" in a split it will split on tabs, spaces and newlines alike. I would try a straight split on "\t" and drop the s/^\s+// step, which may well produce the results you want.
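
To illustrate the difference, here is a hedged sketch using an invented record whose first column is blank and whose second column contains a space. I am assuming your current code trims leading whitespace and then splits on /\s+/; adjust to taste if it differs.

```perl
use strict;
use warnings;

# Invented record: first column blank, second column contains a space
my $row = "\tfoo bar";

# Trimming leading whitespace throws away the tab that marked the blank column
(my $trimmed = $row) =~ s/^\s+//;
my @by_space = split /\s+/, $trimmed;    # ('foo', 'bar')

# Splitting the untouched record on tabs keeps the blank column in place
my @by_tab = split /\t/, $row, -1;       # ('', 'foo bar')

print "whitespace split: ", join('|', @by_space), "\n";
print "tab split:        ", join('|', @by_tab), "\n";
```

Both results have two fields, but only the tab split puts 'foo bar' in the second column where it belongs.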

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


In reply to Re: Re: Re: making a single column out of a two-column text file by tachyon
in thread making a single column out of a two-column text file by allolex
