Hi there,

This is the first time something like this has happened to me.

I have a flat file, and the lines get all messed up when I print them on screen; I can't even parse them properly! This is very frustrating. Sadly, I cannot attach the file here. When I open it in gedit, everything looks fine. Here is a snippet:

missense,0.40851449275362317,1.0,-100
2.853,2.853,5.706,2.853,2.853,8.559,8.559,...
missense,0.40851449275362317,1.0,0
2.827,2.827,5.655,2.827,2.827,5.655,8.482,...
frameshift,0.056074766355140186,0.7697841726618705,-64
5.290,1.763,0.000,8.817,1.763,3.527,1.763,...
missense,0.44542772861356933,1.0,0

Basically it alternates two kinds of lines: a title line and a data line.

Now, say I want to print the first line:

open FILE, "<", "weird" or die $!;
my @data = <FILE>;
for (my $line=0; $line<1; $line++) {
    print $data[$line];
}

It then prints ALL the odd lines, one after another, ignoring all the lines starting with >. I need to parse this file. When I split the lines with split, it treats the last element of an even line as the whole following odd line, and it ignores the actual last element. For example, if I use:

my @tmp = split(',', $data[0]);
print $tmp[$#tmp]."\n".$tmp[$#tmp-1];

The actual output is:

missense,0.40851449275362317,1.0,
2.853,2.853,5.706,2.853,2.853,8.559,8.559,...
0

Note that there is a \n in the middle of the first line, and a -100 value missing.

What's happening? This file was made by a bot interacting with a server, so I guess it may be related to the file encoding, but I have no idea how to fix it.
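One way to test that guess might be to make any hidden control characters in the first record visible. The following is only a sketch: it assumes the file is still named "weird" as in the example above, and show_hidden is a made-up helper name, not something from an existing module. It reads the file with the :raw layer so that no bytes are translated on the way in.

```perl
use strict;
use warnings;

# show_hidden is a hypothetical helper: it returns a copy of the string
# in which every character outside printable ASCII is written as \x{..}.
sub show_hidden {
    my ($text) = @_;
    $text =~ s/([^\x20-\x7e])/sprintf('\\x{%02x}', ord($1))/ge;
    return $text;
}

# "weird" is the file name used in the example above (an assumption here).
# If the file cannot be opened, the sketch silently does nothing.
my $path = 'weird';
if (open(my $fh, '<:raw', $path)) {
    my $first = <$fh>;    # first record as Perl sees it, bytes untouched
    close $fh;
    print show_hidden($first), "\n" if defined $first;
}
```

For instance, show_hidden("1.0,-100\r\n") returns the string 1.0,-100\x{0d}\x{0a}. If something like \x{0d} (a carriage return) turned up in the middle of what Perl reads as one line, that would point at the line endings rather than the character encoding, but that is a guess until the bytes are actually inspected.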


In reply to Very weird things when printing (may be an encoding issue?) by dottornomade
