apachi15 has asked for the wisdom of the Perl Monks concerning the following question:

HI, I posted a question yesterday alraeady but I think I could not make my problems clear, that's why I am trying it again. Down below are examples which I tried and a few lines from the text I want to edit.The general problem is to edit a space delimited text in that way that it is easy to import the text file (in this case with 10 columns) in a database. For this reason I want to replace all spaces with a tab but since there is not in every column an entry there must be a tab again instead of the missing entry. Anyway I tried your suggestions, but still there are some problems with it. For example:
open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt: + $!"; open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt: + $!"; while (<$fh_in>) { s/\s+/\t/g; print $fh_out "$_\n"; } close $fh_out; close $fh_in;
I also liked this one because short and effective.
foreach my $line(@input) { $line =~ tr/ /\t/s; }
The codes do their job but still there is a problem in the 5th column (after syntenic)since there is not always an entry. This is an example of my text file:
MGI:1918918 381629 0610007C +21Rik 5 syntenic 51374 + C2orf28 2p23.3 B MGI:1918917 71667 0610007L +01Rik 5 syntenic F 55069 + C7orf42 7q11.21 C MGI:1923501 76251 0610007P +08Rik 13 syntenic B3 375748 + C9orf102 9q22.32 M MGI:1915571 58520 0610007P +14Rik 12 syntenic 11161 + C14orf1 14q24.3 B
In the lines with no entry in the 5 th column there should be second TAB. I currently work with an if(...){then....} I hope it is clearer now, Thanks Martin

Replies are listed 'Best First'.
Re: Replacing spaces
by Corion (Patriarch) on Apr 07, 2009 at 11:56 UTC

    Is your data maybe not "space delimited" but "fixed width records"? Then, the usual approach is to use unpack to extract the interesting parts of the data. I'm very fond of BrowserUk's Re: Fixed Position Column Records to programmatically find the columns and the unpack template for me.

      It is space delimited and always with a diiferent amount of spaces.Martin

        Looking at your original data (and not the horribly munged copy'n'paste above), it's clear to me that the data you have is fixed width data. So just use unpack, possibly together with the tool I linked to to produce the column specification.

        Let me also suggest you take a more mechanical approach to looking at your data than "I know the data so I know that there is a value missing". You need to be more exact about why you know that there is a value missing, like because every time there is a value, it appears in the same position.

Re: Replacing spaces
by BrowserUk (Patriarch) on Apr 07, 2009 at 12:24 UTC

    As your fields are fixed width, you should unpack them using a suitable template and then join them to put the tabs in.

    In this case, the suitable template char would be 'A' (with an appropriate count), to truncate the trailing spaces in each of your left-justified fields.

    Something like this should do the trick:

    #! perl -sw use 5.010; use strict; while( <> ) { my @fields = unpack 'A31 A31 A26 A13 A9 A11 A31 A26 A26 A1', $_; say join "\t", @fields; } __END__ c:\test> thisScript.pl yourFile > theNewFile

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      OK, Thanks for the help- I have never worked with that pack or unpack function. I will have a deeper look into it an give it a try. Martin
        I will have a deeper look into it an give it a try.

        If the sample you posted is representative of your data, the script I posted above should do the job for you.

        That said, if you are regularly working with fixed record length files getting familiar with pack/unpack will save you heaps of time in the long run.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Replacing spaces
by Anonymous Monk on Apr 07, 2009 at 11:56 UTC
    How do you decide which is the missing column?
      I work a lot with these tables and know where the columns are, but until now always per hand, now I want to automate these things.
        No, how would your code decide, by what logic ... it was not clear to me your data was fixed width
Re: Replacing spaces
by swampyankee (Parson) on Apr 07, 2009 at 12:57 UTC

    Hmmm...if the missing column is always after "syntenic," you could try something akin to this:

    #/usr/bin/perl use strict; use warnings; while(<DATA>) { chomp; my @rec = split(/\s+/, $_); $rec[4] .= "\t" if(scalar(@rec) == 9); print join("\t", @rec)."\n"; } __DATA__ MGI:1918918 381629 0610007C21Rik 5 syntenic 51374 C2orf28 2p23.3 B MGI:1918917 71667 0610007L01Rik 5 syntenic F 55069 C7orf42 7q11.21 C MGI:1923501 76251 0610007P08Rik 13 syntenic B3 375748 C9orf102 9q22.32 + M MGI:1915571 58520 0610007P14Rik 12 syntenic 11161 C14orf1 14q24.3 B
    which is presuming that "syntenic" is element 4 (counting from 0) and records including something like "syntenic B3" split into 10, vs 9, columns. The if test just adds a tab to "syntenic" when the record splits into 9, vs 10, columns.


    Information about American English usage here and here. Floating point issues? Please read this before posting. — emc