Replacing spaces

apachi15 has asked for the wisdom of the Perl Monks concerning the following question:

HI, I posted a question yesterday alraeady but I think I could not make my problems clear, that's why I am trying it again. Down below are examples which I tried and a few lines from the text I want to edit.The general problem is to edit a space delimited text in that way that it is easy to import the text file (in this case with 10 columns) in a database. For this reason I want to replace all spaces with a tab but since there is not in every column an entry there must be a tab again instead of the missing entry. Anyway I tried your suggestions, but still there are some problems with it. For example:

open my $fh_in , '<', 'test1.txt' or die "Can not open file test1.txt:
+ $!";
open my $fh_out, '>', 'test2.txt' or die "Can not open file test2.txt:
+ $!";
while (<$fh_in>) {
    s/\s+/\t/g;    
    print $fh_out "$_\n";
}
close $fh_out;
close $fh_in;
[download]

I also liked this one because short and effective.

foreach my $line(@input)
{
    $line =~ tr/ /\t/s;            
}
[download]

The codes do their job but still there is a problem in the 5th column (after syntenic)since there is not always an entry. This is an example of my text file:

MGI:1918918                    381629                         0610007C
+21Rik             5            syntenic            51374             
+             C2orf28                   2p23.3                    B
MGI:1918917                    71667                          0610007L
+01Rik             5            syntenic F          55069             
+             C7orf42                   7q11.21                   C
MGI:1923501                    76251                          0610007P
+08Rik             13           syntenic B3         375748            
+             C9orf102                  9q22.32                   M
MGI:1915571                    58520                          0610007P
+14Rik             12           syntenic            11161             
+             C14orf1                   14q24.3                   B
[download]

In the lines with no entry in the 5 th column there should be second TAB. I currently work with an if(...){then....} I hope it is clearer now, Thanks Martin

Comment on Replacing spaces Select or Download Code

Replies are listed 'Best First'.
Re: Replacing spaces by Corion (Patriarch) on Apr 07, 2009 at 11:56 UTC
Is your data maybe not "space delimited" but "fixed width records"? Then, the usual approach is to use unpack to extract the interesting parts of the data. I'm very fond of BrowserUk's Re: Fixed Position Column Records to programmatically find the columns and the unpack template for me.	[reply]
Re^2: Replacing spaces by apachi15 (Initiate) on Apr 07, 2009 at 12:13 UTC
It is space delimited and always with a diiferent amount of spaces.Martin	[reply]
Re^3: Replacing spaces by Corion (Patriarch) on Apr 07, 2009 at 12:20 UTC
Looking at your original data (and not the horribly munged copy'n'paste above), it's clear to me that the data you have is fixed width data. So just use unpack, possibly together with the tool I linked to to produce the column specification. Let me also suggest you take a more mechanical approach to looking at your data than "I know the data so I know that there is a value missing". You need to be more exact about why you know that there is a value missing, like because every time there is a value, it appears in the same position.	[reply]
Re: Replacing spaces by BrowserUk (Patriarch) on Apr 07, 2009 at 12:24 UTC
As your fields are fixed width, you should unpack them using a suitable template and then join them to put the tabs in. In this case, the suitable template char would be 'A' (with an appropriate count), to truncate the trailing spaces in each of your left-justified fields. Something like this should do the trick: `#! perl -sw use 5.010; use strict; while( <> ) { my @fields = unpack 'A31 A31 A26 A13 A9 A11 A31 A26 A26 A1', $_; say join "\t", @fields; } __END__ c:\test> thisScript.pl yourFile > theNewFile` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^2: Replacing spaces by apachi15 (Initiate) on Apr 07, 2009 at 12:47 UTC
OK, Thanks for the help- I have never worked with that pack or unpack function. I will have a deeper look into it an give it a try. Martin	[reply]
Re^3: Replacing spaces by BrowserUk (Patriarch) on Apr 07, 2009 at 13:37 UTC
I will have a deeper look into it an give it a try. If the sample you posted is representative of your data, the script I posted above should do the job for you. That said, if you are regularly working with fixed record length files getting familiar with pack/unpack will save you heaps of time in the long run. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]
Re^4: Replacing spaces by apachi15 (Initiate) on Apr 08, 2009 at 09:14 UTC
Re^5: Replacing spaces by BrowserUk (Patriarch) on Apr 08, 2009 at 10:11 UTC
Re: Replacing spaces by Anonymous Monk on Apr 07, 2009 at 11:56 UTC
How do you decide which is the missing column?	[reply]
Re^2: Replacing spaces by apachi15 (Initiate) on Apr 07, 2009 at 12:16 UTC
I work a lot with these tables and know where the columns are, but until now always per hand, now I want to automate these things.	[reply]
Re^3: Replacing spaces by Anonymous Monk on Apr 07, 2009 at 12:57 UTC
No, how would your code decide, by what logic ... it was not clear to me your data was fixed width	[reply]
Re: Replacing spaces by swampyankee (Parson) on Apr 07, 2009 at 12:57 UTC
Hmmm...if the missing column is always after "syntenic," you could try something akin to this: `#/usr/bin/perl use strict; use warnings; while(<DATA>) { chomp; my @rec = split(/\s+/, $_); $rec[4] .= "\t" if(scalar(@rec) == 9); print join("\t", @rec)."\n"; } __DATA__ MGI:1918918 381629 0610007C21Rik 5 syntenic 51374 C2orf28 2p23.3 B MGI:1918917 71667 0610007L01Rik 5 syntenic F 55069 C7orf42 7q11.21 C MGI:1923501 76251 0610007P08Rik 13 syntenic B3 375748 C9orf102 9q22.32 + M MGI:1915571 58520 0610007P14Rik 12 syntenic 11161 C14orf1 14q24.3 B` [download] which is presuming that "syntenic" is element 4 (counting from 0) and records including something like `"syntenic B3"` split into 10, vs 9, columns. The if test just adds a tab to "syntenic" when the record splits into 9, vs 10, columns. Information about American English usage here and here. Floating point issues? Please read this before posting. — emc	[reply] [d/l]