dbrock has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a text file that has become double spaced. What would be the best way to remove the double character spacing globally in the text file?
------example of text file UTF-16------
D r i v e a n d m e d i a i n f o r m a t i o n f r o m m e d i a m o u n t : R o b o t i c L i b r a r y N a m e : C O M P A Q 1 D r i v e N a m e : C O M P A Q 1 S l o t : 1 M e d i a L a b e l : D S W 0 0 0 M e d i a G U I D : { 4 3 1 B 0 3 D E - 1 C 4 9 - 1 1 D 4 - B 2 1 C - 0 0 5 0 8 B C A 3 A 6 8 } O v e r w r i t e P r o t e c t e d U n t i l : 1 / 3 0 / 2 0 0 5 3 : 1 4 : 4 1 A M A p p e n d a b l e U n t i l : 1 2 / 3 1 / 9 9 9 9 1 2 : 0 0 : 0 0 A M T a r g e t e d M e d i a S e t N a m e : D a i l y
I have been attempting to use something like this:  s/\s\w//g;

Re: Remove Double Spacing
by theorbtwo (Prior) on Jan 28, 2005 at 19:07 UTC

    Are you sure those are really spaces, and not null bytes? (Try asking Data::Dumper to dump it.)
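    A minimal sketch of that check, assuming a made-up sample string ("Drive" as UTF-16LE bytes without a BOM) in place of the real file contents. Setting $Data::Dumper::Useqq makes Data::Dumper print non-printable bytes as escapes, so NUL bytes can be told apart from real spaces:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: "Drive" as UTF-16LE bytes, no BOM.
# In practice $raw would be a line read from the file in question.
my $raw = "D\0r\0i\0v\0e\0";

# Ask Data::Dumper to show non-printable bytes as escape sequences.
local $Data::Dumper::Useqq = 1;
print Dumper($raw);

# Count NUL bytes and real spaces separately.
my $nuls   = () = $raw =~ /\0/g;
my $spaces = () = $raw =~ / /g;
print "NUL bytes: $nuls, spaces: $spaces\n";
```

    For this sample the last line reports 5 NUL bytes and 0 spaces, which would point to an encoding problem rather than real double spacing.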

    If those are null bytes, the file is in UTF-16, and you're reading it as if it were in latin-1. If you're using perl 5.8, you can simply tell Perl that the file is in utf16: binmode($fh, ':encoding(utf-16)'). If you're under 5.6, it's a hair more complicated -- you need to first read in the data, then use Encode to "decode" it from utf-16 to perl's internal format. $useable=decode('utf-16', $encoded);.
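    Both approaches can be sketched as follows. This is only an illustration: the sample text is made up, the bytes are built with encode() so the example is self-contained, and an in-memory filehandle stands in for the real file. Note that Encode's 'UTF-16' expects a byte order mark when decoding; if the file has none, 'UTF-16LE' or 'UTF-16BE' would have to be named explicitly.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the file contents: UTF-16 bytes with a BOM.
my $encoded = encode('UTF-16', "Drive Name: COMPAQ 1\n");

# Approach 1: decode bytes that have already been read into memory.
my $useable = decode('UTF-16', $encoded);

# Approach 2: let an I/O layer decode while reading.
# The in-memory filehandle (\$encoded) stands in for a real file here.
open my $fh, '<:raw', \$encoded or die "Cannot open: $!";
binmode($fh, ':encoding(UTF-16)');
my $line = <$fh>;
close $fh;

print $useable;
print $line;
```

    Either way the result is an ordinary Perl string with no NUL bytes between the characters, so no "double spacing" regex is needed afterwards.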


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Remove Double Spacing
by dragonchild (Archbishop) on Jan 28, 2005 at 19:05 UTC
    s/\s(?=\S)//gms; should work.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Remove Double Spacing
by BUU (Prior) on Jan 28, 2005 at 19:04 UTC
    s/(?<=\w) (?=\w)//g; s/\n\n/\n/g; s/ / /g;
    Seems to work on the sample.
      This: s/(?<=\w) (?=\w)//g; should probably be s/(?<=\S) (?=\S)//g; since there are other characters besides alphanumeric.
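      To illustrate the difference, here is a small sketch on a made-up string where every character really is followed by a space and words are separated by three spaces (an assumption about the input, not something confirmed by the original file). The final step, collapsing runs of spaces, is also an assumption about the intended cleanup:

```perl
use strict;
use warnings;

# Hypothetical input: genuinely space-separated characters.
my $text = "S l o t :   1";

# \w version: only joins letters, digits and underscore.
(my $w = $text) =~ s/(?<=\w) (?=\w)//g;

# \S version: joins any non-whitespace characters, including punctuation.
(my $s = $text) =~ s/(?<=\S) (?=\S)//g;

# Collapse the remaining runs of spaces into one.
(my $clean = $s) =~ s/ {2,}/ /g;

print "$w\n";      # Slot :   1
print "$s\n";      # Slot:   1
print "$clean\n";  # Slot: 1
```

      The \w version leaves a stray space before the colon, because ':' is not a word character.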
Re: Remove Double Spacing
by nobull (Friar) on Jan 29, 2005 at 10:08 UTC
    Are you sure it's really double-spaced and not UTF-16?

    To open a UTF-16 file in Perl 5.8:

    use Encode; open my $fh, '<:encoding(utf16)', 'somefile' or die $!;
    Update: Sorry, I'd missed the fact that theorbtwo had already given a better answer along the same lines.
      The original document is UTF-16. I started with a UTF-16 .XML file, and I'm trying to strip out all of the XML tags and parse the text using regular expressions... DBrock...
Re: Remove Double Spacing
by ambrus (Abbot) on Jan 28, 2005 at 21:26 UTC

    perl -wpe 's/(.)\s/$1/g' infile > outfile

    Update: that doesn't quite work, sorry.