rickoy has asked for the wisdom of the Perl Monks concerning the following question:

I have a string:

    2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1

As you can see, each character in the string is separated by 1 space, and the date and time are separated by 2 spaces. I would like to reduce the spaces so that 2 spaces between characters become 1 space, and 1 space between characters is removed.

Re: Removing extra spaces
by davido (Cardinal) on Jul 31, 2012 at 02:26 UTC

    s/\s(\s?)/$1/g

    Match a single space, and optionally a second space. Capture that second space if it exists. Replace with the capture, which will be either nothing, or the second space.
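
    As a quick check, here is a minimal sketch that applies the substitution to the string from the question. The variable names are only illustrative, and the sample assumes exactly two spaces between the date and the time, as the question describes.

```perl
use strict;
use warnings;

# Sample input as described in the question: one space between characters,
# two spaces between the date and the time.
my $string = '2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1';

# Copy first so the original stays available for comparison.
# \s matches one space; (\s?) captures a second one if present.
(my $fixed = $string) =~ s/\s(\s?)/$1/g;

print "$fixed\n";   # 2012-7-27 9:37:31
```

    Because the optional group always takes part in the match, $1 is an empty string rather than undef when there is no second space, so this runs cleanly under warnings.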


    Dave

Re: Removing extra spaces
by Rudolf (Pilgrim) on Jul 31, 2012 at 01:56 UTC

    Being lazy, I would abuse the power of regexes and say:

    my $string = '2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1';
    $string =~ s/  /x/g;   # two spaces become a placeholder
    $string =~ s/ //g;     # remove the remaining single spaces
    $string =~ s/x/ /g;    # turn the placeholder back into one space
    print $string;

    I just did it in steps: since you want to remove all the single spaces, I put a placeholder where the double spaces are, then replaced the 'x' with ' ' afterwards. Perhaps give tr/// a look; it switches out sets of characters, but I'm not sure how to switch out spaces with it.
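
    The placeholder trick only works if the placeholder never occurs in the input. Below is a hedged sketch of the same three steps wrapped in a sub with a guard; the name squeeze_with_placeholder and the default placeholder "\x01" are assumptions for illustration, not something from this thread. As for tr///, it maps characters one for one, so on its own it cannot tell a single space from a double one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): the three-step placeholder
# approach, refusing to run if the placeholder already occurs in the input.
sub squeeze_with_placeholder {
    my ($string, $placeholder) = @_;
    $placeholder = "\x01" unless defined $placeholder;   # assumed to be unused

    die "placeholder occurs in input\n"
        if index($string, $placeholder) >= 0;

    $string =~ s/  /$placeholder/g;       # 1. mark each pair of spaces
    $string =~ s/ //g;                    # 2. drop the remaining single spaces
    $string =~ s/\Q$placeholder\E/ /g;    # 3. turn the markers back into spaces
    return $string;
}

print squeeze_with_placeholder('2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1'), "\n";
```

    Like the original, this treats every pair of spaces as one marker, so a run of three spaces would come out as a single space (one pair marked, the leftover space removed).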

Re: Removing extra spaces
by johngg (Canon) on Jul 31, 2012 at 09:16 UTC

    You could use a negative look-ahead to replace any space that is not followed by a space with nothing. This will break down if there are more than two spaces though.

    knoppix@Microknoppix:~$ perl -E '
    > $dateStr = q{ 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 };
    > $dateStr =~ s{\s(?!\s)}{}g;
    > say $dateStr;'
    2012-7-27 9:37:31
    knoppix@Microknoppix:~$
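
    To illustrate the caveat about more than two spaces, here is a small sketch comparing the look-ahead substitution with one that matches each whole run and decides by its length. The second form is an assumed alternative for illustration, not part of the post above.

```perl
use strict;
use warnings;

my $three = 'a b   c';   # three spaces between "b" and "c"

# Look-ahead version: only the last space of each run is removed,
# so a run of three spaces is left as two.
(my $lookahead = $three) =~ s{\s(?!\s)}{}g;

# Assumed alternative: match the whole run, then delete it if it is a
# single space or replace it with one space otherwise.
(my $by_length = $three) =~ s{(\s+)}{ length($1) == 1 ? '' : ' ' }ge;

print "[$lookahead]\n";   # [ab  c]
print "[$by_length]\n";   # [ab c]
```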

    Cheers,

    JohnGG

Re: Removing extra spaces
by NetWallah (Canon) on Jul 31, 2012 at 01:28 UTC
    Try this regex:
    s/\s\s?(\S)/$1/g
    Update: See the correction below. Thanks Anonymonk and davido.

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

      Close, just drop the \s? and it works.

      $ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s\s?(\S)/$1/g; say $s'
      2012-7-279:37:31
      $ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s(\S)/$1/g; say $s'
      2012-7-27 9:37:31
Re: Removing extra spaces
by Athanasius (Archbishop) on Jul 31, 2012 at 02:09 UTC

    Update: rickoy, welcome to the Monastery!

    The specification is a little unclear, but assuming you want to (a) remove all single spaces, and (b) squash all sequences of 2 or more spaces down to a single space:

    #! perl
    use strict;
    use warnings;

    my $string = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';
    # NB: 2 spaces here             ^^

    # (a) Remove single spaces
    1 while $string =~ s/(^|[^ ])[ ]([^ ]|$)/$1$2/g;

    # (b) Squash multiple spaces down to one
    $string =~ s/[ ]{2,}/ /g;

    print "'", $string, "'\n";

    Outputs:

    '2012-7-27 9:37:31'
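
    The "1 while" loop in step (a) is needed because each match consumes the non-space characters on both sides of the space, so under /g the very next single space has nothing left to match against and is skipped until the following pass. A possible variation, sketched here as an assumption rather than as part of the post above, uses zero-width look-arounds so that one pass is enough:

```perl
use strict;
use warnings;

my $string = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';

# (a) Remove every space that has no space on either side. The look-behind
#     and look-ahead consume nothing, so a single /g pass catches them all,
#     including a lone space at the start or end of the string.
$string =~ s/(?<! ) (?! )//g;

# (b) Squash runs of two or more spaces down to one.
$string =~ s/[ ]{2,}/ /g;

print "'", $string, "'\n";   # '2012-7-27 9:37:31'
```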

    HTH,

    Athanasius <°(((><contra mundum

Re: Removing extra spaces
by GrandFather (Saint) on Aug 02, 2012 at 01:47 UTC

    Where did your string come from? Strangeness of that sort looks like 16-bit Unicode strings, or some such, imported in some odd fashion into Perl, where the high 0 byte (for an ASCII character) has been replaced by a space. Maybe you would be better off getting the conversion right, if possible, rather than trying to fix it up later?
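
    If that guess is right, decoding the data at the point where it is read would avoid the problem altogether. The sketch below only simulates such input with the core Encode module; the choice of UTF-16LE and the sample timestamp are assumptions, since the thread never says where the string actually came from.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Simulated input: a timestamp as raw UTF-16LE bytes, standing in for
# whatever the real source produced (an assumption for illustration).
my $bytes = encode('UTF-16LE', '2012-7-27 9:37:31');

# Each ASCII character occupies two bytes, the second being a NUL.
printf "%d bytes, %d of them NUL\n", length($bytes), ($bytes =~ tr/\0//);

# Decoding with the right encoding recovers the text directly, with no
# need to strip anything afterwards.
my $text = decode('UTF-16LE', $bytes);
print "$text\n";
```

    When reading from a file, the same effect can be had with an encoding layer on open, for example '<:encoding(UTF-16LE)', assuming the encoding is known.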

    True laziness is hard work
Re: Removing extra spaces
by harangzsolt33 (Deacon) on Aug 25, 2019 at 05:32 UTC
    I know, this question was asked more than 7 years ago, but I would
    like to post a sub that I wrote that does exactly what you want:

    sub CollapseWhitespace{@_ or return'';my$T=shift;defined$T or return'';my$L=length($T);$L or return'';my$c;my$N=0;my$P =0;my$U=1;for(my$i=0;$i<$L;$i++){$c=vec($T,$i,8);if($c<33){ $U=0;if($N++==1){vec($T,$P++,8)=32;}}else{$N=0;$U or vec($T ,$P,8)=$c;$P++;}}return$U?$T:substr($T,0,$P);}

    ^^ This looks a bit obfuscated, so here is a nicer expanded version:

    ##############################################################
    #
    # This function removes single instances of whitespace and
    # converts multiple adjacent whitespace characters to a single
    # space. In this function, "whitespace" is defined as a character
    # whose ASCII value is less than 33. (This includes many special
    # characters such as new line characters, nul, bel, etc.)
    #
    # Usage: STRING = CollapseWhitespace(STRING)
    #
    # Example:
    #   CollapseWhitespace("\n\t abc 123  xxx\n") --> " abc123 xxx"
    #
    sub CollapseWhitespace
    {
      @_ or return '';
      my $T = shift;
      defined $T or return '';
      my $L = length($T);
      $L or return '';

      my $c;
      my $N = 0;  # consecutive whitespace counter
      my $P = 0;  # target pointer to overwrite original str $T
      my $U = 1;  # string length will be left unchanged

      for (my $i = 0; $i < $L; $i++)
      {
        $c = vec($T, $i, 8);
        if ($c < 33)
        {
          $U = 0;
          if ($N++ == 1) { vec($T, $P++, 8) = 32; }
        }
        else
        {
          $N = 0;
          $U or vec($T, $P, 8) = $c;
          $P++;
        }
      }
      return $U ? $T : substr($T, 0, $P);
    }

      A more concise alternative is:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use warnings;
       use strict;
       ;;
       use Test::More 'no_plan';
       use Test::NoWarnings;
       ;;
       use Data::Dump qw(pp);
       ;;
       note qq{perl version: $]};
       ;;
       my @TESTS = (
         [ undef                         , qq{}            ],
         [ qq{}                          , qq{}            ],
         [ qq{ }                         , qq{}            ],
         [ qq{\n}                        , qq{}            ],
         [ qq{\n\t}                      , qq{ }           ],
         [ qq{\n\t\x00}                  , qq{ }           ],
         [ qq{\n\t \x00}                 , qq{ }           ],
         [ qq{\n\t abc 123  xxx\n}       , qq{ abc123 xxx} ],
         [ qq{\nabc 123\a\b\fxxx\n\t }   , qq{abc123 xxx } ],
         [ qq{abc 123\n\r xxx}           , qq{abc123 xxx}  ],
         );
       ;;
       note 'special case';
       is CollapseWhitespace(), '', 'no arguments';
       ;;
       note 'general cases';
       VECTOR:
       for my $ar_vector (@TESTS) {
         if (not ref $ar_vector) {
           note $ar_vector;
           next VECTOR;
           }
         ;;
         my ($str, $expected) = @$ar_vector;
         ;;
         is CollapseWhitespace($str), $expected,
           pp($str) . ' -> ' . pp($expected)
           ;
         }
       ;;
       done_testing;
       ;;
       exit;
       ;;
       sub CollapseWhitespace {
         my $s = shift;
         return '' unless defined $s;
         $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
         return $s;
         }
      "
      # perl version: 5.008009
      # special case
      ok 1 - no arguments
      # general cases
      ok 2 - undef -> ""
      ok 3 - "" -> ""
      ok 4 - " " -> ""
      ok 5 - "\n" -> ""
      ok 6 - "\n\t" -> " "
      ok 7 - "\n\t\0" -> " "
      ok 8 - "\n\t \0" -> " "
      ok 9 - "\n\t abc 123  xxx\n" -> " abc123 xxx"
      ok 10 - "\nabc 123\a\b\fxxx\n\t " -> "abc123 xxx "
      ok 11 - "abc 123\n\r xxx" -> "abc123 xxx"
      1..11
      ok 12 - no warnings
      1..12

      If you have Perl version 5.14+, a slightly more concise variation is:

      sub CollapseWhitespace {
        my $s = shift;
        return defined $s
          ? $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsger
          : ''
          ;
        }

      See the /r modifier of s/// in perlop. I leave it to you to Benchmark whether the s///e version is actually faster than the for-loop version.
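
      For anyone who wants to try that comparison, here is a rough sketch using the core Benchmark module. The sub names collapse_loop and collapse_regex and the test input are assumptions for illustration; the bodies follow the two approaches posted above, and no timing result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop version, following the expanded vec()-based sub posted above.
sub collapse_loop {
    my $T = shift;
    return '' unless defined $T && length $T;
    my ($N, $P, $U) = (0, 0, 1);   # run counter, write position, unchanged flag
    for my $i (0 .. length($T) - 1) {
        my $c = vec($T, $i, 8);
        if ($c < 33) {
            $U = 0;
            vec($T, $P++, 8) = 32 if $N++ == 1;   # emit one space per run of 2+
        }
        else {
            $N = 0;
            vec($T, $P, 8) = $c unless $U;
            $P++;
        }
    }
    return $U ? $T : substr($T, 0, $P);
}

# Regex version, following the s///e sub posted above.
sub collapse_regex {
    my $s = shift;
    return '' unless defined $s;
    $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
    return $s;
}

my $input = "\n\t abc 123  xxx\n" x 100;   # arbitrary test data

# Make sure both versions agree before timing them.
die "versions disagree\n"
    unless collapse_loop($input) eq collapse_regex($input);

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    loop  => sub { collapse_loop($input) },
    regex => sub { collapse_regex($input) },
});
```

      Results will vary with the Perl version and with the shape of the input, so it is worth timing data that resembles your own.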


      Give a man a fish:  <%-{-{-{-<