Removing extra spaces

rickoy has asked for the wisdom of the Perl Monks concerning the following question:

Re: Removing extra spaces by davido (Cardinal) on Jul 31, 2012 at 02:26 UTC
`s/\s(\s?)/$1/g`

Match a single whitespace character, and optionally a second one. Capture that second character if it exists. Replace the match with the capture, which will be either nothing or the second space.

Dave
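A self-contained sketch of davido's substitution, run on the sample string with two spaces between the date and the time (as in Athanasius's reply). The extra inputs with three and four spaces are hypothetical, added only to show what the rule does with longer runs: three spaces collapse to one, while four leave two.

```perl
use strict;
use warnings;

# davido's rule: each whitespace char is dropped, but if a second one
# follows it immediately, that second one is kept.
sub collapse {
    my ($s) = @_;
    $s =~ s/\s(\s?)/$1/g;
    return $s;
}

my $date = '2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1';    # two spaces before the 9
print collapse($date), "\n";             # 2012-7-27 9:37:31
print "[", collapse("a   b"), "]\n";     # [a b]   (three spaces -> one)
print "[", collapse("a    b"), "]\n";    # [a  b]  (four spaces -> two)
```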
Re: Removing extra spaces by Rudolf (Pilgrim) on Jul 31, 2012 at 01:56 UTC
Being lazy, I would abuse the power of regexes and say:

```perl
my $string = '2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1';
$string =~ s/  /x/g;    # mark each double space with a placeholder
$string =~ s/ //g;      # delete the remaining single spaces
$string =~ s/x/ /g;     # turn the placeholder back into a space
print $string;
```

I just did it in steps. Since you want to remove all the single spaces, I put a placeholder where the double spaces are supposed to be, deleted every remaining space, then replaced the 'x' with ' '. Perhaps give tr/// a look; it switches out sets of characters, but I'm not sure how to switch out spaces with it.
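Following up on the tr/// idea: tr/// works on single characters, so it cannot tell a lone space from a pair, and the pair still has to be marked with s/// first. After that, `tr/ //d` deletes the remaining spaces and `tr/\0/ /` swaps the placeholder back. This sketch assumes the input never contains a NUL byte ("\0"), which is used as the placeholder instead of 'x' so that a real letter x in the data is not clobbered; `squash_spaces` is a hypothetical name.

```perl
use strict;
use warnings;

# Assumption: the input contains no "\0", so it is safe as a placeholder.
sub squash_spaces {
    my ($s) = @_;
    $s =~ s/  /\0/g;    # mark each double space with a NUL placeholder
    $s =~ tr/ //d;      # delete every remaining (single) space
    $s =~ tr/\0/ /;     # turn each placeholder back into one space
    return $s;
}

print squash_spaces('2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1'), "\n";    # 2012-7-27 9:37:31
print squash_spaces('x y  z'), "\n";                              # xy z
```

With 'x' as the placeholder, the second input would come out as " y z", because the final `s/x/ /g` also rewrites the real x.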
Re: Removing extra spaces by johngg (Canon) on Jul 31, 2012 at 09:16 UTC
You could use a negative look-ahead to replace any space that is not followed by a space with nothing. This will break down if there are more than two spaces, though.

```
knoppix@Microknoppix:~$ perl -E '
> $dateStr = q{ 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 };
> $dateStr =~ s{\s(?!\s)}{}g;
> say $dateStr;'
2012-7-27 9:37:31
knoppix@Microknoppix:~$
```

Cheers, JohnGG
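To see the breakdown johngg mentions: with three spaces, the first two are each followed by a space, so both survive. One way around it, sketched here as a variant rather than johngg's own code, is to add a negative look-behind so only whitespace standing alone is deleted, then squash any remaining run of two or more down to one space. The three-space input and the sub name `isolated_then_squash` are hypothetical.

```perl
use strict;
use warnings;

my $three = "a   b";    # hypothetical input with a three-space gap

# johngg's look-ahead: drop any whitespace not followed by whitespace.
(my $ahead = $three) =~ s{\s(?!\s)}{}g;
print "[$ahead]\n";    # [a  b]  (two spaces survive)

# Variant: drop only isolated whitespace (no whitespace on either side),
# then squash every remaining run of two or more to a single space.
sub isolated_then_squash {
    my ($s) = @_;
    $s =~ s/(?<!\s)\s(?!\s)//g;
    $s =~ s/\s{2,}/ /g;
    return $s;
}

print "[", isolated_then_squash($three), "]\n";    # [a b]
print isolated_then_squash(' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 '), "\n";    # 2012-7-27 9:37:31
```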
Re: Removing extra spaces by NetWallah (Canon) on Jul 31, 2012 at 01:28 UTC
Try this regex: `s/\s\s?(\S)/$1/g`

Update: See the correction below. Thanks Anonymonk and davido.

I hope life isn't a big joke, because I don't get it. -SNL
Re^2: Removing extra spaces by Anonymous Monk on Jul 31, 2012 at 06:42 UTC
Close, just drop the `\s?` and it works. With `\s?` in place, the pattern swallows both spaces of the double gap between the date and the time:

```
$ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s\s?(\S)/$1/g; say $s'
2012-7-279:37:31
$ perl -E '$s="2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1"; $s =~ s/\s(\S)/$1/g; say $s'
2012-7-27 9:37:31
```
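One edge case of the corrected pattern: `\s(\S)` only deletes whitespace that has a non-whitespace character after it, so a trailing space is left alone. The sketch below assumes a copy of the string padded with a leading and a trailing space (as in johngg's reply) and shows that also allowing end-of-string (`\z`) after the space removes the trailing one. Written as a look-ahead, that extended pattern is equivalent to johngg's `\s(?!\s)`.

```perl
use strict;
use warnings;

my $padded = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';    # leading/trailing space assumed

# Corrected pattern: whitespace is removed only when a non-space follows it.
(my $s1 = $padded) =~ s/\s(\S)/$1/g;
print "[$s1]\n";    # [2012-7-27 9:37:31 ]  (trailing space survives)

# Also accepting end-of-string after the space removes the trailing one.
(my $s2 = $padded) =~ s/\s(\S|\z)/$1/g;
print "[$s2]\n";    # [2012-7-27 9:37:31]
```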
Re: Removing extra spaces by Athanasius (Archbishop) on Jul 31, 2012 at 02:09 UTC
Update: rickoy, welcome to the Monastery!

The specification is a little unclear, but assuming you want to (a) remove all single spaces, and (b) squash all sequences of 2 or more spaces down to a single space:

```perl
#! perl
use strict;
use warnings;

my $string = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';
#             NB: 2 spaces here ^^

# (a) Remove single spaces
1 while $string =~ s/(^|[^ ])[ ]([^ ]|$)/$1$2/g;

# (b) Squash multiple spaces down to one
$string =~ s/[ ]{2,}/ /g;

print "'", $string, "'\n";
```

Outputs:

```
'2012-7-27 9:37:31'
```

HTH,

Athanasius <°(((>< contra mundum
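Why the `1 while` loop? Each /g match consumes the non-space character on either side of the space, so in `2 0 1` the `0` is used up by the first match and the space after it cannot match in the same pass. The sketch below is a variant of Athanasius's code with a hypothetical `$passes` counter added for illustration; on the sample input, two passes change the string and the third finds nothing to do.

```perl
use strict;
use warnings;

my $string = ' 2 0 1 2 - 7 - 2 7  9 : 3 7 : 3 1 ';

# Same substitution as above, but count the passes that made a change.
my $passes = 0;
$passes++ while $string =~ s/(^|[^ ])[ ]([^ ]|$)/$1$2/g;

$string =~ s/[ ]{2,}/ /g;     # squash the remaining double space
print "passes: $passes\n";    # passes: 2
print "'$string'\n";          # '2012-7-27 9:37:31'
```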
Re: Removing extra spaces by GrandFather (Saint) on Aug 02, 2012 at 01:47 UTC
Where did your string come from? Strangeness of that sort looks like 16-bit Unicode (UTF-16) strings, or some such, imported into Perl in some odd fashion, where the high 0 byte of each ASCII character has been replaced by a space. Maybe you would be better off getting the conversion right, if possible, rather than trying to fix it up later?

True laziness is hard work
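If GrandFather's guess is right and the source is UTF-16 text, decoding it with the core Encode module gives a clean string with nothing to fix up. This sketch assumes little-endian UTF-16 (UTF-16LE) and builds its own sample bytes; in real code the bytes would come from the file or socket, or the file could be opened with an `:encoding(UTF-16LE)` layer instead. The naive view only illustrates the filler byte after each ASCII character; it is not claimed to reproduce the question's exact spacing.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulated input: the date/time text as UTF-16LE bytes (an assumption
# about where the odd spacing came from).
my $bytes = encode('UTF-16LE', '2012-7-27 9:37:31');

# Read naively as bytes, every ASCII character is followed by a "\0".
my $naive = $bytes;
$naive =~ tr/\0/ /;    # show the NULs as spaces, as they might be displayed
print "naive:   [$naive]\n";

# Decoding with the right encoding gives back the original text.
my $text = decode('UTF-16LE', $bytes);
print "decoded: [$text]\n";    # decoded: [2012-7-27 9:37:31]
```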
Re: Removing extra spaces by harangzsolt33 (Deacon) on Aug 25, 2019 at 05:32 UTC
I know this question was asked more than 7 years ago, but I would like to post a sub that I wrote that does exactly what you want:

```perl
sub CollapseWhitespace{@_ or return'';my$T=shift;defined$T or return'';my$L=length($T);$L or return'';my$c;my$N=0;my$P
=0;my$U=1;for(my$i=0;$i<$L;$i++){$c=vec($T,$i,8);if($c<33){
$U=0;if($N++==1){vec($T,$P++,8)=32;}}else{$N=0;$U or vec($T
,$P,8)=$c;$P++;}}return$U?$T:substr($T,0,$P);}
```

^^ This looks a bit obfuscated, so here is a nicer expanded version:

```perl
##############################################################
#
# This function removes single instances of whitespace and
# converts multiple adjacent whitespace characters to a single
# space. In this function, "whitespace" is defined as a character
# whose ASCII value is less than 33. (This includes many special
# characters such as newline characters, NUL, BEL, etc.)
#
# Usage: STRING = CollapseWhitespace(STRING)
#
# Example:
# CollapseWhitespace("\n\t abc 123  xxx\n") --> " abc123 xxx"
#
sub CollapseWhitespace {
    @_ or return '';
    my $T = shift;
    defined $T or return '';
    my $L = length($T);
    $L or return '';
    my $c;
    my $N = 0;    # consecutive whitespace counter
    my $P = 0;    # target pointer to overwrite original str $T
    my $U = 1;    # string length will be left unchanged
    for (my $i = 0; $i < $L; $i++) {
        $c = vec($T, $i, 8);
        if ($c < 33) {
            $U = 0;
            if ($N++ == 1) { vec($T, $P++, 8) = 32; }
        }
        else {
            $N = 0;
            $U or vec($T, $P, 8) = $c;
            $P++;
        }
    }
    return $U ? $T : substr($T, 0, $P);
}
```
Re^2: Removing extra spaces by AnomalousMonk (Archbishop) on Aug 25, 2019 at 09:32 UTC
A more concise alternative is:

```
c:\@Work\Perl\monks>perl -wMstrict -le
"use warnings;
 use strict;
 ;;
 use Test::More 'no_plan';
 use Test::NoWarnings;
 ;;
 use Data::Dump qw(pp);
 ;;
 note qq{perl version: $]};
 ;;
 my @TESTS = (
   [ undef                       , qq{}            ],
   [ qq{}                        , qq{}            ],
   [ qq{ }                       , qq{}            ],
   [ qq{\n}                      , qq{}            ],
   [ qq{\n\t}                    , qq{ }           ],
   [ qq{\n\t\x00}                , qq{ }           ],
   [ qq{\n\t \x00}               , qq{ }           ],
   [ qq{\n\t abc 123  xxx\n}     , qq{ abc123 xxx} ],
   [ qq{\nabc 123\a\b\fxxx\n\t } , qq{abc123 xxx } ],
   [ qq{abc 123\n\r xxx}         , qq{abc123 xxx}  ],
 );
 ;;
 note 'special case';
 is CollapseWhitespace(), '', 'no arguments';
 ;;
 note 'general cases';
 VECTOR:
 for my $ar_vector (@TESTS) {
   if (not ref $ar_vector) {
     note $ar_vector;
     next VECTOR;
   }
   ;;
   my ($str, $expected) = @$ar_vector;
   ;;
   is CollapseWhitespace($str), $expected,
     pp($str) . ' -> ' . pp($expected)
     ;
 }
 ;;
 done_testing;
 ;;
 exit;
 ;;
 sub CollapseWhitespace {
   my $s = shift;
   return '' unless defined $s;
   $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
   return $s;
 }
"
# perl version: 5.008009
# special case
ok 1 - no arguments
# general cases
ok 2 - undef -> ""
ok 3 - "" -> ""
ok 4 - " " -> ""
ok 5 - "\n" -> ""
ok 6 - "\n\t" -> " "
ok 7 - "\n\t\0" -> " "
ok 8 - "\n\t \0" -> " "
ok 9 - "\n\t abc 123  xxx\n" -> " abc123 xxx"
ok 10 - "\nabc 123\a\b\fxxx\n\t " -> "abc123 xxx "
ok 11 - "abc 123\n\r xxx" -> "abc123 xxx"
1..11
ok 12 - no warnings
1..12
```

If you have Perl version 5.14+, a slightly more concise variation is:

```perl
sub CollapseWhitespace {
    my $s = shift;
    return defined $s
        ? $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsger
        : ''
        ;
}
```

See the `/r` modifier of `s///` in perlop. I leave it to you to Benchmark whether the `s///e` version is actually faster than the `for`-loop version.

Give a man a fish: `<%-{-{-{-<`
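Taking up the Benchmark suggestion, here is a minimal sketch using `cmpthese` from the core Benchmark module. The sub names `collapse_loop` and `collapse_regex`, the repeated sample input, and the `-1` setting (run each sub for at least one CPU second) are illustrative choices, not anything from the thread. No claim is made about which version wins, since that depends on the perl version and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# harangzsolt33's for-loop version, renamed for the comparison.
sub collapse_loop {
    @_ or return '';
    my $T = shift;
    defined $T or return '';
    my $L = length($T);
    $L or return '';
    my ($c, $N, $P, $U) = (undef, 0, 0, 1);
    for (my $i = 0; $i < $L; $i++) {
        $c = vec($T, $i, 8);
        if ($c < 33) {
            $U = 0;
            if ($N++ == 1) { vec($T, $P++, 8) = 32; }
        }
        else {
            $N = 0;
            $U or vec($T, $P, 8) = $c;
            $P++;
        }
    }
    return $U ? $T : substr($T, 0, $P);
}

# AnomalousMonk's s///e version, renamed for the comparison.
sub collapse_regex {
    my $s = shift;
    return '' unless defined $s;
    $s =~ s{ [\x00-\x20]+ }{ $+[0] - $-[0] == 1 ? '' : ' ' }xmsge;
    return $s;
}

my $input = "\n\t abc 123  xxx\n" x 100;    # hypothetical test data

# Sanity check: timing only means something if both agree.
collapse_loop($input) eq collapse_regex($input) or die "results differ\n";

cmpthese(-1, {
    loop  => sub { collapse_loop($input) },
    regex => sub { collapse_regex($input) },
});
```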