I have a code snippet that works as desired, but it does not look elegant to me. Can someone suggest a simpler, shorter way of doing it? Performance is not a concern, since this is a one-time task on a relatively small data set.
Requirement:
Original String => converted String
j k l foobar => jkl foobar
j k lm foobar => jk lm foobar
jk l foobar => jk l foobar
foobar j k l => foobar j k l
Basically, when a string begins with a run of single characters, each followed by a space (the pattern repeats several times), optionally followed by some other text, that leading run should be joined into one word by removing the spaces between the characters. If the run of single characters appears at the end of the string instead, the string should be left unchanged.
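A minimal sketch of the rule above, working on whitespace-separated tokens rather than on character positions. The sub name `join_leading_singles` is hypothetical. It assumes that "alphabet" means any single non-whitespace character (the original `sanitise` treats digits the same way, so `"1 2 3"` becomes `"123"`). Unlike the code below, it also trims leading and trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: join a leading run of single-character tokens.
# Assumes "alphabet" means any single non-whitespace character.
sub join_leading_singles {
    my ($s) = @_;
    my @tok = split ' ', $s;    # awk-style split: runs of whitespace, leading space ignored
    my $n = 0;
    $n++ while $n < @tok && length($tok[$n]) == 1;    # count leading 1-char tokens
    return join ' ', @tok if $n == 0;                 # no leading run: only normalise spacing
    return join ' ', join('', @tok[0 .. $n - 1]), @tok[$n .. $#tok];
}

for my $s ("j k l foobar", "j k lm foobar", "jk l foobar", "foobar j k l") {
    print "$s => ", join_leading_singles($s), "\n";
}
```

Counting the leading one-character tokens first makes the "only at the beginning" condition explicit: `"foobar j k l"` has zero such tokens, so it comes back unchanged.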
Here is my code snippet:
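use strict;
use warnings;

my @str = ("j k l foobar", "foobar", "jkl foobar", "1 2 3",
           "jk l foobar", "foobar j k l", "foobar j kl", " ", " ",
           "j jk foobar", "j k jk foobar", "j k l");
my @sanitisedNames = ();
for (@str) {
    s/\s+/ /g;                    # collapse runs of whitespace to one space
    next if /^\s*$/;              # skip blank entries
    my $boundary = sanitise($_);
    my $sanitisedName;
    if ($boundary == 0) {
        # no leading run of single characters: keep the string as-is
        $sanitisedName = $_;
    } elsif ($boundary == length($_)) {
        # the whole string is a run of single characters: join them all
        s/\s+//g;
        $sanitisedName = $_;
    } else {
        # join the leading run; $secondPart already starts with a space,
        # so adding another ' ' here would produce a double space
        my $firstPart = substr($_, 0, $boundary);
        $firstPart =~ s/\s+//g;
        my $secondPart = substr($_, $boundary);
        $sanitisedName = $firstPart . $secondPart;
    }
    push(@sanitisedNames, $sanitisedName);
}
print $_, "\n" for (@sanitisedNames);

# Returns the length of a leading "x y z" run of single characters
# (without its trailing space), length($str) if the whole string is
# such a run, or 0 if the string does not start with one.
sub sanitise {
    my $str = shift;
    my @chars = split('', $str);
    my $count = 0;
    my $len = length($str);
    while ($count < $len) {
        # step over one "non-space, space" pair
        next if $chars[$count++] ne ' ' && $count < $len
             && $chars[$count++] eq ' ';
        return $len if $count == $len;
        return $count > 3 ? $count - 3 : 0;
    }
    return $len;    # the string ended right after an "x " pair
}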
Also, can this be done in a single line with a regex? I could not come up with one, so I am coming to the abode of the monks for wisdom :).
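One possible single-regex approach, offered as a sketch that assumes whitespace has already been collapsed to single spaces (as the original loop does). It anchors each match with `\G`, so `s///g` can only keep matching while it is still inside the leading run. A lookahead then drops a space only when the next token is also a single character.

```perl
use strict;
use warnings;

my @str = ("j k l foobar", "j k lm foobar", "jk l foobar",
           "foobar j k l", "j k l", "j jk foobar");
for (@str) {
    s/\s+/ /g;    # collapse whitespace first, as in the original loop
    # \G anchors each match where the previous one ended (the start of
    # the string at first), so matching stops as soon as the leading run
    # is broken; the lookahead removes a space only if the next token is
    # a single character followed by a space or the end of the string.
    s/\G(\S) (?=\S(?!\S))/$1/g;
    print "$_\n";
}
```

Because each match must begin exactly where the previous one ended, nothing after the leading run can be touched, which is what leaves `"foobar j k l"` and `"jk l foobar"` unchanged.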