Re: REGEX detailed character replace
by CountZero (Bishop) on Nov 11, 2008 at 21:40 UTC
|
Although not in one regex, this works:
use strict;
my $string = '|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
print "$string\n";
$string =~ s/\|//; # gets rid of the first pipe
$string =~ s/\|/,/g; # replaces all other pipes by commas
print "$string\n";
my @chunks = split '-', $string;
$string = join( '-', @chunks[ 0 .. 3 ] ) . ','
. join( ',', @chunks[ 4 .. $#chunks ] );
print "$string\n";
giving:
|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
xx-xxx-xxxxx-xxx x/xx,xx-xxxx-xxx-xx-xxxx-xx-xx,
xx-xxx-xxxxx-xxx x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
Please note that it appears you also replace a space by a comma in your example.
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] [d/l] [select] |
Re: REGEX detailed character replace
by ikegami (Patriarch) on Nov 12, 2008 at 05:49 UTC
|
my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;
or
local our $c = 0; s/-(?(?{ $c++ < 3 })(?!))/,/g;
or
# As is, assumes at least 3 dashes are found.
my ($pre, $post) = /^((?:[^-]*-){3})(.*)/s;
$post =~ s/-/,/g;
$_ = "$pre$post";
or
# As is, assumes at least 3 dashes (4 fields) are found.
my @parts = split(/-/, $_, -1);
$_ = join(',', join('-', @parts[0..3]), @parts[4..$#parts]);
> is there a way to do the first question along with removing white spaces, replacing pipes with commas (except first) all in one line
Yup
s/\|//; s/[|\s]+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg; # rushed first attempt: also deletes the later pipes
s/\|//; s/\|/,/g; s/\s+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg; # fixed: later pipes become commas
Update: Fixed rushed code as per reply.
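For anyone who wants to try the counter approach on the sample string, here is a minimal sketch that wraps the same substitutions in a sub. The name transform and the $space_as_comma flag are made up for illustration and are not part of the code above. The flag exists only because the written request ("remove white spaces") and the posted example (space turned into a comma) disagree.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): applies the counter-based
# substitutions to any string.
sub transform {
    my ($str, $space_as_comma) = @_;
    $str =~ s/^\|//;                        # drop the leading pipe only
    $str =~ s/\|/,/g;                       # remaining pipes become commas
    if ($space_as_comma) {
        $str =~ s/\s+/,/g;                  # follow the posted example
    }
    else {
        $str =~ s/\s+//g;                   # follow the written request
    }
    my $seen = 0;                           # dashes encountered so far
    $str =~ s/-/$seen++ < 3 ? '-' : ','/eg; # keep only the first three dashes
    return $str;
}

my $in = '|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
print transform($in, 1), "\n";   # xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
print transform($in, 0), "\n";   # xx-xxx-xxxxx-xxxx/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
```

Only the first call reproduces the output the OP posted; the second shows what literally removing the whitespace gives.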
| [reply] [d/l] [select] |
|
|
s/\|//; s/\s+//g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg; # Here the pipes are not replaced by commas; they are just removed.
s/^\|//; s/\s+//g; s/\|/,/g; my $c = 0; s/-/$c++ < 3 ? '-' : ','/eg;
This should work fine.
| [reply] |
|
|
We can also shrink it further :)
s/\|/$a++ < 1 ? '':','/eg;s/\s+//g;s/-/$c++ < 3 ? '-' : ','/eg; # no strict; relies on $a and $c starting out undefined
| [reply] |
|
|
Re: REGEX detailed character replace
by ww (Archbishop) on Nov 11, 2008 at 22:08 UTC
|
Consider (and answer, too, please) this question:
- Are there guaranteed to be exactly three dashes before the first space?
If so, you may immediately see a regex solution: have your first capture contain everything up to the first space.
And since you say you know how to deal with your other desired replacements, the answer to question two is embedded in this node's question, above.
And, just BTW, please use code tags ( <c>...</c> ) around data like what you've shown. See Writeup Formatting Tips and Markup in the Monastery.
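If the answer to the question above is yes, the "capture everything up to the first space" idea might look roughly like the sketch below. The sub name protect_head is hypothetical, and the code assumes that the part to protect really does end at the first whitespace and that the space should become a comma, as in the posted example.

```perl
use strict;
use warnings;

# Hypothetical helper name; assumes the part to protect ends at the first space.
sub protect_head {
    my ($s) = @_;
    # Optional leading pipe, then everything up to the first whitespace, then the rest.
    my ($head, $tail) = $s =~ /^\|?(\S+)\s+(.*)\z/s
        or return $s;            # no whitespace found: leave the input untouched
    $tail =~ tr/|\-/,,/;         # pipes and dashes in the rest become commas
    return "$head,$tail";        # the space itself becomes a comma
}

print protect_head('|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|'), "\n";
# xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
```

Nothing here counts dashes, so it only gives the wanted result when the guarantee about the first space holds.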
| [reply] |
Re: REGEX detailed character replace
by marcussen (Pilgrim) on Nov 12, 2008 at 05:27 UTC
|
s{\|(xx)-(xxx)-(xxxxx)-(xxx x/xx)\|(xx)-(xxxx)-(xxx)-(xx)-(xxxx)-(xx)-(xx)\|}{$1-$2-$3-$4,$5,$6,$7,$8,$9,$10,$11,};
Confucius says kill mosquito unless cannon
| [reply] [d/l] |
|
|
Well, though this is rather distant from a general solution, it does answer the OP's stated question so no downvote even though the regex is scarcely more than an identity.
It could be generalized a bit:
$string =~ s/\|(\S+)\s([^-|]+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+)[-|](x+).*/$1,$2,$3,$4,$5,$6,$7,$8,$9,/;
But that's still far from general, relying on "x" rather than [A-Za-z0-9]+ -- which I suspect comes closer (but still clumsily) to OP's "real" question. See, for example, ikegami's reply.
Update: Linkified to clarify my "this"
| [reply] [d/l] [select] |
|
|
Granted, I should probably have been more helpful, for instance Anonymous Monk does this, but I have a tendency to get lazy/sloppy/careless when giving out fish.
Confucius says kill mosquito unless cannon
| [reply] |
Re: REGEX detailed character replace
by Anonymous Monk on Nov 12, 2008 at 15:05 UTC
|
Learn some regex-fu. :)
Alright, well, let's take what you have:
|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
Now re-write that in a way that we preserve what you want to keep:
|(xx)-(xxx)-(xxxxx)-(xxx) (x/xx)|(xx)-(xxxx)-(xxx)-(xx)-(xxxx)-(xx)-(xx)|
Make that a bit more flexible:
\|(\w{2})-(\w{3})-(\w{5})-(\w{3}) (\w/\w{2})\|(\w{2})-(\w{4})-(\w{3})-(\w{2})-(\w{4})-(\w{2})-(\w{2})\|
Now you'll match all of the interesting bits, so let's set up the inline replace:
$mess =~ s/\|(\w{2})-(\w{3})-(\w{5})-(\w{3}) (\w\/\w{2})\|(\w{2})-(\w{4})-(\w{3})-(\w{2})-(\w{4})-(\w{2})-(\w{2})\|/$1-$2-$3-$4,$5,$6,$7,$8,$9,$10,$11,$12,/;
| [reply] [d/l] [select] |
|
|
Thank you everyone for your help! I had looked all over and couldn't find a solution and you all have provided me with multiple ways to go about what I need which is fantastic.
I really appreciate the input.
| [reply] |
|
|
What these solutions seem to be missing is the OP wants to achieve a transformation, rather than capturing the individual 'x' groupings. More specifically, part of the string will be preserved in a known manner, while the rest is transformed.
In addition, I assume the 'x' characters are stand-ins for real alphanumeric data. So, now I think we can generalize the regex a lot via these rules:
- Skip a pipe at the beginning of the line.
- Grab a) alphanumerics and dashes up to the third occurrence of a dash, b) the rest.
- In "the rest", replace pipes and dashes with commas.
- OP says "remove spaces", but his example shows the space being replaced by a comma, so do that.
$s='|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
# in one line, if not one command
($a, $b)=$s=~/^\|?(\w+-\w+-\w+-)(.+)/; $b=~s/[-| ]/,/g; $s=$a . $b;
print $s
# xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
| [reply] [d/l] |
Re: REGEX detailed character replace
by DrHyde (Prior) on Nov 12, 2008 at 10:52 UTC
|
You seem to have forgotten to show us what you've tried so far. Without that, I can't help you do your work; I can only do your work for you, and you ain't paying me enough for that.
| [reply] |
Re: REGEX detailed character replace
by eye (Chaplain) on Nov 16, 2008 at 04:51 UTC
|
> This is what I have:
> |xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|
> This is what I want:
> xx-xxx-xxxxx-xxx,x/xx,xx,xxxx,xxx,xx,xxxx,xx,xx,
Here's a slightly different approach to the problem. Note that the treatment of the space is different in the example and the specification.
Since all the exceptions are at the beginning of the string, we can handle the removal of the leading pipe and then sequester the first part of the string. The remainder can be handled with the "tr" command.
Advantages of this approach:
- the regex is simple
- the regex only assumes that the mysterious 'x's are not whitespace
- "tr" is potentially better than "s" for character replacement
#!/usr/bin/perl
use strict;
my $string = '|xx-xxx-xxxxx-xxx x/xx|xx-xxxx-xxx-xx-xxxx-xx-xx|';
print "$string\n";
# remove pipe, protect everything before the space
$string =~ s{^[|](\S+)}{};
my $result = $1;
# translate pipes, spaces, and hyphens to commas
$string =~ tr[| -][,];
$result .= $string;
print "$result\n";
| [reply] [d/l] [select] |