Re: splitting headache
by Ido (Hermit) on Feb 26, 2002 at 13:47 UTC
print "$_\n" for "'Pugh.Pugh'.Barney.McGrew.Cuthbert.Dibble.Grub" =~ /('.+?'|".+?"|[^\.]+)\.?/g;
Update:
Or maybe
/('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|[^\.]+)\.?/g
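A minimal runnable sketch of the first regex, assuming the separator is always a literal dot and that a quoted grouping never contains its own quote character. The separator is written as `\.?` (an optional literal dot). The captured fields keep their quotes, so a second pass strips one matching pair. The sample line with a double-quoted grouping is made up for illustration.

```perl
use strict;
use warnings;

my $line = q{'Pugh.Pugh'.Barney."McGrew.Cuthbert".Dibble.Grub};

# Capture a single-quoted run, a double-quoted run, or a run of non-dots,
# then swallow the dot that separates it from the next field.
my @fields = $line =~ /('.+?'|".+?"|[^.]+)\.?/g;

# Strip one pair of matching surrounding quotes, if present.
s/^(['"])(.*)\1$/$2/ for @fields;

print "$_\n" for @fields;
```

This should print Pugh.Pugh, Barney, McGrew.Cuthbert, Dibble and Grub on separate lines.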
Thanks a lot for the answer -- this is perfect. The first regex works just fine for me and seems to handle nested "''" and '""' ok too, which is an advantage.
You seem like the regex king; the 2nd regex looks very impressive -- erm what does it do?
The second regex is meant to allow quotes escaped with a backslash inside a quoted field, like: 'Blah.Blah\'blah'. I'm not sure about it, though...
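Here is a small sketch of what the second regex buys you, using a made-up line with escaped quotes. It writes the inside of each quoted field as `(?:[^'\\]|\\.)*`, which matches the same strings as `(?:[^'\\]*|\\.)+` but avoids nesting one quantifier inside another (nested quantifiers can backtrack very slowly on input that fails to match). As with the first regex, the fields keep their quotes and backslashes.

```perl
use strict;
use warnings;

# A quoted field may contain a backslash-escaped quote.
my $line = q{'Blah.Blah\'blah'.Barney."Mc\"Grew".Grub};

# Inside quotes: any char that is neither the quote nor a backslash,
# or a backslash followed by any char (the escape sequence).
my @fields = $line =~ /('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"|[^.]+)\.?/g;

print "$_\n" for @fields;
```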
Re: splitting headache
by IlyaM (Parson) on Feb 26, 2002 at 13:48 UTC
my $data = "'Pugh.Pugh'.Barney.McGrew.Cuthbert.Dibble.Grub";
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({ sep_char => '.', quote_char => "'" });
$csv->parse($data) or die "Cannot parse data";
my @fields = $csv->fields;
--
Ilya Martynov
(http://martynov.org/)
Thanks for the answer. This also works great and, although not as fast as the regex method, it is probably easier for regex Luddites like myself to get to grips with.
Thanks again.
Re: splitting headache
by merlyn (Sage) on Feb 26, 2002 at 14:59 UTC
My rule of thumb is that whenever it is easier to talk about what you want
to keep than what you want to throw away, use m//g instead of split.
You want to keep the contents of a quoted string, or a non-dot string. So say it
that way:
$_ = "'Pugh.Pugh'.Barney.McGrew.Cuthbert.Dibble.Grub";
my @keepers = grep defined $_, /'(.*?)'|([^.]+)/g;
print map "<$_>\n", @keepers;
The grep defined is in there because on every hit, we'll get $1 as the quoted
string but $2 undef, or $2 as the non-dotted string but $1 undef, and all we have
to do is toss the undefs to get the final result.
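To see the undefs merlyn describes, here is a small sketch that dumps the raw list m//g returns for a shortened copy of the string, before grep filters it:

```perl
use strict;
use warnings;

$_ = "'Pugh.Pugh'.Barney.McGrew";

# In list context m//g returns ($1, $2) for every match, so each hit
# contributes one defined value and one undef.
my @raw = /'(.*?)'|([^.]+)/g;
printf "%d raw values: %s\n", scalar @raw,
    join ', ', map { defined $_ ? $_ : 'undef' } @raw;

my @keepers = grep defined $_, @raw;    # toss the undefs
print map "<$_>\n", @keepers;
```

The raw list should be Pugh.Pugh, undef, undef, Barney, undef, McGrew: three matches times two capture groups.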
-- Randal L. Schwartz, Perl hacker
Re: splitting headache
by strat (Canon) on Feb 26, 2002 at 13:40 UTC
I think you would do better with pattern matching here, e.g.
my @result = $string =~ /((?:^[^.]+\.)?[^.]+)\.?/g;  # first two parts together, then one part per match
or something like that.
Or use split and some post-processing:
my ($firstPart, @result) = split(/\./, $string);
$result[0] = $firstPart . "." . $result[0];
Best regards,
perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"
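strat's "split and some post-processing" idea can be extended to groupings that appear anywhere in the string. A rough sketch, assuming a grouping is marked by a single or double quote at the start of one piece and the same quote at the end of a later piece. `split_keep_quoted` is a hypothetical helper name, and an unterminated quote simply swallows the rest of the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split on every dot, then glue
# back together the pieces that belong to one quoted grouping.
sub split_keep_quoted {
    my ($string) = @_;
    my @out;
    my $open;    # quote character of a group that hasn't been closed yet
    for my $piece (split /\./, $string) {
        if (defined $open) {
            # still inside a quoted group: re-insert the dot split removed
            $out[-1] .= ".$piece";
            undef $open if $piece =~ /\Q$open\E\z/;
        }
        elsif ($piece =~ /^(['"])/) {
            my $q = $1;
            push @out, $piece;
            # 'Pugh opens a group; a piece like 'Solo' is already closed
            $open = $q unless length($piece) > 1 && $piece =~ /\Q$q\E\z/;
        }
        else {
            push @out, $piece;
        }
    }
    return @out;    # an unclosed quote swallows the rest of the string
}

my @fields = split_keep_quoted(q{Barney.'Pugh.Pugh'.McGrew."Cuthbert.Dibble".Grub});
print "$_\n" for @fields;
```

The quotes are left on the rejoined fields; strip them afterwards if you don't need them.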
Thanks for the reply. I should have explained that my "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub" string was just a typical string that might occur. The 'word.groupings' would actually come anywhere, not just at the start. Thanks again though.
Re: splitting headache
by PrakashK (Pilgrim) on Feb 26, 2002 at 18:14 UTC
No need for an external module: just use the Text::ParseWords module, which is part of the standard Perl distribution:
use Text::ParseWords;
my $line = "'Pugh.Pugh'.Barney.McGrew.Cuthbert.Dibble.Grub";
my @words = quotewords('\.', 0, $line);
/prakash
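Text::ParseWords also exports parse_line, which handles a single line (quotewords applies it to each line in its list). A short sketch comparing the two settings of the $keep argument on the same string:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = "'Pugh.Pugh'.Barney.McGrew.Cuthbert.Dibble.Grub";

# $keep = 0: quotes (and backslash escapes) are removed from the fields
my @plain  = parse_line('\.', 0, $line);
# $keep = 1: quotes and backslashes are left in place
my @quoted = parse_line('\.', 1, $line);

print join('|', @plain),  "\n";
print join('|', @quoted), "\n";
```

The first line of output should have Pugh.Pugh without quotes; the second keeps them as 'Pugh.Pugh'.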
Re: splitting headache
by simon.proctor (Vicar) on Feb 26, 2002 at 13:59 UTC
Well, I played with it for a while and this is the closest I could get. Personally, I would recommend that you split on the '.' and then iterate over the resulting list, keeping track of what you have seen already.
print "$_\n" foreach (split(m/\G(([^\.]+)\.\2)|(?:(?!\2\.)\.)/g, "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub"));
There's probably some stuff in there I don't need but I'm not a regex master by any means :)
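A sketch of the split-and-iterate approach recommended above, assuming a "grouping" means the same word repeated next to itself (as in Pugh.Pugh). Because it only compares neighbouring words, the pair can appear anywhere in the string.

```perl
use strict;
use warnings;

my $string = "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub";

# Split on every dot, then walk the list remembering the previous word;
# a word equal to the one just seen is glued onto it with a dot.
my (@fields, $prev);
for my $word (split /\./, $string) {
    if (defined $prev && $word eq $prev) {
        $fields[-1] .= ".$word";
        undef $prev;    # so "A.A.A" becomes "A.A", "A"
    }
    else {
        push @fields, $word;
        $prev = $word;
    }
}
print "$_\n" for @fields;
```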
Thanks for the reply. I guess I should have explained that my "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub" string was just an example of a typical string that might occur. The 'word.groupings' could actually come anywhere, not just at the start, so this method won't work for me. Thanks again though.
Re: splitting headache
by Caillte (Friar) on Feb 26, 2002 at 14:01 UTC
$line = "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub";
$line =~ s/\./\n/g;
$line =~ s/([^\n]*)\n/$1./;
print $line;
This page is intentionally left justified.
Thanks for the reply. I guess I should have explained that my "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub" string was just an example of a typical string that might occur. The 'word.groupings' could actually come anywhere, not just at the start, so this method also won't work for me. Thanks again though.
Re: splitting headache
by strat (Canon) on Feb 26, 2002 at 16:00 UTC
Some time ago I wrote the following code:
#!perl -w
use strict;
my $file = "anyfile.txt";
my $sep  = ';';
open(my $csv, '<', $file) or die "Error opening $file: $!\n";
while (<$csv>) {
    next if $. == 1;    # kill headline: dirty
    chomp;              # keep the newline out of the last field
    my @list = &ExtractFields($_, $sep);
    print join(":_:", @list), "\n";
} # while
close($csv);
# ------------------------------------------------------------
sub ExtractFields {
    my ($string, $sep) = @_;
    my @csv   = &FilterIndexList($string, $sep);
    my $start = 0;
    my @list  = ();
    foreach my $j (@csv) {
        my $end = $j - 1;
        # print "$start-$end ";
        push(@list, substr($string, $start, $end - $start + 1));
        $start = $j + 1;
    } # foreach
    # the text after the last separator is the final field
    push(@list, substr($string, $start));
    # filter leading and trailing "
    foreach (@list) { s/^\"(.*)\"$/$1/; }
    # print join("(_|_)", @list);
    return (@list);
} # ExtractFields
# ------------------------------------------------------------
sub FilterIndexList {
    my ($string, $sep) = @_;
    my @sep = &GetIndexList($string, $sep);
    my @hc  = &GetIndexList($string, '"');
    # treat the start and end of the string as virtual separators,
    # so quoted fields at the beginning or end of a line are found too
    my @bounds = (-1, @sep, length($string));
    # try to find connected pairs of " and remove
    # the separator positions between them from @sep
    my $i = 0;
    while ($i < $#hc) {    # at least two " left to form a pair
        my ($start) = grep { $_ == $hc[$i] - 1 } @bounds;
        unless (defined $start) {    # invalid begin; throw away this "
            splice(@hc, $i, 1);
            next;
        }
        my ($end) = grep { $_ == $hc[$i+1] + 1 } @bounds;
        if (defined $end) {
            # kill separator positions strictly between $start and $end
            @sep = grep { $_ <= $start or $_ >= $end } @sep;
            $i += 2;
        }
        else {    # invalid end; throw away that " and try this start again
            splice(@hc, $i+1, 1);
        }
    }
    return (@sep);
} # FilterIndexList
# ------------------------------------------------------------
# Return a list of the positions at which $subStr occurs in $string
sub GetIndexList {
    my ($string, $subStr) = @_;
    my @list = ();
    my $pos  = -1;    # start position (the search begins at $pos + 1)
    while (1) {
        # search for the next $subStr
        $pos = index($string, $subStr, $pos + 1);
        # stop when there are no more matches
        last if $pos == -1;
        # otherwise push the found position onto the list
        push(@list, $pos);
    }
    return (@list);
} # GetIndexList
# ------------------------------------------------------------
It's not the best piece of code I've ever written, and I'm not sure if it works in all cases, but maybe it could help you... But it's a lot of code just for nearly nothing :-)
Best regards,
Re: splitting headache
by Rich36 (Chaplain) on Feb 26, 2002 at 14:18 UTC
To use split, you can reverse the string, split it only as many times as needed (a limit of 6) so that the last piece still contains the '.', reverse the order of the pieces in the foreach loop, and then un-reverse each string when you print it.
my @names = split(/\./, scalar reverse("Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub"), 6);
foreach (reverse(@names)) { print scalar(reverse($_)), "\n"; }
__RESULT__
Pugh.Pugh
Barney
McGrew
Cuthbert
Dibble
Grub
Rich36
There's more than one way to screw it up...
Thanks for the reply. I guess I should have explained that my "Pugh.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub" string was just an example of a typical string that might occur. The 'word.groupings' could actually come anywhere, not just at the start, so this method won't work for me either. Thanks again though.
Re: splitting headache
by Gyro (Monk) on Feb 27, 2002 at 20:37 UTC
Hey Wibble,
In the following example I added another duplicate pair for demonstration. The code does not assume the duplicates sit next to each other; in other words, I am assuming they might be separated.
my $string = join(".", sort split(/\./, "Pugh.Barney.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub"));
my @array = ($string =~ m/(\S+)\.\1/g); # Capture duplicate
$string =~ s/(\S+)\.\1//g; # Eliminate duplicates
$string =~ s/\./\n/g; # Change \. to \n
foreach (@array) { # Print duplicates
print "\n$_.$_";
}
print "$string"; # Print the rest of the list
If you need to keep the string together you can use .= in the foreach loop and add ' around the dups as well.
The following will add single quotes around the dups.
$string =~ s/(\S+)\.\1/'$1.$1'/g; # Add quotes
and here it is plugged into merlyn's reply:
$_ = join(".", sort split(/\./, "Pugh.Barney.Pugh.Barney.McGrew.Cuthbert.Dibble.Grub"));
$_ =~ s/(\S+)\.\1/'$1.$1'/g; # Add quotes
my @keepers = grep defined $_, /'(.*?)'|([^.]+)/g;
print map "$_\n", @keepers;
Gyro