Re: split on spaces, except those within quotes?
by Kanji (Parson) on Nov 12, 2002 at 02:31 UTC
No doubt a fancy regex will do the trick, but simpler in my mind would be Text::ParseWords' parse_line()...
use Text::ParseWords;
my @chunks = parse_line(' ', 0, $line);
One thing to note, however, is parse_line() makes no distinction between single and double quotes, which may or may not work for you.
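For anyone who wants to try it, here is a minimal runnable sketch of the parse_line() approach. The sample string is an assumption of mine, not something from the original question. The first argument is used as a pattern, so '\s+' should also work if the input may contain runs of whitespace.

```perl
use strict;
use warnings;
use Text::ParseWords;    # exports parse_line() by default

# Sample input (an assumption for illustration).
my $line = q{a 'b c d' e f "g h"};

# Arguments: delimiter pattern, keep flag (0 strips the quotes), input line.
my @chunks = parse_line(' ', 0, $line);

# Both quote styles are treated as quoting characters.
print join('|', @chunks), "\n";    # prints: a|b c d|e|f|g h
```

Passing 1 as the second argument keeps the quote characters in the returned chunks instead of stripping them.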
--k.
Re: split on spaces, except those within quotes?
by cLive ;-) (Prior) on Nov 12, 2002 at 02:40 UTC
I thought there must be a quick solution, but I could only think of this:
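#!/usr/bin/perl -w
use strict;
my $string = "a 'b c d' e f 'g h'";
my $tmp='';
my @result = ();
# Split on whitespace, then glue the pieces of each quoted run back together.
# Note: a quoted single word such as 'x' is not handled by this loop.
for (split /\s+/, $string) {
    if (/^'/) {
        # opening quote: start collecting
        $tmp = $_; next;
    }
    elsif (/'$/) {
        # closing quote: emit the collected run
        push @result, $tmp." $_"; $tmp=''; next;
    }
    elsif($tmp) {
        # inside a quoted run
        $tmp.=" $_";
    }
    else {
        # plain unquoted word
        push @result,$_;
    }
}
print join "\n", @result;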
Of course, you could use the DBD::CSV module, setting the field separator to 'space' and the quote character to 'single quote' (spelled out for clarity, not for actual use :)...
But my guess is that would be slower than a purpose-designed parser for this very specific case.
.02
cLive ;-)
Re: split on spaces, except those within quotes?
by BrowserUk (Patriarch) on Nov 12, 2002 at 03:27 UTC
#! perl -sw
use strict;
sub tokenize ($) {
    local $_ = shift;
    s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge; #!"
    grep{s/\cA/ /g, $_}split/\s+/;
}
my @bits = tokenize q/a "b c d" e f 'g h' ijk "l m n " op 'q r s t' u'v w'x yz/;
local $,='|';
print @bits,$/;
__END__
c:\test>212174
a|"b c d"|e|f|'g h'|ijk|"l m n "|op|'q r s t'|u'v w'x|yz|
Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy
__SIG__
use B;
printf "You are here %08x\n", unpack "L!", unpack "P4", pack
"L!", B::svref_2object(sub{})->OUTSIDE;
| [reply] [d/l] |
s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge; #!"
That isn't re-entrant. The right side of a substitution counts as a string (which, in this case, is eval'ed because of the /e); only the left side counts as a regex.
| [reply] [d/l] |
This looks way cool. However, it is not coming through in the web browser as something usable. It has odd characters, A with symbols over top in several places, and what looks like a currency symbol in a couple of places.
Does anybody have access to the original correct formula?
| [reply] |
I believe that line is supposed to be s/(('|").*?\2)/ ($£ = $1) =~ s!\s+!\cA!g; $£ /ge;. AFAICT, the nonstandard $£ is just supposed to be a scratch variable, so you can replace it with e.g. $a (assuming there's no sort in the call stack) or a lexical of your choosing.
However, note BrowserUk's words: "I probably deserve hate mail for this one but..." - see e.g. Regexp::Common::delimited or Text::Balanced.
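In case it helps, here is a sketch of the same trick with a lexical scratch variable instead of $£. The variable names $quoted and $tok are my own, and I have swapped the grep for a map, so treat it as a variation under those assumptions rather than the code as originally posted. One side effect of the map: a token that is just "0" is kept, whereas the grep's truth test would drop it.

```perl
use strict;
use warnings;

sub tokenize {
    my ($text) = @_;

    # Inside each quoted run, replace whitespace with \cA (chr 1) so the
    # later split cannot break the run apart. $quoted is a lexical scratch copy.
    $text =~ s/((['"]).*?\2)/ (my $quoted = $1) =~ s!\s+!\cA!g; $quoted /ge;

    # Split on the remaining whitespace, then restore the protected spaces.
    return map { (my $tok = $_) =~ s/\cA/ /g; $tok } split ' ', $text;
}

my @bits = tokenize(q{a "b c d" e f 'g h' u'v w'x yz});
print join('|', @bits), "\n";    # prints: a|"b c d"|e|f|'g h'|u'v w'x|yz
```

As in the original, the quotes stay in the output and any run of whitespace inside quotes comes back as a single space.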
| [reply] [d/l] |
Why did you use grep instead of map?
—John
Re: split on spaces, except those within quotes?
by jryan (Vicar) on Nov 12, 2002 at 05:11 UTC
Here's a fancy regex for Kanji :)
Note that I have 2 different versions available below; one that takes into account backslashed quotes within quotes, and another that doesn't.
# Use this one if you'd like to account for backslashed quotes
my @matches = $string =~
    /
    ((?:
        (?:
            ' (?:
                (?>[^\\']+)
              | \\ .
            )* '
        ) | \\ . | [^\s'\\]*
    )+)
    /gx;
# This one does not take backslashed quotes into account
#/
#((?:
#    ' [^']* '
#  | [^\s']*
#)+)
#/gx;
# because the last alternative can match nothing, you'll get empty
# matches ('') woven into your data; testing length also keeps a "0" token
@matches = grep { length } @matches;
Update: Fixed paste error.
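To see the empty matches and the filtering step in action, here is a small sketch built on the simpler pattern (the one that ignores backslashed quotes). The sample string and the @raw name are assumptions of mine for illustration.

```perl
use strict;
use warnings;

# Sample input (an assumption for illustration).
my $string = q{a 'b c d' e u'v w'x 0};

# Quoted runs, or runs of characters that are neither whitespace nor quotes.
my @raw = $string =~ /((?: ' [^']* ' | [^\s']* )+)/gx;

# Drop the empty matches; testing length keeps a literal "0" token.
my @matches = grep { length } @raw;

print join('|', @matches), "\n";    # prints: a|'b c d'|e|u'v w'x|0
```

The quotes remain part of each match, so strip them afterwards if you only want the contents.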
Re: split on spaces, except those within quotes?
by rob_au (Abbot) on Nov 12, 2002 at 03:58 UTC
While an answer has already been provided, with very good suggestions from Kanji and cLive ;-), it would be remiss not to mention the module Text::xSV, written by our very own tilly. This module provides an excellent interface for reading character-separated data where quoted data may include the separator character.
Additional information can also be found in the thread starting here.
perl -e 'print+unpack("N",pack("B32","00000000000000000000000111011101")),"\n"'
Re: split on spaces, except those within quotes?
by Aristotle (Chancellor) on Nov 15, 2002 at 16:14 UTC
Re: split on spaces, except those within quotes?
by shaq the foo (Initiate) on Nov 12, 2002 at 18:17 UTC
I would use a negative look ahead:
my @arr = split /\s(?!\w+')/, $string;
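A quick check of this split, using sample strings of my own choosing (assumptions, not from the original question). The lookahead only inspects the single word that follows each space, so it seems to hold up for two-word quoted phrases but not for longer ones:

```perl
use strict;
use warnings;

# Two words inside the quotes: the space before "c'" is protected.
my @two = split /\s(?!\w+')/, q{a 'b c' d};

# Three words inside the quotes: only the space before "d'" is protected.
my @three = split /\s(?!\w+')/, q{a 'b c d' e};

print join('|', @two),   "\n";    # prints: a|'b c'|d
print join('|', @three), "\n";    # prints: a|'b|c d'|e
```

If quoted phrases can run to three or more words, one of the other approaches in this thread is probably the safer bet.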