Re: In need of a stupid regex trick
by jweed (Chaplain) on Jan 04, 2004 at 20:59 UTC
use Text::ParseWords;
@list = shellwords($string);
Who is Kayser Söze?
Code is (almost) always untested.
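For anyone who wants to try the two-line answer above as a complete script, here is a minimal sketch. The sample string is borrowed from the other replies in this thread and is only an assumption about the original poster's data.

```perl
use strict;
use warnings;
use Text::ParseWords;   # core module; exports shellwords() by default

# Sample input borrowed from the other replies in this thread.
my $string = 'one "two three" four five "six seven eight" nine';

# shellwords() splits on whitespace, keeps quoted phrases together,
# and strips the surrounding quotes.
my @list = shellwords($string);

print "$_\n" for @list;
```

Each element is printed on its own line, with "two three" and "six seven eight" kept as single elements.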
Re: In need of a stupid regex trick
by CountZero (Bishop) on Jan 04, 2004 at 21:05 UTC
I don't know about a regex, but Text::CSV_XS can do it:
use strict;
use Text::CSV_XS;
use Data::Dumper;
my $csv = Text::CSV_XS->new({sep_char=>' '});
$csv->parse('one "two three" four five "six seven eight" nine');
my @columns = $csv->fields();
print Dumper(@columns);
The result is:
$VAR1 = 'one';
$VAR2 = 'two three';
$VAR3 = 'four';
$VAR4 = 'five';
$VAR5 = 'six seven eight';
$VAR6 = 'nine';
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: In need of a stupid regex trick
by demerphq (Chancellor) on Jan 04, 2004 at 21:16 UTC
my @list=$str=~/("[^"]*"|\S+)/g;
will do, but it doesn't handle escaping, and alas I'm on a box without Perl installed, so I haven't tested it.
---
demerphq
First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi
This seems to work but it still needs something extra besides a regex :(
my $str='one "two three" four five "six seven eight" nine';
my @list=grep defined, $str=~/"([^"]*)"|(\S+)/g;
print join "\n", @list;
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Almost, but it keeps the " around the strings.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
my @list = grep defined, $str=~/"([^"]*)"|(\S+)/g;
my @list=$str=~/((?<=")[^"]*(?=")|\S+)/g;
---
demerphq
First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi
Re: In need of a stupid regex trick
by Zaxo (Archbishop) on Jan 04, 2004 at 21:24 UTC
Here's a regex that almost works, but it leaves empty shards. Hence, grep...
local $_= q(one "two three" four five "six seven eight" nine);
my @foo = grep {$_} /\G(?:(\w+)\s*)|(?:"([^"]*)"\s*)/g;
local $,="\n";
print @foo, $/;
It works for your data, but I suspect it is very fragile.
Re: In need of a stupid regex trick
by ysth (Canon) on Jan 04, 2004 at 21:43 UTC
It's easier to do this with m//g than split.
@list = $string =~ /"[^"]+"|\S+/g;
@list = grep defined, $string =~ /"([^"]*)"|(\S+)/g;
Update: don't leave quotes on; allow empty string ""
Doesn't handle backslashes before " specially; if there is an unmatched " in the input, you'll get it returned as part of an element.
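If backslash-escaped quotes do matter, the same m//g idea can be extended. The following is only a sketch under assumptions: the sample input is hypothetical, and it assumes a backslash escapes whatever single character follows it inside a quoted phrase.

```perl
use strict;
use warnings;

# Hypothetical input with backslash-escaped quotes inside a quoted phrase.
my $string = 'one "two \"three\" four" five';

# Inside quotes, accept either a backslash followed by any character
# or any character that is neither a quote nor a backslash.
# Each match fills only one of the two groups, hence grep defined.
my @list = grep defined, $string =~ /"((?:\\.|[^"\\])*)"|(\S+)/g;

# Remove the escaping backslashes from the extracted tokens.
s/\\(.)/$1/g for @list;

print "$_\n" for @list;
```

Unquoted words are left untouched apart from the final unescaping pass, so a stray backslash in a bare word would also be removed; drop that pass if that is not wanted.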
perl -wle '@l = split(/(?:"([^"]*)"|\s+)/, $ARGV[0]);$,="\t";print @l;' 'one "two three" four "five"'
perl -wle'@list = grep defined && length, split /(?:(?<!\S)"([^"]*)"(?!\S))|\s+/, shift;print for @list' 'one "two three" four "five"'
perl -wle'@list = grep defined, split /(?:(?<!\S)"([^"]*)"(?!\S))|\s+/, shift;print for @list' 'one "two three" four "five"'
Re: In need of a stupid regex trick
by oha (Friar) on Jan 04, 2004 at 21:24 UTC
IOW, you wish to split iff there are an even number of quotes since the start of string.
/(?<=^[^"]*("[^"]*"[^"])*) / does not work; perl complains:
Variable length lookbehind not implemented before HERE mark in regex
Lookahead works, though. The following code:
my @a = (
'simple',
'keep simple',
'a "bit more" difficult',
'an "increasing more" "and more" here');
foreach $_ (@a)
{
s/ (?=("[^"]*"[^"])*[^"]+$)/ | /g;
print "$_\n";
}
produces
simple
keep | simple
a | "bit more" | difficult
an | "increasing more" | "and more" | here
Unfortunately, the regex "confuses" split and isn't usable there, or at least I was not able to make it work. But why? :)
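One likely reason for the confusion: when the pattern given to split contains capturing parentheses, split also returns whatever each group captured as extra fields, and that applies to groups inside a lookahead too. Below is a minimal sketch that avoids this with a non-capturing group. The pattern is a variation on the lookahead above, not the original one: it splits on a space only when an even number of quotes remains ahead, and it leaves the quotes on the phrases.

```perl
use strict;
use warnings;

my $str = 'an "increasing more" "and more" here';

# Split on a space only when an even number of quotes remains ahead,
# i.e. when the space is outside any quoted phrase. The group is
# non-capturing, so split returns no extra fields.
my @parts = split / (?=(?:[^"]*"[^"]*")*[^"]*$)/, $str;

print join(' | ', @parts), "\n";
```

This assumes the quotes in the input are balanced; with an unmatched quote the count is off and the result will not be what you expect.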
Re: In need of a stupid regex trick
by Roger (Parson) on Jan 04, 2004 at 22:36 UTC
The following solution is based on an example of split with capture I posted earlier...
use strict;
use warnings;
use Data::Dumper;
my $str = 'one "two three" four five "six seven eight" nine';
my @words = map { $_ || () } split /"(\\"|.*?)"|\s+/, $str;
print Dumper(\@words);
And the output is -
$VAR1 = [
'one',
'two three',
'four',
'five',
'six seven eight',
'nine'
];
Re: In need of a stupid regex trick
by David Caughell (Monk) on Jan 04, 2004 at 22:38 UTC
My instinct was that it can be done, since regexes are where Perl really shines.
I'm still learning the language, so I decided to give this a shot just for fun. That said, jweed's elegant solution (with credit to whoever put it on CPAN) is the best choice if you're doing this for any purpose other than learning the language.
Now on to something not quite so elegant:
#!/usr/bin/perl -w
use strict;
my $string = 'one "two three" four five "six seven eight" nine';
my @list = split /
    [ ]"            # opening quotes
  |                 # or
    "[ ]            # closing quotes
  |
    [ ]             # a space
    (?!.*?\w")      # that's not before any number of characters followed
                    # by a closing quote (allows EOL at quote)
/x, $string;
OOC, is it possible to put a character group (and quantify it with a * + ? or {} ) into a look-ahead match?
Crap, this isn't quite there yet. The four and five are sticking together. If anyone has suggestions on how to fix that, I'd appreciate it.
$scratchpad_public = 0 unless $scratchpad;
is it possible to put a character group (and quantify it with a * + ? or {} ) into a look-ahead match?
Sure it is. Anything you can do in a plain match, you can do in a lookahead match.
For lookbehind, it's a different matter: you can only use fixed length lookbehind, so quantifiers (like * and +), and varied-length alternatives, are out. BTW if all alternatives have the same length, it is allowed, as in:
$_ = q[There's food at the bar.];
while(/(?<=foo|bar)(\S+)/g) {
print "$1\n";
}
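To illustrate the first point, here is a small sketch of a lookahead that contains quantified character classes. The sample text and variable names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $text = 'price: 100 USD, weight: 20 kg';

# Lookahead containing quantified pieces: match a number only when it
# is followed by whitespace and two or more uppercase letters.
my @amounts = $text =~ /(\d+)(?=\s+[A-Z]{2,})/g;

print "@amounts\n";
```

Only the number in front of "USD" is captured; the one in front of "kg" is skipped because the lookahead requires uppercase letters.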
Re: In need of a stupid regex trick
by pg (Canon) on Jan 04, 2004 at 22:12 UTC
use Data::Dumper;
use strict;
sub my_split {
    local $_ = shift;
    my $abc;
    s/(('|").*?\2)/ ($abc = $1) =~ s!\s+!\cA!g; $abc /ge; #!"
    grep{s/\cA/ /g, $_}split/\s+/;
}
my @pieces = my_split q/one "two three" four "five six seven" eight/;
print Dumper(\@pieces);
Re: In need of a stupid regex trick
by Anonymous Monk on Jan 05, 2004 at 21:00 UTC
my $s = ' one "two three" four five "six seven eight" nine';
my @w = $s =~ /
    \s* #strip whitespace outside of paren "blocks"
    (?:"(?{local $openq=1}))? #note fact of open-quote without storing
    ((??{$openq ? '[^"]*' : '\w+'})) #store block
    (?:(??{$openq ? '"' : '\b'})) #gobble up closeq or word-boundary (really nop in this case)
/gx;
local $" = ':';
print "@w\n";
,welchavw