Re: Parsing a line of text items
by philipbailey (Curate) on Mar 30, 2021 at 12:23 UTC
|
I often use Text::ParseWords for this problem. It has the advantage of being a core module.
use strict;
use warnings;
use feature "say";
use Text::ParseWords;
my $args = '23 45.67 "John Marcus" Surname';
my @parsed = parse_line('\s+', 0, $args);
say for @parsed;
Output:
23
45.67
John Marcus
Surname
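A minimal sketch of how the second argument of parse_line (the "keep" flag) and the '\s+' delimiter behave when fields are separated by runs of whitespace. The sample line is made up for illustration. Leading and trailing whitespace is deliberately left out here; shellwords is documented to ignore leading whitespace, so it may be the better entry point for untrimmed input.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical sample: fields separated by runs of spaces and a tab.
my $line = qq{23   45.67\t"John Marcus"   Surname};

# keep = 0: quotes are removed from the returned fields.
my @plain = parse_line('\s+', 0, $line);

# keep = 1: quotes are left in place.
my @kept = parse_line('\s+', 1, $line);

print join('|', @plain), "\n";
print join('|', @kept),  "\n";
```

Because the delimiter is a regex, a run of several spaces or tabs counts as a single separator.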
| [reply] [d/l] [select] |
|
|
Thanks. I had no idea Text::ParseWords existed. This is the ideal solution, and it is in the core!
I also tested Text::CSV; it is good, but it left some problems, especially with multiple whitespace characters between words.
| [reply] |
C:\Strawberry\perl\bin>corelist Text::ParseWords
Data for 2021-01-23
Text::ParseWords was first released with perl 5
Though it exports a lot by default:
our @EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
And you can tell the documentation is old; it could use more examples.
| [reply] [d/l] [select] |
Re: Parsing a line of text items
by hippo (Archbishop) on Mar 30, 2021 at 11:25 UTC
|
use strict;
use warnings;
use Text::CSV;
use Test::More tests => 2;
my $in = '23 45.67 "John Marcus" Surname';
my $want = [23, 45.67, 'John Marcus', 'Surname'];
my $csv = Text::CSV->new ({sep_char => ' '});
ok $csv->parse ($in), 'Parsing';
is_deeply [$csv->fields], $want, 'Fields match';
You will probably want to extend the tests to better reflect your real-world requirements.
| [reply] [d/l] |
Re: Parsing a line of text items
by choroba (Cardinal) on Mar 30, 2021 at 12:11 UTC
|
Use glob. But make sure the input doesn't contain glob metacharacters such as *, ?, [], {}, ~ or a backslash.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
sub parse_args {
my ($input) = @_;
return [glob $input]
}
use Test::More tests => 1;
is_deeply parse_args('23 45.67 "John Marcus" Surname'),
[23, 45.67, 'John Marcus', 'Surname'];
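To illustrate the caveat about metacharacters, here is a hedged sketch. The helper name glob_split and its refuse-and-return-undef policy are assumptions for this example, not part of the code above. It shows that brace expansion happens even when no matching files exist, and one way to guard against it.

```perl
use strict;
use warnings;

# Hypothetical helper: split with glob only when the input is free of
# characters that glob treats specially; return undef otherwise.
sub glob_split {
    my ($input) = @_;
    return undef if $input =~ /[*?\[\]{}~\\]/;
    return [ glob $input ];
}

# Brace expansion happens even when no matching files exist.
my @expanded = glob 'x{a,b} y';
print "@expanded\n";

my $ok  = glob_split('23 45.67 "John Marcus" Surname');
my $bad = glob_split('x{a,b} y');
print defined $bad ? "split\n" : "refused\n";
```

Refusing the input outright is only one possible policy; escaping the metacharacters or switching to Text::ParseWords are alternatives.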
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] [select] |
Re: Parsing a line of text items (updated)
by AnomalousMonk (Archbishop) on Mar 30, 2021 at 16:19 UTC
|
A Text::CSV (or Text::CSV_XS for speed)
solution seems very appropriate, but if you need to roll your own,
maybe something like:
Win8 Strawberry 5.30.3.1 (64) Tue 03/30/2021 11:53:39
C:\@Work\Perl\monks
>perl -Mstrict -Mwarnings
use 5.010; # needs (?|...) branch reset
my $rx_dq_body = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
my $rx_unquoted = qr{ \S+ }xms;
for my $args (
'', ' ',
'23 45.67 "John Marcus O\"Ddly" Surname',
'"only \"quoted\" thing"',
'no quoted stuff',
) {
my $got_parsed_args =
my @parsed_args =
$args =~ m{ \G \s* (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg;
print ">$args< -> ";
if ($got_parsed_args) {
printf "%s \n", join ' ', map ">$_<", @parsed_args;
}
else {
print "nada \n";
}
}
^Z
>< -> nada
> < -> nada
>23 45.67 "John Marcus O\"Ddly" Surname< -> >23< >45.67< >John Marcus O\"Ddly< >Surname<
>"only \"quoted\" thing"< -> >only \"quoted\" thing<
>no quoted stuff< -> >no< >quoted< >stuff<
This needs Perl version 5.10+ for the (?|...) "branch reset"
operator, but modification for pre-5.10 Perls is simple; let me
know if you need it. The $rx_dq_body regex to match a double-quoted
body supports embedded escaped double-quotes (and any other escaped
character). You can play with this regex to get exactly what you
want/need.
Of course, lots of tests should be done to verify this (or any other
solution) really does what you want.
Update: For some reason, I included a \G \s* group in
the regex above. It is entirely unnecessary although it does no harm
AFAICT. The match regex
m{ (?| " ($rx_dq_body) " | ($rx_unquoted)) }xmsg
should be exactly equivalent.
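For Perls older than 5.10, one possible modification (a sketch of my own, not necessarily what the author had in mind) is to drop the branch reset, let the two alternatives capture into $1 and $2, and pick whichever is defined on each match. The sub name parse_args is hypothetical.

```perl
use strict;
use warnings;

my $rx_dq_body  = qr{ [^\\"]* (?: \\. [^\\"]* )* }xms;
my $rx_unquoted = qr{ \S+ }xms;

# Hypothetical helper, not from the original post.
sub parse_args {
    my ($args) = @_;
    my @parsed;
    # Without branch reset the alternatives capture into $1 and $2,
    # so take whichever one is defined on each match.
    while ($args =~ m{ " ($rx_dq_body) " | ($rx_unquoted) }xmsg) {
        push @parsed, defined $1 ? $1 : $2;
    }
    return @parsed;
}

print join(' ', map { ">$_<" }
    parse_args('23 45.67 "John Marcus O\"Ddly" Surname')), "\n";
```

As in the original, escaped characters are matched but not unescaped, so the backslash stays in the returned field.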
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
|
|
I can understand the challenge to hack it by yourself ... :)
But I think the suggested Text::ParseWords is core and offers everything I expect from parsing a command line.
It also has tests, is customizable, and the source is well structured and documented.
So if I "wanna roll my own" and need to make special adjustments (e.g. paired {quotes}), I can take its code as a base.
DB<94> use Text::ParseWords qw/shellwords/
DB<96> x shellwords(q{this is 'an example' "with different quoting and \" escaping" including\ escaped\ whitespace})
0 'this'
1 'is'
2 'an example'
3 'with different quoting and " escaping'
4 'including escaped whitespace'
DB<97>
If larger files need to be parsed I'll consider a dependency on Text::CSV, but this really looks good.
| [reply] [d/l] [select] |
|
|
I would tend to agree that an approach using a reliable, common
module like Text::ParseWords (of which I had not
previously been aware -- thanks, philipbailey++) or
Text::CSV is usually best. But I wanted to give an
example of a "pure" regex approach.
As an aside, I think it's worth emphasizing again that whatever
approach is taken, a thorough suite of tests for the final code is
advisable even if the approach is based on well-tested modules.
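As a starting point for such a suite, here is a small table-driven sketch using Test::More against shellwords from Text::ParseWords. The cases are illustrative only and would need to be replaced with inputs that reflect the real requirements.

```perl
use strict;
use warnings;
use Test::More;
use Text::ParseWords qw(shellwords);

# Each case: [ input line, expected fields ]. Illustrative cases only.
my @cases = (
    [ '',                               [] ],
    [ '23 45.67 "John Marcus" Surname', [ '23', '45.67', 'John Marcus', 'Surname' ] ],
    [ "  padded \t line  ",             [ 'padded', 'line' ] ],
    [ q{say 'single quoted' too},       [ 'say', 'single quoted', 'too' ] ],
    [ q{escaped\ space},                [ 'escaped space' ] ],
);

for my $case (@cases) {
    my ($input, $want) = @$case;
    is_deeply [ shellwords($input) ], $want, "parse >$input<";
}

done_testing;
```

Keeping the cases in one table makes it cheap to add each new edge case as it turns up in real data.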
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] |
|
|
Re: Parsing a line of text items
by LanX (Saint) on Mar 30, 2021 at 11:21 UTC
|
Update: Scratch this, it doesn't work. It could, but it takes too much effort to figure out.
Better to use the Text::CSV approach.
maybe
DB<45> p $_
23 45.67 "John Marcus" Surname 23 45.67 "John Marcus" Surname
DB<46> say $2 while /(?:^|("|\s+))(.*?)\1/g
45.67
John Marcus
Surname
45.67
John Marcus
Surname
DB<47>
Here be dragons; no guarantee whatsoever.
Edit: As expected, it only works if the input ends with whitespace, and I had problems using (?:$|\1) at the end.
| [reply] [d/l] [select] |
|
|
DB<89> p "'$_'"
' 23 45.67 "John Marcus" Surname 23 45.67 "John Marcus" Surname extra '
DB<90> say "'$3'" while /\s*(\s)("?)(.*?)\2(?=\1)/g
'23'
'45.67'
'John Marcus'
'Surname'
'23'
'45.67'
'John Marcus'
'Surname'
'extra'
DB<91>
For testing I'd suggest automatically generating random input strings. That way you can cover a large set of cases.
NB: there are still dragons here.
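A rough sketch of that random-input idea, checked here against shellwords from Text::ParseWords rather than the regex above (that substitution is my assumption; any parser could be plugged in). The generator is deliberately simple: fields contain only lowercase letters, digits and single spaces, and multi-word fields are wrapped in double quotes.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

srand(42);    # fixed seed so a failing case can be reproduced

my @chars = ('a' .. 'z', 0 .. 9);

# Build one random word of 1 to 6 characters.
sub random_word {
    return join '', map { $chars[ int(rand(@chars)) ] } 1 .. 1 + int(rand(6));
}

my $failures = 0;
for my $round (1 .. 200) {
    # Each field is one to three words joined by single spaces.
    my @fields = map {
        join ' ', map { random_word() } 1 .. 1 + int(rand(3))
    } 1 .. 1 + int(rand(5));

    # Quote fields containing a space; join with random whitespace runs.
    my $line = '';
    for my $field (@fields) {
        $line .= (' ', "\t", '   ')[ int(rand(3)) ] if length $line;
        $line .= $field =~ / / ? qq{"$field"} : $field;
    }

    my @got = shellwords($line);
    if (join("\0", @got) ne join("\0", @fields)) {
        $failures++;
        print "MISMATCH for >$line<\n";
    }
}
print "failures: $failures\n";
```

Extending the generator with escaped quotes, empty fields or leading and trailing whitespace would probe the harder cases.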
| [reply] [d/l] |