Efficient split for alpha numeric pairs in a row

merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Efficient split for alpha numeric pairs in a row by choroba (Cardinal) on Jul 01, 2015 at 07:24 UTC
You can use a regular expression. The `/g` modifier in a while-loop will match as many times as possible:

```perl
#! /usr/bin/perl
use warnings;
use strict;

use Data::Dumper;

while (<DATA>) {
    my %hash;
    while (/([A-Z])([0-9.]+)/g) {
        my ($key, $num) = ($1, $2);
        $hash{$key} = $num;
    }
    print Dumper(\%hash);
}

__DATA__
N260G02X142.05Y649.62I77.33J58.34H2M25
N265M20
N270G45
N275G01X304.78Y608.8C45.91M25
N280M20
N285G46C0
N290G03X324.39Y638.4I-69.58J67.4H2M25
N295M20
N300G45
N305G01X180.04Y592.89C326.5M25
N310X279.77Y586.78C26.5
N315X195.39Y584.57C336.5
N320X228.31Y579.58C355.92
```

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Efficient split for alpha numeric pairs in a row by BrowserUk (Patriarch) on Jul 01, 2015 at 07:55 UTC
```perl
pp $_ for map{ { m[([A-Z])([0-9.]+)]g } } split"\n", <<EOD;
N260G02X142.05Y649.62I77.33J58.34H2M25
N265M20
N270G45
N275G01X304.78Y608.8C45.91M25
N280M20
N285G46C0
N290G03X324.39Y638.4I-69.58J67.4H2M25
N295M20
N300G45
N305G01X180.04Y592.89C326.5M25
N310X279.77Y586.78C26.5
N315X195.39Y584.57C336.5
N320X228.31Y579.58C355.92
EOD
;;
```

```
{ G => 02, H => 2, I => 77.33, J => 58.34, M => 25, N => 260, X => 142.05, Y => 649.62 }
{ M => 20, N => 265 }
{ G => 45, N => 270 }
{ C => 45.91, G => 01, M => 25, N => 275, X => 304.78, Y => 608.8 }
{ M => 20, N => 280 }
{ C => 0, G => 46, N => 285 }
{ G => 03, H => 2, J => 67.4, M => 25, N => 290, X => 324.39, Y => 638.4 }
{ M => 20, N => 295 }
{ G => 45, N => 300 }
{ C => 326.5, G => 01, M => 25, N => 305, X => 180.04, Y => 592.89 }
{ C => 26.5, N => 310, X => 279.77, Y => 586.78 }
{ C => 336.5, N => 315, X => 195.39, Y => 584.57 }
{ C => 355.92, N => 320, X => 228.31, Y => 579.58 }
```

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
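One thing worth noticing in the output above: the hash for line N290 has no `I` entry, because `[0-9.]+` cannot match the leading minus sign in `I-69.58`. If negative values matter, a minimal sketch along the following lines might help. It assumes a value has at most one leading `-` and otherwise keeps the same loose numeric class; `parse_pairs` is a made-up helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref of
# letter => value pairs, accepting an optional leading minus sign.
sub parse_pairs {
    my ($line) = @_;

    # In list context, /g returns every captured (letter, value) pair,
    # which is exactly the flat list a hash assignment wants.
    my %pair = $line =~ /([A-Z])(-?[0-9.]+)/g;
    return \%pair;
}

my $p = parse_pairs('N290G03X324.39Y638.4I-69.58J67.4H2M25');
print join( ' ', map { "$_=$p->{$_}" } sort keys %$p ), "\n";
# prints: G=03 H=2 I=-69.58 J=67.4 M=25 N=290 X=324.39 Y=638.4
```

The values stay strings, so `G` keeps its leading zero (`03`) unless it is used in a numeric context.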
Re^2: Efficient split for alpha numeric pairs in a row by merrymonk (Hermit) on Jul 01, 2015 at 08:24 UTC
Many thanks for the almost instant replies. This is just what I was looking for!
Re: Efficient split for alpha numeric pairs in a row by johngg (Canon) on Jul 01, 2015 at 09:55 UTC
An alternative approach: `split` at the points that are preceded by a letter and followed by a value, or vice versa.

```perl
use 5.014;
use Data::Dumper;

open my $inFH, q{<}, \ <<EOD or die $!;
N260G02X142.05Y649.62I77.33J58.34H2M25
N265M20
N270G45
N275G01X304.78Y608.8C45.91M25
N280M20
N285G46C0
N290G03X324.39Y638.4I-69.58J67.4H2M25
N295M20
N300G45
N305G01X180.04Y592.89C326.5M25
N310X279.77Y586.78C26.5
N315X195.39Y584.57C336.5
N320X228.31Y579.58C355.92
EOD

my @dataLines =
    map { { split m{(?x) (?<=[A-Z]) (?=[-\d]) | (?<=[-\d]) (?=[A-Z]) } } }
    map { chomp; $_ }
    <$inFH>;

close $inFH or die $!;

print Data::Dumper->Dumpxs( [ \ @dataLines ], [ qw{ *dataLines } ] );
```

I hope this is helpful.

Cheers,

JohnGG
Re^2: Efficient split for alpha numeric pairs in a row by BrowserUk (Patriarch) on Jul 01, 2015 at 10:18 UTC
`open my $inFH, q{<}, \ <<EOD or die $!;` Neat! I wish I'd thought of that. (You will, dear boy; you will :)
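For anyone who has not met the idiom being praised: passing a reference to a scalar as the third argument of `open` gives a filehandle that reads from that scalar in memory rather than from a file. A minimal sketch, with `$text` standing in for the heredoc (the variable names here are made up for illustration):

```perl
use strict;
use warnings;

# Sample text; any scalar holding newline-separated records would do.
my $text = "N265M20\nN270G45\n";

# A reference to a scalar in place of a file name makes open() read
# from memory.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
chomp( my @lines = <$fh> );
close $fh or die $!;

print scalar(@lines), " lines, first is $lines[0]\n";
# prints: 2 lines, first is N265M20
```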
Re: Efficient split for alpha numeric pairs in a row by AnomalousMonk (Archbishop) on Jul 01, 2015 at 17:13 UTC
Here's another, slightly different approach. It tries to factor regex elements, and to build an extraction regex (rather than a split pattern of some kind) based on a mapping of all valid commands to their possible values.

I make some assumptions:

- an `$integer` is zero or positive, never negative (update: and it never has a sign);
- certain real numbers like .0 or 0. need not be recognized;
- some commands like `N G` have `$integer` values only, and others like `X Y` have only `$real` values (the OP says a command is "a letter followed by a number which can be either an integer or real", implying that N, for instance, could have a real value);
- there is no inherent order in the occurrence of the `N G X Y` etc. commands in a command sequence.

Note that the command sequence is validated before commands are extracted (the undefined `'C'` command in one of the example sequences causes an abort). Also note that I use the `(?|pattern)` extension of Perl 5.10+ (see Extended Patterns in perlre); if you need to avoid this construct, I can provide an alternative.

```
c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
"my $integer = qr{ (?<! \d) \d+ (?! \d) }xms;
 my $real    = qr{ -? $integer [.] $integer }xms;
 ;;
 my %cmds = (
   N => $integer,  G => $integer,  H => $integer,  M => $integer,
   X => $real,     Y => $real,     I => $real,     J => $real,
   );
 ;;
 my ($rx_cmd) =
   map qr{ (?| $_) }xms,
   join q{ | },
   map qq{($_) ($cmds{$_})},
   keys %cmds
   ;
 ;;
 for my $s (qw(
   N260G02X142.05Y649.62I77.33J58.34H2M25  N265M20  N270G45
   N290G03X324.39Y638.4I-69.58J67.4H2M25
   N275G01X304.78Y608.8C45.91M25  N280M20
   )) {
   print qq{seq: '$s'};
   die qq{bad: '$s'} unless $s =~ m{ \A $rx_cmd+ \z }xms;
   my %parts = $s =~ m{ $rx_cmd }xmsg;
   dd \%parts;
   }
"
seq: 'N260G02X142.05Y649.62I77.33J58.34H2M25'
{ G => "02", H => 2, I => 77.33, J => 58.34, M => 25, N => 260, X => 142.05, Y => 649.62 }
seq: 'N265M20'
{ M => 20, N => 265 }
seq: 'N270G45'
{ G => 45, N => 270 }
seq: 'N290G03X324.39Y638.4I-69.58J67.4H2M25'
{ G => "03", H => 2, I => -69.58, J => 67.4, M => 25, N => 290, X => 324.39, Y => 638.4 }
seq: 'N275G01X304.78Y608.8C45.91M25'
bad: 'N275G01X304.78Y608.8C45.91M25' at -e line 1.
```

Give a man a fish: `<%-(-(-(-<`
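The post offers an alternative to the branch-reset construct without showing one. What follows is only a guess at one possible shape for it, not the author's own code: it reuses the same `$integer`, `$real` and `%cmds` definitions, but walks the string with `\G` and `/gc`, looking up each command letter's value pattern in the hash. The name `extract_cmds` and the choice to return `undef` for an invalid sequence (instead of dying) are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $integer = qr{ (?<! \d) \d+ (?! \d) }xms;
my $real    = qr{ -? $integer [.] $integer }xms;

my %cmds = (
    N => $integer, G => $integer, H => $integer, M => $integer,
    X => $real,    Y => $real,    I => $real,    J => $real,
);

# Hypothetical helper: returns a hash ref of command => value, or undef
# if the sequence contains an unknown command or a malformed value.
sub extract_cmds {
    my ($seq) = @_;
    my %parts;

    pos($seq) = 0;
    while ( $seq =~ m{ \G ([A-Z]) }xmsgc ) {
        my $cmd = $1;
        my $rx  = $cmds{$cmd} or return;          # unknown command letter
        $seq =~ m{ \G ($rx) }xmsgc or return;     # value of the wrong kind
        $parts{$cmd} = $1;
    }

    # /c kept pos() where matching stopped; anything left over is an error.
    return pos($seq) == length($seq) ? \%parts : undef;
}

for my $s (qw( N290G03X324.39Y638.4I-69.58J67.4H2M25 N275G01X304.78Y608.8C45.91M25 )) {
    my $parts = extract_cmds($s);
    print "$s: ",
        ( $parts ? join( ' ', map { "$_=$parts->{$_}" } sort keys %$parts ) : 'rejected' ),
        "\n";
}
```

Because validation and extraction happen in the same pass, a sequence is rejected at the first command that does not fit, which is why the second sample (with its undefined `C` command) comes back as rejected.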
Re: Efficient split for alpha numeric pairs in a row by locked_user sundialsvc4 (Abbot) on Jul 01, 2015 at 12:07 UTC
And, for the record, “the key piece of magic” is the `/g` modifier, and possibly also `/c`, both of which are described under, of all places, “Regexp Quote-Like Operators” in `perldoc perlop` (as referred to in the main regexp topic, `perldoc perlre`). With `/g`, a match in a `while`-loop, as shown, resumes where the previous one ended, so you can search repeatedly within a single string, using groups (in parentheses) to extract each part that has been matched; `/c` additionally keeps the current position when a match fails instead of resetting it. As shown in the examples given earlier, this lets you “roll up” such a string, viewing it exactly as the CNC designers intended: as a list of one or more variable/value pairs having no set delimiters between them. (Delimiters are unnecessary because alpha characters are distinct from digits and decimal points.)
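To make the role of `/c` a little more concrete, here is a small sketch that anchors each match with `\G` and then reports where scanning stopped. It is only an illustration under assumed names (`scan_pairs` is hypothetical), and it allows an optional minus sign in front of a value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the pairs found and the offset at which
# matching stopped (equal to the string length when all of it parsed).
sub scan_pairs {
    my ($str) = @_;
    my @pairs;

    pos($str) = 0;
    # \G pins each match to the end of the previous one; /c stops the
    # failed final match from resetting pos().
    while ( $str =~ /\G([A-Z])(-?[0-9.]+)/gc ) {
        push @pairs, [ $1, $2 ];
    }
    return ( \@pairs, pos($str) );
}

my ( $pairs, $stopped ) = scan_pairs('N265M20?X1.5');
printf "%d pairs, stopped at offset %d\n", scalar @$pairs, $stopped;
# prints: 2 pairs, stopped at offset 7
```

Without `\G`, the loop would silently skip over the stray `?` and carry on with `X1.5`; with it, the offset points at the first character that does not belong to a letter/value pair.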