merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

I need to analyse data used to control a machine tool. A sample is shown below. Each item in a row is a letter followed by a number, which can be either an integer or a real.
For each row I want to split the data into a hash so that, for the first row for example, I get:
$h{N} = 260; $h{G} = 02; $h{X} = 142.05; $h{Y} = 649.62; $h{I} = 77.33; $h{J} = 58.34; $h{H} = 2; $h{M} = 25;

N260G02X142.05Y649.62I77.33J58.34H2M25
N265M20
N270G45
N275G01X304.78Y608.8C45.91M25
N280M20
N285G46C0
N290G03X324.39Y638.4I-69.58J67.4H2M25
N295M20
N300G45
N305G01X180.04Y592.89C326.5M25
N310X279.77Y586.78C26.5
N315X195.39Y584.57C336.5
N320X228.31Y579.58C355.92

What is the most effective way of using Perl to ‘split’ the data so I can create the hash?

Replies are listed 'Best First'.
Re: Efficient split for alpha numeric pairs in a row
by choroba (Cardinal) on Jul 01, 2015 at 07:24 UTC
    You can use a regular expression. The /g modifier in a while-loop will match as many times as possible:
    #! /usr/bin/perl
    use warnings;
    use strict;

    use Data::Dumper;

    while (<DATA>) {
        my %hash;
        # Each match is one letter plus its number; the optional minus
        # sign lets negative values such as I-69.58 be captured too.
        while (/([A-Z])(-?[0-9.]+)/g) {
            my ($key, $num) = ($1, $2);
            $hash{$key} = $num;
        }
        print Dumper(\%hash);
    }

    __DATA__
    N260G02X142.05Y649.62I77.33J58.34H2M25
    N265M20
    N270G45
    N275G01X304.78Y608.8C45.91M25
    N280M20
    N285G46C0
    N290G03X324.39Y638.4I-69.58J67.4H2M25
    N295M20
    N300G45
    N305G01X180.04Y592.89C326.5M25
    N310X279.77Y586.78C26.5
    N315X195.39Y584.57C336.5
    N320X228.31Y579.58C355.92
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Efficient split for alpha numeric pairs in a row
by BrowserUk (Patriarch) on Jul 01, 2015 at 07:55 UTC

    pp $_ for map{ { m[([A-Z])(-?[0-9.]+)]g } } split"\n", <<EOD;
    N260G02X142.05Y649.62I77.33J58.34H2M25
    N265M20
    N270G45
    N275G01X304.78Y608.8C45.91M25
    N280M20
    N285G46C0
    N290G03X324.39Y638.4I-69.58J67.4H2M25
    N295M20
    N300G45
    N305G01X180.04Y592.89C326.5M25
    N310X279.77Y586.78C26.5
    N315X195.39Y584.57C336.5
    N320X228.31Y579.58C355.92
    EOD
    ;;
    { G => 02, H => 2, I => 77.33, J => 58.34, M => 25, N => 260, X => 142.05, Y => 649.62 }
    { M => 20, N => 265 }
    { G => 45, N => 270 }
    { C => 45.91, G => 01, M => 25, N => 275, X => 304.78, Y => 608.8 }
    { M => 20, N => 280 }
    { C => 0, G => 46, N => 285 }
    { G => 03, H => 2, I => -69.58, J => 67.4, M => 25, N => 290, X => 324.39, Y => 638.4 }
    { M => 20, N => 295 }
    { G => 45, N => 300 }
    { C => 326.5, G => 01, M => 25, N => 305, X => 180.04, Y => 592.89 }
    { C => 26.5, N => 310, X => 279.77, Y => 586.78 }
    { C => 336.5, N => 315, X => 195.39, Y => 584.57 }
    { C => 355.92, N => 320, X => 228.31, Y => 579.58 }

      Many thanks for the almost instant replies. This is just what I was looking for!
Re: Efficient split for alpha numeric pairs in a row
by johngg (Canon) on Jul 01, 2015 at 09:55 UTC

    An alternative approach: split at every point that is preceded by a letter and followed by a value, or vice versa.

    use 5.014;
    use Data::Dumper;

    open my $inFH, q{<}, \ <<EOD or die $!;
    N260G02X142.05Y649.62I77.33J58.34H2M25
    N265M20
    N270G45
    N275G01X304.78Y608.8C45.91M25
    N280M20
    N285G46C0
    N290G03X324.39Y638.4I-69.58J67.4H2M25
    N295M20
    N300G45
    N305G01X180.04Y592.89C326.5M25
    N310X279.77Y586.78C26.5
    N315X195.39Y584.57C336.5
    N320X228.31Y579.58C355.92
    EOD

    my @dataLines =
        map { { split m{(?x) (?<=[A-Z]) (?=[-\d]) | (?<=[-\d]) (?=[A-Z]) } } }
        map { chomp; $_ }
        <$inFH>;
    close $inFH or die $!;

    print Data::Dumper->Dumpxs( [ \ @dataLines ], [ qw{ *dataLines } ] );

    The output.

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Efficient split for alpha numeric pairs in a row
by AnomalousMonk (Archbishop) on Jul 01, 2015 at 17:13 UTC

    Here's another, slightly different approach. It tries to factor regex elements, and to build an extraction regex (rather than a split pattern of some kind) based on a mapping of all valid commands to their possible values. I make some assumptions:

    • an $integer is zero or positive, never negative (update: and it never has a sign);
    • certain real numbers like .0 or 0. need not be recognized;
    • some commands like N and G have $integer values only, and others like X and Y have only $real values (the OP says a command is "a letter followed by a number which can be either an integer or real", implying that N, for instance, could have a real value);
    • there is no inherent order in the occurrence of the N, G, X, Y etc. commands in a command sequence.
    Note that the command sequence is validated before commands are extracted (the undefined 'C' command in one of the example sequences causes an abort). Also note that I use the (?|pattern) extension of Perl 5.10+ (see Extended Patterns in perlre); if you need to avoid this construct, I can provide an alternative.
    c:\@Work\Perl\monks>perl -wMstrict -MData::Dump -le
    "my $integer = qr{ (?<! \d) \d+ (?! \d) }xms;
     my $real    = qr{ -? $integer [.] $integer }xms;
     ;;
     my %cmds = (
       N => $integer, G => $integer, H => $integer, M => $integer,
       X => $real, Y => $real, I => $real, J => $real,
     );
     ;;
     my ($rx_cmd) =
       map qr{ (?| $_) }xms,
       join q{ | },
       map qq{($_) ($cmds{$_})},
       keys %cmds
       ;
     ;;
     for my $s (qw(
       N260G02X142.05Y649.62I77.33J58.34H2M25  N265M20  N270G45
       N290G03X324.39Y638.4I-69.58J67.4H2M25
       N275G01X304.78Y608.8C45.91M25  N280M20
       )) {
       print qq{seq: '$s'};
       die qq{bad: '$s'} unless $s =~ m{ \A $rx_cmd+ \z }xms;
       my %parts = $s =~ m{ $rx_cmd }xmsg;
       dd \%parts;
       }
    "
    seq: 'N260G02X142.05Y649.62I77.33J58.34H2M25'
    { G => "02", H => 2, I => 77.33, J => 58.34, M => 25, N => 260, X => 142.05, Y => 649.62 }
    seq: 'N265M20'
    { M => 20, N => 265 }
    seq: 'N270G45'
    { G => 45, N => 270 }
    seq: 'N290G03X324.39Y638.4I-69.58J67.4H2M25'
    { G => "03", H => 2, I => -69.58, J => 67.4, M => 25, N => 290, X => 324.39, Y => 638.4 }
    seq: 'N275G01X304.78Y608.8C45.91M25'
    bad: 'N275G01X304.78Y608.8C45.91M25' at -e line 1.
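For readers on a Perl older than 5.10, the following is one possible way to get similar per-command validation without (?|pattern). It is only a sketch and not AnomalousMonk's own alternative: the helper name parse_seq is hypothetical, the N/G/H/M versus X/Y/I/J grouping simply repeats the assumptions listed above, and the lookarounds in $integer are dropped because each value is checked with \A and \z instead.

```perl
use strict;
use warnings;

# Value patterns per command (assumed grouping, not a full G-code spec).
my $integer = qr{ \d+ }xms;
my $real    = qr{ -? \d+ [.] \d+ }xms;
my %cmds = (
    ( map { $_ => $integer } qw(N G H M) ),
    ( map { $_ => $real    } qw(X Y I J) ),
);

# Hypothetical helper: returns a hash ref of command => value,
# or nothing (undef in scalar context) if the sequence is invalid.
sub parse_seq {
    my ($s) = @_;

    # Step 1: the whole string must consist of letter/number pairs.
    return unless $s =~ m{ \A (?: [A-Z] -? [0-9.]+ )+ \z }xms;

    # Step 2: extract the pairs, then check each value against the
    # pattern registered for its command letter.
    my %parts = $s =~ m{ ([A-Z]) ( -? [0-9.]+ ) }xmsg;
    for my $cmd (keys %parts) {
        my $rx = $cmds{$cmd} or return;    # unknown command letter
        return unless $parts{$cmd} =~ m{ \A $rx \z }xms;
    }
    return \%parts;
}

for my $s (qw(
    N265M20
    N290G03X324.39Y638.4I-69.58J67.4H2M25
    N275G01X304.78Y608.8C45.91M25
)) {
    print "$s: ", ( parse_seq($s) ? 'ok' : 'bad' ), "\n";
}
```

With the command table above, the first two sequences are accepted and the third is rejected because C has no entry in %cmds. As in the one-liner, a command letter that appears twice in one sequence keeps only its last value.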


    Give a man a fish:  <%-(-(-(-<

Re: Efficient split for alpha numeric pairs in a row
by locked_user sundialsvc4 (Abbot) on Jul 01, 2015 at 12:07 UTC

    And, for the record, “the key piece of magic” is the /g modifier, and possibly also /c. Both are described under, of all places, “Regexp Quote-Like Operators” in perldoc perlop (which the main regexp topic, perldoc perlre, refers to).

    The /g modifier lets you search repeatedly within a single string, in a while-loop as shown, using groups (in parentheses) to extract each part that has been matched; /c additionally keeps the current match position when a match fails. As shown in the examples given earlier, this lets you “roll up” such a string, viewing it exactly as the CNC designers intended: as a list of one or more variable/value pairs with no set delimiters between them. (Delimiters are unnecessary because alpha characters are distinct from digits and decimal points.)
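As a small illustration of how /g and /c can work together, the sketch below walks one block with a \G-anchored match and reports anything it could not consume. The sub name tokenize_block and the exact number pattern are assumptions made for this example; they do not come from the earlier replies.

```perl
use strict;
use warnings;

# Tokenize one block such as 'N290G03X324.39I-69.58' into letter/value pairs.
# Returns (\%pairs, $rest), where $rest is any text left unparsed.
sub tokenize_block {
    my ($block) = @_;
    my %pairs;
    pos($block) = 0;

    # \G anchors each attempt where the previous match ended, and
    # /c keeps pos() in place when the match finally fails.
    while ($block =~ /\G([A-Z])(-?\d+(?:\.\d+)?)/gc) {
        $pairs{$1} = $2;
    }

    my $rest = substr $block, pos($block);
    return (\%pairs, $rest);
}

my ($pairs, $rest) = tokenize_block('N290G03X324.39Y638.4I-69.58J67.4H2M25');
print join(' ', map { "$_=$pairs->{$_}" } sort keys %$pairs), "\n";
print "unparsed: '$rest'\n" if length $rest;
```

Without \G the loop would silently skip over characters it does not recognise; with \G and /c, scanning stops at the first unexpected character and pos() shows where that happened, which makes malformed blocks easy to spot.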