
That is pretty horrible, but at least it appears well-formed: it is a space-separated list of decimal numbers and NaNs. I would probably just split it and go from there, TBH.

#!/usr/bin/env perl

use strict;
use warnings;

my $str = '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9 1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2 1016.4 7.7 21.6 13.2 67.7 220.0 6.71 4.37 5.1 0.0 0.8 0.8 0.0 0.0 749.9 13780637.0 NaN 12614863.0 4011495.3 44407.4 11.3 1016.4 7.7 21.7 12.3 64.8 216.0 6.64 3.98 5.31 0.0 0.0 0.0 0.0 0.0 693.8 16278537.0 NaN 14898300.0 5267137.5 47598.7 11.1 1016.1 7.7 22.1 9.8 56.4 224.0 6.72 4.7 4.81 0.0 0.0 0.0 0.0 0.0 613.8 18488268.0 NaN 16917212.0 7048784.0 54656.4 11.3 1015.9 7.7 21.6 10.3 59.0 223.0 6.52 4.48 4.74 0.0 0.0 0.0 0.0 0.0 508.5 20319020.0 NaN 18588124.0 8487086.0 52728.9 11.1 1016.0 7.7 20.6 11.8 66.4 217.0 6.22 3.81 4.93 0.0 0.2 0.0 0.0 0.2 387.0 21712000.0 NaN 19859482.0 9532787.0 45898.5 10.5 1016.1 7.7 19.9 12.7 71.8 219.0 5.97 3.74 4.64 0.0 3.3 0.0 0.0 3.3 257.6 22639382.0 NaN 20706258.0 10155480.0 39930.0 9.9 1016.4 7.7 19.4 13.2 74.9 221.0 5.25 3.43 3.97 0.0 1.8 0.0 0.0 1.8 140.2 23144020.0 NaN 21166678.0 10436559.0 36483.1 9.3 1016.4 7.7 19.2 12.5 73.1 232.0 4.04 3.19 2.44 0.0 6.8 2.7 4.3 0.0 47.7 23315526.0 NaN 21323276.0 10494672.0 38582.3 8.0 1016.3 7.7 19.2 11.2 68.6 262.0 2.52 2.49 0.34 0.0 53.8 0.0 24.2 39.1 1.8 23322102.0 NaN 21329584.0 10494672.0 43729.5 6.0 1016.5 7.7 19.0 11.0 68.8 247.0 2.3 2.08 0.98 0.0 100.0 0.0 99.5 100.0 0.1 23322252.0 NaN 21329518.0 10494672.0 43367.8 3.7 1016.7 7.7 18.6 12.0 73.4 225.0 2.3 1.57 1.67 0.0 54.2 0.0 46.3 14.7 0.0 23322252.0 NaN 21329518.0 10494672.0 38219.0 3.6 1016.8 7.7 18.4 14.1 81.9 196.0 2.97 0.75 2.88 0.0 86.4 0.1 82.5 21.9 0.1 23322356.0 NaN 21329518.0 10494672.0 28237.8 4.4 1017.0 7.7 18.5 14.4 82.5 199.0 3.04 0.87 2.92 0.0 99.7 53.1 65.1 98.6 0.0 23322262.0 NaN 21329518.0 10494672.0 27807.3 5.0 1017.0 7.7 18.7 14.1 80.8 195.0 2.52 0.57 2.46 0.0 100.0 71.6 23.1 99.9 0.1 23322488.0 NaN 21329518.0 10494672.0 29676.3 4.6 1017.2 7.7 18.8 14.0 80.0 196.0 2.96 0.78 2.85 0.18 100.0 100.0 100.0 99.7 0.4 23323732.0 NaN 21330926.0 10494672.0 7893.4 4.6 1017.1 7.7 18.6 14.2 81.3 155.0 2.05 -0.98 1.8 1.3 93.7 58.4 63.3 57.6 15.0 23377624.0 NaN 21380274.0 10494672.0 29305.2 4.6 1017.1 7.7 18.6 13.8 80.1 170.0 3.61 -0.55 3.56 1.3 92.8 89.9 30.2 1.7 72.1 23637418.0 NaN 21617728.0 10494761.0 31078.9 5.7 1017.3 7.7 18.9 14.9 83.3 178.0 2.97 -0.13 2.98 1.3 86.3 55.2 68.3 0.0 130.9 24108452.0 NaN 22048990.0 10494719.0 27161.4 5.4 1017.5 7.7 19.7 15.3 81.4 197.0 2.29 0.48 2.24 1.3 57.2 22.3 44.9 0.0 302.4 25197118.0 NaN 23044054.0 10494678.0 28930.3 4.8 1017.7 7.7 20.4 14.6 76.5 198.0 1.82 0.28 1.78 1.3 66.0 10.1 62.2 0.0 413.4 26685568.0 NaN 24404956.0 10494752.0 34279.5 4.0 1017.5 7.7 20.7 13.8 72.6 161.0 2.67 -1.21 2.33 1.3 84.8 0.0 84.8 0.0 429.6 28232080.0 NaN 25819190.0 10495451.0 38634.6 4.6 1017.4 7.7 21.2 13.9 71.5 159.0 3.15 -1.39 2.81 1.3 97.7 0.2 97.7 0.0 444.8 29833140.0 NaN 27281036.0 10495527.0 39613.4 5.3 1017.6 7.7 22.0 13.7 68.4 161.0 2.93 -1.21 2.62 1.3 86.6 1.1 71.8 51.9 558.1 31842300.0 NaN 29115360.0 10495682.0 42999.5 5.7 1017.4 7.7 23.6 11.8 58.0 143.0 2.81 -1.89 2.04 1.3 5.5 0.0 4.9 0.6 628.6 34105344.0 NaN 31181590.0 10496878.0 53270.7 5.5 1017.3 7.7 23.9 10.7 54.4 139.0 3.68 -2.68 2.49 1.3 10.6 0.0 7.9 2.9 673.6 36530392.0 NaN 33394718.0 10504540.0 55330.6 6.5 1017.2 7.7 23.6 11.7 57.8 141.0 4.98 -3.24 3.81 1.3 37.6 0.0 5.4 34.0 671.6 38948084.0 NaN 35600932.0 10572389.0 53759.2 8.3 1017.3 7.7 23.0 12.4 61.2 145.0 5.0 -2.93 4.05 1.3 24.6 0.0 24.6 0.0 563.3 40976088.0 NaN 37452260.0 10577521.0 51063.5 8.4 1017.1 7.7 22.9 13.0 63.6 145.0 4.66 -2.77 3.78 1.3 28.6 0.0 7.3 22.9 465.8 42652852.0 NaN 38982672.0 10578041.0 48385.1 8.3 1017.0 7.7 22.8 12.7 62.7 143.0 4.44 -2.77 3.46 1.3 38.8 0.0 34.2 6.9 350.8 43915868.0 NaN 40134736.0 10578632.0 49670.0 7.6 1016.7 7.7 22.4 12.8 64.2 138.0 4.17 -2.85 3.03 1.3 10.5 0.0 2.4 8.3 232.2 44751760.0 NaN 40897580.0 10579198.0 48024.7 7.1 1016.5 7.7 21.7 14.3 71.1 128.0 3.16 -2.53 1.89 1.3 0.9 0.0 0.0 0.9 131.4 45224988.0 NaN 41329512.0 10638878.0 40243.3 6.6 1016.6 7.7 21.2 14.9 74.9 124.0 2.97 -2.45 1.71 1.3 25.2 0.0 2.4 23.3 44.4 45384908.0 NaN 41475220.0 10681477.0 35910.2 4.9 1016.4 7.7 20.9 15.2 77.1 119.0 2.62 -2.3 1.25 1.3 6.4 0.0 5.8 0.7 1.6 45390616.0 NaN 41480160.0 10681174.0 33633.9 4.5 1016.3 7.7 20.6 15.2 78.0 77.0 3.04 -2.98 -0.49 1.3 1.5 0.0 1.0 0.4 0.1 45390552.0 NaN 41479900.0 10681174.0 32898.9 4.4 1016.3 7.7 19.8 15.5 82.0 83.0 3.6 -3.54 -0.59 1.3 39.6 1.5 0.0 38.6 0.1 45390516.0 NaN 41479900.0 10681174.0 28384.6 5.4 1016.2 7.7 19.8 15.2 80.9 80.0 3.99 -3.89 -0.88 1.3 79.8 0.0 0.1 79.8 0.0 45390256.0 NaN 41479900.0 10681174.0 29798.2 6.2 1016.1 7.7 19.9 14.5 78.0 83.0 4.55 -4.5 -0.65 1.3 72.8 1.8 5.5 70.7 0.1 45390380.0 NaN 41479900.0 10681174.0 33125.9 6.9 1015.7 7.7 19.7 14.8 79.8 79.0 4.93 -4.83 -1.06 1.3 82.5 4.1 0.9 81.6 0.1 45390640.0 NaN 41479900.0 10681174.0 31134.5 7.6 1015.4 7.7 19.6 15.2 81.5 85.0 5.18 -5.15 -0.54 1.3 60.2 1.5 9.8 55.2 2.6 45399888.0 NaN 41488504.0 10681174.0 29202.1 8.0 1015.0 7.7 19.7 16.0 84.4 90.0 5.29 -5.28 -0.08 1.3 72.2 0.4 15.8 66.8 32.6 45517296.0 NaN 41595768.0 10681174.0 25601.0 8.5 1014.9 7.7 19.9 16.2 84.5 91.0 5.63 -5.62 -0.07 1.3 62.2 0.1 3.6 60.7 99.1 45873920.0 NaN 41921652.0 10681188.0 25504.3 8.9 1014.7 7.7 20.5 16.2 82.2 97.0 5.27 -5.26 0.48 1.3 93.6 2.4 5.9 93.0 169.2 46483036.0 NaN 42477320.0 10681384.0 28098.3 8.7 1014.6 7.7 20.8 16.1 80.8 100.0 5.36 -5.29 0.87 1.3 99.7 2.6 12.5 99.7 196.9 47191680.0 NaN 43123740.0 10681074.0 29718.5 8.7 1014.2 7.7 21.7 16.3 78.2 101.0 5.17 -5.08 0.9 1.3 100.0 0.0 5.8 100.0 290.9 48239060.0 NaN 44080204.0 10681266.0 32552.5 8.4 1014.1 7.7 22.3 16.4 76.7 102.0 5.69 -5.59 1.04 1.3 100.0 0.0 0.3 100.0 344.6 49479400.0 NaN 45212932.0 10694658.0 34180.5 9.1 1013.5 7.7 22.0 16.4 77.6 94.0 6.21 -6.2 0.29 1.3 100.0 0.0 14.7 100.0 258.3 50409360.0 NaN 46061948.0 10725906.0 33338.7 9.7 1013.5 7.7 21.4 15.2 75.4 95.0 6.7 -6.69 0.48 1.3 100.0 0.0 79.6 100.0 115.0 50823272.0 NaN 46440372.0 10725795.0 35955.3 10.4 1013.1 7.7 22.0 15.2 73.4 91.0 6.86 -6.86 0.07 1.3 99.5 0.0 78.3 97.8 235.6 51671476.0 NaN 47214844.0 10727690.0 38242.3 10.8';

my @fields = split / /, $str;
my $NaNcount = grep { $_ eq 'NaN' } @fields;
print "There are " . scalar @fields . " fields in the line of which $NaNcount are NaN.\n";

If you really only want the first four, then split / /, $str, 5 will bundle all the stuff you don't want into the unused 5th list item.
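A small sketch of that limit in action, using only the first eight values of the first reading as sample input (the variable names are illustrative, taken from the field list given later in the thread). With a limit of 5, split returns at most five items, and the last one holds everything after the fourth separator, unsplit.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input: the first eight values of the first reading
my $str = '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94';

# At most 5 items: four fields plus the untouched remainder
my ($pressure, $geop_height, $temperature, $dew_point, $rest) = split / /, $str, 5;

print "Pressure: $pressure, Temperature: $temperature\n";
print "Remainder: '$rest'\n";
```

Here `$rest` ends up as `'72.9 215.0 6.64 3.94'`, which you can simply ignore.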

HTH.


🦛

Re^4: Repeating a capture group pattern within a pattern
by mldvx4 (Friar) on Jul 15, 2024 at 11:43 UTC

    Thanks. Is there a way to use split() on blocks of 21 items at a time? The readings come in blocks of 21 data points with the data points being determined by position in the sequence: Pressure, Geop Height, Temperature, Dew Point, Humidity, Wind Direction, Wind Speed MS, Wind UMS, Wind VMS, Precipitation Amount, Total Cloud Cover, Low Cloud Cover, Medium Cloud Cover, High Cloud Cover, Radiation Global, Radiation Global Accumulation, Radiation Net Surface LW Accumulation, Radiation Net Surface SW Accumulation, Radiation SW Accumulation, Visibility, and Wind Gusts.

      I would probably do something like this to make an array of hashrefs.

      #!/usr/bin/perl

      use strict; # https://perlmonks.org/?node_id=11160609
      use warnings;
      use List::AllUtils qw( pairwise bundle_by );

      my @keys = split /, /, 'Pressure, Geop Height, Temperature, Dew Point, Humidity, Wind Direction, Wind Speed MS, Wind UMS, Wind VMS, Precipitation Amount, Total Cloud Cover, Low Cloud Cover, Medium Cloud Cover, High Cloud Cover, Radiation Global, Radiation Global Accumulation, Radiation Net Surface LW Accumulation, Radiation Net Surface SW Accumulation, Radiation SW Accumulation, Visibility, Wind Gusts';

      # Take the values 21 at a time; pairwise zips each group with @keys,
      # and the leading + makes { ... } an anonymous hash, not a block
      my @datasets = bundle_by { +{ pairwise { $a, $b } @keys, @_ } } 21, split ' ', getdata();

      use Data::Dump 'dd';
      dd @datasets[0 .. 2];

      sub getdata {
          # the data string is abbreviated with '...' in this post
          return '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 ... 10727690.0 38242.3 10.8';
      }

      And here's what the first 3 elements of the answer array are:

      (
        {
          "Dew Point" => 13.8,
          "Geop Height" => 7.7,
          "High Cloud Cover" => "0.0",
          "Humidity" => 72.9,
          "Low Cloud Cover" => 31.3,
          "Medium Cloud Cover" => "0.0",
          "Precipitation Amount" => "0.0",
          "Pressure" => 1016.1,
          "Radiation Global" => 716.7,
          "Radiation Global Accumulation" => "8461717.0",
          "Radiation Net Surface LW Accumulation" => "NaN",
          "Radiation Net Surface SW Accumulation" => "7750386.0",
          "Radiation SW Accumulation" => "2224256.0",
          "Temperature" => 20.6,
          "Total Cloud Cover" => 31.3,
          "Visibility" => 38507.5,
          "Wind Direction" => "215.0",
          "Wind Gusts" => 10.9,
          "Wind Speed MS" => 6.64,
          "Wind UMS" => 3.94,
          "Wind VMS" => 5.33,
        },
        {
          "Dew Point" => 13.7,
          "Geop Height" => 7.7,
          "High Cloud Cover" => "0.0",
          "Humidity" => 71.1,
          "Low Cloud Cover" => 2.1,
          "Medium Cloud Cover" => "0.0",
          "Precipitation Amount" => "0.0",
          "Pressure" => 1016.4,
          "Radiation Global" => 727.6,
          "Radiation Global Accumulation" => "11081057.0",
          "Radiation Net Surface LW Accumulation" => "NaN",
          "Radiation Net Surface SW Accumulation" => "10146792.0",
          "Radiation SW Accumulation" => 2372829.8,
          "Temperature" => 21.1,
          "Total Cloud Cover" => 2.1,
          "Visibility" => 40682.2,
          "Wind Direction" => "222.0",
          "Wind Gusts" => 11.2,
          "Wind Speed MS" => 6.82,
          "Wind UMS" => 4.67,
          "Wind VMS" => 4.95,
        },
        {
          "Dew Point" => 13.2,
          "Geop Height" => 7.7,
          "High Cloud Cover" => "0.0",
          "Humidity" => 67.7,
          "Low Cloud Cover" => 0.8,
          "Medium Cloud Cover" => "0.0",
          "Precipitation Amount" => "0.0",
          "Pressure" => 1016.4,
          "Radiation Global" => 749.9,
          "Radiation Global Accumulation" => "13780637.0",
          "Radiation Net Surface LW Accumulation" => "NaN",
          "Radiation Net Surface SW Accumulation" => "12614863.0",
          "Radiation SW Accumulation" => 4011495.3,
          "Temperature" => 21.6,
          "Total Cloud Cover" => 0.8,
          "Visibility" => 44407.4,
          "Wind Direction" => "220.0",
          "Wind Gusts" => 11.3,
          "Wind Speed MS" => 6.71,
          "Wind UMS" => 4.37,
          "Wind VMS" => 5.1,
        },
      )
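If installing List::AllUtils is not an option, roughly the same array of hashrefs can be built with core Perl only. This is a sketch, not the code above: it swaps bundle_by/pairwise for splice (to take 21 values at a time) and a hash slice (to pair them with the key names). The input is just the first two readings from the data string, used as a sample.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Field names in the order given in the question (21 per reading)
my @keys = (
    'Pressure', 'Geop Height', 'Temperature', 'Dew Point', 'Humidity',
    'Wind Direction', 'Wind Speed MS', 'Wind UMS', 'Wind VMS',
    'Precipitation Amount', 'Total Cloud Cover', 'Low Cloud Cover',
    'Medium Cloud Cover', 'High Cloud Cover', 'Radiation Global',
    'Radiation Global Accumulation', 'Radiation Net Surface LW Accumulation',
    'Radiation Net Surface SW Accumulation', 'Radiation SW Accumulation',
    'Visibility', 'Wind Gusts',
);

# Sample input: the first two readings from the data string
my $data = join ' ',
    '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9',
    '1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2';

my @values = split ' ', $data;
die "Item count is not a multiple of ", scalar @keys, "\n" if @values % @keys;

my @datasets;
while (my @chunk = splice @values, 0, scalar @keys) {
    my %reading;                 # a fresh hash for every reading
    @reading{@keys} = @chunk;    # hash slice: key N gets value N
    push @datasets, \%reading;
}

printf "%d readings; Pressure of the second is %s\n",
    scalar @datasets, $datasets[1]{Pressure};
```

Declaring `%reading` inside the loop matters: each iteration gets a new hash, so every element of `@datasets` points at a different reading.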

      I would first split and then process in chunks of 21 items:

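      use Carp qw(croak);

      # split ' ' behaves like /\s+/ but skips leading whitespace,
      # so there is no empty first field to throw the count off
      my @values = split ' ', $weatherdata;
      croak "Invalid item count in weather data" unless @values % 21 == 0;
      while ( my @row = splice @values, 0, 21 ) {
          # process one 21-item reading in @row here
      }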
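splice consumes `@values` as it goes. If the flat list is still needed afterwards, a non-destructive variant steps an index through it 21 at a time and takes an array slice for each reading. A minimal sketch; the two-reading sample and collecting only Temperature (the 3rd field in the question's list) are purely illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $n = 21;    # data points per reading

# Sample input: the first two readings from the data string
my $weatherdata = join ' ',
    '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9',
    '1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2';

my @values = split ' ', $weatherdata;
die "Invalid item count in weather data\n" if @values % $n;

my @temperatures;
for (my $i = 0; $i < @values; $i += $n) {
    my @row = @values[$i .. $i + $n - 1];    # slice copies one reading; @values is untouched
    push @temperatures, $row[2];             # Temperature is the 3rd field (index 2)
}

print "Temperatures: @temperatures\n";
print "Items still in \@values: ", scalar @values, "\n";
```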
        Is there a way to use split on blocks of 21 items at a time?

      Sure. Ask split for the first 22 items (n+1) by passing it a limit, then redo the process with the last item (which will contain the remaining bits). Repeat until you run out of data.
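The repeated split described above might be sketched like this. It assumes the readings are whitespace-separated; a limit of n+1 (22) makes split return the first 21 items plus one final item holding the unsplit remainder, which is fed back in on the next pass. The sample data is again just the first two readings.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $n = 21;    # data points per reading

# Sample input: the first two readings from the data string
my $rest = join ' ',
    '1016.1 7.7 20.6 13.8 72.9 215.0 6.64 3.94 5.33 0.0 31.3 31.3 0.0 0.0 716.7 8461717.0 NaN 7750386.0 2224256.0 38507.5 10.9',
    '1016.4 7.7 21.1 13.7 71.1 222.0 6.82 4.67 4.95 0.0 2.1 2.1 0.0 0.0 727.6 11081057.0 NaN 10146792.0 2372829.8 40682.2 11.2';

my @readings;
while (defined $rest && length $rest) {
    # Limit n+1: items 1..n form one reading, item n+1 is everything left over
    my @row = split ' ', $rest, $n + 1;
    $rest = @row > $n ? pop @row : undef;    # no leftover item means we are done
    die "Incomplete reading: got ", scalar @row, " items\n" if @row < $n;
    push @readings, \@row;
}

printf "%d readings; last Wind Gusts value is %s\n",
    scalar @readings, $readings[-1][-1];
```

Each pass copies the remaining string, so for a long line it is probably simpler to split once and then take 21 items at a time from the resulting list.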

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.