in reply to Parsing a Variable Format String

You could try to split the string on whitespace and count the number of whitespace-separated tokens on each line:
my @items = split /\s+/, $buf;
if (scalar(@items) == 13) {
    # process (ii)
}
else {
    # process (i) or (iii)
}

This assumes that the number of tokens in a line determines the line type.

Update: try this:

use strict;
use warnings;

while (<DATA>) {
    my @items = split /\s+/, $_;
    if (scalar(@items) == 13) {
        my (@new)  = ($items[4] =~ /(PV)(.*)/);
        my (@new2) = ($items[8] =~ /(RL)(.*)/);
        splice @items, 4, 1, @new;
        splice @items, 9, 1, @new2;
    }
    print "@items\n";
    # now @items always contains same number of tokens
    # process items...
}
__DATA__
SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93
SS 21 PL 2#3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93
SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93

prints:

SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93
SS 21 PL 2#3 PV a51.3 CT^ 110 +0 RL 126, SA 106 DS 93
SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93

Re^2: Parsing a Variable Format String
by ozboomer (Friar) on Jul 10, 2008 at 02:40 UTC
    Thanks for the suggestions... I'll put it in the pot(!)

    After some more walking and thinking, a woefully poor way of doing what I (ultimately) need might be:
    @opts = (
        "SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93",
        "SS 21 PL 2.3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93",
        "SS 21 PL2#3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93",
        "SS 21 PL2.3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93"
    );

    foreach $buf (@opts) {
        printf("                 1         2         3         4         5         6\n");
        printf("       0123456789012345678901234567890123456789012345678901234567890\n");
        printf("\$buf: >%s<\n", $buf);
        printf("\n");

        $ssos = index($buf, "SS", 0);
        $plos = index($buf, "PL", $ssos);
        $pvos = index($buf, "PV", $plos);
        $ctos = index($buf, "CT", $pvos);
        if ($ctos < 0) {
            $ctos = index($buf, "CL", $pvos);
        }
        $rlos = index($buf, "RL", $ctos);
        $saos = index($buf, "SA", $rlos);
        $dsos = index($buf, "DS", $saos);

        printf("\$ssos = $ssos\n");
        printf("\$plos = $plos\n");
        printf("\$pvos = $pvos\n");
        printf("\$ctos = $ctos\n");
        printf("\$rlos = $rlos\n");
        printf("\$saos = $saos\n");
        printf("\$dsos = $dsos\n");
        printf("\n");

        $ssstr = substr($buf, $ssos+2, $plos - $ssos - 2);
        $plstr = substr($buf, $plos+2, $pvos - $plos - 2);
        $pvstr = substr($buf, $pvos+2, $ctos - $pvos - 2);
        $ctstr = substr($buf, $ctos+2, $rlos - $ctos - 2);
        $rlstr = substr($buf, $rlos+2, $saos - $rlos - 2);
        $sastr = substr($buf, $saos+2, $dsos - $saos - 2);
        $dsstr = substr($buf, $dsos+2);

        $ssstr =~ s/\s+//g;
        $plstr =~ s/\s+//g;
        $pvstr =~ s/\s+//g;
        $ctstr =~ s/\s+//g;
        $rlstr =~ s/\s+//g;
        $sastr =~ s/\s+//g;
        $dsstr =~ s/\s+//g;

        printf("\$ssstr = >$ssstr<\n");
        printf("\$plstr = >$plstr<\n");
        printf("\$pvstr = >$pvstr<\n");
        printf("\$ctstr = >$ctstr<\n");
        printf("\$rlstr = >$rlstr<\n");
        printf("\$sastr = >$sastr<\n");
        printf("\$dsstr = >$dsstr<\n");
        printf("\n");
    }   # another opt
    ...which gives a typical output:
    $ssstr = >21<
    $plstr = >2#3<
    $pvstr = >51.3<
    $ctstr = >#110+0<
    $rlstr = >126'<
    $sastr = >106<
    $dsstr = >93<
    ...but that's pretty dashed ugly, even if it does work.

    Now, if I could replicate all that index/substr garbage with something more elegant...

      The more elegant way is, as you already guessed, a regex.

      When you are looking for 'CT', the regex is /CT/. When you are looking for the first number after 'CT', the regex becomes /CT .*? (\d+)/x (use \d+ rather than \d*, because \d* can match zero digits and would leave the capture empty). The x at the end of the regex allows me to insert spaces so that the regex is easier to read; they don't get matched. If you really need to match a space, you can put a backslash before it or use \s, which matches any whitespace character, such as a tab.

      When this regex matches something, it returns true. In that case, whatever was matched inside the first (and here only) pair of parens is now in $1. Further pairs of parens in the regex would be stored in $2, $3, $4 and so on.
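      As a small illustration of the match-then-read-$1 pattern, here is a minimal sketch. The helper name first_number_after_ct is made up for this example, and it uses \d+ rather than \d* so the capture cannot come back empty.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the first run of
# digits that follows 'CT', or nothing if the line has no CT field.
sub first_number_after_ct {
    my ($buf) = @_;
    # /x lets us space out the pattern; those spaces are not matched.
    if ($buf =~ /CT .*? (\d+)/x) {
        return $1;    # text captured by the first pair of parens
    }
    return;           # no match
}

my $line = "SS 21 PL 2#3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93";
my $ct   = first_number_after_ct($line);
print defined $ct ? "CT number: $ct\n" : "no CT field\n";   # prints "CT number: 110"
```

      Always test that the match succeeded before reading $1; after a failed match it still holds whatever an earlier successful match left there.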

      The .*? matches anything (except a newline), but tries to match as few characters as possible.
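      To see why the ? matters, this sketch runs the same pattern twice on part of one of the sample lines, once lazy and once greedy. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $buf = "CT^ 110 +0 RL126, SA 106 DS 93";

# Lazy: .*? stops as early as it can, so (\d+) gets the first number.
my ($lazy)   = $buf =~ /CT .*? (\d+)/x;

# Greedy: .* runs to the end of the string and backtracks only as far
# as needed, so (\d+) is left with just the final digit.
my ($greedy) = $buf =~ /CT .*  (\d+)/x;

print "lazy:   $lazy\n";     # lazy:   110
print "greedy: $greedy\n";   # greedy: 3
```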

      With slight variations of this regex you probably can substitute all your index thingies.

      You can even combine your regexes into one long regex by joining all of them with .*? in between.
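      A rough sketch of what such a combined regex could look like for the sample lines in this thread. It assumes the tags always appear in the order SS, PL, PV, CT or CL, RL, SA, DS and never occur inside a value; the names @tags, $re and parse_line are invented for the example.

```perl
use strict;
use warnings;

# Field tags in the order they appear; 'CT' stands for either CT or CL.
my @tags = qw(SS PL PV CT RL SA DS);

# One pattern: each tag is followed by a lazy capture that runs up to the next tag.
my $re = qr/
    SS    (.*?)
    PL    (.*?)
    PV    (.*?)
    C[TL] (.*?)
    RL    (.*?)
    SA    (.*?)
    DS    (.*)      # last field: take the rest of the line
/x;

# Hypothetical helper: returns a hash of tag => value, or an empty list
# if the line does not fit the assumed layout.
sub parse_line {
    my ($buf) = @_;
    my @vals = $buf =~ $re or return;
    s/\s+//g for @vals;    # strip all whitespace, as the index and substr version does
    my %rec;
    @rec{@tags} = @vals;
    return %rec;
}

my %rec = parse_line("SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93");
print "$_ = >$rec{$_}<\n" for @tags;
```

      For the first sample line this produces the same seven values as the index and substr version above (21, 2#3, 51.3, #110+0, 126', 106, 93), with the CL value stored under the CT key.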