Re: Parsing a Variable Format String

You could try to split the string on whitespace and count the number of whitespace-separated tokens on each line:

my @items = split /\s+/, $buf;
if (scalar(@items) == 13) {
    # process (ii)
}
else {
    # process (i) or (iii)
}

This assumes that the number of tokens in a line determines the line type.
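A self-contained version of the token-count test might look like the following. It is a sketch under the same assumption (a 13-token line is type (ii)); the sub name `line_type` and its return labels are hypothetical. It uses `split ' '` rather than `split /\s+/`, because the awk-style `' '` pattern ignores leading whitespace instead of producing an empty first field.

```perl
use strict;
use warnings;

# Sample lines taken from the data below.
my @lines = (
    "SS  21   PL 2#3  PV  51.3 CL #110 +0 RL 126' SA 106 DS 93",
    "SS  21   PL 2#3  PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93",
);

# Hypothetical classifier: assumes 13 tokens means format (ii);
# anything else is treated as (i) or (iii).
sub line_type {
    my ($buf) = @_;
    # split ' ' skips leading whitespace; split /\s+/ would not.
    my @items = split ' ', $buf;
    return @items == 13 ? 'ii' : 'i_or_iii';
}

for my $line (@lines) {
    printf "%-9s %s\n", line_type($line), $line;
}
```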

Update: try this:

use strict;
use warnings;

while (<DATA>) {
    my @items = split /\s+/, $_;
    if (scalar(@items) == 13) {
         my (@new)  = ($items[4] =~ /(PV)(.*)/);
         my (@new2) = ($items[8] =~ /(RL)(.*)/);
         splice @items, 4, 1, @new;
         splice @items, 9, 1, @new2;
    }
    print "@items\n";   # now @items always contains same number of tokens
    # process items...
}

__DATA__
SS  21   PL 2#3  PV  51.3 CL #110 +0 RL 126' SA 106 DS 93
SS  21   PL 2#3  PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93
SS  21   PL 2#3  PV   51.3 CL #110 +0 RL 126' SA 106 DS 93

prints:

SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93
SS 21 PL 2#3 PV a51.3 CT^ 110 +0 RL 126, SA 106 DS 93
SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93
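An alternative to the two `splice` calls, swapping in a substitution: put the missing space back after a tag before splitting, so every line splits into the same tokens. This sketch assumes, as the sample data suggests, that only the PV and RL tags ever have a value glued onto them; `normalize` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space after PV or RL when the value
# is glued directly onto the tag ("PVa51.3", "RL126,"), then squeeze
# all runs of whitespace to a single space.
sub normalize {
    my ($buf) = @_;
    (my $copy = $buf) =~ s/\b(PV|RL)(?=\S)/$1 /g;
    return join ' ', split ' ', $copy;
}

my @lines = (
    "SS  21   PL 2#3  PV  51.3 CL #110 +0 RL 126' SA 106 DS 93",
    "SS  21   PL 2#3  PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93",
    "SS  21   PL 2#3  PV   51.3 CL #110 +0 RL 126' SA 106 DS 93",
);

print normalize($_), "\n" for @lines;
```

The substitution only touches a tag that starts at a word boundary and is immediately followed by a non-space character, so already well-spaced lines pass through unchanged apart from whitespace squeezing.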


Re^2: Parsing a Variable Format String by ozboomer (Friar) on Jul 10, 2008 at 02:40 UTC
Thanks for the suggestions... I'll put it in the pot(!)

After some more walking and thinking, a woefully poor way of doing what I (ultimately) need might be:

@opts = ( "SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93",
          "SS 21 PL 2.3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93",
          "SS 21 PL2#3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93",
          "SS 21 PL2.3# PV 51.3 CL #110 +0 RL 126' SA 106 DS 93" );

foreach $buf (@opts) {
    printf("                 1         2         3         4         5         6\n");
    printf("       0123456789012345678901234567890123456789012345678901234567890\n");
    printf("\$buf: >%s<\n", $buf);
    printf("\n");

    $ssos = index($buf, "SS", 0);
    $plos = index($buf, "PL", $ssos);
    $pvos = index($buf, "PV", $plos);
    $ctos = index($buf, "CT", $pvos);
    if ($ctos < 0) {
        $ctos = index($buf, "CL", $pvos);
    }
    $rlos = index($buf, "RL", $ctos);
    $saos = index($buf, "SA", $rlos);
    $dsos = index($buf, "DS", $saos);

    printf("\$ssos = $ssos\n");
    printf("\$plos = $plos\n");
    printf("\$pvos = $pvos\n");
    printf("\$ctos = $ctos\n");
    printf("\$rlos = $rlos\n");
    printf("\$saos = $saos\n");
    printf("\$dsos = $dsos\n");
    printf("\n");

    $ssstr = substr($buf, $ssos+2, $plos - $ssos - 2);
    $plstr = substr($buf, $plos+2, $pvos - $plos - 2);
    $pvstr = substr($buf, $pvos+2, $ctos - $pvos - 2);
    $ctstr = substr($buf, $ctos+2, $rlos - $ctos - 2);
    $rlstr = substr($buf, $rlos+2, $saos - $rlos - 2);
    $sastr = substr($buf, $saos+2, $dsos - $saos - 2);
    $dsstr = substr($buf, $dsos+2);

    $ssstr =~ s/\s+//g;
    $plstr =~ s/\s+//g;
    $pvstr =~ s/\s+//g;
    $ctstr =~ s/\s+//g;
    $rlstr =~ s/\s+//g;
    $sastr =~ s/\s+//g;
    $dsstr =~ s/\s+//g;

    printf("\$ssstr = >$ssstr<\n");
    printf("\$plstr = >$plstr<\n");
    printf("\$pvstr = >$pvstr<\n");
    printf("\$ctstr = >$ctstr<\n");
    printf("\$rlstr = >$rlstr<\n");
    printf("\$sastr = >$sastr<\n");
    printf("\$dsstr = >$dsstr<\n");
    printf("\n");
} # another opt

...which gives a typical output:

$ssstr = >21<
$plstr = >2#3<
$pvstr = >51.3<
$ctstr = >#110+0<
$rlstr = >126'<
$sastr = >106<
$dsstr = >93<

...but that's pretty dashed ugly, even if it does work. Now, if I could replicate all that index/substr garbage with something more elegant...
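Before reaching for a regex, the repetition in the index/substr version can be cut by driving it from a list of tags. A minimal sketch, assuming the tags always appear in the order SS, PL, PV, CT/CL, RL, SA, DS as in the samples; `parse_fields` and the hash keys are hypothetical, and a CL field is stored under the key CT to mirror the single `$ctstr` above.

```perl
use strict;
use warnings;

# Tags in the order they appear; an entry with several spellings
# (CT or CL) tries each in turn, like the original fallback.
my @tags = (['SS'], ['PL'], ['PV'], ['CT', 'CL'], ['RL'], ['SA'], ['DS']);

# Hypothetical helper: returns a hashref of tag => value, with all
# whitespace removed from each value, or nothing if a tag is missing.
sub parse_fields {
    my ($buf) = @_;
    my (@names, @offsets);
    my $pos = 0;
    for my $alts (@tags) {
        my $off = -1;
        for my $tag (@$alts) {
            $off = index($buf, $tag, $pos);
            last if $off >= 0;
        }
        return if $off < 0;            # required tag not found
        push @names,   $alts->[0];     # CL is stored under CT
        push @offsets, $off;
        $pos = $off;                   # search for the next tag from here
    }
    my %field;
    for my $i (0 .. $#offsets) {
        my $start = $offsets[$i] + 2;  # skip the two-letter tag
        my $end   = $i < $#offsets ? $offsets[$i + 1] : length $buf;
        (my $value = substr($buf, $start, $end - $start)) =~ s/\s+//g;
        $field{ $names[$i] } = $value;
    }
    return \%field;
}

my $f = parse_fields("SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93");
printf "%s = >%s<\n", $_, $f->{$_} for qw(SS PL PV CT RL SA DS);
```

Adding a new field then means adding one entry to `@tags` rather than four more lines of index/substr/s///printf.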
Re^3: Parsing a Variable Format String by jethro (Monsignor) on Jul 10, 2008 at 03:04 UTC
The more elegant way is, as you already guessed, a regex. When you are looking for 'CT', the regex is `/CT/`. When you are looking for the first number after 'CT', the regex becomes `/CT .*? (\d+)/x`.

The x at the end of the regex allows me to insert spaces so that the regex is easier to read; they don't get matched. If you really need to match a space, you can put a backslash before it or use `\s`, which matches any whitespace character, including tabs.

When this regex matches something, it returns true. In that case, whatever was matched inside the first (and only) pair of parens is now in $1. Further pairs of parens in the regex would be stored in $2, $3, $4 and so on.

The `.*?` matches anything, but tries to match as few characters as possible.

With slight variations of this regex you can probably replace all your index thingies. You can even combine your regexes into one long regex by joining them with `.*?` in between.
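Putting that suggestion into code: a sketch, assuming the same fixed tag order as the samples. It shows the single-field capture and then one combined regex with `/x` layout; `parse_line` is a hypothetical name. Unlike the index/substr version it keeps internal spaces inside a value (e.g. `#110 +0`), so apply `s/\s+//g` to each value if you want the squeezed form.

```perl
use strict;
use warnings;

my $line = "SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93";

# One field: the first number after CT (or CL, which this data also uses).
if ($line =~ / C[TL] .*? (\d+) /x) {
    print "first number after CT/CL: $1\n";    # prints 110
}

# All fields at once. Each tag is followed by a lazily matched value
# that stops at the next tag; /x lets the pattern be laid out readably.
my @names = qw(SS PL PV CT RL SA DS);
my $re = qr/
    SS    \s* (.*?) \s*
    PL    \s* (.*?) \s*
    PV    \s* (.*?) \s*
    C[TL] \s* (.*?) \s*
    RL    \s* (.*?) \s*
    SA    \s* (.*?) \s*
    DS    \s* (.*?) \s* $
/x;

# Hypothetical helper: hashref of tag => value, or nothing on no match.
sub parse_line {
    my ($buf) = @_;
    my @vals = $buf =~ $re or return;
    my %field;
    @field{@names} = @vals;
    return \%field;
}

my $p = parse_line($line);
printf "%s = >%s<\n", $_, $p->{$_} for @names;
```

Because each `(.*?)` is lazy, a value ends at the first place where the next tag can match, which is what makes the glued forms like `PVa51.3` and `RL126,` come out right without special handling.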
Re^4: Parsing a Variable Format String by ysth (Canon) on Jul 10, 2008 at 03:45 UTC
"The .*? matches anything": any number of non-newlines, unless you specify /s.
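The difference only shows up when a record spans more than one line. A small demonstration, using a hypothetical input in which the CT value has wrapped onto a second line:

```perl
use strict;
use warnings;

# Hypothetical input: the CT value wraps onto a second line.
my $rec = "CT^\n110 +0";

# Without /s, '.' does not match "\n", so .*? cannot cross the
# line break and this match fails.
my ($plain)  = $rec =~ /CT .*? (\d+)/x;

# With /s, '.' matches any character, including "\n".
my ($dotall) = $rec =~ /CT .*? (\d+)/xs;

print(defined $plain ? "plain: $plain\n" : "plain: no match\n");
print "dotall: $dotall\n";
```

For single-line records read one at a time with `while (<DATA>)`, as in the thread, the two behave the same.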