I'm trying to process the text output from a program where the output format keeps changing. I've tried using a format string and unpack() but this is getting unwieldy now that I have a few formats to deal with.

Not including the ">" and "<" characters, example data strings are:
>SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93<   ...(i)
>SS 21 PL 2#3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93<    ...(ii)
>SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93<   ...(iii)
For subsequent processing, I'd prefer to end up with the format shown in (iii).

I'm thinking along the lines of using some sort of regex but I just can't get my head around the silly things. Maybe I can do the 'extraction' of the fields one by one or perhaps I can do it all in one hit, I don't know. Performance level isn't a huge priority, as most of the programs that need to do this run as overnight batch jobs.

I guess the sort of thing I'm looking for is something like:

(@items) = MAGIC($buf);
...after which I can glean the following info:
SS 21         =>  SS = 21
PL=2#3        =>  SP = 2            ...could be m.n, m.n#, m#n, m#n#
                  SP# = yes
                  LP = 3
                  LP# = no
PVa51.3       =>  SP vote = a51     ...could be x.y
                  LP vote = 3
CT^ 110 + 0   =>  CT = 110
                  CT# = yes
                  Rot state = "+"
                  Rot value = 0
RL126,        =>  RL = 126
                  flag1 = true      ...shown by , or '
                  flag2 = no        ...shown by "
SA 106 DS 93  =>  SA = 106
                  DS for SA = 93
Apologies for the cryptic/variable descriptions -- some of the ways the software formats its output are a little scatter-brained IMO(!)
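
To make the "one field at a time" idea concrete, here is a minimal sketch rather than a finished parser. Everything in it is an assumption read off the sample strings: the name parse_record, the hash keys, treating CL and CT as the same field, reading a '#' or '^' before the CT number as the CT# flag, and limiting the rotation state to '+' or '-'. It returns a hash instead of the @items list mentioned above, since named fields seemed easier to work with.

```perl
use strict;
use warnings;

# parse_record() is a hypothetical name; the patterns are guesses based only
# on the sample strings and the field notes in the post.
sub parse_record {
    my ($buf) = @_;
    my %f;

    # SS: plain integer
    ($f{SS}) = $buf =~ /\bSS\s*(\d+)/;

    # PL: m.n, m.n#, m#n or m#n#; a '#' after a number sets that number's flag
    if ($buf =~ /\bPL\s*=?\s*(\d+)([.#])(\d+)(#?)/) {
        @f{qw(SP LP)} = ($1, $3);
        $f{'SP#'} = $2 eq '#' ? 1 : 0;
        $f{'LP#'} = $4 eq '#' ? 1 : 0;
    }

    # PV: x.y, where x may carry a leading letter (e.g. a51)
    if ($buf =~ /\bPV\s*(\w+)\.(\w+)/) {
        @f{qw(SP_vote LP_vote)} = ($1, $2);
    }

    # CL/CT: optional '#' or '^' marker (assumed to mean CT#), value,
    # then rotation state (+ or -) and rotation value
    if ($buf =~ /\bC[LT]\s*([#^]?)\s*(\d+)\s*([+-])\s*(\d+)/) {
        $f{'CT#'} = length($1) ? 1 : 0;
        @f{qw(CT rot_state rot_value)} = ($2, $3, $4);
    }

    # RL: value, then flag1 shown by , or ' and flag2 shown by "
    if ($buf =~ /\bRL\s*(\d+)([,'"]*)/) {
        my $marks = $2;
        $f{RL}    = $1;
        $f{flag1} = $marks =~ /[,']/ ? 1 : 0;
        $f{flag2} = $marks =~ /"/    ? 1 : 0;
    }

    # SA followed by its DS
    if ($buf =~ /\bSA\s*(\d+)\s+DS\s*(\d+)/) {
        @f{qw(SA DS)} = ($1, $2);
    }

    return %f;
}

# Example (ii) from above, without the surrounding > and <
my %rec = parse_record(q{SS 21 PL 2#3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93});
printf "%-9s = %s\n", $_, $rec{$_} // 'undef' for sort keys %rec;
```

Keeping one small pattern per tag means a new output variant should only require touching the pattern for the field that changed, which seems easier to maintain than one large regex or a set of unpack() templates.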

I'd appreciate any suggestions on how I might attack the problem.

Thanks a heap.


In reply to Parsing a Variable Format String by ozboomer
