I'm trying to process the text output from a program where the output format keeps changing. I've tried using a format string and unpack() but this is getting unwieldy now that I have a few formats to deal with.
Not including the ">" and "<" characters, example data strings are:
>SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93< ...(i)
>SS 21 PL 2#3 PVa51.3 CT^ 110 +0 RL126, SA 106 DS 93< ...(ii)
>SS 21 PL 2#3 PV 51.3 CL #110 +0 RL 126' SA 106 DS 93< ...(iii)
For subsequent processing, I'd prefer to end up with the format shown in (iii).
I'm thinking along the lines of using some sort of regex, but I just can't get my head around the silly things. Maybe I can extract the fields one by one, or perhaps I can do it all in one hit; I don't know. Performance isn't a huge priority, as most of the programs that need to do this run as overnight batch jobs.
I guess the sort of thing I'm looking for is something like:
(@items) = MAGIC($buf);
...after which I can glean the following info:
SS 21         => SS = 21
PL=2#3        => SP = 2           ...could be m.n, m.n#, m#n, m#n#
                 SP# = yes
                 LP = 3
                 LP# = no
PVa51.3       => SP vote = a51    ...could be x.y
                 LP vote = 3
CT^ 110 + 0   => CT = 110
                 CT# = yes
                 Rot state = "+"
                 Rot value = 0
RL126,        => RL = 126
                 flag1 = true     ...shown by , or '
                 flag2 = no       ...shown by "
SA 106 DS 93  => SA = 106
                 DS for SA = 93
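To make the target a bit more concrete, here is a rough sketch of the kind of thing MAGIC might turn out to be: one regex with named captures (Perl 5.10 or later) that tolerates the spacing and punctuation differences seen in (i) to (iii). The sub name parse_record and the hash keys are made up for illustration. It also rests on some guesses: that CL and CT are the same field, that either # or ^ next to it sets CT#, that the SP vote carries at most one leading letter, and that the fields always arrive in this order.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical stand-in for MAGIC(): returns a hash ref of fields,
# or nothing if the line does not fit the assumed layout.
sub parse_record {
    my ($buf) = @_;

    return unless $buf =~ m{
        SS \s* (?<ss>\d+) \s+
        PL [\s=]* (?<sp>\d+) (?<sep>[.\#]) (?<lp>\d+) (?<lp_hash>\#?) \s+
        PV \s* (?<sp_vote>[A-Za-z]?\d+) \. (?<lp_vote>\d+) \s+
        C[LT] \s* (?<ct_mark>[\#^]?) \s* (?<ct>\d+)
              \s* (?<rot_state>[+-]) \s* (?<rot_value>\d+) \s+
        RL \s* (?<rl>\d+) (?<rl_marks>[,'"]*) \s+
        SA \s* (?<sa>\d+) \s+
        DS \s* (?<ds>\d+)
    }x;

    # Copy the named captures before any later match resets %+.
    my %m = %+;

    return {
        SS        => $m{ss},
        SP        => $m{sp},
        'SP#'     => $m{sep} eq '#'          ? 1 : 0,   # m#n or m#n#
        LP        => $m{lp},
        'LP#'     => $m{lp_hash} eq '#'      ? 1 : 0,   # m.n# or m#n#
        sp_vote   => $m{sp_vote},
        lp_vote   => $m{lp_vote},
        CT        => $m{ct},
        'CT#'     => $m{ct_mark} ne ''       ? 1 : 0,   # assumption: # or ^ both count
        rot_state => $m{rot_state},
        rot_value => $m{rot_value},
        RL        => $m{rl},
        flag1     => $m{rl_marks} =~ /[,']/  ? 1 : 0,   # shown by , or '
        flag2     => $m{rl_marks} =~ /"/     ? 1 : 0,   # shown by "
        SA        => $m{sa},
        DS        => $m{ds},
    };
}
```

Calling parse_record on each line and skipping any line where it returns nothing would at least flag the formats this pattern does not cover yet, rather than silently mis-parsing them.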
Apologies for the cryptic/variable descriptions -- some of the ways the software formats its output are a little scatter-brained IMO(!)
I'd appreciate any suggestions on how I might attack the problem.
Thanks a heap.