I'm not quite sure why the output from the utility producing this file would suddenly start to vary, but you have a point about resilience.

However, your code provides very little (if any) extra resilience over mine, and it exposes several extra weaknesses:

  1. If a record has the 'Unassigned' line, but no 'Raw capacity' line, then you will wrongly use the capacity from the preceding record.
  2. If the record correctly has both required lines, but they are in reverse order, you would again wrongly use the preceding record's capacity.
  3. You've added a dependency on the string 'Mode: Unassigned' being present, formatted exactly as specified, and correctly spelt.
  4. You've added the constraint that the drive capacity be specified with at least one decimal place, and that it be reported in GB.

    What happens if the drive has exactly '80 GB', or is reported in 'MB', 'TB', 'GiB', 'Gigabytes' or...?

The general rule with regexes (one I've followed since someone here suggested it to me way back) is to specify the regex as loosely as possible, commensurate with obtaining the information required.
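To make the "as loosely as possible" rule concrete, here is a minimal sketch contrasting a tight capacity pattern with a loose one. The tight pattern is only a hypothetical reconstruction of the constraint described in point 4 (a decimal point plus the literal unit GB), not the actual code under discussion; the sub names `tight_capacity` and `loose_capacity` and the sample lines are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical tight pattern: demands digits, a decimal point, more digits
# and the literal unit 'GB' (the constraint described in point 4).
sub tight_capacity {
    my ($line) = @_;
    return $line =~ /Raw \s capacity: \s+ ( \d+ \. \d+ \s GB )/x ? $1 : undef;
}

# Loose pattern: take whatever follows the label, minus trailing whitespace.
sub loose_capacity {
    my ($line) = @_;
    return $line =~ /Raw \s capacity: \s+ ( .*? ) \s* $/x ? $1 : undef;
}

# Illustrative sample lines covering the cases asked about in point 4.
my @samples = (
    'Raw capacity: 68.366 GB',
    'Raw capacity: 80 GB',
    'Raw capacity: 512 MB',
    'Raw capacity: 1.5 TB',
    'Raw capacity: 68.366 GiB',
    'Raw capacity: 80 Gigabytes',
);

for my $s (@samples) {
    my $t = tight_capacity($s);
    my $l = loose_capacity($s);
    printf "%-28s tight: %-12s loose: %s\n",
        $s, defined $t ? $t : '(no match)', defined $l ? $l : '(no match)';
}
```

Only the first sample satisfies the tight pattern; the loose one returns a value for every variant, leaving any unit handling to code that can decide explicitly what to do with it.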

I'd also suggest that processing multi-line records line by line is a dangerous practice if there is any scope for variability in the number, or ordering, of the lines that make up those records.
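A minimal sketch of that danger, using an illustrative two-record sample in which Slot 3 has no 'Raw capacity' line (point 1 above). `line_by_line` is a hypothetical loop of the kind being warned against, not anyone's actual code. `record_at_a_time` splits the input on a zero-width lookahead before each 'Drive' line, which is a different record-splitting approach from the `$/ = 'Drive'` trick in the script that follows.

```perl
use strict;
use warnings;

# Illustrative input: Slot 3 has no 'Raw capacity' line at all.
my $text = <<'END';
Drive at Tray 0, Slot 2
Raw capacity: 68.366 GB
Mode: Unassigned
Drive at Tray 0, Slot 3
Mode: Unassigned
END

# Hypothetical line-by-line loop: $cap is remembered between lines, so a
# record with no capacity line silently inherits the previous record's value.
sub line_by_line {
    my ($input) = @_;
    my ($slot, $cap, @out);
    for my $line (split /\n/, $input) {
        $slot = $1 if $line =~ /Tray \s+ (.*)/x;
        $cap  = $1 if $line =~ /Raw \s capacity: \s+ (.*)/x;
        push @out, "$slot : $cap" if $line =~ /Unassigned/;
    }
    return @out;
}

# Record at a time: split before each 'Drive' line and examine each record
# on its own, so nothing carries over from one record to the next.
sub record_at_a_time {
    my ($input) = @_;
    my @out;
    for my $rec (split /^(?=Drive)/m, $input) {
        next unless $rec =~ /Unassigned/;
        my ($slot) = $rec =~ /Tray \s+ (.*?) \n/x;
        my ($cap)  = $rec =~ /Raw \s capacity: \s+ (.*?) \n/x;
        push @out, "$slot : " . (defined $cap ? $cap : '(no capacity line)');
    }
    return @out;
}

print "line by line:     $_\n" for line_by_line($text);
print "record at a time: $_\n" for record_at_a_time($text);
```

The line-by-line version reports Slot 3 with Slot 2's 68.366 GB; the record-based version reports that the capacity line is missing.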

All that said, you do have a point regarding resilience, so here is a technique that allows considerable resilience in the ordering of elements, whether single- or multi-line, whilst avoiding most of the traps:

    #! perl -slw
    use strict;

    # Read one record at a time: each chunk ends at the next 'Drive'.
    $/ = 'Drive';

    while( <DATA> ) {
        # The first chunk is just the leading 'Drive'; skip it.
        next if $. == 1;

        # Each field is captured inside its own zero-width lookahead,
        # so the lines can appear in any order within the record.
        if( m[
            (?=^.*Tray \s+ (.*?) \n )
            (?=.*Raw \s capacity: \s+ (.*?) \n )
            (?=^.*Unassigned)
        ]xs ) {
            print "$1 : $2";
        }
        # Report anything that is neither a good match nor an Assigned drive.
        elsif( !m[Assigned] ) {
            printf STDERR "Badly formatted record:\n%s\n%s\n%s\n\n",
                '-' x 40, $_, '-' x 40;
        }
    }

    __DATA__
    Drive at Tray 0, Slot 1
    Raw capacity: 68.366 GB
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Mode: Assigned
    Drive at Tray 0, Slot 2
    Raw capacity: 68.366 GB
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Mode: Unassigned
    Drive at Tray 0, Slot 3
    Mode: Unassigned
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Raw capacity: 68.366 GB
    Drive at Tray 0, Slot 4
    Mode: Unassigned
    Other random junk: some
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Raw capacity: 68.366 GB
    Drive at Tray 0, Slot 5
    Raw capacity: 68.366 GB
    Mode: Unassigned
    Drive at Tray 0, Slot 6
    Raw capocity: 68.366 GB
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Mode: Unassigned

Which produces:

    P:\test>junk3
    0, Slot 2 : 68.366 GB
    0, Slot 3 : 68.366 GB
    0, Slot 4 : 68.366 GB
    0, Slot 5 : 68.366 GB
    Badly formatted record:
    ----------------------------------------
     at Tray 0, Slot 6
    Raw capocity: 68.366 GB
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Mode: Unassigned
    ----------------------------------------

The basic idea is to place the captures within zero-length assertions, so that the ordering of the captured elements can vary completely, yet the match and captures will still be made if all the required elements are present. It also ensures that the same elements always appear in the same capture vars ($1, $2, etc.) regardless of their ordering in the record, which avoids the problem of knowing what has been captured where.
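Here is a minimal standalone sketch of that property, reusing the lookahead pattern from the script above on illustrative records whose lines appear in different orders. The `extract` sub name and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Same shape as the pattern in the script: each field sits in its own
# zero-width lookahead, all evaluated from the start of the record, so the
# order of the lines within the record does not matter.
my $record_re = qr{
    (?= ^ .* Tray \s+ (.*?) \n )              # capture 1: tray and slot
    (?= .* Raw \s capacity: \s+ (.*?) \n )    # capture 2: raw capacity
    (?= ^ .* Unassigned )                     # required, but not captured
}xs;

sub extract {
    my ($rec) = @_;
    return unless $rec =~ $record_re;
    return ($1, $2);
}

# Illustrative records: same fields, different line order; the third
# lacks a raw capacity line and so should not match at all.
my @records = (
    "Drive at Tray 0, Slot 2\nRaw capacity: 68.366 GB\nMode: Unassigned\n",
    "Drive at Tray 0, Slot 3\nMode: Unassigned\nRaw capacity: 68.366 GB\n",
    "Drive at Tray 0, Slot 7\nMode: Unassigned\n",
);

for my $rec (@records) {
    my ($slot, $cap) = extract($rec);
    print defined $slot ? "$slot : $cap\n" : "no match\n";
}
```

Whichever order the lines arrive in, the slot lands in the first capture and the capacity in the second; a record missing a required line simply fails to match.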

An extension of this technique is that it allows you to specify the elements to be captured in a different order (in the regex) from the order in which they appear in the data. This is extremely useful when some elements are optional, as you can arrange for the non-optional elements to be returned first and so avoid the game of deciding what got captured into each of the capture vars.
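A minimal sketch of that extension, assuming 'Product ID' is the optional element (an illustrative choice, as are the `parse_fields` name and the sample records). The required tray and raw capacity always land in the first two captures, and the optional element is captured last. The `?` is placed inside the final lookahead rather than on the lookahead itself, so that assertion always succeeds and the third capture is simply left undefined when the line is absent.

```perl
use strict;
use warnings;

# Required fields first, the optional field last. The '?' sits inside the
# final lookahead, so that assertion always succeeds, and capture 3 is left
# undef when there is no 'Product ID' line.
my $fields_re = qr{
    (?= ^ .* Tray \s+ (.*?) \n )                  # capture 1: tray and slot, required
    (?= .* Raw \s capacity: \s+ (.*?) \n )        # capture 2: raw capacity, required
    (?= (?: .* Product \s ID: \s+ (\S+) )? )      # capture 3: product ID, optional
}xs;

sub parse_fields {
    my ($rec) = @_;
    return unless $rec =~ $fields_re;
    return ($1, $2, $3);
}

# Illustrative records: the first has a Product ID line, the second does not.
my @recs = (
    "Drive at Tray 0, Slot 1\nProduct ID: ST373453FC\nRaw capacity: 68.366 GB\n",
    "Drive at Tray 0, Slot 5\nRaw capacity: 68.366 GB\nMode: Unassigned\n",
);

for my $rec (@recs) {
    my ($slot, $cap, $pid) = parse_fields($rec);
    printf "%s : %s : %s\n", $slot, $cap, defined $pid ? $pid : '(none)';
}
```

Because the required fields are always first, the caller can unpack them positionally and only needs a `defined` check on the trailing optional value.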


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re^2: Get Unassigned drive average by BrowserUk
in thread Get Unassigned drive average by Anonymous Monk
