in reply to Get Unassigned drive average

Yes, but what if the format turns out to not be exactly the same as shown? It's safer to record every raw capacity in turn, then if you run across Unassigned, add the raw capacity to the total:
use strict;
use warnings;

my ($raw, $capacity, $unassigned);

while (<DATA>) {
    $raw = $1 if m/Raw capacity: (\d+\.\d+) GB/;
    if (m/Mode: Unassigned/) {
        $capacity += $raw;
        $unassigned++;
    }
}

print int ($capacity / $unassigned * 1000 + .5) / 1000;

__DATA__
Drive at Tray 0, Slot 5
Raw capacity: 68.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Unassigned
Drive at Tray 0, Slot 5
Raw capacity: 48.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Assigned
Drive at Tray 0, Slot 5
Raw capacity: 88.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Unassigned

Re^2: Get Unassigned drive average
by BrowserUk (Patriarch) on Dec 31, 2005 at 04:13 UTC

    I'm not quite sure why the output from the utility producing this file would suddenly start to vary, but you have a point about resilience.

    However, your code provides very little (if any) extra resilience over mine, and exposes several extra weaknesses:

    1. If a record has the 'Unassigned' line, but no 'Raw capacity' line, then you will wrongly use the capacity from the preceding record.
    2. If the record correctly has both the required lines, but they are in reverse order, you would again wrongly use the preceding record's capacity.
    3. You've an extra dependency that the string 'Mode: Unassigned' be present, formatted exactly as specified, and correctly spelt.
    4. You've added the constraint that the drive capacity be specified with at least one decimal place on the figure, and that it be reported in GB.

      What happens if the drive has exactly '80 GB' or is reported in 'MB' or 'TB' or 'GiB' or 'Gigabytes' or...?

    The general rule with regexes (one I have followed since someone here suggested it to me way back) is to specify the regex as loosely as possible, commensurate with obtaining the information required.
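As a rough illustration of that rule applied to point 4, here is a minimal sketch of a looser capacity match. The helper name `raw_capacity` is hypothetical and not from either post; it assumes only that the line starts with 'Raw capacity:' followed by a number and a unit word.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): extract the size and unit
# from a 'Raw capacity' line. The fractional part is optional and the
# unit is captured rather than hard-coded as 'GB'.
sub raw_capacity {
    my ($line) = @_;
    return $line =~ m/Raw capacity:\s*(\d+(?:\.\d+)?)\s*(\w+)/
        ? ($1, $2)
        : ();
}

for my $line ('Raw capacity: 68.366 GB',
              'Raw capacity: 80 GB',
              'Raw capacity: 0.5 TB') {
    my ($size, $unit) = raw_capacity($line);
    print "$size $unit\n";
}
```

This accepts '80 GB' as well as '68.366 GB'. Averaging across mixed units would still need a conversion step, which this sketch does not attempt.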

    I'd also suggest that processing multi-line records line by line is a dangerous practice if there is any scope for variability in the number, or ordering, of the lines that make up those records.
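One way to avoid the line-by-line trap is to split the input into whole records first and look for both fields inside each record. The following is only a sketch: the sub name `unassigned_average` is hypothetical, and it assumes every record begins with a line starting 'Drive at '.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): average the raw capacity
# of unassigned drives, treating each 'Drive at ...' block as one record.
sub unassigned_average {
    my ($text) = @_;
    my ($total, $count) = (0, 0);

    # Split before every line that begins a new record.
    for my $record (split /^(?=Drive at )/m, $text) {
        next unless $record =~ m/^Mode:\s*Unassigned/m;

        # The capacity must come from this same record, wherever it sits.
        my ($raw) = $record =~ m/^Raw capacity:\s*(\d+(?:\.\d+)?)/m;
        unless (defined $raw) {
            warn "Record has no raw capacity:\n$record";
            next;
        }
        $total += $raw;
        $count++;
    }
    return $count ? $total / $count : undef;
}

my $text = <<'END';
Drive at Tray 0, Slot 5
Raw capacity: 68.366 GB
Mode: Unassigned
Drive at Tray 0, Slot 6
Mode: Unassigned
Drive at Tray 0, Slot 7
Mode: Unassigned
Raw capacity: 88.366 GB
END

my $avg = unassigned_average($text);
printf "%.3f\n", $avg if defined $avg;
```

Here the Slot 6 record (no capacity line) is reported and skipped instead of inheriting the previous drive's figure, and the Slot 7 record (lines reversed) is still counted, so the script prints 78.366.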

    All that said, you have a point regarding resilience, and here is a technique that allows for some considerable resilience in ordering of elements, whether single or multi-line, whilst avoiding most of the traps:

    Which produces:

    P:\test>junk3
    0, Slot 2 : 68.366 GB
    0, Slot 3 : 68.366 GB
    0, Slot 4 : 68.366 GB
    0, Slot 5 : 68.366 GB
    Badly formatted record:
    ----------------------------------------
     at Tray 0, Slot 6
    Raw capocity: 68.366 GB
    Usable capacity: 67.866 GB
    Current data rate: 2 Gbps
    Product ID: ST373453FC
    Mode: Unassigned
    ----------------------------------------

    The basic idea is to place the captures within zero-length assertions so that the ordering of the elements captured can vary completely, but the match and captures will still be made if all the required elements are present. It also ensures that the same elements will appear in the same capture vars ($1, $2 etc.) regardless of their ordering in the record, which avoids the problem of knowing what has been captured to where.
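For readers who want something concrete, here is a minimal sketch of captures inside lookaheads. It is an illustration written for this explanation and may differ from the code that produced the output above; the names `$re` and `parse_record` and the exact field patterns are assumptions.

```perl
use strict;
use warnings;

# Every lookahead starts from the beginning of the record, scans forward
# for its own field, captures it, and consumes nothing, so the order of
# the fields in the data does not matter.
my $re = qr{
    \A
    (?= .*? Tray \s+ (\d+, \s+ Slot \s+ \d+) )      # capture 1: location
    (?= .*? Raw \s+ capacity: \s+ ([\d.]+ \s* \w+) ) # capture 2: size and unit
    (?= .*? Mode: \s+ (\w+) )                        # capture 3: mode
}xs;

# Hypothetical helper: returns (location, capacity, mode), or an empty
# list if any of the three fields is missing from the record.
sub parse_record {
    my ($record) = @_;
    return $record =~ $re ? ($1, $2, $3) : ();
}

my @records = (
    "Drive at Tray 0, Slot 2\nRaw capacity: 68.366 GB\nMode: Unassigned\n",
    "Drive at Tray 0, Slot 3\nMode: Unassigned\nRaw capacity: 68.366 GB\n",
    "Drive at Tray 0, Slot 6\nRaw capocity: 68.366 GB\nMode: Unassigned\n",
);

for my $record (@records) {
    if (my ($where, $size, $mode) = parse_record($record)) {
        print "$where : $size\n" if $mode eq 'Unassigned';
    }
    else {
        warn "Badly formatted record:\n$record";
    }
}
```

The second record has its lines in reverse order yet fills the same capture variables as the first, while the misspelt 'capocity' record fails the match as a whole and is reported rather than silently misread.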

    An extension of this technique is that it allows you to specify all the elements to be captured in a different order (in the regex) to the order in which they will appear in the data. This is extremely useful when some elements are optional, as you can arrange for the non-optional elements to be returned first and so avoid the game of deciding what got captured into each of the capture vars.
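A sketch of that extension, under the assumption (made only for this example) that the 'Product ID' line is optional: the required fields are listed first in the regex even though they appear later in the data, and the optional one is given an empty alternative inside its lookahead. The name `drive_fields` is hypothetical.

```perl
use strict;
use warnings;

# Required fields come first in the regex regardless of data order; the
# optional field goes last, with an empty alternative so the lookahead
# still succeeds (leaving its capture undefined) when the line is absent.
my $re = qr{
    \A
    (?= .*? Mode: \s+ (\w+) )                  # capture 1: required
    (?= .*? Raw \s+ capacity: \s+ ([\d.]+) )   # capture 2: required
    (?= .*? Product \s+ ID: \s+ (\S+) | )      # capture 3: optional
}xs;

# Hypothetical helper: returns (mode, capacity, product_id_or_undef),
# or an empty list if a required field is missing.
sub drive_fields {
    my ($record) = @_;
    return $record =~ $re ? ($1, $2, $3) : ();
}

my @records = (
    "Raw capacity: 68.366 GB\nProduct ID: ST373453FC\nMode: Unassigned\n",
    "Raw capacity: 88.366 GB\nMode: Unassigned\n",
);

for my $record (@records) {
    my ($mode, $raw, $id) = drive_fields($record) or next;
    printf "%s %s %s\n", $mode, $raw, defined $id ? $id : '(no product ID)';
}
```

Because the mode and capacity always land in the first two captures, the caller never has to work out which variable holds what when the optional line is missing.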

