I'm not quite sure why the output from the utility producing this file would suddenly start to vary, but you have a point about resilience.
However, your code provides very little (if any) extra resilience over mine, and it has several weaknesses of its own:
What happens if the drive's capacity is exactly '80 GB', or is reported in 'MB', 'TB', 'GiB', 'Gigabytes' or ...?
The general rule with regexes, which I have followed ever since someone here suggested it to me way back, is to specify the regex as loosely as possible while still obtaining the information required.
I'd also suggest that processing multi-line records line by line is a dangerous practice if there is any scope for variability in the number, or ordering, of the lines that make up those records.
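To make the "as loosely as possible" point concrete, here is a small sketch that runs a deliberately tight pattern and a loose one over several capacity lines. The `tight` pattern is hypothetical (it is not the code from the post I'm replying to), and only the first sample line comes from the real data; the other formats are assumptions about what the utility *might* emit.

```perl
use strict;
use warnings;

# Hypothetical variants of the capacity line. Only the first appears in the
# sample data; the others are formats the utility *might* produce.
my @lines = (
    'Raw capacity: 68.366 GB',
    'Raw capacity: 80 GB',
    'Raw capacity: 70006.784 MB',
    'Raw capacity: 1.5 TB',
    'Raw capacity: 68.366 GiB',
);

# Tight (hypothetical): insists on a decimal point and the literal unit 'GB'.
sub tight {
    my ($line) = @_;
    return $line =~ m[Raw \s capacity: \s+ (\d+\.\d+ \s GB) \z]x ? $1 : undef;
}

# Loose: takes whatever follows the label and leaves interpretation
# of the number and unit to later code.
sub loose {
    my ($line) = @_;
    return $line =~ m[Raw \s capacity: \s+ (.+)]x ? $1 : undef;
}

for my $line (@lines) {
    printf "%-28s tight: %-10s loose: %s\n",
        $line, tight($line) // 'NO MATCH', loose($line) // 'NO MATCH';
}
```

Only the first line gets past the tight pattern; the loose one captures something usable from all of them, and any unit conversion can then be done explicitly rather than by silently failing to match.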
All that said, you have a point regarding resilience, and here is a technique that allows for some considerable resilience in ordering of elements, whether single or multi-line, whilst avoiding most of the traps:
```perl
#! perl -slw
use strict;

# Read one multi-line record per iteration: each record runs up to (and
# includes) the next occurrence of 'Drive'.
$/ = 'Drive';

while( <DATA> ) {
    # The first "record" is just the leading 'Drive' of the file; skip it.
    next if $. == 1;

    # Each capture sits inside its own zero-length lookahead, so the
    # fields can appear in any order within the record.
    if( m[
        (?=^.*Tray \s+ (.*?) \n )
        (?=.*Raw \s capacity: \s+ (.*?) \n )
        (?=^.*Unassigned)
    ]xs ) {
        print "$1 : $2";
    }
    # Failed the match and is not 'Mode: Assigned' (the check is
    # case-sensitive, so 'Unassigned' does not count): report it.
    elsif( !m[Assigned] ) {
        printf STDERR "Badly formatted record:\n%s\n%s\n%s\n\n",
            '-' x 40, $_, '-' x 40;
    }
}

__DATA__
Drive at Tray 0, Slot 1
Raw capacity: 68.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Assigned
Drive at Tray 0, Slot 2
Raw capacity: 68.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Unassigned
Drive at Tray 0, Slot 3
Mode: Unassigned
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Raw capacity: 68.366 GB
Drive at Tray 0, Slot 4
Mode: Unassigned
Other random junk: some
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Raw capacity: 68.366 GB
Drive at Tray 0, Slot 5
Raw capacity: 68.366 GB
Mode: Unassigned
Drive at Tray 0, Slot 6
Raw capocity: 68.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Unassigned
```
Which produces:
```
P:\test>junk3
0, Slot 2 : 68.366 GB
0, Slot 3 : 68.366 GB
0, Slot 4 : 68.366 GB
0, Slot 5 : 68.366 GB
Badly formatted record:
----------------------------------------
at Tray 0, Slot 6
Raw capocity: 68.366 GB
Usable capacity: 67.866 GB
Current data rate: 2 Gbps
Product ID: ST373453FC
Mode: Unassigned
----------------------------------------
```
The basic idea is to place the captures within zero-length assertions so that the ordering of the elements captured can vary completely, but the match and captures will still be made if all the required elements are present. Because a lookahead consumes nothing, every one of them starts scanning from the same place, the start of the record. That also ensures that the same elements will always end up in the same capture vars ($1, $2, etc.) regardless of their ordering in the record, which avoids the problem of knowing what has been captured where.
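As a minimal sketch of why the lookaheads matter, the two subs below parse the same two fields from records holding them in opposite orders. The sub names are made up for illustration; I've used `\A` where the code above uses `^` (without `/m` the two mean the same thing), and `.*?` so that the first occurrence of each label wins.

```perl
use strict;
use warnings;

# Each capture lives in its own lookahead, and every lookahead is tried
# from the start of the string, so $1 is always the raw capacity and $2
# always the mode, whatever order the lines come in.
sub fields_lookahead {
    my ($rec) = @_;
    return $rec =~ m[
        \A
        (?= .*? Raw \s capacity: \s+ (.*?) \n )
        (?= .*? Mode: \s+ (.*?) \n )
    ]xs ? ($1, $2) : ();
}

# For contrast: a plain sequential pattern only matches one fixed order.
sub fields_sequential {
    my ($rec) = @_;
    return $rec =~ m[
        Raw \s capacity: \s+ (.*?) \n
        .*?
        Mode: \s+ (.*?) \n
    ]xs ? ($1, $2) : ();
}

for my $rec ( "Raw capacity: 68.366 GB\nMode: Unassigned\n",
              "Mode: Unassigned\nRaw capacity: 68.366 GB\n" ) {
    my @la  = fields_lookahead($rec);
    my @seq = fields_sequential($rec);
    print "lookahead: [@la]  sequential: [@seq]\n";
}
```

The lookahead version returns the same pair for both records; the sequential one returns nothing for the second.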
An extension of this technique is that it allows you to specify the elements to be captured in the regex in a different order from the one in which they appear in the data. This is extremely useful when some elements are optional: you can arrange for the non-optional elements to be captured first, and so avoid the game of deciding what got captured into each of the capture vars.
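Here is a sketch of that extension, assuming (purely for illustration) that we also want the Product ID but must accept records that lack it. The required fields come first, so they always land in $1 and $2; the optional lookahead is wrapped in `(?: ... )?`, so a missing Product ID leaves $3 undefined instead of failing the whole match. The sub name `parse_drive` is hypothetical, and the records are shaped like the ones read with `$/ = 'Drive'` above.

```perl
use strict;
use warnings;

# Returns (slot, raw capacity, product id); product id may be undef.
# Returns an empty list if a required field is missing.
sub parse_drive {
    my ($rec) = @_;
    return unless $rec =~ m[
        \A
        (?= .*? Tray \s+ (.*?) \n )                          # required
        (?= .*? Raw \s capacity: \s+ (.*?) \n )              # required
        (?= (?: .*? Product \s ID: \s+ (.*?) \n )? )         # optional
    ]xs;
    return ($1, $2, $3);
}

for my $rec ( " at Tray 0, Slot 2\nProduct ID: ST373453FC\nRaw capacity: 68.366 GB\n",
              " at Tray 0, Slot 5\nRaw capacity: 68.366 GB\nMode: Unassigned\n",
              " at Tray 0, Slot 6\nRaw capocity: 68.366 GB\n" ) {
    my @f = parse_drive($rec);
    if (@f) {
        printf "%s : %s : %s\n", $f[0], $f[1], $f[2] // '(no product id)';
    }
    else {
        print "Badly formatted record\n";
    }
}
```

Because the optional element is last, "did it match?" and "what is in $1 and $2?" never depend on whether the Product ID was present.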