If the data file is always consistent with the example you gave, in the following respects:

record blocks are always separated by one or more blank lines
each record block always begins with "]a["
the "field delimiter" pattern within each record is always three characters: close-sq-bracket, single-letter-or-digit, open-sq-bracket (like "]a[")

then here is a way that would let you read the file one whole record at a time, and load fields into a hash, keyed by the tag name:

open( my $in, '<', $filename ) or die "$filename: $!";
{
    local $/ = '';  # cf. perldoc perlvar about $/ and "paragraph mode"

    while (<$in>)    # read a whole record into $_
    {
        my @fields = split( /\](\w)\[/ ); # use parens to capture the letters
        shift @fields; # split puts an empty element before ']a[', so drop that

        my %record = @fields;  # convert array to key=>value hash

        # you now have tags as hash keys, strings as hash values:
        # $record{"a"} == 1, $record{"b"} eq "FORTUNE BAY", etc.
        # to use as you see fit.
    }
}
close $in;

If you want to have all records in memory at once (not just one record at a time), you can simply declare an array before the while loop (outside the enclosing block, if you need it afterwards), and then, after the fields are loaded into %record, just push @array, { %record }; to build an array of hashes (AoH). You can then get to individual fields of a record like this: $array[0]{"a"}
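Here is a minimal sketch of that AoH variant, just to show the pieces together. It reads from an in-memory string instead of your file, and the sample data is made up (only the 1 / FORTUNE BAY values come from the comments above), so treat the layout as an assumption:

```perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by a blank line.
my $data = "]a[1]b[FORTUNE BAY\n\n]a[2]b[SECOND PLACE\n";

open( my $in, '<', \$data ) or die "in-memory file: $!";

my @array;    # one hash reference per record
{
    local $/ = '';    # paragraph mode

    while ( my $block = <$in> ) {
        chomp $block;    # in paragraph mode, chomp strips all trailing newlines
        my @fields = split /\](\w)\[/, $block;
        shift @fields;    # drop the empty element before the first tag

        my %record = @fields;
        push @array, {%record};    # store a copy of this record's hash
    }
}
close $in;

print scalar(@array), " records\n";
print "$array[0]{a} / $array[1]{b}\n";
```

Because @array is declared outside the bare block, it is still available after the loop finishes.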

I wouldn't use "$a" or "$b" as names for scalar variables like you suggested. The "sort" function uses those two names for the pair of values it is comparing, so things can get messed up if you use "sort" in the same scope as these variables.
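A small sketch of the safer habit, assuming you only want the two fields in scalars. The names $id and $name are just placeholders I picked, not anything from your data:

```perl
use strict;
use warnings;

my %record = ( a => 1, b => 'FORTUNE BAY' );

# Descriptive lexical names leave sort's special $a and $b alone.
my ( $id, $name ) = @record{qw(a b)};

# Inside the sort block, $a and $b are the package variables that sort sets.
# A "my $a" declared earlier in this scope would hide them, and this
# comparison would no longer see the values being sorted.
my @sorted = sort { $a <=> $b } ( 10, 9, 100 );

print "$id: $name\n";
print "@sorted\n";
```

Depending on the Perl version, the "my $a" variant is reported either as an error or as a warning, so it is easiest to avoid those two names altogether.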

If the input data varies from file to file or record to record regarding the features listed above, you'll need to tweak this approach (or tweak the data before using it).
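If you are not sure whether a given file meets those conditions, one option is to check each block before trusting it. This is only a rough sketch under my own assumptions: the sample data is invented, and the single test (does the block start with "]a["?) is just one of the checks you might want:

```perl
use strict;
use warnings;

# Hypothetical input: the second block does not start with "]a[".
my $data = "]a[1]b[FORTUNE BAY\n\nstray text]a[2\n";

open( my $in, '<', \$data ) or die "in-memory file: $!";

my ( $good, $bad ) = ( 0, 0 );
{
    local $/ = '';    # paragraph mode, so $. counts records

    while ( my $block = <$in> ) {
        if ( $block =~ /^\]a\[/ ) {
            $good++;
        }
        else {
            $bad++;
            warn "record $. does not start with ]a[\n";
        }
    }
}
close $in;

print "$good ok, $bad suspect\n";
```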

(update: I simplified the split regex so that it's easier to read -- and it accepts a wider range of tags than appears in the OP data, which probably won't cause a problem. (ahem...) Then I updated again to get the bracketing right in that regex.)

In reply to Re: Tag pattern matching by graff
in thread Tag pattern matching by Anonymous Monk
