Re: searching a file, results into an array

The regex is rather simple for this, but I think split would be even better (more efficient) if you are sure the fields will always be separated by the first tab and any line containing a tab is valid. I put the results into an array of arrays, which is more efficient than a hash keyed on doc if you only ever want to read through them sequentially. Just another way to do it....

#!/usr/local/bin/perl -w
use strict;

my @documents;
while (<DATA>) {
    # Both methods expect a literal tab between the filename and the title
    # uncomment following line for the regex way
    # if (/^([\S]*.DOC)\t(.*)/) {push @documents, [$1, $2]}

    # uncomment these to use the split method
    # chomp;
    # next unless (my ($doc, $title)=split /\t/, $_, 2);
    # push @documents, [$doc, $title];

}

print "I found the following docs\n\n";
foreach (@documents) {
    print "Doc: $_->[0] \t Title: $_->[1]\n";
}

__DATA__
RS0029.DOC      INTER UNIT HARNESS REQUIREMENT SPECIFICATION

RS0036.DOC      INSTRUMENT ELECTRONICS UNIT
RS0037.DOC      MECHANISM CONTROL ELECTRONICS
RS0041.DOC      IOU DESCAN MECHANISM

RS0042.DOC      IOU GENERIC MECHANISMS

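Note that the regex given is a bit fussier than the obvious /(.*)\t(.*)/, which would cause you grief if the title contained a tab: the first (.*) is greedy, so it matches all the way up to the last tab on the line, and part of the title ends up in the doc field. (If you don't know why, read up on greedy pattern matching; it is very important.)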
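A minimal sketch of the greedy-matching problem, using a made-up line whose title contains a tab. It compares the obvious greedy pattern, the fussier pattern (written with the character class dropped and the dot escaped, as Random_Walk suggests later in this thread), and a non-greedy (.*?) variant, which is a swapped-in alternative and not from the original post. The sample line and the show helper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the title itself contains a tab
my $line = "RS0099.DOC\tPART ONE\tPART TWO";

# Greedy: the first (.*) grabs as much as it can and only backs off
# as far as the LAST tab, so part of the title lands in the doc field
my ($greedy_doc, $greedy_title) = $line =~ /(.*)\t(.*)/;

# Fussy: \S* cannot cross whitespace, so the doc field ends at the FIRST tab
my ($fussy_doc, $fussy_title) = $line =~ /^(\S*\.DOC)\t(.*)/;

# Non-greedy alternative: (.*?) stops at the first tab it can
my ($lazy_doc, $lazy_title) = $line =~ /^(.*?)\t(.*)/;

# Hypothetical helper: show tabs as \t so the difference is visible
sub show { my $s = shift; $s =~ s/\t/\\t/g; return $s }

print "greedy: [", show($greedy_doc), "] [", show($greedy_title), "]\n";
print "fussy:  [", show($fussy_doc),  "] [", show($fussy_title),  "]\n";
print "lazy:   [", show($lazy_doc),   "] [", show($lazy_title),   "]\n";
```

The greedy line reports the doc as RS0099.DOC\tPART ONE and the title as just PART TWO, while the fussy and non-greedy patterns both split at the first tab.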

Cheers,
R.

Re^2: searching a file, results into an array by perlcapt (Pilgrim) on Oct 14, 2004 at 02:58 UTC
This topic is pretty well worked out, but I want to add my two bits. I like the list of lists in this solution over the hash method, because the list retains the sequence of records. I prefer a regular expression over a split for this type of format, because there may be other tabs on the line. Since there are no spaces in the filenames (in your example), I would use this:

    ($filename, $description) = ($line =~ m/(^\S+)\s+(.*)/);

or, as given in the referenced comment:

    if ($line =~ m/(^\S+)\s+(.*)/) { push @documents, [$1, $2]; }

Kind of a "me too" comment, I know.
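A minimal sketch of perlcapt's pattern in a complete loop, assuming the input lines below (hypothetical, loosely based on the thread's sample). Because \s+ accepts any run of whitespace, this version also copes with lines where the tab has become spaces, as in the sample data as displayed in this thread. The array of arrays keeps the records in file order, which iterating over the keys of a hash would not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input lines; separators may be a tab or a run of spaces
my @lines = (
    "RS0029.DOC\tINTER UNIT HARNESS REQUIREMENT SPECIFICATION",
    "",
    "RS0036.DOC      INSTRUMENT ELECTRONICS UNIT",
    "RS0037.DOC\tMECHANISM CONTROL ELECTRONICS",
);

my @documents;
for my $line (@lines) {
    # Leading non-whitespace is the filename; (^\S+) needs at least one
    # character, so blank lines simply fail to match and are skipped
    if ($line =~ m/(^\S+)\s+(.*)/) {
        push @documents, [$1, $2];
    }
}

# Records come back in the same order they were read
for my $rec (@documents) {
    print "Doc: $rec->[0]\tTitle: $rec->[1]\n";
}
```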
Re^3: searching a file, results into an array by Random_Walk (Prior) on Oct 14, 2004 at 10:41 UTC
Hi perlcapt,

The split I was using had the third parameter (the maximum number of fields to split into) set to two. This stops it splitting on any tabs beyond the first, so tabs in the title are no problem. I think it has to remain the preferred option for efficiency, as long as the file contains only blank lines or docs and titles separated by a tab. In my regex I included the literal .DOC to improve rejection of spurious lines, though of course that assumes there are no .XLS or .PPT files. I did make a couple of errors, though...

    # I gave /^([\S]*.DOC)\t(.*)/
    # the class grouping [] for \S is of course silly, and
    # I forgot to escape the . in .DOC
    # this would have been better: /^(\S*\.DOC)\t(.*)/

Cheers,
R.
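A small sketch of both points, using hypothetical lines: split with and without the limit of 2, and the regex as first posted versus the corrected form. The spurious name RS0041XDOC is invented purely to show what the unescaped dot lets through.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line whose title contains a tab
my $line = "RS0042.DOC\tIOU GENERIC\tMECHANISMS";

# No limit: every tab is a split point, so the title is broken into pieces
my @all = split /\t/, $line;

# Limit of 2: split at the first tab only; the rest of the line stays intact
my ($doc, $title) = split /\t/, $line, 2;

print scalar(@all), " fields without a limit\n";
print "doc=$doc\n";

# Hypothetical candidate lines for comparing the two regexes
my @candidates = (
    "RS0042.DOC\tIOU GENERIC MECHANISMS",   # genuine .DOC line
    "BUDGET.XLS\tSPREADSHEET",              # not a .DOC file
    "RS0041XDOC\tNO DOT BEFORE DOC",        # spurious: no literal dot
);

# As first posted: the unescaped . matches any character, so XDOC slips through
my @sloppy = grep { /^([\S]*.DOC)\t(.*)/ } @candidates;

# Corrected: \. only matches a literal dot
my @corrected = grep { /^(\S*\.DOC)\t(.*)/ } @candidates;

print "sloppy accepts ", scalar(@sloppy), " lines\n";
print "corrected accepts ", scalar(@corrected), " lines\n";
```

Both regexes reject the .XLS line because it has no DOC before the tab; only the corrected one also rejects RS0041XDOC.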