extract syntaxes from file into array, as new file

chinescu has asked for the wisdom of the Perl Monks concerning the following question:

I have to read a few files that are stored in a directory and have similar content, which looks like this:

BOF_1


start of tree
------------------

Da         
|  something > 1234: 1 (81.06/25.89)

Number of Items  :     15

Size of the Store :     30

something else here

=== Summary Matrix ===

     a     b   <-- classified as
 11111   222 |     a = 0
  3333   444 |     b = 1

EOF_1

The second file has the following content:

BOF_2


start of tree
------------------

Da         
|  something > 456: 1 (48.07/89.21)

Number of Items  :     50

Size of the Store :     100

something else here

=== Summary Matrix ===

     a     b   <-- classified as
 55555   666 |     a = 0
  7777   888 |     b = 1

EOF_2

I would like to extract specific values from these files (see the example output file below) and write them to a new file (i.e. as an array; .csv preferred).

In addition, the export file needs a one-line header, as shown below. The header entries are NOT pulled from the input files but are hard-coded in the script.

For example, the export file should look like this (note that, except for the file names, the numbers listed below are the actual values I need to extract from the input files):

filename aa ba ab bb no_of_items size_of_store
file1 11111 222 3333 444 15 30
file2 55555 666 7777 888 50 100
etc... (for extra files)
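
The layout described above could be produced along these lines. This is only a minimal sketch: the sub name `write_summary` and the in-memory output handle are assumptions made for illustration, and the row values are copied from the sample output rather than extracted from any file.

```perl
use strict;
use warnings;

# Hard-coded header, as described above (not read from the input files).
my @header = qw(filename aa ba ab bb no_of_items size_of_store);

# Hypothetical helper: write the header once, then one CSV row per record.
# Each record is an array ref whose fields follow the header order.
sub write_summary {
    my ($out, @records) = @_;
    print {$out} join(',', @header), "\n";
    print {$out} join(',', @$_), "\n" for @records;
    return;
}

# Demo: write to an in-memory handle instead of a real file.
my $csv = '';
open my $out, '>', \$csv or die "Cannot open in-memory handle: $!";
write_summary($out,
    [ 'file1', 11111, 222, 3333, 444, 15, 30 ],
    [ 'file2', 55555, 666, 7777, 888, 50, 100 ],
);
close $out;
print $csv;
```

A plain `join` is enough here only because none of the fields contain commas or quotes. If that assumption does not hold, a CSV module such as Text::CSV from CPAN is the safer choice.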

Thanks a lot for your help!

I started writing it as follows. I need help with the syntax extraction part.


use strict;
use warnings;

my $resFolder = 'Somelocation';    # placeholder; defined in other part of the code

my @files = glob("$resFolder/t-*.csv");

my $ResSummaryFile = "$resFolder/ResSummary.csv";
my @header = ("filename", "aa", "ba", "ab", "bb", "#_of_items", "size_of_store");

# Open the summary file once, before the loop; reopening it with ">"
# for every input file would overwrite the earlier rows.
open(my $out, '>', $ResSummaryFile) or die "Cannot write $ResSummaryFile: $!";
print $out join(',', @header), "\n";

## loop through file list to extract necessary data (results):
for my $file (@files) {
    print "Processing file... $file\n";
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    my @tempReadOuts = ($file);
    while (my $line = <$in>) {
        chomp $line;

        ## code to extract syntaxes should go here;

    }
    close($in);
    # one row per input file, written after the whole file has been read
    print $out join(',', @tempReadOuts), "\n";
}
close($out);


Replies are listed 'Best First'.
Re: extract syntaxes from file into array, as new file by toolic (Bishop) on Dec 13, 2011 at 00:34 UTC
You could use a regular expression to extract the numbers:

use warnings;
use strict;

my @nums;
while (<DATA>) {
    if (/ (\d+) \s+ (\d+) \s+ \| /x) {
        push @nums, $1, $2;
    }
}
print "file1 @nums";

__DATA__
start of tree
------------------

Da
|  something > 1234: 1 (81.06/25.89)

Number of Items  :     15

Size of the Store :     30

something else here

=== Summary Matrix ===

     a     b   <-- classified as
 11111   222 |     a = 0
  3333   444 |     b = 1
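
Building on the regex idea above, the same line-by-line loop could also pick up the two counters. The following is a minimal sketch, not toolic's code: the sub name `extract_fields` and the hash keys are assumptions chosen to match the requested header, and the demo reads the first sample from an in-memory handle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle line by line and collect the
# six values the question asks for. Returns a hash reference.
sub extract_fields {
    my ($fh) = @_;
    my (%f, @matrix);
    while (my $line = <$fh>) {
        if ($line =~ /^Number of Items\s*:\s*(\d+)/) {
            $f{no_of_items} = $1;
        }
        elsif ($line =~ /^Size of the Store\s*:\s*(\d+)/) {
            $f{size_of_store} = $1;
        }
        elsif ($line =~ /^\s*(\d+)\s+(\d+)\s+\|/) {
            push @matrix, $1, $2;    # matrix rows, top to bottom
        }
    }
    # Assumes exactly two matrix rows, as in the sample files.
    @f{qw(aa ba ab bb)} = @matrix;
    return \%f;
}

# Demo on the first sample from the question.
my $sample = <<'END';
start of tree
------------------

Da
|  something > 1234: 1 (81.06/25.89)

Number of Items  :     15

Size of the Store :     30

=== Summary Matrix ===

     a     b   <-- classified as
 11111   222 |     a = 0
  3333   444 |     b = 1
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $fields = extract_fields($fh);
close $fh;
print join(' ', map { "$_=$fields->{$_}" } sort keys %$fields), "\n";
```

With real files, the handle would come from `open my $fh, '<', $file` inside the loop over the file list.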
Re: extract syntaxes from file into array, as new file by TJPride (Pilgrim) on Dec 13, 2011 at 09:59 UTC
I leave the file stuff up to you, since that part is easy and you have to code at least some of the project. But here's the regex you need:

use strict;
use warnings;

my %results;
$_ = join '', <DATA>;

($results{'no_of_items'}) = m/Number of Items\s+:\s+(\d+)/;
($results{'size_of_store'}) = m/Size of the Store\s+:\s+(\d+)/;
($results{'aa'}, $results{'ba'}, $results{'ab'}, $results{'bb'}) =
    m/\s+a\s+b\s+<-- classified as\s+(\d+)\s+(\d+)\s+\|\s+\w+ = \d+\s+(\d+)\s+(\d+)/;

### Just to display what's in hash, you still need to output it in your format
use Data::Dumper;
print Dumper(\%results);

__DATA__
start of tree
------------------

Da
|  something > 1234: 1 (81.06/25.89)

Number of Items  :     15

Size of the Store :     30

something else here

=== Summary Matrix ===

     a     b   <-- classified as
 11111   222 |     a = 0
  3333   444 |     b = 1
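
To get from that hash to a line of the requested export file, a hash slice can put the values in header order. The following is a rough sketch under stated assumptions: the sub names `parse_report` and `summary_row` are made up for illustration, the regexes are the ones from the reply above, and the demo text is the second sample from the question.

```perl
use strict;
use warnings;

# Column order of the export file, without the leading filename column.
my @columns = qw(aa ba ab bb no_of_items size_of_store);

# Hypothetical helper: apply the three regexes to the whole text of one file.
sub parse_report {
    my ($text) = @_;
    my %r;
    ($r{no_of_items})   = $text =~ /Number of Items\s+:\s+(\d+)/;
    ($r{size_of_store}) = $text =~ /Size of the Store\s+:\s+(\d+)/;
    @r{qw(aa ba ab bb)} = $text =~
        /\s+a\s+b\s+<-- classified as\s+(\d+)\s+(\d+)\s+\|\s+\w+ = \d+\s+(\d+)\s+(\d+)/;
    return \%r;
}

# Hypothetical helper: build one comma-separated row for the summary file.
sub summary_row {
    my ($name, $text) = @_;
    my $r = parse_report($text);
    return join ',', $name, @{$r}{@columns};    # hash slice keeps header order
}

# Demo on the second sample from the question.
my $text = <<'END';
start of tree
------------------

Da
|  something > 456: 1 (48.07/89.21)

Number of Items  :     50

Size of the Store :     100

something else here

=== Summary Matrix ===

     a     b   <-- classified as
 55555   666 |     a = 0
  7777   888 |     b = 1
END

print summary_row('file2', $text), "\n";
```

In a real run, `$text` would be the slurped content of each file in the list, and each returned row would be printed to the summary file after the header line.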