I have to read a few files that are stored in a directory and have similar content. The first file looks like this:

BOF_1
start of tree
------------------
Da | something > 1234: 1 (81.06/25.89)
Number of Items : 15
Size of the Store : 30
something else here
=== Summary Matrix ===
a b <-- classified as
11111 222 | a = 0
3333 444 | b = 1
EOF_1

The second file has the following content:

BOF_2
start of tree
------------------
Da | something > 456: 1 (48.07/89.21)
Number of Items : 50
Size of the Store : 100
something else here
=== Summary Matrix ===
a b <-- classified as
55555 666 | a = 0
7777 888 | b = 1

EOF_2

I would like to extract specific values from these files (see the example output file below) and write them to a new file (i.e. as an array, .csv preferred).

In addition, the export file needs a one-line header, as shown below. The header entries are NOT pulled from the input files but are hard-coded in the script.

For example, the export file should look like this (note that, except for the file names, the numbers listed below are the actual values I need to extract from the input files):

filename  aa     ba   ab    bb   no_of_items  size_of_store
file1     11111  222  3333  444  15           30
file2     55555  666  7777  888  50           100
etc... (for extra files)

Thanks a lot for your help!

I started writing it as follows and need help with the extraction part.

use strict;
use warnings;

my $resFolder      = 'Somelocation';    # defined in other part of the code
my @files          = glob("$resFolder/t-*.csv");
my $ResSummaryFile = "$resFolder/ResSummary.csv";
my @header = ("filename", "aa", "ba", "ab", "bb", "#_of_items", "size_of_store");

# open the summary file once, before the loop, so it is not truncated for every input file
open(my $out, '>', $ResSummaryFile) or die "Cannot write $ResSummaryFile: $!";
print $out join(",", @header), "\n";

## loop through file list to extract necessary data (results):
for my $file (@files) {
    print "Processing file... $file\n";
    my @tempReadOuts = ($file);
    open(my $in, '<', $file) or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        ## code to extract syntaxes should go here;
    }
    close($in);
    print $out join(",", @tempReadOuts), "\n";    # one row per input file; "\n", not "/n"
}
close($out);
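For the extraction step left as a placeholder comment in the loop above, one possible approach is to read each file's whole text and pull the six numbers out with regular expressions. The sketch below is only an illustration under assumptions: the helper name extract_fields is hypothetical, the labels "Number of Items" and "Size of the Store" are assumed to appear exactly as in the samples, and the two matrix rows are assumed to end in "| a = ..." and "| b = ...". The patterns use \s so they should work whether the content sits on one line or is spread over several.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): takes the full text
# of one input file and returns (aa, ba, ab, bb, no_of_items, size_of_store),
# or an empty list if any of the expected patterns is missing.
sub extract_fields {
    my ($text) = @_;

    my ($items) = $text =~ /Number of Items\s*:\s*(\d+)/;
    my ($size)  = $text =~ /Size of the Store\s*:\s*(\d+)/;

    # Matrix rows are assumed to look like "11111 222 | a = 0".
    my ($aa, $ba) = $text =~ /(\d+)\s+(\d+)\s*\|\s*a\s*=/;
    my ($ab, $bb) = $text =~ /(\d+)\s+(\d+)\s*\|\s*b\s*=/;

    return () if grep { !defined $_ } ($items, $size, $aa, $ba, $ab, $bb);
    return ($aa, $ba, $ab, $bb, $items, $size);
}

# Demo on the first sample from the question.
my $sample = "start of tree ------------------ Da | something > 1234: 1 (81.06/25.89) "
           . "Number of Items : 15 Size of the Store : 30 something else here "
           . "=== Summary Matrix === a b <-- classified as 11111 222 | a = 0 3333 444 | b = 1";

print join(",", "file1", extract_fields($sample)), "\n";    # file1,11111,222,3333,444,15,30
```

Inside the file loop, the whole file could be read into one string with something like my $text = do { local $/; <FH1> }; and the returned list pushed onto @tempReadOuts before the row is printed. If the real files contain other lines that happen to match these patterns, the regular expressions would need to be tightened.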

In reply to extract syntaxes from file into array, as new file by chinescu
