I needed to extract a lot of data from an output file that is fairly systematic in layout, so I wrote a batch file with a little help (I can hack my way through batch files, but I'm definitely not an expert). The problem is that the output file has 500,000 lines, there are several files to process, and the batch file takes several hours per run, which is not very efficient.

I've seen things done in Perl that are very fast, but I don't have any experience with it and I really don't know how to get this done. I assume there is no direct batch-to-Perl translator, as that wouldn't really make sense. Any help would be fantastic:

Problem:

1. Find the string INTERPOLATED HYDROGRAPH

2. Copy the next data column on that line (CAC40 in this case) to column 1 of a new “Output.txt”

3. Go down 6 rows and copy the 2nd column (1223.) to column 2 of “Output.txt”

4. Go down 2 more rows (8 total) and copy the 6th column (1456.) to column 3 of “Output.txt”

5. Repeat several thousand times
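The steps above amount to a single pass over the file, whereas the batch file below re-reads the whole input twice for every match it finds. A minimal sketch of that single pass follows, written in Python rather than Perl; the same loop should port to Perl fairly directly. The function and constant names are made up for illustration. It assumes the row offsets of 6 and 8 hold for every block, that the station name is the 4th whitespace-separated word on the marker line, and that the wanted (AC-FT) value is the last field on its row (in the sample that row has only five whitespace-separated fields, both trailing values being 1456.).

```python
import sys

MARKER = "INTERPOLATED HYDROGRAPH"
PEAK_OFFSET = 6    # rows below the marker line that hold the peak flow
VOLUME_OFFSET = 8  # rows below the marker line that hold the (AC-FT) values


def extract_hydrographs(lines):
    """Yield (station, peak_flow, volume) tuples in one pass over the lines."""
    pending = None  # state for the block currently being read, if any
    for i, line in enumerate(lines):
        fields = line.split()
        if MARKER in line:
            # Assumption: the station name is the 4th word, e.g. "... AT CAC40".
            if len(fields) >= 4:
                pending = {"station": fields[3], "start": i, "peak": None}
            else:
                pending = None
        elif pending is not None:
            offset = i - pending["start"]
            if offset == PEAK_OFFSET and len(fields) >= 2:
                # 2nd column of the "+ 1223. 12.67 ..." row
                pending["peak"] = fields[1]
            elif offset == VOLUME_OFFSET:
                # Assumption: the wanted value is the last field on this row.
                if pending["peak"] is not None and fields:
                    yield pending["station"], pending["peak"], fields[-1]
                pending = None


def main(in_path, out_path="Output.txt"):
    """Write one space-separated row per hydrograph block found in in_path."""
    with open(in_path) as src, open(out_path, "w") as out:
        for row in extract_hydrographs(src):
            out.write(" ".join(row) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run as `python extract.py input.txt` (the script name is hypothetical); because the file is read once and only one block's state is kept in memory, the run time grows with the number of lines rather than with lines times matches.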

---------------------------

SAMPLE OF INPUT FILE (the text I need extracted is CAC40, 1223. and 1456.)

*** *** *** *** ***
INTERPOLATED HYDROGRAPH AT CAC40
blank line here
PEAK FLOW TIME MAXIMUM AVERAGE FLOW
6-HR 24-HR 72-HR 166.58-HR
+ (CFS) (HR)
(CFS)
+ 1223. 12.67 890. 588. 245. 106.
(INCHES) .154 .408 .509 .509
(AC-FT) 441. 1166. 1456. 1456.
CUMULATIVE AREA = 53.67 SQ MI
*** *** *** *** ***
blank line here

My super crazy slow and inefficient batch file solution (though it works) is:

-----------------------------------------------------------------

@echo off>output.txt & setlocal enabledelayedexpansion
set input=input.txt

rem this finds the text INTERPOLATED HYDROGRAPH in the input file
rem sets k0 = to that line#
rem sets x = to that line# + 6 and
rem sets y = to that line# + 8
for /f "tokens=1,5 delims=[] " %%a in ('find /n "INTERPOLATED HYDROGRAPH"^<%input%') do (
set /a x=%%a+6
set k0=%%b
echo line !x!
set /a y=%%a+8
call :xx
)
goto :eof

:xx
rem this line takes the line #s and extracts the following:
rem 2nd column of line !x! and the 6th columns of line !y!
rem find /n /v ""<%input%|findstr "^\[!x!\]"
for /f "tokens=2 delims= " %%a in ('find /n /v ""^<%input%^|findstr "^\[!x!\] "') do set k=%%a
for /f "tokens=6 delims= " %%a in ('find /n /v ""^<%input%^|findstr "^\[!y!\] "') do set k2=%%a
rem this writes the values to a text file
>>output.txt echo %k0% %k% %k2%
-----------------------------------------------------------------

Any help would be greatly appreciated, thanks.


In reply to Perl solution for current batch file to extract specific column text by oryan
