in reply to parse file per customized separator / block / metadata

Given how little information there is about what you're trying to achieve, Parse::File::Metadata seems like a good fit so far. Why are you looking for an alternative?

Another module that comes to mind is File::Stream, which would help with grepping blocks of data with different separator patterns.

Re^2: parse file per customized separator / block / metadata
by raiten (Acolyte) on Mar 07, 2010 at 12:12 UTC

    The input file could look like this:

        header1=val1
        header1b=val1b
        data1
        ==================
        header2: val2
        header2b: val2b
        data2
        ===============
        header3: val3
        header3b: val3b
        data3

        header4= val4
        header4b= val4b
        data4

    From these 4 blocks of data, I want to extract the ones matching one or more regexps. I could grep them, but I would need to rebuild the data blocks afterwards, so I'm looking for alternative solutions: a module, a library or a tool. The separator can change within the same file.

    I had a quick look at File::Stream (1) and it seems like a possible option.

    As for why I'm searching for different solutions, that's a fairly common challenge :). Finding different views of the problem, different solutions, better performance, cleaner code, more portability and so on ...

    (1) http://search.cpan.org/~smueller/File-Stream-2.20/lib/File/Stream.pm
    http://www.justskins.com/forums/file-stream-confusion-80665.html

      It would really help if there were some equals signs (=) separating data3 and header4. It seems File::Stream doesn't handle lookaheads that well. Nevertheless, here's an example that may help:
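          use File::Stream;

          # A header name followed by '=' or ':' marks the start of a block.
          my $lookahead_regex = qr/\w+[=:]/;

          # A record ends at a (possibly empty) line of '=' followed by a header.
          my ($handler, $stream) = File::Stream->new(
              *DATA,
              separator => qr/\n=*\n$lookahead_regex/,
          );

          my $lookahead = "";
          while (my $block = <$stream>) {
              # The separator also consumed the next block's first header name:
              # strip it here and prepend it to the following block.
              $block =~ s/($lookahead_regex)$//;
              $block = $lookahead . $block;
              $lookahead = $1;
              print $block;
              print "-" x 60, "\n";
          }

          __END__
          header1=val1
          header1b=val1b
          data1
          ==================
          header2: val2
          header2b: val2b
          data2
          ===============
          header3: val3
          header3b: val3b
          data3

          header4= val4
          header4b= val4b
          data4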

      Output:

        Sorry, it's working great. The data file needed to be dos2unix-ed. Many thanks for the code, a nearly perfect shot :-)

        I still need to find out whether anything can be optimized to handle multiple big files, or whether to move to multithreading.

        Thanks a lot for this code and sorry for the delayed feedback.

        I ran some tests today and the code covers most of my needs. The only case that fails is matching blocks on /^[=]+$/ (note that this regexp is not accepted for block matching). Both my $lookahead_regex = qr/\w+[=:]/; and my $lookahead_regex = qr/[=:][=:][=:]+/; fail.

        I can't manage to match the block separator as both 'headerX:' (works) and '=========[=]+' (doesn't work for now).

        Any advice? I'll try to keep working on it over the next few days.

        Thanks a lot
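
For the open question above (treating both a line of '=' and a new 'headerX:' / 'headerX=' line as block boundaries), here is a minimal sketch in core Perl, without File::Stream. It is not the code from this thread. It assumes the whole input fits in memory, that '=' separator lines contain nothing else, and that a block not preceded by '=' is preceded by a blank line. The helper names split_blocks and matching_blocks are made up for illustration. Because split accepts a lookahead, the header name does not have to be cut off and re-attached.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Stream): split text into blocks.
# A block ends at a line made only of '=' characters, or at a blank
# line that is directly followed by a "name=" / "name:" header line.
sub split_blocks {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;    # normalise DOS line endings, as dos2unix would
    my @blocks = split /\n(?:=+\n|\n(?=\w+[=:]))/, $text;
    s/\n+\z// for @blocks;   # trim trailing newlines from each block
    return grep { length } @blocks;
}

# Hypothetical helper: keep only the blocks that match every given regexp.
sub matching_blocks {
    my ($blocks, @patterns) = @_;
    return grep {
        my $block = $_;
        !grep { $block !~ $_ } @patterns;
    } @$blocks;
}

# Sample input modelled on the data posted earlier in the thread.
my $sample = join "\n",
    'header1=val1', 'header1b=val1b', 'data1',
    '==================',
    'header2: val2', 'header2b: val2b', 'data2',
    '===============',
    'header3: val3', 'header3b: val3b', 'data3',
    '',
    'header4= val4', 'header4b= val4b', 'data4', '';

my @blocks = split_blocks($sample);
my @hits   = matching_blocks(\@blocks, qr/^header\d+:/m, qr/val[23]\b/);
print "$_\n", '-' x 60, "\n" for @hits;
```

With the sample above, only the header2 and header3 blocks satisfy both patterns. For files too large to slurp, the same split pattern would have to be applied incrementally, which this sketch does not attempt.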