in reply to peek at STDIN, to determine data type and then pass STDIN to a parser

Perhaps using an iterator might be a solution. Create an Iterator::Simple iterator object out of the original file handle, pull the first couple of lines from the original file handle to validate file type, and then use the iterator as the file handle passed to the actual processing code. IIRC, the iterator can behave like a standard file handle. You will need to manage the storage of the first bit of text that you check on, but the coding is pretty simple.

--MidLifeXis

Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser
by aral (Acolyte) on Jan 06, 2015 at 14:43 UTC
    Thank you for the suggestion. Are you still talking about possibilities for STDIN? For normal filehandles I would be able to use a seek operation anyway. My problem seems to be limited to pipes.
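
The file-versus-pipe difference mentioned above can be seen with a small sketch. It uses only core Perl (`File::Temp`, `pipe`, `seek`); the variable names are made up for illustration, and the behaviour shown is what I would expect on a POSIX-like system, where seeking on a pipe fails.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Regular file: peek at the first line, then rewind and read it again.
my ( $file_fh, $path ) = tempfile( UNLINK => 1 );
print {$file_fh} "first line\nsecond line\n";
seek( $file_fh, 0, 0 ) or die "seek on file failed: $!";
my $peeked       = <$file_fh>;
my $file_rewound = seek( $file_fh, 0, 0 ) ? 1 : 0;
my $again        = <$file_fh>;

# Pipe: the same peek works, but the rewind does not.
pipe( my $reader, my $writer ) or die "pipe failed: $!";
print {$writer} "first line\nsecond line\n";
close $writer or die "close failed: $!";
my $pipe_peeked  = <$reader>;
my $pipe_rewound = seek( $reader, 0, 0 ) ? 1 : 0;

print "file rewound: $file_rewound, pipe rewound: $pipe_rewound\n";
```

Once a line has been read from a pipe it cannot be un-read with seek, so whatever was peeked has to be kept somewhere and replayed to the consumer.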

      Yes. It is an option. It may not be the best option for your uses.

      I use iterators when schlepping event logs through my monitoring system, whether they come from a real-time event queue, stored log files, or current state of a system. To my consumer software, all of the data looks the same.

      The reason I suggested this technique is that it does not significantly increase the memory or filesystem requirements (as reading files fully into memory or storing in a temp file and processing would^Wcould do). It also allows the consumer (your XML processing in this case) to treat it as just a file handle.

      # UNTESTED
      #
      # This is for line-by-line reading, not block-by-block reading.
      # Adjust as necessary.
      use IO::Handle;
      use Iterator::Simple qw( iter );

      sub create_iterator {
          my $original_fh  = \*STDIN;
          my @cached_data  = $original_fh->getline;    # enough to id the file
          my $data_type_id = identify_data_type( \@cached_data );    # Remove from @cached if provided

          my $iterator = iter( sub {
              my $retval;
              if ( $data_type_id ) {
                  # First call: hand back the type id (or injected header) once
                  $retval       = $data_type_id;
                  $data_type_id = undef;
              }
              elsif ( @cached_data ) {
                  # Then replay the lines that were read while peeking
                  $retval = shift( @cached_data );
              }
              else {
                  # Finally read straight from the original handle
                  $retval = $original_fh->getline;
              }
              return $retval;
          } );
          return $iterator;
      }

      --MidLifeXis
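
For anyone who wants to try the peek-and-replay idea without installing anything, here is a minimal core-Perl sketch using a plain closure instead of Iterator::Simple. It is a variant, not the code above: `peek_and_replay` and this `identify_data_type` are hypothetical helpers written for the example, the type check is deliberately naive, and an in-memory handle stands in for STDIN.

```perl
use strict;
use warnings;

# Hypothetical classifier: guesses the type from the first cached line.
sub identify_data_type {
    my ($lines) = @_;
    return $lines->[0] =~ /^\s*<\?xml/ ? 'xml' : 'unknown';
}

# Returns ( $type, $next ). Calling $next->() yields every line of $fh in
# order, including the line that was consumed while peeking.
sub peek_and_replay {
    my ($fh) = @_;
    my @cached = grep { defined } scalar readline($fh);
    my $type   = @cached ? identify_data_type( \@cached ) : 'empty';
    my $next   = sub {
        return shift @cached if @cached;    # replay the peeked data first
        return scalar readline($fh);        # then continue with the handle
    };
    return ( $type, $next );
}

# In-memory handle standing in for STDIN fed by a pipe.
my $input = qq{<?xml version="1.0"?>\n<doc/>\n};
open my $in, '<', \$input or die "cannot open in-memory handle: $!";

my ( $type, $next ) = peek_and_replay($in);
my @seen;
while ( defined( my $line = $next->() ) ) {
    push @seen, $line;
}
print "type=$type, lines=", scalar(@seen), "\n";
```

Unlike the version above, this sketch returns the detected type separately rather than putting it into the stream, so the consumer sees exactly the original lines.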

        I haven't tested that yet, but if I understand what you are doing here correctly, then that is a great idea!

        Thank you very much, very elegant solution for my problem! I'll get started right away on implementing / testing that. Never mind my workaround, elegant beats workaround!

        Okay - I have gotten your code to work, and to do what I want. Now this may be a beginner's question - but:

        How on earth do I get XML::Twig's parse function to use the iterator instead of a filehandle?

        my $inputHandle = create_iterator();
        $t->parse (<$inputHandle>);

        exits with error message "Not a GLOB reference at ./script.pl line xy.".

        And

        $t->parse ($inputHandle);

        spits out: "not well-formed (invalid token) at line 1, column 4, byte 4 at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser.pm line 187. at ./script.pl line xy."

        So how do I typecast the iterator in order to treat it like a file handle?
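
One general Perl mechanism for making an iterator look like a file handle is a tied handle. The sketch below uses only core Perl; `IteratorHandle` is a hypothetical class name, and a plain code ref stands in for the iterator. Whether XML::Twig's parse accepts such a handle is an untested assumption: nothing in this thread confirms it, and a consumer that calls read() rather than reading line by line would also need a READ method, which is not shown here.

```perl
use strict;
use warnings;
use Symbol qw( gensym );

{
    package IteratorHandle;    # hypothetical name, not a CPAN module

    # $next is a code ref that returns the next line, or undef when done.
    sub TIEHANDLE {
        my ( $class, $next ) = @_;
        return bless { next => $next }, $class;
    }

    # Called for <$fh>: one line in scalar context,
    # all remaining lines in list context.
    sub READLINE {
        my ($self) = @_;
        return $self->{next}->() unless wantarray;
        my @lines;
        while ( defined( my $line = $self->{next}->() ) ) {
            push @lines, $line;
        }
        return @lines;
    }
}

# Stand-in for the iterator: hands out queued lines one at a time.
my @pending   = ( "<?xml version=\"1.0\"?>\n", "<root>\n", "</root>\n" );
my $next_line = sub { return shift @pending };

my $fh = gensym;
tie *$fh, 'IteratorHandle', $next_line;

my $first = <$fh>;    # scalar context: one line
my @rest  = <$fh>;    # list context: everything that is left
print $first, @rest;
```

If the parser turns out not to cope with a tied handle, the fallback that is known to work is passing the XML as a string, at the cost of holding the whole document in memory.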

Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser
by Anonymous Monk on Jan 06, 2015 at 19:41 UTC
    What is the difference between reading line by line using the filehandle with the diamond operator and using an iterator?

      Nothing if you are just reading. The benefit can arise if you want to rearrange, inject, or modify the incoming data on the file handle and make the resulting stream look like a plain old file handle. I understand the OP to want to maybe inject a proper doctype into the data stream if needed.

      Perhaps not the best tool for this particular case, but a tool for the generic case.

      --MidLifeXis