Re^3: peek at STDIN, to determine data type and then pass STDIN to a parser

Yes. It is an option. It may not be the best option for your uses.

I use iterators when schlepping event logs through my monitoring system, whether they come from a real-time event queue, stored log files, or current state of a system. To my consumer software, all of the data looks the same.

The reason I suggested this technique is that it does not significantly increase the memory or filesystem requirements (as reading files fully into memory or storing in a temp file and processing would^Wcould do). It also allows the consumer (your XML processing in this case) to treat it as just a file handle.

# UNTESTED
#
# This is for line-by-line reading, not block-by-block reading.
# Adjust as necessary.
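use IO::Handle;                   # method calls such as ->getline on STDIN
use Iterator::Simple qw( iter );  # iter() wraps a code ref as an iterator

# identify_data_type() is not shown here; it is assumed to be supplied by the caller.
sub create_iterator {
    my $original_fh  = \*STDIN;
    my @cached_data  = $original_fh->getline;                # enough to id the file
    my $data_type_id = identify_data_type( \@cached_data );  # Remove from @cached if provided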

    my $iterator = iter( sub {
        my $retval;
        if ( $data_type_id ) {
            $retval = $data_type_id;
            $data_type_id = undef;
        }
        elsif ( @cached_data ) {
            $retval = shift( @cached_data );
        }
        else {
            $retval = $original_fh->getline;
        }
        return $retval;
    } );

    return $iterator;
}

--MidLifeXis

Re^4: peek at STDIN, to determine data type and then pass STDIN to a parser by aral (Acolyte) on Jan 08, 2015 at 08:57 UTC
I haven't tested that yet, but if I understand what you are doing here correctly, then that is a great idea! Thank you very much, very elegant solution for my problem! I'll get started right away on implementing / testing that. Never mind my workaround, elegant beats workaround!
Re^4: peek at STDIN, to determine data type and then pass STDIN to a parser by aral (Acolyte) on Jan 08, 2015 at 10:12 UTC
Okay - I have gotten your code to work, and to do what I want. Now this may be a beginner's question - but: How on earth do I get XML::Twig's parse function to use the iterator instead of a filehandle? `my $inputHandle = create_iterator(); $t->parse (<$inputHandle>);` exits with the error message "Not a GLOB reference at ./script.pl line xy.". And `$t->parse ($inputHandle);` spits out: "not well-formed (invalid token) at line 1, column 4, byte 4 at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser.pm line 187. at ./script.pl line xy." So how do I typecast the iterator in order to treat it like a file handle?
Re^5: peek at STDIN, to determine data type and then pass STDIN to a parser by MidLifeXis (Monsignor) on Jan 09, 2015 at 17:48 UTC
Ok, I have had a chance to look at this - my statement above that the consumer can "treat it as just a file handle" is not quite accurate. The iterator needs a little more support to make it look like an IO::Handle or a tied file handle. I am considering a self-project to be able to export an Iterator::Simple object as an IO::Handle object. I have been using the iterators with the `->next()` or `<$iterator>` syntax, so claiming that they just behave like a file handle was definitely my mistake; mea culpa. Given a couple of days perhaps I can have something available that will wrap an iterator as a globish object. I will update here if/when that happens. --MidLifeXis
Re^6: peek at STDIN, to determine data type and then pass STDIN to a parser by aral (Acolyte) on Jan 12, 2015 at 13:54 UTC
If you could find a way to do this, that would be much appreciated. I would try it myself, but I am afraid I am still way too clueless about the lower-level mechanics of Perl :)
Re^6: peek at STDIN, to determine data type and then pass STDIN to a parser by aral (Acolyte) on Feb 17, 2015 at 11:44 UTC
I understand you've probably been too busy to get back to this. However, the idea and approach seemed very valuable, and I would therefore like to bump the topic, in case someone has the time and interest to come up with a solution to wrap the above iterator approach in a glob-like object. Anyone? :)
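
For what it is worth, here is a minimal sketch of the "wrap an iterator as a tied file handle" idea discussed above. It is only a sketch under assumptions: `IteratorHandle` and `make_line_iterator` are hypothetical names made up for this example (they are not part of Iterator::Simple or XML::Twig), the iterator is taken to be a plain code ref that returns the next line or undef, and lines are assumed to end in "\n" regardless of `$/`. Whether XML::Twig's `parse` (through XML::Parser) actually reads a tied handle via these methods has not been verified here, so test that before relying on it.

```perl
use strict;
use warnings;

{
    # Hypothetical class: presents a "next line or undef" code ref as a tied handle.
    package IteratorHandle;

    sub TIEHANDLE {
        my ( $class, $next ) = @_;
        return bless { next => $next, buffer => '', done => 0 }, $class;
    }

    # Pull one more chunk from the iterator into the buffer; false once exhausted.
    sub _fill {
        my ($self) = @_;
        return 0 if $self->{done};
        my $chunk = $self->{next}->();
        if ( !defined $chunk ) { $self->{done} = 1; return 0 }
        $self->{buffer} .= $chunk;
        return 1;
    }

    # Return one "\n"-terminated line (or the unterminated tail), undef at the end.
    sub _readline_one {
        my ($self) = @_;
        while ( index( $self->{buffer}, "\n" ) < 0 ) {
            last unless $self->_fill;
        }
        return undef if $self->{buffer} eq '';
        my $pos = index( $self->{buffer}, "\n" );
        my $len = $pos < 0 ? length $self->{buffer} : $pos + 1;
        return substr( $self->{buffer}, 0, $len, '' );
    }

    # Called for <$fh>; honours scalar and list context.
    sub READLINE {
        my ($self) = @_;
        return $self->_readline_one unless wantarray;
        my @lines;
        while ( defined( my $line = $self->_readline_one ) ) { push @lines, $line }
        return @lines;
    }

    # Called for read( $fh, $buf, $len, $offset ); only non-negative offsets handled.
    sub READ {
        my ( $self, undef, $len, $offset ) = @_;
        $offset = 0 unless defined $offset;
        while ( length( $self->{buffer} ) < $len ) {
            last unless $self->_fill;
        }
        my $data = substr( $self->{buffer}, 0, $len, '' );
        $_[1] = '' unless defined $_[1];
        $_[1] .= "\0" x ( $offset - length $_[1] ) if $offset > length $_[1];
        substr( $_[1], $offset ) = $data;    # $_[1] aliases the caller's buffer
        return length $data;
    }

    sub EOF {
        my ($self) = @_;
        return 0 if length $self->{buffer};
        return !$self->_fill;
    }

    sub CLOSE { return 1 }
}

package main;
use Symbol qw( gensym );

# Stand-in for create_iterator(): replay the peeked lines, then the real handle.
sub make_line_iterator {
    my ( $fh, @cached ) = @_;
    return sub {
        return shift @cached if @cached;
        return scalar <$fh>;
    };
}

# Demo with an in-memory handle in place of STDIN.
my $xml = "<root>\n<item/>\n</root>\n";
open my $source, '<', \$xml or die "open: $!";
my $peeked = <$source>;    # consume the first line to identify the data

my $fh = gensym;           # anonymous glob, so $fh is a real GLOB reference
tie *$fh, 'IteratorHandle', make_line_iterator( $source, $peeked );

my $first = <$fh>;               # line-wise: the peeked line comes back first
read( $fh, my $rest, 4096 );     # block-wise: whatever remains
print $first, $rest;
```

The point of using `gensym` is that `$fh` is a genuine GLOB reference, which is what the "Not a GLOB reference" error above was complaining about; the tie then routes `<$fh>`, `read` and `eof` through the iterator. A consumer that bypasses Perl-level I/O (for example by calling `sysread` on the underlying file descriptor) would not see the tied data, so a temp file or an in-memory string remains the fallback.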