in thread peek at STDIN, to determine data type and then pass STDIN to a parser

Yes. It is an option. It may not be the best option for your uses.

I use iterators when schlepping event logs through my monitoring system, whether they come from a real-time event queue, stored log files, or current state of a system. To my consumer software, all of the data looks the same.

The reason I suggested this technique is that it does not significantly increase the memory or filesystem requirements (as reading files fully into memory or storing in a temp file and processing would^Wcould do). It also allows the consumer (your XML processing in this case) to treat it as just a file handle.

    # UNTESTED
    #
    # This is for line-by-line reading, not block-by-block reading.
    # Adjust as necessary.
    use IO::Handle;                   # provides ->getline on \*STDIN
    use Iterator::Simple qw(iter);    # provides iter()

    sub create_iterator {
        my $original_fh  = \*STDIN;
        my @cached_data  = $original_fh->getline;    # enough to id the file
        my $data_type_id = identify_data_type( \@cached_data );    # Remove from @cached if provided

        my $iterator = iter( sub {
            my $retval;
            if ( $data_type_id ) {
                # First call: hand back the detected type, once.
                $retval       = $data_type_id;
                $data_type_id = undef;
            }
            elsif ( @cached_data ) {
                # Then replay what was read while peeking.
                $retval = shift( @cached_data );
            }
            else {
                # Finally continue with the rest of STDIN (undef at EOF).
                $retval = $original_fh->getline;
            }
            return $retval;
        } );

        return $iterator;
    }

--MidLifeXis

Re^4: peek at STDIN, to determine data type and then pass STDIN to a parser
by aral (Acolyte) on Jan 08, 2015 at 08:57 UTC

    I haven't tested that yet, but if I understand what you are doing here correctly, then that is a great idea!

    Thank you very much, very elegant solution for my problem! I'll get started right away on implementing / testing that. Never mind my workaround, elegant beats workaround!

Re^4: peek at STDIN, to determine data type and then pass STDIN to a parser
by aral (Acolyte) on Jan 08, 2015 at 10:12 UTC

    Okay - I have gotten your code to work, and to do what I want. Now this may be a beginner's question - but:

    How on earth do I get XML::Twig's parse function to use the iterator instead of a filehandle?

    my $inputHandle = create_iterator();
    $t->parse (<$inputHandle>);

    exits with error message "Not a GLOB reference at ./script.pl line xy.".

    And

    $t->parse ($inputHandle);

    spits out: "not well-formed (invalid token) at line 1, column 4, byte 4 at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser.pm line 187. at ./script.pl line xy."

    So how do I typecast the iterator in order to treat it like a file handle?

      Ok, I have had a chance to look at this - my statement above that the consumer can "treat it as just a file handle" is not quite accurate. The iterator needs a little more support to make it look like an IO::Handle or a tied file handle. I am considering a self-project to be able to export an Iterator::Simple object as an IO::Handle object.

      I have been using the iterators with the ->next() or <$iterator> syntax, so my claim that they simply behave like a file handle was wrong; mea culpa. Given a couple of days perhaps I can have something available that will wrap an iterator as a globish object. I will update here if/when that happens.

      --MidLifeXis

        If you could find a way to do this that would be much appreciated. I would try it myself, but I am afraid I am still way too clueless about the lower-level mechanics of Perl :)

        I understand you've probably been too busy to get back to this. However, the idea and approach seemed very valuable, and I would therefore like to bump the topic, in case someone has the time and interest to come up with a solution to wrap the above iterator approach in a glob-like object.

        Anyone? :)
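
One possible shape for such a wrapper is a tied file handle. The following is a minimal sketch, not a tested solution for XML::Twig: the package name IteratorHandle and its new_handle constructor are made up for illustration, and the in-memory handle and closure stand in for STDIN and create_iterator(). It only demonstrates that a code ref returning lines can be read through <$fh> and read(); whether $t->parse($fh) accepts the result is an assumption that would need checking against the installed XML::Parser.

```perl
use strict;
use warnings;

package IteratorHandle;    # hypothetical name, not a CPAN module

use Symbol qw(gensym);

# Return a glob ref tied to this class. $next is a code ref that returns
# the next chunk of data (for example a line), or undef when exhausted.
sub new_handle {
    my ( $class, $next ) = @_;
    my $fh = gensym;
    tie *$fh, $class, $next;
    return $fh;
}

sub TIEHANDLE {
    my ( $class, $next ) = @_;
    return bless { next => $next, buffer => '', done => 0 }, $class;
}

# Append one more chunk to the buffer; false once the source is exhausted.
sub _fill {
    my ($self) = @_;
    return 0 if $self->{done};
    my $chunk = $self->{next}->();
    if ( !defined $chunk ) {
        $self->{done} = 1;
        return 0;
    }
    $self->{buffer} .= $chunk;
    return 1;
}

# Return buffered leftovers if any, else the next chunk, else undef.
sub _next_line {
    my ($self) = @_;
    until ( length $self->{buffer} ) {
        return undef unless $self->_fill;
    }
    my $line = $self->{buffer};
    $self->{buffer} = '';
    return $line;
}

# Called for <$fh>; honours scalar and list context.
sub READLINE {
    my ($self) = @_;
    return $self->_next_line unless wantarray;
    my @lines;
    while ( defined( my $line = $self->_next_line ) ) {
        push @lines, $line;
    }
    return @lines;
}

# Called for read($fh, $buf, $len, $offset); $_[0] aliases the caller's buffer.
sub READ {
    my $self = shift;
    my ( undef, $len, $offset ) = @_;
    $offset = 0 unless defined $offset;
    while ( length( $self->{buffer} ) < $len ) {
        last unless $self->_fill;
    }
    my $chunk = substr( $self->{buffer}, 0, $len, '' );
    $_[0] = '' unless defined $_[0];
    my $have = length $_[0];
    $_[0] .= "\0" x ( $offset - $have ) if $offset > $have;
    substr( $_[0], $offset ) = $chunk;
    return length $chunk;
}

sub EOF {
    my ($self) = @_;
    return 0 if length $self->{buffer};
    return !$self->_fill;
}

sub CLOSE {
    my ($self) = @_;
    $self->{done}   = 1;
    $self->{buffer} = '';
    return 1;
}

package main;

# Stand-in for STDIN plus create_iterator(): peek at the first line,
# then replay it ahead of the rest of the input.
my $xml = qq{<root>\n<item>1</item>\n</root>\n};
open my $in, '<', \$xml or die "cannot open in-memory handle: $!";
my @cached = ( scalar <$in> );
my $next   = sub { @cached ? shift @cached : scalar <$in> };

my $fh = IteratorHandle->new_handle($next);

my $head = '';
read( $fh, $head, 4 );    # block-wise read, dispatched to READ
my @rest = <$fh>;         # line-wise reads, dispatched to READLINE
print $head, @rest;       # reproduces the original input
```

To use this with the iterator from the post above, the code ref could be something like sub { $iterator->next }, going by the ->next() syntax mentioned earlier in the thread.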