in reply to Re: peek at STDIN, to determine data type and then pass STDIN to a parser
in thread peek at STDIN, to determine data type and then pass STDIN to a parser

Thank you for the suggestion. Are you still talking about options for STDIN? For regular filehandles I could just use seek anyway; my problem seems to be limited to pipes.
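
To make the file-versus-pipe distinction concrete, here is a minimal sketch: peek at the first line, then try to `seek` back to the start. On a regular file the rewind succeeds; on a pipe it fails, so the caller has to keep the peeked line itself. The helper names `can_rewind` and `peek_first_line` are hypothetical, not from any module.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: true if $fh can be repositioned to offset 0.
# Regular files allow this; pipes (e.g. `cat file | script.pl`) do not,
# because a pipe cannot be repositioned (lseek(2) reports ESPIPE).
sub can_rewind {
    my ($fh) = @_;
    return seek($fh, 0, SEEK_SET) ? 1 : 0;
}

# Hypothetical helper: read the first line, then try to rewind.
# Returns the peeked line and a flag saying whether the rewind worked.
# If the flag is 0, the line is NOT available again from $fh.
sub peek_first_line {
    my ($fh) = @_;
    my $line    = <$fh>;
    my $rewound = can_rewind($fh);
    return ($line, $rewound);
}
```

With STDIN redirected from a file (`script.pl < data.xml`) the flag comes back 1; with STDIN fed from a pipe it comes back 0, which is exactly the case that needs some form of buffering.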

Re^3: peek at STDIN, to determine data type and then pass STDIN to a parser
by MidLifeXis (Monsignor) on Jan 06, 2015 at 15:01 UTC

    Yes. It is an option. It may not be the best option for your uses.

    I use iterators when schlepping event logs through my monitoring system, whether they come from a real-time event queue, stored log files, or the current state of a system. To my consumer software, all of the data looks the same.
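
    A minimal sketch of that idea, using plain closures rather than an iterator module: each producer is a code ref that returns the next item, or undef when it is exhausted, and the consumer only ever calls that code ref. The names `list_iter`, `fh_iter` and `count_events` are hypothetical.

```perl
use strict;
use warnings;

# Convention: an iterator is a code ref returning the next item,
# or undef once there is nothing left.

# Hypothetical producer over an in-memory list (e.g. a state snapshot).
sub list_iter {
    my @items = @_;
    return sub { return shift @items };
}

# Hypothetical producer over any readable filehandle (log file, pipe, STDIN).
sub fh_iter {
    my ($fh) = @_;
    return sub { return scalar readline($fh) };
}

# Hypothetical consumer: it neither knows nor cares where the items come from.
sub count_events {
    my ($next) = @_;
    my $n = 0;
    while (defined(my $event = $next->())) {
        $n++;
    }
    return $n;
}
```

    Because `count_events` only calls `$next->()`, a list snapshot and a log file are interchangeable to it. One caveat of this convention: an item that is itself undef ends the iteration early.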

    The reason I suggested this technique is that it does not significantly increase the memory or filesystem requirements (as reading files fully into memory or storing in a temp file and processing would^Wcould do). It also allows the consumer (your XML processing in this case) to treat it as just a file handle.

        # UNTESTED
        #
        # This is for line-by-line reading, not block-by-block reading.
        # Adjust as necessary.
        use IO::Handle;                  # provides ->getline on \*STDIN
        use Iterator::Simple qw(iter);

        sub create_iterator {
            my $original_fh = \*STDIN;
            my @cached_data = $original_fh->getline;    # enough to id the file
            my $data_type_id = identify_data_type( \@cached_data );
                # Remove from @cached_data if provided

            my $iterator = iter( sub {
                my $retval;
                if ( $data_type_id ) {
                    $retval = $data_type_id;
                    $data_type_id = undef;
                }
                elsif ( @cached_data ) {
                    $retval = shift( @cached_data );
                }
                else {
                    $retval = $original_fh->getline;
                }
                return $retval;
            } );

            return $iterator;
        }
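
    For anyone who wants to try the idea without installing Iterator::Simple, here is a self-contained sketch of the same peek-and-replay technique using a plain closure. It takes the filehandle as an argument instead of hard-wiring STDIN, and `identify_data_type` here is a hypothetical stand-in that only checks whether the first line starts with `<`; a real sniffer would need to be more careful.

```perl
use strict;
use warnings;

# Hypothetical sniffer: inspects the cached lines and returns a type tag.
# A real one might also remove lines it has fully consumed from @$lines.
sub identify_data_type {
    my ($lines) = @_;
    return 'xml' if @$lines && $lines->[0] =~ /^\s*</;
    return 'text';
}

# Peek at the first line of $fh, identify the data, then replay:
# first the type tag, then the cached line(s), then the rest of $fh.
sub create_iterator {
    my ($fh) = @_;
    my @cached_data  = grep { defined } scalar readline($fh);
    my $data_type_id = identify_data_type(\@cached_data);
    return sub {
        if (defined $data_type_id) {
            my $id = $data_type_id;
            undef $data_type_id;          # hand out the tag only once
            return $id;
        }
        return shift @cached_data if @cached_data;
        return scalar readline($fh);      # undef at EOF ends the iteration
    };
}
```

    Called as `create_iterator(\*STDIN)`, the first call returns the type tag and every later call returns the input lines in their original order, so nothing read during the peek is lost even when STDIN is a pipe.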

    --MidLifeXis

      I haven't tested that yet, but if I understand what you are doing here correctly, then that is a great idea!

      Thank you very much, that is a very elegant solution to my problem! I'll get started right away on implementing and testing it. Never mind my workaround; elegant beats workaround!

      Okay, I have gotten your code to work and to do what I want. This may be a beginner's question, but:

      How on earth do I get XML::Twig's parse function to use the iterator instead of a filehandle?

      my $inputHandle = create_iterator();
      $t->parse (<$inputHandle>);

      exits with the error message "Not a GLOB reference at ./script.pl line xy.".

      And

      $t->parse ($inputHandle);

      spits out: "not well-formed (invalid token) at line 1, column 4, byte 4 at /usr/lib/x86_64-linux-gnu/perl5/5.20/XML/Parser.pm line 187. at ./script.pl line xy."

      So how do I typecast the iterator in order to treat it like a file handle?

        Ok, I have had a chance to look at this, and my statement above that the consumer can "treat it as just a file handle" is not quite accurate. It needs a little more support to make it look like an IO::Handle or a tied file handle. I am considering a side project to export an Iterator::Simple object as an IO::Handle object.

        I have been using the iterators with the ->next() or <$iterator> syntax, so saying that they behave just like a file handle was a mistake on my part (mea culpa). Given a couple of days, perhaps I can have something available that wraps an iterator as a globish object. I will update here if/when that happens.
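
        One way to get the "look like a file handle" support described above is a tied filehandle: a class implementing TIEHANDLE, READLINE, READ, EOF and CLOSE (see perltie) that pulls data from an iterator on demand. This sketch is untested against XML::Twig. The package `IterHandle` and the helper `iterator_to_fh` are hypothetical; it assumes `$/` is "\n", and it assumes the parser reads through Perl's ordinary readline/read, which tied handles intercept.

```perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical class: a tied filehandle fed by an iterator, i.e. a code ref
# returning the next chunk of text, or undef when exhausted.
# Assumes $/ is "\n"; other record separators are ignored.
package IterHandle;

sub TIEHANDLE {
    my ($class, $next) = @_;
    return bless { next => $next, buf => '', done => 0 }, $class;
}

# Append the next chunk to the buffer; returns false once the iterator is done.
sub _fill {
    my ($self) = @_;
    return 0 if $self->{done};
    my $chunk = $self->{next}->();
    if (!defined $chunk) {
        $self->{done} = 1;
        return 0;
    }
    $self->{buf} .= $chunk;
    return 1;
}

# <$fh> / readline($fh): one line in scalar context, all remaining in list context.
sub READLINE {
    my ($self) = @_;
    if (wantarray) {
        my @lines;
        while (defined(my $line = $self->READLINE)) {
            push @lines, $line;
        }
        return @lines;
    }
    until ($self->{buf} =~ /\n/) {
        last unless $self->_fill;
    }
    return undef if $self->{buf} eq '';           # no data left
    return $1 if $self->{buf} =~ s/\A(.*?\n)//s;   # one complete line
    my $rest = $self->{buf};                       # final line without "\n"
    $self->{buf} = '';
    return $rest;
}

# read($fh, $buf, $len, $offset): returns the number of characters read (0 at end).
# Only a non-negative $offset is supported in this sketch.
sub READ {
    my ($self, undef, $len, $offset) = @_;
    my $bufref = \$_[1];                           # alias the caller's buffer
    $offset //= 0;
    while (length($self->{buf}) < $len) {
        last unless $self->_fill;
    }
    my $data = substr($self->{buf}, 0, $len, '');
    $$bufref = '' unless defined $$bufref;
    # Pad with "\0" when $offset lies past the end, as the built-in read does.
    $$bufref .= "\0" x ($offset - length $$bufref) if $offset > length $$bufref;
    substr($$bufref, $offset) = $data;
    return length $data;
}

sub EOF {
    my ($self) = @_;
    until (length $self->{buf}) {
        return 1 unless $self->_fill;
    }
    return 0;
}

sub CLOSE {
    my ($self) = @_;
    @$self{qw(buf done)} = ('', 1);
    return 1;
}

package main;

# Hypothetical helper: wrap an iterator code ref in a glob-based filehandle.
sub iterator_to_fh {
    my ($next) = @_;
    my $fh = gensym;                               # anonymous glob reference
    tie *$fh, 'IterHandle', $next;
    return $fh;
}
```

        With the Iterator::Simple-based create_iterator from earlier in the thread, you would first take the type tag off (`my $type = $iterator->next;`) so that only document text reaches the parser, and then try something like `$t->parse(iterator_to_fh(sub { $iterator->next }))`.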

        --MidLifeXis