peek at STDIN, to determine data type and then pass STDIN to a parser

aral has asked for the wisdom of the Perl Monks concerning the following question:


Re: peek at STDIN, to determine data type and then pass STDIN to a parser
by Athanasius (Archbishop) on Jan 06, 2015 at 14:58 UTC

Hello aral,

The ungets method from FileHandle::Unget works on STDIN:

#! perl
use strict;
use warnings;
use FileHandle::Unget;

$| = 1;

my $fh = FileHandle::Unget->new(\*STDIN)
  or die "Cannot open filehandle: $!";

print "\nEnter a string: ";
read($fh, my $buffer1, 10);
print "\nThe first  10 characters: '$buffer1'\n";

$fh->ungets($buffer1);

read($fh, my $buffer2, 15);
print "The \"next\" 15 characters: '$buffer2'\n";
$fh->close;

Output:

 1:17 >perl 1115_SoPW.pl

Enter a string: abcdefghijklmnopqrstuvwxyz

The first  10 characters: 'abcdefghij'
The "next" 15 characters: 'abcdefghijklmno'

 1:17 >

Update: Added print statements and renamed variables.

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser

by ikegami (Patriarch) on Jan 07, 2015 at 15:33 UTC

IO::Unread


Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser

by aral (Acolyte) on Jan 08, 2015 at 09:43 UTC

Excellent solution as well - I just tested it and it worked out of the box like this:

use FileHandle::Unget;
  
my $fh = FileHandle::Unget->new(\*STDIN)
 or die "Cannot open filehandle: $!";

my $testline = <$fh>;
$fh->ungets($testline);

print "$.: $testline";

for (my $i = 0; $i < 3; $i++) {
    $testline = <$fh>;
    print "$.: $testline";
}

Output for "cat xmlfile | ./perscript.pl" is:

1: <?xml version="1.0" encoding="UTF-8"?>
1: <?xml version="1.0" encoding="UTF-8"?>
2: <MFOP>
3:   <Basics>

Thank you very much! FileHandle::Unget is *the* answer to my original question.

@ikegami: Unfortunately, the install script (Makefile) for IO::Unread fails with error messages, and there seems to be no Debian package for it in jessie, so I was not able to test it.
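For the type check itself, something like the following could sit between reading $testline and the ungets call in the snippet above. This is only a minimal sketch: guess_data_type and its return values ('xml', 'unknown', 'empty') are names made up for illustration, and only an XML declaration on the first line is treated as a positive match.

```perl
use strict;
use warnings;

# Hypothetical classifier (not part of FileHandle::Unget): guesses the
# data type from the first line that was peeked from the input.
sub guess_data_type {
    my ($line) = @_;

    return 'empty' unless defined $line;         # nothing arrived on the handle
    return 'xml'   if $line =~ /^\s*<\?xml\b/;   # XML declaration on the first line
    return 'unknown';                            # anything else is left to the caller
}
```

The caller would then decide which parser receives $fh based on the returned string, after the peeked line has been pushed back.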


Re: peek at STDIN, to determine data type and then pass STDIN to a parser
by MidLifeXis (Monsignor) on Jan 06, 2015 at 14:37 UTC

Perhaps using an iterator might be a solution. Create an Iterator::Simple iterator object out of the original file handle, pull the first couple of lines from the original file handle to validate file type, and then use the iterator as the file handle passed to the actual processing code. IIRC, the iterator can behave like a standard file handle. You will need to manage the storage of the first bit of text that you check on, but the coding is pretty simple.

--MidLifeXis


Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser

by aral (Acolyte) on Jan 06, 2015 at 14:43 UTC

Thank you for the suggestion. Are you still talking about possibilities for STDIN? For normal filehandles I would be able to use a seek operation anyways. My problem seems to be limited to pipes.
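To illustrate the difference, here is a small sketch that checks whether a handle can seek at all. The helper name is_seekable is my own, and the pipe created with pipe() merely stands in for input arriving via "cat xmlfile | script".

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): true if the handle can seek.
sub is_seekable {
    my ($fh) = @_;
    # Seeking 0 bytes from the current position changes nothing on a
    # regular file, but fails on a pipe.
    return seek($fh, 0, 1) ? 1 : 0;
}

# A real pipe, standing in for piped STDIN.
pipe(my $reader, my $writer) or die "pipe failed: $!";
print "pipe: ", (is_seekable($reader) ? "seekable" : "not seekable"), "\n";
```

On a regular file the same check should succeed, which is why rewinding only becomes a problem once the data comes through a pipe.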


Re^3: peek at STDIN, to determine data type and then pass STDIN to a parser

by MidLifeXis (Monsignor) on Jan 06, 2015 at 15:01 UTC

Yes. It is an option. It may not be the best option for your uses.

I use iterators when schlepping event logs through my monitoring system, whether they come from a real-time event queue, stored log files, or current state of a system. To my consumer software, all of the data looks the same.

The reason I suggested this technique is that it does not significantly increase the memory or filesystem requirements (as reading files fully into memory or storing in a temp file and processing would^Wcould do). It also allows the consumer (your XML processing in this case) to treat it as just a file handle.

# UNTESTED
#
# This is for line-by-line reading, not block-by-block reading.
# Adjust as necessary.
use IO::Handle;                  # for $fh->getline
use Iterator::Simple qw(iter);   # import iter() explicitly

sub create_iterator {
    my $original_fh  = \*STDIN;
    my @cached_data  = $original_fh->getline;                # enough to id the file
    my $data_type_id = identify_data_type( \@cached_data );  # Remove from @cached if provided

    my $iterator = iter( sub {
        my $retval;
        if ( $data_type_id ) {
            $retval = $data_type_id;
            $data_type_id = undef;
        }
        elsif ( @cached_data ) {
            $retval = shift( @cached_data );
        }
        else {
            $retval = $original_fh->getline;
        }
        return $retval;
    } );

    return $iterator;
}

--MidLifeXis




Re^2: peek at STDIN, to determine data type and then pass STDIN to a parser

by Anonymous Monk on Jan 06, 2015 at 19:41 UTC

What is the difference between reading line by line from the filehandle with the diamond operator and using an iterator?


Re^3: peek at STDIN, to determine data type and then pass STDIN to a parser

by MidLifeXis (Monsignor) on Jan 06, 2015 at 21:07 UTC

Nothing if you are just reading. The benefit can arise if you want to rearrange, inject, or modify the incoming data on the file handle and make the resulting stream look like a plain old file handle. I understand the OP to want to maybe inject a proper doctype into the data stream if needed.

Perhaps not the best tool for this particular case, but a tool for the generic case.
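A minimal sketch of the injection idea, using only a plain closure rather than Iterator::Simple: make_injecting_reader is a hypothetical helper, the DOCTYPE string and the mfop.dtd name are invented for the example, and the in-memory handle stands in for STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps a filehandle in a closure that returns one
# line per call and inserts $doctype near the top if the stream has none.
sub make_injecting_reader {
    my ($fh, $doctype) = @_;
    my @pending;        # lines read ahead (or injected) but not yet handed out
    my $checked = 0;    # true once the head of the stream has been inspected

    return sub {
        unless ($checked) {
            $checked = 1;
            my $first = <$fh>;
            return unless defined $first;            # empty input
            if ($first =~ /^<\?xml\b/) {
                # Keep the XML declaration first, then look at the next line.
                my $second = <$fh>;
                push @pending, $first;
                push @pending, $doctype
                    unless defined $second && $second =~ /^<!DOCTYPE\b/;
                push @pending, $second if defined $second;
            }
            else {
                push @pending, $doctype unless $first =~ /^<!DOCTYPE\b/;
                push @pending, $first;
            }
        }
        return shift @pending if @pending;   # replay cached or injected lines
        return scalar <$fh>;                 # then read straight from the source
    };
}

# Usage with an in-memory handle standing in for STDIN:
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n<MFOP>\n};
open my $in, '<', \$xml or die "Cannot open in-memory handle: $!";
my $next = make_injecting_reader($in, qq{<!DOCTYPE MFOP SYSTEM "mfop.dtd">\n});
while (defined(my $line = $next->())) {
    print $line;
}
```

The consumer only ever sees the closure, so it does not need to know whether a given line came from the cache, was injected, or was read from the pipe.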

--MidLifeXis

