annie06 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
If I have a file like:

Number1 Number2 Number3
23 85 31
90 67 45

How can I grab out the entries like "23" and "67" ?

Right now I open the file and search down to "Number1", but I don't know what to do from there:

sub my_search {
    ($file) = @_;
    open(FILE, $file) || die "Unable to open $file: $!\n";
    while (<FILE>) {
        if (/^Number1/) {
            $_ =~ s/\s+/ /g;
            my @Number = split(" ",$_);
            # line above only brings me to the headers of the "text table"
        }
    }
}

Replies are listed 'Best First'.
Re: printing column info without headers
by kyle (Abbot) on Jul 09, 2008 at 15:17 UTC

    Your while is going to read the whole file. Inside that loop, the if you currently have will operate on the headers. If you want to operate also on the lines that are not the headers, add an else to that.

    while (<FILE>) {
        if (/^Number1/) {
            # header
        }
        else {
            # non-header
        }
    }

    If you happen to have a data line that begins with 'Number1', however, it's going to treat that as a header. It might be better to read the file this way:

    open ...
    my $header_line = <FILE>;
    while (<FILE>) {
        # data line
    }

    Considering the format of your data, it might be easiest to use something like Text::CSV_XS.

      Not sure how your suggestion would work.
      Maybe I wasn't clear in my question.
      I'm basically asking how to search a large file where I know what the heading on the columns looks like, but then I want to actually grab the info in the columns.

        If the header line appears somewhere in the middle of the file, you could make use of a state variable (here $in_data_section). You'd initialize the variable to false, and set it to true as soon as you encounter the header line. And only when $in_data_section is true, you'd treat the input lines as data (numbers)...

        (Similarly, if you can tell what ends your data section, you could then reset the state variable to false...)

        ...
        my $in_data_section = 0;
        while (<FILE>){
            chomp;
            if ($in_data_section) {
                my @numbers = split;
                #...
            }
            $in_data_section = 1 if m/^Number/;
        }

        Alternatively, if you know that there are exactly two data lines following the header, you could also do something like

        while (<FILE>){
            chomp;
            if (/^Number/) {
                # found header, now we're expecting two lines with numbers
                for (1..2) {
                    $_ = <FILE>;
                    chomp;
                    my @numbers = split;
                    #...
                }
            }
            #... do other stuff
        }
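The state-variable idea, including the reset mentioned above, might look roughly like the sketch below. It is only an illustration: the sample text, the sub name rows_in_section, and above all the rule that a blank line ends the data section are assumptions, since the thread never says what follows the table in the real file.

```perl
use strict;
use warnings;

# Hypothetical input: the table sits in the middle of other text, and
# a blank line is ASSUMED to end the data section.
my $text = <<'END';
some preamble
Number1 Number2 Number3
23 85 31
90 67 45

trailing text 1 2 3
END

sub rows_in_section {
    my ($input) = @_;
    # Read from an in-memory filehandle so the sketch needs no file on disk.
    open my $fh, '<', \$input or die "Unable to open in-memory data: $!\n";
    my $in_data_section = 0;
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        if ($in_data_section) {
            if ($line !~ /\S/) {          # blank line: leave the data section
                $in_data_section = 0;
                next;
            }
            push @rows, [ split ' ', $line ];
        }
        elsif ($line =~ /^Number1\b/) {   # header: data starts on the next line
            $in_data_section = 1;
        }
    }
    close $fh;
    return @rows;
}

my @rows = rows_in_section($text);
# Row 1, column 1 and row 2, column 2 (indices are zero-based).
print "$rows[0][0] $rows[1][1]\n";   # prints "23 67"
```

Because the loop keeps running after the section ends, the same while can go on to look for other strings later in the file.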
Re: printing column info without headers
by toolic (Bishop) on Jul 09, 2008 at 15:32 UTC
    use strict;
    use warnings;

    my_search('data.txt');

    sub my_search {
        my $file = shift;
        open my $in_fh, '<', $file or die "Unable to open $file: $!\n";
        while (<$in_fh>) {
            next if (/^Number1/);
            my @Numbers = split;
            print "@Numbers\n";
        }
        close $in_fh;
    }

    Some comments:

    • Use the strictures: use strict; use warnings;
    • Use lexical filehandles: $in_fh
    • Use the 3-argument form of open
    • split by default splits $_ on whitespace
    • close the file
      thanks, what happens if the if in the 'next' line isn't true. Does the script then move down to the following line of code in the script??
        The code I showed is extremely simple because your input file example is extremely simple. The code I presented will read in a file and will ignore all lines which begin with the string "Number1" because of the next statement. For all other lines which do not begin with "Number1", the numbers will be stored in an array. The script will generate warnings if there are blank lines (because of the print).

        Your actual file probably has a more complicated structure than your example file, but I can not predict its structure without more information from you.

        reason I'm asking is because it seems to me we would then break out of the while loop. I don't want to do that, I'd like to still be able to continue on and look for other strings in the file..etc..
        thanks, what happens if the if in the 'next' line isn't true.

        (If that's a question, then it should be closed with a question mark, incidentally.)

        I personally believe that as with any other if, if the condition isn't true, then the statement, i.e. next is not executed. Then, as with (nearly) any other statement, (that it has a statement modifier doesn't change this) control is passed to the following one. Pretty ordinary stuff, as you can see.
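A small sketch of that control flow, in case it helps. The sample text and the sub name collect_rows are made up for illustration, and the extra blank-line check is an addition that is not in the code posted above.

```perl
use strict;
use warnings;

# Hypothetical sample input with a blank line in the middle.
my $text = "Number1 Number2 Number3\n23 85 31\n\n90 67 45\n";

sub collect_rows {
    my ($input) = @_;
    # In-memory filehandle, so the sketch runs without a file on disk.
    open my $fh, '<', \$input or die "Unable to open in-memory data: $!\n";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^Number1/;   # true: jump to the next line, loop keeps going
        next if $line !~ /\S/;         # blank line: skip it as well
        # Reached only when both conditions above were false.
        push @rows, [ split ' ', $line ];
    }
    close $fh;
    return @rows;
}

print "@$_\n" for collect_rows($text);
```

Neither next leaves the while loop; only last (or running out of input) does that, so any further checks can simply go below the push.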

        (Apologies for replying so late.)

        --
        If you can't understand the incipit, then please check the IPB Campaign.
Re: printing column info without headers
by moritz (Cardinal) on Jul 09, 2008 at 15:16 UTC
    while (<FILE>){
        # remove line endings:
        chomp;
        # skip header lines - you want the data, right?
        next if m/^Number/;
        my @numbers = split m/\s+/, $_;
        # now your numbers are in @numbers
    }
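Once each line is split into fields like this, one way to get at individual entries such as "23" and "67" is to key every row by the header names. The following is only a sketch: read_table and the sample text are hypothetical, and it assumes the header is the first line and all fields are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical sample input matching the table in the question.
my $text = "Number1 Number2 Number3\n23 85 31\n90 67 45\n";

sub read_table {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Unable to open in-memory data: $!\n";
    my $header_line = <$fh>;
    return unless defined $header_line;
    my @names = split ' ', $header_line;      # column names from the header
    my @records;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;            # ignore blank lines
        my %record;
        @record{@names} = split ' ', $line;   # hash slice: name => value
        push @records, \%record;
    }
    close $fh;
    return @records;
}

my @records = read_table($text);
print "$records[0]{Number1} $records[1]{Number2}\n";   # prints "23 67"
```

Looking values up by column name rather than by position keeps the calling code readable if columns are later added or reordered.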
Re: printing column info without headers
by EvanCarroll (Chaplain) on Jul 09, 2008 at 16:47 UTC
    The way you mention columns in some of your responses leads me to believe this is a fixed-width table; if not, ignore this response. This uses DataExtract::FixedWidth:
    use DataExtract::FixedWidth;
    use IO::File;
    use feature ':5.10';

    my $fh = IO::File->new( 'file.txt', 'r' );
    my @tuples = <$fh>;
    my $de = DataExtract::FixedWidth->new({ heuristic => \@tuples });

    foreach my $tuple ( @tuples ) {
        state $row;
        my $arr = $de->parse( $tuple );
        given ( ++$row ) {
            when ( 1 ) { say $arr->[0] }
            when ( 2 ) { say $arr->[2] }
        }
    }
    or
    my $de = DataExtract::FixedWidth->new({
        heuristic    => \@tuples,
        column_names => [qw/foo bar baz/],
    });

    foreach my $tuple ( @tuples ) {
        state $row;
        my $hash = $de->parse_hash( $tuple );
        given ( ++$row ) {
            when ( 1 ) { say $hash->{foo} }
            when ( 2 ) { say $hash->{baz} }
        }
    }


    Evan Carroll
    I hack for the ladies.
    www.EvanCarroll.com