ITmajor has asked for the wisdom of the Perl Monks concerning the following question:

I am reading from text files that contain mostly numbers, but every so often there are rows of text (I think that the files were merged and they still contain the headers). When I read from the file I want to create an array with the numerical data and not the text. What command do I use to only collect the numbers?
Updated: 
The files are kinda like this

head	head 	head
num	num	num
num	num	num
head	head 	head
num	num	num
num	num	num
The files are tab separated. From what I have seen, the headers are similar but vary between files. The information will be put into an array. Also, the first column contains numbers with decimals. Will that make a difference?

Replies are listed 'Best First'.
Re: Collect numbers without the text
by moritz (Cardinal) on Jul 30, 2008 at 15:15 UTC
    This being perl, I recommend using regexes.

    You can look at Regexp::Common, it already contains regexes that match numbers in various formats.
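
    For the tab-separated layout in the question, that idea might look roughly like the sketch below. It assumes Regexp::Common is installed from CPAN and uses its $RE{num}{real} pattern; the helper name numeric_fields and the inline sample lines are made up for illustration and stand in for the real file.

```perl
use strict;
use warnings;
use Regexp::Common qw(number);   # CPAN module, not part of core Perl

# Hypothetical helper: return the trimmed fields of a tab-separated line
# when every field is a real number (integer, decimal, or exponent form);
# return an empty list for anything else, such as a header row.
sub numeric_fields {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    for my $field (@fields) {          # $field aliases the array element
        $field =~ s/^\s+//;
        $field =~ s/\s+$//;
        return () unless $field =~ /^$RE{num}{real}$/;
    }
    return @fields;
}

# Sample lines standing in for the real file contents.
my @data;
for my $line ("head\thead \thead\n", "1.5\t2\t3\n", "4.25\t5\t6\n") {
    push @data, numeric_fields($line);
}
print "@data\n";   # prints "1.5 2 3 4.25 5 6"
```

    Anchoring the pattern with ^ and $ on each field is what rejects header cells; an unanchored match would also accept text that merely contains a digit.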

Re: Collect numbers without the text
by massa (Hermit) on Jul 30, 2008 at 16:20 UTC
    open my $f, '<', 'fname.txt' or die "Cannot open fname.txt: $!"; my @f = map { /(\d+)/g } <$f>;
    Voilà! If your file is too big, you should instead:
    open my $f, '<', 'fname.txt' or die "Cannot open fname.txt: $!"; my @f; push @f, /(\d+)/g while <$f>;
    []s, HTH, Massa (κς,πμ,πλ)
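
    On the decimals mentioned in the update: /(\d+)/g matches runs of digits only, so a value like 3.14 comes back as two separate numbers. A small sketch of the difference follows; the decimal-aware pattern is just one plausible guess at the number format, not something confirmed by the original poster.

```perl
use strict;
use warnings;

# One sample data line in the tab-separated layout from the question.
my $line = "3.14\t42\t0.5\n";

# Pattern from the post above: digit runs only, so decimals get split.
my @digits_only = $line =~ /(\d+)/g;

# Decimal-aware variant (an assumed format: optional sign, digits,
# optional fractional part). Adjust to the real data.
my @decimals = $line =~ /(-?\d+(?:\.\d+)?)/g;

print "digits only: @digits_only\n";   # prints "digits only: 3 14 42 0 5"
print "decimals:    @decimals\n";      # prints "decimals:    3.14 42 0.5"
```

    Either pattern will also pick digits out of a header such as "col1", so header rows probably need to be skipped before extracting.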
Re: Collect numbers without the text
by dHarry (Abbot) on Jul 30, 2008 at 15:17 UTC

    • What do the files look like?
    • How are the numbers "separated"?
    • Can these headers be recognized somehow?

    Bottom line: it's not really clear what you mean.

Re: Collect numbers without the text
by broomduster (Priest) on Jul 30, 2008 at 15:28 UTC
    If you mean the files look something like this:
    1 2 3 unwanted text 4 5 other stuff not numbers 6 7 8

    then something like

    while (<>) {
        chomp;
        next if /\D/;   # will need more accurate regex depending on actual number format
        # process the lines that have only numbers
    }

    OTOH, if you mean something else, then you'll need to give an example (which would have been a GoodIdea to begin with).

    Updated: per comments by kyle, added chomp and comment that "rejection" regex will depend on ITmajor's real requirements.

      Every line will match /\D/ because every line will have "\n" at the end of it. It might help to use chomp, but I'd still wonder how to handle leading/trailing white space, possible decimal numbers, etc.
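
      One way to deal with those concerns, sketched under the assumption that the data is tab separated as described in the question: split each line into fields, trim them, and let looks_like_number from the core Scalar::Util module decide. The sub name collect_numeric_rows and the sample lines are invented for the example.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);   # ships with core Perl

# Hypothetical helper: keep rows whose tab-separated fields are all
# numeric, skip everything else (header rows, blank lines).
sub collect_numeric_rows {
    my (@lines) = @_;
    my @rows;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my @fields = split /\t/, $copy;
        for (@fields) { s/^\s+//; s/\s+$//; }   # trim stray whitespace
        next unless @fields;                     # blank line
        next if grep { !looks_like_number($_) } @fields;
        push @rows, [@fields];
    }
    return @rows;
}

# Sample lines standing in for the real file contents.
my @sample = (
    "head\thead \thead\n",
    "1.5\t2\t3\n",
    " 4.25 \t5\t6\n",
    "head\thead \thead\n",
    "7.0\t8\t9\n",
);
my @rows = collect_numeric_rows(@sample);
print join("\t", @$_), "\n" for @rows;
```

      Note that looks_like_number also accepts strings such as "Inf" and "NaN", so a header cell with exactly that text would slip through; a stricter regex may be the better fit if that matters.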

Re: Collect numbers without the text
by Anonymous Monk on Jul 30, 2008 at 16:00 UTC
    undef $/; print join('',grep{m/\d/} split(//,<DATA>));