gavintokyo has asked for the wisdom of the Perl Monks concerning the following question:

I have a little problem.
I have an array holding lines of columnar data, such as:
A19234 hostname 07/08/07 moredata moredata
A19284 hostname 07/09/07 moredata moredata
A19384 hostname 06/09/07 moredata moredata
When I try to remove the first word of each line with grep,
@callgrep = grep (!/A192*4/, @call);
it removes the whole line.
I would also like to sort by the dates.
Thank you

Replies are listed 'Best First'.
Re: manipulating array
by GrandFather (Saint) on Aug 23, 2007 at 01:45 UTC

    Probably you don't want grep, but instead map and a regular expression substitution (see perlretut and perlre):

    use strict;
    use warnings;

    my @call = (
        'A19284 hostname 07/09/07 moredata moredata',
        'A19384 hostname 06/09/07 moredata moredata',
        'A19234 hostname 07/08/07 moredata moredata',
    );

    my @callClean =
        map {$_->[1]}
        sort {$a->[0] cmp $b->[0]}
        map {[(join '', @{[split '/', $_->[0]]}[2, 1, 0]), $_->[1]]}
        map {m|^\w+\s+(.*?(\d+/\d+/\d+).*)|; [$2, $1]} @call;

    print "$_\n" for @call;
    print "$_\n" for @callClean;

    Prints:

    A19284 hostname 07/09/07 moredata moredata
    A19384 hostname 06/09/07 moredata moredata
    A19234 hostname 07/08/07 moredata moredata
    hostname 07/08/07 moredata moredata
    hostname 06/09/07 moredata moredata
    hostname 07/09/07 moredata moredata

    Update: sort by the dates too.


    DWIM is Perl's answer to Gödel
Re: manipulating array
by ysth (Canon) on Aug 23, 2007 at 04:07 UTC
    It sounds like you have one line in each array element. If so, your grep is filtering out all lines that match /A192*4/. But from your results, I'm guessing you actually say /A192.*4/. /A192*4/ means zero or more 2's between an A19 and a 4, and none of your examples match that. /A192.*4/ means zero or more of any character (except a newline) between an A192 and a 4.
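    To make the difference concrete, here is a minimal sketch that runs both patterns through the same negated grep, using the three rows from the question. The variable names @kept_star and @kept_dotstar are made up for illustration.

```perl
use strict;
use warnings;

# Sample rows taken from the question.
my @call = (
    'A19234 hostname 07/08/07 moredata moredata',
    'A19284 hostname 07/09/07 moredata moredata',
    'A19384 hostname 06/09/07 moredata moredata',
);

# /A192*4/ : "A19", zero or more "2", then "4". No row contains that,
# so the negated grep keeps every row.
my @kept_star = grep { !/A192*4/ } @call;

# /A192.*4/ : "A192", any run of characters, then "4". The first two
# rows match, so the negated grep drops them whole.
my @kept_dotstar = grep { !/A192.*4/ } @call;

print scalar(@kept_star),    " rows kept with /A192*4/\n";
print scalar(@kept_dotstar), " rows kept with /A192.*4/\n";
```

    Either way, grep only decides whether a whole element stays or goes; it never trims part of an element.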

    Instead, loop through your array, removing the column you don't want:

    for my $line (@call) {
        # remove A192.*4 and following whitespace from the beginning of each line
        $line =~ s/^A192.*4\s+//;
    }
      Try that:
      use strict;
      use warnings;

      my @call = (
          'A19284 hostname 07/09/07 moredata moredata',
          'A19384 hostname 06/09/07 moredata moredata',
          'A19234 hostname 07/08/07 moredata moredata',
      );

      print "$_,$/" for map {s/^\w+\s//;$_} @call;
Re: manipulating array
by jbert (Priest) on Aug 23, 2007 at 08:31 UTC
    Given that you're going to want to look at the date anyway, you might as well pick your data apart into columns. If you're sure that you'll have no whitespace in your 'moredata', you could just do a split without a limit, but to play it safe I'll just split the first few columns (untested):
    use POSIX ();    # needed for POSIX::mktime

    # Each elt of splitLines will hold an array ref
    my @splitLines;
    foreach my $line (@calls) {
        my @bits = split(/\s+/, $line, 4);

        # Discard the first column
        shift @bits;

        # We want to sort by date, so we'll parse
        # the date column and prefix with a sortable value
        my ($mday, $month, $year) = split(m!/!, $bits[1]);
        $month -= 1;      # mktime wants month from 0
        $year  += 100;    # mktime wants years since 1900 (07 -> 107 = 2007)
        my $when = POSIX::mktime(0, 0, 0, $mday, $month, $year);
        unshift @bits, $when;
        push @splitLines, [ @bits ];
    }

    # Sort by first elt
    @splitLines = sort { $a->[0] <=> $b->[0] } @splitLines;

    # And put the lines back together again (discarding
    # the leading 'when').
    @calls = map { shift @$_; join(' ', @$_) } @splitLines;
    This seems over-long, but unless you write a more complex sort comparator (and hide the date parsing and processing etc in there) I'm not sure it can get much shorter. I don't like debugging complex sort comparators, so there you go.
Re: manipulating array
by ikegami (Patriarch) on Aug 23, 2007 at 20:48 UTC

    The following speeds up the sort the others presented, and is probably faster overall for anything but trivial data:

    my @sorted =
        map substr($_, 6),
        sort
        map {
            local $_ = $_;
            s/^\S+\s*//;
            my $ymd = join '', reverse split '/', (split / /)[1];
            "$ymd$_"
        } @data;
Re: manipulating array (TMTOWTDI!)
by Codon (Friar) on Aug 23, 2007 at 19:59 UTC
    If you know that every element begins with the A19\d+ (or you only care about these lines) you can filter / clean the records with a grep. You can then pipe those (matched/cleaned) items to sort (via a Schwartzian Transform) using a quick (simple) by_date subroutine.

    I know I could have saved some characters on the sort line, but this was more aesthetically pleasing to me.
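    A minimal sketch of the approach described above: a grep that filters and cleans in one step, feeding a Schwartzian Transform with a by_date comparator. The sample data, the key format, and the body of by_date are assumptions made for illustration, and this is not necessarily the code that was originally posted.

```perl
use strict;
use warnings;

# Sample rows from the question; @data is the name used in this subthread.
my @data = (
    'A19284 hostname 07/09/07 moredata moredata',
    'A19384 hostname 06/09/07 moredata moredata',
    'A19234 hostname 07/08/07 moredata moredata',
);

# Hypothetical comparator: each element is [ sort_key, line ].
sub by_date { $a->[0] cmp $b->[0] }

my @sorted =
    map  { $_->[1] }                  # 3. keep only the line
    sort by_date
    map  {                            # 2. build a yymmdd key from dd/mm/yy
        my ($dd, $mm, $yy) = m{(\d\d)/(\d\d)/(\d\d)};
        [ "$yy$mm$dd", $_ ]
    }
    grep { s/^A19\d+\s+// } @data;    # 1. filter and clean (modifies @data!)

print "$_\n" for @sorted;
```

    Note that the s/// inside grep edits the elements of @data in place, since $_ is aliased to each element.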

    Ivan Heffner
    Sr. Software Engineer
    WhitePages.com, Inc.
      I don't like how it not only clobbers @data, but relies on it. The grep could be replaced with:
      map { local $_ = $_; s/^A\d+ // ? $_ : () }
      I like that solution because it points out the most significant lesson: that "*" has a specific meaning the original poster did not want.

      The star means "zero or more repeats of the previous item" in a regex context. The specific string one wants to match is "the letter A followed by the digits 1 then 9, followed by any two digits from 0 through 9, followed by a 4...".

      It's very important to take the time to think about a regex in that plodding way or you get matches you won't want. For example, the rest of the line could match.
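      Spelled out that plodding way, the pattern might look like the sketch below. The leading ^ anchor and the trailing \s+ are my additions, on the assumption that the ID is always the first column; @cleaned is a made-up name.

```perl
use strict;
use warnings;

my @call = (
    'A19234 hostname 07/08/07 moredata moredata',
    'A19284 hostname 07/09/07 moredata moredata',
    'A19384 hostname 06/09/07 moredata moredata',
);

# ^      anchor at the start, so nothing later in the line can match
# A19    the literal prefix
# \d\d   exactly two digits
# 4      a literal 4
# \s+    the whitespace separating the first column from the rest
my @cleaned = map {
    my $copy = $_;                  # work on a copy, leave @call alone
    $copy =~ s/^A19\d\d4\s+//;
    $copy;
} @call;

print "$_\n" for @cleaned;
```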

      I hope I'm not being redundant with my bandwidth. Y'all have a good day.

Re: manipulating array
by lyklev (Pilgrim) on Aug 24, 2007 at 22:38 UTC
    The '*' wildcard from file globbing works differently in regular expressions. 'A192*4' means "A19, then 2 repeated zero or more times, then a 4", so none of your lines match.
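    A small sketch of the contrast, treating the same text once as a regex and once the way a glob would read it. The ID 'A194' is a made-up value, added only so the regex reading has something to match.

```perl
use strict;
use warnings;

# The three IDs from the question plus a hypothetical 'A194'.
my @ids = qw(A19234 A19284 A19384 A194);

# Regex /^A192*4$/: "*" repeats the preceding "2" zero or more times.
my @regex_star = grep { /^A192*4$/ } @ids;

# The glob A192*4 ("A192", anything, "4") is spelled with ".*" in a regex.
my @glob_like = grep { /^A192.*4$/ } @ids;

print "regex A192*4 : @regex_star\n";
print "glob  A192*4 : @glob_like\n";
```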