chris654 has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file called database.txt in which the 16th field contains a timestamp such as 16:01:08 2007-01-10. I'm trying to archive all records over 45 days old by appending them to archive.txt. I'm a newbie when it comes to scripting and have been trying to google an answer to a similar question, but I'm having trouble knowing what I should be searching for. Do you know of any examples I could take a look at that would accomplish this?

type|address|city|state|size|rent|term|company|contact|phone|email|website|ID|REMOTE_ADDR|HTTP_USER_AGENT|DATE|SORTORDER
MFG|10 Oakmead Pkwy|San Jose|CA |10,000|$1.25|Net|Cushman Wakefield|Tom Cushman|408 555-8777|tom@cushman.com|http://www.google.com|1000000|76.102.98.35|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)|16:01:08 2007-01-10|1003
LAND|10 Oak|San Jose|CA|14,000|$1.25|Net|Ritchie Commercial|Don Ritchie|408 555-8777|tom@cushman.com|http://www.yahoo.com|1000001|76.102.98.35|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)|16:28:03 2008-01-10|1002

Thanks,
Chris

Replies are listed 'Best First'.
Re: Extracting Line from Text file over 45 days old
by BrowserUk (Patriarch) on Jan 12, 2008 at 06:23 UTC

    Since the date part of your field (YYYY-MM-DD) is alpha-sortable, there is no point in parsing, splitting and converting all the dates in your file. It is far simpler and much faster to convert the target date (45 days ago) to the same format:

    my @bits = split ' ', localtime( time() - ( 45 * 24 * 60 * 60 ) );
    my $n = 1;
    my %months = map { $_ => $n++ } qw[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec];
    ## Zero-pad month and day, otherwise e.g. 2008-1-5 would not compare correctly as a string
    my $targetDate = sprintf '%04d-%02d-%02d', $bits[4], $months{ $bits[1] }, $bits[2];
    print $targetDate;    ## 2007-11-28 when run on 2008-01-12

    You can now just use a string compare to select the records and avoid the splits and conversions.

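    This code assumes that the records in your DB file are ordered oldest first. It also assumes that the first line is the header line, that the last field of each record is always 4 digits, and that each line ends in a single newline character:

    open OLDDB,   '<',  $dbname   or die "Cannot read '$dbname': $!";
    open NEWDB,   '>',  $tempfile or die "Cannot write '$tempfile': $!";
    open ARCHIVE, '>>', $archive  or die "Cannot append to '$archive': $!";

    ## Copy the header line (field names) straight to the new DB
    my $header = <OLDDB>;
    print NEWDB $header if defined $header;

    ## -16 = 10 (date) + 1 (pipe) + 4 (sort order) + 1 (newline)
    print ARCHIVE $_
        while defined( $_ = <OLDDB> ) and substr( $_, -16, 10 ) lt $targetDate;

    print NEWDB $_ if defined $_;    ## Output first 'failing' record to newdb
    print NEWDB $_ while <OLDDB>;

    close OLDDB; close NEWDB; close ARCHIVE;
    unlink $dbname;
    rename $tempfile, $dbname or die "Cannot rename '$tempfile' to '$dbname': $!";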

    Even if the above assumptions are incorrect, it will still be quicker to convert the target date to a string once, than convert every record date to an integer.
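
    A minimal sketch of that unordered case, assuming the layout shown in the question (pipe-separated, DATE as the 16th field in "HH:MM:SS YYYY-MM-DD" form). The helper names cutoff_string and is_older_than are made up for illustration, and POSIX::strftime is swapped in for the month-name lookup above. Every record is checked individually, so neither the record order nor the width of the last field matters:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build the cutoff once, in the same YYYY-MM-DD form the records use.
# $now is optional and exists only so the helper can be called repeatably.
sub cutoff_string {
    my ( $days, $now ) = @_;
    $now = time unless defined $now;
    return strftime( '%Y-%m-%d', localtime( $now - $days * 24 * 60 * 60 ) );
}

# True when the date in the record's 16th field sorts strictly before
# the cutoff. This is a plain string comparison, no date arithmetic.
sub is_older_than {
    my ( $line, $cutoff ) = @_;
    my $stamp = ( split /\|/, $line )[15];
    return 0 unless defined $stamp;
    my ($date) = $stamp =~ /(\d{4}-\d{2}-\d{2})/ or return 0;
    return $date lt $cutoff ? 1 : 0;
}

# Shortened stand-ins for the real records: 15 filler fields, DATE, SORTORDER.
my @records = (
    join( '|', ('x') x 15, '16:01:08 2007-01-10', '1003' ),
    join( '|', ('x') x 15, '16:28:03 2008-01-10', '1002' ),
);

my $cutoff = '2007-11-28';    # the value cutoff_string(45) gave on 2008-01-12
for my $line (@records) {
    printf "%s: %s\n", ( split /\|/, $line )[15],
        is_older_than( $line, $cutoff ) ? 'archive' : 'keep';
}
```

    The header line has no date in its 16th field, so is_older_than returns false for it and it stays in the main file.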


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting Line from Text file over 45 days old
by McDarren (Abbot) on Jan 12, 2008 at 05:07 UTC
    Howdy :)

    This is quite simple. Because your data is regular, you can use split to extract the date/time stamp from each record. Then you can use something like Date::Parse to convert the date/time stamp into a unix timestamp. After that, it's just a bit of simple arithmetic.

    Here is some example code to demonstrate:

    #!/usr/bin/perl -l
    use strict;
    use warnings;
    use Date::Parse;

    my $cutoff_date = time - (45 * 86400);

    while (my $line = <DATA>) {
        chomp($line);
        my $date = (split /\|/, $line)[15];
        my $unixdate = str2time($date) or next;
        print "I would ", $unixdate < $cutoff_date ? "archive" : "not archive", " $date";
    }

    __DATA__
    type|address|city|state|size|rent|term|company|contact|phone|email|website|ID|REMOTE_ADDR|HTTP_USER_AGENT|DATE|SORTORDER
    MFG|10 Oakmead Pkwy|San Jose|CA |10,000|$1.25|Net|Cushman Wakefield|Tom Cushman|408 555-8777|tom@cushman.com|http://www.google.com|1000000|76.102.98.35|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)|16:01:08 2007-01-10|1003
    LAND|10 Oak|San Jose|CA|14,000|$1.25|Net|Ritchie Commercial|Don Ritchie|408 555-8777|tom@cushman.com|http://www.yahoo.com|1000001|76.102.98.35|Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)|16:28:03 2008-01-10|1002

    Which prints:
    I would archive 16:01:08 2007-01-10
    I would not archive 16:28:03 2008-01-10

    Hope this helps,
    Darren :)

      I replaced <DATA> with database.txt, but when I ran it I got the following error:

      Can't locate Date/Parse.pm in @INC (@INC contains: /usr/lib/perl5/5.8.0/i386-linux /usr/lib/perl5/5.8.0 /usr/lib/perl5/site_perl/5.8.0/i386-linux /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl .) at ./over45.sh line 4.
      BEGIN failed--compilation aborted at ./over45.sh line 4.

      Is Date::Parse not supported on my virtual server, or is something else wrong?

      Thanks,
      Chris

        That error most likely means that the Date::Parse module is not installed. Installing it should be as simple as:
        perl -MCPAN -e "install Date::Parse"

        Cheers,
        Darren :)
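
        If installing modules on the virtual server is not an option, one alternative is to parse the timestamp by hand and use Time::Local, which ships with Perl. This is only a sketch under the assumption that every timestamp has exactly the "HH:MM:SS YYYY-MM-DD" shape shown in the question; stamp_to_epoch is a made-up helper name, not a drop-in replacement for str2time:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert "HH:MM:SS YYYY-MM-DD" to epoch seconds in local time.
# Returns undef when the string does not have that shape (for example
# the header's "DATE" label) or names an impossible date.
sub stamp_to_epoch {
    my ($stamp) = @_;
    return undef unless defined $stamp;
    my ( $h, $mi, $s, $y, $mo, $d ) =
        $stamp =~ /^(\d{2}):(\d{2}):(\d{2}) (\d{4})-(\d{2})-(\d{2})$/
        or return undef;
    # timelocal expects a 0-based month and dies on out-of-range values,
    # so the call is wrapped in eval and a failure becomes undef.
    return eval { timelocal( $s, $mi, $h, $d, $mo - 1, $y ) };
}

my $cutoff = time - 45 * 86400;
for my $stamp ( '16:01:08 2007-01-10', 'DATE' ) {
    my $epoch = stamp_to_epoch($stamp);
    if    ( !defined $epoch )  { print "skip    $stamp\n" }
    elsif ( $epoch < $cutoff ) { print "archive $stamp\n" }
    else                       { print "keep    $stamp\n" }
}
```

        In the script above, the str2time($date) call could then be replaced with stamp_to_epoch($date) and the "use Date::Parse;" line dropped.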

Re: Extracting Line from Text file over 45 days old
by dragonchild (Archbishop) on Jan 12, 2008 at 04:58 UTC
    You want to take a look at the split function, the DateTime module, and you will probably want to use the following as a way of looping through the file:
    open my $fh, '<', $filename
        or die "Cannot open file '$filename' for reading: $!\n";
    while ( my $line = <$fh> ) {
        # Do stuff here.
    }
    close $fh;
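
    To show how those three pieces might fit together, here is a rough sketch that assumes DateTime has been installed from CPAN and that timestamps look like "HH:MM:SS YYYY-MM-DD". The parse_stamp helper, the in-memory sample data and the fixed "today" are all made up for illustration; real code would open database.txt and use DateTime->now:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;    # CPAN module, must be installed separately

# Parse "HH:MM:SS YYYY-MM-DD" into a DateTime object (floating time zone).
# Returns undef for anything else, such as the header's "DATE" label.
sub parse_stamp {
    my ($stamp) = @_;
    return unless defined $stamp;
    my ( $h, $mi, $s, $y, $mo, $d ) =
        $stamp =~ /^(\d{2}):(\d{2}):(\d{2}) (\d{4})-(\d{2})-(\d{2})$/
        or return;
    return eval {
        DateTime->new( year => $y,  month  => $mo, day    => $d,
                       hour => $h,  minute => $mi, second => $s );
    };
}

# Fixed "today" so the example is repeatable.
my $today  = DateTime->new( year => 2008, month => 1, day => 12 );
my $cutoff = $today->clone->subtract( days => 45 );

# Shortened stand-in for database.txt: 15 filler fields, DATE, SORTORDER.
my $data = join "\n",
    join( '|', ('x') x 15, 'DATE', 'SORTORDER' ),
    join( '|', ('x') x 15, '16:01:08 2007-01-10', '1003' ),
    join( '|', ('x') x 15, '16:28:03 2008-01-10', '1002' ),
    '';

open my $fh, '<', \$data or die "Cannot open in-memory data: $!\n";
my ( @archive, @keep );
while ( my $line = <$fh> ) {
    chomp $line;
    my $dt = parse_stamp( ( split /\|/, $line )[15] ) or next;    # skips the header
    if ( $dt < $cutoff ) { push @archive, $line }
    else                 { push @keep,    $line }
}
close $fh;

print scalar(@archive), " to archive, ", scalar(@keep), " to keep\n";
```

    DateTime overloads the comparison operators, which is why $dt < $cutoff works directly on the two objects.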

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
      "You want to take a look at the split function, the DateTime module"
      For someone just looking for a way to tackle the problem, what exactly would the OP gain by installing 17 modules?

      --shmem

      _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                    /\_¯/(q    /
      ----------------------------  \__(m.====·.(_("always off the crowd"))."·
      ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
        That there are 17 modules to be installed is the problem of the cpan script, not the OP. We have computers do repetitive things because they're repetitive.

        I am completely baffled by this "There's too many modules involved!" concern. Do you pay for storage by the kilobyte/hour? I have yet to have a problem and I generally have between 10 and 30 Perl installations on any given machine. All of those will generally take up 2-3G.

