in reply to Reduce the time taken for Huge Log files

Your script is a good candidate for Dominus' Red Flags articles :-)
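I guess you're better off rewriting it from scratch. This might give you an idea of a more efficient approach: it reads the log only once and opens each output file the first time a line for that business turns up.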
```perl
#!/usr/bin/perl
use strict;
use warnings;
use FileHandle;

# create a hash of businesses and their target files
my %businesses = (
    "corp.home.ge.com"         => "new_corp_home_ge_com.log",
    "scotland.gcf.home.ge.com" => "new_scotland_gcf_home_ge_com.log",
    "marketing.ge.com"         => "new_marketing_ge_com.log",
);

# loop over the data
while (<DATA>) {
    # check if we have a line that contains an http:// address
    # and capture the host name in $1
    # (the \" only keeps the editor's syntax highlighting from getting confused)
    if ( m-http://([^/\"]+)- ) {
        # do we have an entry in the hash for that business?
        if ( $businesses{$1} ) {
            # yes, so on first use replace the file name in the hash
            # with a filehandle opened for writing to that file
            unless ( ref $businesses{$1} ) {
                my $fh = FileHandle->new;
                $fh->open(">$businesses{$1}")
                    or die "Cannot open $businesses{$1}: $!";
                $businesses{$1} = $fh;
            }

            # print to the business' filehandle
            $businesses{$1}->print($_);
        }
        else {
            # no, so emit a warning
            warn "unknown business in logfile";
        }
    }
}

__DATA__
16/Jan/2005:00:00:40 -0500 "GET /ge/ige/1/1/4/common/cms_portletview2.html HTTP/1.1" 200 1702 0 "http://marketing.ge.com/portal/site/insurance/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "-" "marketing.ge.com"
16/Jan/2005:00:00:40 -0500 "GET /portal/site/transportation/menuitem.8c65c5d7b286411eb198ed10b424aa30/ HTTP/1.1" 200 7596 0 "http://geae.home.ge.com/portal/site/transportation/" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" "-" "geae.home.ge.com"
16/Jan/2005:00:00:41 -0500 "GET /ge/ige/26/83/409/common/cms_portletview.html HTTP/1.1" 200 7240 0 "http://marketing.ge.com/portal/site/insurance/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "-" "marketing.ge.com"
```


Question: Why can't I write this?

```perl
print $businesses{$1} $_;
```

It yields:

```
Scalar found where operator expected at C:\t.pl line 36, near "} $_"
(Missing operator before $_?)
```

That's odd.
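The explanation is in perldoc -f print: the filehandle slot only accepts a bareword, a plain unsubscripted scalar variable, or a block that returns the handle. A hash element such as $businesses{$1} is none of those, so Perl parses it as the first thing to print and then trips over $_. Wrapping the expression in braces turns it into a block. A minimal sketch, where %handles and the in-memory handle are hypothetical stand-ins for the script's hash and log files:

```perl
use strict;
use warnings;

# Hypothetical handle table. An in-memory file (open on a scalar
# reference) stands in for a real log file, so nothing touches the disk.
my $buffer = '';
open my $mem, '>', \$buffer or die "Cannot open in-memory handle: $!";
my %handles = ( 'marketing.ge.com' => $mem );

my $line = "example log line\n";

# print $handles{'marketing.ge.com'} $line;    # syntax error, as above
print { $handles{'marketing.ge.com'} } $line;  # braces make a block that returns the handle

close $mem or die "Cannot close in-memory handle: $!";
print "buffer holds: $buffer";
```

The method call used in the script, $businesses{$1}->print($_), sidesteps the parsing issue in a different way, since it is an ordinary method call rather than print's indirect-object syntax.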


Update:
Changed the print to FileHandle's print method, as castaway suggested.


holli, /regexed monk/

Re^2: Reduce the time taken for Huge Log files
by pr19939 (Initiate) on Mar 18, 2005 at 15:04 UTC
    Hi Holli,
    I tried your code with a few modifications and it was
    really helpful. Thanks a lot. But I am stuck.
    Some of my lines contain https:// and some contain no
    http:// at all.
    I tried several regex combinations, but no luck.
    Please advise.
    Thanks
      I just noticed the host name also appears, in quotes, at the very end of every line, so:
      m-"([^\"]+)"$-
      will do.
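A small sketch of why anchoring on the last quoted field sidesteps the https:// problem: it pulls the host from each line with the same pattern. The sample lines are made up for illustration (one https referrer, one "-" referrer with no URL at all), and it assumes, as in the sample data, that every line ends with the quoted host name. host_of is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the last double-quoted field on the line (the host name in the
# sample log format), or undef if the line does not end that way.
# $ also matches just before a trailing newline, so no chomp is needed.
sub host_of {
    my ($line) = @_;
    return $line =~ m-"([^\"]+)"$- ? $1 : undef;
}

# Hypothetical sample lines: one https referrer, one with no URL at all.
my @lines = (
    qq{16/Jan/2005:00:00:40 -0500 "GET /x.html HTTP/1.1" 200 1702 0 "https://marketing.ge.com/portal/" "Mozilla/4.0" "-" "marketing.ge.com"\n},
    qq{16/Jan/2005:00:00:41 -0500 "GET /y.html HTTP/1.1" 200 7240 0 "-" "Mozilla/4.0" "-" "corp.home.ge.com"\n},
);

my @hosts = map { host_of($_) } @lines;
print "$_\n" for @hosts;
```

The referrer's scheme no longer matters, because the pattern never looks at it.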


      holli, /regexed monk/

      If you do not need to know whether it was https, you can reduce them all to http:// as you read each line, with something like s!https://!http://!; placed just after the while (<DATA>) {. Note that using ! as the regex delimiter saves you from a quoting nightmare with the //.
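A minimal sketch of that normalisation, run over two hypothetical lines instead of <DATA> and followed by the original host-extracting match:

```perl
use strict;
use warnings;

# Hypothetical input lines: one http and one https referrer.
my @lines = (
    qq{"GET /a HTTP/1.1" 200 1 0 "http://marketing.ge.com/portal/" "-"\n},
    qq{"GET /b HTTP/1.1" 200 1 0 "https://corp.home.ge.com/portal/" "-"\n},
);

my @hosts;
for my $line (@lines) {
    # work on a copy, then normalise https:// to http://
    ( my $copy = $line ) =~ s!https://!http://!;
    # same host-extracting match as the original script
    push @hosts, $1 if $copy =~ m-http://([^/\"]+)-;
}
print "$_\n" for @hosts;
```

An alternative that leaves the line untouched is to make the s optional in the pattern itself: m-https?://([^/\"]+)-. Either way, lines with no URL at all are still skipped.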

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!
Re^2: Reduce the time taken for Huge Log files
by nobull (Friar) on Mar 18, 2005 at 18:00 UTC
    Just an aside for the OP. If you are just starting out with Perl, don't get into the habit of saying "use FileHandle". The FileHandle module was replaced by IO::File some considerable time ago.
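For anyone who wants to follow that advice, here is a sketch of the same open-on-first-use dispatch written against IO::File instead of FileHandle. It is trimmed to one business and three made-up lines, and it writes into a temporary directory from File::Temp so it runs anywhere; treat it as an illustration, not a drop-in replacement for the script above.

```perl
use strict;
use warnings;
use IO::File;
use File::Spec;
use File::Temp qw(tempdir);

# Throwaway directory (removed at exit) so the sketch leaves no files behind.
my $dir = tempdir( CLEANUP => 1 );
my $out = File::Spec->catfile( $dir, 'new_marketing_ge_com.log' );

# host => output file name; replaced by an IO::File object on first use
my %businesses = ( 'marketing.ge.com' => $out );

# Hypothetical, shortened log lines
my @lines = (
    qq{"GET /a HTTP/1.1" 200 1 0 "http://marketing.ge.com/portal/" "-"\n},
    qq{"GET /b HTTP/1.1" 200 1 0 "http://geae.home.ge.com/portal/" "-"\n},
    qq{"GET /c HTTP/1.1" 200 1 0 "http://marketing.ge.com/other/" "-"\n},
);

for my $line (@lines) {
    next unless $line =~ m-http://([^/\"]+)-;
    my $host = $1;    # copy $1 before anything else can reset it
    unless ( exists $businesses{$host} ) {
        warn "unknown business in logfile: $host\n";
        next;
    }
    unless ( ref $businesses{$host} ) {
        my $path = $businesses{$host};
        $businesses{$host} = IO::File->new( $path, 'w' )
            or die "Cannot open $path: $!";
    }
    $businesses{$host}->print($line);
}

# Close every handle that was opened so the output is flushed to disk.
$_->close for grep { ref $_ } values %businesses;
```

IO::File->new accepts fopen-style mode strings such as 'w'; a Perl-style '>' works too.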
      NOTE: This class is now a front-end to the IO::* classes.
      (from the POD of FileHandle.pm)


      holli, /regexed monk/