roborat has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm a perl newbie and need help from the experts to extract the value from the text file-1 based on value from text file-2 which is tab separated. The script is to automate my monthly internet usage report. Each customer will have separate report output similar like below.

Sample Output will be:

CUSTOMER NAME: ABC
Device_Code Port Traf_Dir Data_Usage
SWITCH Fa1_0_33 OUT 1.311
SWITCH Fa1_0_33 IN 10.716
SWITCH Fa1_0_35 OUT 50.796
SWITCH Fa1_0_35 IN 7.882

Contents of Text File 1:
200612:ABC
200905:DEF
200212:GHI

Contents of Text File 2:
200612 200612 SWITCH FastEthernet1_0_33 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 4.336 1.311
200612 200612 SWITCH FastEthernet1_0_33 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 35.435 10.716
200612 200612 SWITCH FastEthernet1_0_35 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 167.976 50.796
200612 200612 SWITCH FastEthernet1_0_35 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 26.064 7.882
200612 200612 SWITCH FastEthernet2_0_33 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200612 200612 SWITCH FastEthernet2_0_33 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200612 200612 SWITCH FastEthernet2_0_35 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200612 200612 SWITCH FastEthernet2_0_35 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200905 200905 SWITCH FastEthernet1_0_48 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 3.193 0.965
200905 200905 SWITCH FastEthernet1_0_48 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 41.499 12.549
200905 200905 SWITCH FastEthernet2_0_48 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200905 200905 SWITCH FastEthernet2_0_48 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200212 200212 SWITCH FastEthernet1_0_19 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 3.877 1.172
200212 200212 SWITCH FastEthernet1_0_19 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 0.837 0.253
200212 200212 SWITCH FastEthernet2_0_19 OUT 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000
200212 200212 SWITCH FastEthernet2_0_19 IN 01-Feb-2010_02:00 01-Mar-2010_02:00 -0.000 -0.000

Here's my current code:
#!/usr/bin/perl #use strict; #use warnings; my $cust_file="customer.txt"; my $billing_file="2010.bill"; # '<' means read only open(CUST_DATA,'<', $cust_file) || die("Could not open file!"); #@raw_data=<CUST_DATA>; #Done reading the file - close it while (<CUST_DATA>) { ($CUST_ID,$CUST_NAME) = split(':',$_); $CUST_NAME{$CUST_ID}=$CUST_NAME; if (length($CUST_NAME)>$maxCUST_NAMElength) { $maxCUST_NAMElength=length($CUST_NAME); } } close(CUST_DATA); # '<' means read only open(BILL_DATA,'<', $billing_file) || die("Could not open file!"); @rawbill_data=<BILL_DATA>; close(BILL_DATA);
Im stuck how to extract the data based on the value from first text file. :)
Thanks in advance for your help.
  • Comment on How to Extract Data from Text file 1 based on the Value from Text file 2
  • Download Code

Replies are listed 'Best First'.
Re: How to Extract Data from Text file 1 based on the Value from Text file 2
by Ratazong (Monsignor) on Mar 04, 2010 at 10:54 UTC

    Hi!

    The following shall give you some hints:

    • Read the second file line-by-line (as you did with file1)
    • apply a regular expression to extract the customer-id (and the rest of relevant data); see the fragment below
      $_ =~ /(\d*).*SWITCH\sFastEthernet(\d)/; # just an example on how +to extract some data my $id = $1; # get the iD my $switchNr = $2; # get the number of the s +witch
    • append the text to the hash (already formatted for the output); you could use sth. like the following code
      $CUST_NAME{$id} .= "\nSWITCH Fa$switchNr ..."; # append the output, al +ready formatted
    • pass through all the keys of the hash and print the values (pretty straightforward)

    HTH, Rata

Re: How to Extract Data from Text file 1 based on the Value from Text file 2
by almut (Canon) on Mar 04, 2010 at 11:21 UTC
    #!/usr/bin/perl use strict; use warnings; my $cust_file = <<'CUST'; 200612:ABC 200905:DEF 200212:GHI CUST # create customer ID => name lookup table my %cust_id2name; open my $cust_fh, "<", \$cust_file or die $!; while (<$cust_fh>) { chomp; my ($id, $name) = split /:/; $cust_id2name{$id} = $name; } # extract data from billing file # and store indexed by customer name my %report; while (<DATA>) { my ($cust_id, $usage) = (split ' ')[0,-1]; my ($switch) = /(SWITCH FastEthernet.*(?:IN|OUT))/; $switch =~ s/FastEthernet/Fa/; my $cust_name = $cust_id2name{$cust_id}; push @{$report{$cust_name}}, "$switch $usage" unless $usage == 0; } # print report for my $cust (sort keys %report) { print "CUSTOMER NAME:$cust\n"; print "Device_Code Port Traf_Dir Data_Usage\n"; print "$_\n" for @{$report{$cust}}; print "\n"; } __DATA__ 200612 200612 SWITCH FastEthernet1_0_33 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 4.336 1.311 200612 200612 SWITCH FastEthernet1_0_33 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 35.435 10.716 200612 200612 SWITCH FastEthernet1_0_35 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 167.976 50.796 200612 200612 SWITCH FastEthernet1_0_35 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 26.064 7.882 200612 200612 SWITCH FastEthernet2_0_33 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 -0.000 -0.000 200612 200612 SWITCH FastEthernet2_0_33 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 -0.000 -0.000 200612 200612 SWITCH FastEthernet2_0_35 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 -0.000 -0.000 200612 200612 SWITCH FastEthernet2_0_35 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 -0.000 -0.000 200905 200905 SWITCH FastEthernet1_0_48 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 3.193 0.965 200905 200905 SWITCH FastEthernet1_0_48 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 41.499 12.549 200905 200905 SWITCH FastEthernet2_0_48 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 -0.000 -0.000 200905 200905 SWITCH FastEthernet2_0_48 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 -0.000 -0.000 200212 200212 SWITCH FastEthernet1_0_19 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 3.877 1.172 200212 200212 SWITCH FastEthernet1_0_19 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 0.837 0.253 200212 200212 SWITCH FastEthernet2_0_19 OUT 01-Feb-2010_02:00 01-Mar-2 +010_02:00 -0.000 -0.000 200212 200212 SWITCH FastEthernet2_0_19 IN 01-Feb-2010_02:00 01-Mar-20 +10_02:00 -0.000 -0.000

    Output:

    CUSTOMER NAME:ABC Device_Code Port Traf_Dir Data_Usage SWITCH Fa1_0_33 OUT 1.311 SWITCH Fa1_0_33 IN 10.716 SWITCH Fa1_0_35 OUT 50.796 SWITCH Fa1_0_35 IN 7.882 CUSTOMER NAME:DEF Device_Code Port Traf_Dir Data_Usage SWITCH Fa1_0_48 OUT 0.965 SWITCH Fa1_0_48 IN 12.549 CUSTOMER NAME:GHI Device_Code Port Traf_Dir Data_Usage SWITCH Fa1_0_19 OUT 1.172 SWITCH Fa1_0_19 IN 0.253

    (Of course, you should open your original files instead of __DATA__, etc.)

Re: How to Extract Data from Text file 1 based on the Value from Text file 2
by serf (Chaplain) on Mar 04, 2010 at 11:58 UTC
    Hi roborat, welcome to Perl Monks. While almut was away coding you a solution, I was doing the same... and got a few interruptions so was beaten to it... There are a couple of things that I would correct in your example code, because they will help you in your future work.
    #use strict; #use warnings;
    use warnings and use strict are your friends, you should use them. You have them commented out here. This isn't C where you have # in the front of declarations, in Perl a # at the start of a line is usally for commenting out that line.
    open(CUST_DATA,'<', $cust_file) || die("Could not open file!");
    If you're going to die on failure (which you should do) you'll help yourself a lot by giving as much information as possible as to why it's failed. If the failure is due to an incorrect file name, you'll want to know what file name the program was trying to open, to see if it's different from what you were expecting it to be using. It can help to protect the name in single quotes, like: die "Can't read '$filename': $!\n" so you can easily spot leading or trailing spaces in the name which can catch you. Also use the "$!" at the end of the line which tells you the error message associated with the reason the action failed - this is a BIG help. Here is my quick stab at the code... Things to note, I like to use hash references because it can make it easier to deal with the data later. I've trapped to catch lines with unrecognized customer IDs. I've also sorted by customer name. You could turn these off if not applicable, but they may be useful. I hope this helps...
    #!/usr/bin/perl # # # use warnings; use strict; my $cust_file = "customer.txt"; my $billing_file = "2010.bill"; my $cust; open (CUST_DATA, $cust_file) || die "Can't read '$cust_file': $!\n"; while (defined (my $line = <CUST_DATA>)) { chomp $line; # Remove trailing newline my ($cust_id,$cust_name) = split(/:/,$line); $cust->{$cust_id}->{name} = $cust_name; } close(CUST_DATA); my @bill_fields = qw(cust_id x_cust_id2 Device_Code Port Traf_Dir x_st +art_date x_end_date x_number Data_Usage); my $bill_line; my $bill_data; open(BILL_DATA, $billing_file) || die "Can't read '$billing_file': $!\ +n"; while (defined (my $line = <BILL_DATA>)) { chomp $line; @{$bill_line}{@bill_fields} = split(/\s/, $line); if ( ! $cust->{ $bill_line->{cust_id} }->{name} ) { die "$0: Found unknown customer ID [$bill_line->{cust_id}] on +line $. of '$billing_file'\n"; } $bill_data->{ $bill_line->{cust_id} }->{ $bill_line->{Device_Code} + }->{ $bill_line->{Port} }->{ $bill_line->{Traf_Dir} } = $bill_line-> +{Data_Usage}; } close(BILL_DATA); sub by_name ($$) { return $cust->{$a}->{name} <=> $cust->{$b}->{name}; } for my $customer ( keys %$cust ) { print "CUSTOMER NAME: $cust->{$customer}->{name}\n" . "Device_Code Port Traf_Dir Data_Usage\n"; for my $device ( sort by_name keys %{$bill_data->{$customer}} ) { for my $port ( sort keys %{$bill_data->{$customer}->{$device}} + ) { (my $short_port = $port) =~ s/^FastEthernet/Fa/g; for my $direction qw(OUT IN) { print "$device $short_port $direction " . $bill_data->{$customer}->{$device}->{$port}->{$dir +ection} . $/; } } } }
      Hi Rata, Almut and Serv, Thanks a lot for you reply.
      I'll study and try to update the suggested codes.
      Thanks a lot Perl Monks. :)
      Hi Serf, Thanks a lot, the code is working as I need to. Right Now, I'm trying to put the each customer reports to each separate txt or html file.
      Can please give a hint or show how to do it?
      Thanks a Lot!
        If you change the last block to this it should do what you want.

        Modify the "$output" filename to suit your needs.

        for my $customer ( keys %$cust ) { # If you want to write to a file named with the customer ID # my $output = "./$customer.out"; # If you want to write to a file named with the customer name my $output = "./$cust->{$customer}->{name}.out"; open (CUST_OUT, ">$output") || die "Can't write to '$output': $!\n +"; print CUST_OUT "CUSTOMER NAME: $cust->{$customer}->{name}\n" . "Device_Code Port Traf_Dir Data_Usage\n"; for my $device ( sort by_name keys %{$bill_data->{$customer}} ) { for my $port ( sort keys %{$bill_data->{$customer}->{$device}} + ) { (my $short_port = $port) =~ s/^FastEthernet/Fa/g; for my $direction qw(OUT IN) { print CUST_OUT "$device $short_port $direction " . $bill_data->{$customer}->{$device}->{$port}->{$dir +ection} . $/; } } } close CUST_OUT; }