jimmy88 has asked for the wisdom of the Perl Monks concerning the following question:

Please forgive me for my naivety but I'm having an issue that I'm sure one of the wise perl monks can quickly resolve: My script processes all of my input data as desired with the exception of minimum values for the input grades. What is the issue here and how can I ensure accurate output? Once again, sorry to bother by asking for help with something so seemingly simple. Your help is much appreciated.

***students.txt***
122334:James,Lebron:
222223:Duncan,Tim:
244668:Bryant,Kobe:
355779:Durant,Kevin:
******************

***grades.txt***
122334 1 98
222223 1 86
244668 1 89
355779 1 90
122334 2 96
222223 2 88
244668 2 92
355779 2 96
122334 3 97
222223 3 96
244668 3 95
355779 3 94
122334 4 97
222223 4 96
244668 4 95
355779 4 94
122334 5 97
222223 5 96
244668 5 95
355779 5 94
****************

#!/usr/local/bin/perl

# Assign class roster file to variable.
$students = 'students.txt';

# Open class roster.
open (NAMES, "<$students") || die "Couldn't open $students $!";

# Create and populate arrays.
while (<NAMES>) {
    ($id,$name) = split(':',$_);
    $name{$id} = $name;
    # Monitor name length for purpose of print formatting.
    if (length($name)>$longestname) {
        $longestname = length($name);
    }
    # Create group size variable for later use with class averages.
    $groupsize = $.;
}

# Arrays are complete. Close file.
close NAMES;

# Assign grade spreadsheet to variable.
$grades = 'grades.txt';

# Open grade spreadsheet.
open (GRADES,"<$grades") || die "Couldn't open $grades $!";

# Create, populate and assign arrays.
while (<GRADES>) {
    ($id,$exam,$grade) = split;
    $grade{$id,$exam} = $grade;
    # Monitor exam counter.
    if ($exam > $lastexam) {
        $lastexam = $exam;
    }
}

# Arrays are complete. Close file.
close GRADES;

# Create, format and print table headings.
printf "%6s %-${longestname}s ", 'ID#','Name';
foreach $exam (1..$lastexam) {
    printf "%4d",$exam;
}
printf "%10s",'Total';
printf "%8s",'Avg';
printf "%8s",'Min';
printf "%8s\n\n",'Max';

# Define alphabetical sort subroutine.
sub alpha { $name{$a} cmp $name{$b} }

# Print formatted student data.
foreach $id ( sort alpha keys(%name) ) {
    printf "%6d %-${longestname}s ", $id,$name{$id};
    # Set total point counter to zero.
    $total = 0;
    foreach $exam (1..$lastexam) {
        printf "%4s",$grade{$id,$exam};
        # Counter increment.
        $total += $grade{$id,$exam};
        $examtot{$exam} += $grade{$id,$exam};
        # Calculate minimum grades.
        if ($grade{$id,$exam} < $mingrade) {
            $mingrade = $grade{$id,$exam};
        }
        # Calculate maximum grades.
        if ($grade{$id,$exam} > $maxgrade) {
            $maxgrade = $grade{$id,$exam};
        }
    }
    # Print student's point total.
    printf "%10d",$total;
    # Print student's average.
    printf "%8d",$total / $exam;
    # Print student's minimum exam grade.
    printf "%8d", $mingrade;
    # Print student's maximum exam grade.
    printf "%8d\n", $maxgrade;
}

# Print heading for class averages.
printf "\n%6s %${longestname}s ",'',"Average: ";

# Calculate and print class averages.
foreach $exam (1..$lastexam) {
    printf "%4d",$examtot{$exam} / $groupsize;
}

# Exit script.
exit(0);

Replies are listed 'Best First'.
Re: Finding Minimum Value
by Athanasius (Archbishop) on Sep 22, 2014 at 03:51 UTC

    Hello jimmy88, and welcome to the Monastery!

    Aside from the problem with $mingrade, which atcroft has already addressed, there is a subtle logic error in your script which is masked by the fact that in the data file “grades.txt” the final line’s “exam” happens to be the value of the total number of exams. Change the order of this file to this:

    ...
    222223 5 96
    244668 5 95
    355779 5 94
    122334 1 98

    and the per-student averages will all be wrong. The reason for this is the way Perl aliases the variable in a foreach loop: after the foreach $exam (1..$lastexam) { loop has completed, $exam reverts to the value it had before the loop was entered, which here is the exam number from the last line read from “grades.txt”. In this case, you can fix the problem by changing the line:

    printf "%8d",$total / $exam;

    to:

    printf "%8d",$total / $lastexam;
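
A minimal sketch of the loop-variable behaviour described above. The helper name exam_after_loop and the sample numbers are made up for illustration; the leftover value 1 stands in for whatever exam number was on the last line read from grades.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of $exam once the foreach has finished.
sub exam_after_loop {
    my ($exam, $lastexam) = @_;   # $exam plays the leftover value from reading grades.txt
    foreach $exam (1 .. $lastexam) {
        # The loop body does not matter here.
    }
    return $exam;                 # the pre-loop value again, not $lastexam
}

my $total    = 485;
my $lastexam = 5;
my $leftover = exam_after_loop(1, $lastexam);

printf "exam after loop: %d\n", $leftover;             # 1
printf "total / exam:     %d\n", $total / $leftover;   # 485
printf "total / lastexam: %d\n", $total / $lastexam;   # 97
```

Dividing by $lastexam does not depend on the order of the lines in the data file, which is why it is the safer divisor.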

    Note also that this script would be greatly improved by the addition of the lines:

    use strict;
    use warnings;

    at its head. Converting global variables to lexicals will require some thought (to determine the correct placement of each my declaration), but the effort expended now will be more than repaid by the debugging time you will save down the track. (To silence the warnings, you will also need to initialise some of the variables.)

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Finding Minimum Value
by atcroft (Abbot) on Sep 22, 2014 at 01:48 UTC

    Initialize $mingrade before your loop to either something larger than the largest expected value, or the first value you encountered.

    Hope that helps.
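
A minimal sketch of that advice, seeding both values from the first grade. The helper name min_max is made up for illustration. In the original script $mingrade starts out undefined, which compares as 0, so no grade is ever smaller than it. Neither $mingrade nor $maxgrade is reset per student either, so the maximum carries over from one student to the next.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (min, max) for one student's list of grades.
sub min_max {
    my @grades = @_;
    return unless @grades;                        # nothing to compare
    my ($min, $max) = ($grades[0], $grades[0]);   # seed with the first value seen
    for my $g (@grades) {
        $min = $g if $g < $min;
        $max = $g if $g > $max;
    }
    return ($min, $max);
}

# Grades for student 122334 from grades.txt.
my ($min, $max) = min_max(98, 96, 97, 97, 97);
printf "min=%d max=%d\n", $min, $max;   # prints "min=96 max=98"
```

Calling the helper once per student gives each student a fresh starting point, which avoids the carry-over problem as well.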

Re: Finding Minimum Value
by hdb (Monsignor) on Sep 22, 2014 at 05:27 UTC

    In addition to what was said above, $grade{$id,$exam} does not do what you expect. Try $grade{$id}{$exam} to create a two-dimensional data structure.

      See also Multi dimensional array emulation in Perldata

      Cheers Rolf

      (addicted to the Perl Programming Language and ☆☆☆☆ :)

      update

      IMHO that's a misnomer, because it's a multidim HASH emulation.

      Maybe it's a case where originally an "associative array" was meant, but the terminology should be consistent.

      It might work. See $; in perlvar.
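
A small sketch comparing the two forms discussed in this sub-thread. The hash names %flat and %nested are made up for illustration. English's $SUBSEP is used only as a readable alias for $;.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);   # provides $SUBSEP as an alias for $;

my %flat;     # one-level hash whose keys are joined with $;
my %nested;   # hash of hashes

$flat{122334, 1}   = 98;   # same as $flat{ join($SUBSEP, 122334, 1) }
$nested{122334}{1} = 98;

# The emulated key is a single string; split it on $; to recover the parts.
my ($key) = keys %flat;
my ($id, $exam) = split /\Q$SUBSEP\E/, $key;
print "flat key holds id=$id exam=$exam\n";

# Only the nested form lets you ask directly for one student's exams.
print "exams for 122334: ", join(',', sort keys %{ $nested{122334} }), "\n";
```

Both forms store and return the grade correctly. The nested form is easier to work with when you later need to iterate per student.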
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Finding Minimum Value
by pme (Monsignor) on Sep 22, 2014 at 10:33 UTC
    Hi jimmy88,
    You can easily create complex data structures in Perl and then simply process all the data in two foreach loops.
    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %student;

    open STUDENT, "<student.txt" or die "Cannot open student.txt: $!\n";
    while (<STUDENT>) {
        my @student = split(':');
        $student{$student[0]} = { name => $student[1] };
    }
    close STUDENT;

    open GRADES, "<grades.txt" or die "Cannot open grades.txt: $!\n";
    while (<GRADES>) {
        my @grades = split(' ');
        $student{$grades[0]}->{grade}->{$grades[1]} = $grades[2];
    }
    close GRADES;

    print Dumper(\%student);
    print "\n------------------------------\n";

    foreach my $id (sort keys %student) {
        print "Id: $id - $student{$id}->{name}\n";
        my $graderef = $student{$id}->{grade};
        foreach my $exam (sort keys %$graderef) {
            print " Exam: $exam grade: $graderef->{$exam}\n";
        }
    }
    Regards
Re: Finding Minimum Value
by Anonymous Monk on Sep 22, 2014 at 00:17 UTC
    Well, you lack subroutines ... and you print the max/min at the same time as you calculate it ... you have to do it in two steps ... subroutines help with that
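
One way to read the "two steps" suggestion, as a sketch only: one subroutine computes, another prints. The names grade_stats and format_stats are made up for illustration. min, max and sum come from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Step 1 (hypothetical helper): compute the statistics without printing anything.
sub grade_stats {
    my @grades = @_;
    my %stats = (
        total => sum(@grades),
        avg   => sum(@grades) / scalar(@grades),
        min   => min(@grades),
        max   => max(@grades),
    );
    return \%stats;
}

# Step 2 (hypothetical helper): turn the finished statistics into an output line.
sub format_stats {
    my ($stats) = @_;
    return sprintf "%10d%8d%8d%8d", @{$stats}{qw(total avg min max)};
}

# Grades for student 122334 from grades.txt.
my $stats = grade_stats(98, 96, 97, 97, 97);
print format_stats($stats), "\n";
```

Because min and max are computed over the whole list before anything is printed, no running $mingrade or $maxgrade variable needs to be initialised or reset.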
Re: Finding Minimum Value
by Laurent_R (Canon) on Sep 22, 2014 at 17:47 UTC
Re: Finding Minimum Value
by CountZero (Bishop) on Sep 22, 2014 at 20:06 UTC
    Your script gets much much simpler by putting your data in a database and then using standard SQL to retrieve the data. When working with data and manipulating the data in various ways, always think "database"!

    For example:

    use Modern::Perl '2014';
    use DBI;

    my $dbfile = 'c:/data/school.sqlite';
    my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", { RaiseError => 1 } );

    my $sth = $dbh->prepare('INSERT INTO students (id, name) VALUES (?, ?)');
    while (<DATA>) {
        chomp;
        last if /grades.txt/;
        next unless $_;
        next if /^\*\*\*/;
        say $_;
        my ( $id, $name ) = split /:/;
        $sth->execute( $id, $name );
    }

    $sth = $dbh->prepare('INSERT INTO results (id, course, score) VALUES (?, ?, ?)');
    while (<DATA>) {
        chomp;
        next unless $_;
        next if /^\*\*\*/;
        my ( $id, $course, $score ) = split /\s+/;
        $sth->execute( $id, $course, $score );
    }

    my $ary_ref = $dbh->selectall_arrayref(
        'SELECT name, sum(score), avg(score), min(score), max(score) FROM results JOIN students WHERE students.id = results.id GROUP BY results.id ORDER BY name'
    );
    printf "%20s %10s %8s %8s %8s\n", qw/Name Total Average Min Max/;
    for my $line (@$ary_ref) {
        printf "%20s %10d %8d %8d %8d\n", @$line;
    }

    $ary_ref = $dbh->selectall_arrayref(
        'SELECT course, avg(score) FROM results GROUP BY course ORDER BY course');
    printf "\n%10s %8s\n", qw/Course Average/;
    for my $line (@$ary_ref) {
        printf "%10d %8d\n", @$line;
    }

    __DATA__
    ***students.txt***
    122334:James,Lebron:
    222223:Duncan,Tim:
    244668:Bryant,Kobe:
    355779:Durant,Kevin:
    ******************
    ***grades.txt***
    122334 1 98
    222223 1 86
    ...(snip)...
    244668 5 95
    355779 5 94
    ****************
    Output:
                    Name      Total  Average      Min      Max
             Bryant,Kobe        466       93       89       95
              Duncan,Tim        462       92       86       96
            Durant,Kevin        468       93       90       96
            James,Lebron        485       97       96       98

        Course  Average
             1       90
             2       93
             3       95
             4       95
             5       95

    Retrieving data is just one line of Perl-code and since the data is saved in the database you can use it again and again without running the cost of parsing the input file(s).

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      When working with data and manipulating the data in various ways, always think "database"!

      I am sorry but I have to object very strongly to this. Especially with the word "always".

      Databases are very good for two things: data persistence and the ability to manage data volumes too big for the computer's memory. In addition, if your database is SQL, the SQL language is a very high-level and practical language that will hide many implementation details.

      But databases also have a lot of limitations. First, they are horribly slow (compared to hashes in memory). And the languages to manipulate them, such as PL/SQL, are often also horribly slow. Of course, this probably does not matter if you have just tens of thousands of records. But when you get to millions or tens of millions of records, the difference is huge.

      So, if you don't need to store data in a persistent fashion, you probably shouldn't use a database, or, at least, think twice before you do.

      About a year and a half ago, I was asked to try to improve the performance of a very complicated extraction process on a database. An initial duration test pointed to a prospective execution time of 160 days. After some profiling and benchmarking work, I was able to reduce it to about 60 days, 59.5 of which were spent in a very complicated trans-codification process. Not too bad, but still obviously a no-go. I moved to an extract of raw data files and a reprocessing of the flat files in pure Perl. The overall extraction time fell to about 12 or 13 hours, but the trans-codification part, using half a dozen Perl hashes, fell from 59.5 days to just about an hour, i.e. an improvement of a factor of about 1,400.

      No, it is a bit more complicated. Databases are very useful, there is no doubt about it, but they are certainly not the solution to everything, far from that. Especially when performance is important.

        I said "think database", not that you always have to use one or that it is always the best option.

        However, in actual practical real-world cases, there is usually a need to persist the data, there is more data than you can keep in the __DATA__ section of your script, or the data changes from time to time. So it pays to "think database" in many, if not most, of the day-to-day jobs.

        I am not surprised that Perl outruns any database in doing a complicated trans-codification process. That is one of Perl's main strengths!

        CountZero

Re: Finding Minimum Value
by james28909 (Deacon) on Sep 22, 2014 at 12:57 UTC
    thats some pretty famous students you got there ;)