malaga has asked for the wisdom of the Perl Monks concerning the following question:

i am using the subroutine below when accessing 6 different text files that look something like this:

"ID" "Course"
9414 "Nutitional Epidemiology"
9414 "Nutritional assessment"
9414 "Undernutrition in the United States"
1371 "Health Politics and Policy"
1371 "Advanced Health Politics?"
1371 "Introduction to Health Policy & Management"
1371 "Health and Public Policy Seminar"


Problem #1: in some cases it's ok to just list everything. but with one file i need to (i think) assign each item to a scalar so i can move it around in the order i want.

Problem #2: in either case i need to get rid of the ""'s, and the labels that are printing out, and also the ID number (the value we are searching on) Here's how it prints right now:

"ID": 9414
"Dept": "Division of Something"
"Fax": "5551212"
"MiddleName":
"Title": "Professor of Something"
"LastName": "Black"
"Suffix": "PhD"
"Address": "111 1st Street"
"Email": "abc@acb.com"
"FirstName": "Gladys"
"Phone": "5551212"
"Photo": "Black.jpg"


here's the code:

sub getrows {
    my $value = param('name');
    my %data = ();
    my @fields = split(/\t/, <FILE>);
    chomp @fields;
    my @records;
    while (<FILE>) {
        chomp;
        my @row = split(/\t/);
        if ($row[0] eq $value) {
            my %data;
            @data{@fields} = @row;
            push @records, \%data;
        }
    }
    close (FILE);
    for my $ref (@records) {
        my %data = %$ref;
        print ul( map { li("$_: $data{$_}") } keys %data );
    }
} #end getrows


i've been trying to do substitutions, or to extract only the part of the data i need, but i feel like i'm thrashing around and not making any headway. i was also trying to separate the printing into a different subroutine so that i can list the data in some cases but not others, but couldn't get the scoping right. can anyone tell me the best way to approach any of this? i would really appreciate it.

Replies are listed 'Best First'.
Re: getting the right stuff
by chromatic (Archbishop) on Feb 06, 2001 at 09:24 UTC
    Text::ParseWords can help in splitting the string up into tokens, and it's really nice when getting rid of quotes that may or may not be there:
    use Text::ParseWords;

    <DATA>;
    while (<DATA>) {
        chomp;
        my @words = shellwords($_);
        print "ID: $words[0]\tCourse: $words[1]\n";
    }
    __END__
    "ID" "Course"
    9414 "Nutitional Epidemiology"
    9414 "Nutritional assessment"
    9414 "Undernutrition in the United States"
    1371 "Health Politics and Policy"
    1371 "Advanced Health Politics?"
    1371 "Introduction to Health Policy & Management"
    If you separate out the printing, it's probably best to return a reference to @records, then pass that to the printing subroutine. You can probably also pass a list of hash keys to print (use a hash slice or a subset):
    my @keys = qw ( ID Dept Fax);
    print ul( map { li("$_: $data{$_}") } @keys);
    I think that's what you mean.
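
A minimal sketch of that split, assuming tab-separated files with a header line. The names get_records and format_records are made up for illustration, and the in-memory sample stands in for one of the real text files:

```perl
use strict;
use warnings;

# Hypothetical helper: collect rows whose first column equals $value.
# $fh is any open filehandle on tab-separated text with a header line.
sub get_records {
    my ($fh, $value) = @_;
    my $header = <$fh>;
    chomp $header;
    my @fields = split /\t/, $header;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        my @row = split /\t/, $line, -1;   # -1 keeps trailing empty fields
        next unless @row && $row[0] eq $value;
        my %data;
        @data{@fields} = @row;             # hash slice: header names => row values
        push @records, \%data;
    }
    return \@records;                      # hand back a reference, not a copy
}

# Hypothetical helper: format only the requested keys, in the given order.
sub format_records {
    my ($records, @keys) = @_;
    return map {
        my $rec = $_;
        join ' ', map { $rec->{$_} // '' } @keys;
    } @$records;
}

# In-memory sample data standing in for one of the text files.
my $sample = "ID\tCourse\n9414\tNutritional assessment\n1371\tHealth Politics and Policy\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $records = get_records($fh, '9414');
close $fh;

print "$_\n" for format_records($records, qw(Course));
```

Because the printing sub only sees the reference and the key list, the same get_records can feed a "list everything" page and a "pick a few fields in my own order" page.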
      i tried the Text::ParseWords, and it gave me back something like:
      : : : : : : : :
      i probably didn't do it right. but, going back to what i was doing before, once i have the data i need in @records, isn't there a way to get the " out with s///? or something simple?

      with the printing, what i want to do is lose the labels of ID, Dept, etc.

      i've been going over this - over and over - and i think i understand what i have right up to @records. i can draw a picture of where the data is going right up till then. but i'm not sure what the data looks like by the time it's in @records.

      i think what would probably solve all of my problems (because I would be able to call each item) is if i could assign a scalar variable to each item. so that i could say something like:
      print $ID, $firstname $middlename $lastname\n;
      print $dept, $address, $phone\n;
      does this make any sense?

        To get rid of a character from a string, use tr/"//d (you need to add the d on the end for tr to delete)... this is also really handy for a range of characters, e.g. tr/"'//d;. Of course you have to bind the tr operator to the string you want stripped as in $string =~ tr/A-Z/a-z/;
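
A small sketch of the tr approach applied to a whole row at once. The helper name strip_quotes is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of each field with double quotes deleted.
sub strip_quotes {
    my @fields = @_;
    tr/"//d for @fields;   # tr with /d deletes the listed characters in place
    return @fields;
}

my @row = strip_quotes('9414', '"Nutritional assessment"');
print join("\t", @row), "\n";
```

Calling this on @row right after the split would mean the quotes never reach the hash in the first place.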

        HTH

        Philosophy can be made out of anything. Or less -- Jerry A. Fodor

        @records is just an ordered list. If you assigned to it directly, it would look something like:
        my @records;
        @records = qw( name ID email phone );
        That means the 0th element is 'name', and the 3rd element is 'phone'. In code terms:

        print "$records[0]\t$records[3]\n";

        produces name    phone. Does that help?

        If you find accessing data by name instead of by index is easier, you can assign to a hash:

        while (<DATA>) {
            chomp;
            tr/"//d;                        # strip the double quotes
            my ($id, $name) = split /\t/;   # assumes two tab-separated columns
            push @records, { ID => $id, name => $name };   # one hash per line
        }
        You can then loop through records, printing just the elements you want:
        foreach my $rec (@records) {
            print "ID: $rec->{ID}\n";
            print "Name: $rec->{name}\n";
        }
        Is that more clear?
Re: getting the right stuff
by malaga (Pilgrim) on Feb 06, 2001 at 23:32 UTC
    i'm an idiot. i was saving my txt files with the quote as the separator, so i fixed that and it's not a problem now. i'm still an idiot about the rest of the stuff - being able to call each item separately to print it, but i haven't figured that one out yet. ugh.
Re: getting the right stuff
by malaga (Pilgrim) on Feb 07, 2001 at 04:51 UTC
    ok, i'm getting soooo close (maybe). i now have just the course printing, but it's printing the same course for each key. what am i doing wrong?
    my $value = param('name'); #the value passed from the webpage
    my %data = ();
    my @fields = split(/\t/, <FILE>);
    my $row;
    my $id;
    my $course;
    chomp @fields;
    my @records;
    my @course;
    while (<FILE>) {
        chomp;
        my @row = split(/\t/);
        if ($row[0] eq $value) {
            ($id, $course) = split (/\t/);
            my %data; #then put the row into a hash.
            @data{@fields} = @row;
            push @records, \%data;
        }
    }
    close (FILE);
    for my $ref (@records) {
        my %data = %$ref;
        print ul( map { ul("$course") } keys %data);
      print ul( map { ul("$course") } keys %data);

      looks wrong. What you're doing here is printing the same $course (which contains the last course read) for every record. Also you're going to get Extremely Funky HTML (lists inside lists, without any items?) What you want is probably something like:

      print "<ul>";
      foreach my $ref ( @records ) {
          print "<li>", $ref->{Course}, "</li>";
      }
      print "</ul>";

      Do you see the difference? $course refers to a variable that never changes in your final loop, whereas $ref->{Course} refers to the data in the hash reference whose key is the string "Course". This does change because $ref changes for every iteration.

      Update: changed "course" to "Course" in key.

        that's giving me an empty bulleted list. i'm still playing around with it, but i'm not getting anything yet.
Re: getting the right stuff
by malaga (Pilgrim) on Feb 07, 2001 at 11:16 UTC
    i thought i had it all! but noooooooo.

    i need to print more than one 'item' (where the *****'s are). what am i doing wrong?
    my $value = param('name'); #the value passed from the webpage
    my %data = ();
    my @fields = split(/\t/, <FILE>);
    chomp @fields;
    my @records;
    my %ref;
    while (<FILE>) {
        chomp;
        my @row = split(/\t/);
        if ($row[0] eq $value) {
            my %data;
            @data{@fields} = @row;
            push @records, \%data;
        }
    }
    close (FILE);
    print "<ul>";
    foreach my $ref ( @records ) {
        #print $ref->{'FirstName'}, "<br>";   *****i'm using this when there is only one thing to print
        print @{$ref}{qw/FirstName MiddleName LastName};   *****i was told to use this for more than one, but can't make it work.
    }
    print "</ul>";
      In your second print statement, there is no "/" between LastName and the curly brace. You'll need to close it like this:

      print @$ref{ qw/FirstName MiddleName LastName/ };
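
A hedged sketch of what that slice does when printed. Note that print puts nothing between list elements by default, and an empty field prints as nothing at all, so a blank MiddleName (as in the sample output in the question) would make it look as if only two items came back. The helper name full_name and the sample record are made up for illustration:

```perl
use strict;
use warnings;

# Sample record; the empty MiddleName mirrors the sample output in the question.
my $ref = { FirstName => 'Gladys', MiddleName => '', LastName => 'Black' };

# Hypothetical helper: join the non-empty name parts with single spaces.
sub full_name {
    my ($rec) = @_;
    my @parts = @$rec{ qw/FirstName MiddleName LastName/ };   # hash slice on a reference
    return join ' ', grep { defined($_) && length($_) } @parts;
}

print full_name($ref), "\n";   # prints "Gladys Black"
```

Using join instead of a bare print also makes it obvious where one field ends and the next begins.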

      If that doesn't solve your problem, look into the way you are constructing each hashref in the first loop. Maybe the keys in @fields aren't correct? Data::Dumper will be your friend in times like these. Use it to peer inside @records.
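
For instance, a minimal sketch of dumping a structure shaped like @records. The sample data and the helper name dump_records are made up for illustration:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order makes dumps easier to compare

# Sample structure shaped like @records: an array of hash references.
my @records = (
    { ID => 9414, FirstName => 'Gladys', MiddleName => '', LastName => 'Black' },
);

# Hypothetical helper: return the dump as a string so it can be printed or logged.
sub dump_records {
    my ($records) = @_;
    return Dumper($records);
}

print dump_records(\@records);
```

In a CGI script it may help to wrap the output in <pre> tags so the browser keeps the indentation. Stray quotes or whitespace in the keys show up immediately in the dump.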

      Also, a side note, you may be able to simplify and/or speed up the parsing routine you have using DBD::CSV along with DBI. Not only is the underlying engine it uses faster than regexes, but it handles all the parsing details for you. Another benefit is it provides you a little less painful upgrade path in the future if you need to move to a relational database.

      Here's an example of what you were doing, only using DBI:

      use DBI;

      #Connect to the data source
      my $dbh = DBI->connect(
          "DBI:CSV(RaiseError=>1,AutoCommit=>1,Taint=>1,ChopBlanks=>1):csv_sep_char=\t"
      );
      my $statement = q{
          SELECT * FROM filename WHERE FirstName = ?
      };
      my $sth = $dbh->prepare($statement);
      $sth->execute($value);
      while (my $row = $sth->fetchrow_hashref) {
          print @$row{ qw(FirstName MiddleName LastName) };
      }

      Make sure that the word "filename" inside $statement gets changed to whatever file you're parsing. You will need to rename the file so that it has no extension, as DBD::CSV requires this.

      Update: I shouldn't have used SELECT * like above; it requests all the fields, not just the ones I need. In some cases using * in SQL can be very evil when you've got a lot of records. The general rule of thumb is to SELECT only those columns that are necessary. Sorry, my bad.

      It should read like this:

      my $statement = q{
          SELECT FirstName, MiddleName, LastName
          FROM filename
          WHERE FirstName = ?
      };
        thanks a lot for the help! we can't get the server guys to put dbi on right now. it's a future type of thing. but i might try it on a different server just for practice. putting the slash in gives me the first and last item, but leaves out the middle one. do i need anything else in between those?