phantom85 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am using WWW::Mechanize to submit a form and fetch a class schedule, and I have extracted the part of the HTML I am interested in. My problem is that I now want to put the data into a hash, and I don't know where to start. Below is the code I have so far; it prints the result to a file just to check that it returns the correct data.

use WWW::Mechanize qw();
use IO::Socket::SSL qw();
use HTML::TreeBuilder;
use 5.10.0;
use strict;
use warnings;

my $mech = WWW::Mechanize->new(ssl_opts => {
    SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
    verify_hostname => 0,
});
my $url = "scheduleurl";
$mech->get($url);
my $filename = 'out.htm';
my $result = $mech->submit_form(
    form_number => 2,
    fields => {
        "ctl00\$ContentPlaceHolder1\$TermDDL"      => 2171,
        "ctl00\$ContentPlaceHolder1\$ClassSubject" => 'CS',
    },
    button => "ctl00\$ContentPlaceHolder1\$SearchButton"
);
$mech->submit();
#print $result->content();
open(my $fhandle, '>', $filename) or die "Could not open file '$filename' $!";
my $tree = HTML::TreeBuilder->new_from_content($result->content);
if (my $div = $tree->look_down(_tag => "div", id => "class_list")){
    #print $div->as_text(), "\n";
    # say $fhandle $div->as_HTML(),"\n";
    my @list = $div->find(_tag => 'ol');
    #print Dumper \@list;
    foreach (@list) {
        say $fhandle $_->as_HTML();
    }
}
close $fhandle;
$tree->delete();

The script prints the following to the file. I have pasted just one item from the list, but there are multiple items with the same format.

<ol>
  <li><span class="ClassTitle"><strong>CS 128</strong></span> Section 01
    <table border="0" cellpadding="5" cellspacing="0" class="GridView" id="ClassDetails_TBL" width="99%">
      <tr>
        <th align="right" id="TableHeaderCell8" nowrap>Class Nbr</th>
        <td id="TableCell13">11647</td>
        <th align="right" id="TableHeaderCell9" nowrap>Capacity</th>
        <td id="TableCell14">30</td>
      </tr>
      <tr>
        <th align="right" id="TableCell5" nowrap>Title</th>
        <td class="tablealtstyle" id="TableCell8">Introduction to C++</td>
        <th align="right" id="TableCell8a" nowrap>Units</th>
        <td class="tablealtstyle" id="TableCell9">4</td>
      </tr>
      <tr>
        <th align="right" id="TableCell11" nowrap>Time</th>
        <td id="TableCell1">3:00 PM&ndash;4:50 PM&nbsp;&nbsp;&nbsp;TuTh</td>
        <th align="right" id="TableCell15">Building/Room</th>
        <td id="TableCell2">8 52</td>
      </tr>
    </table>
  </li>
  <li></li>
  ...

I want to put that data into a hash, with the class titles as keys and the class information as values:

{
    'CS128 Section 01' => {
        'Class Nbr' => 11647,
        'Capacity'  => 30,
        'Title'     => 'Introduction to C++',
        'Units'     => 4,
        'Time'      => '3:00PM- 4:50PM TuTh',
        'Room'      => '8 52',
    },
}

Replies are listed 'Best First'.
Re: html to hash table
by Corion (Patriarch) on Oct 30, 2016 at 06:51 UTC
Re: html to hash table
by duyet (Friar) on Oct 30, 2016 at 09:29 UTC
    Based on the content of your @list:
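    # Goes inside the if block, where @list is in scope.
    use Data::Dumper;

    my $data = {};
    foreach my $item ( @list ) {
        # Each <ol> holds one <li> per class; search inside that <li>, not the whole tree,
        # so every class gets its own title and its own rows.
        foreach my $li ( $item->look_down( _tag => 'li' ) ) {
            my $span = $li->look_down( _tag => 'span' ) or next;   # skip empty <li></li>
            # In scalar context right() returns the immediate right sibling (the section text).
            my $title = $span->as_trimmed_text() . $span->right();
            for my $row ( $li->look_down( _tag => q{tr} ) ) {
                my @keys = $row->look_down( _tag => q{th} );
                my @vals = $row->look_down( _tag => q{td} );
                for my $i ( 0 .. $#keys ) {
                    my $key = $keys[ $i ]->as_trimmed_text();
                    my $val = $vals[ $i ]->as_trimmed_text();
                    $data->{ $title }{ $key } = $val;
                }
            }
        }
    }
    print 'data: ' . Dumper( $data );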
    Result:
    data: $VAR1 = {
              'CS 128 Section 01 ' => {
                  'Capacity' => '30',
                  'Building/Room' => '8 52',
                  'Title' => 'Introduction to C++',
                  'Time' => "3:00 PM\x{2013}4:50 PM\x{a0}\x{a0}\x{a0}TuTh",
                  'Class Nbr' => '11647',
                  'Units' => '4'
              }
            };
    See the HTML::Element documentation for more info.
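The dump above shows the Time value still carrying the en dash (\x{2013}) and the non-breaking spaces (\x{a0}) that come from the page's &ndash; and &nbsp; entities. Here is a minimal sketch of tidying such values before storing them, assuming plain ASCII is what you want. The helper name clean_value is hypothetical and is not part of HTML::Element.

```perl
use strict;
use warnings;

# Hypothetical helper: replace non-breaking spaces and en dashes with
# plain ASCII, then collapse runs of whitespace and trim both ends.
sub clean_value {
    my ($text) = @_;
    $text =~ s/\x{a0}/ /g;      # non-breaking space -> ordinary space
    $text =~ s/\x{2013}/-/g;    # en dash -> hyphen
    $text =~ s/\s+/ /g;         # collapse whitespace runs
    $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    return $text;
}

my $time = clean_value("3:00 PM\x{2013}4:50 PM\x{a0}\x{a0}\x{a0}TuTh");
print "$time\n";    # 3:00 PM-4:50 PM TuTh
```

Applying the same helper to the title would also drop the trailing space visible in the key 'CS 128 Section 01 '.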
      Thank you, it works. One more question: if I want to access 'Time' for the 'CS 128 Section 01' class, how would I do that?
        print $data->{'CS 128 Section 01 '}{Time};   # the key ends with a space, exactly as it appears in the dump
        If you have more data you can loop through it:
        foreach my $class ( keys %{ $data } ) {
            print $data->{ $class }{Time}, "\n";
            # do something else with other items ...
        }
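As a rough sketch of working with the resulting hash of hashes: the sample data below is copied from part of the dump earlier in the thread (note the trailing space in the key), and class_field is a hypothetical helper name. Checking with exists first avoids autovivifying an entry for a class that was never parsed.

```perl
use strict;
use warnings;

# Sample data shaped like the dump earlier in the thread (one class only).
my $data = {
    'CS 128 Section 01 ' => {
        'Class Nbr' => '11647',
        'Capacity'  => '30',
        'Units'     => '4',
    },
};

# Hypothetical helper: return one field of one class, or nothing when the
# class is missing, without creating a new entry as a side effect.
sub class_field {
    my ($classes, $class, $field) = @_;
    return unless exists $classes->{$class};
    return $classes->{$class}{$field};
}

# Walk every class and every field in a stable (sorted) order.
for my $class ( sort keys %$data ) {
    for my $field ( sort keys %{ $data->{$class} } ) {
        print "$class| $field = $data->{$class}{$field}\n";
    }
}
```

Sorting the keys is optional; it only makes the output order repeatable, since Perl hashes have no defined order.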
Re: html to hash table
by perl-diddler (Chaplain) on Oct 30, 2016 at 22:41 UTC
    It seems you want "actions" to be called based on HTML elements.

    I don't know how well it works, but HTML::Parser (H::P) has options to call your "callout function" for the opening and closing tags of specific HTML elements, or of all of them. When you specify the callout functions, you tell H::P what you want passed to each one. For example, I wanted to see the start, end, text, and non-parsed text (DATA/javascript) events, so I specified a function for each (mayka is an anon-sub creation routine that includes some routine error checks and such).

    $p->parser(HTML::Parser->new(
        "api_version" => 3,
        start_h   => [ mayka($p,6,start_h => \&_start), "tag,skipped_text,attr,attrseq,line,text"],
        end_h     => [ mayka($p,4,end_h => \&_end), "tagname,skipped_text,line,offset_end"],
        text_h    => [ mayka($p,3, text_h => \&_text), "tag,skipped_text,text"],
        default_h => [ mayka($p,3,default_h => \&_dflt), "event, skipped_text, text"],
        marked_sections => 1,
    ));
    The last parameter in each handler spec lists the values I wanted passed to my function, so for tags that had class labels, I could store & nest them.

    Note -- I may easily be missing some functionality in WWW::Mechanize, but I didn't see the ability to process class or ID values when they started and ended. It does get a bit hairy trying to keep track of them, since nested elements preempt and assign class+id's to children, and when those elements end, the class+id revert to whatever was in place before you encountered that element (i.e. need to maintain a stack)...
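The stack bookkeeping described above might be sketched roughly as follows. The handler names on_start, on_end, and current_class are hypothetical, and the handlers are driven by hand here so the sketch needs no modules. With HTML::Parser they could be registered as shown in the comment, using the 'tagname,attr' and 'tagname' argspecs.

```perl
use strict;
use warnings;

# With HTML::Parser these handlers could be registered as, for example:
#   HTML::Parser->new( api_version => 3,
#       start_h => [ \&on_start, 'tagname,attr' ],
#       end_h   => [ \&on_end,   'tagname' ] );

# Stack of { tag, class, id } frames, innermost open element last.
my @stack;

# Start-tag handler: inherit class/id from the enclosing element unless
# this element sets its own, then push a new frame.
sub on_start {
    my ($tagname, $attr) = @_;
    my $parent = @stack ? $stack[-1] : {};
    push @stack, {
        tag   => $tagname,
        class => $attr->{class} // $parent->{class},
        id    => $attr->{id}    // $parent->{id},
    };
    return;
}

# End-tag handler: drop the nearest matching frame plus anything left open
# inside it, which restores the class/id in effect before it opened.
sub on_end {
    my ($tagname) = @_;
    for my $i ( reverse 0 .. $#stack ) {
        if ( $stack[$i]{tag} eq $tagname ) {
            splice @stack, $i;
            return;
        }
    }
    return;    # stray end tag with no matching start: ignore it
}

# Class currently in effect, or undef outside any classed element.
sub current_class { return @stack ? $stack[-1]{class} : undef }

on_start( 'div',  { id    => 'class_list' } );
on_start( 'span', { class => 'ClassTitle' } );
printf "%s\n", current_class() // '(none)';    # ClassTitle
on_end('span');
printf "%s\n", current_class() // '(none)';    # (none)
on_end('div');
```

Elements that never get an end event (such as br) stay on the stack only until their parent closes, because on_end removes everything above the matching frame.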

    hope this helps...