in reply to HTML Parsing /Regex Qstn

You can use something like the following code. You need to have HTML::Tree installed.
use HTML::TreeBuilder;
use Data::Dump qw{dump};

my $tree = HTML::TreeBuilder->new_from_file("your_html_file");
my %output = ();
for my $div ($tree->find("div")) {
    if (my $titlefield = $div->look_down(class => "titlefield")) {
        my $href = $titlefield->attr("href");
        $output{$href} = [$titlefield->attr("title")];
        my $date = "";
        if (my $datefield = $div->look_down(class => "datefield")) {
            $date = $datefield->as_text();
        }
        push @{$output{$href}}, $date;
        # ...
    }
}
dump { %output };
-- Roman

Re^2: HTML Parsing /Regex Qstn
by sri1230 (Novice) on Jan 21, 2010 at 16:53 UTC
    Thanks Roman. And I will make sure I use the code tags next time. Thanks again, all of you!

      Yes, do keep <code>...</code> tags in mind for next time... but there is absolutely no reason you can't go back right now and edit your prior post.

      HTH,

      planetscape
      Roman - One more question. How do I get the content directly inside the div tag, i.e. the stuff that isn't in any of those inner tags?
        You can traverse the content_list of any element. Text nodes are plain strings, while child tags are references (element objects).
        ...
        for my $part ($div->content_list) {
            print $part, "\n" if !ref($part);
        }
        ...
        If you want all the text (including text inside inner tags), you can use the as_text method of an element. In this case it would also return the title and date, but in general it is useful.
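
To make the difference between content_list and as_text concrete, here is a minimal, self-contained sketch. The sample markup is made up for illustration (only the class names titlefield and datefield come from this thread), and the exact whitespace in the results depends on how HTML::TreeBuilder compacts spaces, so treat it as a rough demonstration rather than a definitive recipe.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup; only the class names come from the thread.
my $html = <<'HTML';
<html><body>
<div><a class="titlefield" href="/a" title="First">First</a> direct text <span class="datefield">Jan 21</span></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
my $div  = $tree->find("div");    # scalar context: first matching element

# Items in content_list that are not references are text nodes
# sitting directly inside the div.
my $direct = join "", grep { !ref } $div->content_list;

# as_text gathers the text of the whole subtree, inner tags included.
my $all = $div->as_text;

print "direct: $direct\n";
print "all:    $all\n";
```

Here $direct should hold only the loose text of the div, while $all also picks up the link text and the date.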

        --Roman