I'll apologize up front--I'm not answering your question. Instead, I'm going to offer a couple of comments on your code.

When you create a subroutine, it's a bad idea to add a prototype (i.e., the parenthesized part) unless you know *exactly* what you're asking for. Perl prototypes aren't like prototypes in other languages.
When you write your code with good variable names and flow, then it's generally self-documenting, and comments can actually get in the way of making your intentions clear. Your comments are large and blocky, so they can be visually distracting. If you delete most of the comments in your code, it actually reads pretty clearly. If you keep comments, make them simple and non-distracting.
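
To illustrate the prototype point, here's a minimal sketch of how a ($) prototype quietly changes what a call means. The two subs are hypothetical, made up just for this demo. The prototype forces scalar context on the argument, so passing an array hands the sub the element count rather than the elements:

```perl
use strict;
use warnings;

# Hypothetical subs, for illustration only.
# With a ($) prototype the single argument is evaluated in scalar context.
sub first_arg_proto($) { return $_[0] }

# Without a prototype the arguments are simply flattened into @_.
sub first_arg_plain { return $_[0] }

my @items = (10, 20, 30);

my $with_proto = first_arg_proto(@items);   # scalar(@items), i.e. 3
my $without    = first_arg_plain(@items);   # first element, i.e. 10

print "with prototype:    $with_proto\n";
print "without prototype: $without\n";
```

In most other languages a "prototype" only declares the argument list; here it changes how the caller's expression is evaluated, which is why I'd leave it off unless you want exactly that.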

After applying these two suggestions to your subroutine, I get this:

############################################################
# specifies what the edit subroutine should do
############################################################
sub edit {
    my $file = $_;

    # only operate on html files
    if ((-e $file) && (! -d $file) && (/.html?/)) {
        open (FH, "<",$file) || die $!;

        my $tree = HTML::Tree->new();
        $tree->parse_file($file) || die $!;

        # The main div contains the post of interest
        my $getmaindiv = $tree->look_down(_tag => "div",id  => "post_principale") || die $!;
        print $getmaindiv->as_HTML, "\n";
        close FH;
    }
}

Most of your subroutine is inside an if statement. In cases like this, I prefer[*] to simply return when the condition isn't met. That saves an indentation level and reduces the visual complexity a bit.

sub edit {
    my $file = $_;

    # only operate on html files
    return unless (-e $file) && (! -d $file) && (/.html?/);

    open (FH, "<",$file) || die $!;

    my $tree = HTML::Tree->new();
    $tree->parse_file($file) || die $!;

    # The main div contains the post of interest
    my $getmaindiv = $tree->look_down(_tag => "div",id  => "post_principale") || die $!;
    print $getmaindiv->as_HTML, "\n";
    close FH;
}

Now that the code is a little easier to read, I notice that you're not actually using the file handle you open. Instead, you're relying on the HTML parser's ability to accept a filename rather than a file handle. So I'd just remove the file handle code:

sub edit {
    my $file = $_;

    # only operate on html files
    return unless (-e $file) && (! -d $file) && (/.html?/);

    my $tree = HTML::Tree->new();
    $tree->parse_file($file) || die $!;

    # The main div contains the post of interest
    my $getmaindiv = $tree->look_down(_tag => "div",id  => "post_principale") || die $!;
    print $getmaindiv->as_HTML, "\n";
}
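
Since the sub takes its file name from $_, I'm assuming (and it is only an assumption) that it's called as a File::Find callback. Here's a small sketch of that setup using the same guard clause. The temporary directory and file names are invented for the demo, and the stand-in callback just records the names it accepts instead of parsing anything:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Throwaway directory with a few empty files (names are made up).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(post1.html post2.htm notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    close $fh;
}

my @seen;

# Stand-in for edit(): same early return, but it only collects names.
sub wanted {
    my $file = $_;

    # only operate on html files
    return unless (-e $file) && (! -d $file) && (/.html?/);

    push @seen, $file;
}

# By default File::Find changes into each directory and sets $_ to the
# bare file name, so the file tests above work on relative names.
find(\&wanted, $dir);

print "$_\n" for sort @seen;
```

With the guard at the top, everything after it can assume it's looking at a file you care about.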

[*] Just one of my preferences. Of course, all of my suggestions are based on my preferences, but the other ones are pretty well accepted, while this one is the most discretionary. Since I'm just another programmer among many, take it with a grain of salt.

I hope you find some of this useful...

Update: I specifically said "don't use prototypes", yet I left the prototype in all versions. Removed.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

In reply to Re: Scrape a blog: a statistical approach by roboticus
in thread Scrape a blog: a statistical approach by epimenidecretese
