Please don't give up on Perl just yet. As others have pointed out, a subroutine certainly can modify a variable of its caller. In fact, your script does exactly that, but it then throws the modified value away without using it.
Try the following:
    use strict;
    use warnings;
    use diagnostics;
    use HTML::TreeBuilder;
    use HTML::Entities;
    use HTML::Element;

    sub traverse;

    foreach my $file_name (@ARGV) {
        my $tree = HTML::TreeBuilder->new;
        $tree->parse_file($file_name);
        $tree->dump;
        print "\n\nWhere would you like to put the output file? ";
        my $output = <STDIN>;
        chomp $output;    # drop the trailing newline from the file name
        open OUTPUT_FILE, '>', $output or die $!;
        select OUTPUT_FILE;
        traverse($tree);
        $tree = $tree->delete;
        close OUTPUT_FILE or die $!;
        select STDOUT;    # so the prompt for the next file is visible again
    }

    sub traverse {
        foreach (@_) {
            if ($_) {
                if (ref $_) {
                    # An element node: report its tag, then recurse into its content.
                    print STDERR $_->tag(), "\n\n";
                    if (   $_->tag() ne "head"
                        && $_->tag() ne "script"
                        && $_->tag() ne "img"
                        && $_->tag() ne "object"
                        && $_->tag() ne "applet")
                    {
                        my @contents = $_->content_list();
                        print STDERR "before: @contents\n";
                        traverse(@contents);
                        print STDERR "after: @contents\n";
                    }
                    # Only the root node (no parent) prints the whole document.
                    if (!$_->parent) {
                        my $s = $_->as_HTML("", {});
                        $s =~ s/></>\n</g;
                        print $s;
                    }
                }
                else {
                    # A text node: $_ aliases the caller's argument, so these edits
                    # change the caller's copy of the string.
                    print STDERR "Processing a string element...\n";
                    $_ =~ s/\s&\s/ &amp; /g;
                    $_ =~ s/</&lt;/g;
                    $_ =~ s/>/&gt;/g;
                    $_ =~ s/'em\s/’em /g;
                    $_ =~ s/'tis\s/’tis /g;
                    $_ =~ s/'twas\s/’twas /g;
                    $_ =~ s/'Twas\s/’Twas /g;
                    $_ =~ s/'Tis\s/’Tis /g;
                    $_ =~ s/'\s/’ /g;
                    $_ =~ s/^'/‘/g;
                    $_ =~ s/(\s)'/$1‘/g;
                    $_ =~ s/"'/“‘/g;
                    $_ =~ s/'"/’”/g;
                    $_ =~ s/\s"/ “/g;
                    $_ =~ s/^'/‘/g;
                    $_ =~ s/^"/“/g;
                    $_ =~ s/"\s/” /g;
                    $_ =~ s/'$/’/g;
                    $_ =~ s/"$/”/g;
                    $_ =~ s/(,|\.)'/$1’/g;
                    $_ =~ s/(,|\.)"/$1”/g;
                    $_ =~ s/(\S)'(\S)/$1’$2/g;
                    print STDERR ($_, "\n\n");
                }
            }
        }
    }
On a simple test file:
    <html>
    <head>
    <title>test</title>
    </head>
    <body>
    This is some "text" in the body.
    </body>
    </html>
This produced the following output:
    <html> @0
      <head> @0.0
        <title> @0.0.0
          "test"
      <body> @0.1
        " This is some "text" in the body. "


    Where would you like to put the output file? test.ou
    html

    before: HTML::Element=HASH(0x841e528) HTML::Element=HASH(0x841e5c8)
    head

    body

    before: This is some "text" in the body.
    Processing a string element...
    This is some “text” in the body.

    after: This is some “text” in the body.
    after: HTML::Element=HASH(0x841e528) HTML::Element=HASH(0x841e5c8)
Note that the "after" value of @contents differs from the "before" value: the subroutine is modifying the caller's variable, because the elements of @_ are aliases for the arguments that were passed in. But @contents is a lexical (my) variable that holds a copy of the node's content list, and its scope is the block of the if statement. When it goes out of scope it is discarded without your having done anything with it, so the tree itself never changes.
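To see the same effect without HTML::TreeBuilder, here is a minimal sketch using only core Perl. The names shout, @tree_content and @contents are made up for this illustration: shout stands in for traverse, and @tree_content plays the role of the content stored inside a node.

```perl
use strict;
use warnings;

# Hypothetical stand-in for traverse(): it edits its arguments through @_.
sub shout {
    foreach (@_) {       # $_ aliases an element of @_, which aliases the caller's argument
        $_ = uc $_;
    }
}

my @tree_content = ('some "text"', 'more text');   # plays the role of the node's content
my @contents     = @tree_content;                  # a copy, like the list from content_list()

shout(@contents);

print "copy:     @contents\n";       # the copy was changed by the sub
print "original: @tree_content\n";   # the original was not
```

The sub really does change @contents, but only that copy. Unless you write the new values back, the original list stays as it was.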
Update: if you want to modify the content of one of the nodes in place, you might find the content_refs_list method useful. From the HTML::Element documentation:
This returns a list of scalar references to each element of $h’s content list. This is useful in case you want to in-place edit any large text segments without having to get a copy of the current value of that segment value, modify that copy, then use the "splice_content" to replace the old with the new. Instead, here you can in-place edit:
It will be well worth your time to read HTML::Element carefully.
Using this method will require some change to your traverse sub.
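As a rough idea of what that change could look like, here is a sketch of a traverse that walks the tree through content_refs_list and edits the text segments in place. The helper smarten and the hash %skip are my own names for this example, and smarten handles only double quotes rather than your full list of substitutions, so treat it as a starting point and not a drop-in replacement.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: curl double quotes only (a small subset of the original rules).
sub smarten {
    my ($text) = @_;
    $text =~ s/(^|\s)"/$1\x{201C}/g;   # opening quote at start or after whitespace
    $text =~ s/"/\x{201D}/g;           # any remaining double quote is a closing quote
    return $text;
}

# Tags whose content is left alone, as in the original script.
my %skip = map { $_ => 1 } qw(head script img object applet);

sub traverse {
    my ($node) = @_;
    return if $skip{ $node->tag };
    foreach my $ref ( $node->content_refs_list ) {
        if ( ref $$ref ) {
            traverse($$ref);            # a child element: recurse into it
        }
        else {
            $$ref = smarten($$ref);     # a text segment: edit it where it lives in the tree
        }
    }
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<html><head><title>test</title></head>'
  . '<body>This is some "text" in the body.</body></html>'
);
traverse($tree);

my $body_text = $tree->look_down( _tag => 'body' )->as_text;
print "$body_text\n";
$tree->delete;
```

Because each $ref points at the string stored in the node itself, there is nothing to copy back afterwards.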
Alternatively, you can explicitly replace the node content as you traverse the tree with something like the following:
    {
        my @contents = $_->content_list();
        print STDERR "before: @contents\n";
        traverse(@contents);
        print STDERR "after: @contents\n";
        $_->detach_content();
        $_->push_content(@contents);
    }
In reply to Re: Modifying a parameter to a recursive function by ig, in thread Modifying a parameter to a recursive function by CoDeReBeL