Help with Toke Parser

StarkRavingCalm has asked for the wisdom of the Perl Monks concerning the following question:

Good day, monks.

I have a script that will be used in a larger script but this part is giving me some trouble.

My goal for this part of the script is to read the file listing from a webpage and put it into a hash, with the filename as the key and the file size as the value.

But I have been unable to find a way to get the file size, so I have tried to do it with just an array of filenames.

Here is the code as it currently stands. The issue is that the print outside the loop shows only the last element, while the print inside the loop shows all of them. If anyone has a way to get it to work with a hash as mentioned above, I'd rather do that than keep wrestling with the array problem.

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use HTML::TokeParser;
use LWP::Simple;
use File::Basename;
use List::Compare;



## POC URL:
my $page=get('http://localhost/images');


my %urlhash;
my @urlfiles;
my @array;
my @newarray;


my $p= HTML::TokeParser->new(\$page);

while (my $token = $p->get_tag("a")) {


@array = $token->[1]{href} || "-";
my $text = $p->get_trimmed_text("/a");

## Just a few lines of crap cleaner...

for (@array) {s/test.txt//g};
for (@array) {s/\///g};
for (@array) {s/\?C\=N;O\=D//g};
for (@array) {s/\?C\=M;O\=A//g};
for (@array) {s/\?C\=S;O\=A//g};
for (@array) {s/\?C\=D;O\=A//g};


#print "@array\n";

}

print "@array\n";

The "crap cleaner" section strips the unwanted entries from the listing (test.txt, slashes, and the ?C=N;O=D style sort links). I have removed them on my POC Apache server, but they exist on the server I will run this against, which I have no control over.

Thanks in advance!


Re: Help with Toke Parser by tangent (Parson) on Oct 27, 2015 at 21:08 UTC
As you are already using LWP::Simple you can use that module's head() function to retrieve the size of the file:

    my %hash;
    while ( my $token = $p->get_tag("a") ) {
        if ( my $href = $token->[1]{'href'} ) {
            # may need to prefix domain to $href
            my ($type, $length, $mod, $exp, $server) = head($href);
            $hash{$href} = $length;
        }
    }
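To make the "prefix domain" comment concrete, here is a minimal sketch of one way to build the filename-to-size hash. The `file_sizes` sub, its optional third argument (a hook for swapping in a fake `head`), the sample HTML, and the trailing slash on the base URL are all assumptions for illustration, not part of the original post. It resolves each relative `href` with `URI->new_abs` and skips links that start with `?` or end with `/`, on the assumption that those are sort links and directories.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use LWP::Simple qw(get head);
use URI;

# Hypothetical helper: map each file link in $html to its size in bytes.
# $head is an optional code ref (defaults to LWP::Simple's head) so the
# sub can be tried without a web server.
sub file_sizes {
    my ( $html, $base, $head ) = @_;
    $head ||= \&head;

    my $p = HTML::TokeParser->new( \$html );
    my %size_of;
    while ( my $token = $p->get_tag('a') ) {
        my $href = $token->[1]{href};
        next unless defined $href && length $href;

        # Assumption: links starting with '?' are sort links and links
        # ending in '/' are directories, so neither is a file.
        next if $href =~ /^\?/ || $href =~ m{/$};

        # Turn the relative href into an absolute URL for the HEAD request.
        my $url = URI->new_abs( $href, $base )->as_string;

        # In list context head() returns
        # (content_type, document_length, modified_time, expires, server).
        my ( $type, $length ) = $head->($url);
        $size_of{$href} = $length;    # undef if no length was reported
    }
    return %size_of;
}

# Offline demonstration with made-up links and a stubbed head().
my $sample = <<'HTML';
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="cat.png">cat.png</a>
<a href="dog.jpg">dog.jpg</a>
HTML

my %fake = (
    'http://localhost/images/cat.png' => 1024,
    'http://localhost/images/dog.jpg' => 2048,
);
my %sizes = file_sizes( $sample, 'http://localhost/images/',
    sub { ( 'image/png', $fake{ $_[0] } ) } );

print "$_ => $sizes{$_}\n" for sort keys %sizes;
```

Against a real server the call would be something like `file_sizes( get($base), $base )`, leaving out the third argument so the real `head()` is used. Note that `$base` should end in a slash, otherwise relative links resolve against the parent directory.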
Re^2: Help with Toke Parser by Anonymous Monk on Oct 28, 2015 at 01:32 UTC
Awesome! Thanks. Works great. Would still love to use a hash but this will get me to where I need for now.
Re: Help with Toke Parser by stevieb (Canon) on Oct 27, 2015 at 20:52 UTC
I haven't ever used HTML::TokeParser, so I'm unaware of how to get the file's size, but your array issue looks like it stems from the fact that you're overwriting the array on each pass through the loop. That explains why you only get the last element (actually, there would only ever be a single element, the one produced in the last iteration of while()):

    @array = $token->[1]{href} || "-";

I think what you want is this instead (see push):

    push @array, $token->[1]{href} || "-";

Then it may be best to do the cleanup after the while() loop:

    my $p= HTML::TokeParser->new(\$page);

    while (my $token = $p->get_tag("a")) {
        push @array, $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
    }

    for (@array){
        next if /^-$/; # skip if line eq '-'
        s/
            test.txt
            |
            \/
            |
            \?C\=N;O\=D
            |
            \?C\=M;O\=A
            |
            \?C\=S;O\=A
            |
            \?C\=D;O\=A
        //xg;
    }

To understand how I've turned your multiple regexes into a single one with embedded whitespace for clarity, see the x modifier in perlre.
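The difference between assigning and pushing can be seen in isolation with a small sketch that needs no web server. The hrefs below are made up for illustration, and the character-class regex is a condensed variation on the alternation above (it matches any of the N/M/S/D and A/D combinations), not the exact pattern from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up hrefs standing in for what get_tag("a") would yield.
my @hrefs = ( '?C=N;O=D', 'subdir/', 'a.png' );

my ( @assigned, @pushed );
for my $href (@hrefs) {
    @assigned = $href;      # replaces the whole array on every pass
    push @pushed, $href;    # appends, so earlier values survive
}

# One substitution with alternation, applied once after the loop.
for (@pushed) {
    s{ / | \?C=[NMSD];O=[AD] }{}xg;
}

# Drop entries that the cleanup reduced to an empty string.
@pushed = grep { length } @pushed;

print "assigned: @assigned\n";
print "pushed:   @pushed\n";
```

Only `@pushed` keeps every link. `@assigned` ends up holding just the value from the final pass, which is the symptom described in the question.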