(jeffa) Re: Extracting info from URL into an array

UPDATE:
I didn't see that you are pulling the URIs out of files the first time I read your question. There really is no reason to use URI::Find if you have already found the URIs. ;) This code reads all text files (.txt extension) in your test1 folder. I used an absolute path in the glob instead of a relative path starting with .. . I also assume that each file has the URI on its first line, that the URI starts with a scheme, and that the file name always ends with the .txt extension.

use strict; 
use warnings;

use URI;
use File::Basename;

my @suffix = qw(.jsp .html .asp .htm);

for (</path/to/test1/*.txt>) {
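    # three-arg open with a lexical filehandle; skip files we can't read
    open(my $fh, '<', $_) or next;
    # read only the first line (scalar context), not the whole file
    my $line = <$fh>;
    close $fh;
    next unless defined $line;
    chomp $line;

    my $uri = URI->new($line);
    next unless $uri->scheme;

    my %q = $uri->query_form;
    # skip URIs that have no content parameter
    next unless defined $q{content};
    my (undef,@key) = split( /\//, dirname($q{content}) );
    push @key, basename($q{content},@suffix);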

    print "<Textfile>\n",
        "filename: {", basename($_), "}\n",
        "Keys: {", join(',',@key), "}\n",
        "</Textfile>\n",
    ;
}
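
To make the dirname/split/basename steps a bit more concrete, here is a small sketch that uses only the core File::Basename module on one of the content paths from the sample data in the original post. The helper name path_to_keys is made up for illustration and is not part of the code above.

```perl
use strict;
use warnings;
use File::Basename;

my @suffix = qw(.jsp .html .asp .htm);

# hypothetical helper: turn a content path into its list of keys
sub path_to_keys {
    my $content = shift;

    # dirname() keeps the leading slash, so split returns an empty
    # first field; assigning it to undef throws it away
    my (undef, @key) = split /\//, dirname($content);

    # basename() with a suffix list strips a matching extension
    push @key, basename($content, @suffix);
    return @key;
}

# prints: asp,administrative,catalog,products,Network,benefits
print join(',', path_to_keys('/asp/administrative/catalog/products/Network/benefits.jsp')), "\n";
```

Note that an extension missing from @suffix (say .php) is not stripped, so the last key would keep it.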

ORIGINAL POST:
Well, your request is confusing at best. If you want to parse URIs, URI::Find is a fine tool for doing so. Pass its find method a reference to a scalar holding the text (in my example I slurp the built-in DATA filehandle into one) and it will find the URIs for you. The constructor takes a reference to a subroutine (or an anonymous sub), and URI::Find will call it every time it encounters a URI. Here is some code that sort of Does What You Want. File::Basename is used to remove the extension ... but I am starting to think that a better approach would be to remove any extension and split on the forward slash. Anyways, it's a start:

use strict; 
use warnings;

use URI::Find;
use File::Basename;

# add more if needed
my @suffix = qw(.jsp .html .asp .htm);

# optionally open a file here and replace DATA 
# with the name of the filehandle you opened
my $data = do {local $/;<DATA>};

my $finder = URI::Find->new(\&call_back);
$finder->find(\$data);

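sub call_back {
    # URI::Find passes the URI object and the original matched text
    my ($uri, $orig_text) = @_;
    my %q = $uri->query_form;
    my $content = $q{content};

    # leave URIs without a content parameter alone
    return $orig_text unless defined $content;

    # using split like this is a hack ... improvements anyone?
    my (undef,@key) = split(/\//,dirname($content));

    # this will add the file name minus its extension
    push @key, basename($content,@suffix);

    # you could push these to an array instead of printing
    print "Filename: {", basename($content), "}\n";
    print "Keys: {", join(',',@key), "}\n\n";

    # the return value replaces the URI in $data, so hand back the original
    return $orig_text;
}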

__DATA__
http://www.yyy.com/store/application/meraqf?origin=rrr.jsp&event=link(goto)&content=/asp/administrative/catalog/products/Network/benefits.jsp

is this text automatically 'ignored'? yes, it is ;)
http://foo.com/?content=/asp/management/catalog/products/Network/propaganda.asp
http://foo.com/?content=/path/to/bar.html
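
As for the "remove any extension and split on the forward slash" idea mentioned above, a rough sketch might look like the following. It uses no modules at all. The helper name content_keys and the regex are my own guess at what that approach would be, so treat it as a starting point rather than the answer.

```perl
use strict;
use warnings;

# hypothetical helper: strip whatever extension is there, then split
sub content_keys {
    my $content = shift;

    # drop one trailing extension (.jsp, .html, anything) from the
    # last path segment; the class excludes '/' so directories are safe
    (my $path = $content) =~ s/\.[^.\/]+\z//;

    # grep removes the empty field produced by the leading slash
    return grep { length } split /\//, $path;
}

# prints: asp,management,catalog,products,Network,propaganda
print join(',', content_keys('/asp/management/catalog/products/Network/propaganda.asp')), "\n";

# prints: path,to,bar
print join(',', content_keys('/path/to/bar.html')), "\n";
```

Unlike the File::Basename version, this strips any extension, not just the ones listed in @suffix, and it also drops empty segments caused by doubled slashes. Whether that is what you want depends on your data.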

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)


Re: (jeffa) Re: Extracting info from URL into an array by Anonymous Monk on May 27, 2003 at 15:07 UTC
Jeffa, thank you very much for your help!! :) I will try it now!!