JayBee has asked for the wisdom of the Perl Monks concerning the following question:

I am hoping it may be possible to use grep and/or map to change sections of an array. I am using @file = <FILE> to slurp files, and then I want to redirect the links on that page down one directory by appending "../" to them. Something like this:
open(FILE, "<file.htm")
@file = <FILE>;
close(FILE);
@check_links = ('href="', 'src="');
@new_file = grep {
    foreach $item (@check_links) {
        map($_ . "../"; if ($item =~ /$_/, @page)
    } #end_foreach#
} #end_grep#
print @new_file;
Basically I want the file that reads
foo <a href="home.htm"><img src="image.gif"></a> foo
to appear as
foo <a href="../home.htm"><img src="../image.gif"></a> foo
I know this may be way off, but that's why I would like your help. Is there a module to do this perhaps? Thank you very much in advance.

Replies are listed 'Best First'.
Re: Altering an array with grep & map
by gaal (Parson) on Jan 18, 2005 at 11:25 UTC
    Yes, it is possible to modify an array with grep and map.

    You mean prepend, by your code.

    You don't need to slurp the whole file if you're operating on separate lines anyway.

    grep acts as a filter and returns a list containing only things for which BLOCK evaluated true. If I understand what you're trying to do, then (apart from the fact that you aren't calling grep correctly in terms of syntax, because you aren't feeding it the LIST you are supposed to), you are also using it wrong because it will weed out lines that didn't contain "links".
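To make the filter-versus-transform distinction concrete, here is a minimal sketch. The sample lines are made up for illustration and stand in for the slurped file; the substitution is the same one used elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the slurped file.
my @lines = (
    qq{foo <a href="home.htm">home</a>\n},
    qq{plain text, no links\n},
    qq{<img src="image.gif">\n},
);

# grep only filters: the line without a link is dropped from the result.
my @filtered = grep { /(?:href|src)="/ } @lines;

# map transforms every element and keeps them all.
# Working on a copy of $_ leaves the original @lines untouched.
my @rewritten = map { my $l = $_; $l =~ s{(href|src)="}{$1="../}g; $l } @lines;

print scalar(@filtered), " line(s) survive grep\n";
print @rewritten;
```

The grep result is missing the plain-text line, which is why grep alone is the wrong tool when the whole file has to be written back out.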

    If you just use line mode, and take advantage of the fact that substitutions fail silently, you can just do this:

    open my $fh, "<file.htm" or die "open: $!";
    while (<$fh>) {
        s{(href|src)="}{$1="../}g;
        print;
    }
      Wow, that is beautiful, and very simple. Exact thing I was looking for. I also just learned a new opening method too, along with "line mode"<-(will check this out more)... Thanks gaal.

      Additional question: Since I got other mentions about this, can I use other conditional statements within the "while" loop that would ignore the http:// or even rewrite it back if it's changed?

      How does this "fail silently"?

        You're welcome!

        I'm not sure what you want to do with "ignoring the http://". Do you want to filter this substring out of the output? Basically the BLOCK of the while loop is run for each line in the input, and you can do as many things as you like in it. So you can (for example) check if a line contains some specific text and decide not to print it out at all -- just put a

        next if /some indication that this line needs to be stripped/;

        before you print the line. Similarly, you can call s/// several times, so if you just wanted to delete the substring "http://", add s{http://}{}g; right next to your existing substitution.
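If "ignoring the http://" instead meant leaving absolute links alone, one possible sketch is a negative lookahead in the same substitution. This is an assumption about the intent, not something stated in the thread; the input line is hypothetical, and the `\w+:` test is only a rough approximation of a URL scheme.

```perl
use strict;
use warnings;

# Hypothetical input line mixing relative and absolute links.
my $line = qq{<a href="http://example.com/">x</a> <a href="home.htm">y</a> <img src="/img/a.gif">\n};

# Prefix "../" only when the attribute value does not start with
# a scheme (http:, mailto:, ...), a leading slash, or an existing "../".
$line =~ s{(href|src)="(?!\w+:|/|\.\./)}{$1="../}g;

print $line;
```

Only `home.htm` is rewritten here; the `http://` link and the root-relative image path pass through unchanged.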

        By "fail silently" I mean that we are attempting a substitution on all lines of the input, not first checking for a match then operating on only those inputs that match. The s/// operator can look at a line, fail to make a substitution, and not complain about it. In this sense it's silent. Think of s/// as encapsulating both the search and the replace.
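A small sketch of that behaviour, using made-up input lines: s/// returns the number of substitutions it made, or a false value when nothing matched, and it raises no error or warning either way.

```perl
use strict;
use warnings;

# Hypothetical input: one line with links, one without.
my @lines = (
    qq{<a href="a.htm"><img src="a.gif"></a>\n},
    qq{no links here\n},
);

for my $line (@lines) {
    # s///g yields the number of substitutions made, or false if none matched.
    my $count = ($line =~ s{(href|src)="}{$1="../}g);
    printf "%d change(s): %s", $count || 0, $line;
}
```

The return value is there if you want to count or log changes, but the loop works just as well if you ignore it.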

Re: Altering an array with grep & map
by Zaxo (Archbishop) on Jan 18, 2005 at 11:30 UTC

    Your usage is a little mixed up. grep and map each take a bit of code and a list, and each produces a list. They are commonly chained like this:

    my @foo = map { transform($_) } grep { choose($_) } @list;

    Where you want to pick href and src attributes, you should craft choose() to return true for only the items you want to transform().
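As an illustration of the chain, here is a sketch with concrete bodies for choose() and transform(). The function names come from the reply; their bodies and the sample list of URL values are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper bodies; only the names appear in the reply above.
sub choose    { my ($url) = @_; return $url !~ m{^(?:\w+:|/)} }  # true for relative URLs only
sub transform { my ($url) = @_; return "../$url" }               # move the link down one directory

# Hypothetical list of attribute values already pulled out of a page.
my @list = ('home.htm', 'http://example.com/', 'image.gif', '/abs/path.gif');

my @foo = map { transform($_) } grep { choose($_) } @list;

print "$_\n" for @foo;
```

Note that the chain returns only the chosen items, so it suits a list of extracted URLs better than a list of whole lines that must all be kept.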

    That said, there are modules which will do this for you much more easily and accurately. HTML::Parser and HTML::LinkExtor are worth a look, depending on what you want to do with the result. The latter does exactly what you say you want, up to the actual URI remapping. There are others in the HTML and XML namespaces, as well.

    After Compline,
    Zaxo

Re: Altering an array with grep & map
by holli (Abbot) on Jan 18, 2005 at 11:18 UTC
    open(FILE, "<file.htm") or die "open: $!";
    @file = map { s:(href|src)=":$1="../:g; $_ } <FILE>;
    close(FILE);
    Update:
    removed typos.

    holli, regexed monk
Re: Altering an array with grep & map
by TedPride (Priest) on Jan 18, 2005 at 14:29 UTC
    The problem with setting the base tag is that it has to be an absolute URL, and therefore isn't as forgiving of directory moves as relative URLs are. Depending on what he wants, it may or may not work.

    EDIT: The following will probably do what you want. Note that slightly more advanced processing is required to make sure that links that aren't supposed to be changed, such as absolute URLs or mailto links, are ignored.

    use strict;
    use warnings;

    my $path  = 'file.htm';
    my @check = ('href', 'src');
    my ($handle, $text);

    # Slurp the whole file into one string.
    open($handle, '<', $path) or die "open $path: $!";
    $text = join('', <$handle>);
    close($handle);

    my $check = '(?:' . join('|', @check) . ')';
    $text =~ s/($check=")([^"]+)/$1.relative($2)/ieg;

    # Write the modified text back over the original file.
    open($handle, '>', $path) or die "open $path: $!";
    print $handle $text;
    close($handle);

    sub relative {
        return $_[0] if length($_[0]) < 2;              # Ignore empty or / URLs
        return $_[0] if index($_[0], ':') != -1;        # Ignore mailto: and other URLs with a scheme
        return $_[0] if substr($_[0], 0, 1) eq '/';     # Ignore absolute URLs
        if (substr($_[0], 0, 1) eq '.') {
            return $_[0] if substr($_[0], 1, 1) eq '.'; # Ignore URLs already ../
            return '.' . $_[0];                         # URLs ./ change to ../
        }
        return '../' . $_[0];                           # Rest changes to ../
    }
Re: Altering an array with grep & map
by ysth (Canon) on Jan 18, 2005 at 12:56 UTC
    You can avoid changing every filename by inserting or changing the base tag.
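A rough sketch of the base-tag approach, done with a plain substitution rather than an HTML parser. The base URL and the sample page are hypothetical; the real value would be the absolute URL of the directory the page originally lived in.

```perl
use strict;
use warnings;

# Hypothetical base URL and page content, for illustration only.
my $base = 'http://www.example.com/';
my $html = qq{<html><head><title>t</title></head><body><a href="home.htm">x</a></body></html>\n};

# Replace an existing <base ...> tag, or add one right after <head> if there is none.
unless ($html =~ s{<base\b[^>]*>}{<base href="$base">}i) {
    $html =~ s{(<head\b[^>]*>)}{$1<base href="$base">}i;
}

print $html;
```

With the base tag in place, the relative href and src values in the body can stay exactly as they were.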