first you need to clearly define the problem:

for all files in a given directory,
- make a backup copy
- change permissions
- extract patterns matching /IMG SRC=.../ and /Figure.../
print out each image URL found

this pretty much outlines the structure of the program, so writing it is a fairly simple process of following that outline.

personally, i would make the backup copies of the files and change the permissions as a separate task. on general principle, a tool like this should do only one thing so that it can be re-used easily, and keeping it read-only means you can run it while testing it WITHOUT making any changes to the files/directories on disk. you can still do it within the perl script if you want.
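
as a rough sketch of that separate task (an assumption about how you might split it out, not part of the original script), something like the following would do only the backup and chmod. the name backup_files is made up for illustration, and it only runs when you give it a directory on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);

# hypothetical helper: back up every non-hidden plain file in $dir
# to "$file.bak", then make the original owner read/write only.
# files already ending in .bak are skipped.
# returns the sorted names of the files it backed up.
sub backup_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @files = sort(grep { /^[^.]/ && !/\.bak$/ && -f "$dir/$_" } readdir($dh));
    closedir($dh);

    foreach my $file (@files) {
        copy("$dir/$file", "$dir/$file.bak")
            or die "couldn't copy $dir/$file: $!";
        chmod(0600, "$dir/$file")
            or die "couldn't chmod $dir/$file: $!";
    }
    return @files;
}

# only touch the disk when a directory is named explicitly,
# e.g.:  perl backup.pl /path/to/copy/of/data
if (@ARGV) {
    print "backed up $_\n" for backup_files($ARGV[0]);
}
```

with this split out, the scanning script never has to write anything, so it is safe to run over and over while you debug it.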

once you have the IMG SRC lines in a hash (keyed by the Figure line), you can do whatever you want with them, including pulling out the URL and printing it as an <A HREF="..."> HTML link.

#!/usr/bin/perl -w

use strict;
use File::Copy;

# pass directory to scan as arg1 (default to current dir)
my $dir = shift || "." ;

# get list of non-hidden files in directory
opendir(my $dh, $dir) || die "can't opendir $dir: $!";
my @files = grep { /^[^.]/ && -f "$dir/$_" } readdir($dh);
closedir($dh);

my %images = ();

# process each file
foreach my $file (@files) {
    # skip perl program files and backups left by earlier runs
    next if ($file =~ /\.(?:pl|bak)$/) ;

    # readdir returns bare names, so prefix the directory
    my $path = "$dir/$file";
    copy($path, "$path.bak") || die "couldn't back up $path: $!\n";
    chmod(0600, $path) || warn "couldn't chmod $path: $!\n";

    my $img = '';
    my $fig = '';

    open(my $fh, '<', $path) || die "couldn't open $path for read: $!\n";
    while (<$fh>) {
        chomp ;
        s/^\s+|\s+$//g;  # strip leading and trailing whitespace
        if (/<IMG\s+SRC/i) {
            $img = $_ ;
        } elsif (/Figure\s+\d+/) {
            $fig = $_ ;
        }
    }
    close($fh);

    # if we found an IMG SRC line *AND* a Figure line, then
    # add it to the images hash. only the last of each per
    # file is kept.
    if ($img && $fig) { $images{$fig} = $img }
}

foreach (sort keys %images) {
    print "$_ : $images{$_}\n" ;
}
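
the script above stores the whole matching line, not just the URL. a minimal sketch of pulling the URL out and printing it as a link might look like this. img_url and img_link are hypothetical helper names, the sample %images data is made up, and the regex is only a rough match for simple IMG tags (it is no substitute for a real HTML parser, and nothing here escapes special characters in the caption).

```perl
use strict;
use warnings;

# hypothetical helper: pull the URL out of an <IMG SRC=...> tag.
# handles double-quoted, single-quoted, and unquoted values;
# returns undef if the line has no IMG SRC attribute.
sub img_url {
    my ($line) = @_;
    if ($line =~ /<IMG\b[^>]*?\bSRC\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i) {
        return $1 // $2 // $3;
    }
    return undef;
}

# hypothetical helper: format a caption and URL as an HTML link
sub img_link {
    my ($fig, $url) = @_;
    return qq(<A HREF="$url">$fig</A>);
}

# made-up sample data standing in for %images from the script above
my %images = (
    'Figure 1' => '<IMG SRC="pics/fig1.gif" ALT="first">',
    'Figure 2' => '<img src=pics/fig2.gif>',
);

foreach my $fig (sort keys %images) {
    my $url = img_url($images{$fig});
    next unless defined $url;
    print img_link($fig, $url), "\n";
}
```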

note: that point about read-only testing is an important one. it's one of the many reasons why it can be a good idea to write tools like this as a filter (i.e. input on stdin, output on stdout). if the program doesn't actually change the input files in any way then development can be an iterative process of hack and fix. also, without hard-coded directory/file names, you can run your program on a backup copy of the data while developing it. keeping your original data safe allows you to take risks with the backup that you can't afford to take with the original - if you mess it up, just take another copy and try again.
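
to make the filter idea concrete, here is a minimal sketch under the same matching rules as the script above. scan_handle is a hypothetical name, and the demo reads from an in-memory string so it can be run anywhere. as a real filter you would pass it \*STDIN instead and run something like "perl scan.pl < somefile.html".

```perl
use strict;
use warnings;

# hypothetical helper: scan any readable filehandle and return the
# last IMG SRC line and the last Figure line seen, the same way the
# script above does per file. it never writes to its input.
sub scan_handle {
    my ($fh) = @_;
    my ($img, $fig) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
        if ($line =~ /<IMG\s+SRC/i) {
            $img = $line;
        } elsif ($line =~ /Figure\s+\d+/) {
            $fig = $line;
        }
    }
    return ($img, $fig);
}

# demo on an in-memory handle (made-up sample input)
my $sample = qq(<p>text</p>\n  <IMG SRC="pics/fig1.gif">\n  Figure 1: a graph\n);
open(my $in, '<', \$sample) or die "couldn't open in-memory handle: $!";
my ($img, $fig) = scan_handle($in);
close($in);
print "$fig : $img\n" if $img && $fig;
```

because the sub takes a filehandle rather than a file name, the same code works on stdin, on a backup copy, or on test strings, and nothing on disk is changed.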

In reply to Re: Regular Expression Pattern Search Problem by cas2006
in thread Regular Expression Pattern Search Problem by xdbd063