xdbd063 has asked for the wisdom of the Perl Monks concerning the following question:

I have hundreds of files in subdirectories. I need to go into each file in turn, create a backup copy of each original file, change permissions on the original file, then pattern search each file for two specific lines. The first line always begins with <IMG SRC and ends with BR>; the second line always begins with Figure followed by a space and one or more digits. There could be any kind of character in the lines between these two lines: spaces, ", (), /\<>., etc. I want to throw out everything between the two lines I need. After I find the two lines I need, I want to put them in a hash, with the Figure line as the key and the <IMG SRC line as the value. Then I want to search the entire file for any other occurrences of the matching Figure line that exist without matching <IMG SRC lines. At that point I will hopefully create a link from the hash value to that found Figure line. After one file is completed, it should proceed to the next file and start again, running through the 600 or so files. When complete, each "Figure digit" will be linked to the correct <IMG SRC line.

I have the first part working. I am drilling down to the correct directories, making the backup copies and changing the permissions. I get no errors when I run this; however, it doesn't find anything. At least, nothing prints. I'm a beginner at this, and although I have been through the mountain of books I bought to help me, I'm still stuck. Any and all help would be more than appreciated. Thank you.


This is an excerpt from one file, showing the lines I need to get with the pattern search:

<IMG SRC="/CSS/tpubs_graphics/L-/5/7/L-57174.00000001.gif">
1. Check Valve
(whitespace) 2. Check Valve (Altair)
(whitespace) #82-22-02
(whitespace) Fuel Control Water Signal Check Valve REMOVAL-01
(whitespace) Figure 301 Page 302


# ARGUMENTS: engine_figurelinks.pl xx_manual_vvv
#            where xx = manual code, vvv = version
#
# MODIFICATIONS:
#
#----------------------------------------------------------------------
use warnings;
use diagnostics;
use Env qw(SERVER_NAME);
use CGI qw(:standard :netscape);
use File::Copy;    # Perl supplied module for making copies
new CGI;
#----------------------------------------------------------------------
($manualdir_param) = @ARGV;
$working_dir = $manualdir_param;
$working_dir =~ s/manualdir=//i;
$data_area = "/tmp";
$html_dir = "$data_area/$working_dir";
#----------------------------------------------------------------------
# Loop to locate HTML files, change permissions, and make working temporary copies
opendir( HTMLSTORIES, "$html_dir") || die "HTML dirs do not exist: $1";
@FigureArray = grep{/^(09)(\w{1,5})(00)$/} readdir ( HTMLSTORIES );
foreach $FigFile (@FigureArray) {
    opendir( HTMSTORY, "$html_dir/$FigFile" ) || die "Files do not exist: $1";
    @FileArray = grep{/a.htm$/} readdir ( HTMSTORY );
    foreach $DirFile (@FileArray) {
        copy ("$html_dir/$FigFile/$DirFile", "$html_dir/$FigFile/$DirFile.bak")
            or die "Can not make backup copy of file: $1";
        chmod 0600, "$html_dir/$FigFile/$DirFile";
        while (< "$html_dir/$FigFile/$DirFile" >) {
            $Figures = "$html_dir/$FigFile/$DirFile" =~ /(<IMG.*?BR>)...(Figure\d*)/i;
            print $Figures;
        }
    }
}
closedir HTMSTORY;
closedir HTMLSTORIES;

Edited by planetscape - removed unnecessary br tags

Replies are listed 'Best First'.
Re: Regular Expression Pattern Search Problem
by liverpole (Monsignor) on Mar 08, 2006 at 16:50 UTC
    I believe your problem is that you are not attempting to match the line from the file, but rather the filename itself.

    Try something like this instead, to read and parse lines from a file:

    use FileHandle;

    my $fh = new FileHandle;
    open($fh, "<", $file) or die "Unable to open file '$file' ($!)\n";
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /(<IMG.*?BR>)...(Figure\d*)/i) {
            # You got a match
        }
    }

    @ARGV=split//,"/:L"; map{print substr crypt($_,ord pop),2,3}qw"PerlyouC READPIPE provides"
Re: Regular Expression Pattern Search Problem
by ahmad (Hermit) on Mar 08, 2006 at 16:46 UTC

    Hello

    I think you forgot to open the file in order to search inside it.

    open the file

    open(FILE,"$html_dir/$FigFile/$DirFile") or die $!;
    while (<FILE>){
        ...
    }
    close(FILE)

    HTH

    bye

Re: Regular Expression Pattern Search Problem
by thundergnat (Deacon) on Mar 08, 2006 at 20:00 UTC

    You've got lots of problems, some critical, some minor.

    A quick rundown:

    • You've got use diagnostics; which helps you figure out how you shot yourself in the foot, but not use strict; which helps you not shoot yourself in the foot in the first place. Highly recommended for new (and not so new) users.
    • Use lexical rather than global file and directory handles. The scoping issues are much easier to track.
    • Use "or" rather than || when you aren't actually doing boolean logic; its precedence is much lower, so it avoids needing lots of parentheses.
    • The system/library error variable is $! not $1.
    • Don't use capturing parentheses in regular expressions if you aren't capturing anything.
    • If you need to reuse open directory/file handles, you need to close them first.
    • Factor out long complex expressions when you can, especially if you find yourself using them over and over.
    • Full stops (periods) are significant inside regular expressions. Escape them if you want the literal character.
    • You need to open a file before you can read from it. You can't just read from the file name.
    • You can't match a multiline expression with a regex if you are reading the file in line by line.
    • Don't be afraid to use whitespace in your code. It can really make it easier to follow the logic.
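    Several of the points above (lexical handles, "or" with $!, and the multiline limitation) can be sketched together. This is only an illustration, not the original poster's code: the helper names extract_figures and slurp_file are hypothetical, and it assumes every <IMG SRC ...> tag is followed by its own Figure line before the next image appears.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping "Figure N" to the
# <IMG SRC ...> tag that precedes it in the text.
sub extract_figures {
    my ($text) = @_;
    my %figure_for;
    # /s lets "." cross newlines, /g walks through every pair, /i ignores case.
    while ($text =~ /(<IMG\s+SRC[^>]*>).*?(Figure\s+\d+)/gis) {
        my ($img, $fig) = ($1, $2);
        # Keep the first image seen for each figure number.
        $figure_for{$fig} = $img unless exists $figure_for{$fig};
    }
    return \%figure_for;
}

# Hypothetical helper: read a whole file into one string so that a
# regex can match across line boundaries.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open '$path': $!";
    local $/;                 # undefined record separator: read everything
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Sample text modelled on the excerpt in the question.
my $sample = join "\n",
    '<IMG SRC="/CSS/tpubs_graphics/L-/5/7/L-57174.00000001.gif">',
    '1. Check Valve',
    '   2. Check Valve (Altair)',
    '   #82-22-02',
    '   Fuel Control Water Signal Check Valve REMOVAL-01',
    '   Figure 301 Page 302';

my $found = extract_figures($sample);
print "$_ => $found->{$_}\n" for sort keys %$found;
```

    For a real file, extract_figures(slurp_file($path)) would replace the sample string. If an image can appear without a Figure line after it, the non-greedy match would pair it with the next figure it finds, so the pattern would need tightening.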
Re: Regular Expression Pattern Search Problem
by cas2006 (Novice) on Mar 08, 2006 at 23:17 UTC
    first you need to clearly define the problem:
    • for all files in a given directory,
      • make a backup copy
      • change permissions
      • extract patterns matching /IMG SRC=.../ and /Figure.../
    • print out each image URL found

    this pretty much outlines the structure of the program....so writing it is a fairly simple process of following the above structure.

    personally, i would make the backup copy of the files and change the permissions as a separate task (just on the general principle that a tool like this should only do one thing so that it can be re-used easily - and also so that you can run it while testing it WITHOUT making any changes to the files/directories on disk), but you can do it within the perl script if you want.

    once you have the IMG SRC urls in a hash, you can do whatever you want with them, including printing them out as an <A HREF="..."> HTML link.

    #! /usr/bin/perl -w

    use strict;
    use File::Copy;

    # pass directory to scan as arg1 (default to current dir)
    my $dir = shift || "./" ;

    # get list of non-hidden files in directory
    opendir(DIR, $dir) || die "can't opendir $dir $!";
    my @files = grep { /^[^.]/ && -f "$dir/$_" } readdir(DIR);
    closedir DIR;

    my %images = ();

    # process each file
    foreach my $file (@files) {
        next if ($file =~ /\.pl/) ;  # skip perl program files

        # readdir returns bare names, so prefix the directory
        my $path = "$dir/$file";

        copy($path, "$path.bak");
        chmod 0600, $path;

        my $img = '';
        my $fig = '';

        open(FH,"<$path") || die "couldn't open $path for read: $!\n";
        while (<FH>) {
            chomp ;
            s/^\s*|\s*$//g;   # strip leading and trailing spaces

            if (/<IMG SRC/) {
                $img = $_ ;
            } elsif (/Figure\s+\d+/) {
                $fig = $_ ;
            } ;
        } ;
        close(FH);

        # if we found an IMG SRC line *AND* a Figure line, then
        # add it to the images hash.
        if ($img && $fig) { $images{$fig} = $img } ;
    };

    foreach (sort keys %images) {
        print "$_ : $images{$_}\n" ;
    };
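    As a rough illustration of the <A HREF="..."> idea mentioned above, a stored <IMG SRC ...> line could be turned into a link like this. The helper name figure_link is hypothetical, and the sketch assumes the SRC attribute is always double-quoted, as it is in the excerpt from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build an <A HREF> link from a Figure line and
# the <IMG SRC ...> line stored for it in the hash.
sub figure_link {
    my ($fig, $img) = @_;
    # Pull the URL out of SRC="..."; return nothing if there is no quoted SRC.
    my ($url) = $img =~ /SRC\s*=\s*"([^"]+)"/i or return;
    return qq{<A HREF="$url">$fig</A>};
}

# Sample hash in the shape the script above builds (Figure line => IMG line).
my %images = (
    'Figure 301 Page 302' =>
        '<IMG SRC="/CSS/tpubs_graphics/L-/5/7/L-57174.00000001.gif">',
);

foreach my $fig (sort keys %images) {
    my $link = figure_link($fig, $images{$fig});
    print "$link\n" if defined $link;
}
```

    Substituting these links for the bare Figure references elsewhere in each file is a separate step that this sketch does not attempt.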
    note: that point about read-only testing is an important one. it's one of the many reasons why it can be a good idea to write tools like this as a filter (i.e. input on stdin, output on stdout). if the program doesn't actually change the input files in any way then development can be an iterative process of hack and fix. also, without hard-coded directory/file names, you can run your program on a backup copy of the data while developing it. keeping your original data safe allows you to take risks with the backup that you can't afford to take with the original - if you mess it up, just take another copy and try again.
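    A minimal sketch of the filter style described in the note, under some assumptions: the sub name print_pairs is hypothetical, each image is taken to belong to the next line that starts with Figure, and the demo runs on in-memory handles so that nothing on disk is read or changed.

```perl
use strict;
use warnings;

# Hypothetical filter body: reads lines from $in and writes
# "Figure line : IMG line" pairs to $out. It never touches the input files.
sub print_pairs {
    my ($in, $out) = @_;
    my $img = '';
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;          # strip leading and trailing spaces
        if ($line =~ /<IMG SRC/i) {
            $img = $line;                 # remember the most recent image
        }
        elsif ($line =~ /^Figure\s+\d+/ && $img) {
            print {$out} "$line : $img\n";
            $img = '';                    # each image is paired only once
        }
    }
}

# Demo on in-memory handles. As a real filter the call would be
# print_pairs(\*STDIN, \*STDOUT);
my $input = qq{<IMG SRC="a.gif">\n1. Check Valve\n   Figure 301 Page 302\n};
open(my $in,  '<', \$input)     or die "Can't read sample: $!";
open(my $out, '>', \my $output) or die "Can't write sample: $!";
print_pairs($in, $out);
close($out);
print $output;
```

    With the STDIN/STDOUT call in place, the script could be run against a copy of one file at a time from the shell, which keeps the backup and chmod steps out of the program entirely.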

Re: Regular Expression Pattern Search Problem
by InfiniteSilence (Curate) on Mar 08, 2006 at 16:47 UTC
    Looking at a few of your previous posts I see that you have been working at this same code at least since 2/23/2006. Is this something you are doing for work or is this for fun? Are you more proficient in another language or is Perl your first programming language?

    Celebrate Intellectual Diversity

      This is for work. I'm a UNIX shell scripter(Sys Admin) who was more or less thrown into this position.

      I have done a smattering of other languages, and lots of classes, but nothing prepared me for actual production coding. My company is in a jam and I am trying to help out until they can hire a more proficient Perl Programmer.

      I have found that I enjoy Perl, and plan to continue to work with it full time. I am enrolled in several classes; unfortunately, they don't start until April.

      Edited by planetscape - removed br tags, replaced with <p></p> tags

        Okay, now I understand. The problem, I think, for your situation is this: if you do a great job with this script with mostly borrowed code you will likely be asked to do a bunch more (and starting in April means you won't be done with those classes until what, June?).

        You need help now, but it isn't really code snippets that you need. I think you need a few good books that will help you basically do what Sys Admins need to do often. Here is a quick reading list:

        There are others, but if I were you I would read Effective Perl Programming cover-to-cover (it is a thin book), scan through the Perl Cookbook for script problems you are having right now, and then read the System Administration book. I used to have this Perl Black Book on my desk for a while until I read the above books and realized that the Black Book (at least my edition) was full of bad programming examples.

        BTW, Perl programming can be addictive. Before I started learning Perl I was a professional VB programmer. Nowadays I struggle to remember how to do simple things in VB because I use it so rarely. Welcome to the world of Perl.

        Celebrate Intellectual Diversity

        This probably doesn't help now, but you could have gotten a part-time proficient perl programmer from http://jobs.perl.org/ to get you through the crisis.

Re: Regular Expression Pattern Search Problem
by planetscape (Chancellor) on Mar 09, 2006 at 10:51 UTC