in reply to Need help with subdividing SGML files

UPDATE:

I've implemented most of the suggestions I received and the little program now looks like this:

#!/usr/bin/perl -w

#Purpose: To Take a DOS file wildcard and thus take all the matching custom SGML files in the working directory and subdivide them into new files whose names are the id's of the divs in those original files.

use strict;

print "What file(s) do you want to run this program on?\n";

our $lines = "";
our @InFileNames;
our @OutFileNames;
our @OutFileContent;
my $i = 0;
my $j = 0;
my $TheFile = <STDIN>;
chomp ($TheFile);

#open file and get all text
sub OpenFile {
    open(FILE, $_[0]) or $lines = "";
    local $/ = undef;
    $lines = <FILE>;
    #remove blank lines
    $lines =~ s/\n{2}/\n/gms;
    close(FILE);
}

#add ¥ to closing div tags
sub MarkClose {
    $lines =~ s/(<\/div>)/$1¥/gms;
}

#open output.txt for appending and write results to it
sub FileAppend {
    my $Outfile = ">>" . $_[0] . ".bsd";
    my $Content = $_[1];
    open(FILE, $Outfile) or die "Can't open $Outfile.\n";
    print FILE $Content;
    print FILE "\n";
    close FILE;
}

#Create an array containing all file in the directory matching the glob.
sub GetInFileList {
    my $FileDef = $_[0];
    @InFileNames = glob($FileDef);
}

#Populate an array with the contents of the id attribute of every <div> tag in the input file.
sub GetOutFilesList {
    my $OutFile;
    @OutFileNames = $lines =~ m/<div[^>]*>/gms;
    foreach $OutFile (@OutFileNames){
        $OutFile =~ s/<div type=[^\s]* id="([^>]*)">/$1/gms;
        $OutFile =~ s/\./_/gms;
    }
}

#Subdivides the File into the subfiles.
sub GetOutFileContent {
    my $LinesString = $_[0];
    @OutFileContent = split /¥/, $LinesString;
}

### Does the job
&GetInFileList($TheFile);
for ($i = 0, $i < @InFileNames, $i++) {
    &OpenFile($InFileNames[$i]);
    &MarkClose();
    &GetOutFilesList;
    &GetOutFileContent($lines);
    for ($j = 0, $j < @OutFileNames, $j++){
        &FileAppend($OutFileNames[$j], $OutFileContent[$j]);
    }
}

#be nice and say it's done
print "Program Finished\n";

The script now dies at a very specific point, on line 13 or possibly line 18, where it gives the following error message:

Use of uninitialized value in open at ##program name censored## line 18, <STDIN> line 1.

I take this to mean that the array entitled @InFileNames has no contents, because the glob function used to fill it on line 45 didn't behave as I thought it would.
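One way to test that guess is to print what glob actually returns before any file is opened. The following is only a minimal sketch: the scratch directory, the file names and the *.xyz pattern are made up for the demonstration and are not part of my program, and it assumes the wildcard is meant to be matched against the working directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical scratch directory and file names, created only for this demo.
my $dir = tempdir(CLEANUP => 1);
for my $name ('one.sgm', 'two.sgm', 'notes.txt') {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!\n";
    close($fh);
}

# Glob relative to the working directory, as the script above does.
my $start = getcwd();
chdir($dir) or die "Can't chdir to $dir: $!\n";
my @InFileNames = glob('*.sgm');   # wildcard with matches
my @NoMatch     = glob('*.xyz');   # wildcard with no matches
chdir($start) or die "Can't chdir back to $start: $!\n";

# Report what glob returned before any file is opened.
print "Matched ", scalar(@InFileNames), " file(s): @InFileNames\n";
print "No files matched '*.xyz'\n" unless @NoMatch;
```

If the first line reports the files you expect, the array is being filled and the problem lies somewhere after the glob call.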

Also, the debug mode behaved oddly when it reached the <STDIN> line: it prompted me for input with the line DB(1), then prompted me again with DB(2) once I had given it its input, and so on ad infinitum.

Re: Re: Need help with subdividing SGML files
by BrowserUk (Patriarch) on Mar 12, 2003 at 21:20 UTC

    Why do you bother wasting your time using strict if all you are going to do is name every undeclared variable at the top of your program? It's pointless.

    There are still two lines (at least) in your updated program that contain simple syntax errors that will prevent your program from doing anything like what you want it to do.

    Look up the syntax of perl's for statements.


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.

      1) Because I'm coming from a Visual Basic background where the strict definition of Variables at the top of the procedure/package is considered good practice.

      2) The debug program seemed to be prompting me to explicitly define all of the variables, so that's what I did.

      I take it that the syntax errors are so simple and that I've offended your sensibilities so severely that you can't be bothered to point them out.

      You seem to be implying that I'm using for and foreach incorrectly. I can't see how that would be. Perhaps you think the answer is so embarrassing that I would rather figure it out for myself than live in the shame of having it explained to me. Also, you've twice declaimed that you don't know what I want, so it seems odd that you're so sure now.

      To put it bluntly, I find your tone to be insulting and if you don't want to give constructive criticism, I can do without your help.

        I put about as much effort into my answer as you have into your question, and your answers to requests for further information.

        As you seem incapable of reading documentation, I'll hold your hand a little further. This loop will execute exactly three times regardless of the values of any of the variables involved.

        for ($i = 0, $i < @InFileNames, $i++) {

        Because the elements are separated by commas, perl treats this as a foreach over a three-element list. The whole list is evaluated before the first iteration, so $i is already 1 by the time the body runs, and it stays 1 for all three iterations. During the first iteration, $_ is an alias for $i and so holds 1. During the second it holds the result of the comparison: 1, or the empty string if @InFileNames is empty. During the third it holds 0, the value $i++ returned before incrementing.

        Similarly, this loop will also execute exactly three times regardless of the values of the variables involved:

        for ($j = 0, $j < @OutFileNames, $j++){

        Again, $_ will take the values 1, 1 (or the empty string), and 0 during the three iterations, and $j will be 1 throughout.
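A small sketch of the difference, for anyone who wants to see it run. The five file names are hypothetical stand-ins for whatever glob would return, and the counters are there only to show how many times each loop body executes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the list that glob would return.
my @InFileNames = ('a.sgm', 'b.sgm', 'c.sgm', 'd.sgm', 'e.sgm');

# Commas: perl parses this as a foreach over a three-element list,
# so the body runs three times whatever the array holds.
my $i;
my $comma_passes = 0;
for ($i = 0, $i < @InFileNames, $i++) {
    $comma_passes++;
}

# Semicolons: the C-style loop, one pass per array element.
my @visited;
for (my $k = 0; $k < @InFileNames; $k++) {
    push @visited, $InFileNames[$k];
}

# Alternative: iterate over the elements directly, with no index at all.
my @visited_foreach;
foreach my $name (@InFileNames) {
    push @visited_foreach, $name;
}

print "comma form:     $comma_passes passes\n";
print "semicolon form: ", scalar(@visited), " passes\n";
print "foreach form:   ", scalar(@visited_foreach), " passes\n";
```

With five elements in the array, the comma form makes three passes and the other two make five each. The foreach form is usually the easier one to get right when the index itself is not needed.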

        It is pretty certain that this is not your intention. All the information regarding what is wrong, and how to put it right is available in

        • The documentation that came with your copy of perl. As you are on a Windows platform, it is available in HTML format. To find it you only need to look in the X:\yourpathto\perl\html directory and open the file index.html.
        • You could also use the command perldoc from a commandline.
        • look in the documentation available on this site.
        • or at perldoc.com

        If you had taken the constructively intended advice I offered, and looked up the syntax of for loops, I wouldn't have to be pointing these mistakes out to you.

        As for my tone........on my screen, it is black on white, any other tone you infer from the way the text is displayed on your screen is beyond my control.

