http://qs1969.pair.com?node_id=225430

young_david has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I'm looking for advice on a task I'm finding difficult. I'm working with two files: (1) a tagged ASCII file and (2) a style sheet. Both are generated by a proprietary typesetting system.

I want to loop through the tagged ASCII file and, whenever I match a tag, pass it to a subroutine and then replace it with the attributes for that particular tag from the style sheet. How can this be accomplished? Below is a small sample of the script I'm working with.

Thanks,
young_david

use strict;

sub tagheader {
    my $header;
    open(STYLESHEET,'stylesheet.txt');
    my $tag = shift;
    ###########################################################
    # "HEADER" appears before "nm" in the style sheet
    # Find each "HEADER" in the file and assign $1 to $header
    # If we find the $tag we're looking for return $header
    #
    while (<STYLESHEET>) {
        if (/^HEADER (\d+):/) {
            $header = $1;
        }
        if (/^nm : ($tag)/) {
            return $header;
        }
    }
    close(STYLESHEET);
}

my $body;
my $tag;
open(FILE,'taggedfile.txt');
open(OUT,">$ARGV[0]");
while(<FILE>) {
    if (/^\{(\w+)\}/) {
        $tag = $1;
        $body = tagheader($tag);
        # Right here, $_ contains "nm : $tag" from the tagheader()
        # I see that $_ has been modified by tagheader()
        # This is my problem and I'm stumped :(
        s/\{($tag)\}/\{$1:$body\}/;
        # I want to search and replace the ($tag) with $1:$body
        # $body will contain a number that is assigned to the $tag
        # in the style sheet.
    }
    print OUT $_;
}
close(FILE);
close(OUT);

__DATA__
FILE {
type : if
}
FILEHDR {
File_Comment : ""
}
HEADER 1: {
_std_comment : ""
nm : TAG
post : ""
ppriority : 1st
pmaxwhite : 0
}
BODY 1:1 {
endval : page
priority : 1st
ffamily : 2
fvariant : 3
fheight : 10q
fwidth : same
maxwhite : 0
prelead : no
ldasc : normal
lindent : 0
pgindent : 0
lnmeas : full
ldextra : 2q
lddesc : normal
qdtbl : l
hjtbl : 4
pgsplit : no
lpabove : 0
lpbelow : 0
laabove : 0
labelow : 0
language : 1
xpath : ""
capmode : no
kernnum : 1
spotcolor : normal
psrnum : 1
ipcrnum : active
pair_kern : yes
endstring : ""
pre : ""
repeat : repeat
pgrful : none
httbl : none
pattern : none
comment1 : ""
}


Re: Stumped with $_ being modified
by jdporter (Paladin) on Jan 09, 2003 at 05:25 UTC
    # Right here, $_ contains "nm : $tag" from the tagheader()
    # I see that $_ has been modified by tagheader()
    # This is my problem and I'm stumped :(
    O.k., that's simple enough to fix:
    Add a local $_; near the top of the body of tagheader.
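    A minimal sketch of that fix, assuming Perl 5.8 or later for the in-memory filehandle (open on a reference to a string) that stands in for stylesheet.txt. The sub name lookup_header and the sample text are made up for illustration; the point is the local $_; line, which gives the sub its own $_ so the caller's line survives the while (<$fh>) loop.

```perl
use strict;
use warnings;

# Sample style-sheet text standing in for stylesheet.txt (hypothetical data)
my $stylesheet_text = "HEADER 1: {\nnm : TAG\n}\n";

# Hypothetical name; plays the role of tagheader() in the question
sub lookup_header {
    my ($tag) = @_;
    local $_;    # give this sub its own $_; the caller's is restored on return
    my $header;
    open my $fh, '<', \$stylesheet_text or die "Cannot open stylesheet text: $!";
    while (<$fh>) {    # assigns each line to the localized $_
        $header = $1 if /^HEADER (\d+):/;
        return $header if /^nm : \Q$tag\E\b/;
    }
    return;            # tag not found
}

$_ = "{TAG}\n";                      # the line from the tagged file
my $body = lookup_header('TAG');
s/^\{(\w+)\}/{$1:$body}/;            # $_ still holds "{TAG}\n", so this matches
print;
```

    Without the local $_; line, the early return would leave $_ holding "nm : TAG\n" and the substitution would silently do nothing, which is exactly the symptom in the question.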

    One thing I would do differently is load the contents of the stylesheet file into an in-memory structure, rather than re-read the file every time you need to look up a tag.
    Perhaps something like this:
    {
        my %tags;

        # here's where we load the %tags:
        open STYLESHEET, '< stylesheet.txt'
            or die "Error opening stylesheet.txt: $!";
        ###########################################################
        # "HEADER" appears before "nm" in the style sheet
        # Remember the number of each "HEADER" in $header, and
        # record it in %tags for every "nm" tag that follows it
        #
        local $_;
        my $header;
        while (<STYLESHEET>) {
            if ( /^HEADER (\d+):/ ) {
                $header = $1;
            }
            elsif ( /^nm : (\w+)/ ) {
                $tags{$1} = $header;
            }
        }
        close STYLESHEET;

        # and here's the function we call to look up a tag:
        sub tagheader {
            my $tag = shift;
            $tags{$tag}
        }
    }
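    A self-contained variation of the same load-once idea, for trying it out without the real files. The sample style-sheet text is hypothetical and is read through an in-memory filehandle (Perl 5.8+). The bare block and the tagheader() closure follow the shape of the code above; the main loop uses s///e as one possible way to call the lookup from inside the substitution.

```perl
use strict;
use warnings;

# Sample style-sheet text standing in for stylesheet.txt (hypothetical data)
my $stylesheet_text = "HEADER 1: {\nnm : TAG\n}\nHEADER 2: {\nnm : NOTE\n}\n";

{
    my %tags;    # visible only inside this block and to tagheader()

    # Load the table once, before any lookups happen
    local $_;
    my $header;
    open my $fh, '<', \$stylesheet_text or die "Cannot open stylesheet text: $!";
    while (<$fh>) {
        if    (/^HEADER (\d+):/) { $header = $1 }
        elsif (/^nm : (\w+)/)    { $tags{$1} = $header }
    }
    close $fh;

    # A named sub defined in the block keeps access to %tags afterwards
    sub tagheader { my $tag = shift; return $tags{$tag} }
}

# Lookups are now plain hash accesses: no file I/O, so $_ is left alone
my @lines = ("{TAG}\n", "{NOTE}\n", "plain text\n");
for (@lines) {
    s/^\{(\w+)\}/'{' . $1 . ':' . tagheader($1) . '}'/e;
}
print @lines;
```

    The style sheet is read exactly once, however many tags the tagged file contains.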
    PS: ++ to you for using strict.

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: Stumped with $_ being modified
by tadman (Prior) on Jan 09, 2003 at 11:19 UTC
    Although using local will solve your problem, I'd stay away from it. While it does "fix" that one problem, you still read the stylesheet file each time you encounter a tag to substitute. Since you open the stylesheet file each time, reading exactly the same data, it would make a lot more sense to read it once, like jdporter suggests.

    Here's an example of my take:
    use strict;
    use warnings;

    my $stylesheet_file = "stylesheet.txt";

    sub read_stylesheet {
        open(STYLESHEET, "<", $stylesheet_file)
            || die "Cannot read $stylesheet_file\n";

        my (%tag, $header);

        while (<STYLESHEET>) {
            if (/^HEADER\s+(\d+):/) {
                $header = $1;
            }
            elsif (/^nm\s+:\s+(\w+)/) {
                # Make a note of what header this tag
                # appeared in.
                $tag{$1} = $header;
            }
        }

        close(STYLESHEET);

        return \%tag; # Returns a reference to the hash
    }

    # ... (Main routine)

    my $input_file = "taggedfile.txt";
    my ($output_file) = @ARGV;

    my $tag = read_stylesheet();

    open(FILE, "<", $input_file) || die "Could not read $input_file\n";
    open(OUT, ">", $output_file) || die "Could not write $output_file\n";

    while (<FILE>) {
        # Perform substitutions
        s/^\{(\w+)\}/\{$1:$tag->{$1}\}/g;

        # Write line ($_) to OUT
        print OUT;
    }

    close(FILE);
    close(OUT);
    This variation has a routine which reads in the stylesheet and returns a hash reference to the data that was read. Theoretically, then, you can read in more than one stylesheet and choose which one you look up from, something that a global variable doesn't really permit.
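    To illustrate keeping more than one style sheet around, here is a rough sketch. read_stylesheet_text() is a hypothetical variant of read_stylesheet() that takes the style-sheet text as an argument rather than using a fixed file name, and the "print" and "online" sheets are invented sample data read through in-memory filehandles (Perl 5.8+). Each call builds a fresh hash, so the two lookups stay independent.

```perl
use strict;
use warnings;

# Hypothetical variation on read_stylesheet(): it takes the style-sheet text
# as an argument instead of reading a fixed file, and returns a hash ref.
sub read_stylesheet_text {
    my ($text) = @_;
    my (%tag, $header);    # a new hash on every call
    open my $fh, '<', \$text or die "Cannot open stylesheet text: $!";
    while (my $line = <$fh>) {
        if    ($line =~ /^HEADER\s+(\d+):/) { $header = $1 }
        elsif ($line =~ /^nm\s+:\s+(\w+)/)  { $tag{$1} = $header }
    }
    close $fh;
    return \%tag;
}

# Two style sheets loaded side by side (sample data)
my %sheet = (
    print  => read_stylesheet_text("HEADER 1: {\nnm : TAG\n}\n"),
    online => read_stylesheet_text("HEADER 7: {\nnm : TAG\n}\n"),
);

for my $which (sort keys %sheet) {
    my $headers = $sheet{$which};    # pick which style sheet to look up from
    (my $line = "{TAG}\n") =~ s/^\{(\w+)\}/{$1:$headers->{$1}}/;
    print "$which: $line";
}
```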

    You shouldn't have to localise $_ if you're careful about what's going on. Having nested file reads is one way to cause trouble, which is what you had there.

    If you're not sure where $_ has been, the safe thing to do is to use a named variable, such as this:
    while (my $line = <FILE>) {
        # ... Use $line where you would normally use $_
    }
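    A small sketch of why the named variable helps, using in-memory sample data and a deliberately careless, hypothetical careless_lookup() that still clobbers the global $_. Because the main loop keeps its line in my $line, the substitution still sees the tagged-file line.

```perl
use strict;
use warnings;

my $stylesheet_text = "HEADER 1: {\nnm : TAG\n}\n";   # sample data
my $tagged_text     = "{TAG}\nbody text\n";            # sample data

# Deliberately careless lookup: its while (<$fh>) overwrites the global $_
sub careless_lookup {
    my ($tag) = @_;
    my $header;
    open my $fh, '<', \$stylesheet_text or die "Cannot open stylesheet text: $!";
    while (<$fh>) {
        $header = $1 if /^HEADER (\d+):/;
        return $header if /^nm : \Q$tag\E\b/;
    }
    return;
}

my @out;
open my $in, '<', \$tagged_text or die "Cannot open tagged text: $!";
while (my $line = <$in>) {          # named lexical: the sub cannot touch it
    if ($line =~ /^\{(\w+)\}/) {
        my $tag  = $1;
        my $body = careless_lookup($tag);
        $line =~ s/^\{\Q$tag\E\}/{$tag:$body}/;
    }
    push @out, $line;
}
close $in;
print @out;
```

    The global $_ still ends up clobbered by the lookup, but nothing in the main loop depends on it any more.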
    I'd really suggest steering away from using local declared variables.
      While tadman gives a perfectly fine solution, I'd pick a nit with certain of his advice:
      Although using local will solve your problem, I'd stay away from it.

      I'd really suggest steering away from using local declared variables.
      That is really not good advice. local does not "declare" a variable, it localizes its value. This is a critically important capability -- as young_david's problem clearly demonstrates.
      In many cases we don't have to worry about variables being localized, because the ops we use do it implicitly. (I'm thinking specifically of foreach.) But others, such as while, do not, and so we have to localize the variable explicitly -- if we want to be safe.
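      A quick way to see the difference for yourself, as a sketch with made-up sample data (Perl 5.8+ for the in-memory filehandle): foreach puts the old $_ back when it finishes, while (<$fh>) leaves it changed, and an explicit local $_ around the while loop restores it again.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\n";    # sample data

# foreach aliases $_ to each element and puts the old value back afterwards
$_ = "outer";
for (qw(a b c)) { }
my $after_foreach = $_;

# while (<$fh>) assigns to the global $_ and leaves it changed
$_ = "outer";
open my $fh, '<', \$text or die "Cannot open text: $!";
while (<$fh>) { }
close $fh;
my $after_while = $_;             # the final read returned undef at end of file

# Localizing by hand gives the while loop the same protection foreach has
$_ = "outer";
{
    local $_;
    open my $fh2, '<', \$text or die "Cannot open text: $!";
    while (<$fh2>) { }
    close $fh2;
}
my $after_local = $_;

print "foreach: $after_foreach\n";
print "while:   ", (defined $after_while ? $after_while : 'undef'), "\n";
print "local:   $after_local\n";
```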

      So rather than say "Don't use local", I'd say "use strict." That is a practice which will encourage good programming habits.

      jdporter

      I too am stumped... by your reply. I don't know where to begin. How can you say that he shouldn't localize? What's wrong with localizing? I even find it hard to argue with you since it all boils down to "why are you so anti localization?" and all my arguments seem so obvious that they shouldn't have to be mentioned. I just don't see it.

      How far do you take this? Should you localize filehandles? Or should you use "good names" so you don't have to localize? (In Perl 5.6 you're saved by the open my $fh, $file syntax, but the issue is still very interesting.)
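      For comparison, a sketch of both options, using in-memory sample data and two hypothetical subs: local *FH saves and restores a bareword handle that the caller may still be using, while a lexical handle from open my $fh is private to the sub and needs no localizing.

```perl
use strict;
use warnings;

my $outer_text = "outer 1\nouter 2\n";   # sample data
my $inner_text = "inner 1\n";            # sample data

# Bareword handles are package globals; local *FH saves and restores the glob
sub read_with_bareword {
    local *FH;                            # caller's FH is put back on return
    open FH, '<', \$inner_text or die "Cannot open inner text: $!";
    my @lines = <FH>;
    close FH;
    return scalar @lines;
}

# A lexical handle is private to the sub, so nothing needs localizing
sub read_with_lexical {
    open my $fh, '<', \$inner_text or die "Cannot open inner text: $!";
    my @lines = <$fh>;
    close $fh;
    return scalar @lines;
}

open FH, '<', \$outer_text or die "Cannot open outer text: $!";
my $first = <FH>;
read_with_bareword();
read_with_lexical();
my $second = <FH>;    # still reads from the outer text
close FH;
print $first, $second;
```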

      If you steer away from using "local declared variables", do you also always use your own named variable in foreach loops? foreach my $foo (LIST) { ... }

      I just wrote a note about localization and while loops; the relevant thread is here. As you see, I have the very opposite opinion of you. So I'm very curious about what I've overlooked.

      Curiously,
      ihb
        There's a difference between lexically scoping with my and localising with local, although the terminology seems rather contentious, much like the difference between arrays and lists.

        There are only a few things you still need to localise with local, such as the $" variable. Ever since Perl 5.0 came out, a lot of effort has gone into moving people away from local and towards my; lexically scoped filehandles are one example. Where possible, it's good to use filehandles like that, since they can easily be passed from subroutine to subroutine or stored in an object's hash. In this example, though, I'm using old-school handles because it's not really an issue.
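        A minimal sketch of the $" case, with a hypothetical comma_list() helper: $" is the separator Perl puts between elements when an array is interpolated into a double-quoted string, and local confines the change to the sub.

```perl
use strict;
use warnings;

my @tags = qw(TAG NOTE HEAD);

sub comma_list {
    local $" = ', ';     # list separator for "@array" interpolation, restored on return
    return "@_";
}

my $joined  = comma_list(@tags);   # uses ', ' inside the sub
my $default = "@tags";             # back to the default single space out here
print "$joined\n$default\n";
```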

        As for loops, I won't hesitate to use $_ as long as things don't get too complicated, that is, where it's obvious what's being iterated. For example:
        $_->iron() foreach (@shirts);
        You could say my $shirt, but it would be redundant. On the other hand, where there's no hint as to what you're using, a simple my declaration acts as documentation.
        foreach my $shirt ($laundry->contents()) ...
        You have to be careful with $_, just like with $1 and its relatives. Where there's risk of contamination, I use alternate names or copies. If there's no risk, then it's a matter of preference.
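        A short sketch of the "copies" habit with made-up sample strings: $1 is overwritten by the next successful match and left untouched by a failed one, so it pays to copy it straight away, and the same goes for $_ before an inner loop re-aliases it.

```perl
use strict;
use warnings;

my $line = "{TAG} see {NOTE}";
my ($tag, $after_second, $after_failed);
if ($line =~ /\{(\w+)\}/) {
    $tag = $1;                              # copy $1 straight away
    $line =~ /see \{(\w+)\}/;               # a later successful match overwrites $1
    $after_second = $1;
    $line =~ /no such text (\w+)/;          # a failed match leaves $1 unchanged
    $after_failed = $1;
}

# The same habit for $_: copy it before an inner loop re-aliases it
my @pairs;
for (qw(a b)) {
    my $outer = $_;
    push @pairs, "$outer$_" for 1, 2;
}
print "$tag $after_second $after_failed @pairs\n";
```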

        I think my point is really this: don't use local $_; instead, use a named lexical variable such as my $line.

Re: Stumped with $_ being modified
by Hofmator (Curate) on Jan 09, 2003 at 08:28 UTC
    Actually, very recently ihb and I were discussing exactly your problem (missing local $_;) starting approximately here.

    -- Hofmator