
Well, on the whole, that's not so bad... But if you are creating 256 copies of that "if ... else ..." block, one for each possible number following "LABEL O", then you really have missed some important points about programming in general (understanding loops and variables) and about Perl in particular (using regular expressions).

For that matter, I think the regular expression you've shown is probably not what you really want -- try putting square brackets around the "\n." -- and don't forget to include $3 when you print stuff out.

You also want to meet a new friend: $/ also known as "$INPUT_RECORD_SEPARATOR" (look for a description of it here: perlvar -- it's about a quarter of the way down). Based on this new information you've shown, it looks like the input data is structured in blocks, where each block ends with "(STOP)\n". You can tell perl to use that string as the end-of-record marker, instead of the default "\n", and simplify your code immensely:

    open( IN, "<", "some_file.tex" ) or die $!;
    {
        local $/ = "(STOP)\n";
        my $expected_id = 1;
        while (<IN>) {    # read a whole block up to "(STOP)\n"
            # the parentheses capture the label line so that $1 can put it back
            if ( s/(\(LABEL O $expected_id\)\n)/$1 NEWSTUFF\n/ ) {
                print;    # all done with this block
            }
            else {
                print "(LABEL O $expected_id)\n NEWSTUFF\n(STOP)\n";    # add a new block
            }
            $expected_id++;    # no "my" here: reuse the counter declared above
        }
    }    # closing this block drops the local value of $/
    # now $/ is back to its default value (in case you
    # have to read other stuff in the normal fashion).
So, does it really need to be any more complicated than that?

Re^6: adding lines at specific addresses
by pindar (Initiate) on Oct 12, 2005 at 06:29 UTC
    "you really have missed some important points about programming in general (understanding loops and variables) and about perl in particular (using regular expressions)."

    I couldn't agree more, I'm just a humanities guy dabbling in perl scripting... Your code looks wonderful, and in fact, I had been thinking about something to that effect, but there are two problems that make such a loop impracticable:

    1. Some of the values are not given as LABEL O (number), but as LABEL C A or LABEL C a, so I'd need more than one loop.

    2. (probably trivial, but insurmountable to me) The numbers O XXX are octal numbers, and I couldn't figure out how to make perl increment the $i++ in octals.

    Again, thanks for your help!

      **sigh**

      It seems we learn something new about your data every day... as opposed to seeing it described concisely at the outset -- even a humanities guy should be able to manage that (I used to be one myself).

      So if your file has:

      LABEL O 1
      blah
      STOP
      LABEL C A
      blah
      STOP
      LABEL O 3
      blah
      STOP

      What are you supposed to do with that? Put "LABEL O 2" before "LABEL C A"? After it? Instead of it? Don't put it in at all? If you get "LABEL C A" and then "LABEL C C", are you supposed to fill in a "LABEL C B" as well? I suppose you probably have "LABEL X (hex number)" also, and you need to invert their order if the file contains the string "goober"...

      Whatever the next wrinkle may be, the answer is most likely "no, you don't need more than one loop". You just need to provide enough "if ... else ... else ..." conditions in the single "while" loop over data blocks in order to cover all the possible scenarios.

      (And of course, you need to be able to describe these extra conditions clearly and without ambiguity; if you can't state them coherently so a human can understand them, you won't be able to write code to do it, either. My best advice: document the algorithm first, then code it.)

      As for handling the octal stuff, try altering the top of the while loop like this:

      while (<IN>) {
          my $exp_idstr = sprintf( "%o", $expected_id );    # decimal counter, octal string
          if ( s/(\(LABEL O $exp_idstr\)\n)/$1 NEWSTUFF\n/ ) {
              ...
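
A minimal sketch of the octal bookkeeping, assuming the counter itself stays an ordinary number and only its printed form is octal. The helper names `to_octal_str` and `from_octal_str` are made up for illustration; `sprintf` with "%o" and `oct` are the built-ins doing the work.

```perl
use strict;
use warnings;

# Octal string for an ordinary number, e.g. 8 becomes "10".
sub to_octal_str {
    my ($n) = @_;
    return sprintf( "%o", $n );
}

# Ordinary number for an octal string, e.g. "377" becomes 255.
sub from_octal_str {
    my ($s) = @_;
    return oct($s);
}

# Count 0..255 the usual way; convert only when building the label text.
my @labels = map { "(LABEL O " . to_octal_str($_) . ")" } 0 .. 255;

print "$labels[7]\n";      # (LABEL O 7)
print "$labels[8]\n";      # (LABEL O 10)
print "$labels[255]\n";    # (LABEL O 377)
```

So there is no need to "increment in octal": `$expected_id++` works as usual, and the conversion happens at the moment the label is matched or printed.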

      Update: sorry about the flame... and I wanted to add that there could be situations where a second (and maybe even third) pass over the data would simplify the process a lot -- e.g. on one pass, you handle all insertions of missing data blocks; on another pass, you make sure the data blocks are properly sorted; then maybe yet another pass (now that all blocks are present and in order) to add specific new lines of data to specific blocks.

      In this case, I would actually recommend that each stage/pass be written as a separate script: keep each script as simple, clear and reliable as possible, in order to do just one thing and do it right. Then run the scripts in succession over the data. (That's what pipeline commands are for: cat input | pass1 | pass2 | pass3 > output.)

        "sorry about the flame"

        No offense taken. I'm really just a n00b in perl. On the other hand, I am somewhat more fluent in TeX & friends, so I was assuming that the concept of a TeX property list was immediately clear to everyone. That was wrong.

        A property list has three large blocks:

        1. The header giving the basic information. This must be kept as is, except for a line saying CHECKSUM XXX, which must be deleted.

        2. A large block beginning with

        (LIGTABLE

        and ending with

        )

        on a line of its own. This is where I need to insert some lines.

        3. A block beginning with CHARACTER, describing every one of the 256 characters. This must be kept as is.
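
The three-part layout above could be separated along these lines. This is only a sketch under stated assumptions: "(LIGTABLE" sits alone at the start of a line, the block ends at the first line holding nothing but ")" (possibly indented), and the CHECKSUM line is written as "(CHECKSUM ...)". The name `split_pl` and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Split a property list into header, LIGTABLE block and everything after it.
# Assumption: "(LIGTABLE" starts a line, and the block ends at the first
# line that contains only ")" (leading spaces allowed).
sub split_pl {
    my ($text) = @_;
    my ( $header, $ligtable, $rest ) =
      $text =~ /\A(.*?)^(\(LIGTABLE\n.*?^[ \t]*\)[ \t]*\n)(.*)\z/ms
      or die "no LIGTABLE block found\n";

    # The header is kept as is, minus the CHECKSUM line.
    $header =~ s/^[ \t]*\(CHECKSUM\b[^\n]*\n//m;

    return ( $header, $ligtable, $rest );
}

# Hypothetical sample input, much shorter than a real property list.
my $sample = join '',
  "(FAMILY FOO)\n",
  "(CHECKSUM O 123)\n",
  "(LIGTABLE\n",
  "   (LABEL O 1)\n",
  "   (KRN C A R -0.1)\n",
  "   (STOP)\n",
  "   )\n",
  "(CHARACTER O 1\n",
  "   (CHARWD R 0.5)\n",
  "   )\n";

my ( $head, $lig, $tail ) = split_pl($sample);
print $head, $lig, $tail;    # same file, without the CHECKSUM line
```

Only the middle piece then needs any further work; the other two can be printed back unchanged.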

        So block 2 is the interesting part. The data in this block consists of subsections that are always introduced by a line (LABEL XXX). Unfortunately, the XXX has three different forms:

        LABEL O with an octal number;

        LABEL C with letters A-Z and a-z (describing the letters of the alphabet) for values between octal 101 and 132 and between octal 141 and 172.

        LABEL C with numbers 0-9 (describing the numerals), for values between octal 60 and octal 71.
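
The three label forms can be produced from a single numeric character code, so one counter still covers every case. A sketch, assuming the C form is used exactly for A-Z, a-z and 0-9 (ASCII digits being octal 60 to 71) and the O form for everything else; `label_for_code` is a hypothetical name.

```perl
use strict;
use warnings;

# Label text for a character code in the range 0..255.
# Assumption: letters and digits use the "C" form, all other codes
# use the "O" form with an octal number.
sub label_for_code {
    my ($code) = @_;
    my $char = chr($code);
    return "(LABEL C $char)" if $char =~ /\A[A-Za-z0-9]\z/;
    return sprintf( "(LABEL O %o)", $code );
}

print label_for_code(0101), "\n";    # (LABEL C A)
print label_for_code(060),  "\n";    # (LABEL C 0)
print label_for_code(057),  "\n";    # (LABEL O 57)
```

The literals with a leading zero are Perl's own octal notation, so `0101` is the number 65.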

        Any given property list can have any number from 0 to 256 LABEL entries in the LIGTABLE. I.e., some fonts will have huge instruction sets for kerning pairs, adding some rules to almost every LABEL, and some fonts have nothing of that sort, so the LIGTABLE may be completely empty.

        I need to insert about 250 additional rules into 53 LABELS of this LIGTABLE. If I try to describe what I need to do, there are three different options:

        A. The label already exists; a new instruction needs to be inserted. In that case, everything that was in the original entry must be kept as is, preceded by the new instructions.

        B. The entry exists, no new instructions. Everything must be kept as is.

        C. The entry does not exist. In that case, create a new entry with the appropriate (LABEL) header and insert at the appropriate position.

        If the LABEL does not exist and no new instructions need to be added, of course, nothing needs to be done.
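
Cases A, B and C fit into one loop over character codes. What follows is a rough sketch rather than a finished tool, and it rests on several assumptions: every entry has exactly one LABEL line and ends with "(STOP)", entries are ordered by ascending character code, lines inside the LIGTABLE are indented by three spaces, and the new instructions arrive as a hash from character code to ready-made lines. All sub names and the sample data are invented.

```perl
use strict;
use warnings;

my $INDENT = "   ";    # assumed indentation inside the LIGTABLE

# Label text for a code: "C" form for letters and digits, "O" form otherwise.
sub label_for_code {
    my ($code) = @_;
    my $char = chr($code);
    return "(LABEL C $char)" if $char =~ /\A[A-Za-z0-9]\z/;
    return sprintf( "(LABEL O %o)", $code );
}

# Character code for the two parts of a label, e.g. ("O", "101") or ("C", "A").
sub code_for_label {
    my ( $form, $value ) = @_;
    return $form eq 'O' ? oct($value) : ord($value);
}

# $body: the entries between "(LIGTABLE" and its closing ")".
# $new:  hash ref, character code => new instruction lines for that label.
sub merge_ligtable {
    my ( $body, $new ) = @_;

    my %entry;    # character code => [ label line, rest of the entry ]
    while ( $body =~ /^([ \t]*\(LABEL ([OC]) (\S+)\)\n)(.*?^[ \t]*\(STOP\)\n)/msg ) {
        my ( $label_line, $form, $value, $rest ) = ( $1, $2, $3, $4 );
        $entry{ code_for_label( $form, $value ) } = [ $label_line, $rest ];
    }

    my %codes = map { $_ => 1 } keys %entry, keys %$new;
    my $out = '';
    for my $code ( sort { $a <=> $b } keys %codes ) {
        my $extra = exists $new->{$code} ? $new->{$code} : '';
        if ( my $e = $entry{$code} ) {
            # Cases A and B: keep the entry, new lines (if any) go first.
            $out .= $e->[0] . $extra . $e->[1];
        }
        else {
            # Case C: build a new entry in its place by character code.
            $out .= $INDENT . label_for_code($code) . "\n"
                  . $extra . $INDENT . "(STOP)\n";
        }
    }
    return $out;
}

# Hypothetical data: one rule added to an existing label, one to a missing label.
my $old_body = "   (LABEL O 1)\n   (KRN C A R -0.1)\n   (STOP)\n"
             . "   (LABEL C B)\n   (KRN C V R -0.05)\n   (STOP)\n";
my %new_rules = (
    1        => "   (KRN C T R -0.2)\n",
    ord('A') => "   (KRN C V R -0.3)\n",
);
print merge_ligtable( $old_body, \%new_rules );
```

Anything in the LIGTABLE that is not part of a LABEL ... (STOP) entry would be dropped by this sketch, so it is worth comparing the output against the input on a real file before trusting it.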

        Your flame actually made me realize that the hardest part of what I wanted to do was describe it in a comprehensible manner. So I'm really grateful for your help. But don't spend too much time on it, my 256 if loops are ugly as hell, but they do what I want...