Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to store the value under {AUTHOR}, up to the start of another '{'.
while(<DATA>){
    chomp;
    $_ =~ s/{AUTHOR}([^{]+)//g;
    print $1;
}
__DATA__
{TAG}
tag1
{AUTHOR}
By June Fletcher JOURNAL
{TAG}
tag2 TOM MacCUBBIN
{DATA}
data1
{AUTHOR}
Richard White
{AUTHOR}
MacCUBBIN
{SOUR}

Replies are listed 'Best First'.
Re: regular expression
by ELISHEVA (Prior) on Aug 16, 2009 at 13:26 UTC

    What do you mean by store a value "to the start of another '{'"? Could you provide an example of what your output should look like?

    In the meantime here are some tips:

    • use diagnostics and strictures: use strict; use warnings; use diagnostics; at the beginning of each script file. If you use #!/usr/bin/perl at the start of your file, place them right under that line. When you do this you will quickly discover that your regular expression isn't matching anything so $1 is undefined.
    • By default, Perl treats "\n" (newline) as the end of record marker. (You can change that by setting $/ - see perlvar for details). chomp removes the end of record marker so after chomping the line "{AUTHOR}\n" becomes "{AUTHOR}". Your regular expression requires at least one character after the word "{AUTHOR}" so it never matches. To match 0 or more, use * not +.
    • I'm guessing the part of the regular expression after {AUTHOR} was meant to match the value below the line containing {AUTHOR}. Since the line read in contains only {AUTHOR}, it can never do that. To capture what is on the line below, you can set a flag to ON every time you see a line with {AUTHOR} and set the flag to OFF every time you see another tag, e.g. {TAG} or {SOUR}. You then only print out lines where the flag is on.
    • To match a run of characters in a string without changing them, one uses m/myregexhere/ or just /myregexhere/. Your code sample uses s/// which is a substitution operator. In your sample code the second part (the replacement value) is the empty string so your regular expression replaces everything it matches with the empty string. That probably isn't what you want.
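The second and fourth tips can be checked with a small experiment. This is only a sketch of why the posted pattern fails, not a solution to the task; it assumes each tag sits alone on its own line, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# One input line as the read loop sees it (tag lines are assumed to
# stand alone on their own line).
my $line = "{AUTHOR}\n";
chomp $line;    # now just "{AUTHOR}"

# + demands at least one non-'{' character after the tag; none is left.
my $plus_matches = ($line =~ /\{AUTHOR\}([^{]+)/) ? 1 : 0;

# * allows zero characters, so this matches but captures an empty string.
my ($star_capture) = $line =~ /\{AUTHOR\}([^{]*)/;

# m// only tests the string; s/// with an empty replacement deletes the match.
my $tested   = "Richard White";
my $replaced = "Richard White";
my $found = ($tested =~ m/White/) ? 1 : 0;
$replaced =~ s/White//;

printf "+ matched: %d\n", $plus_matches;
printf "* captured: '%s'\n", $star_capture;
printf "m// found %d and left '%s'; s/// left '%s'\n", $found, $tested, $replaced;
```

The braces are escaped here (\{ and \}) because newer Perl versions are stricter about unescaped left braces in patterns.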

    I'd post some code to illustrate, but I'm just a little concerned this is a homework problem. If there is something unclear in what I have written above, please ask. Try using a flag on your own. If you are still stuck, update your post with the code you wrote trying to use a flag to find the author values.

    Best, beth

      Maybe something as simple as:

      my $flag = '';
      while (<DATA>) {
          if (/^\{(\w+)\}/) {
              $flag = $1;
          }
          elsif ($flag =~ m/AUTHOR/) {
              # "store" values here
          }
      }
        The output should be
        By June Fletcher JOURNAL Richard White MacCUBBIN
        That is, the lines between {AUTHOR} and the start of another '{'.
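Filled in, the flag idea from this sub-thread might look like the sketch below. The helper name collect_authors is hypothetical, and the sample list assumes each tag is on its own line with the text between two tags kept on a single line.

```perl
use strict;
use warnings;

# collect_authors is a hypothetical helper: it takes the input lines and
# returns the value lines that sit under an {AUTHOR} tag.
sub collect_authors {
    my @lines = @_;            # copy, so chomp does not touch the caller's data
    my $current_tag = '';      # name of the most recent {TAG}-style line
    my @authors;
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\{(\w+)\}/) {
            $current_tag = $1;             # a new section starts here
        }
        elsif ($current_tag eq 'AUTHOR') {
            push @authors, $line;          # value line inside an AUTHOR section
        }
    }
    return @authors;
}

# Sample input; the line layout is an assumption.
my @sample = map { "$_\n" } (
    '{TAG}', 'tag1', '{AUTHOR}', 'By June Fletcher JOURNAL',
    '{TAG}', 'tag2 TOM MacCUBBIN', '{DATA}', 'data1',
    '{AUTHOR}', 'Richard White', '{AUTHOR}', 'MacCUBBIN', '{SOUR}',
);

print "$_\n" for collect_authors(@sample);
```

In a real script the lines would come from a filehandle, e.g. collect_authors(<DATA>), rather than from a literal list.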
Re: regular expression
by GrandFather (Saint) on Aug 16, 2009 at 20:21 UTC

    If you really wanted to do it with a single regular expression you'd need to slurp the file, and that's not recommended because it doesn't scale well. However, the flip flop operator (scalar context range operator - ..) provides an interesting solution:

    use warnings;
    use strict;

    while (<DATA>) {
        print if s/{AUTHOR}\n// .. (s/{AUTHOR}\n//, s/{(?!AUTHOR})\w+}\n//);
    }
    __DATA__
    {TAG}
    tag1
    {AUTHOR}
    By June Fletcher JOURNAL
    {TAG}
    tag2 TOM MacCUBBIN
    {DATA}
    data1
    {AUTHOR}
    Richard White
    {AUTHOR}
    MacCUBBIN
    {SOUR}

    Prints:

    By June Fletcher JOURNAL
    Richard White
    MacCUBBIN
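For comparison, the slurp-and-single-regex route mentioned at the top of this reply could be sketched roughly as follows. The helper name authors_from_text is hypothetical, the whole input is assumed to fit in memory, and the sketch assumes no value ever contains a '{'.

```perl
use strict;
use warnings;

# authors_from_text is a hypothetical helper: it expects the whole input
# in one string and returns the value lines found under {AUTHOR} tags.
sub authors_from_text {
    my ($text) = @_;
    my @values;
    # After each {AUTHOR} line, capture everything up to the next '{'.
    while ($text =~ /^\{AUTHOR\}\n([^{]*)/mg) {
        push @values, grep { length $_ } split /\n/, $1;
    }
    return @values;
}

# Sample input built as one string; with a file one would slurp instead,
# e.g.  my $text = do { local $/; <DATA> };
my $text = join "\n",
    '{TAG}', 'tag1', '{AUTHOR}', 'By June Fletcher JOURNAL',
    '{TAG}', 'tag2 TOM MacCUBBIN', '{DATA}', 'data1',
    '{AUTHOR}', 'Richard White', '{AUTHOR}', 'MacCUBBIN', '{SOUR}', '';

print "$_\n" for authors_from_text($text);
```

The /m flag lets ^ match at the start of every line, and /g in the while condition walks through the {AUTHOR} sections one at a time. The scaling concern raised above still applies, since the entire file is held in one string.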

    True laziness is hard work