in reply to Re^2: regexp over multiple lines
in thread regexp over multiple lines

Consider this scenario:

You have a contract to build a 121-story office tower. You've had problems excavating deep enough to put in the foundation. It's been a messy job but you've gotten close.

Now, you've started pouring footings and foundation... and in fact, have managed to get the steel up for the first few stories above ground.

That's your code to date.

But today, your consultant -- the engineer -- notices that the walls are off plumb -- are tilting, out of whack. They ascertain that your footings and foundation are NOT on bedrock.

Do you charge onward, to see how many stories up you can go before the whole enterprise crashes?

Unless this is a one-off project, it's going to cost less to tear down what you've done, and get the footings right before continuing.

Re^4: regexp over multiple lines
by liverpaul (Acolyte) on Aug 03, 2011 at 15:51 UTC

    The trouble is, I'm not a builder. I'm just teaching myself the building trade as I go along :-)

    This is just a once off project for my website, so the only thing it costs me is time and effort.

    I've decided that I'm going to try to do it the way everyone is recommending, I'm just not sure I have the ability to do it that way...yet. You see, the data I need to extract will be from HTML files and XML files. I will be trying to design a program that will process both types of input. I'll give it a day or two and see how I get on.

      "I'm going to try to do it the way everyone is recommending..."
      Good!

      My arms get terribly tired, beating people over the head.

      "I'm just not sure I have the ability to do it that way...yet."

      And when you have a problem -- trying to do it the right way -- why, that's why we're here. If you get stuck on some particular point (and have read the docs, etc.) post some code illustrating where you are, sample data and output, and errors from your code, if any.

      Helping folk at that point is far more gratifying than beati ^H^H^H^H^H, posting a picket line around their --- uh, applying verbal persuasion.

        Well, I've been able to slurp the files and read the data into arrays and it makes the coding much faster and easier. I no longer have to use anchor points and offset values and other messy stuff. I'm not getting the exact results I want yet but I'm getting there.

        I have another question:

        Is it possible to limit the scope of the regexp I'm using (i.e. limit the search to an area I define with another regexp that marks out the block of text to search)? For example:

        <p id=paragraph_1>
        <a href="http://www.link1.com">Link1</a>
        <a href="http://www.link2.com">Link2</a>
        <a href="http://www.link3.com">Link3</a>
        </p>
        <p id=paragraph_2>
        <a href="http://www.link4.com">Link4</a>
        <a href="http://www.link5.com">Link5</a>
        <a href="http://www.link6.com">Link6</a>
        </p>

        If I use a regexp to parse the names of the above html links, I'm going to get all of them. What if I only want the ones within the paragraph_2 tags, how would I do that?

        Here's a dummy example of code I already have:

        local($/, *WEB_DATA); # sets $/ to undef for you and when the scope exits it will revert $/ back to its previous value (most likely "\n")
        open (WEB_DATA, "<$myFilename.tmp");
        my $myData = <WEB_DATA>;
        close (WEB_DATA);
        my @linkName = $myData =~ m/regexp for linkName/g;

        I'm not sure if I've described it correctly, but what I want is to use a regexp like /<p id=paragraph_1>.+?<\/p>/ to define where I want to look, and another regexp to define what data I want to parse within this block. I hope that makes sense.
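
One way to do the two-step search asked about above, as a minimal sketch rather than a definitive answer: capture the wanted block with one regexp, then run the second regexp against the captured text only. The helper name link_names_in_paragraph is made up for illustration, and the sample data is held in a string here, where the real program would use the slurped file contents. Note the /s modifier on the first match: without it "." does not match a newline, so .+? cannot span the lines between <p ...> and </p>.

```perl
use strict;
use warnings;

# Sample data from the question, held in a string for illustration;
# in practice $myData would come from the slurped file.
my $myData = <<'HTML';
<p id=paragraph_1>
<a href="http://www.link1.com">Link1</a>
<a href="http://www.link2.com">Link2</a>
<a href="http://www.link3.com">Link3</a>
</p>
<p id=paragraph_2>
<a href="http://www.link4.com">Link4</a>
<a href="http://www.link5.com">Link5</a>
<a href="http://www.link6.com">Link6</a>
</p>
HTML

# Hypothetical helper: return the link names found inside the <p> block
# with the given id, or an empty list if no such block is found.
sub link_names_in_paragraph {
    my ($html, $id) = @_;

    # Step 1: capture only the text between the chosen <p ...> tag and
    # the next </p>. The /s modifier lets "." match newlines as well.
    my ($block) = $html =~ m{<p id=\Q$id\E>(.+?)</p>}s
        or return;

    # Step 2: run the second regexp against the captured block only.
    # In list context, m//g returns every captured link name.
    return $block =~ m{<a\s[^>]*>([^<]*)</a>}g;
}

my @linkName = link_names_in_paragraph($myData, 'paragraph_2');
print "$_\n" for @linkName;
```

This assumes the markup looks exactly like the sample (unquoted id, no nested <p> tags, no extra attributes on the <p>). Regexps like these are fragile against markup that varies from that shape, so treat it as a starting point only.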