Here is some code to do what you want:
    #!/usr/bin/perl -w
    use strict;

    $/=undef;  # undefines the record separator
               # which is by default \n
               # this means that there is no "line"
               # separator
    my $bigString = <DATA>;  # would normally read one "line"
                             # but since record separator is undefined
                             # it reads all the data as a single string
                             # this is what "slurp" the file means
    my @prices = $bigString =~ m|<homePrice>\s*(.+?)\s*</homePrice>|ig;
    print "@prices";  # prints: 1.91 295.3 KEuro

    __DATA__
    <homePrice>
    1.91</homePrice>
    <balh></balh><homePrice>295.3 KEuro</homePrice>

The regex term \s* means zero or more whitespace characters. The five usual ASCII ones are 'space', \n, \r, \t, \f: space, new line, carriage return, tab, form feed. So this code just ignores any spaces or End-of-Line things around the value (they are optional: zero, one or many are all ok).
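
If you want to see what \s* swallows, here is a small sketch. The sub name trimmed_price and the sample strings are made up for illustration; only the regex comes from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the tags with the
# surrounding whitespace removed, or undef if nothing matches.
sub trimmed_price {
    my ($text) = @_;
    return $text =~ m|<homePrice>\s*(.+?)\s*</homePrice>|i ? $1 : undef;
}

# Each sample pads the value with a different kind of whitespace.
my @samples = (
    "<homePrice>1.91</homePrice>",          # no whitespace at all
    "<homePrice> \t1.91 </homePrice>",      # space and tab
    "<homePrice>\r\n1.91\n</homePrice>",    # CR LF before, LF after
    "<homePrice>\f1.91</homePrice>",        # form feed
);

# Prints [1.91] four times.
print "[", trimmed_price($_), "]\n" for @samples;
```

The brackets in the output make it easy to see that no stray whitespace ended up inside the capture.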

The (.+?) means one or more of any character (except a new line, which "." does not match unless you add the /s switch), but "calm your greediness down!" - don't keep going, but stop capturing as soon as the term after the (.+?) matches. A "greedy match" would keep going until it saw the last possible match of that next term.
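
To see the difference between greedy and non-greedy, here is a minimal sketch. The one-line sample string and the variable names are mine, not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two price elements on one line (made-up sample data).
my $line = "<homePrice>1.91</homePrice><homePrice>295.3 KEuro</homePrice>";

# Non-greedy: stops at the first </homePrice> it can reach.
my ($lazy) = $line =~ m|<homePrice>(.+?)</homePrice>|;

# Greedy: runs on to the last </homePrice> on the line.
my ($greedy) = $line =~ m|<homePrice>(.+)</homePrice>|;

print "lazy:   $lazy\n";    # lazy:   1.91
print "greedy: $greedy\n";  # greedy: 1.91</homePrice><homePrice>295.3 KEuro
```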

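The /g switch means "match global": keep going through the whole string. In list context, all the captured matches are returned to the list on the left of the assignment (here @prices). The /i is not needed here, but it means ignore case.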
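
Here is a short sketch of what the two switches change. The sample text with the upper-case tag and the array names are made up for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: the first element uses different letter case.
my $text = "<HOMEPRICE>1.91</homeprice> <homePrice>295.3 KEuro</homePrice>";

# /g in list context: every capture lands in the array on the left.
my @all = $text =~ m|<homePrice>\s*(.+?)\s*</homePrice>|ig;

# Without /g: only the first match is captured.
my @first = $text =~ m|<homePrice>\s*(.+?)\s*</homePrice>|i;

# Without /i: the upper-case tag no longer matches.
my @case_sensitive = $text =~ m|<homePrice>\s*(.+?)\s*</homePrice>|g;

print "with /ig:   @all\n";             # with /ig:   1.91 295.3 KEuro
print "without /g: @first\n";           # without /g: 1.91
print "without /i: @case_sensitive\n";  # without /i: 295.3 KEuro
```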

This \n stuff is more complicated to explain than it is to use. Basically, Perl will almost always do what you expect. When it reads a text file written on your own platform, it translates the native line termination into the single "\n" character. And when you do a write, it will write your OS specific "\n" thing.

Unix uses just <line feed> to mean End-of-Line. Windows (and many Internet protocols) use <carriage return>, <line feed> to mean End-of-Line, and classic Mac OS (before OS X) used just <carriage return>. On Windows, Perl's text mode turns the <carriage return>, <line feed> pair into a single \n when reading. A Perl program on Unix does no such translation by default: if it reads my Windows file, each line ends in "\r\n" and the \r stays in the data unless you ask for the :crlf layer or strip it yourself. The code above still works on such a file because \s* matches the \r along with the \n.
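
If you process lines one at a time and the file might come from Windows, a substitution that accepts both endings is a safe habit. This is only a sketch: the sub name strip_eol and the sample line are made up, and the string literal stands in for a line read on Unix without any translation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strips a Unix (LF) or Windows (CR LF) line ending.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# A line as it would look when read without CR LF translation.
my $windows_line = "295.3 KEuro\r\n";

my $chomped = $windows_line;
chomp $chomped;    # removes the "\n" only; the "\r" stays

printf "after chomp:     %d chars\n", length $chomped;                  # 12
printf "after strip_eol: %d chars\n", length strip_eol($windows_line);  # 11
```

The one extra character after chomp is the leftover \r, which is the usual cause of "my string compare fails but the strings look the same" problems.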


In reply to Re: regexp over multiple lines by Marshall
in thread regexp over multiple lines by liverpaul
