matching paragraphs

imhotep has asked for the wisdom of the Perl Monks concerning the following question:

Apologies all for my poor posting the other day! I was advised by one of the faithful to take another look at the problem. I have done just that but am still having trouble matching paragraphs of text and printing them out within html paragraph elements! Could I change the default pattern match from single to multiple line and place each paragraph into a variable and then print something like,

print "<html para element> $variable </html para element>";

If so is changing the pattern match done like this?

open INPUT, "<input.txt";

undef $/;

$content = <INPUT>; # The variable, input in this case being standard input
close INPUT;

$/ = "\n";#Restore for normal behaviour later on

Or have I got it completely wrong? I think that this will read in the whole file and not just paragraphs of text. Alternatively, as the paragraphs in the text are surrounded by blank lines, could I place each paragraph in a variable by searching for a chunk of text sandwiched between one or more blank lines?

Updated Steve_p - added code tags.


Replies are listed 'Best First'.
Re: matching paragraphs by deibyz (Hermit) on Apr 18, 2005 at 14:34 UTC
If you have a look at perlvar, in the `$/` section, you have this:

    $/  The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like awk's RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character terminator, or to "undef" to read through the end of file. Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline.

So, if you set `$/ = ""` you'll get something close to what you need (if I'm understanding what you need). I would also recommend that you localize `$/`, so that it returns to its previous value afterwards. You can then surround your paragraphs with tags with something like this:

    #!/usr/bin/perl
    my @paras = do { local $/ = ""; <DATA> };  # Idiom localizing $/

    for (@paras) {
        print "<html paragraph>\n";
        print $_;
        print "</html paragraph>\n";
    }

    __DATA__
    This is the first paragraph of text.

    This is the second.
    This is also the second.

    This is the 3rd.

P.S.: You should read Writeup Formatting Tips; please put code tags around your code.
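To make the perlvar distinction between `""` and `"\n\n"` concrete, here is a minimal sketch. The helper `read_records` and the sample text are made up for illustration, and the input is read through an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string using the
# given value of the input record separator.
sub read_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;          # restored when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Made-up sample: two paragraphs separated by three empty lines.
my $text = "first\n\n\n\nsecond\n";

my @para_mode = read_records($text, "");      # paragraph mode
my @two_nl    = read_records($text, "\n\n");  # literal two-newline terminator

printf "paragraph mode: %d records\n", scalar @para_mode;
printf "two newlines:   %d records\n", scalar @two_nl;
```

With this sample, paragraph mode should give two records, while `"\n\n"` gives three: the leftover pair of newlines becomes a record of its own, which is the "blindly assume" behaviour the documentation describes.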
Re^2: matching paragraphs by imhotep (Novice) on Apr 18, 2005 at 20:13 UTC
Thanks for your reply! How do I get the input in? The file is in the same folder, and so far I have only learnt how to work from standard input using a while loop, e.g. `while ($line = <>)` etc. I cannot see where the input is coming from. Apologies for my lack of understanding!
Re^3: matching paragraphs by starbolin (Hermit) on Apr 19, 2005 at 05:57 UTC
`<>` is an abbreviation for `<ARGV>` and takes file names from the command line (or reads standard input if none are given). `<STDIN>` reads from standard input. To read from a file that you open yourself:

    open FH, '<', '/dev/null' or die "Can't open file: $!\n";
    while (<FH>) {
        # your code
    }

This code says: open the file /dev/null and give me a filehandle FH as a reference to it. Then read from the file referenced by FH, one line at a time, and put each line into $_. Stop reading at the end of the file.

Reread the post and amended: imhotep wrote:

    #The variable, input in this case being #standard input

That is wrong. You opened input.txt and created a filehandle INPUT. You are reading from input.txt.

s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
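As a sketch of how the pieces in this sub-thread might fit together: the hypothetical sub `paragraphs_from_file` below opens a file by name, so the input comes from that file and not from standard input. The fallback name `input.txt` is taken from the original question; taking the name from the command line is an assumption made here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs of the named file as a list.
sub paragraphs_from_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    local $/ = "";                  # paragraph mode, restored on return
    my @paragraphs = <$fh>;
    close $fh;
    return @paragraphs;
}

# Use the first command-line argument if there is one,
# otherwise fall back to input.txt from the question.
my $path = @ARGV ? $ARGV[0] : 'input.txt';

if (-e $path) {
    for my $para (paragraphs_from_file($path)) {
        print "<html paragraph>\n", $para, "</html paragraph>\n";
    }
}
```

Saved as, say, `paras.pl` (a made-up name), it could be run as `perl paras.pl input.txt`. A lexical filehandle (`my $fh`) is used here instead of the bareword `FH` so that the handle is closed automatically if the sub exits early.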
Re: matching paragraphs by Limbic~Region (Chancellor) on Apr 18, 2005 at 14:24 UTC
imhotep, Take a closer look at perldoc perlvar. To turn on "paragraph mode" in Perl, set `$/ = ""`, not `undef`. Unfortunately, the rest of your post isn't very coherent (to me anyway), but the following might be what you want:

    open (INPUT, '<', $file) or die $!;
    {
        local $/ = "";
        while ( <INPUT> ) {
            print '<p>', $_, '</p>';
        }
    }

Cheers - L~R
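One detail worth noting about the loop above: each paragraph still carries its trailing newlines, so the closing tag lands on a line of its own. A possible variation, assuming you want the tag directly after the text, is to `chomp` each record while paragraph mode is still in effect. The sub name `wrap_paragraphs` and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each paragraph read from $fh in <p>...</p>.
sub wrap_paragraphs {
    my ($fh) = @_;
    local $/ = "";                  # paragraph mode
    my @html;
    while (my $para = <$fh>) {
        chomp $para;                # in paragraph mode, strips all trailing newlines
        push @html, "<p>$para</p>\n";
    }
    return @html;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = "First paragraph.\n\nSecond paragraph,\nsecond line.\n\n\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
print wrap_paragraphs($in);
close $in;
```

The `chomp` has to happen while `$/` is still `""`; once the separator is back to `"\n"`, `chomp` removes only a single trailing newline.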
Re: matching paragraphs by Joost (Canon) on Apr 18, 2005 at 14:34 UTC
Could you explain what you mean by a paragraph of text? If you mean lines of text separated by an empty line, you could do something like this to read a file by paragraph:

    {   # create an extra scope
        local $/ = "";   # set record separator to empty lines
        while (<INPUT>) {
            # $_ contains a paragraph of text from <INPUT>
        }
    }   # $/ is restored to its original value

Note that this code will treat more than one blank line as a single separator. See also the section $INPUT_RECORD_SEPARATOR in perlvar.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
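The scoping point can be checked with a small sketch. The sample text is made up, and an in-memory filehandle stands in for INPUT.

```perl
use strict;
use warnings;

my $text = "a\nb\n\nc\n";          # made-up input: two paragraphs

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my $first_paragraph;
{
    local $/ = "";                 # paragraph mode inside this block only
    $first_paragraph = <$fh>;      # reads "a\nb\n\n"
}

# Outside the block $/ is "\n" again, so this reads a single line.
my $next_line = <$fh>;             # reads "c\n"
close $fh;

print "separator restored\n" if $/ eq "\n";
```

This is why `local` is preferable to assigning `$/` and resetting it by hand, as in the original question: the old value comes back even if the block is left early.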