
Hello!

Not sure how to write the topic...

I have an HTML page that contains several occurrences of some text I need to record.

<START>TEXT1<END><br>
<various data><br>
<START>TEXT2<END><br>
<various data><br>
<START>TEXT3<END><br><br>

etc etc.

And so I need TEXT1, TEXT2, TEXT3 etc, recorded in a file.

I first reformatted the page content into ONE line by stripping the carriage returns. So I now have one string and want to extract substrings from it. Why did I do this? Because I imagine it will be easier to extract the substrings that way.

Then:

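open(my $fh, '<', $file) or die "Cannot open $file: $!";
my $all = <$fh>;    # the page was already collapsed to a single line
close($fh);
my ($good) = $all =~ m/$start(.*?)$end/;    # captures the first match only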

This gives me the first occurrence of TEXT (in $good). But how do I get all the following ones?

Thanks!

PS. Each page contains up to 50 substrings I need to extract, and I will have a large number of pages, which I will process one by one. The substrings should be recorded CSV-style, one substring per line, to be used later in an Excel sheet.
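
For what it is worth, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the delimiters are literal strings and that no TEXT contains the end delimiter. The name extract_all and the sample $page are made up for illustration. In list context, a match with the /g modifier returns every captured group at once, and /s lets "." match across line breaks, so collapsing the page to one line may not be needed.

```perl
use strict;
use warnings;

# Return every substring found between $start and $end in $text.
# \Q...\E quotes the delimiters so they are matched literally;
# /g collects all matches, /s lets "." cross line breaks.
sub extract_all {
    my ($text, $start, $end) = @_;
    my @found = $text =~ m/\Q$start\E(.*?)\Q$end\E/gs;
    return @found;
}

# Hypothetical sample data standing in for one page.
my $page = "<START>TEXT1<END><br>\n<various data><br>\n"
         . "<START>TEXT2<END><br>\n<various data><br>\n"
         . "<START>TEXT3<END><br><br>";

my @texts = extract_all($page, '<START>', '<END>');

# One substring per line, ready to be redirected into a file.
print "$_\n" for @texts;
```

To process many pages, the same call could sit inside a loop over the files, appending to a single output file. If a TEXT can contain commas, quotes or line breaks, a CSV module such as Text::CSV is probably safer than printing raw lines.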

In reply to Search all occurences of text delimited by START and END in a string by natol44
