comment on

Here is a naughty one liner to extract all occurances of IP addresses between tags <IP-ADDRESS> in any case and across lines. Very inefficient for a large file as it reads it all into memory. Change the word data to the name of your file.

perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-address>(.+?)<
+\/ip/gi){print $1}'
[download]

Here is the same but to grap IP addresses from either <IP-ADDRESS> or <IP_NEIGHBOUR> tags.

perl -le 'local$/;open F,data;$_=<F>;s/\n//g;while(/<ip-(neigbour|addr
+ess)>(.+?)<\/ip/gi){print $2}'
[download]

update

For some reason I got marked -1 on this, if anyone can explain what I did wrong here I'd love to know. I realise the code is naughty for eating the file in one gulp but if the file is small this can't do much harm and it makes for a very simple solution to the possible problem of the addresses being broken accross line breaks.

Anyway, looking at this thread the OP looks to have changed his mind and not want to capture the <IP-NEIGHBOUR> addresses, as well as wanting only unique addresses returned, I will update this space soon with a version that is more mem friendly and possibly redeem myself in the eyes of the monastery.

further update

OK, I have had the error of my ways pointed out, thou shalt not parse XML with regexp. I shall stop sinning now, no further unholy code shall follow.
R.

In reply to Re^3: Extracting tagged data from a XML file by Random_Walk
in thread Extracting tagged data from a XML file by theroninwins

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.