
I've run into an interesting problem. I have a script that pulls hostnames in a two-node cluster. As a quick hack, I shell out to egrep to check whether the hostnames have been added to /etc/hosts on both nodes.

So I may have an /etc/hosts that looks like this:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.1.201 hostname62.domain.com  hostname62
192.168.2.144 hostname62amgt.domain.com hostname62amgt
192.168.2.145 hostname62bmgt.domain.com hostname62bmgt

Here's a snippet of my code:

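my $ha1 = "hostname62a";
my $ha2 = "hostname62b";
my @hosts_ha1;

# Match either hostname as a whole word (case-insensitive) in /etc/hosts.
my $cmd1 = "egrep -i \"\\b$ha1\\b|\\b$ha2\\b\" /etc/hosts";

# Run egrep and collect each matching line.
open(my $hosts1, '-|', $cmd1) or die "Cannot run egrep: $!";
while (my $line = <$hosts1>) {
    chomp $line;
    push @hosts_ha1, $line;
}
close($hosts1);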

I used word boundaries (\b) to make sure I only find what I'm looking for. Normally, this returns something like the following:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b

This is what I want. Just the two hostnames.

The hostnames themselves follow whatever standard the customer sets, so we have little control over what they name their stuff. But usually the above code works well for just pulling out the hostnames. We do control how they format the names in /etc/hosts by providing a script interface, so how stuff is laid out in /etc/hosts is pretty constant.

Now here's the problem: (\b) boundaries work pretty well most of the time. But we have one customer that named his stuff like this:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.1.201 hostname62.domain.com  hostname62
192.168.2.144 hostname62a-r.domain.com hostname62a-r
192.168.2.145 hostname62b-r.domain.com hostname62b-r

So the above egrep statement finds these:

192.168.1.199 hostname62a.domain.com hostname62a
192.168.1.200 hostname62b.domain.com hostname62b
192.168.2.144 hostname62a-r.domain.com hostname62a-r
192.168.2.145 hostname62b-r.domain.com hostname62b-r
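
The extra matches are not specific to egrep. A minimal Perl check shows the same behavior. The helper name matches_with_b is made up for this illustration, and the sample names are taken from the hosts entries above:

```perl
use strict;
use warnings;

# Sample short names copied from the hosts entries above.
my @names = qw(hostname62a hostname62a-r hostname62amgt);

# Hypothetical helper: true if $want occurs in $name delimited by \b.
sub matches_with_b {
    my ($want, $name) = @_;
    return $name =~ /\b\Q$want\E\b/i ? 1 : 0;
}

for my $name (@names) {
    printf "%-16s %s\n", $name,
        matches_with_b('hostname62a', $name) ? 'match' : 'no match';
}
```

The first two names match and hostname62amgt does not.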

This is because "-" is not a word character, so "\b" matches between the "a" and the "-", just as it matches between the "a" and the "." in hostname62a.domain.com. I have no idea how to craft an expression that picks out just the hostnames I want. I also have customers who name their hosts like this:

192.168.1.2 hostname-node1.domain.com hostname-node1
192.168.1.3 hostname-node2.domain.com hostname-node2
192.168.1.4 hostname-node1mgt.domain.com hostname-node1mgt
192.168.1.5 hostname-node2mgt.domain.com hostname-node2mgt

With those, the same egrep (searching for hostname-node1 and hostname-node2) returns:

192.168.1.2 hostname-node1.domain.com hostname-node1
192.168.1.3 hostname-node2.domain.com hostname-node2

So I can't simply split on the "-". Ugh, even now my head hurts thinking about this issue. Does anyone have an idea for a nifty Perl regex that could solve my problem?
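
A rough sketch of one possible approach, assuming hostnames only ever contain letters, digits, "_" and "-": do the matching in Perl and replace \b with lookarounds that also count "-" as part of a name. The helper names line_has_host and filter_hosts are hypothetical, and the sample lines are the customer entries shown earlier. This is an illustration, not a tested fix for every naming scheme:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line contains $want as a whole hostname
# label, i.e. not directly preceded or followed by a word character or "-".
sub line_has_host {
    my ($want, $line) = @_;
    return $line =~ /(?<![\w-])\Q$want\E(?![\w-])/i ? 1 : 0;
}

# Hypothetical helper: keep only the lines that mention one of @$wanted.
sub filter_hosts {
    my ($wanted, @lines) = @_;
    return grep {
        my $line = $_;
        grep { line_has_host($_, $line) } @$wanted;
    } @lines;
}

my @sample = (
    '192.168.1.199 hostname62a.domain.com hostname62a',
    '192.168.1.200 hostname62b.domain.com hostname62b',
    '192.168.1.201 hostname62.domain.com  hostname62',
    '192.168.2.144 hostname62a-r.domain.com hostname62a-r',
    '192.168.2.145 hostname62b-r.domain.com hostname62b-r',
);

print "$_\n" for filter_hosts([qw(hostname62a hostname62b)], @sample);
```

On this sample, only the 192.168.1.199 and 192.168.1.200 lines are printed. A "." after the name is still allowed, so the fully qualified form keeps matching, while names such as hostname-node1 remain searchable because the "-" inside the search string is matched literally.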

In reply to Perl regex and word boundaries by MeatLips
