Howdy :)
I have a set of filenames that appear as follows:
hosta-sel-kr-1,my-domain,net.testa
hostb-sel-kr-1,my-domain,net.testb
hostc-sel-kr-1,my-domain,com.testa
hostd-sel-kr-1,my-domain,com.testc
hoste-sel-kr-1,my-domain,net.testxyz
hosta-mel-au-1,my-domain,net.testabc
hosta-mel-au-1,my-domain,net.testdef
hostxyz.testabc
someotherhost.someothertest
The format of each filename is:
- a hostname
- followed by an (optional) domain name
- followed by a period
- followed by a name of a test
I need to extract the hostname and the test name from each filename, however:
- If the domain name is .net, I need to drop it (Update: Just the domain name, not the whole filename)
- If the domain name is .com, I need to keep it AND replace the commas with periods.
I have written an expression that does everything except replace the commas, and this is where I am stuck.
I know I could simply post-process each captured hostname with tr/,/./ (or s/,/./g, since a plain s/,/\./ would only replace the first comma), but that's not very elegant and I'm sure it can be done within the expression.
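For reference, the post-processing fallback I have in mind would look roughly like the sketch below. The helper name parse_filename is made up for illustration and is not part of my script. The pattern is a condensed version of my expression, with ^ and $ anchors added on the assumption that each filename is already chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: match first, then fix up
# the captured hostname in a separate step.
sub parse_filename {
    my ($name) = @_;

    # Capture 1: hostname, plus ",my-domain,com" when present.
    # Any other comma-separated domain is matched but not captured.
    # Capture 2: the test name after the period.
    my ($host, $test) =
        $name =~ m/^([\w-]+(?:,my-domain,com)?)(?:,[\w,-]+)?\.([a-z]+)$/
        or return;

    $host =~ tr/,/./;    # replace every comma, not just the first
    return ($host, $test);
}

for my $name ('hostc-sel-kr-1,my-domain,com.testa',
              'hosta-sel-kr-1,my-domain,net.testa',
              'hostxyz.testabc') {
    my ($host, $test) = parse_filename($name) or next;
    print "Host:$host Test:$test\n";
}
```

It works, but it needs a second pass over every hostname, which is exactly what I would like to avoid.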
So I have two questions:
- How do I replace the commas within the expression?
- Can the expression be made more efficient? (In production, this will run every minute and will process approx 6000 files on each run)
Here is the code I have so far:
#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    chomp;    # strip the newline so the error message prints cleanly
    my ($host, $test) = ($_ =~
        m/
            (                       # Start first capture
                [\w\-]+             # One or more word chars or hyphens
                (?:                 # Non-capturing group
                    ,my-domain,com  # Literal string
                )?                  # Make it optional
            )                       # End first capture
            (?:                     # Non-capturing group
                [\w\-,]+            # One or more word chars, hyphens or commas
            )?                      # Make it optional
            \.                      # A literal period
            (                       # Start second capture
                [a-z]+              # One or more lowercase letters
            )                       # End second capture
        /x)
        or print "Cannot parse $_\n" and next;
    print "Host:$host Test:$test\n";
}
__DATA__
hosta-sel-kr-1,my-domain,net.testa
hostb-sel-kr-1,my-domain,net.testb
hostc-sel-kr-1,my-domain,com.testa
hostd-sel-kr-1,my-domain,com.testc
hoste-sel-kr-1,my-domain,net.testxyz
hosta-mel-au-1,my-domain,net.testabc
hosta-mel-au-1,my-domain,net.testdef
hostxyz.testabc
someotherhost.someothertest
Any advice would be greatly appreciated.
Thanks in advance,
Darren :)