Howdy :)

I have a set of filenames that appear as follows:

hosta-sel-kr-1,my-domain,net.testa
hostb-sel-kr-1,my-domain,net.testb
hostc-sel-kr-1,my-domain,com.testa
hostd-sel-kr-1,my-domain,com.testc
hoste-sel-kr-1,my-domain,net.testxyz
hosta-mel-au-1,my-domain,net.testabc
hosta-mel-au-1,my-domain,net.testdef
hostxyz.testabc
someotherhost.someothertest

The format of each filename is:

I need to extract the hostname and the test name from each filename, however:

I have written an expression that does everything except replace the commas, and this is where I am stuck.
I know I could simply post-process each captured hostname with s/,/./g;, but that's not very elegant and I'm sure it can be done within the expression.
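
For reference, here is a minimal sketch of that post-processing fallback. The helper name parse_name is hypothetical, the pattern is a condensed and anchored form of the expression in this post, and tr/,/./ is swapped in for the substitution because it replaces every comma in one pass. It is only one way to do it, not the in-expression solution being asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): split a filename
# into host and test, then turn the commas in the host into periods.
sub parse_name {
    my ($name) = @_;
    my ($host, $test) = $name =~ m/
        ^
        ( [\w-]+ (?: ,my-domain,com )? )  # host, keeping an optional ,my-domain,com suffix
        [\w,-]*                           # any remaining suffix is skipped
        \.                                # a literal period
        ( [a-z]+ )                        # test name
        $
    /x or return;
    $host =~ tr/,/./;                     # replaces every comma, not just the first
    return ($host, $test);
}

for my $name ('hosta-sel-kr-1,my-domain,net.testa',
              'hostc-sel-kr-1,my-domain,com.testa',
              'hostxyz.testabc') {
    my ($host, $test) = parse_name($name) or next;
    print "Host:$host Test:$test\n";
}
```

With the three sample names above this prints hosta-sel-kr-1, hostc-sel-kr-1.my-domain.com and hostxyz as the hosts, so the commas are only rewritten where the domain is kept.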

So I have two questions:

  1. How do I replace the commas within the expression?
  2. Can the expression be made more efficient? (In production, this will run every minute and will process approx 6000 files on each run)

Here is the code I have so far:

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    my ($host, $test) = ($_ =~ m/
        (                       # Start first capture
            [\w\-]+             # One or more alphanumerics or hyphens
            (?:                 # Non-capturing group
                ,my-domain,com  # Literal string
            )?                  # Make it optional
        )                       # End of first capture
        (?:                     # Non-capturing group
            [\w\-,]+            # One or more alphanumerics, hyphens or commas
        )?                      # Make it optional
        \.                      # A literal period
        (                       # Start second capture
            [a-z]+              # One or more lowercase chars
        )                       # End second capture
    /x) or print "Cannot parse $_\n" and next;
    print "Host:$host Test:$test\n";
}

__DATA__
hosta-sel-kr-1,my-domain,net.testa
hostb-sel-kr-1,my-domain,net.testb
hostc-sel-kr-1,my-domain,com.testa
hostd-sel-kr-1,my-domain,com.testc
hoste-sel-kr-1,my-domain,net.testxyz
hosta-mel-au-1,my-domain,net.testabc
hosta-mel-au-1,my-domain,net.testdef
hostxyz.testabc
someotherhost.someothertest

Any advice would be greatly appreciated.
Thanks in advance,
Darren :)


In reply to Regex: Capturing and optionally replacing by McDarren
