Your example suggests that you are parsing SIP content (RFC 3261, etc.).

Strangely, I don't see a CPAN module for SIP (wow).

The general answer to your question (given elsewhere in this thread) is, of course, to use regular expressions.

In your specific case, a quick look at the RFC suggests that the value of a SIP From: header is either a 'name-addr' or a bare 'addr-spec', optionally followed by parameters such as ;tag=. A 'name-addr' is an optional display-name followed by the addr-spec in angle brackets, and the addr-spec itself is a SIP or SIPS URI or some other kind of absolute URI. So the angle brackets must be present whenever there is a display name (just like RFC822 email addresses).
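A minimal core-Perl sketch of that structure might look like this. `split_from_value` is a hypothetical helper, not part of any CPAN module. It handles a quoted or unquoted display name, a bracketed or bare addr-spec, and trailing ;name=value parameters, but it is a simplification rather than a full RFC 3261 parser (no comments, no validation of the URI itself).

```perl
use strict;
use warnings;

# Hypothetical helper: split the value of a SIP From: header into
# (display name, addr-spec, \%header_params). Simplified sketch only,
# not a complete RFC 3261 parser.
sub split_from_value {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my ($display, $addr_spec, $rest);
    if (my ($quoted, $bare, $spec, $tail) =
            $value =~ /^(?:"((?:[^"\\]|\\.)*)"|([^"<]*?))\s*<([^>]*)>(.*)$/s) {
        # name-addr form: optional (quoted or bare) display name, then <addr-spec>
        if (defined $quoted) {
            ($display = $quoted) =~ s/\\(.)/$1/gs;    # undo quoted-pair escapes
        }
        else {
            $display = $bare;
        }
        $display = undef if $display eq '';
        ($addr_spec, $rest) = ($spec, $tail);
    }
    else {
        # bare addr-spec form: RFC 3261 requires <> around a URI containing
        # ';', so the first ';' here starts the header parameters
        ($addr_spec, $rest) = $value =~ /^([^;]*)(.*)$/s;
        $addr_spec =~ s/\s+$//;
    }

    my %params;
    for my $param (grep { length } split /\s*;\s*/, $rest) {
        my ($key, $val) = split /\s*=\s*/, $param, 2;
        $params{ lc $key } = $val;    # value-less params (e.g. ;lr) map to undef
    }
    return ($display, $addr_spec, \%params);
}

my ($name, $spec, $params) =
    split_from_value('"Bob" <sips:bob@biloxi.com>;tag=a48s');
printf "%s | %s | tag=%s\n", $name // '(none)', $spec, $params->{tag};
```

Checking for the quoted display name first matters: a naive "everything before the first <" split goes wrong as soon as a quoted name itself contains a '<'.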

Getting this sort of thing right is hard (witness all the web apps that have their own ideas of what constitutes a valid email address, or the 6k regexp for matching an RFC822 address in 'Mastering Regular Expressions', but I digress).

Happily, a second look at CPAN shows that the URI module supports the sip: and sips: URL schemes (via URI::sip), so you basically want to:

  1. Pull out the value of the header. This is everything after the ':', with any continuation lines (lines beginning with a space or tab) folded in; try Mail::Header for this.
  2. (The hard bit) Separate the optional display name from the addr-spec. The addr-spec is the bit between <> in your example, but the standard says it may occur bare if there is no display name (just like RFC822 addresses).
  3. Pass the addr-spec to the URI CPAN module to parse as a SIP URI, then call methods on the resulting object (user, host, port and so on) to break it down.
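The three steps above might be sketched like this. The helper names (`header_value`, `addr_spec_of`, `sip_uri_parts`) are hypothetical. Step 1 uses a plain regex instead of Mail::Header, step 2 is deliberately naive (whatever sits between the first < and >, else everything up to the first ;), and only step 3 needs the CPAN URI module, which is loaded on demand.

```perl
use strict;
use warnings;

# Step 1 (hypothetical helper): return the value of the first header whose
# name matches one of @names (e.g. 'From' and its compact form 'f'), after
# unfolding continuation lines that start with a space or tab.
sub header_value {
    my ($headers, @names) = @_;
    my $alt = join '|', map { quotemeta } @names;
    (my $text = $headers) =~ s/\r?\n[ \t]+/ /g;    # unfold
    for my $line (split /\r?\n/, $text) {
        return $1 if $line =~ /^(?:$alt)[ \t]*:[ \t]*(.*?)[ \t]*$/i;
    }
    return;    # header not present
}

# Step 2 (naive): the addr-spec is inside <...> in name-addr form; in the
# bare form any ';' starts the header parameters, so cut there.
sub addr_spec_of {
    my ($value) = @_;
    return $1 if $value =~ /<([^>]*)>/;
    return (split /;/, $value, 2)[0];
}

# Step 3: let the URI module (URI::sip) break the SIP/SIPS URI down.
sub sip_uri_parts {
    my ($spec) = @_;
    require URI;    # CPAN module, not core Perl
    my $uri = URI->new($spec);
    # user/host/port only make sense for sip:/sips: URIs, not e.g. tel:
    die "not a SIP URI: $spec\n" unless ($uri->scheme // '') =~ /^sips?$/;
    return {
        user => $uri->user,
        host => $uri->host,
        port => $uri->port,    # falls back to the scheme's default port
    };
}
```

Assuming URI is installed, feeding the header `From: Alice <sip:alice@atlanta.com>;tag=1928301774` through `sip_uri_parts(addr_spec_of(header_value($msg, 'From', 'f')))` should give user `alice`, host `atlanta.com` and port 5060 (the sip: default).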

Doing this properly needs someone who is happy to read RFCs and write modules. I'm not familiar with this RFC, but its text appears to contain some requirements about handling 'odd' characters that go beyond what the grammar alone specifies.

One last idea: you might be able to simply (hah!) cut and paste the ABNF grammar from the RFC into a parser generator (Parse::RecDescent?) and use that, although you may run into differences between ABNF and the way the parser generator expects a grammar to be written.

Good luck :-)


In reply to Re: Extracting substrings from scalars by jbert
in thread Extracting substrings from scalars by sanjay nayak
