sanjay nayak has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's minimum standard of quality and will not be displayed.

Re: Extracting substrings from scalars
by GrandFather (Saint) on Oct 12, 2006 at 07:15 UTC

    Suppose you show us what you have tried and tell us what you have read already. Just in case you haven't done your homework, I'd suggest you read the Perl documentation first: perlretut, perlrequick, perlre and perlreref.


    DWIM is Perl's answer to Gödel
Re: Extracting substrings from scalars
by davorg (Chancellor) on Oct 12, 2006 at 08:20 UTC

    We need more information before we can do anything other than guess at solutions (which would be a waste of both our time and yours).

    What are the rules that you are following to decide what needs to be extracted? For example, here are some possible rules that yield the behaviour you've asked for in your first example:

    • Extract the word "sanjay" and all following characters until you reach a number
    • Extract characters 7 to 25 inclusive
    • Extract anything that looks like an email address, but truncate it at the '@' sign
    • Extract anything between the first space and the first '@' sign
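
    To see how much the choice of rule matters, here is a rough Perl sketch of each of the four rules above. The input line is invented for illustration (the original question isn't visible), so treat the patterns as guesses rather than a solution.

```perl
use strict;
use warnings;

# Hypothetical sample line; the real input from the question is not shown.
my $line = 'From: sanjay <sip:sanjay@192.168.0.10>';

# Rule 1: "sanjay" and everything after it, up to (not including) the first digit.
my ($r1) = $line =~ /(sanjay\D*)/;

# Rule 2: characters 7 to 25 inclusive (1-based), i.e. offset 6, length 19.
my $r2 = substr($line, 6, 19);

# Rule 3: something that looks like an email address, truncated at the '@'.
my ($r3) = $line =~ /([\w.+-]+)@[\w.-]+/;

# Rule 4: everything between the first space and the first '@'.
my ($r4) = $line =~ /^[^ ]* ([^@]*)@/;

print "$_\n" for $r1, $r2, $r3, $r4;
```

    On this sample, rules 1 and 2 happen to return the same string, while rules 3 and 4 each return something different, which is exactly why the rule needs to be spelled out.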

    Without more information we have no way of knowing which rule you are following.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Extracting substrings from scalars
by jbert (Priest) on Oct 12, 2006 at 10:24 UTC
    Your example suggests that you are parsing SIP content (RFC 3261, etc.).

    Strangely, I don't see a CPAN module for SIP (wow).

    The general answer to your specific enquiry (given elsewhere) is of course to use regular expressions.

    In your specific case, a quick look at the RFC suggests that a SIP From: line can contain a 'name-addr', which seems to be an optional display-name followed by an 'addr-spec' (a SIP or SIPS URI, or another kind of URI). The addr-spec will be in angle brackets if a display name is present, just like RFC 822 email addresses.
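
    As a sketch of what that distinction means in practice, the subroutine below (a hypothetical helper, not taken from any module) separates an optional display name from the addr-spec, falling back to a bare URI when there are no angle brackets. It deliberately skips most of RFC 3261's finer points, such as unescaping quoted names.

```perl
use strict;
use warnings;

# Split a From:/To: header value into (display name, addr-spec, header params).
# Rough sketch: handles a quoted or bare-token display name before <...>, or a
# bare URI; quoted names are returned with any backslash escapes left in place.
sub split_name_addr {
    my ($value) = @_;
    $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    # name-addr form: [display-name] <addr-spec> [;params]
    if ($value =~ /^(?:"((?:[^"\\]|\\.)*)"|([^<]*?))\s*<([^>]*)>\s*(.*)$/) {
        my $name = defined $1 ? $1 : $2;
        $name = undef if $name eq '';    # "<sip:...>" with no display name
        return ($name, $3, $4);
    }

    # bare addr-spec form: everything from the first ';' on is a header param
    my ($addr, $params) = $value =~ /^([^;]*)(.*)$/;
    return (undef, $addr, $params);
}

for my $v ('"Sanjay Nayak" <sip:sanjay@example.com>;tag=1234',
           'sip:bob@example.com;tag=99') {
    my ($name, $addr, $params) = split_name_addr($v);
    printf "name=%s addr=%s params=%s\n",
        defined $name ? $name : '(none)', $addr, $params;
}
```

    The two branches split differently because, when the URI is not in angle brackets, RFC 3261 treats any semicolon-delimited parameters as header parameters rather than URI parameters.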

    Getting this sort of thing right is hard (witness all the web apps that have their own ideas of what constitutes a valid email address, or the 6 KB regex in 'Mastering Regular Expressions' that matches an RFC 822 address, but I digress).

    Happily, checking back on CPAN shows that the URI module supports the sip: and sips: URL schemes, so you basically want to:

    1. Pull out the value of the header. This is everything after the ':', plus any continuation lines folded in (continuation lines begin with whitespace, such as a tab). Try Mail::Header for this.
    2. (Hard bit) Pull apart the (optional!) display name and the addr-spec. The addr-spec is the part between <> in your example, but the standard says it may also appear bare when there is no display name, just like RFC 822 addresses.
    3. Pass the addr-spec to the URI CPAN module to parse as a SIP URI, then call methods on that to break it down.

    You need someone who is happy to read RFCs and write modules to do this properly. I'm not familiar with this RFC, but the text appears to contain some unusual requirements about character handling that go beyond the details of the grammar.
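
    For step 1, a core-Perl alternative to Mail::Header might look like the sketch below. The header_value helper and the sample message are hypothetical; the code assumes the headers end at the first blank line and ignores SIP's compact header names (such as f for From).

```perl
use strict;
use warnings;

# Return the value of one header from a raw SIP message, with folded
# continuation lines (lines starting with a space or tab) joined first.
# Sketch only: assumes headers end at the first blank line and ignores
# compact header names such as "f" for From.
sub header_value {
    my ($message, $name) = @_;
    my ($head) = split /\r?\n\r?\n/, $message, 2;
    return unless defined $head;
    $head =~ s/\r?\n[ \t]+/ /g;    # unfold continuation lines
    for my $line (split /\r?\n/, $head) {
        return $1 if $line =~ /^\Q$name\E[ \t]*:[ \t]*(.*?)[ \t]*$/i;
    }
    return;
}

# Hypothetical message: the From: header is folded onto a second line.
my $msg = join "\r\n",
    'INVITE sip:bob@example.com SIP/2.0',
    'Via: SIP/2.0/UDP 192.168.0.10:5060;branch=z9hG4bK776',
    'From: sanjay',
    "\t<sip:sanjay\@192.168.0.10>;tag=1928",
    'To: <sip:bob@example.com>',
    '', '';

print header_value($msg, 'From'), "\n";
```

    From there, step 2 splits the value into display name and addr-spec, and for step 3 handing the addr-spec to URI->new should give you a URI::sip object, since the URI distribution implements the sip: and sips: schemes. Check that class's documentation for the accessor methods it actually offers.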

    One last idea is that you might be able to simply (hah) cut and paste the ABNF grammar from the RFC into a parser generator (Parse::RecDescent?) and use that, but you may run into differences in how the two notations specify a grammar.

    Good luck :-)

Re: Extracting substrings from scalars
by kabeldag (Hermit) on Oct 12, 2006 at 08:02 UTC
    Classic case for learning some regexes :-) Either that, or just walk through the data character by character with substr().

    I don't mind substrings at all. That's just me though. Others may have different opinions.
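
    For comparison, a non-regex version built on index and substr might look like this. It assumes (as one of davorg's guesses does) that the wanted text lies between the first space and the first '@'; the subroutine name and sample line are invented for illustration.

```perl
use strict;
use warnings;

# Return the text between the first space and the next '@' after it, using
# index/substr rather than a regex. Returns nothing if either is missing.
sub between_space_and_at {
    my ($s) = @_;
    my $start = index($s, ' ');
    return if $start < 0;
    my $end = index($s, '@', $start + 1);
    return if $end < 0;
    return substr($s, $start + 1, $end - $start - 1);
}

# Hypothetical input line
my $text = between_space_and_at('From: sanjay <sip:sanjay@192.168.0.10>');
print defined $text ? "$text\n" : "no match\n";
```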
Re: Extracting substrings from scalars
by davido (Cardinal) on Oct 12, 2006 at 16:29 UTC

    Here's a fragile "stab in the dark" approach to supposition one:

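        my $result;
        # Capture everything after "From: " up to (not including) the first '@'
        if ( $aa =~ m/From:\s([^@]+)@/ ) {
            $result = $1;
            $result =~ s/\s</</g;    # drop any whitespace just before '<'
        }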

    And here's a fragile "stab in the dark" approach to supposition two:

        my $result;
        # Capture the dotted address after "UDP"; it must be followed by ';'
        if ( $bb =~ m/UDP\s*([\d.]+);/ ) {
            $result = $1;
        }
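
    The second pattern requires ';' straight after the address, so it would miss a Via header that carries a port. In RFC 3261 the sent-by part of Via is a host with an optional ':port', so a slightly more tolerant sketch could capture both. The helper name and sample Via lines are hypothetical, and only dotted IPv4 hosts are matched (hostnames and IPv6 references are not).

```perl
use strict;
use warnings;

# Pull the IPv4 host and optional port out of a Via header that uses UDP.
# Sketch only: hostnames and IPv6 references in sent-by are not matched.
sub via_udp_host_port {
    my ($via) = @_;
    return unless $via =~ m{SIP\s*/\s*2\.0\s*/\s*UDP\s+(\d{1,3}(?:\.\d{1,3}){3})(?::(\d+))?}i;
    return ($1, $2);    # $2 is undef when no port is given
}

for my $via ('Via: SIP/2.0/UDP 192.168.0.10;branch=z9hG4bK776',
             'Via: SIP/2.0/UDP 192.168.0.10:5070;branch=z9hG4bK777') {
    my ($host, $port) = via_udp_host_port($via);
    printf "host=%s port=%s\n", $host, defined $port ? $port : '(default)';
}
```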

    These snippets will handle the narrow examples you've provided, but they are nowhere near fully compliant SIP parsers. If you want a robust solution, you'll need to give more information on what you're actually trying to do.


    Dave