dani_cv has asked for the wisdom of the Perl Monks concerning the following question:

Hello Everyone,


How should I extract User, phone, email, and misc_info from the following $tempStr string using a regex?


my $tempStr = "sdfasdfsdfsd
asdfasdfasdfasdfsdfsd
sadfsdafasd
asdfasdfasd


User soandso
password = 2341234sdf234
phone =
email = a@b.com
misc_info =
is_superuser = TRUE
is_appbuilder = TRUE
is_packagehacker = FALSE
is_user_maint = TRUE
authentication = CQ
is_dynamic_list_admin = FALSE
is_public_folder_admin = FALSE
is_security_admin = FALSE
is_raw_sql_editor = FALSE


User soandso1
password = 2341234sdf234
phone =
email = a@b.com
misc_info =
is_superuser = TRUE
is_appbuilder = TRUE
is_packagehacker = FALSE
is_user_maint = TRUE
authentication = CQ
is_dynamic_list_admin = FALSE
is_public_folder_admin = FALSE
is_security_admin = FALSE
is_raw_sql_editor = FALSE"


Thanks in advance,
Dan.

Replies are listed 'Best First'.
Re: regexp parsing from a big string...
by moritz (Cardinal) on May 12, 2008 at 19:52 UTC
    Don't do it with one regex, instead split it first into records.
    use Data::Dumper;
    for my $record (split m/\n\n\n/, $tempStr){
        my %user_info;
        for (split m/\n/, $record){
            if (m/^User (.*)$/){
                $user_info{name} = $1;
            } elsif (m/^(\w+)\s*=\s*(.*)$/){
                $user_info{$1} = $2;
            }
        }
        print Dumper \%user_info if %user_info;
    }

    (Update: Code now actually works).

    Note that the email address a@b.com will try to interpolate the array @b if the string is in double quotes, so please use single quotes (or, even better, a here-doc) instead.

      A here-doc will interpolate the @b also, unless the delimiter is specifically put in single quotes.

      my @fruit = qw( apple banana cherry );

      my $bare_heredoc = <<BARE_HEREDOC;
      I am kyle@fruit.com, I say.
      BARE_HEREDOC
      ;

      my $qq_heredoc = <<"QQ_HEREDOC";
      I am kyle@fruit.com, I say.
      QQ_HEREDOC
      ;

      my $q_heredoc = <<'Q_HEREDOC';
      I am kyle@fruit.com, I say.
      Q_HEREDOC
      ;

      print "bare: $bare_heredoc";
      print "double quoted: $qq_heredoc";
      print "single quoted: $q_heredoc";

      __END__
      bare: I am kyleapple banana cherry.com, I say.
      double quoted: I am kyleapple banana cherry.com, I say.
      single quoted: I am kyle@fruit.com, I say.
Re: regexp parsing from a big string...
by psini (Deacon) on May 12, 2008 at 19:59 UTC

    I assume that you want to get an array of users.

    The problem doesn't have a unique solution unless you know for certain that at least one of the items is always present, say User or the empty line.

    I don't know if this is the most elegant approach, but I'd try to split the string on /User / and then, on the resulting array elements, apply a regex like /([^\n]*).*phone =([^\n]*).../ and so on.
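
    A minimal sketch of that split-then-match idea, not a tested solution for the real data. The sample string is a shortened, hypothetical version of the original, and the names @users and @chunks are made up for illustration. Each field gets its own small regex rather than one long pattern, which sidesteps the need for /s to let . cross newlines.

```perl
use strict;
use warnings;

# Hypothetical sample data, shortened from the original post.
my $tempStr = <<'STR';
junk line

User soandso
password = 2341234sdf234
phone =
email = a@b.com
misc_info =
is_superuser = TRUE


User soandso1
password = 2341234sdf234
phone = 555
email = c@d.org
misc_info = none
STR

my @users;

# Split on "User " at the start of a line. The first chunk is whatever
# precedes the first user record (junk or an empty string), so drop it.
my ( undef, @chunks ) = split /^User /m, $tempStr;

for my $chunk (@chunks) {
    my %u;
    ( $u{name} ) = $chunk =~ /\A(\S+)/;
    for my $field (qw( phone email misc_info )) {
        # [ \t]* rather than \s*, so an empty value does not swallow
        # the newline and capture the next line instead.
        ( $u{$field} ) = $chunk =~ /^\Q$field\E[ \t]*=[ \t]*([^\n]*)/m;
    }
    push @users, \%u;
}

printf "%s: phone='%s' email='%s' misc_info='%s'\n",
    @{$_}{qw( name phone email misc_info )} for @users;
```

    If a field can be missing entirely from a record, its value ends up undef here, so the printf would need a guard.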

    Rule One: Do not act incautiously when confronting a little bald wrinkly smiling man.
Re: regexp parsing from a big string...
by apl (Monsignor) on May 12, 2008 at 20:06 UTC
    • Split $tempStr into an array of rows
    • Have all of your desired keywords (user, password, etc.) as the keys of a hash
    • Loop through the array of rows. If the first field of the line is a key in the hash, store/display the balance of the line
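
    The three steps above could be sketched roughly as follows. The sample string is a shortened, hypothetical version of the original, and %wanted and @found are names invented for the example. "First field" is taken to mean the leading word on the line, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sample data, shortened from the original post.
my $tempStr = <<'STR';
User soandso
password = 2341234sdf234
phone =
email = a@b.com
misc_info =
is_superuser = TRUE
STR

# The desired keywords, stored as hash keys for a cheap lookup.
my %wanted = map { $_ => 1 } qw( User phone email misc_info );

my @found;    # [ keyword, rest-of-line ] pairs, in input order
for my $row ( split /\n/, $tempStr ) {
    # The first field is the leading word; the balance of the line
    # follows an optional "=".
    my ( $key, $rest ) = $row =~ /^(\w+)[ \t]*=?[ \t]*(.*)$/ or next;
    next unless $wanted{$key};
    push @found, [ $key, $rest ];
}

print "$_->[0]: '$_->[1]'\n" for @found;
```

    The lookup is case-sensitive, so the keys have to match the spelling in the data ("User" with a capital, the rest lowercase).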
Re: regexp parsing from a big string...
by GrandFather (Saint) on May 12, 2008 at 23:58 UTC

    You could:

    use strict;
    use warnings;

    my $tempStr = <<'STR';
    sdfasdfsdfsd
    asdfasdfasdfasdfsdfsd
    ...
    is_raw_sql_editor = FALSE"
    STR

    open my $inFile, '<', \$tempStr;
    local $/ = "\n\n";    # Break input into paragraphs

    while (<$inFile>) {
        chomp;
        next unless length;
        next unless /
            User\s+(\w+).*?
            phone\s*=\s*(\w*).*?
            email\s*=\s*(\w*).*?
            misc_info\s*=\s*(.*)
            /xs;
        print "User $1, phone $2, email $3, misc_info:\n$4\n\n";
    }

    Prints:

    User soandso, phone , email a, misc_info:
    is_superuser = TRUE
    is_appbuilder = TRUE
    is_packagehacker = FALSE
    is_user_maint = TRUE
    authentication = CQ
    is_dynamic_list_admin = FALSE
    is_public_folder_admin = FALSE
    is_security_admin = FALSE
    is_raw_sql_editor = FALSE

    User soandso1, phone , email a, misc_info:
    is_superuser = TRUE
    is_appbuilder = TRUE
    is_packagehacker = FALSE
    is_user_maint = TRUE
    authentication = CQ
    is_dynamic_list_admin = FALSE
    is_public_folder_admin = FALSE
    is_security_admin = FALSE
    is_raw_sql_editor = FALSE"

    which scales nicely to obtaining the input from a file.


    Perl is environmentally friendly - it saves trees
      I think that your (\w*) for the email address is not working; it seems to be dropping the "@b.com" that is in the string. Perhaps (\S*) would do the trick (not tested).
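
      A quick sketch to check the (\S*) suggestion against a shortened, hypothetical sample. The paragraph-mode loop follows the code above, the regex is trimmed to the user name and email only, and @emails is a name made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data, shortened from the original post.
my $tempStr = <<'STR';
User soandso
password = 2341234sdf234
phone =
email = a@b.com
misc_info =
is_superuser = TRUE

User soandso1
password = 2341234sdf234
phone =
email = c@d.org
misc_info =
is_superuser = FALSE
STR

my @emails;
open my $inFile, '<', \$tempStr or die "Cannot open in-memory file: $!";
{
    local $/ = "\n\n";    # Break input into paragraphs
    while (<$inFile>) {
        chomp;
        # \S* instead of \w* so the capture runs to the next whitespace
        # and keeps the whole address.
        next unless /
            User\s+(\w+).*?
            email\s*=\s*(\S*)
            /xs;
        print "User $1, email $2\n";
        push @emails, $2;
    }
}
```

      One caveat: if the email value is empty, the \s* after the "=" runs across the newline and (\S*) picks up the keyword on the next line. Using [ \t]* in place of \s* would avoid that.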

      Cheers,

      JohnGG