in reply to Re^3: Parsing file in Perl post processing
in thread Parsing file in Perl post processing

Thanks for the explanation, but I am still a bit confused about exactly how you created the hash.

my %hash = map { $_, ($string =~ m/$_:(.+?)\s*$z/)} @keys;

What I understand is that you use map to create a new list from the @keys array. Am I correct in saying that $_ contains the first entry from the @keys array? I don't understand the next part at all. Could you please walk through the code?

Re^5: Parsing file in Perl post processing
by GotToBTru (Prior) on Sep 14, 2015 at 14:52 UTC

    map executes the contents of {} once for each element in @keys. Here is (part of) what is in @keys:

    0  'THREAD_ID'
    1  'CDR_TYPE'
    2  'SUB_TIME'

    With $_ set to each of those values in turn, the three executions of the block look like this:

    { 'THREAD_ID', ('THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127 DEL...' =~ m/THREAD_ID:(.+?)\s*$z/)}
    { 'CDR_TYPE', ('THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127 DEL...' =~ m/CDR_TYPE:(.+?)\s*$z/)}
    { 'SUB_TIME', ('THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127 DEL...' =~ m/SUB_TIME:(.+?)\s*$z/)}

    Each statement has two parts, separated by the comma. The first part returns the literal value of the key.

    The second part returns the matching value in the string, if any. $z has been defined to match a key, i.e., any combination of uppercase letters and underscores followed by a colon. So in effect, I am saying "look in that long string for THREAD_ID followed by a colon, and then remember anything you find between there and the next thing that looks like a key. Then do it again for CDR_TYPE, SUB_TIME, etc."
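
    A small sketch of that second part on its own may help. In list context, a successful match returns the captured text, and a failed match returns an empty list. The shortened sample string, the key NO_SUCH_KEY, and the definition of $z below are assumptions for illustration; the real $z is defined elsewhere in the thread.

```perl
use strict;
use warnings;

# Assumed sample record and key pattern (the real $z is not shown here).
my $string = 'THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127';
my $z      = qr/(?:[A-Z_]+:|$)/;

# List context: a successful match returns the list of captures.
my @found = ($string =~ m/CDR_TYPE:(.+?)\s*$z/);

# A key that is absent from the string gives an empty list.
my @missing = ($string =~ m/NO_SUCH_KEY:(.+?)\s*$z/);

print "found: @found\n";
print "missing count: ", scalar(@missing), "\n";
```

    One consequence of the empty list: if a key in @keys were missing from the string, the map block would return the key with no value, and the key/value pairs that follow would be misaligned when assigned to the hash.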

    After all three of these have executed, map returns ('THREAD_ID','1bf1d698','CDR_TYPE','AO','SUB_TIME','240815144127'). If I assign this list to a hash, Perl understands to treat the first element of each pair as the key of the hash and the second as the value.

    %hash = ('THREAD_ID','1bf1d698','CDR_TYPE','AO','SUB_TIME','240815144127')

    is the same as:

    $hash{'THREAD_ID'} = '1bf1d698';
    $hash{'CDR_TYPE'} = 'AO';
    $hash{'SUB_TIME'} = '240815144127';
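
    Putting the pieces together, here is a minimal runnable sketch of the whole idea. It assumes a shortened sample string with only three fields and an assumed definition of $z (a compiled pattern for "something that looks like a key, or the end of the line"); neither is copied from the original script.

```perl
use strict;
use warnings;

# Assumed, shortened sample data; the real record has more fields.
my $string = 'THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127';
my @keys   = qw(THREAD_ID CDR_TYPE SUB_TIME);

# Assumed definition: "something that looks like a key, or end of line".
my $z = qr/(?:[A-Z_]+:|$)/;

# For each key, emit the key followed by whatever the capture group matched.
my %hash = map { $_, ($string =~ m/$_:(.+?)\s*$z/) } @keys;

# Print the pairs in the original key order.
print "$_ => $hash{$_}\n" for @keys;
```

    Compiling $z with qr// keeps its alternation grouped when it is interpolated into the larger pattern, which is why the sketch uses it.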

    By the way, that ability to say "match something that looks like this" is really the power of regexes. Computers are fantastic at matching things, and most languages have a built-in ability to "find 'X' in 'WXYZ'". Regexes let you say "find the first lowercase vowel that comes after the third consonant unless preceded by an exclamation point unless also followed by a question mark."

    Dum Spiro Spero

      Thank you for your patience. I am close to understanding your code.

      I have one last question:

      $string =~ m/$_:(.+?)\s*$z/

      When I try to make sense of the code above, I don't understand what (.+?) is doing.

      Also, in $z you use the alternation metacharacter | followed by $. I am not sure what this means.

        You need to go thru a good regex tutorial. Look in the Tutorials section on this site.

        (.+?) The () says remember the part of the string that matches this. Matches what? The period is a wildcard, meaning match any letter, number, punctuation, whatever, EXCEPT a linefeed. The plus means match the previous thing 1 or more times. The question mark means don't be greedy: the matching should stop as soon as it can. This is important in this case because the . matches almost everything. .+ by itself would match an entire paragraph if you let it.

        So how does it know when to stop? The stuff following it tells it. \s means a whitespace character, like a space or tab. The asterisk means match the previous thing zero or more times, so \s* means zero or more whitespace characters. And $z, as we said before, matches something that looks like one of your keys.
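
        To see the difference the question mark makes, here is a small sketch comparing (.+?) with (.+). The sample string and the definition of $z are assumptions for illustration, not the original data.

```perl
use strict;
use warnings;

# Assumed sample string and key pattern, for illustration only.
my $string = 'THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127';
my $z      = qr/(?:[A-Z_]+:|$)/;

# Non-greedy: stops at the first place the rest of the pattern can match.
my ($lazy) = $string =~ m/THREAD_ID:(.+?)\s*$z/;

# Greedy: grabs as much as possible and only gives back what it must.
my ($greedy) = $string =~ m/THREAD_ID:(.+)\s*$z/;

print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```

        With this sample, the non-greedy version captures just 1bf1d698, while the greedy version runs all the way to the end of the string.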

        So, taken all together, what does this mean? Let's assume $_ = 'THREAD_ID'. m/THREAD_ID:(.+?)\s*$z/. Look thru the string until you find THREAD_ID:. Starting with the very next character, 1, match until you find zero or more spaces followed by something that looks like a key. After the 8, there is a space and CDR_TYPE:, which looks like a key (a combination of capital letters and underscores followed by a colon). So the match will stop right after the 8. The parentheses mean 1bf1d698 is returned.

        When I first wrote this, $z just matched a key, and this worked for everything except the last one in the list, because, being the last one, it isn't followed by something that looks like a key. So I added |$. $ matches at the end of the line (at the very end of the string, or just before a final linefeed). | means "or" between two alternatives. $z now means match something that looks like a key or the end of the line.
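
        A short sketch of that last point, using two assumed versions of the key pattern (the names $key_only and $key_or_end are made up for this example) and an assumed sample string in which SUB_TIME is the last field:

```perl
use strict;
use warnings;

# Assumed sample string; SUB_TIME is the last field here.
my $string = 'THREAD_ID:1bf1d698 CDR_TYPE:AO SUB_TIME:240815144127';

# Two assumed versions of the key pattern: without and with the "|$" branch.
my $key_only   = qr/[A-Z_]+:/;
my $key_or_end = qr/(?:[A-Z_]+:|$)/;

# Without "|$" the last field has no following key, so the match fails.
my @without = ($string =~ m/SUB_TIME:(.+?)\s*$key_only/);

# With "|$" the end of the line is accepted as a stopping point.
my @with = ($string =~ m/SUB_TIME:(.+?)\s*$key_or_end/);

print "without: ", scalar(@without), " capture(s)\n";
print "with:    @with\n";
```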

        Dum Spiro Spero