rementis has asked for the wisdom of the Perl Monks concerning the following question:

Thanks in advance, oh wise Monks... OK, this is ugly, but I need to search it. Take this glob of text:

SAOHXXX20090209XXX020607XXXchazerpap07_PRD_00XXX1XXXN0013IT_QALS-LWEDTL000216V00 0820090207X000800000000X000800000000X000833333333X000820090207N0004JESTL000268V0 034 000X00340000000000000000000000000000000000X003 400000000000XXX00000000000000000000000X00342222222222222222222222222222222333X00 340000000000000000000000000000000000N0014<%_TABLE_MAKT>L0003204V0003???X0006???? ??X0006??????N0009IT_LIST[]L00018V0019Table IT_1042x76XXX0TH034\PROGRAM=ZKQMR_ INSP\DATA=IT_LIST[]TH100Table reference: 56 TH055TABH+ 0(20) = FFFFFFFCC883EE98 FFFFFFFCC895FBB800XXX000000TH055TABH+ 20(20) = 000000380000006800000002000002F8F FFFFFFFTH047TABH+ 40(16) = 0400000000004040001024C401800000TH033store = 0 xFFFFFFFCC883EE98TH033ext1 = 0xFFFFFFFCC895FBB8TH033sXXXhmId = 0 (0x00000000)TH033id = 56 (0x00000038)TH033label = 104 (0x00000068)TH033fill = 2 (0x00000002)TH033leng = 760 (0x0 00002F8)TH033loopXXX = -1 (0xFFFFFFFF)TH026xtyp = TYPE#000328TH033oc cu = 16 (0x00000010)TH039access = 1 (ItAccessStandard)TH034 idxKind = 0 (ItIndexNone)TH034uniKind = 2XXX(ItUniqueNon)TH030keyK ind = 1 (default)TH032cmpMode = 8 (cmpManyEq)TH016occu0 = 1TH016groupCntl = 0TH016rfc = 0TH016unShareable = 0TH017mightBeS hared = 0TH020sharXXXedWithShmTab = 0TH016isShmLockId = 0TH016gcKind = 0T H016isUsed = 1TH016isCtfyAble = 1TH039>>>>> Shareable Table Header Data <<<<<TH033tabi = 0xFFFFFFFCC881FDE8%M

OK, somewhere in the middle of that mess is the string "PROGRAM=". I'm matching it like this: $_ =~ /PROGRAM=(.*)\\/; to capture what comes after the = sign. This works most of the time, but for some of these blobs, like this one, I get "02" as the answer instead of "ZKQMR_ INSP", which is what I was hoping for. Where is the "02" coming from? It's driving me crazy!

Re: Very Strange Result from REGEX
by ikegami (Patriarch) on Feb 19, 2009 at 22:48 UTC

    This works most of the time, but for some of these blobs, like this one, I get "02" for an answer instead of "ZKQMR_ INSP"

    uh, no?

        $_ = do { local $/; <DATA> };
        s/\n//g;
        my ($prog) = $_ =~ /PROGRAM=(.*)\\/
            or die("No match\n");
        print("Program: $prog\n");
        __DATA__
        SAOHXXX20090209XXX020607XXXchazerpap07_PRD_00XXX1XXXN0013IT_ QALS-LWEDTL000216V000820090207X000800000000X000800000000X000 833333333X000820090207N0004JESTL000268V0034 000X00340000000000000000000000000000000000X003 400000000000XXX00000000000000000000000X003422222222222222222 22222222222222333X00340000000000000000000000000000000000N001 4<%_TABLE_MAKT>L0003204V0003???X0006??????X0006??????N0009IT _LIST[]L00018V0019Table IT_104[2x76XXX0]TH034\PROGRAM=ZKQMR_ INSP\DATA=IT_LIST[]TH100Table reference: 56 TH055TABH+ 0(20 ) = FFFFFFFCC883EE98FFFFFFFCC895FBB800XXX000000TH055TABH+ 20 (20) = 000000380000006800000002000002F8FFFFFFFFTH047TABH+ 40 (16) = 0400000000004040001024C401800000TH033store = 0 xFFFFFFFCC883EE98TH033ext1 = 0xFFFFFFFCC895FBB8TH033 sXXXhmId = 0 (0x00000000)TH033id = 56 (0x00000038)TH033label = 104 (0x00000068)TH033fil l = 2 (0x00000002)TH033leng = 760 (0x0 00002F8)TH033loopXXX = -1 (0xFFFFFFFF)TH026xtyp = TYPE#000328TH033occu = 16 (0x00000010)TH039acce ss = 1 (ItAccessStandard)TH034idxKind = 0 (ItIndexNone)TH034uniKind = 2XXX(ItUniqueNon)TH030keyK ind = 1 (default)TH032cmpMode = 8 (cmpMany Eq)TH016occu0 = 1TH016groupCntl = 0TH016rfc = 0TH016unShareable = 0TH017mightBeShared = 0TH020sharXX XedWithShmTab = 0TH016isShmLockId = 0TH016gcKind = 0T H016isUsed = 1TH016isCtfyAble = 1TH039>>>>> Shareabl e Table Header Data <<<<<TH033tabi = 0xFFFFFFFCC881F DE8%M

    Output:

        Program: ZKQMR_INSP

    Possible problem elsewhere: You're using $1 without checking whether the match succeeded.

        $_ = "xxxxx FOO=02 xxxxx PROGRAM=ZKQMR_INSP xxxxx";
        /FOO=(\d+)/;
        /PROGRAM=(.*)\\/;
        my $program = $1;
        print("$program\n");  # 02
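    A minimal sketch of the fix, reusing the made-up string from the snippet above: capture in list context and test the result, so a failed match leaves the variable undefined instead of exposing the "02" that the earlier /FOO=(\d+)/ match left in $1.

```perl
use strict;
use warnings;

$_ = "xxxxx FOO=02 xxxxx PROGRAM=ZKQMR_INSP xxxxx";
/FOO=(\d+)/;    # succeeds and sets $1 to "02"

# List-context capture: on failure the match returns an empty list,
# so $program stays undef rather than inheriting the stale $1.
my ($program) = /PROGRAM=(.*)\\/;

if (defined $program) {
    print "Program: $program\n";
}
else {
    # This branch runs here: the string contains no backslash.
    print "No PROGRAM=...\\ found\n";
}
```

    The same idea works as a boolean test, e.g. if ( my ($p) = /PROGRAM=(.*)\\/ ) { ... }, since a list assignment in scalar context yields the number of values matched.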

    Possible unrelated problem: Do you realize /(.*)\\/ will match up to the *last* backslash in $_? I think you want /([^\\]*)\\/.
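    To see the difference, here is a small sketch on a made-up string that has a second backslash after the DATA= field (whether the real blobs contain one is an assumption):

```perl
use strict;
use warnings;

# Hypothetical input: two backslash-delimited fields follow PROGRAM=.
my $text = 'TH034\PROGRAM=ZKQMR_INSP\DATA=IT_LIST[]\TH100';

# Greedy .* backtracks only as far as the LAST backslash,
# so the DATA= field ends up inside the capture.
my ($greedy) = $text =~ /PROGRAM=(.*)\\/;

# A negated character class cannot cross a backslash,
# so the capture stops at the FIRST one after PROGRAM=.
my ($first) = $text =~ /PROGRAM=([^\\]*)\\/;

print "greedy: $greedy\n";   # greedy: ZKQMR_INSP\DATA=IT_LIST[]
print "first:  $first\n";    # first:  ZKQMR_INSP
```

    A non-greedy /PROGRAM=(.*?)\\/ would also stop at the first backslash; the negated class just states the intent more directly.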

    Update: Added second code snippet.

Re: Very Strange Result from REGEX
by dwm042 (Priest) on Feb 19, 2009 at 22:55 UTC
    Without seeing the specifics of your code, it's going to be hard to tell.

    When I try to duplicate your results, I get the "expected" result, not a "02".

        #!/usr/bin/perl
        use warnings;
        use strict;

        my $blob = <DATA>;
        if ( $blob =~ /PROGRAM=(.*)\\/ ) {
            print "Dollar One = $1\n";
        }

        __DATA__
        SAOHXXX20090209XXX020607XXXchazerpap07_PRD_00XXX1XXXN0013IT_QALS-LWEDTL000216V00 0820090207X000800000000X000800000000X000833333333X000820090207N0004JESTL000268V0 034 000X00340000000000000000000000000000000000X003 400000000000XXX00000000000000000000000X00342222222222222222222222222222222333X00 340000000000000000000000000000000000N0014<%_TABLE_MAKT>L0003204V0003???X0006???? ??X0006??????N0009IT_LIST[]L00018V0019Table IT_1042x76XXX0TH034\PROGRAM=ZKQMR_ INSP\DATA=IT_LIST[]TH100Table reference: 56 TH055TABH+ 0(20) = FFFFFFFCC883EE98 FFFFFFFCC895FBB800XXX000000TH055TABH+ 20(20) = 000000380000006800000002000002F8F FFFFFFFTH047TABH+ 40(16) = 0400000000004040001024C401800000TH033store = 0 xFFFFFFFCC883EE98TH033ext1 = 0xFFFFFFFCC895FBB8TH033sXXXhmId = 0 (0x00000000)TH033id = 56 (0x00000038)TH033label = 104 (0x00000068)TH033fill = 2 (0x00000002)TH033leng = 760 (0x0 00002F8)TH033loopXXX = -1 (0xFFFFFFFF)TH026xtyp = TYPE#000328TH033oc cu = 16 (0x00000010)TH039access = 1 (ItAccessStandard)TH034 idxKind = 0 (ItIndexNone)TH034uniKind = 2XXX(ItUniqueNon)TH030keyK ind = 1 (default)TH032cmpMode = 8 (cmpManyEq)TH016occu0 = 1TH016groupCntl = 0TH016rfc = 0TH016unShareable = 0TH017mightBeS hared = 0TH020sharXXXedWithShmTab = 0TH016isShmLockId = 0TH016gcKind = 0T H016isUsed = 1TH016isCtfyAble = 1TH039>>>>> Shareable Table Header Data <<<<<TH033tabi = 0xFFFFFFFCC881FDE8%M
    And the results are:

        C:\Code>perl parse_blob.pl
        Dollar One = ZKQMR_ INSP
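    Since the odd result only shows up for some blobs, the surrounding loop may matter more than the regex itself. A hedged sketch, assuming the real script walks through many blobs one at a time (the records below are made up), that handles each blob on its own and reports the ones without a PROGRAM= field instead of reusing an earlier capture:

```perl
use strict;
use warnings;

# Hypothetical records: the middle one has no PROGRAM= field at all.
my @blobs = (
    'xxTH034\PROGRAM=ZKQMR_INSP\DATA=IT_LIST[]xx',
    'xxFOO=02 no program field herexx',
    'xxTH034\PROGRAM=ZOTHER\DATA=X[]xx',
);

my @programs;
for my $i (0 .. $#blobs) {
    # The list assignment is true only if the match succeeded;
    # $prog is scoped to this if/else, so nothing leaks between blobs.
    if (my ($prog) = $blobs[$i] =~ /PROGRAM=([^\\]*)\\/) {
        push @programs, $prog;
        print "blob $i: $prog\n";
    }
    else {
        warn "blob $i: no PROGRAM= field\n";
    }
}
```

    With these inputs, blobs 0 and 2 print their program names and blob 1 produces a warning on STDERR.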
Re: Very Strange Result from REGEX
by CountZero (Bishop) on Feb 19, 2009 at 22:43 UTC
    Please put your glob of text in <code> ... </code> tags so we can download it and try it.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James