jitender has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I need to match a value from a YAML file against a column value from an Excel sheet. The Excel column values contain special characters such as '-', '*' and spaces, for example: ABC8200 3*AB25 Products. In the YAML file the value is ABC8200 3*AB25 Products.

example :

YAML value : Excel column value

NABCv-ABC : NABCv-ABC (no spaces in either)

BC8200 3*AB25 Products : BC8200 3*AB25 Products (spaces in both)

If I use $value =~ s/ .*//; the Excel column values without spaces match, but the values containing spaces give an error.

Regards! Jitender B
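For illustration, the substitution s/ .*// used in the question deletes the first space and everything after it. A value with no spaces passes through unchanged, while a value with spaces is cut down to its first word. If the substitution is applied to only one side of the comparison (an assumption, since the comparison code wasn't posted), values containing spaces can never match. The helper name cut_at_first_space is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the question to a copy.
sub cut_at_first_space {
    my ($value) = @_;
    $value =~ s/ .*//;    # deletes the first space and everything after it
    return $value;
}

for my $value ('NABCv-ABC', 'BC8200 3*AB25 Products') {
    printf "'%s' -> '%s'\n", $value, cut_at_first_space($value);
}
# 'NABCv-ABC' -> 'NABCv-ABC'
# 'BC8200 3*AB25 Products' -> 'BC8200'
```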

Re: regex format issue
by Corion (Patriarch) on Aug 27, 2018 at 10:27 UTC

    Can you show us the values that are equal but your code shows as different?

    Please also show us the relevant part of the code you've written and tell us the exact error message/output you get.

    As it is, it is difficult for me to understand where exactly you are having problems. You seem to have code that detects identity when there are no spaces in the product names, but it seems to fail when there is whitespace in the names.

    A potential cause for this might be trailing whitespace at the end of the values, in either the YAML or the Excel data.

    You can remove whitespace at the end of your values by using:

    $value =~ s/\s+$//;
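A minimal sketch of that idea, assuming the mismatch comes from stray whitespace around the values: trim both ends of both values before comparing them with eq. The helper trim and the sample strings are hypothetical, not taken from the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: strips leading and trailing whitespace from a copy.
sub trim {
    my ($value) = @_;
    $value =~ s/^\s+//;    # leading whitespace
    $value =~ s/\s+$//;    # trailing whitespace (the substitution above)
    return $value;
}

# Hypothetical inputs: the Excel cell carries a stray trailing space.
my $yaml_value  = 'BC8200 3*AB25 Products';
my $excel_value = 'BC8200 3*AB25 Products ';

my $raw     = $yaml_value eq $excel_value             ? 'equal' : 'NOT equal';
my $trimmed = trim($yaml_value) eq trim($excel_value) ? 'equal' : 'NOT equal';
print "raw: $raw\ntrimmed: $trimmed\n";
# raw: NOT equal
# trimmed: equal
```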
Re: regex format issue (updated x 2)
by AnomalousMonk (Archbishop) on Aug 27, 2018 at 17:10 UTC

    I agree with Corion and haukex that your problem is very vaguely stated. However, I never let my ignorance keep me from offering advice. Based on several WAGs (Wild-Ass Guesses) about your actual data and your actual problem, here's a possible (?) approach to developing a framework for creating a solution:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my @data = (
      'NABCv-ABC : NABCv-ABC',
      'BC8200 3*AB25 Products : BC8200 3*AB25 Products',
      ' BC8200 3*AB25 Products : BC8200 3*AB25 Products ',
      ' Q : Q ',
      'something : else',
      'U:U',
      ' : V',
      'W : ',
      'X',
      '',
      );
    ;;
    DATUM:
    for my $datum (@data, @ARGV) {
      my $parsed = my ($ya, $ex) = $datum =~ m{ \A \s* (\S .*?) \s+ : \s+ (\S .*?) \s* \z }xms;
      ;;
      if (not $parsed) {
        print qq{nothing parsed from '$datum'};
        next DATUM;
        }
      s{ \A \s+ | \s+ \z }{}xmsg for $ya, $ex;
      print qq{'$ya' and '$ex' are }, $ya eq $ex ? '' : 'NOT ', 'equal';
      }
    " "what : ever"
    'NABCv-ABC' and 'NABCv-ABC' are equal
    'BC8200 3*AB25 Products' and 'BC8200 3*AB25 Products' are equal
    'BC8200 3*AB25 Products' and 'BC8200 3*AB25 Products' are equal
    'Q' and 'Q' are equal
    'something' and 'else' are NOT equal
    nothing parsed from 'U:U'
    nothing parsed from ' : V'
    nothing parsed from 'W : '
    nothing parsed from 'X'
    nothing parsed from ''
    'what' and 'ever' are NOT equal
    Note that you should really be using some kind of Test::More development/testing framework as suggested by haukex here.

    Update 1: Any need to strip leading/trailing whitespace can be eliminated by proper design of the field components of the field extraction regex (tested):

    my $rx_ya  = qr{ \S (?: \s* \S+)* }xms;
    my $rx_ex  = $rx_ya;
    my $rx_sep = qr{ \s+ : \s+ }xms;
    ...
    $datum =~ m{ \A \s* ($rx_ya) $rx_sep ($rx_ex) \s* \z }xms;
    Incidentally, consider the records "foo:bar : foo:bar" and "foo : bar : foo : bar". Both are parsed by the code above, but one produces equal fields and the other does not. These are corner cases you need to pay attention to during development, and they're more reasons to use a Test::More-like development framework.
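As a sketch of what such a Test::More harness might look like, the hypothetical parse_record() below wraps the extraction regex from the one-liner and pins down the corner cases just mentioned, so any later change to the regex that alters them shows up as a failing test:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical wrapper around the one-liner's extraction regex: returns the
# two whitespace-trimmed fields, or an empty list if the record doesn't parse.
sub parse_record {
    my ($datum) = @_;
    my ($ya, $ex) = $datum =~ m{ \A \s* (\S .*?) \s+ : \s+ (\S .*?) \s* \z }xms
        or return;
    s{ \A \s+ | \s+ \z }{}xmsg for $ya, $ex;
    return ($ya, $ex);
}

is_deeply([ parse_record('NABCv-ABC : NABCv-ABC') ],
          [ 'NABCv-ABC', 'NABCv-ABC' ], 'no spaces inside fields');
is_deeply([ parse_record(' BC8200 3*AB25 Products : BC8200 3*AB25 Products ') ],
          [ 'BC8200 3*AB25 Products', 'BC8200 3*AB25 Products' ],
          'spaces inside and around fields');
is_deeply([ parse_record('U:U') ], [], 'no spaces around the colon: not parsed');

# The two corner cases: one gives equal fields, the other does not.
is_deeply([ parse_record('foo:bar : foo:bar') ],
          [ 'foo:bar', 'foo:bar' ], 'colons inside fields');
is_deeply([ parse_record('foo : bar : foo : bar') ],
          [ 'foo', 'bar : foo : bar' ], 'separator repeated inside the record');

done_testing();
```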

    Update 2: Also note that it's easy to write the extraction regex so that a single, whitespace-trimmed field is extracted only if both fields are equal.
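One way to do that (a sketch, not necessarily the regex AnomalousMonk had in mind) is a backreference: \g1 forces the second field to repeat the first, and the capture group is built to start and end on a non-space character, so its content is already trimmed. The helper name common_value is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whitespace-trimmed value if both sides
# of "value : value" are identical, otherwise undef.
sub common_value {
    my ($datum) = @_;
    # \g1 (a backreference) requires the second field to repeat the first.
    return $datum =~ m{ \A \s* (\S (?: .*? \S)?) \s+ : \s+ \g1 \s* \z }xms
        ? $1
        : undef;
}

for my $datum ('NABCv-ABC : NABCv-ABC',
               ' BC8200 3*AB25 Products : BC8200 3*AB25 Products ',
               'something : else') {
    my $value = common_value($datum);
    print defined $value ? "equal: '$value'\n" : "NOT equal: '$datum'\n";
}
# equal: 'NABCv-ABC'
# equal: 'BC8200 3*AB25 Products'
# NOT equal: 'something : else'
```

With this pattern the record "foo : bar : foo : bar" is accepted as the single value 'foo : bar', another corner case worth a test.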


    Give a man a fish:  <%-{-{-{-<

Re: regex format issue
by haukex (Archbishop) on Aug 27, 2018 at 10:35 UTC
Re: regex format issue
by talexb (Chancellor) on Aug 27, 2018 at 18:22 UTC

    I'm just wondering why you can't just use YAML to solve this problem.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
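A sketch of talexb's suggestion, under several assumptions: the YAML layout below is invented (the real file wasn't shown), the Excel column is assumed to have been read into a list of strings already, and the core module CPAN::Meta::YAML stands in for a YAML parser (YAML::Tiny from CPAN provides the same Load function). Parsing the YAML with a module and looking values up in a hash avoids hand-written regexes on the YAML side:

```perl
use strict;
use warnings;
use CPAN::Meta::YAML;    # ships with Perl; YAML::Tiny has the same Load()

# Hypothetical YAML content; the real file's layout was not posted.
my $yaml_text = <<'END_YAML';
---
products:
  - NABCv-ABC
  - BC8200 3*AB25 Products
END_YAML

my $doc = CPAN::Meta::YAML::Load($yaml_text);

# Lookup table of the values found in the YAML.
my %in_yaml = map { $_ => 1 } @{ $doc->{products} };

# Hypothetical Excel column values, assumed to be read already (for example
# with a CPAN spreadsheet module) and to carry stray whitespace.
my @excel_column = ('NABCv-ABC', ' BC8200 3*AB25 Products ', 'XYZ 1');

for my $cell (@excel_column) {
    (my $value = $cell) =~ s/^\s+|\s+$//g;    # trim both ends of a copy
    print "'$value': ", ($in_yaml{$value} ? 'found' : 'NOT found'), "\n";
}
```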