writch has asked for the wisdom of the Perl Monks concerning the following question:

I came across the 'hackerrank.com' website and thought it was a cool idea, so I began working through the practice examples in the REGEX section. I got stuck on one of them and wrote to support to ask what was going on: my code processed the data and returned the correct results locally, yet I got 'Wrong Answer' when I submitted it for review.

When they replied, they suggested I go to www.regexr.com to see what the 'error' was. Pasting my expression into that simulator showed an error which wasn't coming up in my code on my Ubuntu 18 machine running Perl 5.26. To fix it, I changed the expression from /([\w\.]+\@[\w]+\.[\w]+.*?)[^\w\.]/g to /([\w\.]+\@[\w]+\.[\w]+.*?)(?:$|[^\w\.])/g (explicitly looking for EOL as well).

Is this a version issue? Did some previous version of Perl not look at EOL as a non-word, and now it does?

And why did it match EOL on every line prior to EOD, but only Perl understood EOD to be a non-word? Why did I have to look for EOL so that I could match EOD?

Re: hackerrank.com question
by choroba (Cardinal) on Sep 27, 2018 at 13:34 UTC
    What string should we try the regex against? What assignment was it?

    Please give us more details. I tried running the regex against "z" or " " in 5.18.2 and blead, but the results are consistent across Perl versions.

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      The test was at https://www.hackerrank.com/challenges/detect-the-email-addresses/problem

      This might be in a state of flux. I am getting a 404 error trying to load the example docs I had from before. As I mentioned, I have an open ticket on this issue, as my opinion is that if they offer 'tests' then their tests should perform exactly like the languages they are testing.

      The code I wrote was such:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my (@res);
      while(<>){
          my @add = $_ =~ /([\w\.]+\@[\w]+\.[\w]+.*?)[^\w\.]/g;
          $_ =~ s/\.$// for @add;
          push @res, @add;
      }
      my %uniq;
      $uniq{$_}++ for @res;
      @res = sort keys %uniq;
      print join ";", @res;

      The input data was this:

      51 E-MAIL ADDRESSES OF GMs AND DRMs ON IR RLY GM E-Mail Division DRM E-Mail CR gm@cr.railnet.gov.in Mumbai drm@bb.railnet.gov.in Bhusawal drm@bsl.railnet.gov.in Pune drm@pa.railnet.gov.in Nagpur drm@ngp.railnet.gov.in Solapur drm@sur.railnet.gov.in ER gm@er.railnet.gov.in Howrah drmhwh@er.railnet.gov.in Sealdah drmsdah@er.railnet.gov.in Asansol drmasn@er.railnet.gov.in Malda drmmldt@er.railnet.gov.in ECR gm@ecr.railnet.gov.in Danapur drmdnr@ecr.railnet.gov.in Dhanbad drmdhn@ecr.railnet.gov.in Mughalsarai drmmgs@ecr.railnet.gov.in Samastipur drmspj@ecr.railnet.gov.in Sonpur drmsee@ecr.railnet.gov.in ECoR gm@eastcoastrailway.gov.in Khurda Road drmkur@eastcoastrailway.gov.in Sambalpur drmsbp@eastcoastrailway.gov.in Waltair drmwat@eastcoastrailway.gov.in NR gm@nr.railnet.gov.in Delhi drm@dli.railnet.gov.in Ambala drm@umb.railnet.gov.in Moradabad drm@mb.railnet.gov.in Lucknow drm@lko.railnet.gov.in Ferozepur drm@fzr.railnet.gov.in NCR gm@ncr.railnet.gov.in Allahabad drm@ald.railnet.gov.in Jhansi drm@jhs.railnet.gov.in Agra drm@agc.railnet.gov.in NER gm@ner.railnet.gov.in Izzatnagar drmizn@ner.railnet.gov.in Lucknow drmljn@ner.railnet.gov.in Varanasi drmbsb@ner.railnet.gov.in NFR gm@nfr.railnet.gov.in Katihar drmkir@nfr.railnet.gov.in Alipurduar drmapdj@nfr.railnet.gov.in Tinsukhia drmtsk@nfr.railnet.gov.in Lumding drmlmg@nfr.railnet.gov.in Rangia drmrny@nfr.railnet.gov.in NWR gm@nwr.railnet.gov.in Jaipur drmjp@nwr.railnet.gov.in Ajmer drmaii@nwr.railnet.gov.in Bikaner drmbkn@nwr.railnet.gov.in Jodhpur drmju@nwr.railnet.gov.in SR gm@sr.railnet.gov.in Chennai drmmas@sr.railnet.gov.in Madurai drmmdu@sr.railnet.gov.in Salem drmsa@sr.railnet.gov.in Palghat drmpgt@sr.railnet.gov.in Tiruchirapalli drmtpj@sr.railnet.gov.in Trivandrum drmtvc@sr.railnet.gov.in SCR gm@scr.railnet.gov.in Secundrabad drmsc@scr.railnet.gov.in Hyderabad drmshyb@scr.railnet.gov.in Guntkal drmgtl@scr.railnet.gov.in Guntur drmgnt@scr.railnet.gov.in Nanded drmned@scr.railnet.gov.in
Vijayawada drmbza@scr.railnet.gov.in

      The expected results were:

      drm@agc.railnet.gov.in;drm@ald.railnet.gov.in;drm@bb.railnet.gov.in;drm@bsl.railnet.gov.in;drm@dli.railnet.gov.in;drm@fzr.railnet.gov.in;drm@jhs.railnet.gov.in;drm@lko.railnet.gov.in;drm@mb.railnet.gov.in;drm@ngp.railnet.gov.in;drm@pa.railnet.gov.in;drm@sur.railnet.gov.in;drm@umb.railnet.gov.in;drmaii@nwr.railnet.gov.in;drmapdj@nfr.railnet.gov.in;drmasn@er.railnet.gov.in;drmbkn@nwr.railnet.gov.in;drmbsb@ner.railnet.gov.in;drmbza@scr.railnet.gov.in;drmdhn@ecr.railnet.gov.in;drmdnr@ecr.railnet.gov.in;drmgnt@scr.railnet.gov.in;drmgtl@scr.railnet.gov.in;drmhwh@er.railnet.gov.in;drmizn@ner.railnet.gov.in;drmjp@nwr.railnet.gov.in;drmju@nwr.railnet.gov.in;drmkir@nfr.railnet.gov.in;drmkur@eastcoastrailway.gov.in;drmljn@ner.railnet.gov.in;drmlmg@nfr.railnet.gov.in;drmmas@sr.railnet.gov.in;drmmdu@sr.railnet.gov.in;drmmgs@ecr.railnet.gov.in;drmmldt@er.railnet.gov.in;drmned@scr.railnet.gov.in;drmpgt@sr.railnet.gov.in;drmrny@nfr.railnet.gov.in;drmsa@sr.railnet.gov.in;drmsbp@eastcoastrailway.gov.in;drmsc@scr.railnet.gov.in;drmsdah@er.railnet.gov.in;drmsee@ecr.railnet.gov.in;drmshyb@scr.railnet.gov.in;drmspj@ecr.railnet.gov.in;drmtpj@sr.railnet.gov.in;drmtsk@nfr.railnet.gov.in;drmtvc@sr.railnet.gov.in;drmwat@eastcoastrailway.gov.in;gm@cr.railnet.gov.in;gm@eastcoastrailway.gov.in;gm@ecr.railnet.gov.in;gm@er.railnet.gov.in;gm@ncr.railnet.gov.in;gm@ner.railnet.gov.in;gm@nfr.railnet.gov.in;gm@nr.railnet.gov.in;gm@nwr.railnet.gov.in;gm@scr.railnet.gov.in;gm@sr.railnet.gov.in

      Run in a shell, my script produces exactly that output. According to their checker, though, my answer is missing 'drmbza@scr.railnet.gov.in', which is the last address in the input.

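        It seems the behaviour depends on whether the file ends in a newline, not on the Perl version. The final [^\w\.] has to consume one character. On every earlier line that character is the newline, but if the last line has no trailing newline there is nothing left after the address for it to match, so that address is skipped.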
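        To see the effect in isolation, here is a minimal sketch that runs both patterns from the question against the last input line, once with a trailing newline and once without. The helper name extract and the variable names are only illustrative; the two patterns are copied from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every address captured by the given pattern.
sub extract {
    my ($re, $text) = @_;
    return $text =~ /$re/g;
}

# The two patterns from the question.
my $original = qr/([\w\.]+\@[\w]+\.[\w]+.*?)[^\w\.]/;
my $fixed    = qr/([\w\.]+\@[\w]+\.[\w]+.*?)(?:$|[^\w\.])/;

# The last input line, with and without a trailing newline.
my $with_nl    = "Vijayawada drmbza\@scr.railnet.gov.in\n";
my $without_nl = "Vijayawada drmbza\@scr.railnet.gov.in";

my @a = extract($original, $with_nl);     # newline satisfies [^\w\.]
my @b = extract($original, $without_nl);  # nothing left to consume
my @c = extract($fixed,    $without_nl);  # $ matches at end of string

print "original, with newline:    ", scalar(@a), "\n";  # 1
print "original, without newline: ", scalar(@b), "\n";  # 0
print "fixed, without newline:    ", scalar(@c), "\n";  # 1
```

        If the test input on the site has no newline after the final line, that would explain why only the last address went missing there while a local file saved with a trailing newline gave the full list.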

        If you are curious, here's my solution from three years ago:

        ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,