in reply to Grabbing numbers from a URL

Usually in this situation, I anchor the regex to the end of the string and capture however many digits come before .htm or .html. Unless there is some specific reason to disallow 1-, 2-, or 3-digit numbers, I wouldn't code it that way. Go with something simple that works for any number of digits, from one to eight or more.
#!/usr/bin/perl
use strict;
use warnings;

foreach my $url ("abgc100.html", "xyz1000.htMl", "qwer10.html",
                 "abc123.htm", "qrz12345.htm", "something-12341234.html")
{
    my ($number) = $url =~ /(\d+)\.htm(l)?$/i;
    print "$number\n";
}
__END__
100
1000
10
123
12345
12341234
You have to decide about something-12341234.html. The code above captures 12341234, which is usually what is desired. Do you really want only the last five digits, 41234, in that situation? I suspect not.
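
To make that choice concrete, here is a minimal sketch that compares the two behaviours on the same URL. The sub names all_digits and last_five_digits are hypothetical, made up for this illustration, and the \d{1,5} cap is only one possible reading of "five digits".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: capture every digit directly before .htm/.html.
sub all_digits {
    my ($url) = @_;
    my ($n) = $url =~ /(\d+)\.html?$/i;
    return $n;    # undef when the URL does not match
}

# Hypothetical helper: capture at most the last five of those digits.
sub last_five_digits {
    my ($url) = @_;
    my ($n) = $url =~ /(\d{1,5})\.html?$/i;
    return $n;    # undef when the URL does not match
}

my $url = "something-12341234.html";
print all_digits($url), "\n";          # 12341234
print last_five_digits($url), "\n";    # 41234
```

The capped version silently throws away the leading 123, which is why I would only use it if the spec really says so.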

Re^2: Grabbing numbers from a URL
by BillKSmith (Monsignor) on Jul 09, 2017 at 13:20 UTC
    Your final comment brings up a bigger problem with the specification. If the current four-digit format allows more than four digits (only the last four of which are used), then without more information it is impossible to distinguish that case from the new five-digit numbers. All solutions so far assume that if there are at least five digits, all five are part of the number. I like your solution because it extends this assumption to any number of digits.
    Bill
      I was immediately thinking that this "number" is intended to be a unique ID number. That could be a wrong assumption, but in my experience, if there is some number like 12389799 and the last 5 digits are somehow "special", then the name should be "....123_89799.htm". My algorithm will work with that. If I have any influence over the file naming convention, I will put an "_" in there to separate the fields. In my opinion, a fixed 5-digit format such as 00123.html is a very bad idea. Software like this often grows, and at some point a 6th digit may be needed. Then what? In general, I like keeping the basic parsing as one step and, if needed, the validation of those fields as a separate step. If two-digit numbers aren't allowed, then test for that afterwards with something like if ($number < 100) {}. Both "....123_89799" and "....123_897" should be valid if the naming convention guy has his eye on the ball. These details DO matter.
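
      A rough sketch of what I mean by keeping parsing and validation apart. The sub names parse_number and is_valid_number, the sample file names, and the cutoff of 100 are all assumptions for illustration, not part of anybody's spec.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parsing (hypothetical helper): grab the digit run right before
# .htm/.html, whatever its length. An "_" naturally ends the field.
sub parse_number {
    my ($url) = @_;
    my ($number) = $url =~ /(\d+)\.html?$/i;
    return $number;    # undef when nothing matched
}

# Validation (hypothetical helper): a separate policy decision.
# The minimum of 100 is an assumed example rule.
sub is_valid_number {
    my ($number) = @_;
    return defined $number && $number >= 100;
}

foreach my $url ("abc123_89799.htm", "abc123_897.htm",
                 "abc12.html", "nodigits.html")
{
    my $number = parse_number($url);
    if (!defined $number) {
        print "$url: no number found\n";
        next;
    }
    my $verdict = is_valid_number($number) ? "ok" : "rejected";
    print "$url: $number ($verdict)\n";
}
```

      If the rule changes later (a 6th digit, a different minimum), only is_valid_number needs touching; the regex stays as it is.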
Re^2: Grabbing numbers from a URL
by htmanning (Friar) on Jul 09, 2017 at 06:43 UTC
    Very slick! Thank you. Some of this is above my head.