Here is a function, globtore (Glob to Regular Expression), that takes the "wildcard" syntax used by the 4DOS command shell and converts it to a Perl regular expression string.

Basically, * matches anything and ? matches one character. But there are a few special cases, and 4DOS has more, such as ranges and ORs. I figure for anything that's not really basic I'll use a RegExp anyway, so I don't want to make too fancy a globber. But the docs for 4DOS were fairly complete, so I used them as my design spec. The test data below includes the examples from the 4DOS help page.
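To make the basic rule concrete, here is a minimal sketch that handles only * and ? and quotes everything else. The name simple_glob_to_re is hypothetical and is not the globtore function from this post; it also ignores the 4DOS special cases (trailing ?, brackets, semicolons) that the real code deals with.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not globtore):
# translates * and ?, and quotes every other character literally.
sub simple_glob_to_re {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        if    ($ch eq '*') { $re .= '.*' }            # any run of characters
        elsif ($ch eq '?') { $re .= '.'  }            # exactly one character
        else               { $re .= quotemeta $ch }   # literal character
    }
    return qr/\A$re\z/i;    # /i is an assumption: DOS filenames ignore case
}

my $re = simple_glob_to_re('st*.d?t');
print "$_\n" for grep { $_ =~ $re } qw(STATE.DAT STEVEN.DOC ST.DAT);
```

Note that ? here always consumes one character, which is exactly the simplification the 4DOS rules do not make.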

The last_paren_match_ordinal function is another story.

Edit: chipmunk 2001-06-27

use strict;
use warnings;

sub last_paren_match_ordinal()
{
    my $n= $#-;
    return $n;
    # if the above stops working, comment out the return and let the following code
    # run.  It seems that @- is populated only up to the actual number of captures,
    # while @+ always contains the max number present in the pattern.  This is not
    # documented, and may be an artifact.
    while ($n) {
        # take $n initially as a maximum, and find the highest that's actually present.
        no strict 'refs';
        last if defined $$n;
        --$n;
    }
    return $n;
}

sub globtore ($)
# initial version, 27-June-2001 by JMD
{
    my $s= shift;
    my $f= sub {
        my $n= last_paren_match_ordinal;
        return "($n)";
    };
    my @f= ( undef,  # index 0 never used, uses 1..6
        sub { # brackets
            my $s= $+;
            return '.' if $s eq '[?]';  # special case meaning.
            return '(?=\.|;|\z)' if $s eq '[]' || $s eq '[!?]';  # also special: match "no character present".
            # shell uses !, change to ^.  Don't worry about anything else.
            # Could deal with original ^ in that spot.
            $s =~ s/^\[!/[^/;
            return $s;
        },
        sub { # ?'s before dot or end
            my $count= length $+;
            return '.?' if $count == 1;
            return ".{0,$count}";
        },
        sub { # ?'s normal case
            my $count= length $+;
            return '.' if $count == 1;
            return ".{$count}";
        },
        sub { # *'s
            return '.*'
        },
        sub { # ;
            return '$|^';
        },
        sub { # other funny chars that aren't otherwise matched.
            my $s= $+;
            return "\\$s";
        }
    );
    # smash out consecutive stars
    $s =~ s/\*+/*/g;
    # do the main replacements.  The comments inside the pattern say "group N"
    # rather than naming the capture variables, because variables are
    # interpolated even inside /x comments.
    $s =~ s/
        (\[.*?\]) |              # brackets, group 1
        (\?+(?=\.|;|\z)) |       # ?'s before dot or end, group 2
        (\?+) |                  # ?'s normally, group 3
        (\*+) |                  # *'s, group 4
        (;) |                    # ; separator, group 5
        ([\$\^\\\.\{\}\[\]])     # special otherwise unmatched chars, group 6
        /$f[last_paren_match_ordinal]->()  # see above functions for massive replacement block.
        /gex;
    # fix up begin/end marks
    $s= "^$s\$";
    $s =~ s/\Q^.*\E|\Q.*\E\$//;
    return $s;
}

while (<DATA>) {
    chomp;
    s/\s+#.*$//;  # allow comments in input data
    print "$_ ==> ";
    my $result= globtore ($_);
    print "$result\n";
}

__DATA__
LETTER?.DOC  # comments allowed in test data, separated by at least one space.
funny#file#name.txt  # the prev # are not comments because no space.
*.DO?
file{braces}.^^x
*.DO[?]
foobar.exe
xxy
*.exe
l?tter?.d??
*.*
st*.d*
*am*.txt
letter[0-9].doc
?[aeiouy]*.*
[a-dt-v]ip
letter[?].doc
letter[].doc
letter[!?].doc
test[!0-9].doc  # anything except digits.
??[abc]*[def]*.[pq]*
letter1;v2
letter1[;]v2
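The comment in last_paren_match_ordinal about @- and @+ can be checked with a small standalone sketch. The sub name which_branch and its three-way pattern are made up for the demonstration; only $#- and $#+ are the real Perl variables in question.

```perl
use strict;
use warnings;

# Hypothetical demo sub: matches a three-branch alternation and reports
# (last group that actually matched, number of groups in the pattern).
sub which_branch {
    my ($text) = @_;
    return unless $text =~ /(a)|(b)|(c)/;
    my $last_matched = $#-;   # index of the highest group that participated
    my $group_count  = $#+;   # number of capture groups in the pattern
    return ($last_matched, $group_count);
}

for my $text (qw(a b c)) {
    my ($last, $total) = which_branch($text);
    print "$text: group $last of $total\n";
}
```

For the input "b" this reports group 2 of 3, which is the behavior the dispatch table in globtore relies on to pick the right replacement sub.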

Replies are listed 'Best First'.
(tye)Re: Convert glob notation to regular expression
by tye (Sage) on Jun 28, 2001 at 11:03 UTC

    See also my File::KGlob from years ago which includes such functionality. (:

            - tye (but my friends call me "Tye")
My specifications
by John M. Dlugosz (Monsignor) on Jun 28, 2001 at 02:54 UTC
    From the JP Software documentation for 4NT 3.02, text by Hardin Brothers, Tom Rawson, and Rex Conn,
    Wildcards

    Wildcards let you specify a file or group of files by typing a partial filename. The appropriate directory is scanned to find all of the files that match the partial name you have specified.

    Wildcards are usually used to specify which files should be processed by a command. If you need to specify which files should not be processed see File Exclusion Ranges (for internal commands), or EXCEPT (for external commands).

    Most internal commands accept filenames with wildcards anywhere that a full filename can be used. There are two wildcard characters, the asterisk [*] and the question mark [?], plus a special method of specifying a range of permissible characters.

    An asterisk [*] in a filename means "any zero or more characters in this position." For example, this command will display a list of all files in the current directory:

        [c:\] dir *.*

    If you want to see all of the files with a .TXT extension, you could type this:

        [c:\] dir *.txt

    If you know that the file you are looking for has a base name that begins with ST and an extension that begins with .D, you can find it this way. Filenames such as STATE.DAT, STEVEN.DOC, and ST.D will all be displayed:

        [c:\] dir st*.d*

    With 4NT, you can also use the asterisk to match filenames with specific letters somewhere inside the name. The following example will display any file with a .TXT extension that has the letters AM together anywhere inside its base name. It will, for example, display AMPLE.TXT, STAMP.TXT, CLAM.TXT, and AM.TXT:

        [c:\] dir *am*.txt

    A question mark [?] matches any single filename character. You can put the question mark anywhere in a filename and use as many question marks as you need. The following example will display files with names like LETTER.DOC and LATTER.DAT, and LITTER.DU:

        [c:\] dir l?tter.d??

    The use of an asterisk wildcard before other characters, and of the character ranges discussed below, are enhancements to the standard wildcard syntax, and may not work properly with software other than 4DOS, 4OS2, 4NT, and Take Command.

    "Extra" question marks in your wildcard specification are ignored if the file name is shorter than the wildcard specification. For example, if you have files called LETTER.DOC, LETTER1.DOC, and LETTERA.DOC, this command will display all three names:

        [c:\] dir letter?.doc

    The file LETTER.DOC is included in the display because the "extra" question mark at the end of "LETTER?" is ignored when matching the shorter name LETTER.

    In some cases, the question mark wildcard may be too general. You can also specify what characters you want to accept (or exclude) in a particular position in the filename by using square brackets. Inside the brackets, you can put the individual acceptable characters or ranges of characters. For example, if you wanted to match LETTER0.DOC through LETTER9.DOC, you could use this command:

        [c:\] dir letter[0-9].doc

    You could find all files that have a vowel as the second letter in their name this way. This example also demonstrates how to mix the wildcard characters:

        [c:\] dir ?[aeiouy]*.*

    You can exclude a group of characters or a range of characters by using an exclamation mark [!] as the first character inside the brackets. This example displays all filenames that are at least 2 characters long except those which have a vowel as the second letter in their names:

        [c:\] dir ?[!aeiouy]*.*

    The next example, which selects files such as AIP, BIP, and TIP but not NIP, demonstrates how you can use multiple ranges inside the brackets. It will accept a file that begins with an A, B, C, D, T, U, or V:

        [c:\] dir [a-dt-v]ip

    You may use a question mark character inside the brackets, but its meaning is slightly different than a normal (unbracketed) question mark wildcard. A normal question mark wildcard matches any character, but will be ignored when matching a name shorter than the wildcard specification, as described above. A question mark inside brackets will match any character, but will not be discarded when matching shorter filenames. For example:

        [c:\] dir letter[?].doc

    will display LETTER1.DOC and LETTERA.DOC, but not LETTER.DOC.

    A pair of brackets with no characters between them [], or an exclamation point and question mark together [!?], will match only if there is no character in that position. For example,

        [c:\] dir letter[].doc

    will not display LETTER1.DOC or LETTERA.DOC, but will display LETTER.DOC. This is most useful for commands like

        [c:\] dir /I"[]" *.btm

    which will display a list of all .BTM files which don't have a description, because the empty brackets match only an empty description string (DIR /I selects files to display based on their descriptions).

    You can repeat any of the wildcard characters in any combination you desire within a single file name. For example, the following command lists all files which have an A, B, or C as the third character, followed by zero or more additional characters, followed by a D, E, or F, followed optionally by some additional characters, and with an extension beginning with P or Q. You probably won't need to do anything this complex, but we've included it to show you the flexibility of extended wildcards:

        [c:\] dir ??[abc]*[def]*.[pq]*

    You can also use the square bracket wildcard syntax to work around a conflict between long filenames containing semicolons [;], and the use of a semicolon to indicate an include list. For example, if you have a file named C:\DATA\LETTER1;V2 and you enter this command:

        [c:\] del \data\letter1;v2

    you will not get the results you expect. Instead of deleting the named file, 4NT will attempt to delete LETTER1 and then V2, because the semicolon indicates an include list. However if you use square brackets around the semicolon it will be interpreted as a filename character, and not as an include list separator. For example, this command would delete the file named above:

        [c:\] del \data\letter1[;]v2

    Extra caution should be taken using wildcards on long file names because operations using wildcards will be performed on both long and short filenames. See LFN File Searches for additional details.
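As a quick check of the three LETTER examples in the quoted text, here is a sketch with hand-written regexes. They are my assumed translations of the documented behavior, not captured output of globtore, and the names %regex_for and matching_files are made up for the illustration.

```perl
use strict;
use warnings;

# Hand-written regexes for three patterns from the quoted documentation.
# Assumed translations for illustration; /i reflects case-insensitive names.
my %regex_for = (
    'letter?.doc'   => qr/\Aletter.?\.doc\z/i,          # trailing ? may match nothing
    'letter[?].doc' => qr/\Aletter.\.doc\z/i,           # [?] must match one character
    'letter[].doc'  => qr/\Aletter(?=\.|\z)\.doc\z/i,   # [] means no character here
);

# Return the names from @names that match the regex registered for $glob.
sub matching_files {
    my ($glob, @names) = @_;
    my $re = $regex_for{$glob} or die "no regex for $glob\n";
    return grep { $_ =~ $re } @names;
}

my @names = qw(LETTER.DOC LETTER1.DOC LETTERA.DOC);
for my $glob ('letter?.doc', 'letter[?].doc', 'letter[].doc') {
    print "$glob ==> @{[ matching_files($glob, @names) ]}\n";
}
```

The first pattern lists all three files, the second drops LETTER.DOC, and the third lists only LETTER.DOC, which agrees with what the documentation says each command displays.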