Best approach to creating a regex from a filehandle

fieroboom has asked for the wisdom of the Perl Monks concerning the following question:

Hello wonderful monks, I've gleaned LOADS of great information here on my journey of Perl, but I have a question of my own now... I have a script that (among many other things) reads in a file with a list of words, then creates a regex string from those words. Here is the code I have now, which is working fine, but I wonder if there's a more direct way to do this, rather than filehandle -> array -> join & map array to scalar...

my $blacklist_file = 'BlacklistWords.txt';
open(BLIST, "<$blacklist_file") or die("Can't open $blacklist_file for reading!!\n\n");
my @blacklist_words = <BLIST>;
close(BLIST);
chomp(@blacklist_words);
my $blacklist_regex = join"|" => map {"(?:$_)"} @blacklist_words; # Create a regex from blacklisted words
print "blacklist regex:\n$blacklist_regex\n\n"; exit;

Here is an example of the regex string I'm after:

blacklist regex:
(?:LOL)|(?:XviD-RUBY)|(?:WEB-DL)|(?:H264)|(?:BluRay)|(?:x264)|(?:YIFY)|(?:DVDRip)|(?:MP3)|(?:ENG)|(?:DvDripaXXo)|(?:BRRiP)|(?:XviD)|(?:AbSurdiTy)|(?:WEBRip)|(?:XviDETRG)|(?:XviD-ILLUMINATI)|(?:XviDExtraTorrentRG)|(?:AC3-3LT0N)|(?:XViD-PLAYNOW)|(?:XVIDSSB)|(?:XViD-SSB)|(?:BDRip)|(?:XviD-3LT0N)|(?:KillerRG)|(?:XviD-AMIABLE)|(?:x264-AVS720)|(?:XviD-NEUTRINO)|(?:3Li)|(?:DTS)|(?:x2643Li)|(?:GAZ)|(?:XviD-AWESOMENESS)|(?:XviDSCREAM)|(?:UnKnOwN)|(?:DVDRip_XviD)|(?:AZnTX)|(?:HDTV)|(?:x264LOL)|(?:ettv)|(?:R5)|(?:x264-LOL)|(?:PROPER)|(?:x264-2HD)|(?:XviD-AFG)|(?:x264-mSD)|(?:P2PDL)|(?:x264-DHD)|(?:PublicHD)|(?:x264-MiNDTHEGAP)|(?:hdtv-lol)|(?:xvid-xor)|(?:psychodrama)|(?:hdtv_xvid-fov)|(?:repack-lol)|(?:rerip)|(?:xvid-ctu)|(?:Lo-Fi)|(?:X264-DIMENSION)|(?:_evid)|(?:TorrentDay)|(?:XviD-MOMENTUM)

Basically just a list of non-capturing groups. Of course, I suppose I could make it a single non-capturing group for a little more efficiency, but that's another subject... Anyway, the question is, am I doing this the most PERLitically correct way, or is there a better way to go from <BLIST> to $blacklist_regex? Thanks so much!


Replies are listed 'Best First'.
Re: Best approach to creating a regex from a filehandle by choroba (Cardinal) on May 18, 2014 at 19:15 UTC
You can probably avoid map by using `'(?:' . join(')|(?:', @blacklist_words) . ')'`. But if the "words" can contain non-alphabetical characters with special meaning in regexes, you might need to map quotemeta over each word. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
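To illustrate the quotemeta point, here is a minimal sketch. The word list is made up for the example (it is not from the original BlacklistWords.txt), and `quotemeta` is applied with a `map` inside the `join`, which is only one way of doing it.

```perl
use strict;
use warnings;

# Hypothetical sample words; 'C++' and 'x.264' contain regex metacharacters.
my @blacklist_words = ('WEB-DL', 'C++', 'x.264');

# quotemeta backslash-escapes every non-word character,
# so each entry is matched literally instead of as a pattern.
my $blacklist_regex =
    '(?:' . join(')|(?:', map { quotemeta($_) } @blacklist_words) . ')';

print "blacklist regex:\n$blacklist_regex\n\n";

# Try the regex against two made-up file names.
for my $name ('Movie.x.264.mkv', 'Movie.x7264.mkv') {
    my $verdict = $name =~ /$blacklist_regex/ ? 'blacklisted' : 'ok';
    print "$name: $verdict\n";
}
```

Without the escaping, the `.` in `x.264` would match any character (so `x7264` would be caught too), and `C++` would not even compile as a pattern because of the nested quantifier.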
Re: Best approach to creating a regex from a filehandle by NetWallah (Canon) on May 18, 2014 at 21:06 UTC
Reinterpreting "is there a better way ...": if all you are checking for is the absence of a word in a blacklist, I'd suggest putting the blacklisted words into a hash and simply checking: `if ( exists $Black_List{$candidate_word} ){ # complain, bail, or whatever ... }` You could upper- or lower-case the candidate word to keep the keys in a canonical form. What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against? -Larry Wall, 1992
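A rough sketch of the hash approach, assuming the words are already in an array. The sample words and the helper name `is_blacklisted` are made up for the example; `%Black_List` is the name used above.

```perl
use strict;
use warnings;

# Hypothetical sample words standing in for the contents of the blacklist file.
my @blacklist_words = qw(XviD HDTV x264-LOL);

# Lower-case the keys once so that lookups are case-insensitive.
my %Black_List = map { lc($_) => 1 } @blacklist_words;

# Hypothetical helper: true if the whole word is on the blacklist.
sub is_blacklisted {
    my ($candidate_word) = @_;
    return exists $Black_List{ lc $candidate_word };
}

for my $candidate_word (qw(hdtv PROPER X264-lol)) {
    my $verdict = is_blacklisted($candidate_word) ? 'blacklisted' : 'ok';
    print "$candidate_word: $verdict\n";
}
```

Note that this only answers "is this exact word on the list?". Unlike the regex, it will not find a blacklisted word inside a longer string, so the input would have to be split into words first.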
Re: Best approach to creating a regex from a filehandle by toolic (Bishop) on May 18, 2014 at 19:15 UTC
There is no need for the array: `my $blacklist_regex = join "|" => map {"(?:$_)"} map { chomp; $_ } <BLIST>;`
Re^2: Best approach to creating a regex from a filehandle by smls (Friar) on May 18, 2014 at 20:26 UTC
Unless you're sure that the input file will only contain safe characters, you should call quotemeta on `$_` inside the `map`. Also, what's the benefit of chaining two `map`s like that, instead of combining them into one? Edit: Oops, I only just now noticed that choroba already mentioned quotemeta in his answer. Sorry for the redundancy.
Re^3: Best approach to creating a regex from a filehandle by toolic (Bishop) on May 18, 2014 at 22:15 UTC
You're right... it can be simplified using a single map: `my $blacklist_regex = join "|" => map { chomp; "(?:$_)"} <BLIST>;`
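Putting the suggestions from this thread together, one possible sketch: a single `map` that chomps and applies quotemeta, wrapped in the single non-capturing group mentioned in the question. The lexical filehandle, the in-memory sample data, the blank-line skipping and the use of `qr//` are assumptions added for the example, not something the posts above prescribe.

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for BlacklistWords.txt so the sketch
# runs on its own; the sample words are made up.
my $sample = "XviD\nx264-LOL\n\nWEB-DL\n";
open(my $blist, '<', \$sample) or die "Can't open in-memory blacklist: $!\n";

# One pass over the lines: strip the newline, drop blank lines
# (an assumption about the file format), escape regex metacharacters.
my $alternatives = join '|',
    map { chomp; length($_) ? quotemeta($_) : () } <$blist>;
close($blist);

# One non-capturing group around all alternatives, compiled once.
my $blacklist_regex = qr/(?:$alternatives)/;

print "blacklist regex:\n$blacklist_regex\n\n";

for my $name ('Show.S01E01.x264-LOL.mkv', 'Show.S01E01.mkv') {
    my $verdict = $name =~ $blacklist_regex ? 'blacklisted' : 'ok';
    print "$name: $verdict\n";
}
```

One thing to watch: if the file is empty, `$alternatives` is an empty string and the resulting pattern matches everything, so a real script would probably want to check for that before using the regex.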
Re: Best approach to creating a regex from a filehandle by fieroboom (Novice) on May 21, 2014 at 12:12 UTC
Perfect, I knew there was a simpler way! Toolic, I really like your second example; elegant, but still readable (at least in my mind, anyway). Thanks so much, guys! EDIT: By the way, this is my first post here; if it's necessary for me to somehow mark this as "solved", I'll be happy to do so.