
Hello wonderful monks, I've gleaned LOADS of great information here on my journey of Perl, but now I have a question of my own. I have a script that (among many other things) reads in a file containing a list of words, then creates a regex string from those words. Here is the code I have now, which works fine, but I wonder if there's a more direct way to do this, rather than filehandle -> array -> join & map array to scalar...

my $blacklist_file = 'BlacklistWords.txt';
open(BLIST, "<$blacklist_file") or die("Can't open $blacklist_file for reading!!\n\n");
my @blacklist_words = <BLIST>;
close(BLIST);
chomp(@blacklist_words);
my $blacklist_regex = join"|" => map {"(?:$_)"} @blacklist_words; # Create a regex from blacklisted words
print "blacklist regex:\n$blacklist_regex\n\n"; exit;

Here is an example of the regex string I'm after:

blacklist regex:
(?:LOL)|(?:XviD-RUBY)|(?:WEB-DL)|(?:H264)|(?:BluRay)|(?:x264)|(?:YIFY)|(?:DVDRip)|(?:MP3)|(?:ENG)|(?:DvDripaXXo)|(?:BRRiP)|(?:XviD)|(?:AbSurdiTy)|(?:WEBRip)|(?:XviDETRG)|(?:XviD-ILLUMINATI)|(?:XviDExtraTorrentRG)|(?:AC3-3LT0N)|(?:XViD-PLAYNOW)|(?:XVIDSSB)|(?:XViD-SSB)|(?:BDRip)|(?:XviD-3LT0N)|(?:KillerRG)|(?:XviD-AMIABLE)|(?:x264-AVS720)|(?:XviD-NEUTRINO)|(?:3Li)|(?:DTS)|(?:x2643Li)|(?:GAZ)|(?:XviD-AWESOMENESS)|(?:XviDSCREAM)|(?:UnKnOwN)|(?:DVDRip_XviD)|(?:AZnTX)|(?:HDTV)|(?:x264LOL)|(?:ettv)|(?:R5)|(?:x264-LOL)|(?:PROPER)|(?:x264-2HD)|(?:XviD-AFG)|(?:x264-mSD)|(?:P2PDL)|(?:x264-DHD)|(?:PublicHD)|(?:x264-MiNDTHEGAP)|(?:hdtv-lol)|(?:xvid-xor)|(?:psychodrama)|(?:hdtv_xvid-fov)|(?:repack-lol)|(?:rerip)|(?:xvid-ctu)|(?:Lo-Fi)|(?:X264-DIMENSION)|(?:_evid)|(?:TorrentDay)|(?:XviD-MOMENTUM)

Basically just a list of non-capturing groups. Of course, I suppose I could make it a single non-capturing group for a little more efficiency, but that's another subject... Anyway, the question is, am I doing this the most PERLitically correct way, or is there a better way to go from <BLIST> to $blacklist_regex? Thanks so much!
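For illustration, here is one possible shorter route from filehandle to regex. It is only a sketch under a few assumptions that are not in my code above: the helper name regex_from_fh is made up, every word is escaped with quotemeta so it matches literally (which changes behaviour if any line in the file is meant to be a regex), blank lines are skipped, and all words go into a single non-capturing group compiled with qr//. An in-memory filehandle stands in for BlacklistWords.txt so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (name and signature are illustrative only):
# read one word per line from a filehandle and return a compiled regex.
sub regex_from_fh {
    my ($fh) = @_;

    # chomp each line, drop empty lines, escape regex metacharacters
    my @words = map  { quotemeta($_) }
                grep { length }
                map  { chomp; $_ } <$fh>;
    die "No words found\n" unless @words;

    # one non-capturing group around the whole alternation
    my $alternation = join '|', @words;
    return qr/(?:$alternation)/;
}

# Demo: an in-memory filehandle stands in for BlacklistWords.txt
my $sample = "LOL\nXviD-RUBY\nWEB-DL\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!\n";
my $blacklist_regex = regex_from_fh($fh);
close($fh);

print "blacklisted\n" if 'Some.Show.WEB-DL.mkv' =~ $blacklist_regex;
```

To use it with the real file, open BlacklistWords.txt with the same three-argument open and pass that handle in. Whether quotemeta is wanted depends on whether the word list should be treated as literal strings or as regex fragments.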

In reply to Best approach to creating a regex from a filehandle by fieroboom
