Hello wonderful monks, I've gleaned LOADS of great information here on my Perl journey, but now I have a question of my own. I have a script that (among many other things) reads in a file containing a list of words, then creates a regex string from those words. Here is the code I have now. It works fine, but I wonder if there's a more direct way to do this than filehandle -> array -> join & map array to scalar:

    my $blacklist_file = 'BlacklistWords.txt';
    open(BLIST, "<$blacklist_file") or die("Can't open $blacklist_file for reading!!\n\n");
    my @blacklist_words = <BLIST>;
    close(BLIST);
    chomp(@blacklist_words);
    my $blacklist_regex = join"|" => map {"(?:$_)"} @blacklist_words; # Create a regex from blacklisted words
    print "blacklist regex:\n$blacklist_regex\n\n";
    exit;

Here is an example of the regex string I'm after:

    blacklist regex:
    (?:LOL)|(?:XviD-RUBY)|(?:WEB-DL)|(?:H264)|(?:BluRay)|(?:x264)|(?:YIFY)|(?:DVDRip)|(?:MP3)|(?:ENG)|(?:DvDripaXXo)|(?:BRRiP)|(?:XviD)|(?:AbSurdiTy)|(?:WEBRip)|(?:XviDETRG)|(?:XviD-ILLUMINATI)|(?:XviDExtraTorrentRG)|(?:AC3-3LT0N)|(?:XViD-PLAYNOW)|(?:XVIDSSB)|(?:XViD-SSB)|(?:BDRip)|(?:XviD-3LT0N)|(?:KillerRG)|(?:XviD-AMIABLE)|(?:x264-AVS720)|(?:XviD-NEUTRINO)|(?:3Li)|(?:DTS)|(?:x2643Li)|(?:GAZ)|(?:XviD-AWESOMENESS)|(?:XviDSCREAM)|(?:UnKnOwN)|(?:DVDRip_XviD)|(?:AZnTX)|(?:HDTV)|(?:x264LOL)|(?:ettv)|(?:R5)|(?:x264-LOL)|(?:PROPER)|(?:x264-2HD)|(?:XviD-AFG)|(?:x264-mSD)|(?:P2PDL)|(?:x264-DHD)|(?:PublicHD)|(?:x264-MiNDTHEGAP)|(?:hdtv-lol)|(?:xvid-xor)|(?:psychodrama)|(?:hdtv_xvid-fov)|(?:repack-lol)|(?:rerip)|(?:xvid-ctu)|(?:Lo-Fi)|(?:X264-DIMENSION)|(?:_evid)|(?:TorrentDay)|(?:XviD-MOMENTUM)

Basically just a list of non-capturing groups. Of course, I suppose I could make it a single non-capturing group for a little more efficiency, but that's another subject... Anyway, the question is, am I doing this the most PERLitically correct way, or is there a better way to go from <BLIST> to $blacklist_regex? Thanks so much!
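
For reference, the single non-capturing group variant mentioned above might look something like the sketch below. This is only one possible approach, not necessarily the best one. The helper name regex_from_fh is made up for illustration, an in-memory filehandle stands in for BlacklistWords.txt so the snippet runs on its own, and the quotemeta call is an added assumption that the words should match literally rather than as regex fragments.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): build one compiled
# alternation regex from a filehandle holding one word per line.
sub regex_from_fh {
    my ($fh) = @_;

    # Read and chomp in one step, then drop empty lines.
    chomp(my @words = <$fh>);
    @words = grep { length } @words;

    # With no words, return a pattern that never matches; an empty
    # alternation would otherwise match every string.
    return qr/(?!)/ unless @words;

    # Assumption: words are literal text, so escape regex metacharacters.
    my $alternation = join '|', map { quotemeta } @words;

    # One non-capturing group around the whole alternation.
    return qr/(?:$alternation)/;
}

# In-memory filehandle standing in for BlacklistWords.txt.
my $sample = "XviD\nx264-LOL\nWEB-DL\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory sample: $!";
my $blacklist_regex = regex_from_fh($fh);
close($fh);

print "blacklisted word found\n"
    if 'Show.S01E01.HDTV.x264-LOL' =~ $blacklist_regex;
```

Passing a lexical filehandle into a sub keeps the filehandle -> regex step in one place, and qr// compiles the pattern once so it can be reused in later matches.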


In reply to Best approach to creating a regex from a filehandle by fieroboom
