I'm trying to create a robust way of detecting the season and episode from a TV show's file name.

Common patterns are things like "S5E6", "S5.E6", or "S05xE06" but there's no real consistency.
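To make those three spellings concrete, here is a minimal sketch of a single pattern that accepts all of them by allowing an optional run of separator characters between the season and the episode. The helper name parse_season_episode and the "Show..." file names are made up for illustration, and the separator set is an assumption rather than a complete list.

```perl
use strict;
use warnings;

# Hypothetical helper: match "S<season>", an optional run of separator
# characters (whitespace, dot, underscore, "x", or hyphen), then "E<episode>".
# The lookbehind stops the "S" from matching in the middle of a word.
sub parse_season_episode {
    my ($name) = @_;
    if ( $name =~ /(?<![a-z0-9])S(\d+)[\s._x-]*E(\d+)/i ) {
        # Adding 0 converts "05" to the number 5.
        return { season => $1 + 0, episode => $2 + 0 };
    }
    return;
}

# Illustrative names only; all three match the pattern above.
for my $name ( 'Show.S5E6.mp4', 'Show.S5.E6.mp4', 'Show.S05xE06.mp4' ) {
    my $info = parse_season_episode($name);
    printf "%-18s season %d, episode %d\n",
        $name, $info->{season}, $info->{episode};
}
```

A name with no "S..E.." marker at all simply returns nothing, so the caller can go on to try other patterns.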

Also I want to fall back to detecting a date for shows like The Daily Show, which don't really have seasons/episodes.
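As an illustration of the date fallback, here is a minimal sketch that uses a plain regex instead of Regexp::Common. It assumes the date appears in year, month, day order with a four-digit year. The helper name parse_air_date and the sample file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: find a four-digit year, a month, and a day separated
# by whitespace, a dot, an underscore, or a hyphen, then sanity-check ranges.
sub parse_air_date {
    my ($name) = @_;
    if ( $name =~ /(?<!\d)((?:19|20)\d{2})[\s._-](\d{1,2})[\s._-](\d{1,2})(?!\d)/ ) {
        my ( $year, $month, $day ) = ( $1 + 0, $2 + 0, $3 + 0 );
        return { year => $year, month => $month, day => $day }
            if $month >= 1 && $month <= 12 && $day >= 1 && $day <= 31;
    }
    return;
}

# Hypothetical file name, not taken from the test data.
my $date = parse_air_date('The.Daily.Show.2016.07.28.HDTV.x264.mp4');
printf "%04d-%02d-%02d\n", @{$date}{qw(year month day)};
```

The range check is deliberately loose (it accepts 31 February, for example); a stricter version could validate the date with a date module.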

So here's my first attempt; please comment. If you're interested, please also test it on bulk data from TV torrent websites, which is what I'm doing right now.

One straightforward question: what's the best-practice way to remove leading zeros? Convert to a number with something like $foo = ($foo + 0)? Strip the zeros as characters? sprintf?
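For reference, the three options from that question side by side, as a small sketch. The variable names are made up for illustration, and this only shows what each option does without settling which one is best practice.

```perl
use strict;
use warnings;

my $raw = '08';

# Numeric context: Perl converts the string as decimal, so '08' becomes 8.
# Octal interpretation applies only to numeric literals and oct().
my $as_number = $raw + 0;

# sprintf with %d also goes through numeric conversion and yields '8'.
my $via_sprintf = sprintf '%d', $raw;

# String substitution: drop leading zeros but keep the last digit, so that
# '00' becomes '0' rather than an empty string.
( my $stripped = $raw ) =~ s/^0+(?=\d)//;

print "$as_number $via_sprintf $stripped\n";
```

The numeric conversions warn under "use warnings" if the string is not purely numeric, whereas the substitution leaves such strings untouched apart from their leading zeros.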

#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper::Simple;
use Regexp::Common qw(time);

my ( $count, $success, $failure ) = ( 0, 0, 0 );

# Run every sample title through the extractor and tally the results.
while (<DATA>) {
    chomp;
    $count++;
    my $data = extract_show_info($_);
    if ($data) {
        print "Success? \n";
        print Dumper($data);
        $success++;
    }
    else {
        print "Failure: $_\n";
        $failure++;
    }
}

print "Processed: $count | Successes: $success | Failures: $failure \n";

# Try the season/episode patterns first, then fall back to a date.
sub extract_show_info {
    my $input_string = shift();
    my $result       = undef;
    if ( $result = extract_episode_data($input_string) ) {
        $result->{type} = 'se';
    }
    # Match against the argument, not the global $_ set by the caller's loop.
    elsif ( my @date = $input_string =~ /$RE{time}{ymd}{-keep}/ ) {
        $result = {
            type  => 'date',
            year  => $date[1],
            month => $date[2],
            day   => $date[3]
        };
    }
    return $result;
}

# Return { season, episode } from the first pattern that matches.
sub extract_episode_data {
    my $input_string = shift();
    if (   $input_string =~ /s(\d+)\s*e(\d+)/i
        || $input_string =~ /s(\d+)\.e(\d+)/i
        || $input_string =~ /(\d+)x(\d+)/i
        || $input_string =~ /Season\s*(\d+),?\s*Episode\s*(\d+)/i
        || $input_string =~ /Series\.(\d+)\.(\d+)/ )
    {
        my $episode_data = { season => $1, episode => $2 };
        return $episode_data;
    }
    else {
        return;
    }
}

__DATA__
The.Walking.Dead.S01E03.FRENCH.LD.BDRip.XviD-JMT.avi 348.55 Mb
Gogglebox.AU.s01e08.PDTV.x264.Hector.mp4 266.46 Mb
Power S03E01 HDTV x264-FS.mp4 285.38 Mb
Wentworth.s03e04.HDTV.x264.Hector.mp4 226.32 Mb
Suits.S06E03.HDTV.x264-FUM[eztv].mp4 222.37 Mb
Killjoys.S02E07.HDTV.x264-FUM[eztv].mp4 255.05 Mb
Superfoods.The.Real.Story.Series.2.4of8.Seaweed.720p.HDTV.x264.AACmp4[eztv].mp4 439.43 Mb
Keeping.Up.With.The.Kardashians.S12E01.Out.With.The.Old.In.With.The.New.HDTV-MEGATV.mp4 445.27 Mb
Keeping.Up.With.The.Kardashians.S12E04.All.About.Meme.HDTV-MEGATV.mp4 416.17 Mb
Are You the One S04E08 HDTV x264-Nada.mp4 476.85 Mb
Superfoods.The.Real.Story.Series.2.8of8.Avocados.720p.HDTV.x264.AAC.MVGroup.org.mp4 430.01 Mb
Kingdom 2014 S02E20 No Sharp Objects HDTV x264-TTL.mp4 457.65 Mb
The.Big.Bang.Theory.S09E19.HDTV.x264-LOL[eztv].mp4 144.79 Mb
Superfoods.The.Real.Story.Series.2.4of8.Seaweed.720p.HDTV.x264.AAC.MVGroup.org.mp4 439.43 Mb
[www.Cpasbien.pe] Vikings.S02E07.FRENCH.HDTV.x264-DEAL.mp4 309.86 Mb
BBC.Inside.Einsteins.Mind.1080p.HDTV.x265.AAC.MVGroup.Forum.mp4 728.65 Mb

In reply to Please review this: code to extract the season/episode or date from a TV show's title on a torrent site by Cody Fendant
