in reply to Reliable way to detect base64 encoded strings

Once whitespace is removed from the input, the following regex pattern will tell you whether the input is a valid base64-encoded string.
m{ ^ (?: [A-Za-z0-9+/]{4} )* (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] = | [A-Za-z0-9+/] [AQgw] == )? \z }x
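
A minimal sketch of how the two steps (strip whitespace, then match) could be combined. The helper name is_base64 and the variable $BASE64_RE are assumptions made for illustration; they do not come from any module.

```perl
use strict;
use warnings;

# The pattern from the post, compiled once.
my $BASE64_RE = qr{ ^ (?: [A-Za-z0-9+/]{4} )* (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] = | [A-Za-z0-9+/] [AQgw] == )? \z }x;

# is_base64 is a hypothetical helper name used only for this sketch.
sub is_base64 {
    my ($input) = @_;
    return 0 unless defined $input;

    # Remove all whitespace (spaces, tabs, line breaks) before matching.
    (my $stripped = $input) =~ s/\s+//g;

    return $stripped =~ $BASE64_RE ? 1 : 0;
}

# Line breaks inside the encoded text are tolerated because they are stripped.
print is_base64("SGVsbG8s\nIHdvcmxkIQ==\n") ? "valid\n" : "invalid\n";

# Seven characters cannot be split into groups of four.
print is_base64("SGVsbG8") ? "valid\n" : "invalid\n";
```

Note that the pattern also accepts the empty string, since both groups are optional. Whether that should count as valid depends on the application.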

As to whether the input was actually the result of base64-encoding or not, one can't tell. This sentence would be found valid up to the final period.

$ perl -MMIME::Base64 -le'print decode_base64("This sentence would be found valid up to the final period")' | od -t x1
0000000 4e 18 ac b1 e9 ed 7a 77 1e c2 8b a5 75 b7 9f a2
0000020 e9 dd bd a9 62 76 ea 6d a2 d8 5e 7e 29 da 96 97
0000040 ab 8a 87 0a
0000044

Usually, one relies on the header associated with the content, such as a MIME Content-Transfer-Encoding header.

Update: Replaced /(?:|...|...)/ with /(?:...|...)?/
Update: Changed // to m{} to fix an unescaped / (as brought up in a reply).

Re^2: Reliable way to detect base64 encoded strings
by Rodster001 (Pilgrim) on Jun 29, 2009 at 22:02 UTC
    This works nicely. I don't have a header to work with, so this is good. There are a few things I don't quite understand in that regex, though. Would you mind commenting each line so I can get my head around it? Thanks a lot!
      It's actually really straightforward.
      • Start of input
      • Followed by any number of groups of 4 characters from [A-Za-z0-9+/],
      • Followed by one of the following:
        • Nothing at all [the empty alternative, which always matches]
        • Four characters where
          • The first and second match /[A-Za-z0-9+/]/
          • The third matches /[AEIMQUYcgkosw048]/
          • The fourth is a "="
        • Four characters where
          • The first matches /[A-Za-z0-9+/]/
          • The second matches /[AQgw]/
          • The third and fourth are both a "="
      • Followed by the end of input

      It's probably a bit simpler after the update I just did for you:

      • Start of input
      • Followed by any number of groups of 4 characters from [A-Za-z0-9+/],
      • Followed by zero or one of the following:
        • Four characters where
          • The first and second match /[A-Za-z0-9+/]/
          • The third matches /[AEIMQUYcgkosw048]/
          • The fourth is a "="
        • Four characters where
          • The first matches /[A-Za-z0-9+/]/
          • The second matches /[AQgw]/
          • The third and fourth are both a "="
      • Followed by the end of input
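
The same walkthrough can be written into the pattern itself, since the /x modifier allows whitespace and comments inside the regex. This is a sketch only: the variable name $base64_re is an assumption, and the remarks about low bits being zero are an interpretation of the two restricted character classes that is not stated in the thread.

```perl
use strict;
use warnings;

my $base64_re = qr{
    ^                          # start of input
    (?: [A-Za-z0-9+/]{4} )*    # any number of complete 4-character groups
    (?:                        # zero or one padded final group, either:
        [A-Za-z0-9+/]{2}       #   two characters from the alphabet,
        [AEIMQUYcgkosw048]     #   a third whose low 2 bits are zero,
        =                      #   and one pad character
      |                        # or:
        [A-Za-z0-9+/]          #   one character from the alphabet,
        [AQgw]                 #   a second whose low 4 bits are zero,
        ==                     #   and two pad characters
    )?
    \z                         # end of input
}x;

# Try a few candidates against the commented pattern.
for my $candidate ("QUJD", "QUI=", "QQ==", "QUJ=", "QR==") {
    my $verdict = $candidate =~ $base64_re ? "matches" : "does not match";
    printf "%-5s %s\n", $candidate, $verdict;
}
```

The first three candidates should match and the last two should not, because "J" and "R" fall outside the restricted classes for padded groups.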
        Your explanation makes it very clear... thanks again!

        Regexes are not my favorite part of Perl, even if they are powerful. I'm trying to use your approach to detect whether a string is base64, so I took what you had and put an if around it. I'm sure I'm just not getting it. Could you give me some pointers?

        if($string_whole =~ m / ^ (?: [A-Za-z0-9+/]{4} )* (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] = | [A-Za-z0-9+/] [AQgw] == )? \z /x ) $&Log ("it found base64-$i");
        Thanks in advance for your expertise.
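
One possible way to wrap the pattern in an if, offered as a sketch under a few assumptions. With / as the delimiter, the unescaped / inside the character class ends the pattern early, so the sketch uses braces as delimiters instead. The body of an if also needs a block in braces, and `$&Log` is read by Perl as the match variable `$&` followed by a bareword. The Log sub and the sample inputs below are hypothetical stand-ins, not the poster's real code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's Log routine: it just collects messages.
my @log;
sub Log { push @log, @_; return }

# Brace delimiters avoid having to escape the / inside the character classes.
my $base64_re = qr{ ^ (?: [A-Za-z0-9+/]{4} )* (?: [A-Za-z0-9+/]{2} [AEIMQUYcgkosw048] = | [A-Za-z0-9+/] [AQgw] == )? \z }x;

# Sample inputs, assumed for illustration only.
my @candidates = ("SGVsbG8sIHdvcmxkIQ==", "not base64!");

for my $i (0 .. $#candidates) {
    my $string_whole = $candidates[$i];

    # The if body must be a block, even for a single statement.
    if ($string_whole =~ $base64_re) {
        Log("it found base64-$i");
    }
}

print "$_\n" for @log;
```

If the input may contain line breaks or other whitespace, it would need to be stripped before the match, as noted at the top of the thread.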