Is there a good way to check if a string is encoded in base64 using Python?
11 Answers
I was looking for a solution to the same problem, then a very simple one just struck me in the head. All you need to do is decode, then re-encode. If the re-encoded string is equal to the encoded string, then it is base64 encoded.
Here is the code:
import base64

def isBase64(s):
    try:
        return base64.b64encode(base64.b64decode(s)) == s
    except Exception:
        return False
That's it!
Edit: Here's a version of the function that works with both the string and bytes objects in Python 3:
import base64

def isBase64(sb):
    try:
        if isinstance(sb, str):
            # If there's any unicode here, an exception will be thrown and the function will return false
            sb_bytes = bytes(sb, 'ascii')
        elif isinstance(sb, bytes):
            sb_bytes = sb
        else:
            raise ValueError("Argument must be string or bytes")
        return base64.b64encode(base64.b64decode(sb_bytes)) == sb_bytes
    except Exception:
        return False
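To make the behaviour of the round-trip check concrete, here is a small sketch that runs it on a few inputs. The helper `is_base64` is a condensed copy of the function above under a different name, and the sample strings are my own illustrations, not part of the original answer.

```python
import base64

def is_base64(sb):
    """Round-trip check: decode, re-encode, compare with the input."""
    try:
        if isinstance(sb, str):
            sb_bytes = sb.encode("ascii")  # non-ASCII text raises and yields False
        elif isinstance(sb, bytes):
            sb_bytes = sb
        else:
            return False
        return base64.b64encode(base64.b64decode(sb_bytes)) == sb_bytes
    except Exception:
        return False

print(is_base64("aGVsbG8="))    # canonical encoding of b"hello"
print(is_base64("aGVsbG8"))     # same text with the padding removed
print(is_base64("test"))        # ordinary word that is also valid base64
print(is_base64("héllo"))       # non-ASCII input
print(is_base64("aGVsbG8=\n"))  # trailing newline breaks the comparison
```

Keep in mind that a True result only means the input is well-formed base64, not that somebody actually encoded something to produce it.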
-
Nice and simple, I like it! – trukvl Sep 01 '17 at 14:39
-
If you like lambda: `isBase64 = lambda x: x.decode('base64').encode('base64').replace('\n','') == x` Note that this code will sometimes throw an incorrect padding exception. – id01 Oct 06 '17 at 16:05
-
Side note: you can always just `return base64.b64encode(base64.b64decode(s)) == s` instead of using an if statement and returning a constant bool result :) – d0nut Nov 15 '17 at 19:49
-
True, but then you'll have to handle the binascii exceptions yourself outside the function as well. – id01 Nov 16 '17 at 00:17
-
`isBase64('test')` returns True – ahmed Feb 14 '18 at 18:17
-
@ahmed that's because "test" is a valid base64 string. Base64 includes a-z, A-Z, 0-9, +, /, and = for padding. – id01 Feb 14 '18 at 21:48
-
Ah, d0nut, I think I get what you mean. Editing. – id01 Aug 03 '18 at 06:35
-
On Python 3, comparing `str` and `bytes` doesn't implicitly convert them to the same type, so I had to do `return base64.b64encode(base64.b64decode(s)).decode() == s` for this to work: my `s` was a unicode `str`, while the value returned from `base64.b64encode(base64.b64decode(s))` was `bytes`. See this: https://stackoverflow.com/q/30580386/1781024 – Vikas Prasad Nov 16 '18 at 05:59
-
Vikas Prasad, thanks! I just added a version of the function for Python 3 that works on both `str` and `bytes`. – id01 Nov 16 '18 at 23:53
# Python 2 code: base64.decodestring was removed in Python 3.9
import base64
import binascii

try:
    base64.decodestring("foo")
except binascii.Error:
    print "no correct base64"
-
Thank you, but I was wondering whether there is a function to test this instead of using a try – lizzie Sep 07 '12 at 09:59
-
I can't find any in [the documentation](http://docs.python.org/library/base64.html?highlight=base64#base64). – Sep 07 '12 at 10:05
-
"easier to ask for forgiveness than permission", although I'd probably favour catching the actual exception that's likely to be raised (which I think will be binascii.Error) – LexyStardust Sep 07 '12 at 12:23
-
This is incorrect, `base64.decodestring('čččč')` returns an empty string and no exception, but I don't think the string čččč is valid base64 – Roman Plášil Jan 21 '14 at 08:48
-
base64.decodestring("dfdsfsdf ds fk") doesn't raise TypeError either, yet the string doesn't seem to be a base64 string – erny Feb 22 '17 at 11:00
-
As of 2018, Python 3.7, I use `base64.b64decode(item)` instead, and it works. – Polv Jul 16 '18 at 01:01
-
base64.decodestring("dfdsfsdf ds fk") is actually a base64 string. The base64 method ignores whitespace and what you are left with is a valid base64 string. – plaisthos Aug 07 '19 at 12:43
-
`base64.b64decode(s, validate=True)` will decode `s` if it is valid, and otherwise throw an exception. `base64.decodestring` is very permissive and will strip any non-base64 characters, which is potentially problematic. – Julian Nov 13 '19 at 23:04
This isn't possible. The best you could do would be to verify that a string might be valid Base 64, although many strings consisting of only ASCII text can be decoded as if they were Base 64.
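As a rough illustration of that point, the sketch below feeds a few ordinary ASCII words (picked by me, purely as examples) to the decoder with strict validation turned on. The helper name `decodes_as_base64` is made up for this example.

```python
import base64
import binascii

def decodes_as_base64(text):
    """Return True if text passes strict base64 decoding."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False

# Plain words whose length happens to be a multiple of 4
for word in ["test", "abcd", "Python3x", "cafe"]:
    print(word, decodes_as_base64(word))
```

All of these decode without complaint, so a successful decode on its own says nothing about whether the text was ever meant to be base64.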
-
@coler-j yes, it is technically correct. It also probably should have been a comment but in 2012 SO was different. Maybe. – Wooble Dec 05 '18 at 22:13
The solution I used is based on one of the prior answers, but uses more up to date calls.
In my code, my_image_string is either the image data itself in raw form or a base64 string. If the decode fails, I assume it's raw data.
Note the validate=True keyword argument to b64decode. It is required for the decoder to raise binascii.Error when the input contains characters outside the base64 alphabet. Without it, such characters are silently discarded before the padding check, so there is no complaint about an illegal string.
import base64, binascii

try:
    image_data = base64.b64decode(my_image_string, validate=True)
except binascii.Error:
    image_data = my_image_string
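To see what validate=True changes, here is a minimal sketch comparing both modes on the same input. The helper `try_decode` and the sample string are assumptions of mine, chosen only to show the difference.

```python
import base64
import binascii

def try_decode(data, validate):
    """Decode data, returning None instead of raising on invalid input."""
    try:
        return base64.b64decode(data, validate=validate)
    except binascii.Error:
        return None

# "aGVsbG8=" (b"hello") with a stray '!' inserted in the middle
sample = "aGVs!bG8="

print(try_decode(sample, validate=False))  # b'hello': the '!' is discarded
print(try_decode(sample, validate=True))   # None: the '!' is rejected
```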
Using Python RegEx
import re

txt = "VGhpcyBpcyBlbmNvZGVkIHRleHQ="
x = re.search("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$", txt)
if x:
    print("Encoded")
else:
    print("Non encoded")
Before trying to decode, I like to do a formatting check first, as it's the lightest-weight check and does not return false positives, which follows fail-fast coding principles.
Here is a utility function for this task:
import re

RE_BASE64 = r"^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$"

def likeBase64(s: str) -> bool:
    # True only if s is not None and matches the base64 format
    return s is not None and re.search(RE_BASE64, s) is not None
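One detail worth knowing about this kind of pattern: in Python, `$` also matches just before a trailing newline, so `re.search` with `^...$` accepts a string that ends in `\n`. The sketch below is a variant of the check that uses `re.fullmatch` instead. The names `BASE64_BODY` and `looks_like_base64` are made up for this example, and the test strings are mine.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors
BASE64_BODY = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?"
)

def looks_like_base64(s):
    """Format check only: fullmatch must consume the entire string."""
    return isinstance(s, str) and BASE64_BODY.fullmatch(s) is not None

ANCHORED = r"^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$"

# The anchored pattern still matches when a newline trails the data
print(re.search(ANCHORED, "dGVzdA==\n") is not None)
print(looks_like_base64("dGVzdA==\n"))
print(looks_like_base64("VGhpcyBpcyBlbmNvZGVkIHRleHQ="))
print(looks_like_base64("abc"))
```

Note that both versions accept the empty string, so add an explicit emptiness check if that matters for your input.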
If the length of the encoded string is a multiple of 4, it can be decoded:
base64.encodestring("whatever you say").strip().__len__() % 4 == 0
So you just need to check whether the string satisfies the condition above; then decoding won't throw any exception (I guess =.=)
if len(the_base64string.strip()) % 4 == 0:
    # then you can just decode it anyway
    base64.decodestring(the_base64string)
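A quick sketch to test that guess under strict validation in Python 3 (`base64.b64decode` with `validate=True`, which is not what the snippet above uses). The helper `strictly_decodes` and the sample strings are hypothetical, chosen so that every one of them has a length that is a multiple of 4.

```python
import base64
import binascii

def strictly_decodes(s):
    """Return True if s decodes with validate=True."""
    try:
        base64.b64decode(s, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False

# Every candidate has a length that is a multiple of 4
for candidate in ["abcd", "ab=d", "????", "ab d"]:
    print(repr(candidate), len(candidate) % 4 == 0, strictly_decodes(candidate))
```

So the length check is a necessary condition for padded base64 but not a sufficient one: the characters and the position of the padding matter as well.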
-
This does not work for strings with \n in them that are still valid base64 – plaisthos Aug 07 '19 at 12:47
@geoffspear is correct in that this is not 100% possible but you can get pretty close by checking the string header to see if it matches that of a base64 encoded string (re: How to check whether a string is base64 encoded or not).
import re

# Check whether a string looks like it is base64 encoded.
def isBase64Encoded(s):
    pattern = re.compile(r"^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$")
    # An empty or None input is treated as not encoded
    if not s:
        return False
    return pattern.match(s) is not None
Also note that in my case I wanted to return false if the string is empty, to avoid decoding, as there's no use in decoding nothing.
I know I'm almost 8 years late, but you can use a regular expression to verify whether a given input is Base64.
import re

encoding_type = 'Encoding type: '
base64_encoding = 'Base64'

def is_base64():
    element = input("Enter encoded element: ")
    expression = "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?$"
    matches = re.match(expression, element)
    if matches:
        print(f"{encoding_type + base64_encoding}")
    else:
        print("Unknown encoding type.")

is_base64()
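import base64

def is_base64(s):
    # Drop newlines and the whitespace around each line before comparing
    s = ''.join(line.strip() for line in s.split("\n"))
    try:
        # b64encode returns bytes in Python 3, so decode before comparing with str
        enc = base64.b64encode(base64.b64decode(s)).decode('ascii')
        return enc == s
    except (ValueError, TypeError):  # binascii.Error is a subclass of ValueError
        return False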
In my case, my input, s, had newlines which I had to strip before the comparison.
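For context on where such newlines typically come from: `base64.encodebytes` wraps its output with a newline after at most 76 characters. The sketch below, with sample data I made up, shows that the default decoder tolerates those newlines, while a naive round-trip comparison fails until the whitespace is removed.

```python
import base64

data = b"x" * 100                      # arbitrary sample payload
wrapped = base64.encodebytes(data)     # bytes, with b"\n" line breaks

roundtrip = base64.b64encode(base64.b64decode(wrapped))
compact = b"".join(wrapped.split())    # remove all ASCII whitespace

print(b"\n" in wrapped)                # the wrapped form contains newlines
print(roundtrip == wrapped)            # comparison against the raw input
print(roundtrip == compact)            # comparison after removing whitespace
```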
# Python 2 only: str.decode('base64') does not exist in Python 3
x = 'possibly base64 encoded string'
result = x
try:
    decoded = x.decode('base64', 'strict')
    if x == decoded.encode('base64').strip():
        result = decoded
except:
    pass
This code puts the decoded string in the result variable if x really is encoded, and just x if not. Simply trying to decode doesn't always work.
-
Instead of x == decoded.encode('base64').strip() it should be x == decoded.encode('base64').replace('\n', ''), because in some cases encode adds several '\n' – Andy Jul 22 '15 at 14:58