I'm having difficulty getting the character encoding of a file. The offending code is here:
    rawdata = open(file, "r").read()
    encoding = chardet.detect(rawdata.encode())['encoding']
    #return encoding
(Code courtesy of Ashish Greycube: https://github.com/frappe/frappe/pull/8061)
I've copied a segment of the CSV file I'm working on into a smaller, more manageable 'test' file. When I run the above code on it, it reports 'ascii'. That might be part of the problem. Basically, I've found that my program needs to know the file's encoding.
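One possible reason for the 'ascii' result is that the excerpt contains no bytes above 0x7F. Such a sample is valid ASCII and also valid in most ASCII-compatible encodings, so it cannot reveal the encoding of the full file. Below is a minimal sketch for checking this with only the standard library. The function name `count_non_ascii_bytes` and the chunk size are assumptions made for illustration, not part of the original program.

```python
def count_non_ascii_bytes(path, chunk_size=64 * 1024):
    """Count bytes >= 0x80 in a file, reading it in binary mode chunk by chunk.

    Hypothetical helper: a result of 0 means the file is pure ASCII, so a
    detector has nothing to go on beyond reporting 'ascii'.
    """
    total = 0
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            # Iterating over a bytes object yields integers 0..255.
            total += sum(1 for b in chunk if b >= 0x80)
    return total
```

If this returns 0 for the test file but a positive number for the full file, the excerpt is probably not representative, and the detection would need to run on the full file (or on an excerpt that includes the non-ASCII rows).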
The error report is as follows:
Traceback (most recent call last):
  File ".\program.py", line 26, in <module>
    my_encoding = get_file_encoding(data)
  File ".\program.py", line 20, in get_file_encoding
    encoding = chardet.detect(rawdata.encode())['encoding']
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\__init__.py", line 38, in detect
    detector.feed(byte_str)
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\universaldetector.py", line 211, in feed
    if prober.feed(byte_str) == ProbingState.FOUND_IT:
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetgroupprober.py", line 71, in feed
    state = prober.feed(byte_str)
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\hebrewprober.py", line 227, in feed
    byte_str = self.filter_high_byte_only(byte_str)
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\site-packages\chardet\charsetprober.py", line 63, in filter_high_byte_only
    buf = re.sub(b'([\x00-\x7F])+', b' ', buf)
  File "C:\Users\stsho\AppData\Local\Programs\Python\Python38-32\lib\re.py", line 208, in sub
    return _compile(pattern, flags).sub(repl, string, count)
MemoryError
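Two things stand out in the traceback above. First, the file is opened in text mode and then re-encoded with `rawdata.encode()`, which produces UTF-8 bytes by default, so chardet is inspecting re-encoded text rather than the file's original bytes. Second, the whole file is handed to `chardet.detect` at once, and the `MemoryError` is raised while a regular expression runs over that entire buffer in a 32-bit Python. A possible workaround, sketched here under assumptions, is to read the raw bytes in chunks and feed them to chardet's `UniversalDetector` incrementally. The `detector` and `chunk_size` parameters are hypothetical additions for illustration; whether this resolves the error on the actual file is untested.

```python
def get_file_encoding(path, detector=None, chunk_size=64 * 1024):
    """Guess a file's encoding from its raw bytes, one chunk at a time.

    Sketch only: `detector` may be any object with feed(), done, close()
    and result, which is the interface chardet's UniversalDetector offers.
    """
    if detector is None:
        # Imported lazily so chardet is only required when it is actually used.
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    # Binary mode: the detector must see the bytes exactly as stored on disk.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            detector.feed(chunk)
            if detector.done:
                # The detector is confident enough; skip the rest of the file.
                break

    detector.close()  # finalizes detector.result
    return detector.result["encoding"]
```

Calling `get_file_encoding("ANQAR.csv")` would then return chardet's best guess as a string (or `None` if it cannot decide). Because each `feed` call only sees one chunk, the buffer passed to the regular expression stays small regardless of the file's size.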
PS C:\Users\stsho\dev\csv_sanitizer_1.2> python .\program.py
Please enter filename: ANQAR
Traceback (most recent call last):
  File ".\program.py", line 26, in <module>
    my_encoding = get_file_encoding(data)
  File ".\program.py", line 19, in get_file_encoding
    rawdata = open(file, "r").read()
FileNotFoundError: [Errno 2] No such file or directory: 'ANQAR.csv'