
I need to extract zip files that were created on Windows with a Japanese locale. I'm using unzip.

If I run unzip files.zip, the file names come out garbled. So, following this question, I ran unzip with -O cp932 instead, and that gives me the correct file names.

However, some of these zip files require passwords. I know the correct passwords, but unzip always tells me the passwords are wrong.

After some investigation, I found that I can extract archives whose passwords are plain English (ASCII only): a password like "Hello" works, but a password like "こんにちは" leads to "wrong password". So I guess the problem has something to do with character encoding.
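The encoding guess is easy to confirm. This small sketch (standard library only; `password_bytes` is just an illustrative helper name) prints the bytes the same password turns into under UTF-8, which a Linux terminal or shell normally hands to unzip, and under cp932, the Japanese Windows code page the archive was presumably created with:

```python
def password_bytes(pwd):
    """Return the password's bytes under UTF-8 and under cp932 (Shift_JIS)."""
    return pwd.encode("utf-8"), pwd.encode("cp932")

for pwd in ["Hello", "こんにちは"]:
    utf8_bytes, cp932_bytes = password_bytes(pwd)
    # bytes.hex(sep) needs Python 3.8+
    print(f"{pwd!r}: utf-8={utf8_bytes.hex(' ')} / cp932={cp932_bytes.hex(' ')}")
```

Both encodings agree on ASCII, so "Hello" yields the same five bytes either way. Each hiragana character takes three bytes in UTF-8 but two in cp932, so the Japanese password reaches the decryption code as a different byte string.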

I tried both of these:

  1. unzip -O cp932 compressed.zip and pasted "こんにちは" when it asked for the password.
  2. unzip -O cp932 -P 'こんにちは' compressed.zip.

Neither of them works.

I found a similar question here that has no answers. It asks for a way to give unzip an arbitrary byte sequence as the password. An answer to that question would also solve mine, since I could convert the passwords to the correct character encoding myself and pass the resulting bytes to unzip.
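One way to hand unzip an arbitrary byte sequence without going through the shell is to launch it from Python with a list of `bytes` arguments. On POSIX systems `subprocess` passes bytes arguments to the program unchanged, so `-P` receives the cp932 bytes instead of UTF-8. This is a minimal sketch that assumes unzip uses the `-P` bytes verbatim (it is not tested against a real archive here), and `build_unzip_argv` / `unzip_with_encoded_password` are hypothetical helper names:

```python
import os
import subprocess

def build_unzip_argv(archive, password, codec="cp932"):
    """Build an unzip argument list whose -P value is the password encoded in `codec`."""
    return [
        b"unzip",
        b"-O", codec.encode("ascii"),
        b"-P", password.encode(codec),  # raw cp932 bytes, not the UTF-8 a shell would pass
        os.fsencode(archive),           # archive path in the filesystem encoding
    ]

def unzip_with_encoded_password(archive, password, codec="cp932"):
    """Run unzip with the encoded password (POSIX only: bytes argv is passed through as-is)."""
    completed = subprocess.run(build_unzip_argv(archive, password, codec))
    return completed.returncode
```

Usage would be `unzip_with_encoded_password("compressed.zip", "こんにちは")`. Keep in mind that, as the unzip manual page warns, `-P` makes the password visible to other users in the process list.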

2 Answers


If a zip file was created with a non-Unicode codec and is also encrypted with a password, the password you give to unzip must also be encoded as bytes in that same codec. On Linux (with the usual UTF-8 locale), the password you pass to unzip arrives as UTF-8 bytes, which is why unzip -O cp932 -P 'こんにちは' compressed.zip doesn't work.

To sum up, you need a way to pass the password to unzip as cp932-encoded bytes. There's no simple way to do this with the unzip command itself, but the same extraction can be done with a Python script using the standard zipfile module:

from zipfile import ZipFile

def extract_zip(archive_name, out_path, pwd, codec):
    # password also needs to be encoded with codec
    password = pwd.encode(codec) if pwd else None
    # metadata_encoding argument is available in Python 3.11+
    with ZipFile(archive_name, "r", metadata_encoding=codec) as myzip:
        myzip.extractall(out_path, pwd=password)

extract_zip("compressed.zip", "output_dir", "こんにちは", "cp932")
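On Python versions older than 3.11, where ZipFile has no metadata_encoding parameter, the same idea can be sketched by encoding the password yourself and re-decoding each member name by hand. This relies on zipfile decoding names that lack the UTF-8 flag (general-purpose bit 11) as cp437, which maps all 256 byte values and therefore round-trips exactly. It assumes a POSIX system: on Windows, zipfile rewrites backslashes in names, which can corrupt cp932 characters whose second byte is 0x5C. The names `fix_name`, `extract_zip_legacy` and `UTF8_FLAG` are hypothetical, and unlike extractall() this sketch does no path sanitization beyond skipping absolute paths and `..` components:

```python
import shutil
from pathlib import Path
from zipfile import ZipFile

UTF8_FLAG = 0x800  # general-purpose flag bit 11: file name stored as UTF-8

def fix_name(filename, flag_bits, codec):
    """Re-decode a name that zipfile decoded as cp437 because the UTF-8 flag was unset."""
    if flag_bits & UTF8_FLAG:
        return filename
    # cp437 covers every byte value, so encoding recovers the original name bytes
    return filename.encode("cp437").decode(codec)

def extract_zip_legacy(archive_name, out_path, pwd, codec):
    password = pwd.encode(codec) if pwd else None  # password bytes in the archive's codec
    out_dir = Path(out_path)
    with ZipFile(archive_name, "r") as zf:
        for info in zf.infolist():
            name = fix_name(info.filename, info.flag_bits, codec)
            # skip names that could escape the output directory
            if Path(name).is_absolute() or ".." in Path(name).parts:
                continue
            target = out_dir / name
            if info.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # stream the decrypted member into the correctly named file
            with zf.open(info, pwd=password) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

It would be called like the 3.11 version: `extract_zip_legacy("compressed.zip", "output_dir", "こんにちは", "cp932")`.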

oeter

The standard zipfile module does not support AES-encrypted zip files at all, so it's out of the picture for those. The third-party module pyzipper does work with AES-encrypted zip files, regardless of the compression algorithm, but it is based on the zipfile module from Python 3.7, so it has no metadata_encoding parameter. One could modify its source code to add that feature (fairly trivially), but here is a short script that just fixes the file names in post-processing:

from pathlib import Path
import shutil

import pyzipper

def extract_encrypted_ANSI_zip(zipfile, password, encoding='gbk', create_new_folder=True):
    zipfile = Path(zipfile)

    output_dir = Path.cwd() / zipfile.stem if create_new_folder else Path.cwd()

    # create a temp folder to extract into, so we can fix the filenames before moving them to the output folder
    temp = Path('temp')
    while temp.exists():
        temp = temp.with_name(temp.stem + '_')

    temp.mkdir()

    # the password must be encoded with the same codec the archive was created with
    with pyzipper.AESZipFile(str(zipfile), 'r', compression=pyzipper.ZIP_DEFLATED, encryption=pyzipper.WZ_AES) as extracted_zip:
        extracted_zip.extractall(str(temp), pwd=password.encode(encoding, errors='replace'))

    all_files = [f for f in temp.rglob('*') if f.is_file()]
    for f in all_files:
        relative_path = f.relative_to(temp)
        old_path = str(relative_path)
        # names without the UTF-8 flag were decoded as cp437; recover the raw bytes and decode them properly
        new_path = old_path.encode('cp437', errors='replace').decode(encoding, errors='replace')
        if new_path != old_path:
            print(old_path, '-->', new_path)

        f2 = (output_dir / new_path)
        f2.parent.mkdir(parents=True, exist_ok=True)
        try:
            # FileExistsError is raised on Windows; on POSIX, rename silently replaces an existing file
            f.rename(f2)
        except FileExistsError:
            print(f2, 'already exists')

    shutil.rmtree(temp)

extract_encrypted_ANSI_zip('compressed.zip', password="こんにちは", encoding='cp932')
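Since the choice between the standard zipfile approach and pyzipper depends on whether the archive uses AES, it can help to check that first. This is a small standard-library sketch, assuming that WinZip AES entries report compression method 99 and that bit 0 of the general-purpose flags marks an entry as encrypted. It also treats any other encrypted entry as traditional ZipCrypto, ignoring rarer schemes. `describe_encryption` is a hypothetical helper name. Listing entries only reads the archive's directory, so it should work even on AES archives that zipfile cannot extract:

```python
from zipfile import ZipFile

AES_METHOD = 99  # compression method that WinZip AES entries report

def describe_encryption(archive_name):
    """Return a set of labels describing how the archive's entries are encrypted."""
    kinds = set()
    with ZipFile(archive_name) as zf:
        for info in zf.infolist():
            if not info.flag_bits & 0x1:  # bit 0 clear: entry is not encrypted
                kinds.add("none")
            elif info.compress_type == AES_METHOD:
                kinds.add("aes")
            else:
                # assumption: any other encrypted entry uses traditional ZipCrypto
                kinds.add("zipcrypto")
    return kinds
```

If it reports "aes", the pyzipper script is the one to use; for "zipcrypto", the zipfile-based approach is enough.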