
Preamble

I am trying to ascertain the platform-type of an EXE and I cannot reconcile what I'm seeing with any known source.

I have a bit of PowerShell which reads the first 30 characters of an EXE as text, splits on the characters 'PE', and then gets the next two characters after it. I then convert those characters into UTF-16 bytes, the output of which I check against a lookup table to ascertain the platform of the executable. This gives me the following list:

Platform  Text  Hex    UTF-16
x86       L     4C 01  4C 1
x86-64    d†    64 86  64 2020
AA64            64 AA  64 AA

All well and good so far; however, a recent error report received from a user reads thus:

Error! Unhandled input: 64 15E

Looking up 15E I receive the Ş character, which nothing I look up online wants anything to do with. I am having immense difficulty trying to reconcile this with any known PE signature and I want to say "the file must just be corrupt", but I wanted to check in here and make sure I wasn't missing anything.

Question

What does 15E (Ş) translate to in a PE signature, if anything at all?

Preëmpting reader questions

  • Why are you using PowerShell and not xyz?
    • Because I have to. I cannot use any third-party tools to accomplish this, I can only use binaries we can assume to be installed on a standard Windows system. Otherwise I'd use EXIFTool or something. For similar reasons, my code must be PowerShell 2.0-compliant.
  • You shouldn't be reading the input as text
    • Perhaps not, but this method works and it allows me to use get-content to only pull the first 30 characters without mountains of code. Doing the same thing using (say) a stream reader and only pulling the first 30 bytes is a huge effort.
  • Instead of pulling the first x characters and splitting on PE you should be looking at specific locations for the PE header
    • The location of the PE header is inconsistent across platforms (and even across executables), which is not great when the whole point of the code is to ascertain what the platform type is.

The Function

function getPEArch ($fileInput) { #build 4/seagull april 2025
    $arrHex=$()
    (((($(get-content "$fileInput" -TotalCount 30) -as [string]) -split 'PE')[1]).substring(2,2)).ToCharArray() | % {
        $arrHex+="$([System.String]::Format("{0:X}", [System.Convert]::ToUInt32($_))) "
    }
    switch ($arrHex.trim()) {
        "4C 1" {
            return "x86"
        } "64 2020" {
            return "x86-64"
        } "64 AA" {
            return "Arm64"
        } default {
            return "! ERROR: Unhandled input $($_)"
        }
    }
}

seagull

1 Answer


Looking up 15E I receive the Ş character, which nothing I look up online wants anything to do with.

Yeah, that's because it is not "Ş". You admit that you already know that it is not text, even if you're reading it as text, therefore you know that PE header field values are defined as integers, not as Unicode codepoints. Treating them as Unicode codepoints is only being deliberately obtuse.

What does 15E (Ş) translate to in a PE signature, if anything at all?

Nothing. From your table it seems that your method does not read the two-byte field accurately as the "UTF-16" conversions are wrong; a two-byte input decoded as UTF-16LE cannot possibly result in a value larger than 16 bits.

For example, bytes 64 86 decoded as UTF-16LE should become the Unicode codepoint U+8664 (which is treated as a single integer 0x8664) – it makes no sense whatsoever for those bytes to become "64 2020" (as U+202064 is outside the range of a valid Unicode codepoint), nor does it make sense in general for it to become two values – during UTF-16 decoding, two bytes would become a single UInt16 which becomes a single codepoint. (Only the opposite is possible, with a range of values representing "surrogate characters" in which case two bytes would decode to half a codepoint.)

$bytes = [byte[]] (0x64, 0x86)
$text = [Text.Encoding]::Unicode.GetChars($bytes)
echo $text.Length
 # -> 1
echo $text
 # -> "虤"
$uint16 = [int] $text[0]
# parentheses are needed so that -f is the format operator, not an argument to echo
echo ("U+{0:X4}" -f $uint16)
 # -> "U+8664"
echo ($text[0] -eq [char] 0x8664)
 # -> True

In the same way, 64 15E doesn't represent a UTF-16 codepoint (as it is two values) and it doesn't represent two bytes (as one of the values is above 0xFF) so in general it is nonsensical.

Instead, it seems that what you're doing is converting the input through the legacy "ANSI" codepage, which maps each byte separately to some Unicode codepoint. This is very much not UTF-16; for byte values above ASCII, the resulting codepoint generally has no visible correspondence to the input byte value. But more importantly, this method is system-locale-dependent, as "ANSI" can be aliased to Windows-1251 on one system and Windows-1257 on another, with different byte-to-codepoint mappings on each.
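A short sketch of that locale dependence, assuming the two bytes on disk are 64 AA (the stored form of the ARM64 machine type 0xAA64) and that codepages 1252 and 1250 are available to .NET on the system. The choice of these two codepages is an illustration, not something stated in the error report:

```powershell
# Decode the same two bytes, one byte per character, through two legacy codepages.
$bytes = [byte[]] (0x64, 0xAA)
$results = foreach ($cp in 1252, 1250) {
    $chars = [Text.Encoding]::GetEncoding($cp).GetChars($bytes)
    # Print each resulting character as its Unicode codepoint in hex
    $hex = ($chars | ForEach-Object { '{0:X}' -f [int] $_ }) -join ' '
    "cp${cp}: $hex"
}
$results
# -> cp1252: 64 AA
# -> cp1250: 64 15E
```

If that is what happened, the reported 64 15E would be consistent with an ARM64 executable read on a system whose ANSI codepage is Windows-1250. That is an inference from the mapping tables, not a confirmed diagnosis.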

this method works and it allows me to use get-content to only pull the first 30 characters without mountains of code.

Aside from the UTF-16 issue (which might be fixable [1], but I would say it's not worth the time as it's a poor approach), there is no guarantee that the PE header will start within the first 30 characters (in quick tests I've seen offsets of >200 bytes), and there's no guarantee that the preceding MZ header won't also happen to have PE in the middle.

The latter may be unlikely but it still means that a Split-based approach is not robust; many previous "parser confusion" bugs have shown that it's better to follow the same logic that "standard" PE/COFF parsers do – as shown below, the PE header is normally found through an indirect offset in the MZ header.
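To illustrate the fragility, here is a minimal sketch using a made-up header string (the contents of $fake are hypothetical, not taken from a real executable). It relies on the fact that -split is case-insensitive by default, so a stray lowercase "pe" also acts as a delimiter:

```powershell
# Synthetic "header text": a stray lowercase "pe" precedes the real signature,
# which is followed by two NULs and two stand-in machine-type characters "dX".
$nul  = [string] [char] 0
$fake = 'MZ..pe..stub..PE' + $nul + $nul + 'dX'

# Same steps as the function in the question: split, take element 1, skip 2 chars
$parts = $fake -split 'PE'
$wrong = $parts[1].Substring(2, 2)
$wrong   # -> "st"

# The case-sensitive variant happens to find the real signature here
$right = ($fake -csplit 'PE')[1].Substring(2, 2)
$right   # -> "dX"
```

Switching to -csplit only removes the case problem; an uppercase "PE" earlier in the data would still break it. Note also that in text mode Get-Content's -TotalCount counts lines, not characters, so how much data is actually read depends on where line-break bytes happen to fall in the binary.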

[1] If you're working exclusively with Windows PowerShell 5.x (the one that comes bundled with Windows), then you might be able to use -Encoding Byte to read a raw byte array, but this option no longer exists in PowerShell 7.x, where -AsByteStream replaces it.
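A rough sketch of that footnote, assuming a helper named Get-FirstBytes (a hypothetical name) that picks whichever parameter the running PowerShell version supports. In byte mode, -TotalCount counts bytes rather than lines:

```powershell
# Hypothetical helper: return the first $Count bytes of a file as a byte[].
function Get-FirstBytes([string] $Path, [int] $Count) {
    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell 6 and later: -Encoding Byte is gone, -AsByteStream replaces it
        $raw = Get-Content -LiteralPath $Path -AsByteStream -TotalCount $Count
    } else {
        # Windows PowerShell
        $raw = Get-Content -LiteralPath $Path -Encoding Byte -TotalCount $Count
    }
    # The leading comma keeps the array from being unrolled on output
    return ,([byte[]] $raw)
}

# Usage (the path is only an example): e_lfanew sits at offset 0x3C of the MZ header
# $head = Get-FirstBytes "$env:WINDIR\System32\notepad.exe" 64
# [BitConverter]::ToUInt32($head, 0x3C)   # assumes a little-endian host
```

This only replaces the text-mode read; locating the PE header would still have to follow the offset stored in the MZ header rather than searching for the letters "PE".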

Doing the same thing using (say) a stream reader and only pulling the first 30 bytes is a huge effort.

It is still fewer lines of text than your post:

Param($Path);

# Platform names for executables the script has been tested with
# https://github.com/bminor/binutils-gdb/blob/master/include/coff/pe.h

$platforms = @{
    0x014c = "Intel i386"            # PuTTY website
    0x0166 = "MIPS R4000"            # WinNT 4.0 Server.iso; windowsce1.0
    0x0184 = "Alpha AXP"             # WinNT 4.0 Server.iso
    0x01a2 = "Hitachi SH3 (WinCE)"   # archive.org windowsce1.0
    0x01c0 = "ARMv4 (WinCE)"         # archive.org windows-ce-programs
    0x01c2 = "ARMv4 Thumb (WinCE)"   # archive.org windows-ce-programs
    0x01c4 = "ARM 32-bit WinNT"      # PuTTY website
    0x01f0 = "PowerPC"               # WinNT 4.0 Server.iso
    0x0200 = "Itanium (IA64)"        # some random Win2003 updates
    0x8664 = "AMD64 (x86_64)"        # regular Windows
    0xaa64 = "ARM64 (AArch64)"       # PuTTY website
}

$ltoh16 = if ([BitConverter]::IsLittleEndian) { 0..1 } else { 1..0 }
$ltoh32 = if ([BitConverter]::IsLittleEndian) { 0..3 } else { 3..0 }

$rd = [IO.FileStream]::new($Path, [IO.FileMode]::Open, [IO.FileAccess]::Read)
$buf = [byte[]]::new(4)
$null = $rd.Read($buf, 0, 4)
if ([BitConverter]::ToUInt16($buf[$ltoh16], 0) -eq 0x5a4d) {
    echo "Found MZ header"
    $null = $rd.Seek(0x3C, [IO.SeekOrigin]::Begin)
    $null = $rd.Read($buf, 0, 4)
    $pe_offset = [BitConverter]::ToUInt32($buf[$ltoh32], 0)
    echo "MZ header indicates PE offset @ $pe_offset"
    # refill buffer from the new location, then fall through
    $null = $rd.Seek($pe_offset, [IO.SeekOrigin]::Begin)
    $null = $rd.Read($buf, 0, 4)
} else {
    # no MZ header; fall through to checking for PE header
}
if ([BitConverter]::ToUInt32($buf[$ltoh32], 0) -eq 0x00004550) {
    echo "Found PE header"
    $null = $rd.Read($buf, 0, 2)
    $platform_id = [BitConverter]::ToUInt16($buf[$ltoh16], 0)
    $platform_name = $platforms[[int] $platform_id]
    if (-not $platform_name) {
        $platform_name = "Unknown 0x{0:X4}" -f $platform_id
    }
    echo "Platform is $platform_name"
} else {
    echo "Not a PE file"
}
$rd.Dispose()
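The script above uses the ::new() constructor syntax, which was introduced in PowerShell 5. Since the question asks for PowerShell 2.0 compatibility, here is a hedged variant of the same lookup that sticks to New-Object and System.IO.BinaryReader (which always reads little-endian). The function name Get-PEMachine is hypothetical, the sketch has not been tested on an actual 2.0 installation, and unlike the script above it treats a missing MZ header as "not a PE file":

```powershell
# Hypothetical helper: returns the COFF Machine field as an integer,
# or $null if the file does not look like an MZ + PE image.
function Get-PEMachine([string] $Path) {
    # Note: .NET resolves relative paths against the process working directory,
    # so passing a full path is the safer choice.
    $fs = [IO.File]::OpenRead($Path)
    try {
        $rd = New-Object IO.BinaryReader -ArgumentList $fs
        if ($fs.Length -lt 0x40) { return $null }               # too short for an MZ header
        if ($rd.ReadUInt16() -ne 0x5A4D) { return $null }       # "MZ"
        $fs.Position = 0x3C
        $peOffset = $rd.ReadUInt32()                            # e_lfanew
        if ($peOffset + 6 -gt $fs.Length) { return $null }      # offset points past the file
        $fs.Position = $peOffset
        if ($rd.ReadUInt32() -ne 0x00004550) { return $null }   # "PE\0\0"
        return [int] $rd.ReadUInt16()                           # Machine
    } finally {
        $fs.Close()
    }
}

# Usage (the path is only an example):
# '0x{0:X4}' -f (Get-PEMachine "$env:WINDIR\System32\notepad.exe")
```

The returned integer can be used directly as a key into a lookup table such as $platforms.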

For building a test suite, Arm64 and Arm32 executables can be downloaded e.g. from the PuTTY website, while MIPS, Alpha, and PowerPC executables can be found in a Windows NT 4.0 .iso that you can get from Archive.org or WinWorldPC (the CD has files for all four platforms); IA64 .exe's can be found among Windows 2003 updates in various random Microsoft download archives.
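Where real binaries for a given architecture are not at hand, a synthetic stub may be enough for smoke-testing a header parser. The sketch below assumes a hypothetical helper New-FakePEStub that writes only the fields such a parser reads; the output is not a runnable executable and is no substitute for testing against real files:

```powershell
# Hypothetical helper: writes "MZ", e_lfanew at 0x3C, "PE\0\0" and the Machine
# field, with everything else zeroed. Assumes 0x40 <= $PEOffset <= 0xFFFF.
function New-FakePEStub([string] $Path, [int] $Machine, [int] $PEOffset = 0x80) {
    $bytes = New-Object byte[] ($PEOffset + 6)
    $bytes[0] = 0x4D; $bytes[1] = 0x5A                      # "MZ"
    # e_lfanew, little-endian (the upper two bytes stay zero)
    $bytes[0x3C] = $PEOffset -band 0xFF
    $bytes[0x3D] = (($PEOffset - ($PEOffset % 256)) / 256) -band 0xFF
    $bytes[$PEOffset]     = 0x50                            # "P"
    $bytes[$PEOffset + 1] = 0x45                            # "E", followed by two zero bytes
    # Machine field, little-endian
    $bytes[$PEOffset + 4] = $Machine -band 0xFF
    $bytes[$PEOffset + 5] = (($Machine - ($Machine % 256)) / 256) -band 0xFF
    # Note: .NET resolves relative paths against the process working directory
    [IO.File]::WriteAllBytes($Path, $bytes)
}

# Usage (file name is an example): an ARM64 stub with the PE header at 0x100
# New-FakePEStub "$env:TEMP\fake-arm64.bin" 0xAA64 0x100
```

Varying $PEOffset between stubs also exercises the indirect lookup through 0x3C rather than a fixed header position.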

The location of the PE header is inconsistent across platforms (and even across executables), which is not great when the whole point of the code is to ascertain what the platform type is.

No, it is very consistent: if an MZ header is present, then the PE header is always located at the offset read from 0x3C, without any variation between architectures; and if the MZ header isn't present then the PE header starts at 0x0 (which I think would be the case for .obj compiler output before linking).

grawity