Looking up 15E I receive the Ş character, which nothing I look up online wants anything to do with.
Yeah, that's because it is not "Ş". You admit that you already know it is not text, even though you're reading it as text, so you also know that PE header field values are defined as integers, not as Unicode codepoints. Treating them as Unicode codepoints is just being deliberately obtuse.
What does 15E (Ş) translate to in a PE signature, if anything at all?
Nothing. From your table it seems that your method does not read the two-byte field accurately, as the "UTF-16" conversions are wrong: a two-byte input decoded as UTF-16LE cannot possibly result in a value larger than 16 bits.
For example, bytes 64 86 decoded as UTF-16LE should become the Unicode codepoint U+8664 (which is treated as the single integer 0x8664). It makes no sense whatsoever for those bytes to become "64 2020", as U+202064 is outside the range of valid Unicode codepoints (the maximum is U+10FFFF). Nor does it make sense in general for them to become two values: during UTF-16 decoding, two bytes become a single UInt16, which becomes a single codepoint. (Only the opposite is possible: for the range of values reserved as "surrogates", two bytes decode to half a codepoint.)
$bytes = [byte[]] (0x64, 0x86)
$text = [Text.Encoding]::Unicode.GetChars($bytes)
echo $text.Length
# -> 1
echo $text
# -> "虤"
$uint16 = [int] $text[0]
echo ("U+{0:X4}" -f $uint16)
# -> "U+8664"
echo ($text[0] -eq [char] 0x8664)
# -> True
In the same way, 64 15E doesn't represent a UTF-16 codepoint (as it is two values) and it doesn't represent two bytes (as one of the values is above 0xFF) so in general it is nonsensical.
Instead, it seems that what you're doing is converting the input through the legacy "ANSI" codepage, which maps each byte separately to some Unicode codepoint. This is very much not UTF-16; for byte values above the ASCII range, the resulting codepoint generally has no visible correspondence to the input byte. More importantly, this method is system-locale dependent, as "ANSI" can be aliased to Windows-1251 on one system and Windows-1257 on another, with different byte-to-codepoint mappings on each.
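To make the locale dependence concrete, here is a small sketch that decodes the same two bytes through a few legacy code pages. The byte pair 64 AA (the on-disk form of 0xAA64) and Windows-1250 as the source of your "64 15E" are my assumptions, not something your post confirms. The provider registration is only relevant on PowerShell 7 and is skipped on Windows PowerShell 5.x.

```powershell
# On .NET Core / PowerShell 7 the legacy code pages live in a separate provider;
# on Windows PowerShell 5.x this type does not exist and the step is skipped.
$provider = 'System.Text.CodePagesEncodingProvider' -as [type]
if ($provider) {
    [Text.Encoding]::RegisterProvider($provider::Instance)
}

# Assumed input: the ARM64 machine field (0xAA64) as stored on disk
$bytes = [byte[]] (0x64, 0xAA)

$decoded = foreach ($codepage in 1250, 1251, 1257) {
    # Each byte is mapped on its own, so two bytes always give two characters
    $chars = [Text.Encoding]::GetEncoding($codepage).GetChars($bytes)
    $codepoints = $chars | ForEach-Object { "U+{0:X4}" -f [int] $_ }
    "Windows-{0}: {1}" -f $codepage, ($codepoints -join " ")
}
$decoded
# -> Windows-1250: U+0064 U+015E
# -> Windows-1251: U+0064 U+0404
# -> Windows-1257: U+0064 U+0156
```

The same input gives a different second "value" on each system, which is why a lookup table built from these codepoints would only ever work on machines sharing your locale.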
this method works and it allows me to use get-content to only pull the first 30 characters without mountains of code.
Aside from the UTF-16 issue (which might be fixable [1], but I would say it's not worth the time as it's a poor approach), there is no guarantee that the PE header will start within the first 30 characters (in quick tests I've seen offsets of >200 bytes), and there's no guarantee that the preceding MZ header won't also happen to contain PE somewhere in the middle.
The latter may be unlikely but it still means that a Split-based approach is not robust; many previous "parser confusion" bugs have shown that it's better to follow the same logic that "standard" PE/COFF parsers do – as shown below, the PE header is normally found through an indirect offset in the MZ header.
[1] If you're working exclusively with Windows PowerShell 5.x (the one that comes bundled with Windows), then you might be able to use -Encoding Byte to read a raw byte array, but this option no longer exists in PowerShell 7.x.
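For what it's worth, a raw-byte read that works in both PowerShell generations could look roughly like this. `Read-FirstBytes` is a hypothetical helper name of mine. `-Encoding Byte` is the Windows PowerShell 5.x spelling and `-AsByteStream` is its replacement in PowerShell 6 and later. The temporary file only exists to keep the example self-contained.

```powershell
# Hypothetical helper: emit the first $Count bytes of a file.
function Read-FirstBytes {
    param([string] $Path, [int] $Count)
    if ($PSVersionTable.PSVersion.Major -ge 6) {
        # PowerShell 6+: -Encoding Byte was removed in favour of -AsByteStream
        Get-Content -LiteralPath $Path -AsByteStream -TotalCount $Count
    } else {
        # Windows PowerShell 5.x
        Get-Content -LiteralPath $Path -Encoding Byte -TotalCount $Count
    }
}

# Demo input: a throwaway file that starts with the "MZ" magic bytes
$tmp = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllBytes($tmp, [byte[]] (0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00))

$head = [byte[]] (Read-FirstBytes -Path $tmp -Count 4)
($head | ForEach-Object { "{0:X2}" -f $_ }) -join " "
# -> 4D 5A 90 00

Remove-Item -LiteralPath $tmp
```

This only fixes the decoding problem. It does nothing about the fixed 30-byte window or the Split-based matching.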
Doing the same thing using (say) a stream reader and only pulling the first 30 bytes is a huge effort.
It is still fewer lines of text than your post:
Param($Path);
# Platform names for executables the script has been tested with
# https://github.com/bminor/binutils-gdb/blob/master/include/coff/pe.h
$platforms = @{
    0x014c = "Intel i386"           # PuTTY website
    0x0166 = "MIPS R4000"           # WinNT 4.0 Server.iso; windowsce1.0
    0x0184 = "Alpha AXP"            # WinNT 4.0 Server.iso
    0x01a2 = "Hitachi SH3 (WinCE)"  # archive.org windowsce1.0
    0x01c0 = "ARMv4 (WinCE)"        # archive.org windows-ce-programs
    0x01c2 = "ARMv4 Thumb (WinCE)"  # archive.org windows-ce-programs
    0x01c4 = "ARM 32-bit WinNT"     # PuTTY website
    0x01f0 = "PowerPC"              # WinNT 4.0 Server.iso
    0x0200 = "Itanium (IA64)"       # some random Win2003 updates
    0x8664 = "AMD64 (x86_64)"       # regular Windows
    0xaa64 = "ARM64 (AArch64)"      # PuTTY website
}
$ltoh16 = if ([BitConverter]::IsLittleEndian) { 0..1 } else { 1..0 }
$ltoh32 = if ([BitConverter]::IsLittleEndian) { 0..3 } else { 3..0 }
$rd = [IO.FileStream]::new($Path, [IO.FileMode]::Open, [IO.FileAccess]::Read)
$buf = [byte[]]::new(4)
$null = $rd.Read($buf, 0, 4)
if ([BitConverter]::ToUInt16($buf[$ltoh16], 0) -eq 0x5a4d) {
    echo "Found MZ header"
    $null = $rd.Seek(0x3C, [IO.SeekOrigin]::Begin)
    $null = $rd.Read($buf, 0, 4)
    $pe_offset = [BitConverter]::ToUInt32($buf[$ltoh32], 0)
    echo "MZ header indicates PE offset @ $pe_offset"
    # refill buffer from the new location, then fall through
    $null = $rd.Seek($pe_offset, [IO.SeekOrigin]::Begin)
    $null = $rd.Read($buf, 0, 4)
} else {
    # no MZ header; fall through to checking for PE header
}
if ([BitConverter]::ToUInt32($buf[$ltoh32], 0) -eq 0x00004550) {
    echo "Found PE header"
    $null = $rd.Read($buf, 0, 2)
    $platform_id = [BitConverter]::ToUInt16($buf[$ltoh16], 0)
    $platform_name = $platforms[[int] $platform_id]
    if (-not $platform_name) {
        $platform_name = "Unknown 0x{0:X4}" -f $platform_id
    }
    echo "Platform is $platform_name"
} else {
    echo "Not a PE file"
}
$rd.Dispose()
For building a test suite, Arm64 and Arm32 executables can be downloaded e.g. from the PuTTY website, while MIPS, Alpha, and PowerPC executables can be found in a Windows NT 4.0 .iso that you can get from Archive.org or WinWorldPC (the CD has files for all four platforms); IA64 .exe's can be found among Windows 2003 updates in various random Microsoft download archives.
The location of the PE header is inconsistent across platforms (and even across executables), which is not great when the whole point of the code is to ascertain what the platform type is.
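No, it is very consistent: if an MZ header is present, then the PE header is always located at the offset read from 0x3C, without any variation between architectures; and if the MZ header isn't present, then the only sensible place to look for the PE header is 0x0. (I'm not sure any real files look like that: unlinked .obj compiler output has no MZ header either, but per the PE/COFF specification it begins directly with the COFF file header, with no PE signature in front of it.)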
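That lookup rule is small enough to fit in one function. The sketch below is a variation on the script above, not a replacement for it: it uses `[IO.BinaryReader]`, whose `ReadUInt16`/`ReadUInt32` always read little-endian, so the `$ltoh16`/`$ltoh32` shuffling isn't needed. `Get-PEMachine` is a hypothetical name, and the sketch assumes the file is long enough for every read (a truncated file would throw instead of returning `$null`).

```powershell
# Hypothetical helper: return the Machine field of a PE file,
# or $null if no PE signature is found at the expected offset.
function Get-PEMachine {
    param([string] $Path)
    $stream = [IO.File]::OpenRead($Path)
    $reader = [IO.BinaryReader]::new($stream)
    try {
        $pe_offset = 0
        if ($reader.ReadUInt16() -eq 0x5A4D) {          # "MZ"
            $null = $stream.Seek(0x3C, [IO.SeekOrigin]::Begin)
            $pe_offset = $reader.ReadUInt32()           # offset of the PE signature
        }
        $null = $stream.Seek($pe_offset, [IO.SeekOrigin]::Begin)
        if ($reader.ReadUInt32() -ne 0x00004550) {      # "PE\0\0"
            return $null
        }
        return $reader.ReadUInt16()                     # Machine follows the signature
    } finally {
        $reader.Dispose()                               # also closes the underlying stream
    }
}

# Synthetic test image: MZ magic, PE offset 0x80 stored at 0x3C,
# then "PE\0\0" followed by machine 0x8664 (stored as bytes 64 86).
$image = [byte[]]::new(0x86)
$image[0] = 0x4D; $image[1] = 0x5A
$image[0x3C] = 0x80
$image[0x80] = 0x50; $image[0x81] = 0x45
$image[0x84] = 0x64; $image[0x85] = 0x86

$tmp = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllBytes($tmp, $image)
$machine = Get-PEMachine -Path $tmp
"0x{0:X4}" -f $machine
# -> 0x8664
Remove-Item -LiteralPath $tmp
```

The returned value can be fed straight into the `$platforms` table, e.g. `$platforms[[int] $machine]`.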