Which System.Text.Encoding is used by CharSet.Ansi?
I want to decode a string in a .NET Core app that was previously marshalled by C++ code, without defining a structure and calling Marshal.PtrToStructure. What I am missing is the encoding to use here:
Encoding.GetEncoding(???).GetString(...)
In a .NET Framework application System.Text.Encoding.Default works:
using System;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;

[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
public struct Structure
{
    // Fixed-size inline string: 4 characters plus the terminating NUL.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 5)]
    public string FieldA;
}

public class Net461App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA; // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0')); // "äöüß"
        Console.WriteLine(Encoding.Default.EncodingName); // Western European (Windows)
        Console.WriteLine(Encoding.Default.CodePage); // 1252

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage); // works
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0')); // "äöüß"
        Console.WriteLine(ansiEncoding.EncodingName); // Western European (Windows)
        Console.WriteLine(ansiEncoding.CodePage); // 1252
    }

    // Marshals the structure into unmanaged memory and copies the raw bytes out.
    public static byte[] ToByteArray<T>(T structure) where T : struct
    {
        var buffer = new byte[Marshal.SizeOf(structure)];
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            // fDeleteOld is false: the freshly allocated block holds no previous structure to destroy.
            Marshal.StructureToPtr(structure, handle, false);
            Marshal.Copy(handle, buffer, 0, buffer.Length);
            return buffer;
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }
    }

    // Copies the raw bytes into unmanaged memory and unmarshals them as T.
    public static T ToStructure<T>(byte[] buffer) where T : struct
    {
        IntPtr handle = Marshal.AllocHGlobal(buffer.Length);
        try
        {
            Marshal.Copy(buffer, 0, handle, buffer.Length);
            return Marshal.PtrToStructure<T>(handle);
        }
        finally
        {
            Marshal.FreeHGlobal(handle);
        }
    }
}
Here, Encoding.Default and the encoding for Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage are the same, and both produce the same string as CharSet.Ansi. But is TextInfo.ANSICodePage always the same as the code page used by CharSet.Ansi?
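One way to probe that question: TextInfo.ANSICodePage is a property of a culture, so it can differ between cultures and follows whatever culture the current thread has. As far as I understand it, CharSet.Ansi marshalling on Windows uses the system ANSI code page rather than the thread culture, so the two only agree when the thread culture matches the system locale. The following is a minimal sketch under that assumption; the class name CultureCodePageDemo and the helper AnsiCodePageOf are made up for illustration, and the printed values depend on the globalization data available at run time.

```csharp
using System;
using System.Globalization;

public static class CultureCodePageDemo
{
    // Returns the ANSI code page reported by the named culture.
    public static int AnsiCodePageOf(string cultureName)
    {
        return CultureInfo.GetCultureInfo(cultureName).TextInfo.ANSICodePage;
    }

    public static void Main()
    {
        // Two cultures that usually report different ANSI code pages.
        Console.WriteLine(AnsiCodePageOf("de-DE"));
        Console.WriteLine(AnsiCodePageOf("ru-RU"));

        // The value used in the question follows the current thread culture,
        // which code can change at run time.
        Console.WriteLine(CultureInfo.CurrentCulture.TextInfo.ANSICodePage);
    }
}
```

If the second line prints a different number than the first, a thread whose culture was switched to ru-RU would pick a code page that no longer matches bytes marshalled on a German system.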
Encoding.Default is different in .NET Core, and code page 1252 is not available by default:
public class NetCore2App
{
    static void Main(string[] args)
    {
        var @struct = new Structure { FieldA = "äöüß" };
        byte[] buffer = ToByteArray(@struct);
        var unmarshalled = ToStructure<Structure>(buffer).FieldA; // "äöüß"

        Console.WriteLine(Encoding.Default.GetString(buffer).Trim('\0')); // "????"
        Console.WriteLine(Encoding.Default.EncodingName); // Unicode (UTF-8)
        Console.WriteLine(Encoding.Default.CodePage); // 65001

        int ansiCodePage = Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage; // 1252
        Encoding ansiEncoding = Encoding.GetEncoding(ansiCodePage); // throws "No data is available for encoding 1252."
        Console.WriteLine(ansiEncoding.GetString(buffer).Trim('\0')); // ...
        Console.WriteLine(ansiEncoding.EncodingName); // ...
        Console.WriteLine(ansiEncoding.CodePage); // ...
    }

    // ...
}
Update:
Investigating the suggestion to use System.Text.Encoding.CodePages, I found the following hint about getting the system's current ANSI code page:
Encoding.GetEncoding
- To get the encoding associated with the default ANSI code page in the operating system's regional and language settings, you can either supply a value 0 for the codepage argument
- If the registered provider is the CodePagesEncodingProvider, the method returns the encoding that matches the system active code page when running on the Windows operating system.
The following seems to work:
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
int currentAnsiCodePage = Encoding.GetEncoding(0).CodePage;
Encoding encoding = Encoding.GetEncoding(currentAnsiCodePage);
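Put together, decoding a raw buffer without Marshal.PtrToStructure could look roughly like the sketch below. It assumes the System.Text.Encoding.CodePages package is referenced (needed on .NET Core 2.x) and that the buffer is a NUL-padded fixed-size field like FieldA above. SystemAnsiDecoder and Decode are illustrative names, not existing APIs, and the result of GetEncoding(0) depends on the machine, so no output is shown.

```csharp
using System;
using System.Text;

public static class SystemAnsiDecoder
{
    // Decodes a fixed-size, NUL-padded buffer up to the first NUL byte.
    public static string Decode(byte[] buffer, Encoding encoding)
    {
        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0)
        {
            length = buffer.Length; // no terminator: use the whole buffer
        }
        return encoding.GetString(buffer, 0, length);
    }

    public static void Main()
    {
        // Without this registration, code pages such as 1252 are unavailable on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // 0 asks for the system's active ANSI code page (on Windows).
        Encoding systemAnsi = Encoding.GetEncoding(0);

        // "äöüß" as Windows-1252 bytes, followed by the terminating NUL.
        byte[] buffer = { 0xE4, 0xF6, 0xFC, 0xDF, 0x00 };

        Console.WriteLine(systemAnsi.CodePage);
        Console.WriteLine(Decode(buffer, systemAnsi));
    }
}
```

Cutting at the first NUL instead of trimming after decoding avoids decoding stale bytes that may sit behind the terminator in a reused buffer.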
Each of the following gives me the same code page that correctly decodes the string in my test:
- Encoding.GetEncoding(0).CodePage (requires CodePagesEncodingProvider registration)
- Thread.CurrentThread.CurrentCulture.TextInfo.ANSICodePage
- CultureInfo.CurrentCulture.TextInfo.ANSICodePage
I am not sure which one is preferable and works on all machines.
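For comparison, if the code page used by the C++ side is known and fixed, it can be named explicitly so that the result does not depend on the decoding machine at all. This is only a sketch under the assumption that the producer always writes Windows-1252; FixedCodePageDecoder and DecodeWindows1252 are hypothetical names.

```csharp
using System;
using System.Text;

public static class FixedCodePageDecoder
{
    // Decodes a NUL-padded buffer that is assumed to contain Windows-1252 text.
    public static string DecodeWindows1252(byte[] buffer)
    {
        // Registering the same provider more than once has no further effect.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding encoding = Encoding.GetEncoding(1252);
        return encoding.GetString(buffer).TrimEnd('\0');
    }

    public static void Main()
    {
        // "äöüß" as Windows-1252 bytes, followed by the terminating NUL.
        byte[] buffer = { 0xE4, 0xF6, 0xFC, 0xDF, 0x00 };
        string text = DecodeWindows1252(buffer); // "äöüß"
        Console.WriteLine(text.Length); // 4
    }
}
```

The trade-off is that this hard-codes the producer's code page, which is only right when the C++ side does not itself follow the system locale.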