
I would like to use Microsoft Word (on a PC specifically) to open, edit, and then save again a plaintext file in UTF-8 format, but without adding the BOM character sequence to the beginning.

Let's just go ahead and assume that I'm asking in regard to any version of Word after, say, Word 2010.

I see no option for this in the Save As dialog, or anywhere else in Word.

I can see this question asked any number of times about other programs, but I don't see anything specific to Word.

psoft

1 Answer


You can't do that directly in Word, because without the BOM there's no way to be sure that the file is encoded in UTF-8. Remember: There Ain’t No Such Thing As Plain Text.

Despite the name, the BOM in UTF-8 does not mark byte order; it serves as a signature. Without the signature, Word asks you to confirm the encoding every time you open the file, because the file could just as well be in an ANSI code page (which is still the default in Windows). Word's heuristics are very good and guess correctly most of the time, though, especially for encodings that are easy to detect, like UTF-8. In my experience they work well even for tricky encodings in various languages.
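To make the signature concrete: the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. Here is a minimal Python sketch that checks for it. The function name `has_utf8_bom` is hypothetical, and this only illustrates the signature itself, not how Word detects encodings.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 signature (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# "utf-8-sig" prepends the signature when encoding; plain "utf-8" does not.
with_bom = "héllo".encode("utf-8-sig")
without_bom = "héllo".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))  # prints: True False
```

Apart from those three leading bytes, the two encodings of the text are byte-for-byte identical, which is why a reader that sees no signature has to guess.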

That said, you can write a macro to do the saving part instead of using Word's save feature. See

Alternatively, just remove the BOM after saving with Word, using another tool such as PowerShell, iconv, Notepad++, or a third-party editor. Here's a PowerShell script that does the conversion:

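# Resolve to a full path first: .NET methods ignore PowerShell's current location
$FullPath = (Resolve-Path $MyPath).Path
$MyFile = Get-Content -Encoding UTF8 $FullPath
# $False tells UTF8Encoding not to emit a BOM when writing
$Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
[System.IO.File]::WriteAllLines($FullPath, $MyFile, $Utf8NoBomEncoding)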
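If Python is available, the same cleanup can be sketched at the byte level, which leaves line endings and everything else in the file untouched. This is only an illustrative alternative to the PowerShell script; the function name `strip_utf8_bom` is hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was found and removed, False if the file had none.
    """
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do; the file is left unmodified
    # Rewrite the file without its first three bytes (EF BB BF).
    p.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

For example, `strip_utf8_bom("notes.txt")` (a placeholder file name) would rewrite the file only if it starts with the signature, so it is safe to run more than once on the same file.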
phuclv