3

I am trying to use an XML document but am running into this error:

Invalid byte 3 of 3-byte UTF-8 sequence

My document looks something like the one below, but with more tags and content. Please do not focus on the sample itself, though; I use several documents in this format. I believe a character in my document is invalid, but I don't know the best way to find it because the document is so large.

Any ideas or tools I could use? Thanks!

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "file:C:/Documentum/Viewed/map.dtd">
<map xmlns:dctm="http://www.documentum.com" dctm:obj_status="Read-Only" dctm:obj_id="09002af8800af696" dctm:version_label="CURRENT" xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/">
    <title>Overview of the Commercial General Liability (CGL) Insurance Coverages  </title><moreTagsHere><!-- more tags here... --></moreTagsHere>
</map>
Chris W. Rea
  • 10,978
joe
  • 313

6 Answers

2

There was an invalid curly quote in my XML.

joe
  • 313
0

I'd try XMLStarlet:

[...] XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands. [...]
[emphasis mine]

Chris W. Rea
  • 10,978
0

I have a sneaking suspicion you may be using a tool by Microsoft.

In my experience, Expression Web used to place header information in text files to identify their format. Nothing else recognized that header, so it showed up as random characters. This was particularly an issue with PHP, as it broke includes.

salmonmoose
  • 1,701
0

You've probably used an editor that adds a byte order mark (BOM) to the file. Many, if not most, XML and website editors allow you to save the document with or without the BOM. Check the save options in whatever editor you've been using.

You probably need to remove the BOM to avoid the error.

If your editor doesn't support that option, I can recommend the excellent Notepad++.
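If you'd rather check for the BOM without opening an editor, here is a minimal Python sketch. The function name `strip_utf8_bom` is just something I made up for illustration, and it only looks for the UTF-8 BOM (the bytes EF BB BF), not the UTF-16 or UTF-32 variants.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether a BOM was found)."""
    if data.startswith(codecs.BOM_UTF8):
        # codecs.BOM_UTF8 is the three bytes b'\xef\xbb\xbf'
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Demo on in-memory bytes; for a real file, read it in binary mode first,
# e.g. pathlib.Path("your-file.xml").read_bytes() (hypothetical file name).
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?>'
cleaned, had_bom = strip_utf8_bom(sample)
print(had_bom, cleaned[:5])
```

Write the cleaned bytes back in binary mode so nothing gets re-encoded along the way.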

Paxxi
  • 7,186
0

Forget the fact that it's XML; you need to validate the UTF-8. Maybe simply open the file in Firefox and search for the � character? Otherwise, see UTF-8 validation on Stack Overflow.
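For a large file, a small script can report where the invalid bytes are. The following is only a sketch in Python: `find_invalid_utf8` and the error-handler name `record_invalid_utf8` are names I invented, and the sample bytes assume the culprit is a Windows-1252 curly quote (0x93 / 0x94) pasted into a file declared as UTF-8, which is one common cause but not the only one.

```python
import codecs
from typing import List, Tuple


def find_invalid_utf8(data: bytes, context: int = 20) -> List[Tuple[int, int, bytes, bytes]]:
    """Return (byte offset, line number, bad bytes, surrounding bytes)
    for every sequence in `data` that is not valid UTF-8."""
    problems = []

    def record(err):
        # Called by the decoder each time it hits an undecodable sequence.
        if not isinstance(err, UnicodeDecodeError):
            raise err
        lo = max(0, err.start - context)
        hi = min(len(data), err.end + context)
        line = data.count(b"\n", 0, err.start) + 1
        problems.append((err.start, line, data[err.start:err.end], data[lo:hi]))
        # Substitute U+FFFD and resume decoding right after the bad bytes.
        return ("\ufffd", err.end)

    codecs.register_error("record_invalid_utf8", record)
    data.decode("utf-8", errors="record_invalid_utf8")
    return problems


# Demo on in-memory bytes; for a real file, use something like
# pathlib.Path("your-file.xml").read_bytes() (hypothetical file name).
sample = b"<map>\n<title>\x93CGL\x94 Coverages</title>\n</map>"
for offset, line, bad, around in find_invalid_utf8(sample):
    print(f"offset {offset}, line {line}: {bad!r} in {around!r}")
```

The reported line number lets you jump straight to the spot in an editor, and the raw bytes hint at what the character was meant to be.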

Arjan
  • 31,511
0

If you're using Tomcat, you probably need to set the encoding. I am running Tomcat as a service on Windows, and adding the following option in the configuration did the trick for me:

-Dfile.encoding=UTF-8

Hope it helps.

Excellll
  • 12,847