77

I have some files that are corrupted with this symbol:

^@

It's not part of the string; it's not searchable. How do I substitute this symbol with nothing, or how do I delete this symbol?

Here is an example line from one file:

^@F^@i^@l^@e^@n^@a^@m^@e^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@ ^@:^@ ^@^M^@
Brad Koch
  • 151
mrt181
  • 955

10 Answers

66

You could try:

  • :%s/<CTRL-2>//g (on a PC keyboard)

  • :%s/<CTRL-SHIFT-2>//g (on a Mac keyboard)

where <CTRL-2> means: hold down CTRL, press 2, then release CTRL,

and <CTRL-SHIFT-2> means: hold down Control and Shift together, press 2, then release both.

Either way, the command should appear on screen as :%s/^@//g. Here ^@ is a single character (a NUL byte, which couldn't be displayed otherwise), not ^ followed by @, so you can't just type ^ and then @ in the command above.

This command removes every ^@.

phresus
  • 1,002
55

This actually worked for me within vim:

:%s/\%x00//g
jriggins
  • 669
53

I don't think your files are corrupted. Your example line looks like it contains regular text with null bytes between each character. This suggests it's a text file that's been encoded in UTF-16 but the byte-order mark is missing from the start of the file. See http://en.wikipedia.org/wiki/Byte-order_mark

Suppose I open Notepad, type the word 'filename', and save as Unicode Big-endian. A hex dump of this file looks like this:

fe ff 00 66 00 69 00 6c 00 65 00 6e 00 61 00 6d 00 65

If I open this file in Vim it looks fine - the 'fe ff' bytes tell Vim how the file is encoded. Now suppose I create a file containing the exact same sequence of bytes, but without the leading 'fe ff'. Vim inserts ^@ (or <00>, depending on your config), in place of the null bytes; Notepad inserts spaces.

So rather than remove the nulls, you should really be looking to get Vim to interpret the file correctly. You can get Vim to reload the file with the correct encoding with the command:

:e ++enc=utf16
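
To see the same effect outside Vim, here is a minimal Python sketch (an illustration only, not part of the original answer; the helper name decode_both is made up). It takes the bytes from the hex dump above without the leading 'fe ff' and decodes them once as a single-byte encoding and once as UTF-16 big-endian.

```python
# Bytes from the hex dump above, minus the leading 'fe ff' byte-order mark.
raw = bytes.fromhex("00 66 00 69 00 6c 00 65 00 6e 00 61 00 6d 00 65")


def decode_both(data):
    """Hypothetical helper: decode the same bytes two different ways."""
    # Single-byte view: every byte becomes one character, NULs included.
    as_latin1 = data.decode("latin-1")
    # UTF-16BE view: each pair of bytes becomes one character; no BOM needed
    # because the byte order is stated explicitly.
    as_utf16 = data.decode("utf-16-be")
    return as_latin1, as_utf16


latin1_text, utf16_text = decode_both(raw)
print(repr(latin1_text))  # letters interleaved with NUL characters
print(repr(utf16_text))   # the intended text
```

The bytes are identical in both cases; only the interpretation changes, which is why reloading with the right encoding is preferable to deleting the nulls.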

jrb
  • 649
13

That 'symbol' represents a NUL character (ASCII value 0).

It's difficult to remove with Vim; try tr instead:

tr -d '\000' < file1 > file2
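
For comparison, the same byte-level deletion can be sketched in Python. This is only an illustration, not part of the tr answer; the function names are made up, and any paths you pass in stand in for file1 and file2 from the command above. As with tr, if the file is really UTF-16 the result is only reliably readable when every character is plain ASCII.

```python
def strip_nul_bytes(data: bytes) -> bytes:
    # Remove every NUL (0x00) byte, leaving all other bytes untouched.
    return data.replace(b"\x00", b"")


def strip_nul_file(src_path: str, dst_path: str) -> None:
    # Binary mode avoids any decoding step that could trip over the NULs.
    with open(src_path, "rb") as src:
        cleaned = strip_nul_bytes(src.read())
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)


sample = b"\x00F\x00i\x00l\x00e"
print(strip_nul_bytes(sample))
```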
pavium
  • 6,490
8

As others have noted, those are null bytes (ASCII 0). On Linux, you can enter a character into Vim by its code: press Ctrl-V followed by the character's 3-digit decimal value. To replace all null bytes, use:

    :%s/Ctrl-V000//g

(with no spaces).

Likewise, you can search for nulls with:

    /Ctrl-V000

In both cases, it won't show the zeros as you're typing them, but after entering all three, it will display ^@. On color terminals it will show that in blue to indicate that it's a control character.

TheAmigo
  • 310
6

FWIW, in my case I had to use Vim on Cygwin to edit a text file created on a Mac. The accepted solution didn't work for me, but it was close. According to the Vim wiki page about working with Unicode, the big-endian and little-endian variants of UTF-16 use different BOM bytes. So I had to explicitly tell Vim to use the little-endian variant of the encoding.

Only after picking the right encoding did I convert the file format (line endings) to dos so that I could edit the file in a Windows editor. Trying to reset the file format before specifying the encoding gave me grief. Here is the full list of commands I used:

:e ++enc=utf16le
:w!
:e ++ff=mac
:setlocal ff=dos
:wq
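
One reason the order matters: in UTF-16 a carriage return occupies two bytes (0d 00 in little-endian), so line endings can only be recognized reliably once the bytes are decoded. A rough Python analogue of the steps above, assuming BOM-less UTF-16LE input with classic Mac (CR) line endings; the function name is hypothetical and this is a sketch, not a description of what Vim does internally:

```python
def utf16le_to_dos_lines(data: bytes) -> str:
    # Step 1: decode first, so that each 2-byte unit becomes one character.
    text = data.decode("utf-16-le")
    # Step 2: fold CRLF and lone CR into LF so all line endings look alike.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 3: emit DOS (CRLF) line endings.
    return text.replace("\n", "\r\n")


sample = "Filename : \rSize : \r".encode("utf-16-le")
converted = utf16le_to_dos_lines(sample)
print(repr(converted))

# Re-encode if the result should stay UTF-16LE on disk.
encoded_again = converted.encode("utf-16-le")
```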
rpyzh
  • 163
3

The accepted solution did not work for me. I made vim pipe the file through tr instead:

:%!tr -d '\000'

This would also work well with visual mode (just type :!tr -d '\000') or on a range of lines:

# Remove nulls from current line:
:.!tr -d '\000'

# Remove nulls from lines 3-5:
:3,5!tr -d '\000'
james
  • 323
2

^@ is not a bad character if you use the proper encoding, but if you want to remove it, try:

  • tr -d '\000'
  • sed 's/\x00//g' (GNU sed)

Your example data also contains a ^M (carriage return) character.

To convert your file to Unix/Linux format before any processing, try:

dos2unix filename (RHEL and other Linux distributions)

dos2ux filename [newfilename] (HP-UX)

kenorb
  • 26,615
1

In addition to @jrb's answer, in Vim, the character encoding of the file is detected based on the fileencodings option. (note the 's' at end of fileencodings)

For example, on Windows the default value of the fileencodings option is ucs-bom, which means:

Check whether a BOM exists at the beginning of the file.

If a BOM exists, read the file's character encoding from the BOM.

If no BOM exists (and, in this case, every other encoding listed in the fileencodings option has also failed to match), read the file using the character encoding given by the encoding option. The default value of the encoding option is latin1. Because latin1 is a single-byte encoding, every byte in the file is a valid latin1 character (even the Nul character ^@ that you're seeing*).

* Strictly speaking, the ^@ in Vim's buffer text is stored internally as a newline character, not as the Nul character.

The proper way to read the file is to specify the character encoding manually as UTF-16 (as it looks like UTF-16 is the proper char encoding in this case).
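
The detection order described above can be sketched in Python. This is a simplified illustration, not Vim's actual algorithm: it checks only the UTF-8 and UTF-16 BOMs, and the function name and the latin-1 fallback parameter are assumptions made for the example.

```python
import codecs

# BOM byte sequences from the standard library, paired with a codec name.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_like_ucs_bom(data: bytes, fallback: str = "latin-1") -> str:
    # If a known BOM is present, strip it and decode with the matching codec.
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec_name)
    # No BOM: fall back to a single-byte encoding, which accepts every byte,
    # so NUL bytes survive as NUL characters in the result.
    return data.decode(fallback)


with_bom = b"\xfe\xff\x00f\x00i"
without_bom = b"\x00f\x00i"
print(repr(decode_like_ucs_bom(with_bom)))
print(repr(decode_like_ucs_bom(without_bom)))
```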

colemik
  • 1,684
0

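If you are here to delete some other character, try this:

sed 's/;*$//g' < file1 > file2

s/ starts a substitute command, which sed applies to every line of the input

; is the text to find; you can replace it with your own character or regex

*$ means "zero or more of the preceding character, at the end of the line"; you can use your own regex here

//g means replace each match with nothing (the replacement between the last two slashes is empty); you can put your own replacement text there

< file1 is the input file

> file2 is the output file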
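
For readers who prefer a script, roughly the same substitution as the sed command above can be sketched with Python's re module. The function name and its char parameter are made up for this example; it is an approximation of the sed behavior, not a drop-in replacement.

```python
import re


def strip_trailing(text: str, char: str = ";") -> str:
    # Build the pattern: zero or more copies of char at the end of a line.
    pattern = re.escape(char) + r"*$"
    # re.MULTILINE makes $ match at the end of every line, similar to how
    # sed applies the substitution line by line.
    return re.sub(pattern, "", text, flags=re.MULTILINE)


cleaned = strip_trailing("a = 1;;\nb = 2;\nc = 3\n")
print(cleaned)
```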