Whenever I try to open a binary file, like an image or an .exe program, I don't see 0s and 1s; I just see stuff like \x00\x8fi>\x9f\xd2\x98\x16\ How is that binary data?

I am using Python 3.7 and the output is in cmd on Windows 10 Pro. I did a file.read() with the file opened as "rb", not "r".

2 Answers

Those \x.. values are bytes written in hexadecimal.

Hexadecimal is a way of representing numbers using base-16 (0-9, a-f).

Binary is a way of representing numbers using base-2 (0,1).

The numbers are identical but are shown differently because binary 00001111 is more verbose than hexadecimal 0f (often displayed as 0x0f or \x0f, the x denoting "hex").

As far as the computer is concerned the value is 00001111, but what is shown to you is more informationally dense and easier to "think" with, because humans are used to compact notations like base-10 (decimal) rather than long runs of 0s and 1s. The number is stored in memory using the exact same binary bits, is used in calculations in exactly the same way, and gives exactly the same results.

The only difference is in how the number is shown to you as a user.
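To make that concrete, here is a minimal Python sketch (the variable names are only for illustration) showing that the binary, hexadecimal and decimal spellings all name one and the same value:

```python
# Three spellings of the same integer; Python stores them identically.
as_binary = 0b00001111
as_hex = 0x0F
as_decimal = 15

same_value = as_binary == as_hex == as_decimal

# Only the text representation differs.
binary_text = format(as_decimal, "08b")  # eight binary digits
hex_text = format(as_decimal, "02x")     # two hexadecimal digits

print(same_value, binary_text, hex_text)
```
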

You might want to read up on how to convert between binary, decimal and hexadecimal, as it is useful to know that the numbers we use are merely representations of each other, some of which are easier to work with depending on your situation. Hexadecimal is, once you get the hang of it, a much better representation for working with computer data than normal decimal.
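One reason hexadecimal suits computer data is that each hex digit corresponds to exactly four bits. A rough sketch of the conversions in Python, where hex_to_bits is a hypothetical helper written just for this example:

```python
def hex_to_bits(hex_text: str) -> str:
    """Expand each hex digit into its 4-bit group (hypothetical helper)."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_text)

value = int("8f", 16)       # parse hexadecimal text into an integer
bits = hex_to_bits("8f")    # one group of four bits per hex digit
back = int("10001111", 2)   # parse binary text into an integer

print(value, bits, back, hex(back), bin(back))
```
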

It seems you are probably also opening a binary file in a text editor. If you opened a binary file in Notepad then you are likely seeing the program mixing characters in with hexadecimal. Characters are nothing more than binary data in a certain range, and within that range the text editor will show them as "text" rather than hex values. Python does the same when it displays a bytes object: bytes in the printable ASCII range appear as characters (the i and > in your example) and everything else as \x.. escapes. If you want to see the proper file data without interpretation then use a hex viewer, not a text editor.
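A small sketch of that mixing in Python, using only the bytes visible in the question (the cut-off trailing backslash is left out), and printing the actual 0s and 1s behind them:

```python
# The visible bytes from the question; the truncated trailing backslash is omitted.
data = b"\x00\x8fi>\x9f\xd2\x98\x16"

shown = repr(data)                               # the form the console prints
values = list(data)                              # each byte as an integer 0..255
bits = " ".join(format(b, "08b") for b in data)  # eight bits per byte

print(shown)
print(values)
print(bits)
```

The i and > are simply the bytes 0x69 and 0x3e, which happen to fall in the printable ASCII range.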

Mokubai
  • 95,412

Some of the examples below assume Linux.

In Python, you can display any data in a "displayable" format, which is created by repr(value), where value is any object (a variable or a literal).

Now data is always the same, until you convert it, either directly or implicitly.

Most often you may decide how to interpret or display the data.

The file command will attempt to identify a file's content by looking for certain patterns of data. Which pattern means which type is defined in the data file (the "magic" database) that is installed together with the file executable.
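As a rough illustration of the idea (far simpler than what file really does), the sketch below compares the first bytes of some data against a few well-known signatures. SIGNATURES and guess_type are hypothetical names made up for this example:

```python
# A few well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"MZ", "DOS/Windows executable"),
    (b"\x7fELF", "ELF executable"),
]

def guess_type(header: bytes) -> str:
    """Return a label for the first matching signature (hypothetical helper)."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"

print(guess_type(b"MZ\x90\x00"))
print(guess_type(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(guess_type(b"Hello, World!\n"))
```

In practice you would pass in the first few bytes read from a file opened in "rb" mode.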

Now, simple text files can be created in this style:

$ echo >test.txt "Hello, World!"

And then inspected with od, a utility that dumps file content in various formats:

$ od -t x1z test.txt 
0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21 0a        >Hello, World!.<
0000016

$ ls -l test.txt 
-rw-rw-r-- 1 hannu hannu 14 maj 26 19:08 test.txt

$ date
tis 26 maj 2020 19:08:58 CEST

$ od -t o1z test.txt 
0000000 110 145 154 154 157 054 040 127 157 162 154 144 041 012          >Hello, World!.<
0000016

$ od -t a test.txt 
0000000   H   e   l   l   o   ,  sp   W   o   r   l   d   !  nl
0000016

$ 

So the data is the same, just interpreted or displayed in different formats
(see man od for what else this utility can do).
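The same views can be produced in Python. This is only a sketch that mimics the hex and octal columns of od for the 14 bytes of test.txt; it does not reproduce od's offsets or layout:

```python
data = b"Hello, World!\n"  # the same 14 bytes that test.txt contains

hex_view = " ".join(format(b, "02x") for b in data)    # like od -t x1
octal_view = " ".join(format(b, "03o") for b in data)  # like od -t o1
text_view = data.decode("ascii")                       # the bytes read as characters

print(hex_view)
print(octal_view)
print(repr(text_view))
```
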

When you're using Excel (or e.g. LibreOffice Calc) you have similar possibilities, but oriented more towards formatting the data you're displaying, based on the actual cell content.

For example, the number 25569 can be shown as just that number, formatted in different ways with thousands separators, decimals and whatnot, or, when the cell is set to be a date, as 1970-01-01 (in LibreOffice at least: hold CTRL, hit 1, and click Date in the dialog that appears; same in Excel).
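A sketch of the same idea in Python, assuming the spreadsheet counts days from 1899-12-30 (the usual default "day zero"; SPREADSHEET_EPOCH is a name invented for this example):

```python
from datetime import date, timedelta

SPREADSHEET_EPOCH = date(1899, 12, 30)  # assumed day zero of the spreadsheet

serial = 25569
as_number = f"{serial:,.2f}"                          # thousands separator, two decimals
as_date = SPREADSHEET_EPOCH + timedelta(days=serial)  # the same value shown as a date

print(as_number)
print(as_date.isoformat())
```

The stored value never changes; only the chosen display does.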

More Python?

$ python3
Python 3.8.2 (default, Apr 27 2020, 15:53:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> f=open('test.txt','rb')

>>> s=f.readline()

>>> s
b'Hello, World!\n'

>>> s.decode()
'Hello, World!\n'

>>> print(s.decode())
Hello, World!

>>> print(repr(s))
b'Hello, World!\n'

>>> from binascii import hexlify

>>> hexlify(s)
b'48656c6c6f2c20576f726c64210a'

>>> quit()

$
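The session above can also be written as a short script. This is a minimal sketch that creates its own test.txt in a temporary directory (an assumption, so it runs anywhere) and uses bytes.hex() and bytes.fromhex() to go to hex text and back:

```python
import binascii
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "wb") as f:   # write the raw bytes
        f.write(b"Hello, World!\n")
    with open(path, "rb") as f:   # read them back unchanged
        s = f.readline()

hex_text = s.hex()                                      # bytes -> hex string
same = binascii.hexlify(s).decode("ascii") == hex_text  # hexlify gives the same digits
restored = bytes.fromhex(hex_text)                      # hex string -> bytes

print(s, hex_text, same, restored == s)
```
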
Hannu
  • 10,568