I am interested in system software development and have been analyzing how a compiler works for a few days. An Assembly code generated by a Compiler (say) clc has an opcode f8, and I am sure that the Assembler, when assembling this mnemonic, substitutes its opcode f8 in its place.

What is bothering me is what happens after this stage (I'm aware of the linking stage in between).

I mean, what exactly happens after this stage? Say the final executable is a raw binary file. Does that mean the opcode f8 is converted into binary data 1111 1000 and stored in the file?

If that is the case, why am I not able to view the binary contents of a binary file using a normal text editor (say Notepad)? After all, it's just '0's and '1's, right?

1 Answer

First, always use the right tool for the job. Using a text editor to view binary files is like using a knife to drive nails. Use a hex viewer/editor for such tasks or, better, a tool that understands the internals of the binary file in question. If we are talking about CPU opcodes, then something like IDA Pro Free or OllyDbg would be useful for analyzing the internals of executable files.

Does that mean the opcode f8 is converted into binary data 1111 1000 and stored in the file?

As @Mokubai correctly pointed out, 0xF8 is the same number as 1111 1000: the first is written in hexadecimal notation and the second in binary. It is also the same as the number 248 in the decimal system. The file stores that value as a single byte, whatever notation we use to write it down.
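To make the point about notations concrete, here is a small Python sketch (Python is used purely for illustration; the variable names are made up for this example). It only shows that the three spellings denote one and the same number.

```python
# The same value written three ways: hexadecimal, binary and decimal.
value_hex = 0xF8
value_bin = 0b11111000
value_dec = 248

# All three literals produce the identical integer.
same = value_hex == value_bin == value_dec
print(same)                      # True

# Converting between notations changes only how the number is displayed.
print(format(value_dec, "08b"))  # 11111000
print(format(value_dec, "02x"))  # f8
```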

If you build executable code manually from CPU opcodes (or assemble assembler source code), then an i386 CPU will recognize the byte 0xF8 (or 0b11111000 or 248, it is all the same) as the CLC instruction.

An Assembly code generated by a Compiler (say) clc has an opcode f8, and I am sure that the Assembler, when assembling this mnemonic, substitutes its opcode f8 in its place.

That's true, except for "An Assembly code generated by a Compiler". I just want to be sure you correctly understand the difference between "Assembly code" and opcodes. Opcodes are the exact language that the CPU understands; they are just numbers (and that is how the first computers were programmed, back when translators from CPU mnemonics, a.k.a. assemblers, were still a dream).
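The mnemonic-to-number substitution can be sketched as a toy "assembler" in Python. This is only an illustration under heavy assumptions: the `OPCODES` table and the `assemble` function are hypothetical names invented here, and the table covers just a few one-byte x86 instructions that take no operands. A real assembler also has to encode operands, addresses, labels and more.

```python
# Hypothetical toy table: a few one-byte x86 instructions without operands.
OPCODES = {
    "clc": 0xF8,  # clear carry flag
    "stc": 0xF9,  # set carry flag
    "nop": 0x90,  # no operation
    "ret": 0xC3,  # near return
    "hlt": 0xF4,  # halt
}

def assemble(source):
    """Translate one mnemonic per line into raw opcode bytes."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";")[0].strip().lower()  # drop ';' comments
        if not mnemonic:
            continue  # skip blank lines
        out.append(OPCODES[mnemonic])  # KeyError for unknown mnemonics
    return bytes(out)

machine_code = assemble("clc\nnop\nret")
print(machine_code.hex(" "))  # f8 90 c3
```

The output is three bytes, not three words of text: each mnemonic has been replaced by its number.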

Nowadays, we mostly use "direct" compilation from a high-level programming language straight to executable binaries, with compilers for languages such as C, C++ and Go that produce CPU opcodes.
(When I say "direct compilation", that's not actually true: under the hood, compilers perform multiple steps before they produce executable binaries. For the end user, though, it is like driving a car without needing to know how gasoline is converted into movement.)

As @sawdust correctly mentioned in a comment, higher-level programming languages can use different strategies to create CPU opcodes. You can, for example, see how the gcc compiler would cook up opcodes by telling it to generate the assembler code that would then be used to make the opcodes (object code):

 gcc -S -o myprogram.asm myprogram.c

If that is the case, why am I not able to view the binary contents of a binary file using a normal text editor (say Notepad)? After all, it's just '0's and '1's, right?

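Notepad speaks another language. It understands its own "opcodes", ASCII; anything else is "Greek" to Notepad. The file does not contain the characters '1' and '0'; it contains the single byte 0xF8, and Notepad tries to display that byte as one character of text instead of showing its bits.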
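A small Python sketch can show the difference between what is stored and what a text editor shows. The assumptions here are mine: the file name `clc.bin` and the helper names are made up, and decoding with Windows-1252 merely stands in for "interpret the bytes as text" (the encoding Notepad actually picks may differ).

```python
import os
import tempfile

CLC_OPCODE = 0xF8  # the CLC opcode discussed above

def hex_view(data):
    """Render bytes the way a hex viewer would."""
    return " ".join(f"{b:02X}" for b in data)

def bit_view(data):
    """Render bytes as the '0's and '1's they consist of."""
    return " ".join(f"{b:08b}" for b in data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clc.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(bytes([CLC_OPCODE]))     # one raw byte, not eight characters
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read()

# Assumption: a text editor treats the byte as a Windows-1252 character.
as_text = data.decode("cp1252")

print(size)            # 1
print(hex_view(data))  # F8
print(bit_view(data))  # 11111000
print(ascii(as_text))  # '\xf8', i.e. the single character U+00F8
```

The file is one byte long. A hex viewer shows `F8`, while a text-oriented decoding turns the same byte into one unrelated character.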

Alex