Using some tools that are usually distributed with Linux distros, we can inspect the binaries generated by our assembly code (and other binaries too).
The first tool we are going to use is hexdump. If you are in the project's root directory and have assembled and linked our first.asm program, then type:
$ hexdump bin/first
0000000 457f 464c 0102 0001 0000 0000 0000 0000
0000010 0002 003e 0001 0000 0080 0040 0000 0000
0000020 0040 0000 0000 0000 0190 0000 0000 0000
0000030 0000 0000 0040 0038 0001 0040 0005 0004
0000040 0001 0000 0005 0000 0000 0000 0000 0000
0000050 0000 0040 0000 0000 0000 0040 0000 0000
0000060 008c 0000 0000 0000 008c 0000 0000 0000
0000070 0000 0020 0000 0000 0000 0000 0000 0000
0000080 3cb8 0000 bf00 0000 0000 050f 0000 0000
0000090 0000 0000 0000 0000 0000 0000 0000 0000
00000a0 0000 0000 0000 0000 0000 0000 0003 0001
(I truncated part of the output because it was too large.)
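What we see here is the contents of our binary executable file, in hexadecimal format. The first number in each line is just the offset of that line from the beginning of the file (also in hexadecimal). Each character after that represents 4 bits, so each group of 4 characters is a group of 2 bytes. Note that, by default, hexdump prints each group as a 16-bit word in the machine's byte order, so on x86 (little-endian) the two bytes of a group appear swapped: the group 457f at the very start stands for the bytes 7f 45 in the file.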
In this way, we can see the exact output that nasm and ld generated for our assembly code. You will also notice that certain combinations of instruction/operand will always generate the same output.
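To make the grouping in the hexdump output easier to follow, here is a small Python sketch (not part of the project; the function names hexdump_words and hex_bytes are made up for illustration). It assumes a little-endian machine such as x86-64 and mimics how plain hexdump shows pairs of bytes as 16-bit words, next to the same bytes printed one at a time in file order.

```python
import struct

def hexdump_words(data: bytes) -> str:
    """Show bytes as 16-bit little-endian words, the way plain `hexdump`
    groups them on x86: each pair of bytes appears swapped."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd trailing byte so it can be unpacked
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return " ".join(f"{w:04x}" for w in words)

def hex_bytes(data: bytes) -> str:
    """Show bytes one at a time, in the order they are stored in the file."""
    return " ".join(f"{b:02x}" for b in data)

# The first 8 bytes of bin/first, taken from offset 0000000 above.
elf_start = b"\x7fELF\x02\x01\x01\x00"
print(hexdump_words(elf_start))  # 457f 464c 0102 0001
print(hex_bytes(elf_start))      # 7f 45 4c 46 02 01 01 00

# The 12 bytes at offset 0000080, where our instructions live.
code = bytes.fromhex("b83c000000bf000000000f05")
print(hexdump_words(code))       # 3cb8 0000 bf00 0000 0000 050f
```

The first and third printed lines match the hexdump output above, while the second shows the order in which the bytes are actually stored.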
We can also "disassemble" our binary, which means to generate the assembly code from the binary. To do this, type:
$ objdump -D bin/first
bin/first: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: b8 3c 00 00 00 mov $0x3c,%eax
400085: bf 00 00 00 00 mov $0x0,%edi
40008a: 0f 05 syscall
We can see some interesting things here. First, our mov rax, 60 instruction from first.asm was translated to b8 3c 00 00 00. But more importantly, through the disassembly we learn that this is the code for mov $0x3c,%eax (objdump prints AT&T syntax by default), which, in the syntax that we are writing (called "Intel" syntax), would be mov eax, 60. Recall from earlier that EAX is just the lower 32-bit portion of RAX. Writing to EAX also clears the upper 32 bits of RAX, so the shorter encoding that nasm picked has the same effect as the mov rax, 60 we wrote.

B8, if I understand correctly, is the opcode for mov eax with a 32-bit immediate value, and 3C is just 60 in hexadecimal. EAX is a 32-bit register, which is why four pairs of hexadecimal digits (a total of 4 bytes, or 32 bits) follow the opcode. They are stored least significant byte first, so 60 becomes 3c 00 00 00.
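As a rough illustration of that encoding, the following Python sketch rebuilds the two mov instructions from the disassembly by hand. It covers only this one form of mov (opcode B8 plus a register number, followed by a 32-bit value), and the names REG32 and encode_mov_r32_imm32 are invented for this example; it is not how nasm works internally.

```python
import struct

# Register numbers used in the x86 encoding of the 32-bit registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_mov_r32_imm32(reg: str, value: int) -> bytes:
    """Encode `mov reg, value` for a 32-bit register: one opcode byte
    (0xB8 + register number), then the value as 4 little-endian bytes."""
    opcode = 0xB8 + REG32[reg]
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)

print(encode_mov_r32_imm32("eax", 60).hex(" "))  # b8 3c 00 00 00
print(encode_mov_r32_imm32("edi", 0).hex(" "))   # bf 00 00 00 00
```

Both lines match the bytes objdump printed: B8 is the opcode with EAX (register 0), and BF is the same opcode with EDI (register 7).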
Another cool tool to look into binaries is the program xxd (it was already pre-installed in my Linux Mint OS). Look at the output if we run it against our hello binary:
$ xxd bin/hello
000000a0: 0d00 0000 0000 0000 0000 2000 0000 0000 .......... .....
000000b0: b801 0000 00bf 0100 0000 48be d800 6000 ..........H...`.
000000c0: 0000 0000 ba0d 0000 000f 05b8 3c00 0000 ............<...
000000d0: 4831 ff0f 0500 0000 4865 6c6c 6f2c 2077 H1......Hello, w
000000e0: 6f72 6c64 0a00 0000 0000 0000 0000 0000 orld............
000000f0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000100: 0000 0000 0300 0100 b000 4000 0000 0000 ..........@.....
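(This is just part of the output; I truncated it above and below.) As you can see, this program also shows, on the right, the ASCII characters corresponding to the bytes in the executable (and a dot for every byte that is not a printable ASCII character). Unlike the default hexdump output, xxd lists the bytes in the order they appear in the file.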
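The rule for the right-hand column can be sketched in a few lines of Python. This is only an approximation of xxd's default layout written for this explanation (the names ascii_column and xxd_line are made up), assuming printable ASCII means the bytes 0x20 through 0x7e.

```python
def ascii_column(data: bytes) -> str:
    """Printable ASCII bytes (0x20 to 0x7e) are shown as characters,
    every other byte as a dot."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in data)

def xxd_line(offset: int, data: bytes) -> str:
    """Format up to 16 bytes roughly like one line of xxd output:
    offset, hex digits in groups of 2 bytes, then the ASCII column."""
    digits = data.hex()
    groups = " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))
    return f"{offset:08x}: {groups:<39}  {ascii_column(data)}"

# The 16 bytes shown at offset 000000d0 in the output above.
row = bytes.fromhex("4831ff0f0500000048656c6c6f2c2077")
print(xxd_line(0xD0, row))
# 000000d0: 4831 ff0f 0500 0000 4865 6c6c 6f2c 2077  H1......Hello, w
```

The bytes 48 and 31 happen to be the ASCII codes for "H" and "1", which is why a piece of machine code shows up as text in that column right before the actual "Hello, w" string.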
Another interesting thing to compare, without even using these tools, is the size of the executables generated when we made standalone binaries (with a _start label) and when we linked with the C program. You can use the ls -lh command to check the sizes of files in a directory.
Since the programs linked with our caller.c helper (using the caller_c script) are much larger than "pure" assembly programs, you can also look into the .o object files with objdump. These will be much smaller, as they don't have the instructions inserted by gcc.