Architecture 1001: x86-64 Assembly - OST

Go take the Architecture 1001: x86-64 Assembly course if you didn’t!

This page is just a web port of the Obsidian notes I wrote while taking the course. There is no new information here. I just wanted a centralized page where I can reference the main concepts and links provided during the course, and consult it on my phone or tablet.

Maybe someone else can use it too.
Here is the Obsidian Vault folder.

I knew my way around assembly, but this class taught me a lot of new concepts and tricks in a fun and entertaining way ❤️ It made me more comfortable with Windows, WinDbg and the Visual Studio Debugger and forced me to dig deeper into old and new stuff. During this class I also practised with other tools such as radare2.

I created separate PDFs of the Intel documentation for each instruction taught during the course.

Why?

Because the Intel manual is 5000 pages long and finding any particular instruction is a pain in the ass every time. I used the December 2023 version of the Intel manual, not the 2020 one used in the course.

Here is the directory listing, if you prefer it to the following table.

Instructions#

# Instruction Manual PDF
1 🌟 NOP xchg-nop.pdf
2 🌟 PUSH push.pdf
3 🌟 POP pop.pdf
4 🌟 CALL call.pdf
5 🌟 RET ret.pdf
6 🌟 MOV mov.pdf
7 🌟 ADD add.pdf
8 🌟 SUB sub.pdf
9 🌟 IMUL imul.pdf
10 🌟 MOVZX movzx.pdf
11 🌟 MOVSX/MOVSXD movsx-movsxd.pdf
12 🌟 LEA lea.pdf
13 🌟 JMP jmp.pdf
14 🌟 Jcc jcc.pdf
15 🌟 CMP cmp.pdf
16 🌟 AND and.pdf
17 🌟 OR or.pdf
18 🌟 XOR xor.pdf
19 🌟 NOT not.pdf
20 🌟 INC inc.pdf
21 🌟 DEC dec.pdf
22 🌟 TEST test.pdf
23 🌟 SHL shl.pdf
24 🌟 SHR shr.pdf
25 🌟 SAR sar.pdf
26 🌟 SAL sal.pdf
27 🌟 DIV div.pdf
28 🌟 IDIV idiv.pdf
29 🌟 REP STOS stos.pdf
30 🌟 REP MOVS movs.pdf
31 🌟 LEAVE leave.pdf
32 🌟 CMOVcc cmovcc.pdf
CDQ cdq.pdf
REP rep.pdf
ENDBR64 endbr64.pdf

Intel vs. AT&T Syntax#

Even though Xeno advises learning the AT&T assembly flavour as well as the Intel one, I didn’t bother with it too much. I can read it and I can use it, but I don’t like it.

I don’t like the style of it. Now that I have finished the course I like it even less, because there are a lot of differences between the AT&T syntax and the Intel manual. Too many aliases specific to the AT&T syntax.

Since there are already too many f*n manuals to read in our field, I prefer not to have the need to consult the AT&T one.

Virtually all the tools allow you to change the assembly syntax they display. That’s what I do with gdb and objdump.

Mystery Listery#

  1. Why do the GCC/Clang HelloWorlds have balanced push/pop instructions but Visual Studio doesn’t?
  2. What’s up with the sub/add 0x28 in main()?
  3. Why is VS over-allocating space for a single variable?

Introduction#

Intro Slides

Why learn assembly?#

Here are the reasons listed by Xeno:

  1. Rare skillz are more valuable 🤑🤑🤑
  2. It’s essential to RE
  3. It’s essential to write memory corruption exploits
  4. It’s really satisfying to know how things work!

I’m in it mostly for reasons #3 and #4.

About reason #1 I think that in the long run it is true. At the same time, I think I wouldn’t still be poor as sh** if I had invested that time in React or some similar (ugly) stuff 😂

But that’s not the same fun! 😏

About the course#

Concepts:

  • x86-64 bit instructions and hardware
  • Implementation of the stack
  • Common Debugging/Disassembly Tools

14 assembly instructions account for 90% of the code

By the end of this class you should:

  • Know the x86-64 general purpose registers, and their 32 and 16 bit sub-register names
  • Understand how data like local variables or return addresses are stored on the stack
  • Understand function calling conventions
  • Be comfortable writing C code in an IDE and reading and stepping through the disassembly
  • Be able to read assembly well enough to determine the expected inputs to influence the control flow for an opaque binary (the infamous Binary Bomb lab)

Refresher: Hex <-> Binary#

Flippy Bit and the Attack of the Hexadecimals from Base 16

This game is USEFUL!
When I saw the two year old crush me at the game I felt bad 😢

Refresher: Signedness & Two’s complement negative numbers#

Wikipedia articles for Signedness and Sign Bit

Refresher: C Data Types Sizes#

C Types

Background: Endianness#

Endianness applies to memory, not registers
Endianness applies to bytes, not bits

Endianness
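To make the two statements above concrete for myself, here is a minimal C sketch. The function names `load_le32` and `host_is_little_endian` are made up for illustration and are not from the course. The first one rebuilds a 32-bit value from bytes stored little-endian (lowest address holds the least significant byte, whole bytes are reordered, never bits); the second one checks how the machine running the code lays out an integer in memory.

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from memory as a little-endian 32-bit value:
   the byte at the lowest address is the least significant one. */
uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 1 if this machine stores the least significant byte first
   (expected on x86-64), 0 otherwise. */
int host_is_little_endian(void) {
    uint32_t value = 0x12345678;
    uint8_t bytes[4];
    memcpy(bytes, &value, sizeof value);  /* copy the in-memory representation */
    return bytes[0] == 0x78;
}
```

So the value 0x12345678 sits in memory as the bytes 78 56 34 12 on a little-endian machine, while a register holding it still reads 0x12345678.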

Computer Registers#

Registers Slides

Memory Hierarchy#

Memory Hierarchy

x86-64 general purpose registers#

x86-64 general purpose registers

Register Usage
RAX Stores Function Return Values
RBX Base pointer for the data section
RCX Counter for string and loop operations
RDX I/O pointer
RSI Source Index pointer for string operations
RDI Destination Index pointer for string operations
RSP Stack pointer
RBP Stack frame base pointer
RIP Instruction Pointer

Microsoft x64 ABI conventions

Your First Instruction: No-Operation (nop)#

NOP Slides

NOP - 🌟 #1 - PDF

Instruction that executes no instruction, I guess 🤷🏻‍♂️

Fun Fact: the one-byte NOP (opcode 0x90) is just an alias mnemonic for the XCHG (E)AX, (E)AX instruction.

The Stack#

Stack Overview#

Stack Overview Slides

The stack grows towards lower addresses.

You do you on what direction to use when drawing a diagram of it.

What can you find on the stack?

  • Return addresses so a called function can return back to the function that called it
  • Local variables
  • Sometimes used to pass arguments between functions
  • Save space for registers so functions can share registers without smashing the value for each other
  • Save space for registers when the compiler has to juggle too many in a function
  • Dynamically allocated memory via alloca()

Push & Pop Instructions#

Push Pop balance

Push/Pop Slides

Xeno uses the expression r/mX to indicate any of the forms mentioned in the Intel manual: r/m8, r/m16, r/m32 and r/m64.

An r/mX can take 4 forms:

  1. A register -> rbx
  2. Memory, base-only -> [rbx]
  3. Memory, base + index * scale -> [rbx + rcx * X] (For X in {1,2,4,8})
  4. Memory, base + index * scale + displacement -> [rbx + rcx * X + Y]

Mystery Listery 1

Why do the GCC/Clang HelloWorlds have balanced push/pop instructions but Visual Studio doesn’t?

🌟 #2 - PUSH (PUSH Onto the Stack) - PDF
Pushes a QWORD onto the stack: RSP is decremented by 8, then the value is stored at the new top.

🌟 #3 - POP (POP a Value From the Stack) - PDF
Pops a QWORD from the stack: the value at the top is read, then RSP is incremented by 8.
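A toy model helped me keep the direction straight. This is only a sketch: `ToyStack`, `STACK_BYTES` and the `toy_*` functions are names I invented, the "stack" is a plain array, and there are no bounds checks.

```c
#include <stdint.h>

/* Toy model of a stack: 64 QWORD slots, "rsp" is a byte offset into it.
   STACK_BYTES and the struct layout are made up for this sketch. */
#define STACK_BYTES 512

typedef struct {
    uint64_t slots[STACK_BYTES / 8];
    uint64_t rsp;              /* byte offset of the current top of stack */
} ToyStack;

void toy_init(ToyStack *s) {
    s->rsp = STACK_BYTES;      /* empty stack: rsp sits at the highest address */
}

/* push: decrement rsp by 8, then store the QWORD at the new top. */
void toy_push(ToyStack *s, uint64_t value) {
    s->rsp -= 8;
    s->slots[s->rsp / 8] = value;
}

/* pop: load the QWORD at the top, then increment rsp by 8. */
uint64_t toy_pop(ToyStack *s) {
    uint64_t value = s->slots[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```

Pushing two values and popping them gives them back in reverse order, and `rsp` ends up where it started, which is what "balanced push/pop" means.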

Calling Functions#

CallASubRoutine#

CallASubroutine1 Slides
CallASubroutine in VS Slides

🌟 #4 - CALL (CALL Procedure) - PDF

CALL transfers control to a different function.
It pushes the address of the next instruction onto the stack and changes RIP to the address given in the instruction.

The destination address of the target function can be specified with:

  • Absolute Address
  • Relative Address (Relative to the end of the instruction or some other register)

🌟 #5 - RET (RETurn From Procedure) - PDF

The instruction takes two forms:

  1. ret - Pops the top of the stack into RIP (remember that pop implicitly increments the stack pointer RSP)
  2. ret 0x20 - Pops the top of the stack into RIP and adds a constant number of bytes to RSP

🌟 #6 - MOV (MOVe) - PDF

Can move:

  • Register -> Register
  • Memory -> Register, Register -> Memory
  • Immediate -> Register, Immediate -> Memory

NEVER Memory -> Memory
Memory to memory transfer is not allowed in x86

🌟 #7 - ADD - PDF
Example: add rsp, 8

🌟 #8 - SUB - PDF
Example: sub rax, [rbx*2]

Additions and Subtractions work as expected.
Destination: r/mX or Register
Source: r/mX, Register or Immediate

Source and destination cannot both be memory operands, because that would allow a memory-to-memory transfer.

Mystery Listery 2

What’s up with the sub/add 0x28 in main()?

Local Variables#

Various examples to understand how local variables are stored on the stack and get comfortable with visualizing the stack layout.

SingleLocalVariable#

SingleLocalVariable Slides

Mystery Listery 3

Why is VS over-allocating space for a single variable?

Mystery Listery 3 Solved!#

Mystery Listery 3 Slides
Microsoft x64 stack usage

Why is VS over-allocating space for a single variable?

It was the 16-byte-alignment padding.

Detective Cat

ArrayLocalVariable#

ArrayLocalVariable Slides

This lesson shows how the imul instruction is used to access specific indexes in a local array variable.
Local variables need not be stored on the stack in the same order they are defined in the high level language.

🌟 #9 - IMUL (SIgned MULtiply) - PDF

The VS compiler has a predilection for using imul over mul.
If you see an imul instruction it can be an indication that the binary was compiled with the VS compiler.
There are many forms. For example:

  • imul r/mX
  • imul reg, r/mX
  • imul reg, r/mX, immediate

But they can be divided into 5 groups.

IMUL Group 1 - Single Operand

Form Registers Used
IMUL r/m8 AX = AL * r/m8
IMUL r/m16 DX:AX = AX * r/m16
IMUL r/m32 EDX:EAX = EAX * r/m32
IMUL r/m64 RDX:RAX = RAX * r/m64

IMUL Group 2 - Two Operands

⚠️ Truncation Warning ⚠️

Form Registers Used
IMUL r16, r/m16 r16 = r16 * r/m16
IMUL r32, r/m32 r32 = r32 * r/m32
IMUL r64, r/m64 r64 = r64 * r/m64

IMUL Group 3 - Three Operands

⚠️ Truncation Warning ⚠️

Form Registers Used
IMUL r16, r/m16, imm8 r16 = r/m16 * sign-extended imm8
IMUL r32, r/m32, imm8 r32 = r/m32 * sign-extended imm8
IMUL r64, r/m64, imm8 r64 = r/m64 * sign-extended imm8

IMUL Group 4 - Three Operands

⚠️ Truncation Warning ⚠️

Form Registers Used
IMUL r16, r/m16, imm16 r16 = r/m16 * imm16

IMUL Group 5 - Three Operands

⚠️ Truncation Warning ⚠️

Form Registers Used
IMUL r32, r/m32, imm32 r32 = r/m32 * imm32
IMUL r64, r/m64, imm32 r64 = r/m64 * sign-extended imm32
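The truncation warning is easier to see with numbers. The sketch below models the 32-bit forms only; `imul32_low` and `imul32_wide` are hypothetical helper names, not anything from the Intel manual. The multi-operand forms keep only the low half of the product, while the single-operand form keeps the whole thing split across EDX:EAX.

```c
#include <stdint.h>

/* Low 32 bits of the product: what "imul r32, r/m32" leaves in the destination. */
uint32_t imul32_low(int32_t a, int32_t b) {
    return (uint32_t)((int64_t)a * (int64_t)b);
}

/* Single-operand form "imul r/m32": the full 64-bit product, split as EDX:EAX. */
void imul32_wide(int32_t eax_in, int32_t src, uint32_t *edx, uint32_t *eax) {
    uint64_t full = (uint64_t)((int64_t)eax_in * (int64_t)src);
    *eax = (uint32_t)full;          /* low half  */
    *edx = (uint32_t)(full >> 32);  /* high half */
}
```

With 0x10000 * 0x10000 the real product is 0x100000000: the truncating form returns 0, while the wide form returns EDX = 1 and EAX = 0.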

🌟 #10 - MOVZX (MOVe with Zero-EXtend) - PDF
🌟 #11 - MOVSX (MOVe with Sign-EXtension) - PDF

Used to move small values/types into larger registers/types.
Supports register and memory sources like a normal MOV, but the destination is always a register.

Zero Extend means the CPU unconditionally fills the high order bits of the larger register with zeroes.
Sign Extend means the CPU fills the high order bits in the destination register with whatever the sign bit is set to on the smaller value.

MOVSXD - PDF

You need to use MOVSXD to sign extend a 32 bit value to 64 bits.
MOVSX works only for 8 or 16 bit values.
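Here is a small C sketch of what the two extensions do to the bits. The function names are mine, and the bit-fiddling is written out by hand on purpose so the "copy the sign bit upward" step is visible (a compiler would normally get the same effect from a cast).

```c
#include <stdint.h>

/* Like "movzx eax, byte": upper bits are filled with zeroes. */
uint32_t zero_extend_8_to_32(uint8_t b) {
    return (uint32_t)b;
}

/* Like "movsx eax, byte": upper bits are copies of the sign bit (bit 7). */
uint32_t sign_extend_8_to_32(uint8_t b) {
    uint32_t wide = b;
    if (b & 0x80) {
        wide |= 0xFFFFFF00u;
    }
    return wide;
}

/* Like "movsxd rax, dword": 32 -> 64 bits, copying bit 31 upward. */
uint64_t sign_extend_32_to_64(uint32_t d) {
    uint64_t wide = d;
    if (d & 0x80000000u) {
        wide |= 0xFFFFFFFF00000000ull;
    }
    return wide;
}
```

The byte 0xF0 becomes 0x000000F0 when zero-extended and 0xFFFFFFF0 when sign-extended; a byte with the top bit clear, like 0x70, comes out the same either way.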

StructLocalVariable#

StructLocalVariable Slides

Fields in a struct must be stored in the same order they are defined in the high level language.
The first field in the definition sits at the lowest address.
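One way to check the ordering rule without a debugger is `offsetof`. The struct below is a hypothetical one I made up, not the struct from the course example, and the exact offsets depend on the padding the compiler inserts; only the ordering is guaranteed by C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, not the one from the course example. */
struct Example {
    int16_t a;
    int32_t b;
    int8_t  c;
};

/* Returns 1 if the fields sit at increasing offsets in declaration order. */
int fields_are_in_declaration_order(void) {
    return offsetof(struct Example, a) == 0
        && offsetof(struct Example, a) < offsetof(struct Example, b)
        && offsetof(struct Example, b) < offsetof(struct Example, c);
}
```

Compare this with plain local variables, which the compiler is free to reorder on the stack.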

Function Parameter Passing#

Pass1Parameter#

Pass1Parameter Slides

How function parameters are passed on the stack.

The parameter value passed in a register is still being stored on the stack.
Why?

TooManyParameters#

TooManyParameters Slides

Shadow Store

The x64 Application Binary Interface (ABI) uses a four-register calling convention by default. Space is allocated on the stack as a shadow store for callees to save those registers.
The caller must always allocate sufficient space to store four register parameters, even if the callee doesn’t take that many parameters.
Any parameters beyond the first four must be stored on the stack after the shadow store before the call.

The Shadow Store is explained in the Microsoft x64 calling convention documentation.

The Shadow Knows

The Microsoft compiler specifically implements a calling convention that not only passes the first 4 arguments through registers, but also reserves shadow store for them on the stack.

The callee has the responsibility of dumping the register parameters into their shadow space if needed.

The compiler reserves this space even if no parameters are passed to the called function.

Mystery Listery 2 Solved!#

Mystery Listery 2 Slides

What’s up with the sub/add 0x28 in main()?

It was the Microsoft x64 ABI Shadow Space.
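To convince myself where 0x28 comes from, I wrote the arithmetic down as code. This is a simplified model under my own assumptions, not the compiler's actual algorithm: the function pushes no registers, the caller had RSP 16-byte aligned before the CALL, and `frame_size`, `SHADOW_SPACE` and `RETURN_ADDR` are names I invented.

```c
#include <stdint.h>

#define SHADOW_SPACE 32   /* 4 register parameters * 8 bytes */
#define RETURN_ADDR   8   /* pushed by CALL before the function body runs */

/* Smallest "sub rsp, N" that reserves the shadow space plus `extra` bytes
   and leaves RSP 16-byte aligned. Assumes the function pushes no registers. */
uint64_t frame_size(uint64_t extra) {
    uint64_t n = SHADOW_SPACE + extra;
    while ((n + RETURN_ADDR) % 16 != 0) {
        n++;
    }
    return n;
}
```

With no locals, 0x20 bytes of shadow space plus the 8-byte return address is 0x28, which is not a multiple of 16, so 8 more bytes are added: 0x20 + 8 = 0x28 for the `sub`, and 0x30 in total.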

Scooby-Doo Dance

64-bit Calling Conventions#

Calling Conventions Slides
x86 Calling Conventions Wikipedia
System V ABI Specification
MS x64 ABI Specification

ABI (Application Binary Interface) - Standard on how executables are supposed to work on an architecture.

In calling conventions there are two main elements of importance:

  • Register conventions for which registers belong to the caller vs. callee
  • Parameter-Passing Conventions

Both of them are compiler-dependent.
The lesson talks about the MS x64 ABI and the System V x86-64 ABI.

Caller-save Registers

The caller should assume they will be changed by the callee.
MS also calls them volatile registers.

The caller is in charge of saving the value before a call to a subroutine and restoring their value after the call returns.

Caller will typically save registers right before the call and restore right after the call.

Callee-save Registers

The caller should assume they will not be changed by the callee. MS calls them non-volatile registers.

Callee will typically save registers at the beginning of the function and restore them at the end of the function.

Caller vs. Callee Registers

Parameter-Passing Conventions

x86-64 compilers use a subset of the Caller-save registers to pass parameters into and out of functions.

RAX or RDX:RAX passes out the return value.
RAX holds anything 64 bits or smaller.
RDX:RAX can be used to return 128 bit values.

In the Microsoft x64 ABI the first 4 parameters are put into registers, the rest on the stack so that the left-most parameter is at the lowest address.

In the System V ABI the first 6 parameters are passed in registers, the rest on the stack.

Parameter Passing Order

32-bit Calling Conventions#

32-bit Calling Conventions Slides

In 32-bit code there are more calling conventions in use.

  • cdecl - Default for most C code. Caller cleans up stack
  • stdcall - For Windows’ Win32 APIs. Callee cleans up the stack

Function parameters are pushed onto the stack from right to left.
Leftmost parameter (the first one in the function) ends up at the lowest address.

Caller Callee Register 32 bit

Both cdecl and stdcall conventions perform explicit stack frame linkage.
Each time a function is entered, the old base pointer is pushed onto the stack and the new stack pointer gets moved into the base pointer.

32 Bit Max Stack

Mystery Listery 1 Solved!#

Mystery Listery 1 Slides

The Windows ABI generally doesn’t use frame pointers for most code.

64 Bit Visual Studio Max Stack

When does VS use a frame pointer?

[1] MS x64 stack usage
[2] MS x64 prolog and epilog

From [1], it does if _alloca() is used.

_alloca is required to be 16-byte aligned and additionally required to use a frame pointer.
If space is dynamically allocated (alloca) in a function, then a non-volatile register must be used as a frame pointer to mark the base of the fixed part of the stack and that register must be saved and initialized in the prolog.

Note it said “a non-volatile register”.
It didn’t say it had to be RBP.
In [2] they have an example using R13 instead of RBP.

It’s not clear if there are any other situations.

64 Bit Visual Studio Frame Pointer Max Stack

SysV Max Stack Diagram

64 Bit SYS V Max Stack

Mystery Solved

Why do the GCC/Clang HelloWorlds have balanced push/pop instructions but Visual Studio doesn’t?

Differences between MS & System V ABIs.

Mystery Solved Simpson

SpecialMaths#

SpecialMaths Slides

🌟 #12 - LEA (Load Effective Address) - PDF

Usually the bracket [] syntax in assembly means dereference.
The value in the brackets should be used as a memory address and the value at that address should be fetched.

The LEA instruction is an exception to the rule.

Example:
rbx = 0x2,
rdx = 0x1000,
lea rax, [rdx+rbx*8+5].

rax = 0x1015, not the value at 0x1015
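The same computation in C, as a quick sanity check of the example. `effective_address` is a made-up helper name; it only does the address arithmetic, which is all LEA does.

```c
#include <stdint.h>

/* Effective address: base + index * scale + displacement.
   LEA stores this number in the destination without touching memory. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp) {
    return base + index * scale + (uint64_t)disp;
}
```

Calling `effective_address(0x1000, 0x2, 8, 5)` gives 0x1000 + 0x10 + 0x5 = 0x1015, matching the value that ends up in rax.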

Control Flow#

Conditional vs. Unconditional Control Flow

GotoExample#

GoToExample Slides

🌟 #13 - JMP (Jump) - PDF
Unconditionally change RIP to given address.

IfExample#

IfExample Slides

🌟 #14 - Jcc (Jump if condition Is Met) - PDF

There are more than 3 pages of conditional jump types!
A lot of them are synonyms for each other.

RFLAGS

RFLAGS is just EFLAGS zero-extended to 64 bits.

RFLAGS register holds many single bit flags. The ones we need now are:

  • Zero Flag (ZF) - Set if the result of some instruction is zero. Cleared otherwise.
  • Sign Flag (SF) - Set equal to the most-significant bit of the result, which is the sign bit of a signed integer (0 indicates a positive value and 1 indicates a negative value)

RFLAGS Todo

Notable Jcc Instructions

Instruction Condition
JZ/JE if ZF == 1
JNZ/JNE if ZF == 0
JLE/JNG if ZF == 1 or SF != OF
JGE/JNL if SF == OF
JBE/JNA if CF == 1 or ZF == 1
JB if CF == 1

Mnemonic Translations

Modifier Translation
B Below - Unsigned Notion
A Above - Unsigned Notion
N Not - Like Not less than JNL
G Greater Than - Signed Notion
L Less Than - Signed Notion
E Equal - Same as Z, Zero flag set

FF BANGLE

Flag setting

Before you can do a conditional jump, you need something to set the condition status flags for you.
Typically done with CMP, TEST or whatever instructions happen to have flag-setting side-effects.

🌟 #15 - CMP (CoMPare Two Operands) - PDF

The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction

The difference from SUB is that with CMP the result is computed and the flags are set, but the result is discarded.
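To tie CMP to the Jcc table, here is a rough C model of a 32-bit `cmp` and of the jump conditions listed in these notes. It is a sketch for building intuition: the `Flags` struct and the `*_taken` helpers are my own names, and only the four flags needed for those conditions are modelled.

```c
#include <stdint.h>

typedef struct {
    int zf;  /* result is zero */
    int sf;  /* most significant bit of the result */
    int cf;  /* unsigned borrow: a < b as unsigned */
    int of;  /* signed overflow */
} Flags;

/* Model of "cmp a, b" on 32-bit operands: compute a - b, keep only flags. */
Flags cmp32(uint32_t a, uint32_t b) {
    uint32_t result = a - b;          /* wraps modulo 2^32, then discarded */
    Flags f;
    f.zf = (result == 0);
    f.sf = (int)(result >> 31);
    f.cf = (a < b);
    /* Overflow: the operands have different signs and the result's sign
       differs from the first operand's sign. */
    f.of = (int)(((a ^ b) & (a ^ result)) >> 31);
    return f;
}

int jz_taken(Flags f)  { return f.zf == 1; }
int jle_taken(Flags f) { return f.zf == 1 || f.sf != f.of; }
int jge_taken(Flags f) { return f.sf == f.of; }
int jbe_taken(Flags f) { return f.cf == 1 || f.zf == 1; }
int jb_taken(Flags f)  { return f.cf == 1; }
```

Comparing 0xFFFFFFFF with 1 shows the signed/unsigned split: JLE is taken (as signed, -1 <= 1) while JBE is not (as unsigned, 0xFFFFFFFF > 1). Same bits, same flags, different jump.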

IfExample2#

IfExample2 Slides

The hardware doesn’t know or care about whether the humans are interpreting the bits as signed or unsigned. That’s the compiler’s problem to sort out.

The compiler must emit instructions which treat the bits as signed or unsigned based on what’s specified in the high level language.

SwitchExample#

SwitchExample Slides

Switch statements tend to look like a bunch of if-equal checks

Boolean Logic#

Refresher: Boolean logic

The difference between logical operators (&&, ||, !) and bitwise operators (&, |, ^, ~) in C.
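A tiny C sketch of that difference. The wrapper functions are only there to give each operator a name.

```c
/* Logical operators look at "zero vs non-zero" and return 0 or 1. */
int logical_and(int a, int b) { return a && b; }
int logical_not(int a)        { return !a; }

/* Bitwise operators work on each bit position independently. */
unsigned bitwise_and(unsigned a, unsigned b) { return a & b; }
unsigned bitwise_or(unsigned a, unsigned b)  { return a | b; }
unsigned bitwise_xor(unsigned a, unsigned b) { return a ^ b; }
unsigned bitwise_not(unsigned a)             { return ~a; }
```

The classic trap: `1 && 2` is 1 (both are non-zero), but `1 & 2` is 0 (binary 01 and 10 share no set bit).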

Learn-C.org bitmask section
Wikipedia Boolean algebra article

BooleanBecause#

BooleanBecause Slides

🌟 #16 - AND (Bitwise AND) - PDF
🌟 #17 - OR (Bitwise OR) - PDF
🌟 #18 - XOR (Bitwise XOR) - PDF
🌟 #19 - NOT (One’s Complement Negation) - PDF

Takeaway:

C bitwise operators correspond 1:1 to assembly instructions

ForLoopNoRet#

ForLoopNoRet Slides

🌟 #20 - INC (INCrement) - PDF
🌟 #21 - DEC (DECrement) - PDF

The use of the INC and DEC instructions is discouraged by the Intel manual.
In absence of an explicit return value, VS & GCC default to returning 0.

BitmaskExample#

Bitmask Slides

🌟 #22 - TEST (Logical Compare) - PDF

Computes the bit-wise AND of the first operand and the second operand and sets the SF, ZF, and PF status flags according to the result.

TEST acts sorta like CMP - sets flags, and throws away the result
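In C terms, a TEST followed by a JZ/JNZ is just a bitmask check. The sketch below models only the Zero Flag; `test_zf` and `is_zero` are names I made up.

```c
#include <stdint.h>

/* Model of "test value, mask": AND the operands, keep only the Zero Flag.
   Returns the ZF value: 1 when no bit of the mask is set in value. */
int test_zf(uint32_t value, uint32_t mask) {
    return (value & mask) == 0;
}

/* A typical use: "test eax, eax" followed by jz checks whether eax is zero. */
int is_zero(uint32_t value) {
    return test_zf(value, value);
}
```

With value 0x1301, the mask 0x0100 hits a set bit (ZF = 0, so JNZ is taken), while the mask 0x0002 hits nothing (ZF = 1, so JZ is taken).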

Don't Get Cocky

Bit Shifting#

ShiftExample1#

ShiftExample1 Slide

Time to get sw..emh..shifty in here!

Let's get Schwifty

🌟 #23 - SHL (SHift Logical Left) - PDF

Explicitly used with the << C operator.
Bits shifted off the left-hand side are “shifted into” (set) the carry flag (CF).

🌟 #24 - SHR (SHift Logical Right) - PDF

Explicitly used with the >> C operator.
Bits shifted off the right-hand side are “shifted into” (set) the carry flag (CF).

ShiftExample2Unsigned#

ShiftExample2Unsigned Slide

When a multiply or divide is by power of 2, compilers prefer shift instructions as a more efficient way to perform the computation.
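For unsigned values the equivalence is exact, which is why the compiler can swap one for the other. A minimal sketch, with helper names of my own choosing:

```c
#include <stdint.h>

/* x * 8 for unsigned values is the same as shifting left by 3 (shl). */
uint32_t times8(uint32_t x)  { return x << 3; }

/* x / 4 for unsigned values is the same as shifting right by 2 (shr). */
uint32_t div4(uint32_t x)    { return x >> 2; }
```

For example, 5 * 8 = 40 and 5 << 3 = 40, while 13 / 4 = 3 and 13 >> 2 = 3 (the remainder is simply shifted out).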

ShiftExample3Signed#

ShiftExample3Signed Slides

The power of sign is a curious thing! Make one man weep, make another man sing!

🌟 #25 - SAR (Shift Arithmetic Right) - PDF

Explicitly used with the >> C operator, if operands are signed.

🌟 #26 - SAL (Shift Arithmetic Left) - PDF

Actually behaves exactly the same as SHL!

CDQ (Convert Doubleword to Quadword) - PDF

Multiply and Divide#

MulDivExample#

MulDivExample Slides

🌟 #27 - DIV (Unsigned DIVide) - PDF

Three forms

Dividend Divisor Quotient Remainder
ax r/m8 al ah
edx:eax r/m32 eax edx
rdx:rax r/m64 rax rdx

If dividend is 32/64 bits, edx/rdx will just be set to 0 by the compiler before the instruction.
If the divisor is 0, a divide by zero exception is raised.
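Below is a rough C model of the 32-bit form, to show why EDX has to be zeroed first and where the exception comes from. `div32` is a hypothetical name, and returning -1 stands in for the exception the CPU would raise. Besides a zero divisor, the CPU also faults when the quotient does not fit in the destination register, which is what happens if garbage is left in EDX.

```c
#include <stdint.h>

/* Model of "div r/m32": EDX:EAX / divisor -> quotient in EAX, remainder in EDX.
   Returns 0 on success, -1 where the CPU would raise a divide error
   (divisor is zero, or the quotient does not fit in 32 bits). */
int div32(uint32_t *edx, uint32_t *eax, uint32_t divisor) {
    if (divisor == 0) {
        return -1;
    }
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;
    uint64_t quotient = dividend / divisor;
    if (quotient > 0xFFFFFFFFu) {
        return -1;
    }
    *eax = (uint32_t)quotient;
    *edx = (uint32_t)(dividend % divisor);
    return 0;
}
```

With EDX = 0 and EAX = 17, dividing by 5 leaves EAX = 3 (quotient) and EDX = 2 (remainder).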

🌟 #28 - IDIV (SIgned DIVide) - PDF

Same three forms as DIV.

I should clarify one minor thing here:
the DIV instruction, actually only takes 1 parameter, not 2, despite what Visual Studio (and I, in this video) show you.
Visual Studio, and I, show two parameters since that way you don’t need to remember that ax, eax, rax are implicitly the dividend.
But other disassemblers, like IDA, may show you the more accurate instruction like “div rcx” and expect you to remember that it’s dividing rax by rcx.
Later on when you learn to Read The Fun Manual, you can look at DIV and verify that this is true.

Takeaways:
When a multiply or divide is not by a power of 2, compilers will use normal multiply/divide instructions

CISC Delight: REPeatable Instructions#

ArrayLocalVariable2#

ArrayLocalVariable2 Slides

🌟 #29 - REP STOS (REPeat STOre String) - PDF
REP (REPeat String Operation Prefix) - PDF

STOS is one of a number of instructions that can have the REP prefix added to it, which repeats a single instruction multiple times.
Stores 1, 2, 4, or 8 bytes at a time from al/*ax to [*di].
Increments the *di register by 1/2/4/8 bytes at a time, so that the repeated store operation is storing into consecutive locations.

Before the actual REP STOS occurs:

  • Set *di to the start destination
  • Set *ax/al to the value to store
  • Set *cx to the number of times to store

All rep operations use the *cx register as a counter to determine how many times to loop through the instruction.

Takeaways:
If you’re manually coding asm, REP STOS is functionally a memset().
Sometimes when you use memset() from C, the compiler may turn it into a REP STOS.
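Written as a C loop, the byte-sized variant looks like this. It is a sketch of the behaviour with the direction flag cleared, and `rep_stosb` is just a C function I named after the instruction; the parameters stand in for the *di, al and *cx registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "rep stosb" with DF = 0: store AL at [di], advance di,
   and decrement the counter until it reaches zero. */
void rep_stosb(uint8_t *di, uint8_t al, size_t cx) {
    while (cx != 0) {
        *di = al;
        di++;
        cx--;
    }
}
```

`rep_stosb(buf, 0xCC, n)` has the same effect as `memset(buf, 0xCC, n)`, which is the whole point of the takeaway above.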

ThereWillBe0xb100d#

ThereWillBe0xb100d Slides

Q: Where does the rep stos come from in this example?
A: Compiler-auto-generated code. It comes from the stack frames runtime check option, which is enabled by default in the debug build.

JourneyToTheCenterOfMemcpy#

Journey2Memcpy Slides

🌟 #30 - REP MOVS (REPeat MOVe Data From String to String) - PDF

Copies 1/2/4/8 bytes at a time from [*si] to [*di].
Moves the *di and *si registers forward 1/2/4/8 at a time, so that the repeated move operation copies into consecutive locations.

Before the actual rep movs occurs:

  • Set *si to the start source
  • Set *di to the start destination
  • Set *cx to the number of times to copy

Note: Unlike MOV, MOVS can move memory to memory, but only between [*si] and [*di].

Direction Flag (DF)

🌟 - DF (Direction Flag)

Used to decide whether *si and *di should be incremented or decremented.
The DF decides (for the rep movs instructions) if the incremental copies are made towards higher or lower memory addresses.
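Here is a C sketch of the byte-sized copy with the direction flag as an explicit parameter. To keep it safe, memory is modelled as one array and `si`/`di` are offsets into it rather than raw pointers; that layout and the name `rep_movsb` are my own choices for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "rep movsb": copy cx bytes from mem[si] to mem[di].
   df == 0 walks toward higher addresses, df == 1 toward lower ones. */
void rep_movsb(uint8_t *mem, size_t di, size_t si, size_t cx, int df) {
    while (cx != 0) {
        mem[di] = mem[si];
        if (df) { di--; si--; } else { di++; si++; }
        cx--;
    }
}
```

Copying 4 bytes forward from offset 0 to offset 4, or backward starting from offsets 3 and 7, ends with the same bytes in place. The direction only starts to matter when the source and destination ranges overlap.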

Oh Sh**! ACCUMULATOR (A Boss Appears)#

ACCUMULATOR

Looking at all those examples on Linux!#

In this section, Xeno goes over all the previous examples but using Linux/GCC and gdb.
The examples are useful to study how the code is compiled by GCC and how to read the AT&T syntax.

All the examples are compiled without optimization and without stack canaries:

gcc -fno-stack-protector example.c

I’m going to talk about a particular example only if new information is presented.

Intel vs. AT&T assembly syntax

AT&T Syntax Slides

CallAFunction

ENDBR64 (Terminate [END] an Indirect BRanch in 64-bit Mode) - PDF

Instruction related to Intel CET (not important at the moment).
It acts as a NOP on processors that do not support CET.

CET - Control-flow Enforcement Technology
Writeup of how Windows uses CET
Intel CET manual

SingleLocalVariable

SingleLocalVariable GCC Slides

Why is the program accessing memory past the top of the stack (below RSP) without adjusting RSP first?

Because it is a leaf function.

Stack Frame with Base Pointer SysV

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers.
Therefore, functions may use this area for temporary data that is not needed across function calls.
In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue.
This area is known as the red zone.

Pass1Parameter

Pass1Parameter GCC Slides

TooManyParameters

TooManyParameters GCC Slides

SpecialMaths

SpecialMaths GCC Slides

🌟 #31 - LEAVE (Exit a function) - PDF

It’s literally just the same thing as the two instructions you’d typically expect to see right before you return from a function that is using stack frames:
mov rsp, rbp
pop rbp

ForLoopNoRet

Unlike VS, GCC does not use the INC and DEC instructions.

ShiftExample3Signed

The extra math that appears before the shift is explained in the comment quoted below.

🌟 #32 - CMOVcc (Conditional MOVe) - PDF

From a user in the comments of the lesson:

TLDR - the reason for the addition of 0xF is to get a correct answer when dealing with odd negative numbers.
First let's understand the code (eax currently holds argv[1]):

0x0000555555555164 <+27>: lea 0x0(,%rax,8),%edx   // multiply eax (a) by 8 and store into rdx (b)
0x000055555555516b <+34>: lea 0xf(%rdx),%eax      // add 0xf to rdx (b) and store into eax (now b+0xf)
0x000055555555516e <+37>: test %edx,%edx          // test b with itself (b &b)
0x0000555555555170 <+39>: cmovns %edx,%eax        // if b is NOT negative, move b into eax, else do the arithmetic on eax (b+0xf)

So, if our number is positive, the arithmetic will be done regularly, but if our number is negative, the arithmetic will be performed over number+0xF.
Now why is that? who is affected?
0xF will add 1111 to the least significant nibble of the number after a LSH(3).
Let us look at odd and even numbers after LSH(3):

odd: xxxx1000
even: xxxx0000

Adding 0xF will only affect an odd number considering that a RAS(4) is applied afterwards.

Let's see why it does not affect an even number:
00001100 (LSH3) = 01100000 + 0xF = 01101111 (RAS4) = 00000110   // the 0xF did not have any effect, rather than going into CF.

Now if we take a look at an odd negative number, we see indeed that the addition is needed in order to get the right result:
-31 (LSH3) = -248 (RAS4) = -16
-31 (LSH3) = -248 + 15 = -233 (RAS4) = -15

Hope that helps a bit.
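The arithmetic in that comment can be checked with a few lines of C. This is my own sketch of the idea, not GCC's output: `sar4` and `div16_like_gcc` are invented names, and the code assumes that `>>` on a negative `int32_t` is an arithmetic shift (true for GCC, Clang and MSVC, but implementation-defined in the C standard).

```c
#include <stdint.h>

/* Plain arithmetic shift right by 4: rounds toward negative infinity.
   Assumes >> on a negative value is an arithmetic shift. */
int32_t sar4(int32_t b) {
    return b >> 4;
}

/* The lea/test/cmovns idea: add 15 only when b is negative, then shift.
   This rounds toward zero, matching C's b / 16. */
int32_t div16_like_gcc(int32_t b) {
    int32_t chosen = (b < 0) ? b + 15 : b;  /* bias only negative values */
    return chosen >> 4;                     /* sar by 4 */
}
```

For b = -248 the plain shift gives -16, while the biased version gives -15, which is what `-248 / 16` evaluates to in C. Positive values and exact multiples of 16 are unaffected by the bias.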

MulDiv Example

This example shows how the division in the source is translated to assembly and how GCC (and other compilers) use Reciprocal Multiplication instead of executing a regular division.
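To get a feel for the trick, here is a minimal sketch for one specific case: unsigned 32-bit division by 3. The divisor and the constant are my choice for illustration and are not necessarily the ones in the course example. The constant 0xAAAAAAAB is 2^33 / 3 rounded up, so multiplying by it and shifting right by 33 acts as a multiply by 1/3.

```c
#include <stdint.h>

/* Unsigned divide by 3 without a div instruction:
   multiply by ceil(2^33 / 3) = 0xAAAAAAAB and keep the bits above bit 32. */
uint32_t div3_by_multiply(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

The result matches `x / 3` for 32-bit unsigned inputs, and a multiply plus a shift is typically cheaper than a DIV, which is why compilers bother.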

Binary Fractions

Note:

This is a rathole which I don’t really think you need to go down.
You just need to know it exists.
Unless you’re going to be studying compiler optimizations, for the most part you don’t care about this, you can just step through the instructions which do a divide-equivalent, and see the result and move on.

ArrayLocalVariable2

ArrayLocalVariable2 Slides

Oh No! GNASTY ACCUMULATOR is here!#

He’s here to avenge his brother!

Gnasty ACCUMULATOR

Learning to Fish: Read The F*n Intel Manual!#

MOOOORE

MOOORE!!

MOOOORE CAT

RTFM Slides

Volume 1

Volume 2

  • Section 1.3.1 “Bit and Byte Order”
    This shows how you can expect to see bits and bytes represented in the rest of the manual.
    Low addresses low, high addresses high ;)
    But if you’re coming from another architecture, their manuals may not have used this convention, so you should know how Intel writes stuff.

IF you want to be an expert… (and I mean, if you got this far, I have to assume you want to be an expert ;)) you should read the manual page for every single instruction we learned about in this class.
This way you can see where I was simplifying stuff for you, and learn how things actually work.
You can also then know what you don’t know.
I.e. if it starts mentioning stuff you don’t know about like segments/segmentation, far control flow transfer, privilege levels, page faults, etc, don’t worry about it (and just skip over it). You can learn about that in Arch2001!

Learning to Fish: Writing Inline Assembly#

WritingAsm Slides

Assembly can be written in two ways:

  • Inline Assembly: written in place inside C/C++ files. VS doesn’t support it for 64 bit.
  • Standalone Assembly: Pure assembly in a standalone file, assembled by an assembler into an object file, and then linked into the final binary with other compiled objects.

MASM - Microsoft Assembler
NASM - Netwide Assembler (Cross-Platform)

Most of the time we can also emit raw instruction bytes, both in inline and in standalone assembly.

Visual Studio#

WritingAsm in VS Slides

A subset of assembly instructions can be inserted into C code by using VS compiler intrinsics.
Visual Studio x86-64 intrinsics

GCC inline Assembly#

WritingAsm in GCC Slides

The Most Important Assembly Exercise You’ll Ever Do: Binary Bomb Lab#

In this intro, Xeno explains why this is the most important assembly exercise.

This lab is kinda the weed-out exercise for the paths that build on this knowledge.
You have to have a particular personality type that allows you to be very detail-oriented, single-mindedly focused, so that you can understand what is going on with software when faced with a lot of ambiguity and a lot of difficult choices.
It’s better to find out now that this is not for you.

If you are interested, a student of OST2 made a Binary Bomb Lab WriteUp

(Optional) Basic Buffer Overflow Lab#

I guess the ritual citation to the classic Smashing the stack for fun and profit was finally due 🤗

Conclusion#

Well, Thank you Xeno!