There is a recent publication at nature.com, Faster sorting algorithms discovered using deep reinforcement learning, where it talks about AlphaDev discovering a faster sorting algorithm. This caught my interest, and I've been trying to understand the discovery.
Other articles about the topic are:
- AlphaDev discovers faster sorting algorithms
- Understanding DeepMind's AlphaDev Breakthrough in Optimizing Sorting Algorithms
Here is the pseudo-code of the original sort3 algorithm against the improved algorithm that AlphaDev discovered.
Original Pseudo-Code
Memory [0] = A
Memory [1] = B
Memory [2] = C
mov Memory[0] P  // P = A
mov Memory[1] Q  // Q = B
mov Memory[2] R  // R = C
mov R S
cmp P R
cmovg P R  // R = max(A, C)
cmovl P S  // S = min(A, C)
mov S P    // P = min(A, C)
cmp S Q
cmovg Q P  // P = min(A, B, C)
cmovg S Q  // Q = max(min(A, C), B)
mov P Memory[0]  // = min(A, B, C)
mov Q Memory[1]  // = max(min(A, C), B)
mov R Memory[2]  // = max(A, C)
AlphaDev Pseudo-Code
Memory [0] = A
Memory [1] = B
Memory [2] = C
mov Memory[0] P  // P = A
mov Memory[1] Q  // Q = B
mov Memory[2] R  // R = C
mov R S
cmp P R
cmovg P R  // R = max(A, C)
cmovl P S  // S = min(A, C)
cmp S Q
cmovg Q P  // P = min(A, B)
cmovg S Q  // Q = max(min(A, C), B)
mov P Memory[0]  // = min(A, B)
mov Q Memory[1]  // = max(min(A, C), B)
mov R Memory[2]  // = max(A, C)
The improvement centers around the omission of the single move command, mov S P. To help understand, I wrote the following assembly code. However, my testing shows that the sorting algorithm does not work when A=3, B=2, and C=1, but it does work when A=3, B=1, and C=2.
This is written, compiled, and run on Ubuntu 20.04 Desktop.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:    20.04
Codename:   focal
$ nasm -v
NASM version 2.14.02
$ ld -v
GNU ld (GNU Binutils for Ubuntu) 2.34
My assembly code test...
; -----------------------------------------------------------------
;
; sort_test.asm
;
; Test for AlphaDev sorting algorithm
;
; My findings are that AlphaDev's removal of 'mov S P' doesn't work when:
;   a = 3, b = 2, c = 1
; But it does work with:
;   a = 3, b = 1, c = 2
;
; Output: The sorted values of a, b & c printed to stdout with spaces
;
; Compile & run with:
;
; nasm -f elf32 sort_test.asm && ld -m elf_i386 sort_test.o -o sort_test && ./sort_test
;
; -----------------------------------------------------------------
global _start
section .data
  a equ 3
  b equ 2
  c equ 1
section .bss
  buffer resb 5
section .text
_start:
; ------------------- ; AlphaDev pseudo-code
  mov eax, a          ; P = A
  mov ecx, b          ; Q = B
  mov edx, c          ; R = C
  mov ebx, edx        ; mov R S
  cmp eax, edx        ; cmp P R
  cmovg edx, eax      ; cmovg P R  // R = max(A, C)
  cmovl ebx, eax      ; cmovl P S  // S = min(A, C)
; The following line was in original sorting algorithm,
; but AlphaDev determined it wasn't necessary
;  mov eax, ebx       ; mov S P   // p = min(A, C)
  cmp ebx, ecx        ; cmp S Q
  cmovg eax, ecx      ; cmovg Q P  // P = min(A, B)
  cmovg ecx, ebx      ; cmovg S Q  // Q = max(min(A, C), B)
; add values to buffer with spaces
  add eax, 30h
  mov [buffer], eax
  mov [buffer+1], byte 0x20
  add ecx, 30h
  mov [buffer+2], ecx
  mov [buffer+3], byte 0x20
  add edx, 30h
  mov [buffer+4], edx
; write buffer to stdout
  mov eax, 4      ; sys_write system call
  mov ebx, 1      ; stdout file descriptor
  mov ecx, buffer ; buffer to write
  mov edx, 5      ; number of bytes to write
  int 0x80
  mov eax, 1      ; sys_exit system call
  mov ebx, 0      ; exit status 0
  int 0x80
I've run this test on the command line to print the results of the sort, but I also used gdb to step through this executable line-by-line. During this debugging, I clearly see that the register for "A", aka "P", aka "eax", is never updated when A=3, B=2, and C=1 but is updated when A=3, B=1, and C=2.
Full disclosure... I'm not an assembly programmer. I'm also not proficient in any other specific language, but I've experimented with C, C++, Javascript, PHP, & HTML to get small projects done. Basically, I'm self taught on what I do know. To get to the point to write this test, I've had to learn quite a bit. Therefore, I could certainly be making mistakes or not understanding the problem.
Anyway, please help me understand why I'm observing what I am.
- Am I misunderstanding the problem?
- Am I misunderstanding the pseudo-code?
- Am I making a mistake transforming the pseudo-code into assembly?
- Is there a mistake with my assembly code?
- Is the pseudo-code wrong?

 
     
    
 
     
    