Compiling C to FASM


It is well known that programs written in assembler are much smaller than programs with the same functionality written in a high-level language such as C. Convenience and portability have their price.

I want to talk about a way to sacrifice portability (x86 and x86_64 only) to drastically reduce code size. Given that most computers of that architecture today are overpowered, with at least 512 MB of RAM and many gigabytes of disk storage, this is not going to be a life-changer, yet I am surprised that I have never seen it before.

Let us look at this simple program, which exits with the code specified on the command line:

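#include <inttypes.h>

/* Parse a decimal string; return -1 if it contains a non-digit. */
int64_t s2uint(const unsigned char *s)
{
    uint64_t res = 0;
    unsigned char c;

    for (; (c = *s) != 0; s++) {
        c -= '0';   /* bytes below '0' wrap around to values above 9 */
        if (c > 9) {
            return -1;
        }
        res = 10 * res + c;
    }

    return res;
}

int main(int argc, char **argv)
{
    if (argc != 2)
        return -1;
    return s2uint((const unsigned char *)argv[1]);
}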

Even when compiled with size optimization, linked against the musl^1 C library instead of glibc^2, and stripped, the resulting binary is around 13 KB. Sure, you can't just dump raw instruction opcodes into a file and call it a day; the ELF format has its overhead. But 13 KB of overhead for barely 100 bytes of code?!

^1
^2

What we are going to do is ask the compiler to output just the assembler code for these functions. Compilers are good at compiling, but we would rather do the linking ourselves. We will use the fasm^3 assembler for that.

^3

Unfortunately, the assembler files generated by GCC and clang are not fully compatible with fasm syntax, so some minor automatic post-processing is required. After that, we need to write a tiny bit of assembler code.

include "out/cc/s2uint.fasm"
include "out/cc/exit.fasm"

entry $
    mov rdi, [rsp]          ; argc
    lea rsi, [rsp + 8]      ; argv
    call main
    mov edi, eax            ; return value of main becomes the exit code
    mov eax, sys_exit
    syscall

It forwards the command-line arguments to the main function and, after it returns, invokes the "exit" system call with the appropriate argument. The result is 201 bytes. We managed to reduce the program size by a factor of 26.

Of course, it all went smoothly because our program did not use any functions from the standard library. Otherwise we would have to untangle them from the standard library's sources, which were never intended for that. I have yet to discover how much work it would take to compile a real application, such as a text browser or git, this way.

What I described is a crude hack. It is a shame that after decades of theoretical research and practical engineering of optimizing compilers, none of them is capable of generating a binary whose size is even remotely close to optimal. See the elaborate comparison here^4.

^4