On Transpilers

Transpilers, or source-to-source compilers, are translators that convert source code from one programming language to another. You might argue that a compiler does the same thing when it translates source code into assembly or machine code. Transpilers are different in that they operate at a similar level of abstraction: they convert one high-level language into another, preserving the semantics and using equivalent code constructs where possible.

There are plenty of transpilers that offer new syntax for existing languages (Google Closure Compiler, CoffeeScript, Dart, Haxe, TypeScript and many others). Some of them “compile” to C, assuming that C is portable and can be compiled further for any target platform.

Here, on the other hand, we’ll focus on transpilers from C into other high-level languages such as Go, Rust and Zig. These languages compete directly with C in many areas, and a transpiler is a powerful way to increase their adoption.

Test source code

My initial thought was to challenge the transpilers with some programs from the International Obfuscated C Code Contest (IOCCC). However, the idea quickly proved absurd: many transpilers failed to recognise those programs as valid C code, which is probably for the best.

Instead, I dug out my old toy BASIC interpreter. It is short enough to analyse by hand, self-contained with no external dependencies, and it doesn’t even use dynamic memory, which simplifies integration with memory-safe or garbage-collected languages. Here’s the complete source code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int vars[27];    /* global variables, 'a'..'z' */
char line[64];   /* current input line */
char code[1024]; /* complete program code */
char *p;         /* current pointer */

/* reads a number from a stream */
static int num() { return strtol(p, &p, 10); }
/* returns the end of the current line */
static char *eol() { return strchr(p, '\n'); }
/* skips while cond(c) is true */
#define eat(cond) for (int c; (c = *p) && (cond); p++)
/* skips whitespace */
#define space() eat(c == ' ' || c == '\t')
/* skips until whitespace */
#define token() eat(c != ' ' && c != '\t')

static int expr();
static int var() {
  int c = *p | 0x20;
  if (c >= 'a' && c <= 'z') {
    p++;
    return c - 'a' + 1;
  }
  return 0;
}
static int atom() {
  space();
  if (*p == '-') { p++; return -atom(); }
  if (*p >= '0' && *p <= '9') { return num(); }
  if (*p == '(') {
    p++;
    int o = expr();
    space();
    p++;
    return o;
  }
  return vars[var()];
}
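/* term: atoms joined by '*' or '/', evaluated left to right */
static int mul() {
  int x = atom();
  for (;;) {
    space();
    if (*p == '*') { p++; x *= atom(); }
    else if (*p == '/') { p++; x /= atom(); }
    else return x;
  }
}
/* sum: terms joined by '+' or '-', so '*' and '/' bind tighter */
static int add() {
  int x = mul();
  for (;;) {
    space();
    if (*p == '+') { p++; x += mul(); }
    else if (*p == '-') { p++; x -= mul(); }
    else return x;
  }
}
static int expr() {
  int x = add(); space();
  /* skip the '=' before parsing the right-hand side to avoid endless recursion */
  if (*p == '=') { p++; int y = expr(); return x == y; }
  return x;
}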
static char *find(int n) {
  p = code;
  while (*p) {
    char *line = p;
    int i = num();
    p = eol() + 1;
    if (i >= n) { return p = line; }
  }
  return p = (code + strlen(code));
}
static void stmt() {
  space();
  char *s = p;
  int v = var();
  space();
  if (p[0] == '=') {
    p++;
    vars[v] = expr();
    return;
  }
  p = s;
  if (p[0] == 'p') {
    token();
    space();
    if (p[0] == '"') {
      p++;
      eat(c != '"') { putchar(c); }
      p++;
    } else {
      printf("%d", expr());
      space();
    }
    space();
    if (*p != ';') { printf("\n"); } else { p++; }
  } else if (p[0] == 's') {
    token();
    space();
  } else if (p[0] == 'g') {
    token();
    int line = expr();
    find(line);
  } else if (p[0] == 'i') {
    int c = p[1];
    token();
    space();
    if (c == 'n') {
      printf("input %s", p);
    } else {
      if (expr()) { stmt(); }
    }
  }
}
static void run() {
  p = code;
  for (;;) {
    int n = num();
    if (n == 0) { return; }
    stmt();
  }
}
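/* overlap-safe copy, used instead of memmove() */
static void xmove(void *dst, const void *src, size_t n) {
  char *d = dst;
  const char *s = src;
  if (s < d) {
    /* destination is after the source: copy backwards */
    s += n;
    d += n;
    while (n--) *--d = *--s;
  } else {
    /* otherwise copying forwards is safe */
    while (n--) *d++ = *s++;
  }
}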
static int loop() {
  for (;;) {
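    int n;
    if (fgets(line, sizeof(line), stdin) == NULL) {
      break;
    }
    p = line;
    if ((n = num())) {
      char *start = find(n); /* first line numbered >= n; p == start */
      char *end = start;     /* by default, insert before that line */
      if (num() == n) {
        /* line n already exists: replace it, including its newline */
        char *nl = eol();
        end = nl ? nl + 1 : start + strlen(start);
      }
      /* shift the tail, with its terminating NUL, to make room */
      xmove(start + strlen(line), end, strlen(end) + 1);
      xmove(start, line, strlen(line));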
    } else {
      switch (line[0]) {
      case 'n': code[0] = 0; break;
      case 'r': run(); break;
      case 'l': puts(code); break;
      }
    }
  }
  return 0;
}
int main() { return loop(); }
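Most of the listing is plain C, but the eat() family of macros deserves a closer look, because the preprocessor expands them before any transpiler sees the code. Written out by hand as functions over an explicit cursor, space() and token() amount to roughly the following. The names skip_space and skip_token are hypothetical and are not part of the interpreter, which applies the macros to the global cursor p instead.

```c
/* Hand-expanded equivalents of the space() and token() macros.
   skip_space/skip_token are illustrative names only; the interpreter
   uses eat(cond) on the global cursor p instead. */

/* Same loop as eat(c == ' ' || c == '\t'): stop at NUL or a non-blank. */
static const char *skip_space(const char *s) {
  for (int c; (c = *s) && (c == ' ' || c == '\t'); s++)
    ;
  return s;
}

/* Same loop as eat(c != ' ' && c != '\t'): stop at NUL or a blank. */
static const char *skip_token(const char *s) {
  for (int c; (c = *s) && (c != ' ' && c != '\t'); s++)
    ;
  return s;
}
```

Declaring `int c` in the for-initializer is what lets a loop body attached to the macro read the current character, as in `eat(c != '"') { putchar(c); }` inside stmt().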

Go

Let’s go alphabetically. A C-to-Go transpiler made the news when SQLite was converted to pure Go without any cgo code (https://pkg.go.dev/modernc.org/sqlite). Available at https://gitlab.com/cznic/cc, this transpiler is built on top of a custom C lexer/parser and an AST walker that runs in multiple phases to convert the C AST into Go code. For libc features it uses a custom library (https://pkg.go.dev/modernc.org/libc), as well as a similar project for the CRT (C runtime). These libraries become hard dependencies of the generated code.

go mod init basic
go get modernc.org/libc
ccgo basic.c
go build -o basic-go

The resulting source code looks long, but most of it is comments: there are only about 1000 lines of actual code. Apart from the modernc libraries, it imports “math”, “reflect”, “unsafe” and “sync/atomic”. The code, however, is barely readable: everything is wrapped in unsafe.Pointer casts and spiced up with goto statements. Does it work, though? Sometimes. I was able to enter some BASIC code and run it, but it crashed on “list”:

10 print "hello world"
run
hello world
list
libc.go:199:Xputs: TODOTODO
goroutine 1 [running, locked to thread]:
runtime/debug.Stack()
        /opt/homebrew/Cellar/go/1.18.1/libexec/src/runtime/debug/stack.go:24 +0x68
modernc.org/libc.todo({0x0?, 0x1004a5e40?}, {0x0?, 0x0?, 0x0?})
        /Users/serge/go/pkg/mod/modernc.org/libc@v1.16.12/etc.go:100 +0x10c
modernc.org/libc.Xputs(...)
        /Users/serge/go/pkg/mod/modernc.org/libc@v1.16.12/libc.go:199
main.loop(0x1000144d0?)
        /Users/serge/src/serge/tmp/transpile/a_darwin_arm64.go:5007 +0x214
main.main1(0x140000021a0?, 0x14280?, 0x26?)
        /Users/serge/src/serge/tmp/transpile/a_darwin_arm64.go:5016 +0x20
modernc.org/libc.Start(0x1003dca18)
        /Users/serge/go/pkg/mod/modernc.org/libc@v1.16.12/libc.go:125 +0x164
main.main()
        /Users/serge/src/serge/tmp/transpile/a_darwin_arm64.go:22 +0x28

The hint is that puts() is not yet implemented in modernc.org/libc: Xputs is only a TODO stub. Replacing it with printf and regenerating the Go code helped. Go wins in the simplicity category and is surprisingly convenient.
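The exact printf call isn’t shown above; here is a minimal sketch, assuming the goal is to keep the behaviour of puts(), which writes the string followed by a newline. The helper name list_code is hypothetical; in the interpreter, the same call would simply replace `puts(code)` in the 'l' case of loop().

```c
#include <stdio.h>

/* Stand-in for puts(code) in the 'l' (list) command.
   puts() appends a newline, so the format string does the same.
   Returns the number of characters written, or a negative value on error. */
static int list_code(const char *code) {
  return printf("%s\n", code);
}
```

Any equivalent output call works here, as long as the target libc actually implements it.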

Given all the trouble that cgo brings to Go, I’d say ccgo is a good solution for migrating existing C code to Go. Since the generated code closely follows the original code structure, further refactoring can happen function by function, if needed.

Rust

Rust has the well-known c2rust tool. It was the least user-friendly of the tools to install, at least on a Mac. The Go transpiler requires a single go install and is self-contained, and the Zig transpiler already comes with the Zig toolchain, but c2rust is an external tool that requires LLVM and a few other dependencies to be installed first. No big deal, moving on.

Invoking c2rust is also different from the other transpilers because, from what I could tell, it targets large C codebases: it can’t transpile a single file, but it can transpile a project that uses CMake, or one whose compilation database is generated in a few other ways. All right: adding a minimal CMakeLists.txt and running cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=1 .. from a build directory:

cmake_minimum_required(VERSION 3.1)
project(basic C)
add_executable(basic basic.c)

This generates a compile_commands.json that we can hopefully feed into c2rust. The transpiler exited with an error, complaining about some internal headers in the macOS SDK, but basic.rs was generated anyway. It turned out to be the shortest transpiled output of the three. Names and structure match the original C code, although some variable names are mangled and the character constants in switch/case labels were converted to integer ASCII codes.
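To make the last point concrete: the only switch in the interpreter is the command dispatch in loop(). The sketch below is a C rendering of that idea, not c2rust’s actual Rust output; the helper command_name is hypothetical. The integer labels are equivalent to the original character constants only because 'n', 'r' and 'l' are 110, 114 and 108 in ASCII, which the static assertion checks at compile time.

```c
#include <stddef.h>

/* The rewrite is only equivalent on an ASCII execution character set. */
_Static_assert('n' == 110 && 'r' == 114 && 'l' == 108,
               "ASCII execution character set assumed");

/* Command dispatch from loop() with character constants replaced by
   their ASCII codes. Returns NULL for anything that is not a command,
   which loop() silently ignores. */
static const char *command_name(int c) {
  switch (c) {
  case 110: return "new";  /* 'n' */
  case 114: return "run";  /* 'r' */
  case 108: return "list"; /* 'l' */
  default:  return NULL;
  }
}
```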

Does it run? Yes. Nothing had to be fixed: using the libc crate, the transpiled BASIC interpreter runs fine without any tweaks. Rust loses in the user-experience category, but wins in reliability.

Zig

Finally, it’s Zig’s turn. Since the Zig compiler is also a C compiler, it comes with a convenient translate-c command:

zig translate-c basic.c > basic.zig
zig build-exe --name basic-zig basic.zig 

This creates a Zig file of ~3500 lines, where the top and bottom parts are Zig definitions for parts of the C standard library and the actual transpiled code is hidden in the middle. The code is much more readable than the Go translation and follows not only the function structure, but also the structure of conditionals and loops. Even the variable names match the C code most of the time.

Does it work? Almost. Originally it failed because my C code used memmove(), which is not yet part of the Zig C runtime. Once I replaced it with the hand-written xmove(), everything worked like a charm. So Zig does a good job, but still shows signs of its young age. The resulting executable is as large as the one built from the original C code and behaves just like it. If I were to judge how close the transpiled result is to the original, Zig would be the winner.
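Since memmove() had to be replaced by hand, it helps to spell out what the replacement must guarantee: overlapping ranges are copied without clobbering bytes before they are read. Below is a minimal sketch in the spirit of xmove(); the name move_bytes is hypothetical. It picks the copy direction by comparing the two pointers, which is only well-defined when both point into the same array, as they do inside the interpreter’s code buffer.

```c
#include <stddef.h>

/* Overlap-safe byte copy with memmove()-like semantics. */
static void *move_bytes(void *dst, const void *src, size_t n) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  if (s < d) {
    /* destination overlaps the tail of the source: copy from the end */
    d += n;
    s += n;
    while (n--) *--d = *--s;
  } else {
    /* destination starts at or before the source: copy from the start */
    while (n--) *d++ = *s++;
  }
  return dst;
}
```

Moving the first four bytes of "abcdef" two positions to the right yields "ababcd"; a naive forward-only loop would produce "ababab" instead. In the interpreter, that is exactly the situation when a new line is inserted before existing ones.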

The “better C” world

It’s hard to imagine C fading away any time soon, but these transpilers bring its sunset closer. It’s great that popular C libraries and apps are being translated to safer languages, and even though there is no clear winner in the “better C” niche yet, fair competition makes it even better.


Jul 06, 2022
