Nokia Composer in 512 bytes

Those who ever owned an old Nokia phone like the 3310 or 3210 might still remember its wonderful ability to compose your own ringtones straight on the phone’s keyboard. By arranging notes and pauses you could end up with a popular tune beeped out of the phone’s speaker, and furthermore, you could share it with your friends! If you missed that era, here’s what it looked like:

nokia-composer

Unimpressed? Well, trust me, it was really cool back then, especially if you were into music.

The music notation and format used in the Nokia Composer is known as RTTTL (Ring Tone Text Transfer Language), and it is still widely used among hobbyists to play monophonic tunes on Arduino and the like.

RTTTL allows you to write music for only one “voice”: notes can only be played sequentially, with no chords or polyphony. This limitation is, however, the killer feature of the format, because it’s easy to write, easy to read, easy to parse, and easy to play.

In this article, we will try to build an RTTTL player in JavaScript, with a bit of code-golfing and math hacks to make it as small as possible.

parsing rtttl

There is a formal grammar for RTTTL. It consists of three parts: the song name, the song defaults (tempo in BPM, default octave, and default note duration), and the melody itself. We will, however, mimic the behavior of the Nokia Composer itself: we will only parse the melody part and treat the BPM tempo as a separate input. The name of the song and the rest of the defaults are out of scope.

The melody is just a sequence of notes/rests separated by commas with some optional whitespace. Each note consists of a duration (1/2/4/8/16/32/64), an optional dot, an optional sharp sign (#), the note pitch (c/d/e/f/g/a/b, or - for a rest), and an octave number (1-3, since only three octaves are supported).

The simplest approach would be to use regexps. Modern browsers come with a very convenient matchAll function that returns an iterator over all matches within the string:

const play = s => {
  for (const m of s.matchAll(/(\d*)?(\.?)(#?)([a-g-])(\d*)/g)) {
    // m[1] is optional note duration
    // m[2] is optional dot in note duration
    // m[3] is optional sharp sign, yes, it goes before the note
    // m[4] is note itself
    // m[5] is optional octave number
  }
};
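To see what the capture groups hold for a concrete melody, here is a small sketch that collects them into plain objects. The `tokenize` helper and the sample melody are made up for illustration; only the regexp comes from the code above.

```javascript
// Hypothetical helper: list the capture groups for every note in a melody.
const tokenize = (s) =>
  [...s.matchAll(/(\d*)?(\.?)(#?)([a-g-])(\d*)/g)].map((m) => ({
    duration: m[1] || '', // digits before the note, '' if missing
    dot: m[2],            // '.' or ''
    sharp: m[3],          // '#' or ''
    note: m[4],           // 'a'..'g', or '-' for a rest
    octave: m[5],         // digits after the note, '' if missing
  }));

console.log(tokenize('8.#c2, 4-, e'));
```

The sample yields three entries: a dotted eighth C sharp in octave 2, a quarter rest, and a bare `e` with neither duration nor octave. Commas and spaces are skipped simply because they never match the note group.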

The first thing to figure out about each note is how to convert it into a frequency. Of course, we could make a hashmap for all seven note letters, but since the letters are sequential, it should be simpler to treat them as numbers. For each note letter, we get a char code (ASCII code). For ‘A’ that would be 0x41 and for ‘a’ that would be 0x61. For ‘B/b’ that would be 0x42/0x62, for ‘C/c’ 0x43/0x63, and so on:

// 'k' is an ASCII code of the note:
// A..G = 0x41..0x47
// a..g = 0x61..0x67
let k = m[4].charCodeAt();

We can ignore the upper bits and only use k&7 as the note index (a=1, b=2, …, g=7), but what’s next? The next part is unpleasantly tricky, and it’s due to music theory. If only there were just 7 notes! But there are 12, and the sharp/flat notes are squeezed between the regular notes in a very uneven manner:

         A#        C#    D#       F#    G#    A#         <- black keys
      A     B | C     D     E  F     G     A     B | C   <- white keys
      --------+------------------------------------+---
k&7:  1     2 | 3     4     5  6     7     1     2 | 3
      --------+------------------------------------+---
note: 9 10 11 | 0  1  2  3  4  5  6  7  8  9 10 11 | 0

As we see here, the note index within the octave is increasing faster than the (k&7) note code. Also, it’s increasing not linearly: the “distance” between E and F or between B and C is 1 semitone and not 2 like between the rest of the notes.

Intuitively, we may try multiplying (k&7) by 12/7 (there are 12 semitones and 7 notes):

note:          a     b     c     d     e      f     g
(k&7)*12/7: 1.71  3.42  5.14  6.85  8.57  10.28  12.0

If we look at these numbers without the fractional components - we immediately notice they are already non-linear, much like we expected:

note:                 a     b     c     d     e      f     g
(k&7)*12/7:        1.71  3.42  5.14  6.85  8.57  10.28  12.0
floor((k&7)*12/7):    1     3     5     6     8     10    12
                                  -------

...But not in the way we would expect: the one-semitone steps should be between B/C and E/F, not between C/D. Let’s try other coefficients (the underlines indicate the semitones):

note:              a     b     c     d     e      f     g
floor((k&7)*1.8):  1     3     5     7     9     10    12
                                           --------

floor((k&7)*1.7):  1     3     5     6     8     10    11
                               -------           --------

floor((k&7)*1.6):  1     3     4     6     8      9    11
                         -------           --------

floor((k&7)*1.5):  1     3     4     6     7      9    10
                         -------     -------      -------

Clearly, the values of 1.8 and 1.5 do not fit: the first one has just one semitone step and the last one has too many. The other two, 1.6 and 1.7, are actually quite good: 1.7 groups the notes as A-B-CD-E-FG and 1.6 groups them as A-BC-D-EF-G, which is exactly what we need!
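The trial and error with coefficients is easy to automate. The sketch below (the `semitonePairs` helper is made up for this example) reports, for a given coefficient, which neighbouring letters end up exactly one semitone apart. It only looks at a..g and does not check the wrap-around from g back to a.

```javascript
// For a coefficient, floor (k&7)*coef for a..g (k&7 = 1..7) and collect
// the pairs of neighbouring letters that are exactly one step apart.
const semitonePairs = (coef) => {
  const letters = 'abcdefg';
  const values = [...letters].map((_, i) => Math.floor((i + 1) * coef));
  const pairs = [];
  for (let i = 1; i < values.length; i++) {
    if (values[i] - values[i - 1] === 1) pairs.push(letters[i - 1] + letters[i]);
  }
  return pairs;
};

for (const coef of [1.8, 1.7, 1.6, 1.5]) {
  console.log(coef, semitonePairs(coef).join(' '));
}
// 1.8 ef
// 1.7 cd fg
// 1.6 bc ef
// 1.5 bc de fg
```

Only 1.6 puts the short steps between B/C and E/F, where the piano keyboard has no black key.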

Now we need to slightly adjust the values so that C would be 0, D would be 2, E would be 4, F would be 5, and so on. We should shift it by 4 semitones, but subtracting 4 would make the A note lower than the C note, so instead we add 8 and take the result modulo 12 in case the value overflows the octave:

let n = (((k&7) * 1.6) + 8) % 12;
// integer part of n (the fraction is dropped later with `0|`):
// A  B C D E F G A  B C ...
// 9 11 0 2 4 5 7 9 11 0 ...
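A quick way to check the formula is to run it over every letter. This sketch floors the value explicitly with `Math.floor` (the multiplication by 1.6 leaves a fractional part), and the `semitone` name is an assumption made for the example.

```javascript
// Map a note letter (either case) to its semitone offset from C.
const semitone = (letter) =>
  Math.floor((letter.charCodeAt(0) & 7) * 1.6 + 8) % 12;

console.log([...'cdefgab'].map(semitone)); // [ 0, 2, 4, 5, 7, 9, 11 ]
console.log(semitone('A'));                // 9, same as 'a'
```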

We should also take into consideration the sharp sign, which is captured by the m[3] regexp group. If it is present - we should increase the note value by 1 semitone:

// we use !!m[3]: if m[3] is '#', it evaluates to `true`
// and gets converted to `1` by the `+` operator.
// If m[3] is an empty string, it turns into `false` and, thus, into `0`:
let n = (((k&7) * 1.6) + 8)%12 + !!m[3];

Finally, we should use the correct octave. Octaves are already stored as numbers in the m[5] regexp group. Music theory tells us that each octave is 12 semitones, so we can multiply the octave number by 12 and add it to the note value:

// n is a note index 0..35 where 0 is C of the lowest octave,
// 12 is C of the middle octave and 35 is B of the highest octave.
let n =
  (((k&7) * 1.6) + 8)%12 + // note index 0..11
  !!m[3] +                 // semitone 0/1
  m[5] * 12;               // octave number
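Put together as a standalone function, the note index could look like the sketch below. The `noteIndex` name and its three parameters are hypothetical; they stand in for m[4], m[3] and m[5], and the flooring is again done explicitly.

```javascript
// Hypothetical helper combining letter, sharp sign and octave into one index.
const noteIndex = (letter, sharp, octave) =>
  Math.floor((letter.charCodeAt(0) & 7) * 1.6 + 8) % 12 + // 0..11 within the octave
  (sharp ? 1 : 0) +                                       // +1 semitone for '#'
  12 * octave;                                            // 12 semitones per octave

console.log(noteIndex('c', '', 0));  // 0
console.log(noteIndex('f', '#', 2)); // 30
console.log(noteIndex('b', '', 2));  // 35
```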

clamping

What if someone writes 10 or 1000 as an octave number? That might result in an ultrasonic pitch. We should only allow the correct set of values for such parameters. Restricting a number to a range between two other numbers is commonly known as “clamping”. A built-in Math.clamp(x, low, high) has been proposed for JavaScript, but it is not something you can rely on in browsers yet. The easiest alternative would be to use:

clamp = (x, a, b) => Math.max(Math.min(x, b), a);

But since we try to shorten our code as much as possible, we can re-invent the wheel and avoid using Math functions. We use x=0 default value to make clamping work with undefined values as well:

clamp = (x=0, a, b) => (x < a && (x = a), x > b ? b : x);

clamp(0, 1, 3) // => 1
clamp(2, 1, 3) // => 2
clamp(8, 1, 3) // => 3
clamp(undefined, 1, 3) // => 1

tempo and note durations

We expect BPM to be passed as a parameter to our play() function, so we only need to validate it:

bpm = clamp(bpm, 40, 400);

Now, to calculate how long the note should last in seconds we can get its musical duration (whole/half/quarter/…), which is stored in m[1] regexp capture group, and use the following formula:

note_duration = m[1]; // can be 1,2,4,8,16,32,64
// since BPM is "beats per minute", or usually "quarter note beats per minute",
// BPM/4 would be "whole notes per minute" and BPM/60/4 would be "whole
// notes per second":
whole_notes_per_second = bpm / 240;
duration = 1 / (whole_notes_per_second * note_duration);

If we minify these formulas into one and clamp the note duration, we get:

// Assuming that default note duration is 4:
duration = 240 / bpm / clamp(m[1] || 4, 1, 64);

It would also be nice to support dotted durations, which increase the current note length by 50%. We have a capture group m[2] whose value can be either a dot . or an empty string. Applying the same trick we used for the sharp sign, we get:

// !!m[2] would be 1 if it's a dot, 0 otherwise
// 1+!!m[2]/2 would be 1 for normal notes and 1.5 for dotted notes
duration = 240 / bpm / clamp(m[1] || 4, 1, 64) * (1+!!m[2]/2);
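A few worked values help to sanity-check the formula. In this sketch the `duration` function and its `length`/`dot` parameters are assumptions that mirror m[1] and m[2]; the clamp is the golfed one from above.

```javascript
const clamp = (x = 0, a, b) => (x < a && (x = a), x > b ? b : x);

// Note length in seconds; `length` and `dot` mirror m[1] and m[2].
const duration = (bpm, length, dot) =>
  240 / bpm / clamp(length || 4, 1, 64) * (1 + !!dot / 2);

console.log(duration(120, '4', ''));       // 0.5   (quarter note)
console.log(duration(120, '8', '.'));      // 0.375 (dotted eighth)
console.log(duration(120, undefined, '')); // 0.5   (default is a quarter)
console.log(duration(60, '1', ''));        // 4     (whole note)
```

At 120 BPM a quarter note lasts exactly one beat, i.e. half a second, which matches the usual meaning of BPM.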

Now we are able to calculate note numbers and durations for each note. Time to look into WebAudio API to actually play the tune.

WebAudio

We only need 3 parts from the whole WebAudio API: the audio context, the oscillator to produce the sound wave and the gain node to mute/unmute the sound. I will be using the square wave oscillator to make it sound much like the terrible buzzer of the old phones:

// Osc -> Gain -> AudioContext
let audio = new (window.AudioContext || window.webkitAudioContext)();
let gain = audio.createGain();
let osc = audio.createOscillator();
osc.type = 'square';
osc.connect(gain);
gain.connect(audio.destination);
osc.start();

This code on its own does not produce any music yet, but as we parse our RTTTL melody - we can tell WebAudio which note to play, when, with what frequency, and for how long.

WebAudio parameters (AudioParam objects, such as the oscillator’s frequency or the gain level) have a special method setValueAtTime, which schedules a change of the parameter’s value at a given time.

If you recall from the previous parts of this article, we already had note ASCII code stored as k, note index as n and we had note duration in seconds. Now, for each note we can do the following:

t = 0; // current time counter, in seconds
for (m of ......) {
  // ....we parse notes here...

  // Note frequency is calculated as (F*2^(n/12)),
  // Where n is note index, and F is the frequency of n=0
  // We can use C2=65.41, or C3=130.81. C2 is a bit shorter.
  osc.frequency.setValueAtTime(65.4 * 2 ** (n / 12), t);
  // Turn on gain to 100%. Besides notes [a-g], `k` can also be a `-`,
  // which is a rest sign. `-` is 0x2d in ASCII. So, unlike other note letters,
  // (k&8) would be 0 for notes and 8 for rest. If we invert `k`, then
  // (~k&8) would be 8 for notes and 0 for rest. Shifting it by 3 would be
  // ((~k&8)>>3) = 1 for notes and 0 for rests.
  gain.gain.setValueAtTime((~k & 8) >> 3, t);
  // Increase the time marker by note duration
  t = t + duration;
  // Turn off the note
  gain.gain.setValueAtTime(0, t);
}
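Both the frequency formula and the rest trick are plain arithmetic, so they can be checked without any audio. The `frequency` and `gate` names below are made up for this sketch; it uses C2 = 65.41 Hz as the base, as mentioned in the comments above.

```javascript
// Frequency of the note n semitones above C2 (65.41 Hz).
const frequency = (n) => 65.41 * 2 ** (n / 12);
// 1 for note letters a..g, 0 for the rest sign '-'.
const gate = (ch) => (~ch.charCodeAt(0) & 8) >> 3;

console.log(frequency(12));             // 130.82, one octave up (C3)
console.log(Math.round(frequency(33))); // 440, the concert pitch A4
console.log([...'abcdefg-'].map(gate)); // [ 1, 1, 1, 1, 1, 1, 1, 0 ]
```

A4 sits 33 semitones above C2 (two octaves plus 9), so landing on 440 Hz is a reassuring check of the formula.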

That’s all. Our play() routine can now play complete melodies, written in RTTTL notation. Here’s the full code, with a few minifications, such as using v as a shortcut for setValueAtTime, or using single-letter variables (C = context, z = oscillator because it buzzes, g = gain, c = clamp):

c = (x=0,a,b) => (x<a&&(x=a),x>b?b:x); // clamping function (a<=x<=b)
play = (s, bpm) => {
  C = new AudioContext;
  (z = C.createOscillator()).connect(g = C.createGain()).connect(C.destination);
  z.type = 'square';
  z.start();
  t = 0;
  v = (x,v) => x.setValueAtTime(v, t); // setValueAtTime shorter alias
  for (m of s.matchAll(/(\d*)?(\.?)(#?)([a-g-])(\d*)/g)) {
    k = m[4].charCodeAt(); // note ASCII [0x41..0x47] or [0x61..0x67]
    n = 0|(((k&7) * 1.6)+8)%12+!!m[3]+12*c(m[5],1,3); // note index, semitones above C2
    v(z.frequency, 65.4 * 2 ** (n / 12));
    v(g.gain, (~k & 8) / 8);
    t = t + 240 / bpm / (c(m[1] || 4, 1, 64))*(1+!!m[2]/2);
    v(g.gain, 0);
  }
};

// Usage:
play('8c 8d 8e 8f 8g 8a 8b 8c2', 120);
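Since the WebAudio calls only make sense in a browser, one way to check the parsing and timing logic elsewhere (for example in Node.js) is to compute the schedule as plain data. The `schedule` function below is a hypothetical, un-golfed variant of play() written for this purpose; it is a sketch, not part of the original player.

```javascript
const clamp = (x = 0, a, b) => (x < a && (x = a), x > b ? b : x);

// Hypothetical variant of play(): instead of driving WebAudio it returns
// one {start, end, frequency, gain} entry per note or rest.
const schedule = (s, bpm) => {
  const events = [];
  let t = 0; // current time, in seconds
  for (const m of s.matchAll(/(\d*)?(\.?)(#?)([a-g-])(\d*)/g)) {
    const k = m[4].charCodeAt(0);
    // semitones above C2: letter + sharp + clamped octave
    const n = Math.floor((k & 7) * 1.6 + 8) % 12 + !!m[3] + 12 * clamp(m[5], 1, 3);
    const start = t;
    t += 240 / bpm / clamp(m[1] || 4, 1, 64) * (1 + !!m[2] / 2);
    events.push({ start, end: t, frequency: 65.4 * 2 ** (n / 12), gain: (~k & 8) / 8 });
  }
  return events;
};

console.log(schedule('8c 8d 4- 8e', 120));
```

For this sample the four entries span 0 to 1.25 seconds, and the third one (the rest) has a gain of 0. A rest still gets a frequency value, but it is never heard because the gain is muted.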

When minified with terser, this code takes 417 bytes. This is still below the expected threshold of 512 bytes, so why don’t we add a stop() function to interrupt the playback:

C=0; // initialize audio context C at the beginning with zero
stop = _ => C && C.close(C=0);
// using `_` instead of `()` for zero-arg function saves us one byte :)

That’s still about 445 bytes, and if you paste this code into your developer console - you would be able to play RTTTL and stop the playback by calling JS functions play() and stop().

UI

However, I think adding some UI for the composer would improve the composing experience. At this point I would suggest forgetting about code golfing and making a tiny editor for RTTTL melodies without saving any bytes, using normal HTML and CSS, and including the minified script for playback only.

I won’t put the code here, as it’s pretty boring. Instead, you may find it all on github. Also, you may try the live demo here: https://zserge.com/nokia-composer/

nokia-composer-demo

If your muse has left you and you are not into writing music today, feel free to try a few existing songs and enjoy the familiar beeping sound.

Enjoy! Ah, by the way, if you actually composed something there, please share the URL. The whole song and the BPM are stored in the hash part of the URL, so saving/sharing your songs is as simple as copying or bookmarking the link.

I hope you’ve enjoyed this article. You can follow and contribute on Github, Mastodon, Twitter or subscribe via RSS.

Oct 13, 2020

See also: Making a tiny 2x3 bitmap font and more.