Beep, beep, I'm a sheep

Audio I/O is a complex topic, which scares away many inspired musicians who are into programming (or programmers who are into music). Let’s try making it look less complex and see how audio works on each modern desktop platform.

Our showcase today will be a simple beeper. Remember that annoying thing inside your PC case producing the irritating buzzing sound? It's long gone, but I suggest we summon its soul and make a library that plays beeps across all the platforms.

The end result is available here - https://github.com/zserge/beep.

Windows

We are lucky here - Windows already has a Beep(frequency, duration) function in <utilapiset.h>, so we can simply reuse it.

The function has had a long and hard life. It was introduced to play beeps through the hardware speaker driven by the 8253 programmable interval timer. As more and more PCs shipped without a beeper, the function became obsolete, but in Windows 7 it was rewritten to play beeps through the regular sound card API.
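
Using it is as simple as it gets - here is a complete program that plays one beep (the 440 Hz / 500 ms values below are arbitrary examples):

#include <windows.h>

int main(void) {
  Beep(440, 500); /* play a 440 Hz tone for 500 ms */
  return 0;
}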

Yet, behind the simplicity of this function hides the complexity of all the Windows sound APIs. There is MME, released back in 1991, which is the default choice for audio because it is so well supported. However, MME is known for its very high playback latency and probably won't be suitable for most audio apps. There is also WASAPI, released in 2007, which has much lower latency, especially when used in exclusive mode (i.e. while your app is running the user can't listen to Spotify or any other app - your app owns the sound hardware exclusively). WASAPI is often a good choice for audio apps. Finally, there is DirectSound, which these days basically wraps WASAPI for DirectX interfacing.

If in doubt - use WASAPI.

Linux

Audio is one of the few areas where Linux APIs are no better than those of the other platforms. First of all, there is ALSA, which is part of the kernel itself. ALSA works directly with the hardware, and if you would like your app to own the sound device exclusively, ALSA could be a good compromise between complexity and performance. If you are building a synth or a sampler for a Raspberry Pi - ALSA could be a good choice.

Then there is PulseAudio, a modern desktop abstraction layer built on top of ALSA. It routes sound between various apps and tries to mix the audio streams so that latency-critical apps would not suffer. Although PulseAudio brings many features that would not be possible with raw ALSA, like routing sound over the network, most music apps don't use it.

They use the JACK Audio Connection Kit instead. JACK was created for professional musicians and cares about real-time playback, whereas PulseAudio was made for casual users who can tolerate some latency in their YouTube playback. JACK connects audio apps with minimal latency, but keep in mind that it still runs on top of ALSA, so if your app is the only audio app running (say, you are making a drum machine out of an old Raspberry Pi) - plain ALSA would be easier to use and would perform better.
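
For comparison, here is roughly what a minimal JACK client looks like - a sketch of my own, not part of beep; the client name, port name and the 440 Hz sawtooth are arbitrary. JACK pulls audio through a callback, and the output port still has to be connected to the system playback ports (e.g. with jack_connect):

#include <jack/jack.h>
#include <unistd.h>

static jack_client_t *client;
static jack_port_t *port;
static float phase = 0; /* sawtooth phase, 0..1 */
static float freq = 440;

/* JACK calls this from its real-time thread whenever it needs more audio */
static int process(jack_nframes_t nframes, void *arg) {
  jack_default_audio_sample_t *buf = jack_port_get_buffer(port, nframes);
  float step = freq / jack_get_sample_rate(client);
  for (jack_nframes_t i = 0; i < nframes; i++) {
    buf[i] = phase * 2 - 1; /* map 0..1 to the -1..1 sample range */
    phase += step;
    if (phase > 1) {
      phase -= 1;
    }
  }
  return 0;
}

int main(void) {
  client = jack_client_open("beep", JackNullOption, NULL);
  port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                            JackPortIsOutput, 0);
  jack_set_process_callback(client, process, NULL);
  jack_activate(client);
  sleep(1); /* keep beeping for a second */
  jack_client_close(client);
  return 0;
}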

Making a beeper function with ALSA is actually not so hard. We need to open the default audio device, configure it to use a well-supported sampling rate and sample format, and start writing data into it. Audio data can be a sawtooth wave, as described in the previous audio article.

#include <alsa/asoundlib.h>

int beep(int freq, int ms) {
  static snd_pcm_t *pcm = NULL;
  if (pcm == NULL) {
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0)) {
      return -1;
    }
    /* unsigned 8-bit samples, interleaved access, 1 channel, 8000 Hz,
       software resampling allowed, 20 ms of desired latency */
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_U8, SND_PCM_ACCESS_RW_INTERLEAVED,
                       1, 8000, 1, 20000);
  }
  unsigned char buf[400]; /* 400 frames at 8000 Hz = 50 ms of audio */
  long phase = 0;
  for (int i = 0; i < ms / 50; i++) {
    snd_pcm_prepare(pcm);
    for (size_t j = 0; j < sizeof(buf); j++) {
      /* 8-bit sawtooth wave: the value wraps around every 8000/freq samples */
      buf[j] = freq > 0 ? (255 * phase++ * freq / 8000) : 0;
    }
    int r = snd_pcm_writei(pcm, buf, sizeof(buf));
    if (r < 0) {
      snd_pcm_recover(pcm, r, 0);
    }
  }
  return 0;
}

Here we use the synchronous API and barely check for errors to keep the function small and simple. Synchronous blocking I/O is probably not the best option for serious audio applications, but fortunately ALSA comes with various transfer methods and modes of operation: https://www.alsa-project.org/alsa-doc/alsa-lib/pcm.html. For our simple use case, though, it's totally fine.
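
Assuming the function above is compiled and linked with -lasound, a caller can use it like any other blocking call (the frequencies and durations below are arbitrary):

beep(440, 500); /* A4 for half a second */
beep(0, 200);   /* zero frequency fills the buffer with silence */
beep(880, 500); /* one octave higher */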

If in doubt - use ALSA; if you have to cooperate with other audio apps - use JACK.

macOS

This one is rather simple, but not easy at all. macOS has the CoreAudio framework, which is responsible for audio functionality on both the desktop and iOS. CoreAudio itself is a low-level API, tightly integrated with the OS to optimize latency and performance. To play a sound with CoreAudio one has to create an AudioUnit (an audio plugin).

The AudioUnit API is a little verbose, but easy to understand. Here's how to create a new AudioUnit:

AudioComponent output;
AudioUnit unit;
AudioComponentDescription descr = {0};
AURenderCallbackStruct cb = {0};
AudioStreamBasicDescription stream = {0};

// The default output unit renders audio to the default sound device
descr.componentType = kAudioUnitType_Output;
descr.componentSubType = kAudioUnitSubType_DefaultOutput;
descr.componentManufacturer = kAudioUnitManufacturer_Apple;

// Actual sound will be generated asynchronously in the callback tone_cb
cb.inputProc = tone_cb;

// 8 kHz, mono, 8-bit linear PCM - the same format as in the ALSA example
stream.mFormatID = kAudioFormatLinearPCM;
stream.mFormatFlags = 0;
stream.mSampleRate = 8000;
stream.mBitsPerChannel = 8;
stream.mChannelsPerFrame = 1;
stream.mFramesPerPacket = 1;
stream.mBytesPerFrame = 1;
stream.mBytesPerPacket = 1;

output = AudioComponentFindNext(NULL, &descr);
AudioComponentInstanceNew(output, &unit);
AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 0, &cb, sizeof(cb));
AudioUnitSetProperty(unit, kAudioUnitProperty_StreamFormat,
                     kAudioUnitScope_Input, 0, &stream, sizeof(stream));
AudioUnitInitialize(unit);
AudioOutputUnitStart(unit);

This code only creates and starts a new AudioUnit; the actual sound generation happens asynchronously in the callback:

// beep_freq, theta and counter are globals defined elsewhere in beep:
// the current beep frequency, the sawtooth phase and a countdown of
// the remaining frames in the current beep
static OSStatus tone_cb(void *inRefCon,
                        AudioUnitRenderActionFlags *ioActionFlags,
                        const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
                        UInt32 inNumberFrames, AudioBufferList *ioData) {
  unsigned char *buf = ioData->mBuffers[0].mData;
  for (UInt32 i = 0; i < inNumberFrames; i++) {
    // The same 8-bit sawtooth wave as in the ALSA example
    buf[i] = beep_freq > 0 ? (255 * theta * beep_freq / 8000) : 0;
    theta++;
    counter--;
  }
  return 0;
}

This callback generates sound in much the same way as we did with ALSA, but it is called asynchronously, whenever CoreAudio decides that the audio buffer is almost empty and needs to be refilled with new samples.
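
The callback keeps being invoked until the unit is stopped. When the beeper is no longer needed, the unit can be torn down with the matching cleanup calls (not shown in the snippet above, but the usual CoreAudio sequence):

AudioOutputUnitStop(unit);
AudioUnitUninitialize(unit);
AudioComponentInstanceDispose(unit);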

Such an asynchronous approach to sound generation is very common, and almost every modern audio library supports it. If you want to build a music app - you should probably design it with asynchronous playback in mind.

If in doubt - use CoreAudio.

Sounds like too much work

If you are building a music app - you may follow the same path and implement audio backends for WASAPI, ALSA and CoreAudio yourself. It's actually not that hard: have a look at the full sources of beep, it's roughly 100 lines of code for all three platforms.

However, there are also a number of good cross-platform audio libraries, such as PortAudio, RtAudio, libsoundio, miniaudio or SDL.
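
With PortAudio, for example, the whole cross-platform beeper collapses into a single callback, very much like the CoreAudio one above (a rough sketch of my own, not taken from the beep repository; the 440 Hz tone and 8 kHz sample rate only mirror the earlier examples):

#include <portaudio.h>

static int freq = 440;
static unsigned long theta = 0;

/* PortAudio calls this whenever the output buffer needs more samples */
static int cb(const void *in, void *out, unsigned long frames,
              const PaStreamCallbackTimeInfo *time,
              PaStreamCallbackFlags status, void *userdata) {
  float *buf = (float *)out;
  for (unsigned long i = 0; i < frames; i++) {
    /* the same sawtooth wave, this time as 32-bit floats in -1..1 */
    buf[i] = (float)(theta * freq % 8000) / 8000 * 2 - 1;
    theta++;
  }
  return paContinue;
}

int main(void) {
  PaStream *stream;
  Pa_Initialize();
  /* no input channels, one output channel, 32-bit floats, 8 kHz */
  Pa_OpenDefaultStream(&stream, 0, 1, paFloat32, 8000, 256, cb, NULL);
  Pa_StartStream(stream);
  Pa_Sleep(500); /* let the beep play for 500 ms */
  Pa_StopStream(stream);
  Pa_CloseStream(stream);
  Pa_Terminate();
  return 0;
}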

Some people prefer using JUCE for cross-platform audio apps, but it has its own limitations. The audio ecosystem may appear complex, and there may seem to be too many choices, but most of them are good ones. So keep playing!

I hope you’ve enjoyed this article. You can follow – and contribute to – on Github, Mastodon, Twitter or subscribe via rss.

Dec 13, 2020