Pennybase: a Pound-Shop BaaS

So, you want to build a web app, but you are not a backend developer. In that case, it’s very tempting to use one of many Backend-as-a-service (BaaS) platforms, such as Firebase, Supabase, or even PocketBase if you are on a budget.

But here’s a story of yet another toy BaaS, built exclusively for educational purposes (the world needs another side project where the author is the only user, too). It’s called Pennybase, and like many others – it’s a single-file backend-as-a-service written in Go.

What makes it different is the size (~700 lines of code, 35x smaller than PocketBase), zero external dependencies, and nevertheless – feature completeness.

Resources, Records, Schemas

We start with data modelling. As in most BaaS, we don’t know ahead of time what kind of data we are going to store, so we need a flexible data model. In traditional backends data is stored in relational databases, where each table has a fixed schema and each data model is represented by a structure/record with known fields.

In Pennybase, we choose a very radical approach: we will be storing data in plain CSV files. This means that our records may only contain string fields, because that’s the only data type available in CSV.

type Record []string

However, a flat array of strings is not the most convenient way to work with data. Our backend should be able to serve data in a more structured form, such as JSON. So we treat Record as a low-level storage primitive (like a database table row) and introduce a Resource: a collection of named fields, each with a certain type and validation rules.

To convert between Records and Resources we define Schemas, which describe how each field of a Resource becomes a string in the Record array (and back).

type Resource map[string]any
type FieldType string

const (
  Number FieldType = "number"
  Text   FieldType = "text"
  List   FieldType = "list"
)

type FieldSchema struct {
  Resource string    // resource (collection) this field belongs to
  Field    string    // field name
  Type     FieldType // number, text or list
  Min      float64   // minimum allowed value (numbers only)
  Max      float64   // maximum allowed value (numbers only)
  Regex    string    // validation pattern (text only)
}
type Schema []FieldSchema

We limit ourselves to only strings, numbers and string lists. This roughly corresponds to JSON types, if we replace boolean with 0/1 and disallow nested objects.

Let’s also introduce basic validation rules: min/max for numbers and regular expressions for strings. These are optional but they can be useful to validate “booleans” (min=0, max=1), enumerations (regex=^(foo|bar)$), and other things.
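
For instance, a schema for a hypothetical “posts” resource could be declared like this (the field names and rules here are purely illustrative):

postSchema := Schema{
  {Resource: "posts", Field: "title", Type: Text, Regex: "^.{1,120}$"},
  {Resource: "posts", Field: "rating", Type: Number, Min: 1, Max: 5},
  {Resource: "posts", Field: "published", Type: Number, Min: 0, Max: 1},       // a "boolean"
  {Resource: "posts", Field: "status", Type: Text, Regex: "^(draft|public)$"}, // an enumeration
  {Resource: "posts", Field: "tags", Type: List},
}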

We can now use Schema to convert Records to Resources and back:

func (field FieldSchema) Validate(v any) bool {
  if v == nil { return false }
  switch field.Type {
  case Number:
    n, ok := v.(float64)
    return ok && ((field.Min == 0 && field.Max == 0) || (n >= field.Min && n <= field.Max))
  case Text:
    s, ok := v.(string)
    return ok && (field.Regex == "" || regexp.MustCompile(field.Regex).MatchString(s))
  case List:
    _, ok := v.([]string)
    return ok
  }
  return false
}

func (s Schema) Record(res Resource) (Record, error) {
  rec := Record{}
  for _, field := range s {
    v := res[field.Field]
    if v == nil {
      v = map[FieldType]any{Number: 0.0, Text: "", List: []string{}}[field.Type]
    }
    if !field.Validate(v) {
      return nil, fmt.Errorf("invalid field \"%s\"", field.Field)
    }
    switch field.Type {
    case Number: rec = append(rec, fmt.Sprintf("%g", v))
    case Text: rec = append(rec, v.(string))
    case List: rec = append(rec, strings.Join(v.([]string), ","))
    }
  }
  return rec, nil
}

func (s Schema) Resource(rec Record) (Resource, error) {
  res := Resource{}
  for i, field := range s {
    switch field.Type {
    case Number:
      n, err := strconv.ParseFloat(rec[i], 64)
      if err != nil {
        return nil, err
      }
      res[field.Field] = n
    case Text: res[field.Field] = rec[i]
    case List:
      if rec[i] != "" {
        res[field.Field] = strings.Split(rec[i], ",")
      } else {
        res[field.Field] = []string{}
      }
    default: return nil, fmt.Errorf("unknown field type %s", field.Type)
    }
  }
  return res, nil
}

Now we can freely convert Resources and Records, while being sure that our data fields are more or less valid. Here’s an example to play with: https://go.dev/play/p/BCh5kIY_Aih

CSV storage

Now that we have a data model, we should find a way to store it on disk. We go with CSV files, a format that everyone likes to write, but hates to read.

We start with an interface to our “database”. Later we might want to choose a proper storage, like SQLite or some document store, or introduce caching and indexing without breaking the rest of the code:

type DB interface {
  Create(r Record) error
  Update(r Record) error
  Get(id string) (Record, error)
  Delete(id string) error
  Iter() func(yield func(Record, error) bool)
  Close() error
}

Nothing unusual here, regular CRUD operations and an iterator. The implementation could also be trivial: for each operation we open the file, read it, modify it in memory and write it back atomically (i.e. write to a temporary file and rename it over the original).
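
For reference, the naive “rewrite the whole file” variant could look roughly like this (a sketch of the write-back step only, not what Pennybase ends up doing):

func rewriteCSV(path string, rows []Record) error {
  // write everything into a temporary file in the same directory...
  tmp, err := os.CreateTemp(filepath.Dir(path), "db-*.csv")
  if err != nil {
    return err
  }
  w := csv.NewWriter(tmp)
  for _, r := range rows {
    if err := w.Write(r); err != nil {
      tmp.Close()
      return err
    }
  }
  w.Flush()
  if err := w.Error(); err != nil {
    tmp.Close()
    return err
  }
  if err := tmp.Close(); err != nil {
    return err
  }
  // ...and atomically replace the original file with it
  return os.Rename(tmp.Name(), path)
}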

However, in this case I think we should take an extra step and make it a bit more efficient. First, we may choose to only append data to a CSV file. In this case writes would be very fast, but reads would require scanning the full file to find the last written version of a record. We can fix this by caching the last known record offsets in memory. Thankfully, the Go CSV reader has an InputOffset method that returns the current offset in the file after a record has been read.

So, when a DB is opened we read the file and store the offset of each record in a map. When we write a record we append a row and update its offset. To read a record we look up its offset, seek to it and read the row.

What remains unclear is how to identify records and how to handle deletes. A typical way is optimistic concurrency. Each record gets a unique string ID and a numeric version (starting at 1). These two fields are mandatory for all records and resources, so the Schema has to include and validate them at all times.

At the database level we increase the version each time the record is updated and set it to zero once the record is deleted. We also cache the latest versions in memory to detect conflicting writes: a client must pass the version it expects to write, and if two clients write at the same time, the first one succeeds and bumps the version, while the second one fails because the versions no longer match.

All together it gives us a “database” like this:

type db struct {
  mu      sync.Mutex
  f       *os.File
  w       *csv.Writer
  index   map[string]int64 // record ID -> offset of its latest row in the file
  version map[string]int64 // record ID -> latest known version (0 = deleted)
}

func NewCSVDB(path string) (DB, error) {
  f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
  if err != nil { return nil, err }
  db := &db{f: f, w: csv.NewWriter(f), index: map[string]int64{}, version: map[string]int64{}}
  r := csv.NewReader(f)
  r.FieldsPerRecord = -1 // deleted records have only 2 fields: id and version=0
  for {
    pos := r.InputOffset()
    rec, err := r.Read()
    if errors.Is(err, io.EOF) { break }
    if err != nil { return nil, err }
    if len(rec) >= 2 {
      db.index[rec[0]] = pos
      db.version[rec[0]], _ = strconv.ParseInt(rec[1], 10, 64)
    }
  }
  return db, nil
}

func (db *db) append(r Record) error {
  pos, _ := db.f.Seek(0, io.SeekEnd)
  err := db.w.Write(r)
  if err != nil {
    return err
  }
  db.w.Flush()
  db.index[r[0]] = pos
  db.version[r[0]], err = strconv.ParseInt(r[1], 10, 64)
  return err
}

// Now we can call db.append() from db.Create(), db.Update(), and db.Delete()
// Create: ensure that version=="1" and record doesn't exist in cache
// Update: ensure that version==current+1 and record exists in cache
// Delete: ensure that record exists in cache
// For simplicity these methods are omitted here.
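
// For illustration, here is a minimal sketch of what they could look like
// (just the rules above applied literally, not the exact Pennybase code):

func (db *db) Create(r Record) error {
  db.mu.Lock()
  defer db.mu.Unlock()
  if len(r) < 2 || r[1] != "1" {
    return errors.New("new records must have version 1")
  }
  if _, exists := db.version[r[0]]; exists {
    return errors.New("record already exists")
  }
  return db.append(r)
}

func (db *db) Update(r Record) error {
  db.mu.Lock()
  defer db.mu.Unlock()
  if len(r) < 2 {
    return errors.New("record must have an id and a version")
  }
  v, exists := db.version[r[0]]
  if !exists || v < 1 {
    return errors.New("record not found")
  }
  if n, _ := strconv.ParseInt(r[1], 10, 64); n != v+1 {
    return errors.New("version mismatch")
  }
  return db.append(r)
}

func (db *db) Delete(id string) error {
  db.mu.Lock()
  defer db.mu.Unlock()
  if db.version[id] < 1 {
    return errors.New("record not found")
  }
  return db.append(Record{id, "0"}) // a tombstone row: id + version=0
}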

func (db *db) Get(id string) (Record, error) {
  db.mu.Lock()
  defer db.mu.Unlock()
  if db.version[id] < 1 {
    return nil, errors.New("record not found")
  }
  offset, ok := db.index[id]
  if !ok {
    return nil, nil
  }
  if _, err := db.f.Seek(offset, io.SeekStart); err != nil {
    return nil, err
  }
  r := csv.NewReader(db.f)
  rec, err := r.Read()
  if err != nil {
    return nil, err
  }
  if len(rec) > 0 && rec[0] != id {
    return nil, errors.New("corrupted index")
  }
  return rec, nil
}

func (db *db) Iter() func(yield func(Record, error) bool) {
  return func(yield func(Record, error) bool) {
    db.mu.Lock()
    defer db.mu.Unlock()
    if _, err := db.f.Seek(0, io.SeekStart); err != nil {
      yield(nil, err)
      return
    }
    r := csv.NewReader(db.f)
    r.FieldsPerRecord = -1
    for {
      rec, err := r.Read()
      if errors.Is(err, io.EOF) {
        break
      }
      if err != nil {
        yield(nil, err)
        return
      }
      if len(rec) < 2 {
        continue
      }
      id, version := rec[0], rec[1]
      if version == "0" || version != strconv.FormatInt(db.version[id], 10) {
        continue // deleted items or outdated versions
      }
      if !yield(rec, nil) {
        return
      }
    }
  }
}

Despite its simplicity this CSV storage is quite fast and powerful. I think Bitcask uses a similar approach of having an append-only log with an index holding the keys mapped to the necessary lookup information. Quick benchmarks show that both writes and reads take a few microseconds, so we can proceed to build abstractions on top of this poor man’s “database”.

You can play with it here: https://go.dev/play/p/q4PtWb9SvvA

Store

A store is the least interesting part of Pennybase, because it’s essentially a collection of DB instances and a global cache for Schemas. We assume that Schemas are immutable and can only be loaded once when the store is created:

type Store struct {
  Dir       string
  Schemas   map[string]Schema
  Resources map[string]DB
}

func NewStore(dir string) (*Store, error) {
  s := &Store{Dir: dir, Schemas: map[string]Schema{}, Resources: map[string]DB{}}
  schemaDB, err := NewCSVDB(s.Dir + "/_schemas.csv")
  if err != nil {
    return nil, err
  }
  for rec, err := range schemaDB.Iter() {
    if err != nil {
      return nil, err
    }
    if len(rec) != 8 {
      return nil, fmt.Errorf("invalid schema record: %v", rec)
    }
    schema := FieldSchema{Resource: rec[2], Field: rec[3], Type: FieldType(rec[4]), Regex: rec[7]}
    schema.Min, _ = strconv.ParseFloat(rec[5], 64)
    schema.Max, _ = strconv.ParseFloat(rec[6], 64)
    s.Schemas[schema.Resource] = append(s.Schemas[schema.Resource], schema)
    if _, ok := s.Resources[schema.Resource]; !ok {
      db, err := NewCSVDB(s.Dir + "/" + schema.Resource + ".csv")
      if err != nil {
        return nil, err
      }
      s.Resources[schema.Resource] = db
    }
  }
  return s, nil
}

Here the store loads all resource schemas, and for each resource defined in the schema catalogue it opens a new CSV database. The rest of the Store operations simply delegate to the underlying DB, converting the Resource into a Record first:

func (s *Store) Create(resource string, r Resource) (string, error) {
  db, ok := s.Resources[resource]
  if !ok {
    return "", fmt.Errorf("resource %s not found", resource)
  }
  newID := ID()
  r["_id"] = newID
  r["_v"] = 1.0
  rec, err := s.Schemas[resource].Record(r)
  if err != nil {
    return "", err
  }
  if err := db.Create(rec); err != nil {
    return "", err
  }
  return newID, nil
}
func (s *Store) Update(resource string, r Resource) error { ... }
func (s *Store) Delete(resource, id string) error { ... }
func (s *Store) Get(resource, id string) (Resource, error) { ... }
func (s *Store) List(resource, sortBy string) ([]Resource, error) { ... }

The List method takes an optional sortBy because otherwise the order of iteration would be inconsistent once records get updated: the last updated item would appear last in the list, even if it was the first one created.

This is a terribly inefficient sorting approach, as it keeps all data in memory, but since DB is an interface, we might consider building a search index to optimise listing, filtering and sorting later.
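
A straightforward List could look roughly like this (a sketch rather than the exact Pennybase code): it materialises every record in memory and sorts the resulting slice by the given field.

func (s *Store) List(resource, sortBy string) ([]Resource, error) {
  db, ok := s.Resources[resource]
  if !ok {
    return nil, fmt.Errorf("resource %s not found", resource)
  }
  results := []Resource{}
  for rec, err := range db.Iter() {
    if err != nil {
      return nil, err
    }
    res, err := s.Schemas[resource].Resource(rec)
    if err != nil {
      return nil, err
    }
    results = append(results, res)
  }
  if sortBy != "" {
    slices.SortFunc(results, func(a, b Resource) int {
      return strings.Compare(fmt.Sprint(a[sortBy]), fmt.Sprint(b[sortBy]))
    })
  }
  return results, nil
}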

Anyway, we now have a generic store, capable of serving JSON-like Resources. Time to add an HTTP API to it.

HTTP API

This is where Go shines, especially with the new ServeMux. We can create a full REST API for our Store in a few lines of code:

type Server struct {
  store  *Store
  mux    *http.ServeMux
}

func NewServer(store *Store, tmplDir, staticDir string) *Server {
  s := &Server{
    store:  store,
    mux:    http.NewServeMux(),
  }
  s.mux.HandleFunc("GET /api/{resource}/", s.handleList)
  s.mux.HandleFunc("POST /api/{resource}/", s.handleCreate)
  s.mux.HandleFunc("GET /api/{resource}/{id}", s.handleGet)
  s.mux.HandleFunc("PUT /api/{resource}/{id}", s.handleUpdate)
  s.mux.HandleFunc("DELETE /api/{resource}/{id}", s.handleDelete)
  if tmplDir != "" {
    if tmpl, err := template.ParseGlob(filepath.Join(tmplDir, "*")); err == nil {
      for _, t := range tmpl.Templates() {
        s.mux.Handle(fmt.Sprintf("GET /%s", t.Name()), s.handleTemplate(tmpl, t.Name()))
      }
    }
  }
  if staticDir != "" {
    s.mux.Handle("GET /static/", http.StripPrefix("/static/", http.FileServer(http.Dir(staticDir))))
  }
  return s
}

func (s *Server) Handler() http.Handler {
  return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    // we can run some middleware here, if needed
    s.mux.ServeHTTP(w, r)
  })
}

func (s *Server) handleList(w http.ResponseWriter, r *http.Request) {
  res, err := s.store.List(r.PathValue("resource"), "")
  if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
  }
  _ = json.NewEncoder(w).Encode(res)
}

// similarly, we implement handleCreate, handleGet, handleUpdate, handleDelete
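
For example, handleCreate could look roughly like this (a sketch; error handling is minimal and authentication, which comes later, is omitted):

func (s *Server) handleCreate(w http.ResponseWriter, r *http.Request) {
  res := Resource{}
  if err := json.NewDecoder(r.Body).Decode(&res); err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return
  }
  // JSON arrays decode into []any, so normalise them into []string for List fields
  for k, v := range res {
    if items, ok := v.([]any); ok {
      list := make([]string, len(items))
      for i, item := range items {
        list[i] = fmt.Sprint(item)
      }
      res[k] = list
    }
  }
  id, err := s.store.Create(r.PathValue("resource"), res)
  if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
  }
  w.WriteHeader(http.StatusCreated)
  _ = json.NewEncoder(w).Encode(map[string]string{"id": id})
}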

Finally, we can run our server and try it out:

store, err := NewStore("data")
if err != nil {
  log.Fatal("Error initializing store:", err)
}
server := NewServer(store, "web/templates", "web/static")
log.Fatal(http.ListenAndServe(":8080", server.Handler()))

The server can now serve static files such as JS or CSS, render Go templates with data from the Store, and expose a REST API to manipulate collections of resources:

curl -X POST --json '{"title": "foo", "tags": ["bar", "baz"]}' http://localhost:8080/api/posts/

Authentication and Authorization

So far we have a working BaaS, as long as you trust your users not to erase each other’s data. Time to improve the security.

We introduce two more special files: _users.csv and _permissions.csv.

The user list (_users.csv) is a simple CSV file with users and their hashed passwords. We also keep a list of roles for each user to implement some minimal role-based access control (RBAC).

Having a list of users, we can authenticate them with simple HTTP Basic Auth. It shouldn’t be too hard to also implement OAuth and user sessions later, but let’s start simple.

func (s *Store) authenticate(username, password string) (string, []string, error) {
  db, _ := s.Resources["_users"]
  rec, err := db.Get(username)
  if err != nil {
    return "", nil, errors.New("user not found")
  }
  userRes, _ := s.Schemas["_users"].Resource(rec)
  storedHash := userRes["password"].(string)
  salt := userRes["salt"].(string)
  if hashPassword(password, salt) != storedHash {
    return "", nil, errors.New("invalid credentials")
  }
  roles, _ := userRes["roles"].([]string)
  return username, roles, nil
}
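
On the HTTP side the credentials come straight from the Authorization header. A small helper on the server could wire this up (the currentUser name is hypothetical, not part of Pennybase’s API):

func (s *Server) currentUser(r *http.Request) (string, []string, error) {
  username, password, ok := r.BasicAuth()
  if !ok {
    return "", nil, errors.New("missing credentials")
  }
  return s.store.authenticate(username, password)
}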

Permissions are a bit more complex. Each row of the _permissions collection describes a single access rule, telling that a certain resource can be accessed by a certain role or by a certain owner.

We try to support two use cases here, similar to UNIX file permissions. The first one is owner-based permissions. When a resource is created, its owner’s username is stored in the record. The permission check then compares the owner field of the resource with the username and, if they match, grants permission:

id,version,resource,action,ref,role

1,1,posts,create,*,
2,1,posts,read,*,
3,1,posts,delete,owner,
4,1,posts,update,owner,

In this example the “posts” resource can be created and read by anyone, but can only be updated or deleted by the user whose name is stored in the “owner” field of the post.

The second use case is a role-based approach, similar to “group” permissions in UNIX. In this case we check if the user has a certain role and if so - grant permission to access the resource:

1,1,posts,create,,"*"
2,1,posts,read,,"viewer"
3,1,posts,delete,,"editor,moderator"
4,1,posts,update,,"editor,moderator"

Here we say that anyone can create posts, anyone with the role “viewer” can read them, and only editors and moderators can update or delete them.

Permission check remains simple:

func (s *Store) checkPermissions(resource, action, user string, roles []string, res Resource) bool {
  permissions, _ := s.List("_permissions", "")
  for _, p := range permissions {
    act := p["action"].(string)
    if p["resource"].(string) != resource || (act != "*" && act != action) {
      continue
    }
    if ref := p["ref"].(string); ref != "" {
      // ref "*" matches any user, otherwise the referenced field must hold the username
      if ref == "*" || res[ref] == user {
        return true
      }
      continue
    }
    // the role column may hold several comma-separated roles, or "*" for any role
    for _, role := range strings.Split(p["role"].(string), ",") {
      if role == "*" || slices.Contains(roles, role) {
        return true
      }
    }
  }
  return false
}

Although it’s far from great (we scan all permissions on every check), there are many ways to speed it up.

Real-time database

Most BaaS-es position themselves as real-time databases, meaning that one can watch updates to a collection of resources as they happen, typically over a WebSocket.

We can implement a similar mechanism, but using a simpler alternative - server-sent events (SSE).

Let’s add a local in-process pub/sub broker to our server:

type Broker struct {
	channels map[string]map[chan Event]bool // resource -> channels
	mu       sync.RWMutex
}

func (b *Broker) Subscribe(resource string, ch chan Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.channels[resource] == nil {
		b.channels[resource] = make(map[chan Event]bool)
	}
	b.channels[resource][ch] = true
}

func (b *Broker) Unsubscribe(resource string, ch chan Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if subs := b.channels[resource]; subs != nil {
		delete(subs, ch)
	}
}

func (b *Broker) Publish(resource string, evt Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	if subs := b.channels[resource]; subs != nil {
		for ch := range subs {
			select {
			case ch <- evt:
			default:
			}
		}
	}
}

type Event struct {
	Action string   `json:"action"`
	ID     string   `json:"id"`
	Data   Resource `json:"data"`
}

type Server struct {
	Store  *Store
	Broker *Broker
	Mux    *http.ServeMux
}

We can now call srv.Broker.Publish() to emit events when a resource has been modified. We can also create an endpoint /api/events/{resource} that uses server-sent events to stream resource changes to clients:

func (s *Server) handleEvents(w http.ResponseWriter, r *http.Request) {
  flusher, ok := w.(http.Flusher)
  if !ok {
    http.Error(w, "SSE not supported", http.StatusBadRequest)
    return
  }
  w.Header().Set("Content-Type", "text/event-stream")
  w.Header().Set("Cache-Control", "no-cache")
  w.Header().Set("Connection", "keep-alive")
  resource := r.PathValue("resource")
  user, err := s.Store.Authenticate(r)
  if err != nil {
    http.Error(w, "unauthenticated", http.StatusUnauthorized)
    return
  }
  events := make(chan Event, 10)
  s.Broker.Subscribe(resource, events)
  defer s.Broker.Unsubscribe(resource, events)
  for {
    select {
    case e := <-events:
      if e.Action == "delete" || s.Store.Authorize(resource, e.ID, "read", user) == nil {
        data, _ := json.Marshal(e.Data)
        fmt.Fprintf(w, "event: %s\ndata: %s\n\n", e.Action, data)
        flusher.Flush()
      }
    case <-r.Context().Done():
      return
    }
  }
}
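
On the write side a handler only needs to publish an event after a successful modification. For example, a delete handler could look like this (a sketch using the broker-aware Server from above; authorisation checks are omitted for brevity):

func (s *Server) handleDelete(w http.ResponseWriter, r *http.Request) {
  resource, id := r.PathValue("resource"), r.PathValue("id")
  if err := s.Store.Delete(resource, id); err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
  }
  // notify all SSE subscribers of this resource about the deletion
  s.Broker.Publish(resource, Event{Action: "delete", ID: id})
  w.WriteHeader(http.StatusNoContent)
}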

This SSE mechanism is trivial to use from JavaScript, and many frameworks such as htmx support it out of the box to update view contents when an event arrives. This is how one can build a real-time chat or a microblog using Pennybase.

Hooks

Of course, the built-in capabilities of Pennybase are questionable: validation is a joke, authN/authZ does not support relations between resources, and integrating it with other systems is impossible. Some BaaSes solve this by embedding a scripting language, such as JavaScript. Others add cloud functions. We choose hooks written in Go, so that one can import Pennybase as a module and use Go to extend its functionality.

Or, more precisely, we support only one global hook function. It is invoked whenever a resource is about to be created, updated or deleted. Perhaps there are use cases for invoking hooks on Get or List requests too, but that’s omitted for now.

A hook function can modify the resource, perform additional validation, or act as an extra authorisation check:

server, _ := pennybase.NewServer("data", "templates", "static")
server.Hook = func(trigger, resource string, user pennybase.Resource, res pennybase.Resource) error {
  log.Printf("Hook triggered: %s on %s by user %v: %v", trigger, resource, user, res)
  if trigger == "create" && resource == "messages" {
    // we can inject or modify resource fields based on relations between the resources
    r["author"] = user["_id"]
    r["created_at"] = time.Now().UTC().Format("2006-01-02T15:04:05Z07:00")
  }
  // we can also return an error if validation or authorisation fails
  return nil
}

Chaining hook functions like middleware is a way to keep them modular; Pennybase leaves this as an exercise to the reader, but one possible approach is sketched below.
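
For instance, a tiny helper could fold several hook functions into the single Hook slot (a sketch; the HookFunc alias and the hook names in the comment are illustrative, not part of Pennybase):

type HookFunc = func(trigger, resource string, user, res pennybase.Resource) error

func chain(hooks ...HookFunc) HookFunc {
  return func(trigger, resource string, user, res pennybase.Resource) error {
    for _, h := range hooks {
      if err := h(trigger, resource, user, res); err != nil {
        return err
      }
    }
    return nil
  }
}

// usage: server.Hook = chain(auditHook, timestampHook, ownershipHook)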

Conclusion

Pennybase is a simple yet powerful BaaS that can be used to prototype web apps without building a separate backend each time. It has a simple data model, a simple storage, a REST API, authentication and authorization, real-time updates and “admission hooks”. Of course, it’s still a toy, not a production-level BaaS solution, but I hope it has some educational value and can be used as a conceptual starting point for some projects.

The final code with some examples can be found here: https://github.com/zserge/pennybase

I hope you’ve enjoyed this article. You can follow – and contribute to – the project on Github, Mastodon, Twitter, or subscribe via RSS.

Jun 17, 2025

See also: Poor Man's Web and more.