HOW TO REINVENT THE WHEEL - DEVELOPING A CMS

I rolled my own content management system for my website -- and I know you shouldn't. 🤭

I like solving complicated problems, and it was an opportunity to advance my skills in database design, and to tinker with ideas of composable data structures and autogenerated editing environments ('projectional editing') in a web-based environment. I also firmly believe that content outlives technology, and should therefore be stored independently of the current platform being used. A headless CMS provides many of these benefits, but I think the most platform-agnostic storage format is plain text stored in files and folders. I wasn't concerned about performance for my small site, but to scale the system up, you could simply create a performance-oriented view of the content in a relational database and use that to serve a website.

The rest of this post is mostly just implementation details, but read on if you are interested. I applied some of the design patterns found in object-oriented programming, and tried to employ the composability of functional programming, striving to make each part of the system independent of the others to give flexibility in how the solution could be deployed.

The database

I started by defining my file structure. The general idea is that:

  1. Every node in the database has a unique name (the path)

  2. Each node in the database has a type. More types can be added by the user as needed, but I implemented:

        Text 
        Markdown
        Image
        PHP
        Date
        Number
        Reference
        Record

  3. No type is particularly special, meaning that any type could be 'plugged in' to the system, making it very extensible. The Record type is the only type I implemented that allows other types to be included in it (i.e. has child nodes), but it would be trivial to implement other data structures that contain child nodes, such as ordered lists. Another node can be referenced via a Reference.

  4. Because the storage format is normal files, each type could conceivably be edited by existing software; e.g. you could open up images in Photoshop and you would be directly editing the database. This makes it possible to leverage existing technologies, while still allowing those nodes to be constrained by a database schema (coming soon™).

  5. At this level, each data type has no knowledge of how it can be edited.

This results in a file structure similar to the following:

    root
    ├───blog.record
    │   │   created.date
    │   │
    │   ├───blackbaud-crm-email-management.record
    │   │   │   excerpt.txt
    │   │   │   published.date
    │   │   │   status.txt
    │   │   │   title.txt
    │   │   │
    │   │   └───content.markdown
    │   │       │   content.md
    │   │       │
    │   │       └───assets
    │   │               profile.jpg
    │   │               location.jpg
    │   │
    │   └───hello-world.record
    │       │   excerpt.txt
    │       │   published.date
    │       │   status.txt
    │       │   title.txt
    │       │
    │       └───content.markdown
    │           │   content.md
    │           │
    │           └───assets
    │                   php-api.jpg
    │
    ├───pages.record
    │   ├───experience.record
    │   │       content.php
    │   │       title.txt
    │   │
    │   ├───intro.record
    │   │       content.php
    │   │       title.txt
    │   │
    │   ├───page_not_found.record
    │   │       content.php
    │   │       title.txt
    │   │
    │   ├───post.record
    │   │       content.php
    │   │       title.txt
    │   │
    │   └───writing.record
    │           content.php
    │           title.txt
    │
    └───post-ideas.record
                    building-a-datawarehouse-from-blackbaud-crm.txt
                    extracting bulk data from Blackbaud CRM using Python.txt
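
In the tree above, each node's type appears to be carried by its file or folder extension. The following is a minimal sketch of that idea in Python (not the PHP of the actual library). The function names are made up for illustration, and the `.number` and `.ref` extensions are assumptions, since the tree never shows a Number or Reference node. Files inside a Markdown node, such as `content.md`, are assumed to be that node's internals and are left out of the table.

```python
from pathlib import PurePosixPath

# Extension-to-type table. The first six pairs mirror the tree above;
# ".number" and ".ref" are hypothetical, since the post never shows them.
TYPE_BY_EXTENSION = {
    ".record": "Record",
    ".txt": "Text",
    ".markdown": "Markdown",
    ".jpg": "Image",
    ".php": "PHP",
    ".date": "Date",
    ".number": "Number",    # assumed extension
    ".ref": "Reference",    # assumed extension
}


def node_type(path: str) -> str:
    """Return the node type implied by the last path segment's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    # Unknown extensions are reported rather than guessed at.
    return TYPE_BY_EXTENSION.get(suffix, "Unknown")


def node_name(path: str) -> str:
    """Return the node's name: the last path segment without its extension."""
    return PurePosixPath(path).stem
```

For example, `node_type("root/blog.record/hello-world.record/title.txt")` gives `"Text"`, and `node_name("root/blog.record/hello-world.record")` gives `"hello-world"`. Adding a new type would then be a matter of adding one entry to the table.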

The software library

Because I was deploying my content onto this website, I needed to develop a software library in PHP that understands the Data format. I gave it a fluent interface, which makes querying data simple and effective.
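
To give a feel for what a fluent interface over this kind of file structure could look like, here is a rough sketch in Python. It is not the actual PHP library: the class names (`Node`, `Query`) and method names (`child`, `children`, `where`, `sort_by`, `names`) are all invented for illustration, and it assumes leaf nodes hold plain UTF-8 text.

```python
from pathlib import Path


class Node:
    """A node in the file-based database, wrapping one file or folder."""

    def __init__(self, path):
        self.path = Path(path)

    @property
    def name(self):
        # "title.txt" -> "title", "hello-world.record" -> "hello-world"
        return self.path.stem

    def child(self, name):
        """Return the child node called `name`, whatever its extension."""
        for entry in sorted(self.path.iterdir()):
            if entry.stem == name:
                return Node(entry)
        raise KeyError(name)

    def children(self):
        """Return a query over this node's child records."""
        return Query(Node(p) for p in sorted(self.path.iterdir())
                     if p.suffix == ".record")

    def value(self):
        """Return the text stored in a leaf node."""
        return self.path.read_text(encoding="utf-8").strip()


class Query:
    """A chainable list of nodes; each method returns a new Query."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def where(self, field, expected):
        """Keep only the nodes whose `field` child holds `expected`."""
        return Query(n for n in self.nodes
                     if n.child(field).value() == expected)

    def sort_by(self, field, descending=False):
        """Order the nodes by the text value of their `field` child."""
        return Query(sorted(self.nodes,
                            key=lambda n: n.child(field).value(),
                            reverse=descending))

    def names(self):
        return [n.name for n in self.nodes]
```

With a folder laid out like the tree above, a query for published posts, newest first, might read `Node("root").child("blog").children().where("status", "published").sort_by("published", descending=True).names()`. The `"published"` status value is an assumption, and sorting on the raw text only works if dates are stored in a sortable form such as ISO 8601.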

Using the filesystem has some awesome advantages. The PHP code that powers the pages on this website is considered part of this data, and can then live in the same folder as other page metadata. It means everything is still queryable using the library, while still being able to be run as a standard PHP file.

The editing interface

Once this basic API was defined, I started working on the Content Management interface. This used the software library to access the database.

I had already identified the data types that I wanted to support in the Data format. Importantly, though, some data is stored in the same way but edited using different interfaces. This was the case for the Text data type, which has single-line and multi-line editing variants. Creating the appropriate architecture for this was a little challenging: it was important to keep all data-type concerns together (the definition of the data type itself, and the definition of how the data is saved to and read from the filesystem), while keeping the editing interface definition in the separate editing interface component.
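
One way to picture this separation is a storage-side type that only knows how to read and write, plus an editor-side registry keyed by type and variant. The sketch below is in Python and is only a guess at the shape of such a design, not the actual implementation; `TextType`, `EDITORS`, `editor` and the variant names are hypothetical.

```python
from dataclasses import dataclass
from html import escape
from pathlib import Path


@dataclass(frozen=True)
class TextType:
    """Storage side: knows how to read and write, nothing about editing."""
    name: str = "Text"

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")

    def write(self, path: Path, value: str) -> None:
        path.write_text(value, encoding="utf-8")


# Editor side: a registry that lives in the editing component.
# Keys are (type name, variant); values render an HTML form control.
EDITORS = {}


def editor(type_name, variant):
    """Decorator that registers a render function for a type and variant."""
    def register(func):
        EDITORS[(type_name, variant)] = func
        return func
    return register


@editor("Text", "single-line")
def text_input(field, value):
    return f'<input type="text" name="{escape(field)}" value="{escape(value)}">'


@editor("Text", "multi-line")
def text_area(field, value):
    return f'<textarea name="{escape(field)}">{escape(value)}</textarea>'
```

Both editors here serve the same stored type, so `EDITORS[("Text", "multi-line")]("excerpt", "a < b")` and the single-line variant differ only in the control they render. The storage class never imports anything from the editor side, which is the property the paragraph above is after.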

Because I had decided on web technology, I had access to a wide range of pre-built editing components. Specifically, for the markdown content, I integrated EasyMDE. Any images that are embedded in the markdown are stored in the same folder. It was a bit of a nightmare to get working, but it is a pleasure to use.

Because each node in the database knows what type it is, it's fairly easy to generate an interface to edit a particular node, and this makes the most sense for editing Record-type nodes. For example, an editing component would be generated for each node in the Record representing this post. This means that, if you wanted to, you could add a new piece of data to this folder, and the system would generate a component for editing that data.
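
As a rough illustration of generating an editor from a Record, the Python sketch below walks a Record folder and picks one component per child, based on its extension. The component names and the `describe_form` function are invented for this example; the real system's components are not shown in this post.

```python
from pathlib import Path

# Hypothetical mapping from a child's extension to an editing component.
COMPONENT_BY_EXTENSION = {
    ".txt": "text-input",
    ".date": "date-picker",
    ".markdown": "markdown-editor",
    ".jpg": "image-upload",
    ".php": "code-editor",
    ".record": "record-form",
}


def describe_form(record_dir):
    """List one editing component per child node of a Record folder."""
    fields = []
    for entry in sorted(Path(record_dir).iterdir()):
        # Children of an unrecognised type are shown but not editable.
        component = COMPONENT_BY_EXTENSION.get(entry.suffix, "read-only")
        fields.append({"field": entry.stem, "component": component})
    return fields
```

Dropping a new file such as `subtitle.txt` into the folder would make the next call to `describe_form` include a `subtitle` field, with no other change required.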

Next steps - developing a schema

You will note that this system is schemaless. Typically, data needs to be consistent in order to be useful; the data needs to follow a particular schema.

Each type of data could have its own corresponding type of schema, and this would produce a very powerful and flexible model. A date field could be limited to certain date ranges; a text field may need to conform to a specific regular expression; or an image may need to be of a certain size. Likewise, a record may need to have specific fields, or its children may only be allowed to contain text values.
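
Since the schema system does not exist yet, the following Python sketch is purely speculative. It treats each schema as a small check function, with one builder per data type; the names `date_between`, `text_matching` and `record_with` are made up, and dates are assumed to be stored as ISO 8601 strings. Image size checks are left out to keep the sketch free of external libraries.

```python
import re
from datetime import date


def date_between(earliest, latest):
    """Schema for a Date node: the value must fall inside a range."""
    def check(value):
        try:
            return earliest <= date.fromisoformat(value) <= latest
        except ValueError:
            # Not a valid ISO date at all.
            return False
    return check


def text_matching(pattern):
    """Schema for a Text node: the whole value must match a regex."""
    compiled = re.compile(pattern)

    def check(value):
        return compiled.fullmatch(value) is not None
    return check


def record_with(fields):
    """Schema for a Record node: returns the names of failing fields."""
    def find_errors(record):
        return [name for name, check in fields.items()
                if name not in record or not check(record[name])]
    return find_errors
```

A schema for blog posts could then be composed as `record_with({"status": text_matching(r"draft|published"), "published": date_between(date(2000, 1, 1), date(2100, 1, 1))})`. An empty list from the resulting function means the record conforms.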

Once this next part of the system is implemented, records or values could then be rejected if they do not meet the schema -- regardless of whether they come from the editing interface or from the database (filesystem) itself. If a user provides information, I think it is valuable to store that information even if it is wrong, and then to stop the record from entering production, on an as-needed basis, until the user has time to correct the error.
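
The 'store it anyway, but keep it out of production' idea could be sketched as below, again in Python and again hypothetical. Records are plain dictionaries here, and `missing_fields` stands in for whatever schema check the real system would use.

```python
def missing_fields(required):
    """Build a check that lists which required fields a record lacks."""
    def find_errors(record):
        return [name for name in required if name not in record]
    return find_errors


def split_for_production(records, find_errors):
    """Keep every record, but only let valid ones through to production."""
    production, held_back = [], {}
    for name, record in records.items():
        errors = find_errors(record)
        if errors:
            # The record stays in storage; it is only withheld from the site.
            held_back[name] = errors
        else:
            production.append(name)
    return production, held_back
```

Nothing is deleted or refused at save time: `held_back` simply records why each invalid record is being withheld, so the user can fix it later.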

Long-term goals // blue-sky dreaming

Ideally, I'd like to develop a simple version of the language workbenches and meta-programming systems that are floating around, but targeted more at data management, using a more holistic approach that manages data and operations in the same system. It would be best to combine some of the concepts of immutability and event-sourcing into this model, to achieve some of the commit / diff / branching / rollback abilities of Git and the historical query abilities of bi-temporal databases like Crux. If operations and business processes were stored with data, we could even query how those have changed over time, too.