Part of a new GenR series for 2020 ‘Open Science Pro Tip’ where Open Science researchers share their digital know-how.
Researcher and developer Daniel Jackson shares his experiences of using flat file web technologies that can take the headaches out of running a research website by reducing maintenance tasks, lowering costs, avoiding security problems, and helping with archiving and keeping a site online long-term. The article covers a number of research site examples: running a personal site, running a site for a research project, and archiving a site at the end of a project. Flat file approaches came about because of the long-standing security vulnerabilities of websites built on PHP/MySQL, which continually run the risk of opening up a whole web server to being hijacked. The solution to this ‘vulnerability’ problem is quite simple: remove the machine from the equation and just serve HTML/CSS and any other assets needed, hence the name ‘flat file’ sites.
The PHP CMSs that we use today were invented in the 1990s, with the major open source projects kicking off in the early 2000s: Drupal in 2001 and WordPress in 2003. These technologies underlie many millions of websites, with WordPress reportedly accounting for 35% of the web (Netcraft 2019).
The challenges of the LAMP Stack
Problem of scope
Drupal and WordPress are examples of monolithic structures based on the LAMP stack (Linux, Apache, MySQL, PHP). They are monolithic because they control both the data and the presentation layers, and with an extensive library of plugins they can do pretty much anything. The problem with this huge scope is that some things are not done so well. For example, Drupal has very good multilingual support built into core but WordPress does not. WordPress has a good shopping solution with the free WooCommerce plugin; however, if you want subscriptions and memberships you need to pay for plugins. Drupal’s Commerce modules are complicated to set up and maintain, so developers often opt for a third-party hosted solution such as Shopify. Some WordPress developers will also use Shopify in preference to WooCommerce. If you would be best served by a third-party hosted service for your e-commerce, do you really need a PHP CMS to run your site, controlling all aspects of the front and back ends? And, importantly, most websites do not need to do everything.
Security and maintenance
For a website admin, the security alert emails can be a niggling stress. Miss one by a week (on that long awaited break inland to the hills), or even by a few days or hours, and your website could be compromised. Recovering a site, or at worst rebuilding it, is a cost: your time and someone’s money. You should really have a protected dev version or local copy, but it can be hard keeping these all up to date and in sync. Drupal has no automatic updating of core, so you do need to be on the ball when those critical security alerts hit your inbox.
Performance and server management
Website performance, accessibility and the other metrics that Google Lighthouse audits will have an impact on your site’s visibility and bounce rate. Search algorithms take these into account when determining page ranking, and if your site is slow visitors will leave. Mobile traffic often represents the majority of users, so performance on slower networks is critical. It is possible to address performance with various types of caching, but this can be tricky to set up correctly and will not match the speed of static HTML with image optimisation.
Enter the Jamstack, flat files and static site generators
Why the Jamstack?
Below are some of the key components of a Jamstack website which combined deliver the listed advantages:
- APIs (application programming interfaces) are sets of rules, protocols and functions used to retrieve data from cloud based CMSs, other data services or hosted functionality such as an e-commerce solution. These APIs are provided by the data services or cloud based CMSs. An increasingly popular option is GraphQL, which is a query language for APIs.
- ‘Flat files’ is how the data of your website is stored — in plain simple text files — as opposed to storing your data in a MySQL database. Flat files are single files and should not be confused with a flat-file database, which stores all the data in a single file with no relationships. Using flat files is a bit of a time-warp back to the beginning of the world wide web in the early 1990s, but with the advantage that frameworks will automatically generate HTML pages and optimise your content — more on this later. The flat files comprising the content of your site are usually stored as Markdown.
- Git is the most widely used version control system and is designed for collaborative working. GitHub and GitLab are Git hosting services. Typically your Jamstack website will be a git repository on your local machine that you push and pull to a remote repository on GitHub.
- Markdown is a markup language with a simple formatting syntax. It can be entered as plain text or using a WYSIWYG interface. Markdown is used with static site generators but can also be used with single-page applications or more advanced frameworks. Markdown may often be substituted with plain HTML or other plain text formats.
Markdown files can simply be stored in your website project folders on your development machine, or they may be managed through a Git-based CMS such as Netlify CMS. There are a myriad of Git-based CMSs to choose from: https://headlesscms.org/. They work by storing the data on GitHub or GitLab, with a client-side WYSIWYG web interface for editing Markdown. Netlify CMS has a config file so that you can create your own custom content types.
- Cloud CMSs are closed source and make data accessible with APIs. Contentful is one of the most popular; Sanity is another interesting contender.
- Static Site Generators (SSGs) do the building and outputting of your static website, taking your Markdown, your templates and your layout code to generate plain HTML, JavaScript and CSS. SSGs are written in various programming languages, and you will install them on your development machine and deployment service. The number of SSGs to choose from is bewildering, but we will focus on three popular contenders: Hugo, Jekyll and Gatsby. Check out the alternative SSGs here: https://www.staticgen.com/. Some have quite specific remits, so could be a good fit if you have a specific requirement.
- Hugo and Jekyll are relatively lightweight SSGs, with simple templating and enhanced functionality. They both have a good ecosystem of themes. Hugo is extremely fast at building the static pages, and it is versatile with an enthusiastic community. Jekyll is widely used and integrated with GitHub Pages; it is simple and described as ‘blog-aware’. Both are developing rapidly, so evaluate your requirements carefully and see which fits best. GitHub Pages is a free hosting service which can be used for both Jekyll and Hugo sites.
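To make the idea of a static site generator concrete, here is a minimal sketch in Python of what such a tool does: read flat files, split off the metadata, and write plain HTML. It is not how Hugo, Jekyll or Gatsby are implemented. The `PAGE` template, the function names and the tiny subset of Markdown handled (paragraphs and `# ` headings only) are assumptions made for illustration.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; real SSGs use much richer templating languages.
PAGE = Template(
    "<!doctype html>\n<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$body\n</body></html>\n"
)


def parse_flat_file(text):
    """Split a flat file into front matter ('key: value' lines) and body."""
    meta, body = {}, text
    if text.startswith("---\n"):
        header, sep, rest = text[4:].partition("\n---\n")
        if sep:  # only treat it as front matter if the block is closed
            body = rest
            for line in header.splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
    return meta, body.strip()


def render_body(body):
    """Reduced 'Markdown': '# ' blocks become <h2>, other blocks become <p>."""
    parts = []
    for block in body.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            parts.append(f"<h2>{html.escape(block[2:])}</h2>")
        else:
            parts.append(f"<p>{html.escape(block)}</p>")
    return "\n".join(parts)


def build_site(content_dir, output_dir):
    """Render every .md file in content_dir to an .html file in output_dir."""
    content_dir, output_dir = Path(content_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(content_dir.glob("*.md")):
        meta, body = parse_flat_file(source.read_text(encoding="utf-8"))
        title = html.escape(meta.get("title", source.stem))
        page = PAGE.substitute(title=title, body=render_body(body))
        target = output_dir / (source.stem + ".html")
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

Calling `build_site("content", "public")` would turn `content/about.md` into `public/about.html`. The output folder contains nothing but static files, which is why it can be served by any web host without PHP or a database.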
Depending on your needs you may want to have continuous deployment of your site for content updates and other changes to your Git repository. Netlify is a hosting and build service for automatic deployment to their CDN, but with frequent updates your build time could push you into their paid-for plans. Any errors in your code will cause builds to fail, and build times can be quite long if there are a lot of images to process. As a point of interest, Netlify’s CEO Matt Biilmann coined the nested acronym ‘Jamstack’; the ‘a’ is the nested bit, standing for API.
What you do not get with SSGs is functionality like user authorisation, comments and memberships, and this is where the idea of the ‘Content Mesh’ comes in. You add each feature through a specialised hosted service: search with Algolia, image processing with Cloudinary, comments with Disqus, authorisation with Auth0, shopping with SnipCart, forms with Netlify Forms, and so on for anything else you may need.
What to do if you are still committed to your LAMP Stack CMS?
Stay on the PHP island but think a bit smarter about hosted services: SaaS (Software as a Service)
There is a cloud solution for pretty much any service you may need. If you have a complex requirement for your site, evaluation of competing solutions could save substantial development time and deliver an improved user experience. The business model for Drupal modules is almost non-existent and the one for WordPress is challenged. This is important because, beyond the core open source project, specific complex functionality can require paid-for support and continuous development to deliver the expected user experience. E-commerce is a good example, with reliable SaaS solutions ranging from a full enterprise shop to a digital downloads store or a donate button. It is worth spending the time evaluating different options and testing the free trials.
Half on half off the island — the headless approach
Both WordPress and Drupal can be used as a headless or decoupled CMS. JSON:API is part of Drupal Core, and the WordPress REST API has been part of WordPress core since version 4.7, so no separate plugin is needed.
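As a rough sketch of the headless approach, the Python below reads posts from a WordPress site over its REST API and turns each one into a flat file with simple front matter that a static site generator could pick up. It assumes the default `/wp-json/wp/v2/posts` route is publicly readable. The function names and the choice of front matter fields are illustrative assumptions, not part of any particular tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def posts_url(site, per_page=10, page=1):
    """Build the URL of the WordPress REST API posts collection."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def fetch_posts(site, per_page=10):
    """Download one page of posts as a list of dicts (needs network access)."""
    with urlopen(posts_url(site, per_page)) as response:
        return json.load(response)


def post_to_flat_file(post):
    """Turn one API post object into (filename, text) with front matter.

    The body is kept as the rendered HTML that WordPress returns,
    so the file gets an .html extension rather than .md.
    """
    title = post["title"]["rendered"]
    front_matter = (
        "---\n"
        f"title: {json.dumps(title)}\n"  # quoted so colons in titles are safe
        f"date: {post['date']}\n"
        "---\n"
    )
    return f"{post['slug']}.html", front_matter + post["content"]["rendered"] + "\n"
```

In use, you would loop over `fetch_posts("https://example.org")` (a placeholder address) and write each returned text to its filename in your site's content folder. WordPress stays the editing interface while the public site is built from the exported files.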
If you opt for a Jamstack site with static page generation your site will be faster, easier to maintain and more fun to build with more control of page layout and design. Your data will also be clearly separated from your presentation layer so you could easily upgrade from a Jekyll to a Gatsby site in the future.
But… if you want a blog with commenting but don’t want to pay for Disqus or Auth0 then you may decide to stick with WordPress or Drupal. It may be quicker and cheaper to build a site with Drupal or WordPress if you need functionality which comes as standard with these CMSs. Another consideration is that site moderators do like the WordPress editing interface and may feel limited by some of the Jamstack equivalents. So no clear answers…
Maybe it is just going to be a lot more fun with Jamstack and static sites, both for the end user and the developer. We won’t have to wait for 30 seconds or longer every time we save a post or see a CSS change, and yes, we won’t need to worry about spam and security: it will be someone else’s problem.
Addendum — Get out of PHP jail free card:
Suppose you missed a security update whilst on that two week break and a spambot is busy exploiting PHPMailer and chewing up your bandwidth. Whatever you try just doesn’t fix it, and your hosting company is going to close your account. In that case the utility Wget could save you. Wget is a cross-platform web crawler that retrieves web pages and can generate a flattened HTML version of your CMS based website. Follow the instructions here:
Stanford, Web Services Blog: https://swsblog.stanford.edu/blog/creating-static-copy-website
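As an illustration of the Wget approach, here is a small Python wrapper that assembles and runs a mirroring command. The flags shown are standard Wget options commonly used for making a static copy. They are an assumption here and not necessarily the ones recommended in the guide linked above. The function names and the `static-copy` folder are hypothetical.

```python
import shlex
import subprocess


def wget_mirror_command(url, target_dir="static-copy"):
    """Build a wget command that saves a browsable static copy of a site."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the copy works on its own
        "--adjust-extension",  # save pages with an .html extension
        "--page-requisites",   # also fetch CSS, images and scripts
        "--no-parent",         # never climb above the starting URL
        f"--directory-prefix={target_dir}",
        url,
    ]


def preview(url, target_dir="static-copy"):
    """Return the command as one shell-quoted string, for checking first."""
    return shlex.join(wget_mirror_command(url, target_dir))


def mirror_site(url, target_dir="static-copy"):
    """Run wget (it must be installed) and return its exit code.

    wget reports a non-zero code if any single page failed, for example
    a 404, so a non-zero result does not always mean the copy is unusable.
    """
    return subprocess.run(wget_mirror_command(url, target_dir)).returncode
```

Printing `preview("https://example.org/")` before calling `mirror_site` lets you check the command and try it on a small section of the site first.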
Personal site for free
GitHub Pages is a simple and free way to set up a website almost immediately, including a free subdomain. GitHub Pages is a good way to run a blog, personal profile, or project site, and it can be extended in functionality and design. There is a setup wizard on the GitHub Pages site below where you can select from existing templates.
GitHub Pages: https://pages.github.com/
Example: Invest in Open Infrastructure
The campaign and network for open academic infrastructures ‘Invest in Open Infrastructure’ uses the GitHub Pages theme called minimal, with blogging and a custom domain.
Research project sites
Jekyll is a flat file site generating framework created by GitHub co-founder Tom Preston-Werner, and it is the same technology used by GitHub Pages. You can run Jekyll locally for site building, on GitHub Pages, or on your own web server.
Example: AcademicPages theme
Stuart Geiger, a researcher from the Berkeley Institute for Data Science, has made a well structured academic profile site using Jekyll and very kindly released the theme, called ‘Academic Pages’, along with a testing environment for easy reuse.
AcademicPages theme: https://academicpages.github.io/
Hugo is a leading flat file framework written in Go and compares very closely to Jekyll; its main selling point is fast build times. A key difference for researchers is that it can use AsciiDoc and reStructuredText as markup languages. If you are looking to choose between the two, this 2019 comparison on Slant is helpful.
Example: Modern Publishing
Modern Publishing is a research project developing collaborative writing workflows for Open Access publishing. Hugo is used to build their project site and host documentation for connecting tools like GitLab, Open Journal Systems (OJS), Hypothesis, and pandoc-scholar. The project is from TU Hamburg (TUHH) and the Hamburg State and University Library (SUB).
Modern Publishing: https://oa-pub.hos.tuhh.de/en/
Flat file site from WordPress
With WordPress it is possible to have the best of both worlds, a WordPress site that users are familiar with and then a flat file output of the WordPress site that is put online for public serving. There are many ways to achieve this mix and below is one example.
Shifter is a hosted managed service for generating a flat file site from WordPress. It is mentioned here as an example as it takes all the headaches out of setting up such a system, but also allows you to migrate away and combine it with other frameworks. One disadvantage is that many features that require dynamic updates or user interaction don’t work with Shifter.
Archive a WordPress site
Another use of flat file systems is when you want to close a research project and no longer have to update your site. Again there are many options and this guide takes you through the options.
Example: Simply Static plugin
The Simply Static plugin will generate a flat file copy of your site, which you can use to replace your WordPress site online, to place an archived backup copy on GitHub, Archive.org, or another repository, or both.
Simply Static: https://WordPress.org/plugins/simply-static/
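Before switching off the original WordPress install, it can be worth checking that the exported copy no longer depends on it. The Python sketch below scans the exported HTML for links that still point at dynamic endpoints. The list of markers in `DYNAMIC_MARKERS` is an assumption about what such links typically look like and should be adjusted for your own site. The function names are illustrative.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumed signs that a link still needs the live CMS; adjust for your site.
DYNAMIC_MARKERS = (".php", "/wp-admin", "/wp-json", "?p=")


class LinkCollector(HTMLParser):
    """Collect the values of href, src and action attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def dynamic_links(html_text):
    """Return the links in one page that look like dynamic endpoints."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [
        link for link in collector.links
        if any(marker in link for marker in DYNAMIC_MARKERS)
    ]


def audit_export(export_dir):
    """Map each exported .html file to the suspicious links found in it."""
    report = {}
    for page in sorted(Path(export_dir).rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="replace")
        found = dynamic_links(text)
        if found:
            report[str(page)] = found
    return report
```

An empty result from `audit_export("export")` suggests the copy is self-contained. Any entries typically point at comment forms, search boxes or login links, which are the kinds of interactive features that do not survive a flat file export.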
Netcraft. ‘2019 Web Server Survey’, 2019. https://news.netcraft.com/archives/category/web-server-survey/.