Diet bengoldsworthy.net

Same old wordcounts; now with half the calories!

Screenshot by the author

~1,700 words

Summary

In migrating my site to Hugo, I’ve put a lot of work into trimming it down and making its various resources more efficient.

Way back in 2023, I talked about my long-, long-, long-awaited site migration to Hugo: a static site generator (SSG).

Beyond a certain ideological motivation to do so, I also talked about how I wanted to make my site more lightweight. Now, two long years later, I have finally managed to finish this part of the work, which gives me the chance to start making some comparisons and quantify the changes.

The Old Way

My old site ran on WordPress. In WordPress, every file you upload is added to a Media Library that is shared across the site. In addition, any image files you upload are automatically duplicated in several different sizes. So, for example, if I upload a file (hobbit-1.jpg in this case) I end up with the following files:

Five copies of the same file, ranging from full-size (225 KiB) down to 150×150 (9.4 KiB)

Screenshot by the author

As you can see, the combined size of all the resized images (324.2 KiB) can, and often does, end up larger than the original file (255 KiB) — all for files that might never be needed.

Of course, the cost of storage these days is effectively negligible, so this is by no means an unreasonable compromise to make in favour of having very small files available pre-processed for display when necessary.

Having a central Media Library (which stores all uploaded files in a single wp-content/uploads/<year-uploaded>/<month-uploaded>/ directory structure) also represents a trade-off in terms of whether media items are treated as objects that can be linked to from multiple posts, or whether they are bundled up with the post content itself (which in WordPress is saved as text fields within a database). For a site like mine, however, there are very few instances of media files that I want to refer to from multiple places, and generally I prefer the organisation of each post into a single discrete bundle with all of its resources (even though this does, in those rare cases, mean I will be duplicating those files for each post that needs them[1]).

Hugo

Hugo works differently, by virtue of its being an SSG. Whereas in WordPress you upload all the bits of a post separately so they can be combined dynamically (i.e., fried) when a visitor requests a page, in Hugo you have a separation between your raw materials and the resulting site files, which you generate all in one go during the build step and then upload for a visitor to access.

Central to this in Hugo is the concept of a page bundle: a directory containing the article content (in a Markdown file) and any page resources. By putting those resources in the page bundle, you can also reference them in your article’s metadata and add any additional information, such as credits or alt text. This was another reason for wanting to reorganise my content, because I wanted to both a) include attribution and licensing information for all the media I’ve used over the years and b) add alt text for accessibility reasons.
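To make the idea concrete, here is a minimal sketch of what a page bundle might look like on disk, written as a small shell function that scaffolds one. The directory name, the `make_bundle` helper and the `alt` and `credit` parameter names are all assumptions for illustration; only the `resources` front matter keys (`src`, `title`, `params`) come from Hugo itself.

```shell
#!/usr/bin/env bash
# make_bundle DIR: scaffold a hypothetical leaf bundle, i.e. one directory
# holding the article (index.md) alongside its page resources.
make_bundle() {
  local dir="$1"
  mkdir -p "$dir"
  # Front matter describes each resource; the params keys are illustrative.
  cat > "$dir/index.md" <<'EOF'
---
title: "An example post"
resources:
  - src: hobbit-1.jpg
    title: "Example image"
    params:
      alt: "Alt text describing the image goes here"
      credit: "Attribution and licence information goes here"
---

Article content, written in Markdown.
EOF
  # The image itself (hobbit-1.jpg) would be copied into "$dir" next to index.md.
}
```

Calling `make_bundle content/posts/example-post` would leave the article and its metadata in one directory, which is the "single discrete bundle" arrangement described above.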

A Long Job

I migrated my site in August 2022, so this is where my WordPress Media Library directory finishes. However, I’ve been at this blogging game for a long while, and that left me with just over 6,000 files (1.5 GiB) to reorganise; even discounting the various resized duplicates mentioned above, that’s still ~1,200 individual files.

If I had only been moving the files into new directories, I likely could have automated the process, but because I was also identifying sources and adding text descriptions for everything, there was always going to be a manual element; and a manual element means a lot of time. So, over the past few years (in fits and starts; it certainly wasn’t a constant effort) I’ve been chipping away, a section of the site at a time.

Most of this has been very monotonous, but I could always put on a film or podcast or just zen out to some music and crack on. Also, tracking down some of the sources and licensing information for random images I picked up online over a decade ago was interesting, and going through the CV section let me check in on what’s happened to my past organisations, through mergers, acquisitions and bankruptcies.

But now, I’ve finally finished and deleted my old wp-content/uploads/ directory. So let’s crunch some numbers.

Numbers

As mentioned, my WordPress Media Library at time of migration came to 6,043 files (1.5 GiB).

Also as mentioned, the SSG approach means I effectively now have two distinct collections of media to compare: the raw material contained in the page bundles of the site source files; and the resulting Web site files generated for upload to my server.

My comparisons are not necessarily going to be 100% accurate, because several years have passed during this project and I have continued to add new content (and, thus, resources) to the site. I can roughly compare the size of the same media by limiting the new site’s content to those posts up to July 2022, but this is slightly muddied by the fact that I have also gone back and added additional media files to old posts, particularly those in the Appearances and Portfolio sections. These files (usually mirroring things like software applications, videos or audio files uploaded elsewhere) are generally much larger than image files.

So, for my site material, and limiting myself to posts predating July 2022, the new size of my combined resources is… 1.6 GiB, or 1,281 files. BUT, 149 of those files are things other than images which, despite the small number, account for 1 GiB of the total. So looking at only images, I’ve reduced the Media Library from 1.5 GiB to around 0.55 GiB; a reduction by roughly two thirds.
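The "roughly two thirds" figure can be checked with a quick calculation. This is a minimal sketch that assumes only the rounded sizes quoted above (1.5 GiB before, 0.55 GiB after); the `pct_reduction` helper is a made-up name.

```shell
#!/usr/bin/env bash
# pct_reduction OLD NEW: print the percentage by which NEW is smaller than OLD.
# Hypothetical helper; LC_ALL=C keeps the decimal separator a full stop.
pct_reduction() {
  LC_ALL=C awk -v old="$1" -v new="$2" \
    'BEGIN { printf "%.1f\n", (1 - new / old) * 100 }'
}

# Image files only: WordPress Media Library vs. Hugo page bundles, in GiB.
pct_reduction 1.5 0.55
```

With those rounded inputs the result is 63.3, a little under two thirds.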

That’s the files on my disk shrunk; how about the resulting files on my server (where storage is slightly more costly)?

Well, applying the same search parameters to the resulting output files, I get a total of 2,945 files (again, 1.6 GiB), of which 154 are non-images (accounting for 1 GiB), leaving 0.55 GiB for the image files. So pretty much identical.

But this has certainly been a lot of faff just to save small amounts of storage which, as mentioned, is basically a free resource. So we come to the third and final part of the equation: just how heavy are these files when they are sent over the wires to my dear reader(s)?

You may have noticed that the number of files more than doubled in the rendered output, without any perceptible bump in the overall file size. And here we can see the starkest improvement. In my page templates, I have a partial for rendering images that uses the <picture> element rather than the more basic <img>. The <picture> element takes a series of <source> elements, with links to various different files and the conditions under which to choose them, and will attempt to show the best option for any user based on the size of their browser window, the file types their browser supports, etc. (and it also includes a simple <img> as a fallback for those browsers that don’t support <picture>).

Within the template, I also resize each image file to widths of 1,200 and 800 px and, most impactfully, convert it to WebP format. WebP is a Web-optimised image format that can produce some astonishing compression with no discernible impact on image quality (when viewed on a device).[2] For example, one of these images is raw and one is WebP-compressed, but I doubt you can tell which is which:

Answer: The first image is the raw PNG (385.74 KiB); the second is the converted WebP (36.17 KiB).
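For readers who have not met <picture> before, the following is a rough sketch of the kind of markup such a partial might emit, printed from a small shell function so that it can be run as-is. It is not the actual template from this site: the `picture_markup` helper, the file-naming scheme and the `sizes` value are assumptions, and only the 800 and 1,200 px widths and the WebP format come from the text above.

```shell
#!/usr/bin/env bash
# picture_markup BASE ALT: print a <picture> element offering WebP variants at
# two widths, with a plain <img> fallback. File names are hypothetical.
# Note: ALT is inserted verbatim, so it must not contain unescaped quotes.
picture_markup() {
  local base="$1" alt="$2"
  cat <<EOF
<picture>
  <source type="image/webp"
          srcset="${base}-800.webp 800w, ${base}-1200.webp 1200w"
          sizes="100vw">
  <img src="${base}.jpg" alt="${alt}">
</picture>
EOF
}

picture_markup "hobbit-1" "Alt text describing the image"
```

A browser that understands WebP picks one of the two <source> candidates based on the viewport width; anything else falls through to the <img>.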

Along with this automated resizing and format conversion, I also replaced many images with SVG equivalents; in effect, this means replacing an image file with a small piece of code that describes how to produce the same image, which requires basically no space at all (but which is, of course, mostly limited to geometric shapes). One post that particularly benefitted from this was this flag-heavy post: the raw files were 77.5 KiB, whilst the SVG equivalents are only 13.7 KiB (with the added bonus of being able to effortlessly render at any size with the same quality, even if someone, for some reason, decided to project that post onto the side of a building).

So of my 2,945 produced image files:

  • 1,696 are WebPs, SVGs and AVIFs, for a grand total of 92 MiB in size;
  • the remaining 1,189 are JPEGs and PNGs, which account for the other 497 MiB.

[Chart: Comparison of file sizes (prior to Jul 2022 only)]

What Could Have Been

So, evidently, this approach has drastically reduced both my local and remote storage requirements and improved my network efficiency for all my pre-migration posts. But as a speculative case, how much do I think has been saved by having made this migration in the first place?

Well, taking into account all of the various posts on this site as of today, I have around 2,086 media files (3.7 GiB) in my raw material and 3,993 files (3.4 GiB) in my generated site content. Compared to the WordPress site, my Hugo site has one fifth as many image files in its source material and half as many in its output, and around half of the file size, respectively.

Converting those reductions into multipliers, then, gives us the following:

[Chart: Comparison of resource size for all site content, including prediction of equivalent WordPress resource size]

Licensing

Lastly, I mentioned that one of my parallel goals was to add any missing licensing information to the media I’ve used over the years. Having now done so, I can review the different licences used across the site:

[Chart: Resource licence share]


  1. Actually, if you were desperate to avoid any duplication, you could put multi-use resources in a global assets/ directory.

  2. AVIF is a similar Web-optimised format, and if Hugo ever adds the conversion I will add that to my templates.

Appendices

Methodology

  1. Get files with find . -print > files.txt
  2. Filter with cat files.txt | grep -E '\.(exe|csv|flac|mp3|gz|jar|ods|pdf|webm|xlsx|zip|avif|gif|jpeg|jpg|png|svg|webp)$' | grep -E '/(organizations|20(0|1|2[012]))' | grep -v -E '/2022-(08|09|10|11|12)' > files-filtered.txt
  3. Calculate number of files with cat files-filtered.txt | wc -l
  4. Sum file sizes with cat files-filtered.txt | xargs -d \\n du -ch
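Steps 2 to 4 can also be wrapped up as reusable functions. The following is a sketch that assumes bash; the names `filter_media` and `total_bytes` are hypothetical. It differs from the commands above in one respect: it sums apparent file sizes in bytes with wc -c rather than disk usage with du, which sidesteps the several "total" lines that du -c can print when xargs splits a long file list into batches.

```shell
#!/usr/bin/env bash
# filter_media: read paths on stdin and keep only media files belonging to
# content dated before August 2022 (same patterns as step 2).
filter_media() {
  grep -E '\.(exe|csv|flac|mp3|gz|jar|ods|pdf|webm|xlsx|zip|avif|gif|jpeg|jpg|png|svg|webp)$' \
    | grep -E '/(organizations|20(0|1|2[012]))' \
    | grep -v -E '/2022-(08|09|10|11|12)'
}

# total_bytes: read paths on stdin and print the sum of their sizes in bytes.
# Unreadable paths are skipped. This measures content length, not disk blocks.
total_bytes() {
  local path size total=0
  while IFS= read -r path; do
    size=$(wc -c < "$path") || continue
    total=$((total + size))
  done
  echo "$total"
}

# Example usage, run from the root of the directory being measured:
#   find . -type f -print | filter_media | wc -l         # number of files
#   find . -type f -print | filter_media | total_bytes   # combined size
```

Because the functions read from standard input, the same pair can be pointed at either the site source or the generated output simply by changing where find starts.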