Random Thoughts

This blog development process

How to include a sitemap.xml in a create-react-app site

Friday, March 16th, 2018

This blog is evolving and its content is getting bigger. It's still hosted on GitHub Pages, even though its DNS now points to a custom domain.

SEO was one of the reasons that weighed in on the move to a custom domain. Well, a custom domain also just looks better. Anyway, on the SEO side, I also added pre-rendering and head meta tags, each of which could deserve a dedicated post.

What I'm going to explain here is how I included a sitemap.xml file on this site. This is a create-react-app (CRA) based project, so the implementation is shaped by that tool's defaults.

The solution was to create a script that generates the sitemap file and to run it as part of the build task.

The how to

Alright, first things first. My CRA setup was using the default Babel configuration, so I hadn't installed or configured Babel for this project before. I could have written my generator script in plain old JS, but two main concerns kept me from doing that:

First, I wanted to reuse the existing selector functions to iterate over the content for the content-based pages, and those were already written with the latest JS features.

Second, I simply wanted to write the script itself in modern JS.

Set up Babel

Install all the requirements:

yarn add babel-cli babel-preset-env babel-preset-stage-0 babel-preset-react --dev

Create a .babelrc file in the root:

{
  "presets": ["env", "react", "stage-0"]
}

That's it, Babel is ready and we can move forward to our modern JS script!

Create the sitemap generator script

To create the sitemap XML I used sitemap.js (https://github.com/ekalinin/sitemap.js). It's quite simple, and its README covers many different use cases well.

yarn add sitemap --dev
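
For context, the file that sitemap.js generates follows the sitemaps.org protocol: a <urlset> root with one <url> entry per page, each holding a <loc> and, optionally, <changefreq> and <priority>. To show roughly what the library takes care of for us, here is a hand-rolled sketch. toSitemapXml and escapeXml are hypothetical helpers, not part of sitemap.js, and they ignore things like <lastmod> and sitemap index files.

```javascript
// Hypothetical helpers, NOT the sitemap.js API: a minimal serializer for the
// sitemaps.org 0.9 format, just to illustrate what the generated file contains.

// <loc> values must be XML-escaped according to the protocol.
const escapeXml = (value) =>
  String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')

function toSitemapXml(hostname, urls) {
  const entries = urls.map(({ url, changefreq, priority }) => {
    // Relative URLs such as '/archive' are resolved against the hostname;
    // absolute URLs are kept as they are.
    const loc = new URL(url, hostname).href
    const lines = [`    <loc>${escapeXml(loc)}</loc>`]
    if (changefreq) lines.push(`    <changefreq>${changefreq}</changefreq>`)
    if (priority !== undefined) lines.push(`    <priority>${priority}</priority>`)
    return ['  <url>', ...lines, '  </url>'].join('\n')
  })

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n')
}

const xml = toSitemapXml('https://bernardodiasdacruz.com', [
  { url: '/', changefreq: 'weekly', priority: 1 },
  { url: '/about-me', changefreq: 'monthly', priority: 0.5 },
])
console.log(xml)
```

In practice the library is the better choice, since it already handles these details and more, but seeing the raw format makes it easier to debug what ends up in public/sitemap.xml.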

The script, saved as src/config/sitemap.js, looks like this:

import path from 'path'
import sm from 'sitemap'
import fs from 'fs'

import config from '.'
import data from '../data.json'
import {
  getAllPostsForListing,
  getAllCategoriesForListing,
} from '../selectors/data'

const OUTPUT_FILE = path.resolve(__dirname, '..', '..', 'public', 'sitemap.xml')

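const postsUrls = getAllPostsForListing({data})
  .map(post => {
    // Handles are "YYYY-MM-DD-title", but post URLs are "/YYYY/MM/DD/title",
    // so split the handle into year, month, day and title path segments.
    const handle = [
      post.handle.substring(0, 4),  // year
      post.handle.substring(5, 7),  // month
      post.handle.substring(8, 10), // day
      post.handle.substring(11),    // title
    ].join('/')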
    return {
      url: `${config.PUBLIC_URL}/${handle}`,
      changefreq: 'weekly',
      priority: 0.8,
    }
  })

const categoriesUrls = getAllCategoriesForListing({data})
  .map(category => ({
    url: `${config.PUBLIC_URL}/category/${category.handle}`,
    changefreq: 'weekly',
    priority: 0.8,
  }))

const sitemap = sm.createSitemap({
  hostname: 'https://bernardodiasdacruz.com',
  cacheTime: 600000, // 600 sec (10 min) cache purge period
  urls: [
    { url: '/', changefreq: 'weekly', priority: 1 },
    { url: '/archive', changefreq: 'weekly', priority: 0.5 },
    { url: '/search', changefreq: 'weekly', priority: 0.5 },
    { url: '/about-me', changefreq: 'monthly', priority: 0.5 },
    ...postsUrls,
    ...categoriesUrls,
  ],
})

fs.writeFileSync(OUTPUT_FILE, sitemap.toString())

console.log(`Sitemap written at ${OUTPUT_FILE}`)
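
Since the sitemap is regenerated on every build, a small sanity check before writing it could catch bad entries early. The sketch below uses a hypothetical validateSitemapUrls helper (not part of sitemap.js or the script above) that checks each entry against the values the sitemaps.org protocol allows: changefreq must be one of always, hourly, daily, weekly, monthly, yearly or never, and priority must be between 0.0 and 1.0. It also flags duplicate URLs.

```javascript
// Hypothetical pre-write check, not part of sitemap.js.
// Allowed <changefreq> values according to the sitemaps.org protocol.
const CHANGEFREQS = new Set([
  'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never',
])

// Returns a list of human-readable problems; an empty list means "looks fine".
function validateSitemapUrls(urls) {
  const errors = []
  const seen = new Set()
  urls.forEach(({ url, changefreq, priority }) => {
    if (changefreq !== undefined && !CHANGEFREQS.has(changefreq)) {
      errors.push(`${url}: invalid changefreq "${changefreq}"`)
    }
    // Written as a negated range check so NaN is also reported.
    if (priority !== undefined && !(priority >= 0 && priority <= 1)) {
      errors.push(`${url}: priority ${priority} is outside 0.0-1.0`)
    }
    if (seen.has(url)) errors.push(`${url}: duplicate entry`)
    seen.add(url)
  })
  return errors
}

const errors = validateSitemapUrls([
  { url: '/', changefreq: 'weekly', priority: 1 },
  { url: '/archive', changefreq: 'fortnightly', priority: 0.5 },
  { url: '/archive', changefreq: 'weekly', priority: 1.5 },
])
console.log(errors)
```

In the script, you could run this over the same array passed to sm.createSitemap and throw if it returns anything, so a broken entry fails the prebuild step instead of being deployed.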

Yeah, I know many resources out there suggest loading a routes file and iterating over it. You can certainly take that approach if you prefer; either way, I hope the script above gets the general idea across. In my opinion, since this is a fairly small project, hardcoding the static pages of the sitemap is good enough.
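
For comparison, the routes-file approach could look roughly like this. It assumes a hypothetical routes array of { path, changefreq, priority } objects; the real shape depends on how your router is configured. Static paths map straight to sitemap entries, while parameterized ones such as /category/:handle are skipped, because they still have to be expanded from the content data, as the script above does for posts and categories.

```javascript
// Hypothetical routes definition; a real project would import this from
// wherever its router configuration lives.
const routes = [
  { path: '/', changefreq: 'weekly', priority: 1 },
  { path: '/archive', changefreq: 'weekly', priority: 0.5 },
  { path: '/search', changefreq: 'weekly', priority: 0.5 },
  { path: '/about-me', changefreq: 'monthly', priority: 0.5 },
  { path: '/category/:handle' },
  { path: '/:year/:month/:day/:title' },
]

// A route is static when none of its path segments is a ":param".
const isStatic = (path) =>
  !path.split('/').some((segment) => segment.startsWith(':'))

// Static routes become sitemap entries, with fallback values for missing fields.
const staticUrls = routes
  .filter((route) => isStatic(route.path))
  .map(({ path, changefreq = 'weekly', priority = 0.5 }) => ({
    url: path,
    changefreq,
    priority,
  }))

console.log(staticUrls.map((entry) => entry.url))
// → [ '/', '/archive', '/search', '/about-me' ]
```

The trade-off is that new static pages show up in the sitemap automatically, at the cost of keeping the sitemap script coupled to the router's data structure.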

Workflow

In the scripts section of package.json, I added a new task called sitemap and appended it to the prebuild task, which runs automatically before the build script:

"scripts": {
  "sitemap": "./node_modules/.bin/babel-node src/config/sitemap.js",
  "prebuild": "npm run content && npm run sitemap"
}

The final round

After committing and deploying these changes, I went to https://www.google.com/webmasters/ and tested the sitemap there successfully. You can check out the result at https://bernardodiasdacruz.com/sitemap.xml.

This solution was integrated into this website in this commit.


[Edit 2018-03-21]

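Oops: post URLs use the /:year/:month/:day/:title format, but the filenames use :year-:month-:day-:title, and the first version of the script dropped the post handle, which follows the filename format, straight into the URL. That little mistake explained why my indexing wasn't working.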
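
Since the handle format is fixed, the same conversion could also be written with a single regular expression, which has the advantage of rejecting handles that don't match instead of silently slicing them. This is an alternative sketch, not the fix that went into the site; handleToPath is a hypothetical name and the example handle is made up.

```javascript
// Alternative to the substring-based fix: turn a "YYYY-MM-DD-title" handle
// into the "YYYY/MM/DD/title" path used by post URLs.
const HANDLE_PATTERN = /^(\d{4})-(\d{2})-(\d{2})-(.+)$/

function handleToPath(handle) {
  const match = HANDLE_PATTERN.exec(handle)
  if (!match) {
    // Fail loudly so a malformed filename breaks the build instead of
    // producing a URL that doesn't exist.
    throw new Error(`Unexpected post handle format: "${handle}"`)
  }
  const [, year, month, day, title] = match
  return [year, month, day, title].join('/')
}

console.log(handleToPath('2018-03-16-example-post'))
// → 2018/03/16/example-post
```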

I'm aware of other things that could be improved, and I'll work on them as much as my time allows. Scripts like this would benefit hugely from a test suite. In fact, on client projects, especially the big ones, I usually write scripts with the test suite running alongside them, TDD-style, coding until all specs pass. A mistake like this would have been spotted before publishing. As they say: "fix bugs before they exist".

const postsUrls = getAllPostsForListing({data})
-  .map(post => ({
-    url: `${config.PUBLIC_URL}/${post.handle}`,
-    changefreq: 'weekly',
-    priority: 0.8,
-  }))
+  .map(post => {
+    const handle = [
+      post.handle.substring(0, 4),
+      post.handle.substring(5, 7),
+      post.handle.substring(8, 10),
+      post.handle.substring(11),
+    ].join('/')
+    return {
+      url: `${config.PUBLIC_URL}/${handle}`,
+      changefreq: 'weekly',
+      priority: 0.8,
+    }
+  })