Adding Spell Checking As A GitHub Action

Author: Chad Feeser

It’s the same old story…you go to a blog trying to find a recipe for potato salad, and you wind up having to slog through a fifteen-paragraph essay about the author’s left-handed parakeet and the respective merits of competitive pie sitting. But let me just say this about myself: prior to switching careers to the wonderful world of tech, I was a high school English teacher for thirteen years. Believe me when I say I’ve absolutely had my fill of completely avoidable spelling errors in “finished” documents.

Alta3 is a training company, so it is certainly unbecoming for us to put grammatically flawed content in front of students. You may well wish to avoid similar faux pas with content that will be placed in front of clients. So let’s implement a check as part of an established CI/CD pipeline to guard against them. In this post, we are going to explore how GitHub Actions can make use of the PySpelling tool to run automated checks against any content currently in a repository:

- at the click of a button
- on a push event
- as a requirement of a pull request
- and more!

Part 1: Configuration File

The first document we’ll be using is described below. According to the GitHub Action’s documentation, this file must be named .spellcheck.yml, .spellcheck.yaml, spellcheck.yml, or spellcheck.yaml; the action searches for those names in that order. The gist of the document is to provide settings for how you’d like the tool to run the spellcheck. Put the document in your repository’s root directory.

Here is the config file in its entirety. We’ll break down each part afterwards.

matrix:
- name: Markdown
  aspell:
    lang: en
    ignore-case: true
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Let’s begin by taking a look at the first five lines:

matrix:
- name: Markdown
  aspell:
    lang: en
    ignore-case: true
  • name- The value ‘Markdown’ is arbitrary; pick any unique name to describe the “rule” you are creating.
  • aspell- GNU Aspell is an established spell-checking tool that is utilized by this action.
    • lang- What language appears in the document being checked (what dictionary should we use?)
    • ignore-case- Option to specify that upper/lower case not be considered when looking for errors.

  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  • dictionary.wordlists- .wordlist.txt, which may be named whatever else you wish, is a simple text document containing all the words that you would like the spellcheck utility to skip (tech terms like Kubernetes will be marked as “misspelled” otherwise).
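
Because wordlists is a list, it should be possible to split your custom words across more than one file. The fragment below is only a sketch; the second file name, .wordlist-products.txt, is hypothetical and just illustrates the shape.

```yaml
  dictionary:
    wordlists:
    - .wordlist.txt            # general tech terms, as in the config above
    - .wordlist-products.txt   # hypothetical second list, e.g. product names
    encoding: utf-8
```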

  pipeline:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  • pyspelling.filters.html- The PySpelling tool is configurable, which is desirable because you can tell it to skip sections of your documents that should not be spell checked. For instance, the values code and pre above tell the tool to skip spellchecking any blocks of code.
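
One caveat worth sketching: the HTML filter works on HTML elements, so code written in Markdown syntax only becomes code and pre elements if the Markdown is converted first. PySpelling also ships a Markdown filter, pyspelling.filters.markdown, that can be placed ahead of the HTML filter in the pipeline. The variant below is an assumption about what your files may need, not part of the configuration shown above, so try it against your own content before relying on it.

```yaml
  pipeline:
  # Assumption: convert Markdown to HTML first, so that Markdown code
  # becomes <code>/<pre> elements the HTML filter can then ignore.
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
```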

  sources:
  - '**/*.md'
  default_encoding: utf-8
  • sources- Here is where either a static list of files or a wildcard expression can be used to specify which file(s) receive spellchecking. At Alta3, all our courses are written in Markdown format, so all files ending in .md are targeted.
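
If you would rather check a handful of specific files than everything matching a wildcard, a static list might look like the sketch below. The two paths are hypothetical placeholders; swap in files that actually exist in your repository.

```yaml
  sources:
  - README.md          # hypothetical: a single file in the repository root
  - docs/intro.md      # hypothetical: one specific file in a subdirectory
  default_encoding: utf-8
```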

Part 2: Workflow Script

The following is a workflow document that GitHub will read and run whenever a specified trigger occurs. Once again, here is the document in its entirety, followed by an explanation of each portion.

This file MUST be created in the .github/workflows directory and may be called whatever you wish as long as it has a .yml/.yaml file extension.

name: Spellcheck Workflow      
on: [workflow_dispatch]

jobs:                                            
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:                                         
    - uses: actions/checkout@v3                    
    - uses: rojopolis/spellcheck-github-actions@v0 
      name: Spellcheck
      with:                                        
        task_name: Markdown
        output_file: spellcheck-output.txt

Now let’s break down what each part of this code is actually doing:

name: Spellcheck Workflow                          # optional name of the workflow
on: [workflow_dispatch]                            # trigger(s) for this workflow; put inside [brackets] if listing multiple
  • name- This will be the name of your workflow and is totally arbitrary. Call it what you like!
  • on- This is where you specify what trigger(s) will result in the workflow being executed. There are many to choose from, listed in the Events that Trigger Workflows page of the GitHub docs. workflow_dispatch is a manual trigger; once set, a “Run workflow” button will appear on the workflow’s page in the repository’s Actions tab (though it may be triggered through GitHub’s API or CLI as well).
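
The workflow above only runs on demand, but the same on key can list other events. As a minimal sketch of the push and pull request triggers mentioned at the start of this post, something like the following should work. The branch name main is an assumption, so substitute your own default branch.

```yaml
name: Spellcheck Workflow
on:
  workflow_dispatch:      # keep the manual "click of a button" trigger
  push:
    branches: [main]      # assumption: your default branch is named main
  pull_request:           # run on pull requests as well
```

To make the check a true requirement of a pull request, you would additionally mark the Spellcheck job as a required status check in the branch’s protection rules, which is configured in the repository settings rather than in this file.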

jobs:                                              # groups all the jobs that run in this workflow
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
  • jobs.build.name- An arbitrary name for the job you are running.
  • jobs.build.runs-on- Specifies the platform on which your job will run. In this case, a GitHub-hosted runner (a virtual machine running Ubuntu) will run our spellcheck.

    steps:                                         # all steps in the "Spellcheck" job, executed in this order
    - uses: actions/checkout@v3                    # specifies this step uses v3 of the actions/checkout action
    - uses: rojopolis/spellcheck-github-actions@v0 # specifies this step uses v0 of the rojopolis/spellcheck-github-actions action
      name: Spellcheck
      with:
        task_name: Markdown                        # this is the name of a rule in the config file .spellcheck.yml
        output_file: spellcheck-output.txt         # if specified, defines the name of a generated file that can be stored as workflow artifact
  • jobs.build.steps[0].uses- This action checks out your repository into the directory whose path is stored in the variable $GITHUB_WORKSPACE so your workflow can access it.
  • jobs.build.steps[1].uses- This is the action that runs PySpelling against the source files in your repository!
  • jobs.build.steps[1].with.task_name- IMPORTANT: this name is not arbitrary! This is the name of the rule we wrote earlier in the .spellcheck.yml configuration file at the beginning of the blog.
  • jobs.build.steps[1].with.output_file- Sets the name of a file that collects the spellcheck output and can be stored as a workflow artifact. If you have a lot of errors, this can be handy if you’re retrieving results through the API or CLI!
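
Setting output_file by itself only creates the file on the runner. To keep it after the run finishes, a further step has to upload it. Here is a hedged sketch using the actions/upload-artifact action; the v4 tag, the artifact name spellcheck-output, and the if: always() condition are my assumptions rather than anything required by the spellcheck action.

```yaml
    steps:
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@v0
      name: Spellcheck
      with:
        task_name: Markdown
        output_file: spellcheck-output.txt
    # Assumption: upload the generated file so it can be downloaded later.
    - uses: actions/upload-artifact@v4
      name: Archive spellcheck output
      if: always()                     # still upload when the spellcheck step fails
      with:
        name: spellcheck-output        # hypothetical artifact name
        path: spellcheck-output.txt    # must match output_file above
```

The if: always() condition matters here because a failed spellcheck is exactly the case where you want to read the output.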

Part 3: Words List (Optional)

You may name this file what you wish, though .wordlist.txt seems to be the convention. Whatever name you choose, it must match the entry in our configuration file discussed previously under matrix[0].dictionary.wordlists. This document couldn’t be simpler…just write out the words you do not want the spellchecker to flag, one per line. Here is a verbatim example of what you may find in this file:

ansible
basepath
busybox
daemonset
hashicorp
kubernetes
microservice
netfilter

…and so on! If your check finds spelling errors, you must either fix them in the source files or, if the flagged words are legitimate, add those words to this document.

Happy spell checking! May all your documentation be free of errors and headaches!