LaTeX documents accessible in the repository with CI

Andrzej Wojciechowski

LaTeX is very popular and, in my opinion, the best tool for creating documents longer than a couple of pages. It is often used for technical documentation and kept in a version control system such as git. Usually only the LaTeX source files are stored in the repository, not the final PDF files. But is there a way to keep a generated PDF document always up to date and accessible in the repository without versioning the file itself? It turns out that it’s possible with continuous integration (CI).

Why even bother?

You may wonder whether this is a problem worth solving. After all, someone who uses LaTeX and a version control system is in most cases more than capable of installing the LaTeX tools on their system, if they are not already installed. That’s true, but I’d argue it’s still worth having the generated document available in the repository. Here is why:

1. Convenience

You can quickly get the document while browsing the repository via GitLab, GitHub, or similar. No need to clone or pull the repository and generate the document locally. No need to worry whether your version of LaTeX and the packages you have installed are compatible with the document. You can also view the repository on a device where you don’t have LaTeX tools installed and still have access to the final document. You can even give repository access to a non-technical person (or at least someone not familiar with LaTeX), show them where to find the document, and spend less time sending them updated versions.

2. Speed

In a lot of cases it’s simply quicker to get the document from GitLab, GitHub, or another repository service than to generate it locally, especially if the document is fairly complex and contains cross-references or a table of contents. Such a document needs at least two compilation passes to make sure the generated version is up to date. And that’s before counting the time needed to clone the remote repository in the first place (e.g. for a new person, a new machine, or both).

3. Reliability

You always know where to get the most up-to-date version of the document. No need to remember when the local copy of the repository was last updated. You don’t need to worry about generating the document locally at all, unless you actually want to do so. Additionally, you can be certain that everyone has access to the latest version of the document. Of course you can automate the local generation of the document, but… why? Why would you want to do so locally, if you can do so in a remote repository?

I hope this convinced you there are a lot of benefits in having a document available in a repository.

Why not keep the generated PDF file versioned in the repository?

Some people may be tempted to keep the final PDF file in the version control system (such as git). Generally it’s not a good idea, as PDF files are treated as binary files. The version control system can’t show meaningful changes between two versions of such a file, and in practice every new version adds more or less an entire copy of the file to the repository. Additionally, the document is generated from the sources stored in the repository, so a historical version can always be regenerated when it’s needed.

Most of the time, only the latest, most up to date version of the document is important and should be easily available. So why make the repository unnecessarily bigger?

Additionally, how can you make sure that the document in the last commit has been regenerated from the updated sources? You either have to remember to manually generate the final PDF before each commit, or automate this task. In the latter case, once again, why would you want to do so locally, if you can do so in a remote repository?

Continuous integration (CI) for LaTeX documents

As I stated at the beginning, we can have the final document always available in a repository, up to date and not versioned (so it doesn’t take up space in the repository with every version), by using continuous integration (CI). I’ll explain how to achieve our goal on GitLab, but the steps are very similar on other development platforms.

1. Set up a GitLab Runner

First you need a machine on which to run your CI jobs. In GitLab it is called a GitLab Runner. I suggest configuring the Docker executor, as it runs each build in a separate, isolated container. The machine itself does not need to be powerful. In my experience a Raspberry Pi works great for this purpose, as it has enough computing power and relatively low power consumption.
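One way to run the runner itself is as a Docker container. The following is a minimal sketch of a Docker Compose file for that, assuming Docker and Docker Compose are already installed on the machine and that the gitlab/gitlab-runner image is available for its architecture. The service name and the ./config directory are placeholders.

```yaml
# docker-compose.yml: a sketch for running GitLab Runner in a container.
# The service name and the host path ./config are placeholders.
services:
   gitlab-runner:
      image: gitlab/gitlab-runner:latest
      restart: unless-stopped
      volumes:
         # Persist the runner configuration (config.toml) across restarts.
         - ./config:/etc/gitlab-runner
         # Let the runner start job containers through the host's Docker daemon.
         - /var/run/docker.sock:/var/run/docker.sock
```

The runner still has to be registered with your GitLab instance (gitlab-runner register) and set to use the Docker executor. Whatever tags you assign during registration must match the tags used in the job definition, which are docker and RPi in the configuration shown in the next step.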

2. Configure the CI jobs for LaTeX documents

Next you need to create the CI jobs. Simply create a .gitlab-ci.yml file in the root directory of your repository and define the CI stages and jobs you need. Here is an example of the CI configuration file I use:

stages:
   - document

########################################
# hidden jobs used as templates
.LaTeX_template:
   stage: document
   needs: []
   tags:
      - docker
      - RPi
   rules:
      - changes:
         - "$FILEPATH"
      - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
   image: mfisherman/texlive-full:2022
   script:
      - pdflatex "$FILEPATH"
   artifacts:
      when: always
      expire_in: 1 week
      paths:
         - "*.pdf"

########################################
Example document:
   extends: .LaTeX_template
   variables:
      FILEPATH: "doc/Example document.tex"

First, I create a single CI stage called document. Then I define a hidden job template called .LaTeX_template, so I don’t need to copy-paste the entire configuration each time I add a new job for a new document. Its rules run the job when the file at $FILEPATH has changed, or when the commit is on the default branch. Lastly, I define the actual job that generates the document.
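To illustrate how the template is reused, here is a sketch of a second job added to the same .gitlab-ci.yml. The job name and file path are hypothetical. It also overrides the template's script to run pdflatex twice, since a document with cross-references or a table of contents needs at least two passes, and it passes -interaction=nonstopmode and -halt-on-error so that a LaTeX error fails the job instead of waiting for input.

```yaml
# A hypothetical second document that reuses the template.
# The job name and file path are placeholders.
Second document:
   extends: .LaTeX_template
   variables:
      FILEPATH: "doc/Second document.tex"
   # Overrides the template's script: two passes so that cross-references
   # and the table of contents are resolved in the final PDF.
   script:
      - pdflatex -interaction=nonstopmode -halt-on-error "$FILEPATH"
      - pdflatex -interaction=nonstopmode -halt-on-error "$FILEPATH"
```

Everything not listed in the job (stage, tags, rules, image, artifacts) comes from .LaTeX_template.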

As you can see in the image setting at line 16 above, instead of using a generic Docker image and installing the LaTeX tools myself, I use an existing Docker image created by mfisherman. It works great and it’s available for both AMD64 and ARM platforms! I don’t want to worry about missing LaTeX packages, which is why I use the texlive-full image. But if you need only the basic ones, there is also a texlive image and even a texlive-minimal image. Choose the one that fits your needs.

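The expire_in setting at line 21 above suggests that the generated document will be deleted after 7 days, but that’s not entirely true. By default, GitLab keeps the artifacts of the most recent successful job on each branch regardless of the expiry time, so the latest document stays available until a newer one replaces it after a new commit is pushed.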
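Because the template uploads artifacts with when: always, a failed build can still leave something useful behind. As a sketch, the job below extends the template and also keeps the LaTeX log files. It assumes pdflatex writes its .log file next to the PDF in the job's working directory, which is what it does when no output directory is given. It is meant to replace the Example document job shown above, not to sit beside it.

```yaml
# Variant of the "Example document" job that also keeps the LaTeX log.
Example document:
   extends: .LaTeX_template
   variables:
      FILEPATH: "doc/Example document.tex"
   artifacts:
      # when and expire_in are inherited from the template (hashes are merged);
      # paths is a list, so it replaces the template's list and must be complete.
      paths:
         - "*.pdf"
         - "*.log"
```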

3. Use it!

That’s it! Nothing more to configure. In case you are wondering where you can find the generated documents, here is a hint: open the pipeline or the job in GitLab and browse or download its artifacts.
