Simple LaTeX wordcount – Abitha K Thyagarajan

I like using wc to keep track of how much I’ve written. However, wc is not the best choice for counting the words in a LaTeX document, since it counts the markup along with the text; it’s better to use something like texcount.

There are a lot of ways to accomplish this, but I wanted to highlight a simple CLI approach I’ve been using:

#!/bin/bash

texcount "$1" -total | awk '{print $NF}' | tail -n 8 | sed ':a;/[0-9]$/{N;s/\n/+/;ba}' | sed 's/$/0/g' | qalc | head -n 3 | tail -n 1 | awk '{print $NF}'

Let’s look at what this does, shall we?

texcount is the main character here. Its typical output is a bit long-winded for my tastes:

File: latexworkshop.tex
Encoding: utf8
Words in text: 210
Words in headers: 3
Words outside text (captions, etc.): 0
Number of headers: 1
Number of floats/tables/figures: 0
Number of math inlines: 0
Number of math displayed: 0
Subcounts:
  text+headers+captions (#headers/#floats/#inlines/#displayed)
  1+0+0 (0/0/0/0) _top_
  209+3+0 (1/0/0/0) Section: LaTeX workshop proposal}\label{latex-workshop-proposal}

This is a script, so $1 holds the first command-line argument (the .tex file to count); #!/bin/bash is the hashbang, which tells the system which interpreter should run the script.
I want my output to only have the number of words, so let’s use -total with texcount.
awk '{print $NF}' prints the last field of each line, which for the count lines is the number.
tail -n 8 gets the last 8 lines – skipping the filename.
sed ':a;/[0-9]$/{N;s/\n/+/;ba}' puts all the numbers on the same line, with a + between each pair of numbers.
sed 's/$/0/g' appends a 0 to the end (to take care of the trailing +).
qalc is my preferred CLI calculator. We feed the expression built in the previous step into qalc and get the sum.
head -n 3 and tail -n 1 isolate the line with the sum. Come to think of it, you could probably grep for the = sign instead.
Finally, awk '{print $NF}' gets the last word – which is the total wordcount.
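
To make the middle of the pipeline easier to follow, here is a small sketch you can run without texcount or qalc installed. The first function is the sed stage from the script above, fed with the numbers from the sample output. The second, sum_last_fields, is not part of my script: it is a hypothetical alternative that lets awk do the addition itself, on the assumption that every line ending in a plain integer should be counted. Both function names are made up for this illustration, and the sed loop assumes GNU sed.

```shell
#!/bin/bash

# The join stage from the script: turn one-number-per-line input into a
# single expression, then append 0 to absorb the trailing +.
# Note: this relies on the input ending with a line that does not end in
# a digit (here a blank line); otherwise the appended 0 would be glued
# onto the last number. Assumes GNU sed for the one-line loop syntax.
join_with_plus() {
    sed ':a;/[0-9]$/{N;s/\n/+/;ba}' | sed 's/$/0/'
}

# Hypothetical alternative (not in the original script): add up every
# line whose last field is a plain integer, so no calculator is needed.
sum_last_fields() {
    awk '$NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}

# The seven counts from the sample output, followed by a blank line.
printf '210\n3\n0\n1\n0\n0\n0\n\n' | join_with_plus
# prints 210+3+0+1+0+0+0+0

# Sample input modelled on the texcount output shown earlier.
sample='File: latexworkshop.tex
Encoding: utf8
Words in text: 210
Words in headers: 3
Words outside text (captions, etc.): 0
Number of headers: 1
Number of floats/tables/figures: 0
Number of math inlines: 0
Number of math displayed: 0
'
printf '%s\n' "$sample" | sum_last_fields
# prints 214
```

If your texcount output has the same shape as the sample, something like texcount "$1" -total | sum_last_fields should give the same total as the longer pipeline, but I have only checked it against the sample text above.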