I like using `wc` to keep track of how much I’ve written. However, `wc` is not the best choice for counting the words in a LaTeX document, because it also counts markup such as `\begin{itemize}` and `\item` as words. It’s better to use something like `texcount`.
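Here’s a quick way to see the problem. The sample below is a made-up three-line LaTeX fragment (the `sample_tex` helper is just for illustration) in which only “First point” is actual prose. `wc -w` counts every whitespace-separated token, so the markup gets counted too:

```bash
#!/bin/bash
# Hypothetical LaTeX fragment: only "First point" is prose.
sample_tex() {
  printf '%s\n' '\begin{itemize}' '\item First point' '\end{itemize}'
}

# wc -w counts every whitespace-separated token, markup included.
# tr strips the padding that some wc implementations add.
sample_tex | wc -w | tr -d ' '
```

That prints 5 for a fragment containing 2 words of text.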
There are a lot of ways to accomplish this, but I wanted to highlight a simple CLI approach I’ve been using:
```bash
#!/bin/bash
# Print the total word count of the .tex file given as the first argument.
texcount "$1" -total | awk '{print $NF}' | tail -n 8 | sed ':a;/[0-9]$/{N;s/\n/+/;ba}' | sed 's/$/0/g' | qalc | head -n 3 | tail -n 1 | awk '{print $NF}'
```
Let’s look at what this does, shall we?
`texcount` is the main character here. Its typical output is a bit long-winded for my tastes:
```
File: latexworkshop.tex
Encoding: utf8
Words in text: 210
Words in headers: 3
Words outside text (captions, etc.): 0
Number of headers: 1
Number of floats/tables/figures: 0
Number of math inlines: 0
Number of math displayed: 0
Subcounts:
text+headers+captions (#headers/#floats/#inlines/#displayed)
1+0+0 (0/0/0/0) _top_
209+3+0 (1/0/0/0) Section: LaTeX workshop proposal}\label{latex-workshop-proposal}
```
- This is a script, so `$1` is its first command-line argument: the `.tex` file to count. It should be quoted as `"$1"`; written as `{$1}`, the braces end up in the filename that `texcount` receives. `#!/bin/bash` is the hashbang, which tells the system which program to use to execute the script.
- I want the output to contain only the number of words, so I pass `-total` to `texcount`.
- `awk '{print $NF}'` gets the last word on each line, which is a number.
- `tail -n 8` keeps the last 8 lines, skipping the filename.
- `sed ':a;/[0-9]$/{N;s/\n/+/;ba}'` puts all the numbers on the same line, with a `+` between each pair of numbers.
- `sed 's/$/0/g'` appends a `0` to the end, to take care of the trailing `+`.
- `qalc` is my preferred CLI calculator. We feed it the expression built in the previous step and get the sum.
- `head -n 3` and `tail -n 1` isolate the line with the sum. Come to think of it, you could probably `grep` for the `=` sign instead.
- Finally, `awk '{print $NF}'` gets the last word, which is the total word count.
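Why does the quoting around `$1` matter? Bash only treats braces as brace expansion when they contain a comma or a `..` range, so `{$1}` keeps its braces; left unquoted, the expansion is also split at spaces. Here’s a small sketch with a hypothetical `show_args` helper that prints each argument it receives in angle brackets:

```bash
#!/bin/bash
# Hypothetical helper: print each argument on its own line, in <...>,
# so the exact value the command would receive is visible.
show_args() { printf '<%s>\n' "$@"; }

demo() {
  show_args {$1}   # braces stay literal, and the value is split at spaces
  show_args "$1"   # the first argument, passed through unchanged
}

demo 'my thesis.tex'
```

This prints `<{my>`, `<thesis.tex}>`, and then `<my thesis.tex>`: only the quoted form hands the filename over intact.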
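The `sed` loop is the least obvious step, so here it is on its own (wrapped in a hypothetical `join_with_plus` function), fed the seven counts from the sample output above as a stand-in for what `awk` and `tail` would pass along. I’m assuming the eighth line kept by `tail -n 8` is empty, since that’s what would leave the trailing `+` that the second `sed` patches up. The one-line `:a;…` label syntax is a GNU sed feature; BSD/macOS sed may reject it.

```bash
#!/bin/bash
# While the line ends in a digit: append the next line (N),
# turn the newline into "+", and loop back to :a. Then append "0".
join_with_plus() {
  sed ':a;/[0-9]$/{N;s/\n/+/;ba}' | sed 's/$/0/g'
}

# Seven counts plus an empty last line (assumed), giving a trailing "+".
printf '%s\n' 210 3 0 1 0 0 0 '' | join_with_plus
# → 210+3+0+1+0+0+0+0

# Without the empty line there is no trailing "+" to fill in.
printf '%s\n' 1 2 5 | join_with_plus
# → 1+2+50
```

The second call shows why that blank line matters: when `N` runs out of input, GNU sed prints what it has and stops, so there is no trailing `+`, and the appended `0` gets glued onto the last number.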
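Since `awk` is already in the pipeline, it could also do the adding, replacing the `sed`, `qalc`, `head`, and `tail` steps. This is a swapped-in alternative, not what the script above does. The input below is the seven count lines copied from the sample output; I’m assuming `texcount -total` prints the same lines. If those are the lines `tail -n 8` keeps, summing every one of them also adds the number of headers, floats, and math elements, so the total overshoots the word count slightly. Matching only the lines that start with `Words` avoids that:

```bash
#!/bin/bash
# Stand-in for `texcount -total` output: the count lines from the
# sample output (assumed to match what -total actually prints).
sample_output() {
  cat <<'EOF'
Words in text: 210
Words in headers: 3
Words outside text (captions, etc.): 0
Number of headers: 1
Number of floats/tables/figures: 0
Number of math inlines: 0
Number of math displayed: 0
EOF
}

# Add up the last field of every line, like the sed + qalc steps do.
sum_all() { awk '{ s += $NF } END { print s }'; }

# Add up only the "Words ..." lines, skipping header/float/math counts.
sum_words() { awk '/^Words/ { s += $NF } END { print s }'; }

sample_output | sum_all    # → 214
sample_output | sum_words  # → 213
```

In the script, that would become something like `texcount "$1" -total | awk '/^Words/ { s += $NF } END { print s }'`, with no calculator needed.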