Automatically creating an index in LaTeX

When I wrote my doctoral thesis, I decided that I wanted an index. I eventually managed to create one with the help of a Tcl script that I wrote. You can see the final result in the picture below.

Example of index created with my script lixtcl.tcl.

Why write my own script?

Although I’d known for a long time that I wanted an index, I decided not to start indexing until my text was finished. Once I started working on the actual index, I realised that manually inserting \index tags would be overwhelming: I wanted to index all occurrences of 350 words and phrases in about 100 pages of text. Some sort of automated solution was needed. After searching around for a while, I found some scripts and programs to do the job, but none served my needs completely.

What I really wanted was a script that integrates with my editor, TeXnicCenter. In the end, though, I decided to write a Tcl script to do the job. I’m now making the script available under the GPL in the hope that other people may find it useful.

Please note that I only developed the script to the level where it served my purpose with a minimum of programming work. If you make further improvements, please send the code back to me so that I can publish an updated version here.

How to use the script

In order to use the script you must:

  1. List the LaTeX files you want to index in a file called indexfiles.txt.
  2. Prepare a file called indexwords.txt that contains all the words you want to include in the index.
  3. Back up all your source files before you execute the script.
  4. Run the script.
  5. Copy the indexed files from the output directory to the original location.
  6. Recompile your LaTeX document with makeindex enabled.

I’ll describe these steps in more detail below.

1. Listing the files to index

Create a file named indexfiles.txt in the working directory. It should contain a list of all files to be indexed, one file name per line, like this:

Ch1_Intro.tex
Ch2_Steel_production.tex
Ch3_Strategy.tex
Ch4_Framework.tex
Ch5_Models.tex
Ch6_Results.tex
Ch7_Discussion.tex
App_Reheating.tex
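For readers curious how such a list can be consumed, here is a minimal Python sketch. It is not the author's Tcl code: the helper name read_file_list is hypothetical, and ignoring blank lines and stray whitespace is an assumption about how forgiving a reader should be.

```python
from pathlib import Path


def read_file_list(list_path):
    """Return the file names listed in list_path, one per line.

    Assumption: blank lines and surrounding whitespace are ignored.
    The original Tcl script may be stricter than this.
    """
    text = Path(list_path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling read_file_list("indexfiles.txt") in the working directory would return the chapter file names in the order they are listed.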

2. Listing the words to index

Then create another file called indexwords.txt. This is where you list the words that you want to go into your index.

List one index entry per line. lixtcl.tcl supports three different types of entries:

Simple entries, typically a word or a phrase to include in the index.

Sub-entries, where an occurrence of a composite phrase is listed under a root entry. Write the phrase, then ->, then the root and the sub-entry separated by an exclamation mark (makeindex’s separator between index levels).

Cross-references, where an occurrence of an index entry refers to another entry without listing any page numbers. Write the entry, then |, then the entry it refers to.

Here’s an example with three sub-entries, three simple entries and one cross-reference:

backlog adjustment -> backlog!adjustment
desired backlog -> backlog!desired
realised strategy -> strategy!realised
charging rate
charging temperature
cluster
TPS | Toyota production system
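To make the three entry types concrete, the following Python sketch maps one line of indexwords.txt to the phrase to search for and the \index command to insert. It is an illustration only, not the author's Tcl code. The function name entry_to_index is hypothetical, and rendering a cross-reference with makeindex's |see{...} form is an assumption; the post does not say exactly what lixtcl.tcl emits.

```python
def entry_to_index(line):
    """Map one indexwords.txt line to (search phrase, index command)."""
    line = line.strip()
    if "->" in line:
        # Sub-entry: "phrase -> root!sub" becomes \index{root!sub}.
        phrase, key = (part.strip() for part in line.split("->", 1))
        return phrase, "\\index{" + key + "}"
    if "|" in line:
        # Cross-reference: "phrase | target".
        # Assumption: emitted as makeindex's "see" form, which prints
        # "see target" instead of a page number.
        phrase, target = (part.strip() for part in line.split("|", 1))
        return phrase, "\\index{" + phrase + "|see{" + target + "}}"
    # Simple entry: the phrase is its own index key.
    return line, "\\index{" + line + "}"
```

Under these assumptions, the line "desired backlog -> backlog!desired" yields the phrase "desired backlog" and the command \index{backlog!desired}, so the occurrence is listed as "desired" under "backlog".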

3. Back up your files

Remember to back up your LaTeX source files before running the script. I cannot stress this enough! If you don’t, don’t blame me if something goes wrong. Actually, don’t blame me in any case: there are no guarantees.

4. Running the script

Note that I’m running on Windows. If you use another OS, you may have to figure out how to run the script yourself. It shouldn’t be a problem, though, since the script is written in Tcl, which is a cross-platform scripting language.

If you don’t already have Tcl installed, download and install ActiveTcl from ActiveState. You may also want to consult the Tcl home page at www.tcl.tk.

Next, run the script. On Windows, the best way to do this is from a command prompt, as shown in the picture below. Running the script may take a minute or two (I didn’t make any optimisations, OK?). You’ll see a message each time the script starts to process another file listed in indexfiles.txt.

Running the script from a command prompt.

The script has finished when control returns to the command prompt.
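The per-file processing described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not a port of lixtcl.tcl: the names tag_text and process_files are hypothetical, matching is case-sensitive and limited to whole words, and the index command is placed directly after each occurrence.

```python
import re
from pathlib import Path


def tag_text(text, tags):
    """Append an index command after every whole-word occurrence of a phrase.

    tags maps a search phrase to the command to insert after it.
    """
    if not tags:
        return text
    # Longest phrases first, so a long phrase wins over a shorter one
    # that starts at the same position.
    phrases = sorted(tags, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(p) for p in phrases) + r")(?!\w)"
    )
    return pattern.sub(lambda m: m.group(1) + tags[m.group(1)], text)


def process_files(file_names, tags, out_dir="out"):
    """Write a tagged copy of each file into out_dir; originals stay untouched."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for name in file_names:
        print(f"Processing file {name}")
        source = Path(name).read_text(encoding="utf-8")
        tagged = tag_text(source, tags)
        (out / Path(name).name).write_text(tagged, encoding="utf-8")
```

The lookarounds treat punctuation as a word boundary, so "cluster," and "cluster." are tagged while "clusters" is not. One known limitation of this sketch: a phrase that already sits inside an existing \index{...} command is tagged again, so files with hand-written index entries would need extra care.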

5. Copy files

The original files should be unaltered, and the resulting indexed files should be in a subdirectory named ./out. Open the output directory and check that it contains a copy of each file.

You can now copy the processed files from the output directory to your original LaTeX project directory. Keep your backups until you’ve seen that your project recompiles as it should.
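If you prefer to automate the check and the copy, a small Python sketch could look like this. The function names missing_outputs and copy_back are hypothetical, and the sketch assumes the processed files in the output directory carry the same names as the originals.

```python
import shutil
from pathlib import Path


def missing_outputs(file_names, out_dir="out"):
    """Return the listed files that have no copy in out_dir."""
    out = Path(out_dir)
    return [name for name in file_names if not (out / Path(name).name).is_file()]


def copy_back(file_names, out_dir="out", project_dir="."):
    """Copy processed files over the originals, but only if none are missing."""
    missing = missing_outputs(file_names, out_dir)
    if missing:
        raise FileNotFoundError(
            "not found in " + str(out_dir) + ": " + ", ".join(missing)
        )
    for name in file_names:
        # This overwrites the original, so keep your backups.
        shutil.copy2(Path(out_dir) / Path(name).name, Path(project_dir) / name)
```

Refusing to copy anything when a file is missing is a deliberate choice here: it avoids a project that is half indexed and half not.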

6. Recompile your LaTeX project

Recompile your project (with makeindex). You will probably have to do this several times because of how makeindex works: one LaTeX run writes the raw index entries, makeindex sorts them, and a further LaTeX run typesets the sorted index. Hopefully everything compiles as it should. Then review your output DVI, PS or PDF, depending on your configuration, to make sure everything looks fine. If things worked as they should, you will now have an indexed document.
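If no index shows up at all, the cause is often the document setup rather than the script. The Python sketch below checks a source for the three commands a makeidx-based index needs and lists the usual run order. The function names are hypothetical, the check is a plain text search that does not understand TeX comments, and it assumes the makeidx package rather than alternatives such as imakeidx.

```python
import re

# Commands a makeidx-based index needs somewhere in the document source.
REQUIRED = {
    r"\usepackage{makeidx}": re.compile(
        r"\\usepackage(?:\[[^\]]*\])?\{[^}]*\bmakeidx\b[^}]*\}"
    ),
    r"\makeindex": re.compile(r"\\makeindex(?![A-Za-z])"),
    r"\printindex": re.compile(r"\\printindex(?![A-Za-z])"),
}


def missing_index_setup(tex_source):
    """Return the required commands that do not appear in tex_source."""
    return [name for name, pattern in REQUIRED.items()
            if not pattern.search(tex_source)]


def index_build_commands(main_name, engine="latex"):
    """Return the usual run order: LaTeX, makeindex, then LaTeX again."""
    return [
        [engine, main_name],
        ["makeindex", main_name + ".idx"],
        [engine, main_name],
    ]
```

Each inner list can be passed to subprocess.run(cmd, check=True) from the project directory. With a main file called thesis.tex (a made-up name), index_build_commands("thesis") gives latex thesis, makeindex thesis.idx, latex thesis.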

Cleaning up your files

After you’ve executed the script, your files may be so cluttered with \index entries that you find it difficult to continue reading and editing your document source. Instead of working in the indexed files, I suggest that you clean out all \index entries with the script lixtclean.tcl below.

Just as when you generated the index, the cleaned-up files will be created in the output directory, ./out. I don’t need to remind you to keep a backup of all files, do I?
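The idea behind the clean-up can be sketched in a few lines of Python. This is an illustration of the technique, not the contents of lixtclean.tcl: the function name strip_index_commands is hypothetical, and it counts braces so that nested forms such as \index{TPS|see{Toyota production system}} are removed whole.

```python
def strip_index_commands(text):
    """Remove every \\index{...} command, including nested braces."""
    marker = "\\index{"
    result = []
    pos = 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            result.append(text[pos:])
            break
        result.append(text[pos:start])
        # Walk forward until the brace opened by the marker is closed.
        depth = 1
        i = start + len(marker)
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth != 0:
            # Unbalanced braces: leave the rest of the text untouched.
            result.append(text[start:])
            break
        pos = i
    return "".join(result)
```

Note that this removes hand-written \index commands along with generated ones, and that escaped braces (\{ or \}) inside an index entry would confuse the brace count.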

Downloads

The source scripts are saved with a .txt file extension since my server-side security settings do not allow me to upload .tcl files. Download them and change the file extension from .txt to .tcl.

Feedback?

Did you try my script? Please use the comment field below and tell me how it went.

12 Comments to “Automatically creating an index in LaTeX”

  1. By fred, January 10, 2011 @ 12:53

    Great idea! But I had two errors. First, a close-brace was missing in your code. Second, I have an error:
    wrong # args: should be "foreach varList list ?varList list ...? command"
    while executing
    "foreach word $wordlist{"
    ("foreach" body line 6)
    invoked from within
    "foreach fnam $filelist {
    puts "Processing file $fnam"
    set sf [open $fnam r]
    set src [read $sf]
    close $sf
    foreach word $wordlist{
    ..."
    (file "lixtcl.tcl" line 121)

  2. By Joakim S., January 11, 2011 @ 16:44

    Hi Fred!

    I’ll try to look into the index script problem as soon as I can. Could you please mail me your indexwords.txt and your indexfiles.txt for debugging purposes? I’ve sent you an email; please reply to me with the files attached.

  3. By Joakim S., January 15, 2011 @ 22:42

    Problem fixed.

  4. By JOEL, June 12, 2011 @ 23:31

    Hi Joakim

    I’m Joel, an MEng student in Ireland. I’m trying to add an index to my thesis. I have gone through your instructions at “http://www.manufacturology.com/2010/03/automatically-creating-an-index-in-latex/” but in step 4 I can’t see the command prompt for running the script.
    Please help me.
    Thanking you
    Joel

  5. By Joakim S., June 12, 2011 @ 23:41

    Hi Joel!

    For some reason the screenshot of the command window disappeared. I’ve updated the post and inserted the picture again. I hope you get the script to work.

    /Joakim

  6. By Ramon, November 12, 2011 @ 17:39

    Hi,

    It seems a great tool, but I have some problems with Unicode text. I’m preparing a critical edition of Greek mathematical texts. I need an exhaustive index and I thought that this tool would be perfect for me. But I write in ancient Greek, and there seem to be problems when the file contains Unicode text. For example, I put this file:

    http://dl.dropbox.com/u/6690829/prova2.tex

    in indexfiles.txt, and these words in indexwords.txt (they are greek words)

    πρώτη
    ὅθεν

    δὲ
    ῾Η->ἡ!ὁ
    τὰ->τὰ!ὁ

    and the result file is:

    http://dl.dropbox.com/u/6690829/prova2result.tex

    and the pdf:

    http://dl.dropbox.com/u/6690829/prova2.pdf

    As you can see, some words don’t appear (πρώτη), and none of the words that are sub-entries of a root word appear.

    thanks

    Ramon

  7. By Joakim S., November 14, 2011 @ 23:41

    Hello Ramon!

    Thanks for your feedback. I’m not very familiar with Unicode but I’ll see if I can do something about the problem.

    Joakim

  8. By Anna, February 24, 2012 @ 16:00

    Hello,

    I was trying to create an index for my thesis and the script runs. However, it does not pick up words which have commas or dots after them, i.e. if I want to index, say, “Peter” and it appears as “Peter,” or “Peter.” it does not pick it up. I cannot fix it. I made sure there is no space after the actual entry.

    Please help

    Anna

  9. By Ed Fox, June 29, 2012 @ 01:35

    I’m finishing up a book and have been inserting \index{} entries, but finishing that work is going to be a much bigger job than I had planned. Does your program take each of a list of files, e.g., ch1.tex, and create a new version of ch1.tex that has lots of \index{} commands added, one for each occurrence of a word in the list of words to be indexed? I’m hoping I don’t have to remove all the \index{} entries already there … My main question is what goes into ./out, precisely. Many thanks, Ed (fox@vt.edu)

  10. By Marek, August 4, 2012 @ 18:42

    Thank you very much for this useful script!

    To display UTF-8 characters

    1. I changed [a-z] to [[:alpha:]]

    2. added: encoding system utf-8

    To make it work make sure to encode indexwords.txt with UTF-8.

    Thank you again!

  11. By Joakim S., August 6, 2012 @ 00:39

    Thanks Marek, this may solve some of the problems reported by others.
    Unfortunately I’m not currently using LaTeX, so I have little opportunity to make corrections and updates to the code myself.

  12. By ramon, August 7, 2012 @ 23:09

    Thank you Marek for updating the script to display UTF-8 characters

    Ramon
