Automatically creating an index in LaTeX

When I wrote my doctoral thesis I decided that I wanted an index. I eventually managed to create one with the help of a Tcl script that I wrote. You can see the final result in the picture below.

Example of index created with my script lixtcl.tcl.

Why write my own script?

Although I’d known for a long time that I wanted an index, I decided not to start indexing until my text was finished. Once I started on the actual index, I realised that manually inserting \index tags would be overwhelming: I wanted to index all occurrences of 350 words and phrases in about 100 pages of text. Some sort of automated solution was needed. After searching around for a while, I found some scripts and programs for the job, but none served my needs completely.

What I really wanted was something that integrates with my editor, TeXnicCenter. In the end, though, I wrote a Tcl script to do the job. I’m now making the script available under the GPL in the hope that other people may find it useful.

Please note that I only developed the script far enough to serve my own purpose with a minimum of programming work. If you make further improvements, please send the code back to me so that I can publish an updated version here.

How to use the script

In order to use the script you must:

  1. List the LaTeX files you want to index in a file called indexfiles.txt.
  2. Prepare a file called indexwords.txt that contains all the words you want to include in the index.
  3. Back up all your source files before you run the script.
  4. Run the script.
  5. Copy the indexed files from the output directory to the original location.
  6. Recompile your LaTeX document with makeindex enabled.

I’ll describe these steps in more detail below.

1. Listing the files to index

Create a file named indexfiles.txt in the working directory. It should contain a list of all files to be indexed, like this:

Ch1_Intro.tex
Ch2_Steel_production.tex
Ch3_Strategy.tex
Ch4_Framework.tex
Ch5_Models.tex
Ch6_Results.tex
Ch7_Discussion.tex
App_Reheating.tex

2. Listing the words to index

Then create another file called indexwords.txt. This is where you should list the words that you want to go into your index.

List one index entry per line. The script supports three types of entries:

Simple entries: a word or phrase, written as it appears in the text, to include in the index.

Sub-entries, written as "phrase -> root!sub": every occurrence of the phrase is listed as a sub-entry under the root entry.

Cross-references, written as "word | target": the index shows "word, see target" without listing any page numbers.

Here’s an example:

backlog adjustment -> backlog!adjustment
desired backlog -> backlog!desired
realised strategy -> strategy!realised
charging rate
charging temperature
cluster
TPS | Toyota production system
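
Judging from the source code further down, these entries should end up in the indexed files roughly as shown here. The sentences are made up for illustration; only the \index tags reflect what the script inserts. Note that a cross-reference appears to be inserted at the first occurrence only.

```latex
% Sub-entry: "backlog adjustment -> backlog!adjustment"
The \index{backlog!adjustment}backlog adjustment is made weekly.

% Simple entry: "charging rate"
A higher \index{charging rate}charging rate shortens the cycle.

% Cross-reference: "TPS | Toyota production system"
\index{TPS|see{Toyota production system}}TPS is discussed later.
```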

3. Back up your files

Remember to back up your LaTeX source files before running the script. I cannot stress this enough! If you don’t, don’t blame me if something goes wrong. Actually, don’t blame me in any case: there are no guarantees.

4. Running the script

Note that I’m running on Windows. If you use another OS, you may have to work out the details yourself. It shouldn’t be a problem, though, since the script is written in Tcl, which is a cross-platform scripting language.

If you don’t already have Tcl installed, download and install ActiveTcl from ActiveState. You may also want to consult the Tcl home page at www.tcl.tk.

Next, run the script. On Windows, the best way to do this is from a command prompt, as shown in the picture below. Assuming the Tcl interpreter is on your path as tclsh, change to the directory that holds indexfiles.txt and indexwords.txt and type tclsh lixtcl.tcl. Running the script may take a minute or two (I didn’t make any optimisations). You’ll see a message each time the script starts to process another file listed in indexfiles.txt.

Running the script from a command prompt on Windows.

The script is finished when control is returned to the command prompt.

5. Copy files

The original files should be unaltered, and the resulting indexed files should be found in a subdirectory named ./out. Enter the output directory and take a look to make sure that there is a copy of each file in there.

You can now copy the processed files from the output directory to your original LaTeX project directory. Keep your backups until you’ve seen that your project recompiles as it should.

6. Recompile your LaTeX project

Recompile your project (with makeindex). You will probably have to do this more than once, because makeindex builds the index from the .idx file written by the previous LaTeX run. Hopefully everything compiles as it should. You will then want to review your output DVI, PS or PDF, depending on your configuration, to make sure everything looks fine. If things worked as they should, you will now have an indexed document.
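
If your main file is not yet set up for an index, a minimal sketch could look like the one below. The file name thesis.tex and the report class are assumptions for illustration; the makeidx package, \makeindex and \printindex are the standard LaTeX indexing tools.

```latex
% thesis.tex (hypothetical main file)
\documentclass{report}
\usepackage{makeidx}   % provides \printindex
\makeindex             % write the \index entries to thesis.idx

\begin{document}
\include{Ch1_Intro}    % one of the indexed files copied back from ./out
\printindex            % typeset the index here
\end{document}

% Typical sequence from a command prompt:
%   latex thesis
%   makeindex thesis
%   latex thesis
```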

Cleaning up your files

After you’ve executed the script, your files may be so cluttered with \index entries that you find it difficult to continue to read and edit your document source. Instead of working in the indexed files, I suggest that you clean out all \index entries with the script lixtclean.tcl below.

Just like when you generated the index, the cleaned up files will be created in the output directory called ./out. I don’t need to remind you to make sure to keep a backup of all files, do I?

Feedback?

Did you try my script? Please use the comment field below and tell me how it went.

Source code for lixtcl.tcl

# LiXTcl.tcl
# LaTeX indexer in Tcl
# Copyright 2009 Joakim Storck
# E-mail: joasto@gmail.com
#
#   This program is free software: you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation, either version 3 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
set flf [open "indexfiles.txt" r]
set filelist [read $flf]
close $flf
set wdf [open "indexwords.txt" r]
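# Read the index entries, one per line, skipping blank lines
# (an empty pattern would otherwise match at every position).
set wordlist [list]
foreach line [split [read $wdf] \n] {
    if {[string trim $line] ne ""} {
        lappend wordlist $line
    }
}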
close $wdf
 
file mkdir out
 
set noindex [list caption chapter cite includegraphics index item label pageref ref section subsection]
 
set indexbefore [list gls glsfirst glsfirstplural glslink glsplural]
 
proc addidx {src word outvar} {
    upvar $outvar target
    global _see
    if {[regexp {([[:print:]]+)(?: *-> *)([[:print:]]+)} $word match first second]} {
        set first [string trim $first]
        set second [string trim $second]
        regsub -all $first $src \\index{$second}& target
    } elseif {[regexp {([[:print:]]+)(?: *\| *)([[:print:]]+)} $word match first second]} {
        set first [string trim $first]
        set second [string trim $second]
        if { [catch {set _see($first)}] } {
            if {[regsub $first $src \\index{$first\|see{$second}}& target]} {
                set _see($first) 1
            }
        } else {
            set target $src
        }
    } else {
        regsub -all $word $src \\index{&}& target
    }
}
 
proc addidxbefore {macrosrc src word outvar} {
    upvar $outvar target
    global _see
    set target $macrosrc
    if {[regexp {([[:print:]]+)(?: *-> *)([[:print:]]+)} $word match first second]} {
        set first [string trim $first]
        set second [string trim $second]
        if {[string first $first $src]>=0} {
            set target \\index{$second}$macrosrc
        }
    } elseif {[regexp {([[:print:]]+)(?: *\| *)([[:print:]]+)} $word match first second]} {
        # Cross references ("x, see y")
        set first [string trim $first]
        set second [string trim $second]
        if {[string first $first $src]>=0} {
            if { [catch {set _see($first)}] } {
                set target \\index{$first\|see{$second}}$macrosrc
                set _see($first) 1
            } else {
                set target $macrosrc
            }
        }
    } elseif {[string first $word $src]>=0} {
        set target \\index{$word}$macrosrc
    }
}
 
proc matchparen {srcvar from to} {
    upvar $srcvar src
    upvar $to where
    set level 0
    set where $from
    while {true} {
        set leftmatch [regexp -indices -start $where -- {\{} $src nxtleft]
        set rightmatch [regexp -indices -start $where -- {\}} $src nxtright]
        set nxtleft [lindex $nxtleft 0]
        set nxtright [lindex $nxtright 0]
        if {$leftmatch && $nxtleft < $nxtright} {
            incr level
            set where [expr $nxtleft+1]
        } else {
            incr level -1
            set where [expr $nxtright+1]
        }
        if {$level <= 0} {
            incr where -1
            break
        }
    }
}
 
proc nextmacro {src where macroname} {
    upvar $macroname keywd
    set keywd ""
    # Both regexps use the same pattern so that the keyword and the
    # match position always refer to the same macro.
    lassign [regexp -inline -nocase -start $where -- \
        {\\([a-z]+?)(\[([[:print:]]*?)\])?([\{[:space:]])} $src] match keywd param paramval leftpar
    regexp -indices -nocase -start $where -- {\\([a-z]+?)(\[([[:print:]]*?)\])?([\{[:space:]])} $src where
    set xpbegin [lindex $where 0]
    set parbegin [lindex $where end]
    if {[string equal $leftpar "\{"]} {
        matchparen src $parbegin parend
    } else {
        set parend $parbegin
    }
    return [list $xpbegin $parbegin $parend]
}
 
foreach fnam $filelist {
    puts "Processing file $fnam"
    set sf [open $fnam r]
    set src [read $sf]
    close $sf
    foreach word $wordlist {
        set result ""
        set where 0
        while {true} {
            lassign [nextmacro $src $where keywd] macrobegin parbegin parend
            if {[string length $keywd]>0} {
                addidx [string range $src $where [expr $macrobegin-1]] $word outstr
                append result $outstr
                set where $macrobegin
                if {[lsearch -exact $noindex [string tolower $keywd]] >= 0} {
                    # Do not index inside this macro!
                    # Output source until end of macro
                    # and forward search point to there.
                    append result "[string range $src $where $parend]"
                    set where [expr $parend+1]
                } elseif {[lsearch -exact $indexbefore [string tolower $keywd]] >= 0} {
                    # Put index outside (before) this macro!
                    append result "[string range $src $where $macrobegin-1]"
                    set macrosrc [string range $src $macrobegin $parend]
                    set parsrc [string range $src $parbegin $parend]
                    addidxbefore $macrosrc $parsrc $word outstr
                    append result $outstr
                    set where [expr $parend+1]
                } else {
                    # Index inside this macro.
                    # Forward to inside left parenthesis
                    addidx [string range $src $where $parbegin] $word outstr
                    append result $outstr
                    set where [expr $parbegin+1]
                }
                continue
            }
            append result [string range $src $where end]
            break
        }
        set src $result
    }
    set of [open "out/$fnam" w]
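    # Write $src (equal to $result after the last word, and still defined
    # if the word list is empty) without adding an extra trailing newline.
    puts -nonewline $of $src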
    close $of
}

Source code for lixtclean.tcl

# LiXTClean.tcl
# Script to automatically clean out \index entries from LaTeX source
# Copyright 2009 Joakim Storck
# E-mail: joasto@gmail.com
#
#   This program is free software: you can redistribute it and/or modify
#   it under the terms of the GNU General Public License as published by
#   the Free Software Foundation, either version 3 of the License, or
#   (at your option) any later version.
#
#   This program is distributed in the hope that it will be useful,
#   but WITHOUT ANY WARRANTY; without even the implied warranty of
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#   GNU General Public License for more details.
#
#   You should have received a copy of the GNU General Public License
#   along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
set flf [open "indexfiles.txt" r]
set filelist [read $flf]
close $flf
 
file mkdir out
 
proc cleanidx {src outvar} {
    upvar $outvar target
    regsub -all {\|see\{([^\}]+)\}} $src "" src
    regsub -all {\\index\{([^\}]+)\}} $src "" target
}
 
foreach fnam $filelist {
    puts "Processing file $fnam"
    set sf [open $fnam r]
    set src [read $sf]
    close $sf
    cleanidx $src result
    set of [open "out/$fnam" w]
    # Write the cleaned text without adding an extra trailing newline.
    puts -nonewline $of $result
    close $of
}