Installation

There are two packages to install: the engine itself and the training data for a language.

Linux

Tesseract is available directly from many Linux distributions. The package is generally called ‘tesseract’ or ‘tesseract-ocr’ – search your distribution’s repositories to find it. For example, on a recent Ubuntu or Debian system, running


sudo apt-get install tesseract-ocr

will install the program.
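
After installing, it can be useful to confirm that the binary is actually on your PATH. The following is a minimal sketch using only the standard shell built-in command -v; the helper name check_installed is hypothetical and not part of Tesseract, and the executable name ‘tesseract’ is assumed from the package described above.

```shell
# Report whether a command is available on the current PATH.
# check_installed is a hypothetical helper name, not a Tesseract tool.
check_installed() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Example usage (assumes the executable is named 'tesseract'):
#   check_installed tesseract
```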

Packages are also generally available for language training data (search the repositories), but if not, you will need to download the appropriate training data from code.google.com/p/tesseract-ocr/downloads/list, unpack it, and copy the .traineddata file into the ‘tessdata’ directory, which is probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata, depending on your distribution.
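
The manual copy step can be sketched roughly as follows. This is an illustration under stated assumptions, not an official procedure: the helper names find_tessdata_dir and install_traineddata are hypothetical, the candidate directories are the two mentioned above, and eng.traineddata stands in for whichever file you unpacked. Writing to a system directory will usually require root privileges.

```shell
# Print the first directory from the argument list that exists.
# find_tessdata_dir is a hypothetical helper, not a Tesseract command.
find_tessdata_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Copy a .traineddata file into an existing tessdata directory.
# install_traineddata is likewise a hypothetical helper.
install_traineddata() {
    src=$1
    dest=$2
    case "$src" in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $src" >&2; return 1 ;;
    esac
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    cp "$src" "$dest/"
}

# Example usage (run with sufficient privileges for system directories):
#   tessdata=$(find_tessdata_dir /usr/share/tesseract-ocr/tessdata /usr/share/tessdata)
#   install_traineddata eng.traineddata "$tessdata"
```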

If Tesseract isn’t available for your distribution, or you want to use a newer version than is available, you can compile your own (see http://code.google.com/p/tesseract-ocr/wiki/Compiling).

Note that older versions of Tesseract supported only TIFF files, and their language training data format is incompatible with the one used in 3.0.x.

Mac OS X

The easiest way to install Tesseract is through Homebrew (http://brew.sh). Once Homebrew is installed, you can install Tesseract by running the command brew install tesseract.

If you want to use language training data not included with the Homebrew package, download the appropriate training data, open it with Finder, and copy the .traineddata file into the /usr/local/Cellar/tesseract/<version>/share/tessdata directory.
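
Because the <version> component depends on which release is installed, one way to find the actual directory is to let the shell expand it. The sketch below assumes the Cellar layout given above; list_cellar_tessdata is a hypothetical helper name, and the default root is taken from the path in the previous paragraph.

```shell
# List the tessdata directory of every installed version under a Cellar root.
# list_cellar_tessdata is a hypothetical helper; the default root is an
# assumption based on the path quoted in the text.
list_cellar_tessdata() {
    root=${1:-/usr/local/Cellar/tesseract}
    found=1
    for dir in "$root"/*/share/tessdata; do
        # An unmatched glob stays literal, so check that the path exists.
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            found=0
        fi
    done
    return "$found"
}

# Example usage:
#   list_cellar_tessdata
```

If more than one version is installed, the function prints one line per version, and you can pick the one you want to copy the .traineddata file into.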

Windows

An installer is available for Windows from our download page. This includes the English training data.

If you want to use another language, download the appropriate training data, unpack it using 7-Zip (http://www.7-zip.org/), and copy the .traineddata file into the ‘tessdata’ directory, probably


C:\Program Files\Tesseract OCR\tessdata.

Other Platforms

Tesseract may work on more exotic platforms too. You can either try compiling it yourself, or take a look at the list of other projects using Tesseract.

System requirements

Tesseract has a small footprint and will run on most recent hardware, even on mobile devices.

Documentation

Most relevant documentation can be found at the project website, http://code.google.com/p/tesseract-ocr/.