There are two packages to install: the engine itself and the training data for a language.
Linux
Tesseract is available directly from many Linux distributions. The package is generally called ‘tesseract’ or ‘tesseract-ocr’ – search your distribution’s repositories to find it. For example, on a recent Ubuntu or Debian system, running
sudo apt-get install tesseract-ocr
will install the program.
Packages are also generally available for language training data (search the repositories). If there is no package for your language, download the appropriate training data from code.google.com/p/tesseract-ocr/downloads/list, unpack it, and copy the .traineddata file into the ‘tessdata’ directory, probably /usr/share/tesseract-ocr/tessdata or /usr/share/tessdata, depending on your distribution.
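The manual copy step can be scripted. The following is a minimal Python sketch, not part of Tesseract itself: the function names find_tessdata_dir and install_traineddata are hypothetical, and the candidate directories are only the two mentioned above, so other distributions may need a different list. Writing into /usr/share normally requires root privileges.

```python
import shutil
from pathlib import Path

# Candidate locations taken from the text above; other distributions may differ.
DEFAULT_TESSDATA_DIRS = (
    "/usr/share/tesseract-ocr/tessdata",
    "/usr/share/tessdata",
)


def find_tessdata_dir(candidates=DEFAULT_TESSDATA_DIRS):
    """Return the first candidate that exists as a directory, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def install_traineddata(traineddata_file, tessdata_dir):
    """Copy one .traineddata file into tessdata_dir and return the new path."""
    source = Path(traineddata_file)
    if source.suffix != ".traineddata":
        raise ValueError(f"not a .traineddata file: {source}")
    target = Path(tessdata_dir) / source.name
    shutil.copy2(source, target)
    return target


# Example use (run with sufficient privileges; the file name is illustrative):
#   tessdata = find_tessdata_dir()
#   if tessdata is not None:
#       install_traineddata("deu.traineddata", tessdata)
```

If find_tessdata_dir returns None, the directory is somewhere else on your system and has to be passed in explicitly.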
If Tesseract isn’t available for your distribution, or you want to use a newer version than is available, you can compile your own (see http://code.google.com/p/tesseract-ocr/wiki/Compiling).
Note that older versions of Tesseract only supported TIFF files as input, and their language training data format is incompatible with the one used in 3.0.x.
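To check which side of that format change an installed copy falls on, you can inspect the version text it reports. The Python sketch below is an illustration under assumptions: it expects text of the form "tesseract 3.02.02" (as printed by tesseract --version on the releases we know of), it takes "older" to mean a major version below 3, and the function names are hypothetical.

```python
import re


def parse_tesseract_version(version_output):
    """Extract (major, minor) from text such as 'tesseract 3.02.02'.

    Returns None if no version number is found. The expected layout of the
    text is an assumption and may not hold for every build.
    """
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def predates_3x_format(version_output):
    """True if the reported major version is below 3 (assumed cut-off)."""
    version = parse_tesseract_version(version_output)
    return version is not None and version[0] < 3


# Example use. Stdout and stderr are combined because the stream the
# version text is written to may differ between releases (assumption):
#   import subprocess
#   result = subprocess.run(["tesseract", "--version"],
#                           capture_output=True, text=True)
#   print(predates_3x_format(result.stdout + result.stderr))
```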
Mac OS X
The easiest way to install Tesseract is through Homebrew (http://brew.sh). Once Homebrew is installed, you can install Tesseract by running
brew install tesseract
If you want to use language training data not included with the Homebrew package, download the appropriate training data, open it with Finder, and copy the .traineddata file into the /usr/local/Cellar/tesseract/<version>/share/tessdata directory.
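Because the <version> part of that path depends on what is installed, it can be easier to search for the directory than to type it. A minimal Python sketch follows, assuming the Cellar layout given above. The function name homebrew_tessdata_dirs is hypothetical, and the default cellar_root is only the location named in this section, so a Homebrew installed under a different prefix needs a different argument.

```python
from pathlib import Path


def homebrew_tessdata_dirs(cellar_root="/usr/local/Cellar"):
    """List the share/tessdata directory of every installed Tesseract version.

    The layout <cellar_root>/tesseract/<version>/share/tessdata is taken
    from the text above; the result is sorted by path.
    """
    pattern = "tesseract/*/share/tessdata"
    return sorted(p for p in Path(cellar_root).glob(pattern) if p.is_dir())


# Example use:
#   for directory in homebrew_tessdata_dirs():
#       print(directory)
```

If more than one version is installed, the list has several entries and you need to pick the one that matches the tesseract binary you actually run.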
Windows
An installer is available for Windows from our download page. This includes the English training data.
If you want to use another language, download the appropriate training data, unpack it using 7-zip (http://www.7-zip.org/), and copy the .traineddata file into the ‘tessdata’ directory, probably
C:\Program Files\Tesseract OCR\tessdata.
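If Python is available, the unpack-and-copy step can also be done without 7-zip. The sketch below assumes the downloaded training data is a tar archive (for example a .tar.gz file); that format is an assumption and is not stated above. The function name extract_traineddata is hypothetical. It copies only the .traineddata members and ignores the directory structure stored in the archive.

```python
import shutil
import tarfile
from pathlib import Path


def extract_traineddata(archive_path, tessdata_dir):
    """Copy every *.traineddata member of a tar archive into tessdata_dir.

    Returns the list of files written. Assumes a tar archive; the
    compression (gzip, bzip2, none) is detected by tarfile itself.
    """
    destination = Path(tessdata_dir)
    written = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            name = Path(member.name).name
            if not (member.isfile() and name.endswith(".traineddata")):
                continue
            # Write under the bare file name so that paths stored in the
            # archive cannot place files outside the destination directory.
            target = destination / name
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            written.append(target)
    return written


# Example use (archive name is illustrative; the directory is the one
# suggested above and may differ on your system):
#   extract_traineddata("training-data.tar.gz",
#                       r"C:\Program Files\Tesseract OCR\tessdata")
```

Writing into a directory under Program Files usually requires running the script with administrator rights.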
Tesseract may work on more exotic platforms too. You can either try compiling it yourself, or take a look at the list of other projects using Tesseract.
Tesseract has a small footprint and will run on most recent hardware, even on mobile devices.
Most relevant documentation can be found at the project website, http://code.google.com/p/tesseract-ocr/.