Installation

Installation package

The installation package consists of a zip file. The unpacked archive should contain two components: a PHP directory and a Perl directory. How to deploy them is described in the ‘PHP/Perl’ section.

Apache

A web server is needed to handle PHP requests. Apache is by no means the only suitable web server, but it is very widely used, and the tool was developed and tested on it.

Apache settings

1. To ensure the tool will process diacritics and special characters properly, the right character set has to be set.

With Apache this can be done by setting the ‘AddDefaultCharset’ directive in the server’s httpd.conf file:

AddDefaultCharset UTF-8

2. Sometimes POST requests can be very large because of the amount of data being sent to and from the server. It might therefore be necessary to raise the limit the web server imposes on such requests.

This can be done by setting the ‘LimitRequestLine’ directive, which limits the length of the HTTP request line (the size of a request body is governed by the separate ‘LimitRequestBody’ directive), in the server’s httpd.conf file:

LimitRequestLine 100000

3. With Apache, the web server must be restarted for these changes to take effect.
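
As a quick sanity check after editing httpd.conf, the two directives above can be looked up with a small script. The following Python sketch is only an illustration: the function name find_directives and the sample text are hypothetical, and the parser is deliberately naive (it does not follow Include files or take <VirtualHost> scoping into account).

```python
def find_directives(conf_text, names):
    """Return {directive: value} for the last uncommented occurrence of each name."""
    # Apache directive names are case-insensitive, so compare in lower case.
    wanted = {name.lower(): name for name in names}
    found = {}
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key in wanted and len(parts) == 2:
            found[wanted[key]] = parts[1].strip()
    return found


# Hypothetical httpd.conf excerpt, used only for demonstration.
sample = """
# LimitRequestLine 8190
AddDefaultCharset UTF-8
LimitRequestLine 100000
"""

settings = find_directives(sample, ["AddDefaultCharset", "LimitRequestLine"])
print(settings)
```

To check a real installation, the text of the actual httpd.conf file (whose location depends on the platform) would be read and passed in instead of the sample string.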

PHP/Perl

As noted above, a webserver should be running that can handle PHP requests. The source code of the tool should be placed in a directory for which PHP support is turned on.
The tool also needs to run some Perl code, so the webserver needs to support that as well.

The tool was tested on PHP 5.1.6 running on Red Hat Enterprise Linux Server release 5.2. It was running Perl v5.10.1 (*) built for x86_64-linux-gnu-thread-multi.

Source code

The source code is usually delivered in a zip file. The unpacked archive should contain two components: a PHP directory and a Perl directory.

The PHP directory contains the main code. Copy the whole directory to a location at which PHP support is turned on. This is the location users will have to go to if they want to work with CoBaLT.

The Perl directory contains the code of the tokenizer. Copy the whole directory to a location at which Perl can be run, and make sure that the PHP code can access this Perl directory (as the PHP application will need to run the Perl tokenizer from the command line).
NOTE that later on, you will have to declare the location of the tokenizer in the CoBaLT configuration, in such a way that the tool can find and use it. We will see that in the ‘CoBaLT configuration’ section.

PHP settings

1. First of all, we need to enable output buffering in PHP. Among other things, it decreases the time it takes to render information in the tool. This can be done by setting the ‘output_buffering’ variable in the php.ini file (in an Apache configuration or .htaccess file the equivalent form is ‘php_value output_buffering 4096’):

output_buffering = 4096

2. Second, we need to set the maximum size allowed for uploaded files, as working with CoBaLT starts with uploading a corpus, which might consist of large files. On some systems the maximum size allowed for an uploaded file is only 2 MB. If you have files larger than this, you will need to increase this limit. This can be done by setting ‘upload_max_filesize’ in the php.ini file:

upload_max_filesize = 30M

Of course, the 30M can be any value you like.

To make this work, the php.ini variable ‘post_max_size’ must be set higher than ‘upload_max_filesize’ (or at least equal to it):

post_max_size = 30M

This setting in turn depends on another setting. The ‘memory_limit’ variable in the php.ini file should generally be set higher than ‘post_max_size’:

memory_limit = 256M

Of course, the 256M can be any value you like.

3. Third, it can be convenient to set an error log for debugging purposes, in case a problem occurs. This can also be done in the php.ini file:

error_log = /some_directory/some_subdirectory/php-scripts.log

NB: full information about php.ini directives can be found at http://php.net/manual/en/ini.core.php
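
The relationship between the three size settings above (upload_max_filesize, post_max_size, memory_limit) can be checked mechanically. The Python sketch below is one possible way to do that and is not part of CoBaLT. The function names are hypothetical; the K/M/G shorthand (powers of 1024) and the special memory_limit value -1 (no limit) follow PHP's documented conventions.

```python
def php_size_to_bytes(value):
    """Convert a PHP shorthand size such as '30M' or '4096' to bytes."""
    text = value.strip()
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = text[-1:].upper()
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


def check_upload_limits(upload_max_filesize, post_max_size, memory_limit):
    """Return a list of problems; an empty list means the ordering is respected."""
    problems = []
    upload = php_size_to_bytes(upload_max_filesize)
    post = php_size_to_bytes(post_max_size)
    memory = php_size_to_bytes(memory_limit)
    if post < upload:
        problems.append("post_max_size is smaller than upload_max_filesize")
    # A memory_limit of -1 means "no limit", so only compare real limits.
    if memory != -1 and memory <= post:
        problems.append("memory_limit is not larger than post_max_size")
    return problems


# The values used in this section pass the check.
print(check_upload_limits("30M", "30M", "256M"))
```

The values would normally be taken from the output of phpinfo() or from php.ini itself; here they are passed in as plain strings to keep the sketch self-contained.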

Uploading multiple files, PHP zip support

PHP cannot read directory listings, so it is not possible to upload an entire directory in one go. As a workaround, zip files can be used. When a directory is zipped, it can be uploaded to CoBaLT in a single step.

In order for this to work, PHP has to have its zip functionality enabled. Please refer to the PHP documentation for your platform and version.

As a reference, our phpinfo() shows this:

zip

Zip                 enabled
Extension Version   $Id: php_zip.c,v 1.95.2.6 2007/05/19 22:35:49 pajoye Exp $
Zip version         1.8.10
Libzip version      0.7.1
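
Any zip utility can be used to prepare a corpus directory for upload. As an illustration only, the Python sketch below packs all files under a directory into one archive using the standard zipfile module. The function name zip_directory and the paths in the usage note are hypothetical, and whether CoBaLT expects a flat archive or accepts subdirectories is not stated here, so the layout is an assumption.

```python
import os
import zipfile


def zip_directory(source_dir, zip_path):
    """Write every file under source_dir into zip_path; return the stored names."""
    stored = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir, not absolute paths.
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
                stored.append(arcname)
    return stored
```

A call such as zip_directory("my_corpus", "my_corpus.zip") would produce the archive to upload. The archive should be written outside the source directory so that it does not end up including itself.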

MySQL support for PHP

The PHP installation needs to be configured with MySQL support. Usually the standard installations provide this. If not, please refer to the PHP documentation on your system.

MySQL

In order to set up the MySQL database for the tool, root access is needed to the MySQL server.

The tool was tested on MySQL 5.0.45 (and following versions), running on Red Hat Enterprise Linux Server release 5.2.

MySQL settings

1. The tool runs some quite heavy queries, most notably the ones that produce the overview of the type-frequency list, which make extensive use of MySQL’s built-in GROUP_CONCAT() function. The result of this function can quite easily become larger than the default length that is allowed. There are two MySQL server variables that control how much data can be sent by the server: @@max_allowed_packet and @@group_concat_max_len. To see what they are currently set to, the following queries can be used:

mysql> SELECT @@group_concat_max_len;

mysql> SELECT @@max_allowed_packet;

It is advisable to set these values to 64 MB or higher. This can be done with statements such as the following:

mysql> SET GLOBAL max_allowed_packet = 67108864;

2. Additionally, we need to set the ‘group_concat_max_len’ and ‘max_allowed_packet’ variables in the [mysqld] section of the my.cnf file. Otherwise large query results concatenated by the GROUP_CONCAT() function will be truncated, causing the tool to malfunction:

group_concat_max_len=102400

max_allowed_packet=300000

# max_allowed_packet must be higher than group_concat_max_len

Restart the MySQL server after that.
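
The rule that max_allowed_packet must be higher than group_concat_max_len can be verified with a short script. The Python sketch below is a simplified illustration, not a full option-file parser: the function names are hypothetical, only the [mysqld] group is read, and directives such as !include are not followed. It assumes the K/M/G suffixes that MySQL accepts in option files.

```python
def read_mysqld_options(cnf_text):
    """Collect name=value pairs from the [mysqld] group of an option file."""
    options = {}
    in_mysqld = False
    for raw_line in cnf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop '#' comments
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            in_mysqld = line[1:-1].strip().lower() == "mysqld"
            continue
        if in_mysqld and "=" in line:
            name, value = line.split("=", 1)
            # MySQL treats '-' and '_' in option names as equivalent.
            options[name.strip().replace("-", "_")] = value.strip()
    return options


def to_bytes(value):
    """Convert a value such as '64M' or '300000' to a number of bytes."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = value[-1:].upper()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)


def packet_exceeds_group_concat(cnf_text):
    """True if max_allowed_packet is larger than group_concat_max_len."""
    options = read_mysqld_options(cnf_text)
    return to_bytes(options["max_allowed_packet"]) > to_bytes(options["group_concat_max_len"])


# Hypothetical my.cnf excerpt using the values from this section.
sample_cnf = """
[mysqld]
group_concat_max_len=102400
max_allowed_packet=300000
"""
print(packet_exceeds_group_concat(sample_cnf))
```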

Create databases

The tool expects at least two MySQL databases to be there with the right table structure. The first one is used as an index to the word forms in the corpora and is called the ‘token database’. The second one is used for storing all the lexicon data and is called ‘lexicon database’.

In a normal installation, each CoBaLT project has its own lexicon database and token database instances, distinct from the database instances of other CoBaLT projects.

NOTE that it is possible to configure the tool in such a way that the token database has to be created only once per database host (so usually once in the lifetime of a distribution of CoBaLT). In such an installation, each project will normally get its own lexicon database instance, but one single token database instance will be shared among all projects.
However, this type of installation (which was originally the default) is now strongly discouraged, since it makes moving a CoBaLT project to another server or computer more difficult. We therefore do not describe it here.

The distribution of the tool comes with some SQL files in the ‘sql’ directory of the installation.

These files, called ‘emptyLexiconDatabase.sql’ and ‘emptyLexiconTokenDatabase.sql’, contain the data structures of the two databases. They need to be loaded into MySQL.

Before this can be done, the databases need to be created. This is done in MySQL by running these queries:

mysql> CREATE DATABASE myLexiconDb;

mysql> CREATE DATABASE myLexiconTokenDb;

In these queries, the database names (myLexiconDb and myLexiconTokenDb) should be replaced by your own database names.

NOTE that you’ll later on need to declare in the configuration of CoBaLT which token database belongs to a given lexicon database. We will see how to do that in the ‘CoBaLT configuration’ section.

For now, the SQL files that come with the distribution (in the ‘sql’ directory) should be loaded into the databases just created. On the command line you can do this by executing the following commands:

mysql -h<server> -u<user> -p<password> myLexiconTokenDb < emptyLexiconTokenDatabase.sql

mysql -h<server> -u<user> -p<password> myLexiconDb < emptyLexiconDatabase.sql
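
If several projects have to be set up, the load step can be scripted. The following Python sketch shows one way to wrap the mysql command-line client. It is an illustration under stated assumptions, not part of the CoBaLT distribution: the function names are hypothetical, and the mysql client is assumed to be on the PATH.

```python
import subprocess


def build_load_command(host, user, password, database):
    """Build the argument list for the mysql client (no shell involved)."""
    return ["mysql", f"-h{host}", f"-u{user}", f"-p{password}", database]


def load_sql_file(host, user, password, database, sql_path):
    """Feed an SQL file to the mysql client on standard input."""
    with open(sql_path, "rb") as sql_file:
        # check=True raises CalledProcessError if mysql exits with an error.
        subprocess.run(
            build_load_command(host, user, password, database),
            stdin=sql_file,
            check=True,
        )
```

A call such as load_sql_file("localhost", "root", "secret", "myLexiconDb", "emptyLexiconDatabase.sql") corresponds to the second command above, with placeholder credentials. Passing the password on the command line makes it visible to other users on the machine, so this is best kept to a trusted setup host.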

When a database has been created, a MySQL user should be added. This can be done by running the following statements:

mysql> GRANT ALL ON myLexiconDb.* TO 'newUser'@localhost IDENTIFIED BY 'password';

mysql> GRANT ALL ON myLexiconDb.* TO 'newUser'@'%' IDENTIFIED BY 'password';

mysql> FLUSH PRIVILEGES;

Again, the database name, user name (newUser) and password in these statements should be set to your own values.

The entire database can be empty at startup, except for the users table, which needs at least one row (that is, one editor entitled to log in). New users can be added with the following statement:

mysql> INSERT INTO users (name) VALUES ('Amy'), ('Billy'), ('Duffy'), ('Ella');

Now Amy, Billy, Duffy and Ella will be able to log in just by typing their names in the Log-in screen of the tool (see the ‘User interface’ section about that).

The language table

If analyses should allow for languages (see below), the different options should be listed in this table.