Configuration

CoBaLT configuration

The tool needs a bit of configuration before you can use it. The configuration is set in the globals.php file, which can be found in the php directory of your installation.

Note that in previous versions of the tool, some configuration was also stored in the lexicontool.php file in the root directory of the application, but this is no longer the case. All configuration is now stored exclusively in php/globals.php.

globals.php

1. The first thing to set in this file is the database host, together with the username and password needed to access it. Enter the username and password of the MySQL user you created earlier in the ‘Create a MySql user’ section.

$sDbHostName = "yourhost.com";

$sDbUserName = "username_of_your_host";

$sDbPassword = "password_of_your_host";
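Before going further, it can be useful to confirm that these credentials are accepted by the database server. The following is a minimal stand-alone sketch, not part of CoBaLT itself. It assumes the mysqli extension is available in your PHP installation, and it uses the placeholder values from above, which you would replace with your own.

```php
<?php
// Stand-alone connection check (illustrative only, not part of CoBaLT).
// The values below are the placeholders from globals.php; use your own.
$sDbHostName = "yourhost.com";
$sDbUserName = "username_of_your_host";
$sDbPassword = "password_of_your_host";

// Report errors through properties rather than exceptions,
// so the check below works the same on all PHP versions.
mysqli_report(MYSQLI_REPORT_OFF);

// The @ suppresses the PHP warning on failure; the error is handled below.
$oConnection = @new mysqli($sDbHostName, $sDbUserName, $sDbPassword);

if ($oConnection->connect_errno) {
    echo "Connection failed: " . $oConnection->connect_error . "\n";
    exit(1);
}

echo "Connected to MySQL server version " . $oConnection->server_info . "\n";
$oConnection->close();
```

If the script reports a failure, check the host name and the privileges of the MySQL user before continuing with the rest of the configuration.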

2. Secondly, you have to declare at least one lexicon project to be run in the tool (which is what you actually use the tool for).

Say you have some Dutch lexicon building project. As stated in the ‘Create databases’ section, each project consists of a lexicon database and a token database. Let’s say that the lexicon database of your project is called ‘LexiconTool_NL’ and its token database is called ‘LexiconToolTokenDb_NL’.

You first have to declare the lexicon database, together with a user-friendly project description (this description will be shown to users in the log-in screen).

$asProject['LexiconTool_NL'] = 'My Dutch lexicon building project';

Then you need to declare which token database belongs to this lexicon database:

$asTokenDbName['LexiconTool_NL'] = 'LexiconToolTokenDb_NL';

For any new project to be run in the tool, just repeat the two lines above with the relevant database names:

$asProject['LexiconTool_NL'] = 'My Dutch lexicon building project';

$asProject['LexiconTool_DE'] = 'Some German project is cool too';

$asTokenDbName['LexiconTool_NL'] = 'LexiconToolTokenDb_NL';

$asTokenDbName['LexiconTool_DE'] = 'LexiconToolTokenDb_DE';

If several projects are declared, it can be convenient to have the project that is currently used most selected by default in the log-in screen. To do this, set the $sChecked variable to the name of the relevant lexicon database:

$sChecked = 'LexiconTool_DE'; // the German project will be selected by default

If no project should be preselected, set this variable to NULL:

$sChecked = NULL;
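Because the project declarations are spread over several arrays that must use the same keys, a typo in a database name is easy to miss. The sketch below shows one way to check the settings for consistency. The function findProjectConfigProblems() is a hypothetical helper written for this illustration and is not part of CoBaLT; the variable names and values are those used in the examples above.

```php
<?php
// Illustrative sketch only; findProjectConfigProblems() is a hypothetical
// helper and is not part of CoBaLT.

// Settings as declared in php/globals.php (values from the examples above).
$asProject = array();
$asTokenDbName = array();
$asProject['LexiconTool_NL'] = 'My Dutch lexicon building project';
$asProject['LexiconTool_DE'] = 'Some German project is cool too';
$asTokenDbName['LexiconTool_NL'] = 'LexiconToolTokenDb_NL';
$asTokenDbName['LexiconTool_DE'] = 'LexiconToolTokenDb_DE';
$sChecked = 'LexiconTool_DE';

// Returns a list of human-readable problems; an empty list means
// the three settings agree with each other.
function findProjectConfigProblems(array $asProject, array $asTokenDbName, $sChecked)
{
    $asProblems = array();

    // Every declared project needs a token database.
    foreach (array_keys($asProject) as $sLexiconDb) {
        if (!isset($asTokenDbName[$sLexiconDb])) {
            $asProblems[] = "No token database declared for '$sLexiconDb'";
        }
    }

    // Every token database entry must belong to a declared project.
    foreach (array_keys($asTokenDbName) as $sLexiconDb) {
        if (!isset($asProject[$sLexiconDb])) {
            $asProblems[] = "Token database declared for unknown project '$sLexiconDb'";
        }
    }

    // The preselected project, if any, must be a declared project.
    if ($sChecked !== NULL && !isset($asProject[$sChecked])) {
        $asProblems[] = "\$sChecked refers to undeclared project '$sChecked'";
    }

    return $asProblems;
}

foreach (findProjectConfigProblems($asProject, $asTokenDbName, $sChecked) as $sProblem) {
    echo $sProblem . "\n";
}
```

With the settings shown, the three arrays agree and the script prints nothing.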

3. Third, the corpus files uploaded to the tool can be stored in two ways:

  • on a file server, or
  • in the lexicon database (in the ‘documents’ table).

Saving the files in the ‘documents’ table of the lexicon database has a strong advantage: if you ever need to move your CoBaLT project to another server, only the database needs to be moved, and no separate files. File storage in the lexicon database is therefore recommended.

This has to be set in the $aFullDatabaseMode array:

// save the German files into the lexicon database

$aFullDatabaseMode['LexiconTool_DE'] = true; // default and recommended

// save the Dutch files on the file server

$aFullDatabaseMode['LexiconTool_NL'] = false;

BEWARE if you choose file storage on the file server! When a project run in CoBaLT has reached its end and you want to export the results by enriching your original (uploaded) corpus files with the analyses created in CoBaLT, you will need those original uploaded files. Do not delete them, or you will not be able to export the results of your work.
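To keep track of which projects depend on files that must not be deleted, the storage setting can be listed per project. The sketch below is an illustration only: projectsOnFileServer() is a hypothetical helper, not part of CoBaLT, and it assumes that a project without an explicit $aFullDatabaseMode entry uses the default value (true), as the comment in the example above suggests.

```php
<?php
// Illustrative sketch only; projectsOnFileServer() is a hypothetical helper.

$asProject = array();
$aFullDatabaseMode = array();
$asProject['LexiconTool_NL'] = 'My Dutch lexicon building project';
$asProject['LexiconTool_DE'] = 'Some German project is cool too';
$aFullDatabaseMode['LexiconTool_DE'] = true;
$aFullDatabaseMode['LexiconTool_NL'] = false;

// Returns the lexicon database names of all projects whose corpus files
// are kept on the file server rather than in the 'documents' table.
function projectsOnFileServer(array $asProject, array $aFullDatabaseMode)
{
    $asResult = array();
    foreach (array_keys($asProject) as $sLexiconDb) {
        // Assumption: a missing entry means the default, i.e. database storage.
        $bInDatabase = isset($aFullDatabaseMode[$sLexiconDb])
            ? $aFullDatabaseMode[$sLexiconDb]
            : true;
        if ($bInDatabase === false) {
            $asResult[] = $sLexiconDb;
        }
    }
    return $asResult;
}

foreach (projectsOnFileServer($asProject, $aFullDatabaseMode) as $sLexiconDb) {
    echo "Keep the uploaded files of project '$sLexiconDb'\n";
}
```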

4. Whichever file storage mode you choose, you need to set a document root: the directory in which CoBaLT will (at least temporarily) put documents when they are uploaded into the tool.

This document root can be set to any available directory, as long as the directory is readable and writable for the tool. If you have chosen file storage on the file system (‘false’ setting in section [3]), it is advisable not to place this directory inside the application directory, because it might then be deleted accidentally when a new version of the tool is installed. The document root can be set this way:

$sDocumentRoot = "/servername/some_directory/uploadedDocuments";

If you choose to upload your documents in a zip file, you need to declare a subdirectory in which CoBaLT will unpack the archive. This subdirectory will be created automatically if it doesn’t exist yet, but you must declare a name for it:

$sZipExtractDir = 'zipExtractDir'; // subdirectory in the document root
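The requirement that the document root be readable and writable for the tool can be checked with a few standard PHP filesystem functions. The sketch below is a hypothetical check, not part of CoBaLT. To be meaningful it should be run as the same user the web server runs as (for instance by requesting it through the browser), since permissions differ per user.

```php
<?php
// Illustrative sketch only; findDocumentRootProblems() is a hypothetical helper.

// Returns a list of problems with the given directory; an empty list
// means it exists and is readable and writable for the current user.
function findDocumentRootProblems($sDocumentRoot)
{
    $asProblems = array();

    if (!is_dir($sDocumentRoot)) {
        // No point in testing permissions on something that is not a directory.
        $asProblems[] = "'$sDocumentRoot' is not an existing directory";
        return $asProblems;
    }
    if (!is_readable($sDocumentRoot)) {
        $asProblems[] = "'$sDocumentRoot' is not readable";
    }
    if (!is_writable($sDocumentRoot)) {
        $asProblems[] = "'$sDocumentRoot' is not writable";
    }

    return $asProblems;
}

// Placeholder value from the example above; use your own document root.
$sDocumentRoot = "/servername/some_directory/uploadedDocuments";

foreach (findDocumentRootProblems($sDocumentRoot) as $sProblem) {
    echo $sProblem . "\n";
}
```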

5. When corpus files are being uploaded into CoBaLT, the tool will need to tokenize them as a first step before anything else (see chapter about tokenizing). To be able to do that, the tool needs to know where to find the tokenizer:

The default tokenizer can do things differently depending on the language it deals with. The proper language can be declared the following way:

If this setting gives unexpected results in your language, the tokenizer.pm file of the default tokenizer will need to be given some extra specific code for proper processing (check the section about ‘Tokenizing’ if you wish to configure the tokenizer for your own language anyway).

6. For the tool to be able to call the tokenizer, the right permissions must be set. The web server runs as a certain user (e.g. ‘www-data’). This user needs execute permissions on the tokenizer directory and on the tokenizer executable itself.

For Ubuntu it has proven necessary to add the web server user to the sudoers file and to prefix the call to the tokenizer with a ‘sudo’ command.

There is a setting for this in the php/globals.php file, called $sSudo. Just give the variable an empty string as value if this is not required.
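To find out which user the web server runs as and whether that user has the execute permissions described above, a small diagnostic script can help. The sketch below is an assumption-laden illustration, not part of CoBaLT: the tokenizer path is a made-up placeholder, describeTokenizerAccess() is a hypothetical helper, and the user name is only shown when PHP's posix extension is available. Run it through the web server, not from your own shell account.

```php
<?php
// Illustrative sketch only; describeTokenizerAccess() is a hypothetical helper.

// Reports whether the current user can enter the tokenizer directory
// and execute the tokenizer file.
function describeTokenizerAccess($sTokenizerPath)
{
    return array(
        'directory_accessible' => is_executable(dirname($sTokenizerPath)),
        'file_executable' => is_file($sTokenizerPath) && is_executable($sTokenizerPath),
    );
}

// Hypothetical placeholder; replace with the actual location of your tokenizer.
$sTokenizerPath = '/path/to/tokenizer/tokenizer_executable';

// Show the effective user, if the posix extension is installed.
if (function_exists('posix_geteuid') && function_exists('posix_getpwuid')) {
    $aUser = posix_getpwuid(posix_geteuid());
    if ($aUser !== false) {
        echo "Running as user: " . $aUser['name'] . "\n";
    }
}

foreach (describeTokenizerAccess($sTokenizerPath) as $sCheck => $bOk) {
    echo $sCheck . ": " . ($bOk ? "yes" : "no") . "\n";
}
```

If both checks report "yes" for the web server user, it is likely that $sSudo can be left as an empty string.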

7. In some projects you might need to use a different character set. To do so, take the following steps:

  • if you’re using a special font type stored in .eot and .otf files, copy those files to the ‘fonts’ directory of the application (if you’re not, just skip to the next step).
  • now go to the ‘css’ directory of the application.
  • make a copy of the ‘lexiconTool.css’ file.
  • open your copy of the file in a text editor.
  • now, if you’ve copied .eot and .otf files to the ‘fonts’ directory as a first step, you’ll need to declare those files here (if you’re not using such files, just skip to the next step). To do so, add the following lines on top of the file:

     @font-face {
         font-family: YourFontNameHere;
         src: url('../fonts/YourFontNameHere.eot');
     }
     @font-face {
         font-family: YourFontNameHere;
         src: url('../fonts/YourFontNameHere.otf');
     }

  • now look for blocks of lines starting with ‘td.wordCol’, ‘.contextWord’, ‘.contextWord_’ and ‘.matchedPart’. If you don’t find some of them, add those with curly brackets like this:

     td.wordCol {     }

  • in each of the blocks of lines just quoted, make sure that the curly brackets contain a declaration of your font type, in such a way that it looks like this:

     td.wordCol {
         font: 8px YourFontNameHere;
     }

NOTE: If the curly brackets already contain a font declaration, remove the font names declared on those lines and put the font name you need instead.

Now, tell the tool to use your copy of the ‘lexiconTool.css’ file in the relevant project by adding the following line to the globals.php file:

$asCustomCss['your_lexicon_database_name'] = 'your_copy_of_lexiconTool.css';

Other options of the php/globals.php file are less likely to need customization, so we will not discuss them here.