Troubleshooting

Failed to execute goal on project jate: Could not resolve dependencies for project uk.ac.shef.dcs:jate:jar:2.0-alpha-SNAPSHOT: Could not find artifact edu.drexel:dragontool:jar:1.3.3 in oss-sonatype (https://oss.sonatype.org/content/repositories/snapshots/)

To install the Dragon library into your local Maven repository, run 'mvn clean' and then 'mvn install'.

'You cannot set an index-time boost on an unindexed field, or one that omits norms'

This may happen if you choose to index the filtered candidate terms with their scores as boost values in plugin mode (i.e., indexTerm = true and boosting = true). Index-time boosts are stored in norms, so the target field must keep its norms: set omitNorms="false" for that field in your schema before you set the boosts.

Example field definition:

<field name="jate_domain_terms" type="string" indexed="true" stored="true" required="false" omitNorms="false" multiValued="true"/>

'org.apache.solr.common.SolrException: Error while creating field ...' when posting HTTP request for term ranking and indexing

You may encounter a NullPointerException with a stack trace similar to the following:

org.apache.solr.common.SolrException: Error while creating field 'jate_cterms{type=jate_text_2_terms,properties=indexed,tokenized,termVectors}' from value 'null'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:263)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:114)
    at uk.ac.shef.dcs.jate.util.SolrUtil.copyFields(SolrUtil.java:50)
    at uk.ac.shef.dcs.jate.solr.CompositeTermRecognitionProcessor.candidateExtraction(CompositeTermRecognitionProcessor.java:43)
    at uk.ac.shef.dcs.jate.solr.TermRecognitionRequestHandler.handleRequestBody(TermRecognitionRequestHandler.java:230)
...
Caused by: java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.createField(FieldType.java:261)
    ... 30 more

Check the definition of your 'text' field. This error usually occurs when 'stored' is set to 'false' for that field: its content is then not available to be copied into the JATE fields and processed further.

"uk.ac.shef.dcs.jate.JATEException: Cannot find expected field: jate_ngraminfo"

First, make sure the expected field is configured in your schema and that the corresponding settings in jate.properties match it. JATE2.0 needs the additional n-gram field to retrieve frequency information (e.g., TF, IDF) for various ATE algorithms, so this field must be indexed and stored at index time. In plugin mode, also check that your copy fields are configured correctly, so that your text field is automatically copied into the candidate term field (jate_cterms by default) and the term n-gram field (jate_ngraminfo by default).
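For a quick first check of the schema, here is a rough Java sketch that parses a Solr schema file with the JDK's DOM parser and reports whether text, jate_cterms and jate_ngraminfo are declared, and whether any of them is explicitly marked indexed="false" or stored="false". Assumptions: the schema path (schema.xml or managed-schema) is passed on the command line and the default field names are used (adjust them if you changed them in jate.properties). Attributes inherited from the fieldType are not resolved, so a missing attribute is not flagged. SchemaFieldCheck is a hypothetical helper, not part of JATE.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Rough sanity check of JATE-related fields in a Solr schema file (hypothetical helper, not part of JATE). */
public class SchemaFieldCheck {

    /** Returns readable problems for the named field; an empty list means nothing obvious was found. */
    static List<String> check(Document schema, String fieldName, boolean mustBeIndexed, boolean mustBeStored) {
        List<String> problems = new ArrayList<>();
        NodeList fields = schema.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            if (!fieldName.equals(f.getAttribute("name"))) continue;
            // Only explicit "false" values are flagged; absent attributes fall back to fieldType defaults,
            // which this sketch does not resolve.
            if (mustBeIndexed && "false".equals(f.getAttribute("indexed"))) problems.add(fieldName + ": indexed=\"false\"");
            if (mustBeStored && "false".equals(f.getAttribute("stored"))) problems.add(fieldName + ": stored=\"false\"");
            return problems;
        }
        problems.add(fieldName + ": field not declared");
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the core's schema.xml or managed-schema (an assumption about your layout)
        Document schema = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File(args[0]));
        List<String> problems = new ArrayList<>();
        problems.addAll(check(schema, "text", false, true));           // content must be stored to be copied
        problems.addAll(check(schema, "jate_cterms", true, false));    // candidate term field
        problems.addAll(check(schema, "jate_ngraminfo", true, true));  // n-gram field: indexed and stored
        if (problems.isEmpty()) System.out.println("No obvious schema problems found");
        else problems.forEach(System.out::println);
    }
}
```

Passing no problems here does not guarantee the index is correct; inspecting the index itself (see below) is still worthwhile.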

Another common cause of this problem is an input format that is incorrect or not supported by JATE2.0. In the out-of-the-box embedded mode, the app detects and parses input files using Apache Tika, so the current version may not convert and index domain-specific formats (e.g., JSON) correctly. In Solr plugin mode, documents are handled and indexed by Solr itself; for the supported formats and how to index them, see the Solr documentation section "Uploading Data with Index Handlers".

To handle your own data, you can follow the approach in our unit tests (e.g., ACLRDTECTest.java): 1) load all the raw files and convert them into JATEDocument objects; 2) make sure the required fields (typically "text") are indexed and stored correctly (see addNewDoc in JATEUtil.java). In this case JATE2 is used as a library in your own project.
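Step 1 can be sketched in plain Java as below. This is only the loading part: it walks a corpus directory, reads each file as UTF-8 and keeps non-empty texts keyed by relative path. It deliberately uses no JATE classes, so the conversion into JATEDocument and the indexing call are left out; RawCorpusLoader is a hypothetical name, and the assumption is that your files are already plain text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Loads plain-text files as (relative path -> text) pairs before conversion to JATE documents (hypothetical helper). */
public class RawCorpusLoader {

    static Map<String, String> load(Path corpusDir) throws IOException {
        Map<String, String> docs = new LinkedHashMap<>();
        try (Stream<Path> paths = Files.walk(corpusDir)) {
            // Sorted for a deterministic document order
            for (Path p : paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList())) {
                String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                // Skip files with no content: they cannot yield any candidate terms.
                if (text.trim().isEmpty()) continue;
                docs.put(corpusDir.relativize(p).toString(), text);
            }
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> docs = load(Paths.get(args[0]));
        System.out.println("Loaded " + docs.size() + " non-empty documents");
        // Next step (not shown): wrap each entry in a JATEDocument and index it;
        // see addNewDoc in JATEUtil.java and ACLRDTECTest.java for how JATE does this.
    }
}
```

For formats other than plain text (PDF, JSON, ...), the text would first have to be extracted by your own code.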

We highly recommend checking your indexes with a tool such as Luke (latest release). Alternatively, in plugin mode you can use the Solr Admin UI to inspect your indexed data; for example, the Schema Browser lets you review schema data in a browser window. This helps you verify that the actual indexes are consistent with your configuration and makes it easier to investigate possible problems.

java.lang.UnsupportedClassVersionError: uk/ac/shef/dcs/jate/app/AppCValue : Unsupported major.minor version ...

This error means that the JATE classes were compiled for a newer Java version than the JVM you are running. Make sure you are using JDK 1.8 or above.
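To see which Java release a class file targets, you can read its header: bytes 0-3 are the magic number 0xCAFEBABE, bytes 4-5 the minor version and bytes 6-7 the major version, and from Java 5 onward the release is major - 44 (so 52 means Java 8). The sketch below does this and also prints the running JVM's version properties. Extracting a class from the JATE jar (e.g. with jar xf) is up to you; ClassFileVersion is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Reports which Java release a .class file targets and what the current JVM is. */
public class ClassFileVersion {

    /** Maps a class-file major version to a Java SE release (49 = Java 5, 52 = Java 8, 55 = Java 11, ...). */
    static int releaseFor(int major) {
        if (major < 49) throw new IllegalArgumentException("pre-Java 5 class file: " + major);
        return major - 44;
    }

    /** Reads the major version from the class-file header: magic (u4), minor (u2), major (u2). */
    static int readMajor(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        data.readUnsignedShort(); // minor version, not needed here
        return data.readUnsignedShort();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .class file, e.g. uk/ac/shef/dcs/jate/app/AppCValue.class extracted from the jar
        try (InputStream in = new FileInputStream(args[0])) {
            int major = readMajor(in);
            System.out.println("Class file targets Java " + releaseFor(major) + " (major " + major + ")");
        }
        System.out.println("Running JVM: java.version=" + System.getProperty("java.version")
                + ", java.class.version=" + System.getProperty("java.class.version"));
    }
}
```

If the class file's release is higher than the running JVM's, switch to a newer JDK or point your command at its java binary.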

App crashes without any error

Please report it if you encounter this problem. In most cases, however, it is caused by insufficient memory. Many ATE algorithms (notably ChiSquare, CValue and ATTF) consume a lot of memory, so depending on the size of the corpus you analyse, you may need to allocate more memory to the JVM. Reference settings can be found in our LREC paper; note that actual requirements depend on your local environment, JVM version, etc. For example, we found that ChiSquare (in JATE2.0-beta.0) consumes more than 100 GB of memory, as expected, to process more than 105,000 terms. We therefore recommend a suitable pre-filtering step (e.g., stopword or frequency filtering) to reduce the number of candidate terms.
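As an illustration of such pre-filtering, the sketch below drops candidates below a minimum frequency and candidates that begin or end with a stopword. It is not JATE's own filtering code: the candidate-to-frequency map, the threshold and the "first or last token is a stopword" rule are assumptions chosen for the example, and CandidatePreFilter is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative candidate pre-filter (hypothetical helper, not JATE's implementation). */
public class CandidatePreFilter {

    static Map<String, Integer> filter(Map<String, Integer> candidateFreq, int minFreq, Set<String> stopwords) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : candidateFreq.entrySet()) {
            if (e.getValue() < minFreq) continue; // minimum frequency threshold
            String[] tokens = e.getKey().toLowerCase(Locale.ROOT).trim().split("\\s+");
            // Reject candidates whose first or last token is a stopword, e.g. "the corpus"
            if (stopwords.contains(tokens[0]) || stopwords.contains(tokens[tokens.length - 1])) continue;
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new LinkedHashMap<>();
        freq.put("term extraction", 12);
        freq.put("the corpus", 30);
        freq.put("rare phrase", 1);
        Set<String> stopwords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(filter(freq, 2, stopwords)); // prints {term extraction=12}
    }
}
```

Filtering before scoring shrinks the candidate set that memory-hungry algorithms such as ChiSquare have to hold.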

The JVM also offers many options that you can tune. The following script provides an example:

#!/bin/sh

# http://www.jvmhost.com/articles/what-is-java-lang-outofmemoryerror-gc-overhead-limit-exceeded
# http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html

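# Remove index data left over from a previous run of the ngram core
rm -rf jate/testdata/solr-testbed-chisquare/ngramCore/data

# -Xms/-Xmx: initial/maximum heap size; -XX:-UseGCOverheadLimit: do not abort with "GC overhead limit exceeded"
# -XX:+UseG1GC, -XX:MaxGCPauseMillis, -XX:ConcGCThreads: use the G1 collector with a pause-time target and concurrent GC threads
# nohup ... &: run in the background and keep running after logout; output goes to my_corpus_chisquare.log
nohup <jdk1.8.x_DIR>/bin/java -Xms1024m -Xmx106000m -XX:-UseGCOverheadLimit -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ConcGCThreads=100 -cp jate-2.0.2.jar uk.ac.shef.dcs.jate.app.AppChiSquare -pf.mttf 2 -prop jate.properties -corpusDir my-corpus/txt -o my_corpus_chisquare_terms_ngram1-5_mttf2.json jate/testdata/solr-testbed-chisquare ngramCore > my_corpus_chisquare.log &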
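After changing -Xms/-Xmx it is worth confirming what the JVM actually received. The minimal sketch below uses the standard Runtime API; run it with the same JVM options as your JATE command (e.g. java -Xmx2g HeapReport). maxMemory() may report somewhat less than -Xmx with some collectors. The CPU count can guide a sensible -XX:ConcGCThreads value; the 100 threads in the script above presumably assume a large machine. HeapReport is a hypothetical name.

```java
/** Prints the heap limits and CPU count seen by the running JVM. */
public class HeapReport {

    static String toMiB(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (Long.MAX_VALUE if there is no limit); totalMemory() is the heap currently reserved.
        long max = rt.maxMemory();
        System.out.println("max heap:   " + (max == Long.MAX_VALUE ? "unlimited" : toMiB(max)));
        System.out.println("total heap: " + toMiB(rt.totalMemory()));
        System.out.println("free heap:  " + toMiB(rt.freeMemory()));
        System.out.println("CPUs:       " + rt.availableProcessors());
    }
}
```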

solr.allow.unsafe.resourceloading=true is required to load JATE cores

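Solution: add the following line to 'bin/solr.cmd' on Windows or 'bin/solr' on Linux (on Linux, use the shell syntax SOLR_OPTS="$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true"):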

'solr.cmd':

...
REM Used to report errors before exiting the script
set SCRIPT_ERROR=
set NO_USER_PROMPT=0

set SOLR_OPTS=%SOLR_OPTS% -Dsolr.allow.unsafe.resourceloading=true

REM Allow user to import vars from an include file
REM vars set in the include file can be overridden with
REM command line args
IF "%SOLR_INCLUDE%"=="" set "SOLR_INCLUDE=%SOLR_TIP%\bin\solr.in.cmd"
IF EXIST "%SOLR_INCLUDE%" CALL "%SOLR_INCLUDE%"
...

No candidate terms output when adapting to a different language?

This typically happens when you replace your PoS tagger with one that uses a different tagset. In this case, you must update your candidate pattern file so that its patterns use the new tags.
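To illustrate why, here is a toy Java sketch that selects n-grams whose tag sequence matches a regular expression. A pattern written for Penn Treebank tags (JJ, NN, NNS) matches nothing once the tagger emits Universal Dependencies tags (ADJ, NOUN). The regex syntax, tags and the PosPatternDemo class are assumptions for illustration only; JATE's actual pattern file format may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Shows why a PoS pattern written for one tagset finds nothing with another (illustrative only). */
public class PosPatternDemo {

    /** Returns every n-gram (up to maxLen tokens) whose space-joined tag sequence fully matches the pattern. */
    static List<String> candidates(String[] tokens, String[] tags, Pattern pattern, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            for (int end = start + 1; end <= Math.min(tokens.length, start + maxLen); end++) {
                String tagSeq = String.join(" ", Arrays.copyOfRange(tags, start, end));
                if (pattern.matcher(tagSeq).matches()) {
                    out.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"automatic", "term", "extraction"};
        String[] pennTags = {"JJ", "NN", "NN"};            // Penn Treebank tags
        String[] universalTags = {"ADJ", "NOUN", "NOUN"};  // Universal Dependencies tags
        Pattern pennPattern = Pattern.compile("((JJ|NN|NNS) )*(NN|NNS)");
        Pattern universalPattern = Pattern.compile("((ADJ|NOUN) )*NOUN");
        // [automatic term, automatic term extraction, term, term extraction, extraction]
        System.out.println(candidates(tokens, pennTags, pennPattern, 3));
        System.out.println(candidates(tokens, universalTags, pennPattern, 3));      // [] : old pattern, new tagset
        System.out.println(candidates(tokens, universalTags, universalPattern, 3)); // the same five candidates
    }
}
```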
