HTTP Interface

WinnowTag edited this page Sep 14, 2010 · 1 revision

The Classifier includes an embedded HTTP server that is used to control classification jobs and add new items to the classifier’s Item Cache.

Classification Job Control

The HTTP interface for classification job control allows a user to start, query and stop a classification job. This is all done through a REST-style interface where jobs are represented as resources within the classifier. All resources are represented in XML. Any operation may return an HTTP error status code if something goes wrong while the request is being processed.

Starting a job

To start a job, a caller needs to send a job description to the classifier using an HTTP POST to /classifier/jobs. When the classifier successfully creates the job, it returns a reference to the job's resource URL.

For example, the following POST request will create a job for the tag defined at http://example.org/tag.atom:


POST /classifier/jobs HTTP/1.1
Host: localhost

<?xml version="1.0" ?>
<job>
	<tag-url>http://example.org/tag.atom</tag-url>
</job>

If the job is successfully created, the classifier will return a 201 Created response like this:


HTTP/1.1 201 Created
Date: Fri, 7 Oct 2005 17:17:11 GMT
Content-Length: nnn
Content-Type: application/xml
Location: http://localhost/classifier/jobs/JOBID
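
The exchange above could be driven from a client along the following lines. This is only a minimal sketch using Python's standard library: the function names are hypothetical, and the Content-Type request header is an assumption, since the page does not say which request headers the classifier expects.

```python
import urllib.request
from xml.etree import ElementTree as ET


def build_create_job_request(base_url: str, tag_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a job."""
    job = ET.Element("job")
    ET.SubElement(job, "tag-url").text = tag_url
    body = ET.tostring(job, encoding="utf-8", xml_declaration=True)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/classifier/jobs",
        data=body,
        method="POST",
        # Assumption: the page does not state a required request Content-Type.
        headers={"Content-Type": "application/xml"},
    )


def create_job(base_url: str, tag_url: str) -> str:
    """Send the request and return the job URL from the Location header."""
    request = build_create_job_request(base_url, tag_url)
    with urllib.request.urlopen(request) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected status {response.status}")
        return response.headers["Location"]
```

Calling create_job("http://localhost", "http://example.org/tag.atom") against a running classifier would be expected to return the job's resource URL. Building the request separately from sending it keeps the XML body easy to inspect without a server.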

Job Status Reporting

The URL provided in the Location header returned by the classifier is the resource URL for the job. This URL can be used to query the classifier for the status of the job using the HTTP GET method.

The classification job description shows the job’s progress using a state attribute (the status element in the example response below). The state attribute can be one of the following values:

  • Waiting: The job is in the queue waiting for a worker thread to start it.
  • Training: The tagger for the job is being trained with the examples provided in the tag definition. Training involves combining the features in each of the positive and negative examples with the random background to end up with an index of feature identifiers and probabilities for the tagger. The details of how this is computed and combined are left up to the classifier implementation.
  • Classifying: The job is classifying each item in the item cache.
  • Inserting: The job is sending the resulting items and probabilities to the callback URL provided by the tagger.
  • Complete: The job is complete!
  • Error: An error occurred while processing the job.

The description also includes these attributes:

  • progress: Percentage of job completed.
  • duration: Number of seconds the job has been running for.
  • error-msg: A human readable error message when the job state is Error. (Optional)
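
Since Complete and Error are the only states in the list above that end a job, a client might poll until it sees one of them. The sketch below assumes exactly that; the function name, the polling interval and the poll limit are all hypothetical choices, and fetch_state stands in for whatever code performs the GET and extracts the state.

```python
import time
from typing import Callable

# Assumption: Complete and Error are the only states in which a job has ended.
TERMINAL_STATES = {"Complete", "Error"}


def wait_for_job(
    fetch_state: Callable[[], str],
    poll_interval: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        # Waiting, Training, Classifying or Inserting: still in progress.
        sleep(poll_interval)
    raise TimeoutError(f"job still not finished after {max_polls} polls")
```

Passing the sleep function as a parameter lets the loop be exercised without real delays.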

For example, the following GET request:


GET /classifier/jobs/JOBID HTTP/1.1
Host: localhost

Will return an XML document describing the status of the job identified by JOBID.

For example:


HTTP/1.1 200 OK
Date: Fri, 7 Oct 2005 17:17:11 GMT
Content-Length: nnn
Content-Type: application/xml

<?xml version="1.0" ?>
<job>
	<id>JOBID</id>
	<progress>10.0</progress>
	<status>Training</status>
	<duration>0.1</duration>
	<error-msg></error-msg>
</job>
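
A client could turn a response body like the one above into a typed value as follows. This is a minimal sketch that assumes the element names shown in the example; the JobStatus class and parse_job_status function are hypothetical names, not part of the classifier.

```python
from dataclasses import dataclass
from typing import Optional
from xml.etree import ElementTree as ET


@dataclass
class JobStatus:
    job_id: str
    state: str
    progress: float  # percentage of the job completed
    duration: float  # seconds the job has been running
    error_msg: Optional[str]  # only meaningful when state is "Error"


def parse_job_status(xml_text: str) -> JobStatus:
    """Parse a job status document shaped like the example response."""
    root = ET.fromstring(xml_text)
    if root.tag != "job":
        raise ValueError(f"expected a <job> document, got <{root.tag}>")
    return JobStatus(
        job_id=root.findtext("id", default=""),
        state=root.findtext("status", default=""),
        progress=float(root.findtext("progress", default="0")),
        duration=float(root.findtext("duration", default="0")),
        # An empty or missing error-msg element is treated as "no error message".
        error_msg=root.findtext("error-msg") or None,
    )
```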

Cancelling and Deleting Jobs

When a job is complete or a user wants to cancel it, the client should send a DELETE request to the job’s URL. For example:


DELETE /classifier/jobs/JOBID HTTP/1.1
Host: localhost

This will cancel the job if it is still running, or delete it if it is complete. After a DELETE has been sent to a job URL, subsequent GETs on that URL will return an HTTP 404 response.
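
The delete-then-404 behaviour could be exercised from a client roughly as follows. This is a hedged sketch using Python's standard library; both function names are hypothetical.

```python
import urllib.error
import urllib.request


def build_delete_job_request(job_url: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a job URL."""
    return urllib.request.Request(url=job_url, method="DELETE")


def job_exists(job_url: str) -> bool:
    """Return False if a GET on the job URL answers 404, True if it succeeds."""
    try:
        with urllib.request.urlopen(job_url):
            return True
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # any other HTTP error is not interpreted here
```

A caller would send the request with urllib.request.urlopen(build_delete_job_request(job_url)) and could then use job_exists(job_url) to confirm that the job is gone.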

Parameters

The following command-line parameters are relevant to the HTTP server.


-p, --port N                 The port to run the HTTP server on
-a, --allowed_ip IP_ADDRESS  An IP address allowed to make HTTP requests
                             Default: any
