Overview

WinnowTag edited this page Sep 14, 2010 · 2 revisions

The classifier is written in C and targets POSIX platforms. Typically, the classifier runs as a daemon process. The daemon includes an embedded HTTP server that provides a web interface for controlling classification jobs.

A typical job processing chain begins with the front-end sending a REST request to the classifier to perform classification for a given tag. The job is added to the classifier’s internal queue, and the URL of the job’s progress status is returned to the front-end. While the job is being processed, the front-end can issue GET requests to that URL to get updates on the job’s status.
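The queue-and-status-URL flow described above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not the classifier's actual code: the names `job_queue_t`, `queue_add` and `queue_find`, the three job states, the fixed capacity, and the `/jobs/<id>` status path are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_JOBS 16 /* hypothetical fixed capacity for the sketch */

/* Hypothetical job states a status request might report. */
typedef enum { JOB_WAITING, JOB_RUNNING, JOB_COMPLETE } job_state_t;

typedef struct {
    int id;
    char tag_url[128]; /* tag the job should classify */
    job_state_t state;
} job_t;

typedef struct {
    job_t jobs[MAX_JOBS];
    int count;
    int next_id;
} job_queue_t;

void queue_init(job_queue_t *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof *q);
    q->next_id = 1;
}

/* Adds a job for tag_url and writes the (assumed) status path that the
 * front-end would poll into status_path. Returns the job id, or -1 if
 * the queue is full. */
int queue_add(job_queue_t *q, const char *tag_url,
              char *status_path, size_t path_len)
{
    assert(q != NULL && tag_url != NULL && status_path != NULL);
    if (q->count >= MAX_JOBS) {
        return -1;
    }
    job_t *job = &q->jobs[q->count++];
    job->id = q->next_id++;
    snprintf(job->tag_url, sizeof job->tag_url, "%s", tag_url);
    job->state = JOB_WAITING;
    snprintf(status_path, path_len, "/jobs/%d", job->id);
    return job->id;
}

/* Looks a job up by id, as a handler for a status GET might do.
 * Returns NULL if no such job exists. */
const job_t *queue_find(const job_queue_t *q, int id)
{
    assert(q != NULL);
    for (int i = 0; i < q->count; i++) {
        if (q->jobs[i].id == id) {
            return &q->jobs[i];
        }
    }
    return NULL;
}
```

In this sketch, the handler for the initial REST request would call `queue_add` and return the status path to the front-end, and each later GET on that path would call `queue_find` and report the job's `state`. A real daemon would also need locking around the queue if jobs are processed on other threads.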

When the classifier starts processing the job, it first fetches the training document for the tag. This document contains the items the user has provided as manual examples for the tag. The classifier then builds a tagger, which it uses internally to perform classification. The tagger is run over each item in the classifier’s internal cache to produce a probability that the item belongs to the tag. Each item whose probability is over a certain threshold is added to a classifier taggings document, which is then sent back to the front-end for storage and for updating the display.
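The thresholding step can be illustrated with a short sketch. Everything here is hypothetical: `item_t`, `tagging_t`, `tagger_fn`, `classify_items` and the stand-in `toy_tagger` are names invented for the example, and the real tagger is built from the training document rather than reading a stored score.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cached item: an id plus a precomputed score that the
 * toy tagger below simply echoes back as a probability. */
typedef struct {
    int id;
    double score;
} item_t;

/* One entry in the (assumed) taggings result. */
typedef struct {
    int item_id;
    double probability;
} tagging_t;

/* A tagger maps an item to the probability that it belongs to the tag. */
typedef double (*tagger_fn)(const item_t *item);

/* Stand-in tagger for the sketch: clamps the stored score to [0, 1]. */
double toy_tagger(const item_t *item)
{
    if (item->score < 0.0) return 0.0;
    if (item->score > 1.0) return 1.0;
    return item->score;
}

/* Runs the tagger over every item and records those whose probability
 * is strictly over the threshold. Stops early if out is full.
 * Returns the number of taggings written to out. */
size_t classify_items(const item_t *items, size_t n_items,
                      tagger_fn tagger, double threshold,
                      tagging_t *out, size_t out_cap)
{
    assert(items != NULL && tagger != NULL && out != NULL);
    size_t n_out = 0;
    for (size_t i = 0; i < n_items && n_out < out_cap; i++) {
        double p = tagger(&items[i]);
        if (p > threshold) {
            out[n_out].item_id = items[i].id;
            out[n_out].probability = p;
            n_out++;
        }
    }
    return n_out;
}
```

With three items scored 0.95, 0.40 and 0.91 and a threshold of 0.9, `classify_items` would record the first and third items. The resulting array plays the role of the taggings document that is sent back to the front-end; how that document is actually serialized is not covered here.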

The classifier has four main components:

  • HTTP server
  • Item Cache
  • Tagger Cache
  • Classification Engine

The following diagram shows their relationships:

The rest of this document describes each of these components, then the protocols used, and finally some of the scaling issues we will face.
