Home
For the context needed to understand these repositories, first try out http://winnowtag.org by signing up for an account, and scan the documentation at http://doc.winnowtag.org.
winnowTag is a suite of three major components that communicate via REST API. Each has its own repository (see below).
This documentation is being actively updated and completed during August 2011. If you'd like to get started in the meantime, please email [email protected].
winnowTag itself is a Rails 2.3.8 application that implements the public-facing UI of http://winnowtag.org. Users with the admin role see a few more features (e.g., importing and exporting tags) and have an Admin tab for managing users, etc.
Repository: https://github.com/WinnowTag/winnowTag
The collector is a Rails 2.3.8 application that serves as the back end for collecting items from feeds. The collector's admin UI is older and rougher than the UI of winnowTag (e.g., you create users via the Rails console), but feed collection is robust and reliable, having been well tested in production use over time.
Repository: https://github.com/WinnowTag/collector
##winnow text recommendation engine (classifier)
A C application implementing high-performance, tuned Bayesian classification of text items. This is the back end for the smart tags you see in http://winnowtag.org.
Repository: https://github.com/WinnowTag/winnow
##ruby scripts
Various scripts managed by god and cron drive the process of getting new feed items into the winnowTag database, classifying them, backing up tags, etc.
##Chef scripts
Chef recipes and templates deploy http://winnowtag.org production to Scalarium. Because the base Scalarium configuration provides a simple Ruby/Rails/Passenger/Apache stack, those Chef scripts also serve to define the requirements for manual setup of a development environment in OS X or any Linux distro.
Currently http://winnowtag.org has a somewhat low-power deployment: one Standard Large EC2 instance to run the suite, and a second Standard Large EC2 instance for MySQL.
http://winnowtag.org production collects from about 8,000 feeds many times each day, collecting and classifying about 10k items each day, and keeps items in a 90 day window, about one million items at any given time. When you change the positive or negative examples for a tag and press the "Run winnowTagger" button, those one million items are reclassified according to your newly retrained tag as the progress bar completes.
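The "about one million items" figure follows from the collection rate and the retention window. A rough check in Ruby, using only the numbers quoted above (the constant and method names are illustrative, not part of winnowTag):

```ruby
# Rough steady-state size of the item window, using the figures quoted above.
ITEMS_PER_DAY = 10_000  # "about 10k items each day"
WINDOW_DAYS   = 90      # items are kept in a 90 day window

# Number of items held once the window is full, assuming a constant daily rate.
def items_in_window(items_per_day, window_days)
  items_per_day * window_days
end

puts items_in_window(ITEMS_PER_DAY, WINDOW_DAYS)  # prints 900000
```

That is roughly 900,000 items, which the text rounds to one million.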
The REST architecture permits the configuration to be varied as needed according to the number of feeds, how long items are retained, and the number of active users. For example, before 2011 we ran the suite as a cluster with separate slices and load balancing for two winnowTag instances, two collector instances, one winnow instance, and a MySQL master and slave.
The entire suite, including MySQL, can be run on a single HighCPU Medium EC2 instance. It's probably reasonable to collect 1,000 feeds and retain 100,000 items in that configuration.
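To get a feel for what that single-instance sizing means for retention, here is a back-of-the-envelope sketch. It assumes a smaller install sees the same average number of items per feed per day as production (about 10,000 items from about 8,000 feeds), which may not hold for your feeds; the names are illustrative only.

```ruby
# Production figures quoted in this document.
PROD_FEEDS         = 8_000
PROD_ITEMS_PER_DAY = 10_000

# Assumption: the average items-per-feed-per-day rate matches production.
# Returns how many whole days of items fit within max_items.
def retention_days(feeds, max_items)
  items_per_day = feeds * PROD_ITEMS_PER_DAY.to_f / PROD_FEEDS
  (max_items / items_per_day).floor
end

puts retention_days(1_000, 100_000)  # prints 80
```

Under that assumption, 1,000 feeds and a 100,000 item cap correspond to a retention window of roughly 80 days, close to the 90 day window used in production.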
The quickest and easiest way to create a functioning version of the entire winnowTag suite is to use Scalarium and the same set of Chef scripts that deploy our production system http://winnowtag.org to Amazon EC2.
We're in the process of reducing this to a "single button press" operation. In the meantime please email [email protected] for help getting set up. This is a complex system, and we expect you'll need some help to get it running. Please get in touch!
winnowTag runs on Rails 2.3.8 and MySQL. The Chef recipes and templates in https://github.com/WinnowTag/cookbooks/tree/master/winnowTag govern deployment to Scalarium. Because the base Scalarium configuration provides a simple Ruby/Rails/Passenger/Apache stack, those Chef scripts also serve to define the requirements for manual setup of a development environment in OS X or any Linux distro. The entire suite is well tested under OS X, Ubuntu 9.10, and CentOS 5.
Due to the age of the project, many gems are still vendorized. Some deployment environments (e.g., EngineYard) will require conversion to Bundler. It's on our list to drop the vendorized gems and use only Bundler.
You can do a lot of development on winnowTag itself without running any other suite elements. It's easy to add the classifier and run that locally, too, making the "Run winnowTagger" button work. Or you can run the entire suite locally, including the collector.
Important: If you use OS X for development, ensure that MySQL's maximum packet size (max_allowed_packet) is at least 10MB; otherwise training and classification will fail. If MySQL was installed on OS X with an installer, create (or edit) /usr/local/mysql/etc/my.cnf to include these lines:
[mysqld]
max_allowed_packet=10M
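If you want to confirm that a my.cnf meets the 10MB minimum, a small Ruby check along these lines may help. This is only a sketch: the method names are hypothetical, it recognises only the underscore spelling of the option with K/M/G suffixes, and it takes the first max_allowed_packet line it finds regardless of which section it is in.

```ruby
# Converts a my.cnf size such as "10M" or "1048576" to bytes.
# Handles the K, M and G suffixes (case-insensitive).
def size_to_bytes(value)
  match = value.strip.match(/\A(\d+)([KMG])?\z/i)
  raise ArgumentError, "unrecognised size: #{value.inspect}" unless match
  multiplier = { nil => 1, "K" => 1024, "M" => 1024**2, "G" => 1024**3 }
  match[1].to_i * multiplier[match[2] && match[2].upcase]
end

# Returns the first max_allowed_packet value in the my.cnf text, in bytes,
# or nil if the option is not set. Sections are ignored (a simplification).
def max_allowed_packet(cnf_text)
  line = cnf_text.lines.find { |l| l =~ /\A\s*max_allowed_packet\s*=/ }
  line && size_to_bytes(line.split("=", 2).last)
end

REQUIRED_BYTES = 10 * 1024**2  # the 10MB minimum mentioned above

cnf = "[mysqld]\nmax_allowed_packet=10M\n"
puts max_allowed_packet(cnf) >= REQUIRED_BYTES  # prints true
```

To check a real file, replace the cnf string with File.read("/usr/local/mysql/etc/my.cnf"). Note that MySQL must be restarted before a changed my.cnf takes effect.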
If you haven't already done so, clone the git repository for winnowTag (https://github.com/WinnowTag/winnowTag):
git clone [email protected]:WinnowTag/winnowTag.git
Create development and test databases in MySQL for winnowTag to use. Name them winnow_development and winnow_test, and use winnow/winnow as the user/password for both. The production database is named "winnow".
NOTE: The name of winnowTag used to be "winnow", which is why we are still using that as a database name, despite the fact that the classifier is now named "winnow". (It would have been very inconvenient to change the name of winnowTag's production database.)
From a shell type the following:
> mysql5 -u <mysql-admin-user> -p
# These commands can then be entered into the MySQL shell
mysql> create database winnow_development;
mysql> create database winnow_test;
mysql> grant all privileges on winnow_development.* to 'winnow'@'localhost' identified by 'winnow';
mysql> grant all privileges on winnow_test.* to 'winnow'@'localhost' identified by 'winnow';
In the config directory, copy database.yml.example to create database.yml. Edit database.yml to change the username/password to winnow/winnow for both the development and test databases.
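For orientation, the following Ruby sketch prints what the development and test entries might look like after editing. The database names and the winnow/winnow credentials come from this page; the adapter and host values are assumptions based on a typical Rails 2.3 MySQL setup and are not copied from database.yml.example, so prefer whatever the example file actually contains.

```ruby
require "yaml"

# Builds development and test entries in the shape Rails expects in
# config/database.yml. "adapter" and "host" are assumed values.
def database_config(user, password)
  %w[development test].each_with_object({}) do |env, config|
    config[env] = {
      "adapter"  => "mysql",          # assumption: the Rails 2.3 MySQL adapter
      "database" => "winnow_#{env}",  # winnow_development / winnow_test
      "username" => user,
      "password" => password,
      "host"     => "localhost"       # assumption: a local MySQL server
    }
  end
end

puts database_config("winnow", "winnow").to_yaml
```

The printed YAML can be compared against your edited config/database.yml; it is not meant to replace the file.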
You can run "rake db:migrate" to create a new database from scratch, but then the first admin user must be bootstrapped from the Rails console.
It's on our list to provide a minimal database or seeds.rb so that no bootstrapping is required.
To build native gem extensions, do:
rake gems:build
To run the tests, in the winnowTag root directory do:
rake
...and:
rake features
If you have trouble getting past any failures, email [email protected].
If it all worked, running script/server in the winnowTag directory will start up the server, and accessing http://localhost:3000 will display winnowTag's home page.
You should be able to create a winnowTag account; however, sending the activation email may not work, depending on your local setup. If you don't get an activation email, you can manually activate all accounts using this:
script/console
User.find(:all).each {|u| u.activate; u.save}
Here's how to give any account the admin role:
script/console
user = User.find_by_name(<login_id_of_user>)
user.has_role 'admin'
user.save
If you have any problems, questions or comments, please email [email protected].