README

Introduction
------------

This is a small utility intended for use in extracting a school's
historical data from SchoolBase, ready for loading into iSAMS.

It was written for one particular school and was not intended as
a general-purpose product.  However, that school has given permission
for it to be released under the GPL, and it should greatly reduce
the amount of work needed to be done by other schools in the future.

It dumps some of the existing database tables in SchoolBase, then
re-structures the data as requested by iSAMS.

Because of the unfortunate choice of Excel files as the input
mechanism required by iSAMS, it is not possible fully to automate
the preparation.  (Or at least, I couldn't find a way to do it, and
iSAMS were unable to make any useful suggestions.  Others with more
familiarity with writing to Excel files may be able to suggest a
way.)

This utility writes CSV files which can then be manually injected
into iSAMS's Excel templates - one CSV file per sheet in a workbook.
As you do this, make sure you choose the right encoding (UTF-8) for
each CSV files.  Excel will try to guess the encoding, and whilst
its guess varies, in my experience it absolutely always got it wrong.


There are two further problems with this process, around which you
will have to negotiate with iSAMS.

1) Some of the fields specified in the iSAMS template are too short.
   1000 characters is not enough to hold the text of a report, and
   various other things.

2) There is a bug in Excel which means it fails to handle correctly
   fields containing an embedded line feed, even though the field is
   correctly quoted in the CSV file.  Two different approaches have
   been taken to get around this.

   For some of the affected text fields, this utility simply strips out
   the line feed characters.  The text comes across, but the formatting
   into separate paragraphs is lost.

   Where this was not acceptable (e.g. for academic reports) the utility
   writes the fields unchanged and I did eventually find a workaround
   which let me get them into Excel.  Open the CSV file in LibreOffice,
   being careful to specify the correct encoding (UTF-8).  The CSV importer
   in LibreOffice works correctly and preserves the line feeds.  Then
   save the new sheet in XLS or XLSX format, open it in Excel, and cut
   and paste the whole sheet into your Excel workbook.

   For some odd reason, Excel runs *incredibly* slowly - several minutes
   to open a file - when faced with a spreadsheet which has embedded line
   feeds in a file, but it does at least get there eventually.


Getting started
---------------

You will need a working copy of Ruby on your computer.  This is relatively
straightforward on Linux or MacOS, and not too difficult on Windows.  I would
suggest the use of RVM to get things set up quickly.  The version of Ruby
used for our extraction was 2.1.10, but anything later should be fine too.

Then you need a few Gems installed.  Look at code/extractor.rb for a
definitive list.  The two obvious ones are:

* charlock_holmes - used because our SB d/b at least had text within
                    it in various encodings.  The utility irons this out
                    and makes sure everything is in the same encoding,
                    UTF-8.  If you want to switch to something else, this
                    is probably best done at the stage of writing the
                    final CSV files.  You can specify it when opening the
                    files.

* tiny_tds          For access to the SB database.

Create a user ID on your SB database which has only read access and edit
code/extractor.rb to use that user name.  The password is entered
directly into the utility when it is run.

You also need some extra directories apart
from those created when you checked it out of git.  The relevant
directories are not stored in git because they contain school-specific
confidential information.

In parallel to the code directory you need:

  csv
  output
  yml

The csv directory is used to store the tables dumped from SchoolBase.
These are then processed to create the tables required by iSAMS, which
are written to the output directory.

The extractor works in two phases so that development work can be done
without needing immediate access to the SB database.  You do:

./extractor.rb --extract

to dump the SB tables to intermediate CSV files, and then:

./extractor.rb --generate

to do all the conversion work.  You can also do:

./extractor.rb --extract --generate

to do the whole thing in one go.

Type

./extractor.rb --help

to get information on other options, most of which relate to particular
little tweaks which we needed.

I would suggest setting up your database user, and then simply running
the utility and studying what comes out.  It will be structurally
correct from the point of view of the iSAMS templates, but will
need adjusting to suit your particular school.

You will want to tweak the output to suit your particular requirements.
For instance, the code will do a pretty good job of taking UK addresses
from the SchoolBase address fields and breaking them up appropriately for
the iSAMS fields, but presumably you will need to handle addresses for
whatever country you're dealing with.  Likewise, things like particular
rewards and sanctions which you have set up in SchoolBase will need to be
mapped to how you want them in iSAMS.

If there is an area you have decided not to transfer (e.g. medical records) you can simply ignore the corresponding output files.