A parser for unruly CSVs
Parse CSVs with hierarchical headers and duplicated headers. Skip lines by line number, and more.
Read below to get started, or see the API Documentation for more details.
Add this line to your application's Gemfile:
gem 'uncsv'
And then execute:
bundle
Or install it yourself as:
gem install uncsv
Reading a CSV with Uncsv is similar to using Ruby's built-in CSV class. Create a new instance of Uncsv and pass it a String or IO. The second argument is an options hash; see below.
require 'uncsv'
data = "A,B,C\n1,2,3"
# header_rows: 0 uses the first row as the header
csv = Uncsv.new(data, header_rows: 0)
# Collect the value of column B from every data row
csv.map { |row| row['B'] }
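To make the `header_rows: 0` behavior concrete, here is a plain-Ruby sketch of the idea: the first row supplies the keys, and each later row becomes a hash looked up by header text. The `naive_rows` helper is hypothetical, splits on commas without handling quoted fields, and is not how Uncsv parses data.

```ruby
# Hypothetical illustration of header_rows: 0, not Uncsv's implementation.
# Splits naively on newlines and commas; quoted fields are NOT handled.
def naive_rows(data, header_row: 0)
  lines = data.split("\n").map { |line| line.split(',', -1) }
  headers = lines[header_row]
  # Rows after the header row become hashes keyed by header text.
  lines[(header_row + 1)..].map { |fields| headers.zip(fields).to_h }
end

rows = naive_rows("A,B,C\n1,2,3")
column_b = rows.map { |row| row['B'] }
p column_b # => ["2"]
```

Note that the values come back as strings, just as they appear in the source text.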
Uncsv can read directly from the filesystem with the open method.
Uncsv.open('my_data.csv')
Uncsv is an Enumerable. All Enumerable methods, such as each, map, and reduce, are supported.
data = "A,B,C\n1,2,3\n4,5,6"
csv = Uncsv.new(data, header_rows: 0)
# Fields are strings, so convert them; reduce(0) starts the sum at zero
c_total = csv.reduce(0) { |sum, row| sum + row['C'].to_i }
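In Ruby, Enumerable support typically comes from including the Enumerable module and defining `each`; `map`, `reduce`, `select`, and the rest are built on top of that one method. The `TinyTable` class below is a hypothetical stand-in for illustration, not Uncsv's source, and it assumes rows are already hashes of string fields.

```ruby
# Hypothetical class showing why map/reduce work on an Enumerable:
# defining #each and including Enumerable provides the other methods.
class TinyTable
  include Enumerable

  def initialize(rows)
    @rows = rows
  end

  # Yield each row hash; without a block, return an Enumerator.
  def each
    return enum_for(:each) unless block_given?

    @rows.each { |row| yield row }
    self
  end
end

table = TinyTable.new([
  { 'A' => '1', 'B' => '2', 'C' => '3' },
  { 'A' => '4', 'B' => '5', 'C' => '6' }
])

# Fields are strings, so convert before summing; reduce(0) seeds the total.
c_total = table.reduce(0) { |sum, row| sum + row['C'].to_i }
big_rows = table.select { |row| row['A'].to_i > 1 }
p c_total # => 9
```

Seeding `reduce` with `0` matters: without it, the first row hash would become the initial accumulator.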
The following options can be passed as a hash in the second argument of the Uncsv constructor, or set on the config object yielded to the constructor block.
Uncsv.new(data, skip_blanks: true)
# Is equivalent to
Uncsv.new(data) do |config|
  config.skip_blanks = true
end
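One way a constructor can accept both forms is to start from defaults, apply the options hash, and then yield the config object to an optional block. The `Reader` class and its `Config` struct below are hypothetical and only illustrate why the two calls above can end up with the same settings; Uncsv's actual configuration class may differ.

```ruby
# Hypothetical constructor accepting an options hash and/or a config block;
# names are illustrative, not Uncsv's code.
class Reader
  Config = Struct.new(:skip_blanks, :header_rows, keyword_init: true)

  attr_reader :config

  def initialize(data, opts = {})
    @data = data # parsing is omitted in this sketch
    # Start from defaults, apply the hash, then let the block override.
    @config = Config.new(skip_blanks: false, header_rows: [])
    opts.each { |key, value| @config[key] = value }
    yield @config if block_given?
  end
end

from_hash = Reader.new('', skip_blanks: true)
from_block = Reader.new('') do |config|
  config.skip_blanks = true
end
p from_hash.config == from_block.config # => true
```

Applying the block last means block settings win if both forms are used together; that precedence is a design choice of this sketch.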
:expand_headers: Default false. If set to true, blank header row cells will assume the header of the cell to their left. This is useful for hierarchical headers where not all the header cells are filled in. If set to an array of header indexes, only the specified headers will be expanded.
:header_rows: Default []. Can be set to either a single row index or an array of row indexes. For example, it could be set to 0 to indicate a header in the first row. If set to an array of indexes ([1,2]), the header row text will be joined by the :header_separator. For example, if rows 0 and 1 are header rows, cell (0,0) has the value "Personal", and cell (1,0) has the value "Name", the header becomes "Personal.Name". Any data above the last header row will be ignored.
:header_separator: Default ".". When using multiple header rows, this is the string used to separate the individual header fields.
:nil_empty: Default true. If true, empty cells will be set to nil; otherwise, they are set to an empty string.
:normalize_headers: Default false. If set to true, header field text will be normalized: the text will be lowercased, and non-alphanumeric characters will be replaced with underscores (_). If set to a string, those characters will be replaced with that string instead. If set to a hash, the hash will be treated as options to KeyNormalizer, accepting the :separator and :downcase options. If set to another object, it is expected to respond to the normalize(key) method by returning a normalized string.
:skip_blanks: Default false. If true, rows whose fields are all empty will be skipped.
:skip_rows: Default []. If set to an array of row indexes, those rows will be skipped. This option does not apply to header rows.
:unique_headers: Default false. If set to true, headers will be forced to be unique by appending numbers to duplicates. For example, if two header cells have the text "Name", the headers will become "Name.0" and "Name.1". The separator between the text and the number can be set using the :header_separator option.
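The header options above can be approximated in plain Ruby to see how they combine. This sketch assumes header cells arrive as arrays of strings and applies expansion, joining, uniqueness, and normalization in that order; the ordering is an assumption, since Uncsv's internal order isn't documented here. All helper names are hypothetical. Requires Ruby 2.7+ for `Array#tally`.

```ruby
# Hypothetical helpers mirroring the documented header options;
# this is not Uncsv's implementation. Step order is an assumption.

# :expand_headers - a blank cell takes the value of the cell to its left.
def expand_row(cells)
  last = nil
  cells.map do |cell|
    last = cell unless cell.nil? || cell.empty?
    last
  end
end

# :header_rows with several indexes - join each column's parts.
def join_headers(header_rows, separator)
  header_rows.transpose.map { |parts| parts.join(separator) }
end

# :unique_headers - number every member of a duplicated group.
def unique_headers(headers, separator)
  counts = headers.tally
  seen = Hash.new(0)
  headers.map do |header|
    next header if counts[header] == 1

    numbered = "#{header}#{separator}#{seen[header]}"
    seen[header] += 1
    numbered
  end
end

# :normalize_headers - lowercase; non-alphanumerics become the replacement.
def normalize(header, replacement = '_')
  header.downcase.gsub(/[^a-z0-9]/, replacement)
end

header_cells = [
  ['Personal', '', 'Contact'],
  ['Name', 'Name', 'Phone #']
]
expanded = [expand_row(header_cells[0]), header_cells[1]]
headers = join_headers(expanded, '.')
unique = unique_headers(headers, '.')
normalized = unique.map { |header| normalize(header) }
p headers    # => ["Personal.Name", "Personal.Name", "Contact.Phone #"]
p unique     # => ["Personal.Name.0", "Personal.Name.1", "Contact.Phone #"]
p normalized # => ["personal_name_0", "personal_name_1", "contact_phone__"]
```

Because normalization runs last here, the "." separator is also replaced with "_"; normalizing each part before joining would keep it.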
See the documentation for Ruby's built-in CSV class for the following options.
:col_sep
:field_size_limit
:quote_char
:row_sep
:skip_blanks
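Returning to the Uncsv row-level options described earlier (:skip_rows, :skip_blanks, and :nil_empty), their documented behavior can be sketched the same way. `filter_rows` is a hypothetical helper, and treating indexes as positions in the array passed in is an assumption, since the text above doesn't spell out how Uncsv counts row indexes relative to header rows. Requires Ruby 2.7+ for `filter_map`.

```ruby
# Hypothetical sketch of :skip_rows, :skip_blanks and :nil_empty;
# not Uncsv's implementation. Indexes are positions in `rows`.
def filter_rows(rows, skip_rows: [], skip_blanks: false, nil_empty: true)
  rows.each_with_index.filter_map do |fields, index|
    # :skip_rows - drop rows at the listed indexes.
    next if skip_rows.include?(index)

    # :nil_empty - turn empty strings into nil (otherwise keep '').
    cleaned = fields.map { |field| nil_empty && field == '' ? nil : field }
    # :skip_blanks - drop rows whose fields are all empty.
    next if skip_blanks && cleaned.all? { |field| field.nil? || field == '' }

    cleaned
  end
end

data_rows = [['1', '2'], ['', ''], ['x', ''], ['3', '4']]
kept = filter_rows(data_rows, skip_rows: [2], skip_blanks: true)
p kept # => [["1", "2"], ["3", "4"]]
```

The defaults in the signature match the documented defaults, so calling `filter_rows(rows)` only converts empty strings to nil.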
After checking out the repo, run bundle to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
To check your work, run bin/rake to check code style and run the tests. To generate a code coverage report, set the COVERAGE environment variable when running the tests.
COVERAGE=1 bin/rake
Bug reports and pull requests are welcome on GitHub at https://github.com/machinima/uncsv.
Copyright 2018 Warner Bros. Entertainment Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.