A low-level CSV parser implementation for the Boost.XML competency test

gopi487krishna/turbo-csv-2

turbo_csv2

turbo_csv2 is an efficient, lightweight pull parser for CSV files.

The parser only reads tokens and maintains its internal state; it does not cache records.

Features

  • Full conformance to the RFC 4180 standard
  • Support for custom dialects to read different variations of CSV files
  • Support for custom FileReaders (memory_map, stringstream, etc.). The user only needs to adapt these readers to the FileReader interface.
  • Support for both CRLF and LF end of line sequence

Design

The two important concepts in turbo-csv2 are:

  • token
  • events

token: The smallest unit of parsing in turbo-csv2 is the field. A field is a value in a CSV file delimited by a comma (,) or another field separator.

events: Each time the user calls next() on csv_parser, the parser updates its state to indicate which events are currently active.

Events let the user distinguish between fields and records, and detect errors and the end of the document.

Multiple events can be active at the same time, so helper functions are provided for testing which events are currently active.

The turbo-csv2 parser can generate the following events:

EVENT          Description
FIELD          Indicates that a field has been read
START_RECORD   Indicates the start of a new record
END_RECORD     Indicates the end of a record
END_DOCUMENT   Indicates that no more input is available from the reader
ERRORED        Indicates that the file could not be opened or that an internal error occurred

Basic Usage

To count the number of records in a CSV file:

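#include <iostream>
#include <string>
#include <turbo-csv2/turbo-csv2.hpp>

int main() {
    // Buffer management is the user's responsibility: the buffer must stay
    // alive while the parser reads from it. (Example data; assumes put_buffer
    // accepts a character pointer and a length.)
    std::string buffer = "name,age\nalice,30\nbob,25\n";

    turbo_csv::csv_parser csv_reader;
    csv_reader.put_buffer(buffer.data(), buffer.size()); // Supply buffer and length
    // Do not forget to close the stream afterwards

    int row_count = 0;
    auto current_event = csv_reader.next();

    // Pull events until an error occurs or the input is exhausted
    while (!current_event.is_active(turbo_csv::ERRORED | turbo_csv::END_DOCUMENT)) {
        if (current_event.is_active(turbo_csv::END_RECORD)) {
            row_count++;
        }
        current_event = csv_reader.next();
    }
    std::cout << "\n Total Rows: " << row_count << '\n';
    return 0;
}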

To get a field value:

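#include <cstdint>
#include <iostream>
#include <string>
#include <turbo-csv2/turbo-csv2.hpp>

int main() {
    // Buffer management is the user's responsibility (example data; assumes
    // put_buffer accepts a character pointer and a length).
    std::string buffer = "42,hello\n";

    turbo_csv::csv_parser csv_reader;
    csv_reader.put_buffer(buffer.data(), buffer.size()); // Supply buffer and length
    // Do not forget to close the stream when the whole data is read

    auto current_event = csv_reader.next();

    // Convert only when a field was actually read (START_RECORD may be
    // active at the same time as FIELD)
    if (current_event.is_active(turbo_csv::FIELD)) {
        auto field_value = current_event.value<std::int32_t>();
        std::cout << "Field Value : " << field_value << '\n';
    }
    return 0;
}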

event also has an overload of value() that does not throw: instead of converting the token, it returns a string view into it.

Common methods

Method          Description
event_active()  Takes an event type as an argument and returns whether that particular event is active.
events()        Returns the event mask describing which events are active within the parser (its internal state). Multiple events can be active at the same time (for example, START_RECORD and FIELD).
value()         Returns a view into the token value read by the parser. Has a noexcept guarantee.
value<T>()      Returns the token value (the field, if available) deserialized to the desired type T.

Dialect

A dialect is a class of static methods that the parser uses to identify separators, characters to ignore, escape characters and so on. Users can provide their own dialect class so that the parser can handle different variants of CSV files.
