turbo-csv2 is an efficient, lightweight pull parser for CSV files.
The parser only reads tokens and maintains its internal state. It does not cache records.
- Full conformance to the RFC 4180 standard
- Support for custom Dialects to read different variations of CSV files
- Support for custom FileReaders (memory_map, stringstream, etc.). The user only needs to adapt these readers to the FileReader interface.
- Support for both CRLF and LF end-of-line sequences
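As a rough illustration of what adapting a reader can look like, the sketch below wraps a `std::istream` (for example a `std::istringstream`) behind a small chunked `read` method. The class name `stream_reader_sketch`, the `read(dst, capacity)` signature, and the `read_all` helper are all assumptions made for this example; the actual FileReader interface in turbo-csv2 may look different.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical reader shape (an assumption, not the turbo-csv2 FileReader interface):
// read(dst, capacity) copies up to `capacity` bytes and returns how many were copied,
// returning 0 once the input is exhausted.
class stream_reader_sketch {
public:
    explicit stream_reader_sketch(std::istream& in) : in_(in) {}

    std::size_t read(char* dst, std::size_t capacity) {
        in_.read(dst, static_cast<std::streamsize>(capacity));
        // gcount() reports the bytes actually extracted, even on a short final read.
        return static_cast<std::size_t>(in_.gcount());
    }

private:
    std::istream& in_;
};

// Drains any reader with the shape above in small fixed-size chunks.
template <typename Reader>
std::string read_all(Reader& reader) {
    std::string out;
    char chunk[8];
    for (std::size_t n = reader.read(chunk, sizeof chunk); n > 0;
         n = reader.read(chunk, sizeof chunk)) {
        out.append(chunk, n);
    }
    return out;
}
```

Because `read_all` is a template, any other source (such as a memory-mapped file) could be plugged in the same way, provided it exposes the same hypothetical `read` method.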
The two important concepts in turbo-csv2 are:
- tokens
- events
token: The smallest element of parsing in turbo-csv2 is the field. A field is a value in a CSV file, separated from its neighbors by a `,` or another field separator.
events: Whenever the user calls next on csv_parser, the parser updates its state to indicate which events are currently active. Events let the user distinguish between a field, a record, etc., and also notify the user of errors, the end of the document, etc.
Multiple events can be active at the same time. Helper functions are provided for testing which events are currently active.
The turbo-csv2 parser can generate the following events:
EVENT | Description |
---|---|
FIELD | Indicates that a field has been read |
START_RECORD | Indicates the starting of a new record |
END_RECORD | Indicates the ending of a record |
END_DOCUMENT | Indicates that no more input is available from the reader |
ERRORED | Indicates that the file could not be opened or some internal error has occurred |
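Since several events can be active at once, one common way to represent them is a bitmask with one bit per event. The following is a minimal sketch of that idea, not the library's own definition: the `sketch` namespace, the flag values, the `EventMask` alias, and this free `is_active` function are all assumptions chosen for illustration.

```cpp
#include <cstdint>

namespace sketch {

// Hypothetical flag values; the real turbo-csv2 constants may be defined differently.
enum Event : std::uint32_t {
    FIELD        = 1u << 0,
    START_RECORD = 1u << 1,
    END_RECORD   = 1u << 2,
    END_DOCUMENT = 1u << 3,
    ERRORED      = 1u << 4
};

// A set of events is the bitwise OR of the individual flags.
using EventMask = std::uint32_t;

// Returns true if at least one event in `wanted` is set in `state`.
constexpr bool is_active(EventMask state, EventMask wanted) {
    return (state & wanted) != 0u;
}

}  // namespace sketch
```

With this representation, a state of `START_RECORD | FIELD` reports both `FIELD` and `START_RECORD` as active, and a single test against `ERRORED | END_DOCUMENT` checks for either stopping condition.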
To count the number of records in a CSV file:
#include <iostream>
#include <turbo-csv2/turbo-csv2.hpp>
int main() {
    turbo_csv::csv_parser csv_reader;
    // Buffer management is the user's responsibility:
    // `buffer` and `length` must be supplied by the caller
    csv_reader.put_buffer(buffer, length);
    // Do not forget to close the stream afterwards
    int row_count = 0;
    // Fetch the first event before testing it in the loop
    auto current_event = csv_reader.next();
    while (!current_event.is_active(turbo_csv::ERRORED | turbo_csv::END_DOCUMENT)) {
        // Every END_RECORD event marks one completed record
        if (current_event.is_active(turbo_csv::END_RECORD)) {
            row_count++;
        }
        current_event = csv_reader.next();
    }
    std::cout << "\n Total Rows: " << row_count << '\n';
    return 0;
}
To get a field value:
#include <cstdint>
#include <iostream>
#include <turbo-csv2/turbo-csv2.hpp>
int main() {
    turbo_csv::csv_parser csv_reader;
    // Buffer management is the user's responsibility:
    // `buffer` and `length` must be supplied by the caller
    csv_reader.put_buffer(buffer, length);
    // Do not forget to close the stream once all the data has been read
    auto current_event = csv_reader.next();
    // Deserialize the field that was just read into a 32-bit integer
    auto field_value = current_event.value<std::int32_t>();
    std::cout << "Field Value : " << field_value << '\n';
    return 0;
}
event also has an overload of the value() method that does not throw. It returns a string view into the token.
Method | Description |
---|---|
`event_active()` | Takes an event type as an argument and returns whether that particular event is active. |
`events()` | Returns the event mask describing the events currently active within the parser (the parser's internal state). Multiple events can be active at the same time (for example, START_RECORD and FIELD). |
`value()` | Returns a view into the token value read by the parser. Has a noexcept guarantee. |
`value<T>()` | Returns the token value (the field, if available) deserialized to the desired type T. |
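To make the difference between the two `value` accessors concrete, here is a small stand-alone sketch of the pattern: a non-throwing accessor that hands back a view, next to a templated accessor that converts the text and throws on failure. The class `token_sketch` is hypothetical and is not the turbo-csv2 event class; using `std::from_chars` and throwing `std::invalid_argument` are assumptions of this example, and it only handles integer types.

```cpp
#include <charconv>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a parsed token; not the turbo-csv2 event class.
class token_sketch {
public:
    explicit token_sketch(std::string_view text) : text_(text) {}

    // Non-throwing accessor: returns a view into the raw token text, no copy made.
    std::string_view value() const noexcept { return text_; }

    // Converting accessor for integer types: throws if the whole token is not a valid T.
    template <typename T>
    T value() const {
        T out{};
        const char* first = text_.data();
        const char* last = first + text_.size();
        const auto result = std::from_chars(first, last, out);
        if (result.ec != std::errc{} || result.ptr != last) {
            throw std::invalid_argument("token cannot be converted to the requested type");
        }
        return out;
    }

private:
    std::string_view text_;  // The viewed buffer must outlive this object.
};
```

For a token holding `42`, `value()` yields the view `"42"` and `value<int>()` yields the integer 42, while a token such as `4x` makes `value<int>()` throw. The view is only valid while the underlying buffer is alive, which fits a parser that leaves buffer management to the user.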
A Dialect consists of a number of static methods that are used to identify separators, ignored characters, escape characters, etc. Users can provide their own dialect class, which allows the parser to handle different variants of CSV files.
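The sketch below shows how a dialect built from static methods can drive character classification through a template parameter. The names `semicolon_dialect`, `is_separator`, `is_quote`, `is_ignored`, and `split_record` are hypothetical and chosen only for this example; the method names that turbo-csv2 expects from a dialect may differ.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical dialect for semicolon-separated files.
// The member names are illustrative, not the turbo-csv2 Dialect interface.
struct semicolon_dialect {
    static bool is_separator(char c) { return c == ';'; }
    static bool is_quote(char c) { return c == '"'; }
    static bool is_ignored(char c) { return c == '\r'; }
};

// Splits a single record into fields, asking the dialect to classify each character.
template <typename Dialect>
std::vector<std::string> split_record(std::string_view record) {
    std::vector<std::string> fields(1);
    bool in_quotes = false;
    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (Dialect::is_quote(c)) {
            // A doubled quote inside a quoted field stands for one literal quote.
            if (in_quotes && i + 1 < record.size() && Dialect::is_quote(record[i + 1])) {
                fields.back().push_back(c);
                ++i;
            } else {
                in_quotes = !in_quotes;
            }
        } else if (!in_quotes && Dialect::is_separator(c)) {
            fields.emplace_back();  // Start the next field.
        } else if (!in_quotes && Dialect::is_ignored(c)) {
            // Skip characters the dialect asks to ignore outside quotes.
        } else {
            fields.back().push_back(c);
        }
    }
    return fields;
}
```

Calling `split_record<semicolon_dialect>` on the record `a;"b;c";d` yields the three fields `a`, `b;c`, and `d`. Swapping in another dialect type changes the behavior at compile time, with no change to the splitting code.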