Skip to content

Latest commit

 

History

History
248 lines (201 loc) · 24.3 KB

README.md

File metadata and controls

248 lines (201 loc) · 24.3 KB

VBA-CSV

Fast and convenient CSV reading and writing for VBA (Excel, Word, PowerPoint) and Excel spreadsheets, inspired by R's fread, and Julia's CSV.jl.

If you like this code please give it a star. I like stars.

Contents

Installation
Acknowledgements
Examples
Documentation
    CSVRead
    CSVWrite
Errors
Testing
Notes
Compatibility
About

Installation

  1. Download modCSVReadWrite.bas from the latest release.

  2. Import modCSVReadWrite.bas into your project (Open VBA Editor, Alt + F11; File > Import File).

  3. Add three references (In VBA Editor Tools > References)

    • Microsoft Scripting Runtime
    • Microsoft VBScript Regular Expressions 5.5
    • Microsoft ActiveX Data Objects 6.1 Library
        
  4. If you plan to call the functions from worksheet formulas then you might like to tell Excel's Function Wizard about them by adding calls to RegisterCSVRead and RegisterCSVWrite to the project's Workbook_Open event, which lives in the ThisWorkbook class module.

Private Sub Workbook_Open()
    RegisterCSVRead
    RegisterCSVWrite
End Sub
  1. Alternatively (or in addition), install Excel-DNA Intellisense and copy the worksheet _Intellisense_ from VBA-CSV-Intellisense.xlsx (in the latest release) into the same workbook into which you copied modCSVReadWrite.

Acknowledgements

I re-worked the parsing code of CSVRead after examining sdkn104's code available here; my approach is now similar to the one employed there, and the performance is similar too.

The documentation borrows freely from that of Julia's CSV.jl, though sadly VBA is not capable of Julia's extremely high performance. For testing CSVRead, I also use many of the test files that the authors of CSV.jl have created here.

Examples

Read from URL

Rdatasets provides 1,752 csv files including one containing names of passengers on the Titanic, their sex, age and passenger class, and whether or not they survived the sinking. In this file, missing data is indicated by the two characters NA.

To see the data in a worksheet, enter this formula1: =CSVRead("https://vincentarelbundock.github.io/Rdatasets/csv/carData/TitanicSurvival.csv",TRUE,,,,,,,,,,,,,"NA",NA())

example3

To load the data into an array in VBA:

Sub Demo()
    Dim TitanicData
    TitanicData = CSVRead(FileName:="https://vincentarelbundock.github.io" & _
                  "/Rdatasets/csv/carData/TitanicSurvival.csv", _
                  ConvertTypes:=True, MissingStrings:="NA")
End Sub

Write and Read "round-tripping"

In the GIF below, strings, numbers, dates and Booleans in the input Data range are written to file with the formula =CSVWrite(A2:D8,"C:\Temp\Test.csv") at cell F2, and the file is read back with the formula =CSVRead(F2,TRUE) at cell J2, so that the original types of the fields are recovered. The formula =CSVRead(F2,,FALSE) at H2 demonstrates how CSVRead can display the "raw" contents of a text file by passing Delimiter as FALSE. Refresh your browser (F5) to replay the GIF.

CSVReadWrite


Footnote 1: Assumes you're using Excel 365 or Excel 2021 with dynamic array formulas. If you're using an older version of Excel, this page explains how to enter a formula that returns an array.

Documentation

CSVRead

Returns the contents of a comma-separated file on disk as an array.

Public Function CSVRead(ByVal FileName As String, Optional ByVal ConvertTypes As Variant = False, _
          Optional ByVal Delimiter As Variant, Optional ByVal IgnoreRepeated As Boolean, _
          Optional ByVal DateFormat As String = "Y-M-D", Optional ByVal Comment As String, _
          Optional ByVal IgnoreEmptyLines As Boolean, Optional ByVal HeaderRowNum As Long, _
          Optional ByVal SkipToRow As Long, Optional ByVal SkipToCol As Variant = 1, _
          Optional ByVal NumRows As Long, Optional ByVal NumCols As Variant = 0, _
          Optional ByVal TrueStrings As Variant, Optional ByVal FalseStrings As Variant, _
          Optional ByVal MissingStrings As Variant, Optional ByVal ShowMissingsAs As Variant, _
          Optional ByVal Encoding As Variant, Optional ByVal DecimalSeparator As String, _
          Optional ByRef HeaderRow As Variant) As Variant
Argument Description
FileName The full name of the file, including the path, or else a URL of a file, or else a string in CSV format.
ConvertTypes Controls whether fields in the file are converted to typed values or remain as strings, and sets the treatment of "quoted fields" and space characters.

ConvertTypes should be a string of zero or more letters from allowed characters NDBETQK.

The most commonly useful letters are:
1) N number fields are returned as numbers (Doubles).
2) D date fields (that respect DateFormat) are returned as Dates.
3) B fields matching TrueStrings or FalseStrings are returned as Booleans.

ConvertTypes is optional and defaults to the null string for no type conversion. TRUE is equivalent to NDB and FALSE to the null string.

Four further options are available:
4) E fields that match Excel errors are converted to error values. There are fourteen of these, including #N/A, #NAME?, #VALUE! and #DIV/0!.
5) T leading and trailing spaces are trimmed from fields. In the case of quoted fields, this will not remove spaces between the quotes.
6) Q conversion happens for both quoted and unquoted fields; otherwise only unquoted fields are converted.
7) K quoted fields are returned with their quotes kept in place.

For most files, correct type conversion can be achieved with ConvertTypes as a string which applies for all columns, but type conversion can also be specified on a per-column basis.

Enter an array (or range) with two columns or two rows, column numbers on the left/top and type conversion (subset of NDBETQK) on the right/bottom. Instead of column numbers, you can enter strings matching the contents of the header row, and a column number of zero applies to all columns not otherwise referenced.

For convenience when calling from VBA, you can pass an array of two element arrays such as Array(Array(0,"N"),Array(3,""),Array("Phone","")) to convert all numbers in a file into numbers in the return except for those in column 3 and in the column(s) headed "Phone".
Delimiter By default, CSVRead will try to detect a file's delimiter as the first instance of comma, tab, semi-colon, colon or pipe found in the first 10,000 characters of the file, searching only outside of quoted regions and outside of date-with-time fields (since these contain colons). If it can't auto-detect the delimiter, it will assume comma. If your file includes a different character or string delimiter you should pass that as the Delimiter argument.

Alternatively, enter FALSE as the delimiter to treat the file as "not a delimited file". In this case the return will mimic how the file would appear in a text editor such as NotePad. The file will be split into lines at all line breaks (irrespective of double quotes) and each element of the return will be a line of the file.
IgnoreRepeated Whether delimiters which appear at the start of a line, the end of a line or immediately after another delimiter should be ignored while parsing; useful for fixed-width files with delimiter padding between fields.
DateFormat The format of dates in the file such as Y-M-D (the default), M-D-Y or Y/M/D. Also ISO for ISO8601 (e.g., 2021-08-26T09:11:30) or ISOZ (time zone given e.g. 2021-08-26T13:11:30+05:00), in which case dates-with-time are returned in UTC time.
Comment Rows that start with this string will be skipped while parsing.
IgnoreEmptyLines Whether empty rows/lines in the file should be skipped while parsing (if FALSE, each column will be assigned ShowMissingsAs for that empty row).
HeaderRowNum The row in the file containing headers. Type conversion is not applied to fields in the header row, though leading and trailing spaces are trimmed.

This argument is most useful when calling from VBA, with SkipToRow set to one more than HeaderRowNum. In that case the function returns the rows starting from SkipToRow, and the header row is returned via the by-reference argument HeaderRow. Optional and defaults to 0.
SkipToRow The first row in the file that's included in the return. Optional and defaults to one more than HeaderRowNum.
SkipToCol The column in the file at which reading starts, as a number or a string matching one of the file's headers. Optional and defaults to 1 to read from the first column.
NumRows The number of rows to read from the file. If omitted (or zero), all rows from SkipToRow to the end of the file are read.
NumCols If a number, sets the number of columns to read from the file. If a string matching one of the file's headers, sets the last column to be read. If omitted (or zero), all columns from SkipToCol are read.
TrueStrings Indicates how TRUE values are represented in the file. May be a string, an array of strings or a range containing strings; by default, TRUE, True and true are recognised.
FalseStrings Indicates how FALSE values are represented in the file. May be a string, an array of strings or a range containing strings; by default, FALSE, False and false are recognised.
MissingStrings Indicates how missing values are represented in the file. May be a string, an array of strings or a range containing strings. By default, only an empty field (consecutive delimiters) is considered missing.
ShowMissingsAs Fields which are missing in the file (consecutive delimiters) or match one of the MissingStrings are returned in the array as ShowMissingsAs. Defaults to Empty, but the null string or #N/A! error value can be good alternatives.

If NumRows is greater than the number of rows in the file then the return is "padded" with the value of ShowMissingsAs. Likewise, if NumCols is greater than the number of columns in the file.
Encoding Allowed entries are ASCII, ANSI, UTF-8, or UTF-16. For most files this argument can be omitted and CSVRead will detect the file's encoding. If auto-detection does not work, then it's possible that the file is encoded UTF-8 or UTF-16 but without a byte order mark to identify the encoding. Experiment with Encoding as each of UTF-8 and UTF-16.

ANSI is taken to mean Windows-1252 encoding.
DecimalSeparator In many places in the world, floating point number decimals are separated with a comma instead of a period (3,14 vs. 3.14). CSVRead can correctly parse these numbers by passing in the DecimalSeparator as a comma, in which case comma ceases to be a candidate if the parser needs to guess the Delimiter.
HeaderRow This by-reference argument is for use from VBA (as opposed to from Excel formulas). It is populated with the contents of the header row, with no type conversion, though leading and trailing spaces are removed.

source

CSVWrite

Creates a comma-separated file on disk containing Data. Any existing file of the same name is overwritten. If successful, the function returns FileName, otherwise an "error string" (starts with #, ends with !) describing what went wrong.

Public Function CSVWrite(ByVal Data As Variant, Optional ByVal FileName As String, _
          Optional ByVal QuoteAllStrings As Variant = True, Optional ByVal DateFormat As String = "YYYY-MM-DD", _
          Optional ByVal DateTimeFormat As String = "ISO", Optional ByVal Delimiter As String = ",", _
          Optional ByVal Encoding As String = "ANSI", Optional ByVal EOL As String = vbNullString, _
          Optional TrueString As String = "True", Optional FalseString As String = "False") As String
Argument Description
Data An array of data, or an Excel range. Elements may be strings, numbers, dates, Booleans, empty, Excel errors or null values. Data typically has two dimensions, but if Data has only one dimension then the output file has a single column, one field per row.
FileName The full name of the file, including the path. Alternatively, if FileName is omitted, then the function returns Data converted CSV-style to a string.
QuoteAllStrings If TRUE (the default) then elements of Data that are strings are quoted before being written to file, other elements (Numbers, Booleans, Errors) are not quoted. If FALSE then the only elements of Data that are quoted are strings containing Delimiter, line feed, carriage return or double quote. In both cases, double quotes are escaped by another double quote. If "Raw" then no strings are quoted. Use this option with care, the file written may not be in valid CSV format.
DateFormat A format string that determines how dates, including cells formatted as dates, appear in the file. If omitted, defaults to yyyy-mm-dd.
DateTimeFormat Format for datetimes. Defaults to ISO which abbreviates yyyy-mm-ddThh:mm:ss. Use ISOZ for ISO8601 format with time zone the same as the PC's clock. Use with care, daylight saving may be inconsistent across the datetimes in data.
Delimiter The delimiter string, if omitted defaults to a comma. Delimiter may have more than one character.
Encoding Allowed entries are ANSI (the default), UTF-8, UTF-16, UTF-8NOBOM and UTF-16NOBOM. An error will result if this argument is ANSI but Data contains characters with code point above 127.
EOL Sets the file's line endings. Enter Windows, Unix or Mac. Also supports the line-ending characters themselves (ascii 13 + ascii 10, ascii 10, ascii 13) or the strings CRLF, LF or CR. The default is Windows if FileName is provided, or Unix if not. The last line of the file is written with a line ending.
TrueString How the Boolean value True is to be represented in the file. Optional, defaulting to "True".
FalseString How the Boolean value False is to be represented in the file. Optional, defaulting to "False".

source

Errors

You can call CSVRead and CSVWrite both from spreadsheets (best with Excel 365 and dynamic array formulas) and from VBA. When an error occurs, the functions return a string starting with # and ending with ! which gives an explanation of what went wrong.

So, to get robust error handling from VBA, you should wrap calls to CSVRead and CSVWrite inside calls to a function ThrowIfError:

FileContents = ThrowIfError(CSVRead("c:\path\filename.csv"))

source

An alternative approach is to change the constant m_ErrorStyle (at the top of module modCSVRead) from , es_ReturnString to es_RaiseError, but in that case calls from spreadsheet formulas will return #VALUE! if any error happens, with no description provided.

Testing

CSVRead is tested prior to release against a large collection of test files with comparisons carried out between expected and observed results. You can look at the test code here, or run it yourself if you download and unzip the source code from the latest version, open the workbook VBA-CSV.xlsm from the workbooks folder, and click the "Run Tests" button on the "Tests" worksheet. The tests cover almost 100% of the code in modCSVReadWrite.bas.

Notes

Line endings

CSVRead automatically detects a file's line endings, whether they be Windows (CRLF), Unix (LF) or (pre-OSX) Mac (CR). Files with mixed line-endings are handled by taking any instance of either CRLF, CR or LF (outside quoted regions) to indicate a line ending.

Fractional seconds in dates and times

During type conversion, CSVRead accepts dates and times where the number of seconds includes a decimal fraction. For example the time10:13:20.500 is half a second later than the time 10:13:20. This contrasts with the VBA function CDate where executing CDate("10:13:20.500") results in an error.

Two-digit years

If you use CSVRead to parse files that represent dates with only two digits for the year then you might wish to change the values of two constants in modCSVReadWrite that determine how such dates are interpreted:

'The two constants below set the range of years to which date strings with two-digit years are converted.
Private Const m_2DigitYearIsFrom As Long = 1930
Private Const m_2DigitYearIsTo As Long = 2029

Array lower bounds

The return from CSVRead is an array with lower bounds of one. If you prefer array lower bounds to be zero, then edit the constant m_LBound (at the top of modCSVReadWrite.bas) to be 0 rather than 1.
source

Excel limits on string length

There is a limit on the total number of characters that an Excel cell can contain of 32,767. Therefore, when CSVWrite is called from a worksheet formula with the FileName argument omitted, the function will return an error string if the return would otherwise be longer than 32,767 characters. Similarly CSVRead, when called from a worksheet formula, will return an error string if any individual field to be returned is longer than 32,767.


Compatibility

VBA-CSV works on Windows, but not on Mac, iOS or Android. The host application must be either Excel, Word or PowerPoint. It has been tested on modern (Microsoft 365) versions of these applications, both 64-bit and 32-bit, and on Excel 2013, 32 bit. It should work on all other Office versions (2007 or later) but has not been tested on them.

About

Author: Philip Swannell
Licence: MIT
Github All Releases