Fast and convenient CSV reading and writing for VBA (Excel, Word, PowerPoint) and Excel spreadsheets, inspired by R's fread, and Julia's CSV.jl.
If you like this code please give it a star. I like stars.
Installation
Acknowledgements
Examples
Documentation
CSVRead
CSVWrite
Errors
Testing
Notes
Compatibility
About
-
Download
modCSVReadWrite.bas
from the latest release. -
Import
modCSVReadWrite.bas
into your project (Open VBA Editor,Alt + F11
; File > Import File). -
Add three references (In VBA Editor Tools > References)
Microsoft Scripting Runtime
Microsoft VBScript Regular Expressions 5.5
Microsoft ActiveX Data Objects 6.1 Library
-
If you plan to call the functions from worksheet formulas then you might like to tell Excel's Function Wizard about them by adding calls to
RegisterCSVRead
andRegisterCSVWrite
to the project'sWorkbook_Open
event, which lives in theThisWorkbook
class module.
Private Sub Workbook_Open()
RegisterCSVRead
RegisterCSVWrite
End Sub
- Alternatively (or in addition), install Excel-DNA Intellisense and copy the worksheet
_Intellisense_
fromVBA-CSV-Intellisense.xlsx
(in the latest release) into the same workbook into which you copiedmodCSVReadWrite
.
I re-worked the parsing code of CSVRead
after examining sdkn104's code available here; my approach is now similar to the one employed there, and the performance is similar too.
The documentation borrows freely from that of Julia's CSV.jl, though sadly VBA is not capable of Julia's extremely high performance. For testing CSVRead
, I also use many of the test files that the authors of CSV.jl have created here.
Rdatasets provides 1,752 csv files including one containing names of passengers on the Titanic, their sex, age and passenger class, and whether or not they survived the sinking. In this file, missing data is indicated by the two characters NA
.
To see the data in a worksheet, enter this formula1:
=CSVRead("https://vincentarelbundock.github.io/Rdatasets/csv/carData/TitanicSurvival.csv",TRUE,,,,,,,,,,,,,"NA",NA())
To load the data into an array in VBA:
Sub Demo()
Dim TitanicData
TitanicData = CSVRead(FileName:="https://vincentarelbundock.github.io" & _
"/Rdatasets/csv/carData/TitanicSurvival.csv", _
ConvertTypes:=True, MissingStrings:="NA")
End Sub
In the GIF below, strings, numbers, dates and Booleans in the input Data
range are written to file with the formula =CSVWrite(A2:D8,"C:\Temp\Test.csv")
at cell F2, and the file is read back with the formula =CSVRead(F2,TRUE)
at cell J2, so that the original types of the fields are recovered. The formula =CSVRead(F2,,FALSE)
at H2 demonstrates how CSVRead
can display the "raw" contents of a text file by passing Delimiter
as FALSE
. Refresh your browser (F5) to replay the GIF.
Footnote 1: Assumes you're using Excel 365 or Excel 2021 with dynamic array formulas. If you're using an older version of Excel, this page explains how to enter a formula that returns an array.
Returns the contents of a comma-separated file on disk as an array.
Public Function CSVRead(ByVal FileName As String, Optional ByVal ConvertTypes As Variant = False, _
Optional ByVal Delimiter As Variant, Optional ByVal IgnoreRepeated As Boolean, _
Optional ByVal DateFormat As String = "Y-M-D", Optional ByVal Comment As String, _
Optional ByVal IgnoreEmptyLines As Boolean, Optional ByVal HeaderRowNum As Long, _
Optional ByVal SkipToRow As Long, Optional ByVal SkipToCol As Variant = 1, _
Optional ByVal NumRows As Long, Optional ByVal NumCols As Variant = 0, _
Optional ByVal TrueStrings As Variant, Optional ByVal FalseStrings As Variant, _
Optional ByVal MissingStrings As Variant, Optional ByVal ShowMissingsAs As Variant, _
Optional ByVal Encoding As Variant, Optional ByVal DecimalSeparator As String, _
Optional ByRef HeaderRow As Variant) As Variant
Argument | Description |
---|---|
FileName |
The full name of the file, including the path, or else a URL of a file, or else a string in CSV format. |
ConvertTypes |
Controls whether fields in the file are converted to typed values or remain as strings, and sets the treatment of "quoted fields" and space characters.ConvertTypes should be a string of zero or more letters from allowed characters NDBETQK .The most commonly useful letters are: 1) N number fields are returned as numbers (Doubles).2) D date fields (that respect DateFormat ) are returned as Dates.3) B fields matching TrueStrings or FalseStrings are returned as Booleans.ConvertTypes is optional and defaults to the null string for no type conversion. TRUE is equivalent to NDB and FALSE to the null string.Four further options are available: 4) E fields that match Excel errors are converted to error values. There are fourteen of these, including #N/A , #NAME? , #VALUE! and #DIV/0! .5) T leading and trailing spaces are trimmed from fields. In the case of quoted fields, this will not remove spaces between the quotes.6) Q conversion happens for both quoted and unquoted fields; otherwise only unquoted fields are converted.7) K quoted fields are returned with their quotes kept in place.For most files, correct type conversion can be achieved with ConvertTypes as a string which applies for all columns, but type conversion can also be specified on a per-column basis.Enter an array (or range) with two columns or two rows, column numbers on the left/top and type conversion (subset of NDBETQK ) on the right/bottom. Instead of column numbers, you can enter strings matching the contents of the header row, and a column number of zero applies to all columns not otherwise referenced.For convenience when calling from VBA, you can pass an array of two element arrays such as Array(Array(0,"N"),Array(3,""),Array("Phone","")) to convert all numbers in a file into numbers in the return except for those in column 3 and in the column(s) headed "Phone". |
Delimiter |
By default, CSVRead will try to detect a file's delimiter as the first instance of comma, tab, semi-colon, colon or pipe found in the first 10,000 characters of the file, searching only outside of quoted regions and outside of date-with-time fields (since these contain colons). If it can't auto-detect the delimiter, it will assume comma. If your file includes a different character or string delimiter you should pass that as the Delimiter argument.Alternatively, enter FALSE as the delimiter to treat the file as "not a delimited file". In this case the return will mimic how the file would appear in a text editor such as NotePad. The file will be split into lines at all line breaks (irrespective of double quotes) and each element of the return will be a line of the file. |
IgnoreRepeated |
Whether delimiters which appear at the start of a line, the end of a line or immediately after another delimiter should be ignored while parsing; useful for fixed-width files with delimiter padding between fields. |
DateFormat |
The format of dates in the file such as Y-M-D (the default), M-D-Y or Y/M/D . Also ISO for ISO8601 (e.g., 2021-08-26T09:11:30) or ISOZ (time zone given e.g. 2021-08-26T13:11:30+05:00), in which case dates-with-time are returned in UTC time. |
Comment |
Rows that start with this string will be skipped while parsing. |
IgnoreEmptyLines |
Whether empty rows/lines in the file should be skipped while parsing (if FALSE , each column will be assigned ShowMissingsAs for that empty row). |
HeaderRowNum |
The row in the file containing headers. Type conversion is not applied to fields in the header row, though leading and trailing spaces are trimmed. This argument is most useful when calling from VBA, with SkipToRow set to one more than HeaderRowNum . In that case the function returns the rows starting from SkipToRow , and the header row is returned via the by-reference argument HeaderRow . Optional and defaults to 0. |
SkipToRow |
The first row in the file that's included in the return. Optional and defaults to one more than HeaderRowNum . |
SkipToCol |
The column in the file at which reading starts, as a number or a string matching one of the file's headers. Optional and defaults to 1 to read from the first column. |
NumRows |
The number of rows to read from the file. If omitted (or zero), all rows from SkipToRow to the end of the file are read. |
NumCols |
If a number, sets the number of columns to read from the file. If a string matching one of the file's headers, sets the last column to be read. If omitted (or zero), all columns from SkipToCol are read. |
TrueStrings |
Indicates how TRUE values are represented in the file. May be a string, an array of strings or a range containing strings; by default, TRUE , True and true are recognised. |
FalseStrings |
Indicates how FALSE values are represented in the file. May be a string, an array of strings or a range containing strings; by default, FALSE , False and false are recognised. |
MissingStrings |
Indicates how missing values are represented in the file. May be a string, an array of strings or a range containing strings. By default, only an empty field (consecutive delimiters) is considered missing. |
ShowMissingsAs |
Fields which are missing in the file (consecutive delimiters) or match one of the MissingStrings are returned in the array as ShowMissingsAs . Defaults to Empty, but the null string or #N/A! error value can be good alternatives.If NumRows is greater than the number of rows in the file then the return is "padded" with the value of ShowMissingsAs . Likewise, if NumCols is greater than the number of columns in the file. |
Encoding |
Allowed entries are ASCII , ANSI , UTF-8 , or UTF-16 . For most files this argument can be omitted and CSVRead will detect the file's encoding. If auto-detection does not work, then it's possible that the file is encoded UTF-8 or UTF-16 but without a byte order mark to identify the encoding. Experiment with Encoding as each of UTF-8 and UTF-16 .ANSI is taken to mean Windows-1252 encoding. |
DecimalSeparator |
In many places in the world, floating point number decimals are separated with a comma instead of a period (3,14 vs. 3.14). CSVRead can correctly parse these numbers by passing in the DecimalSeparator as a comma, in which case comma ceases to be a candidate if the parser needs to guess the Delimiter . |
HeaderRow |
This by-reference argument is for use from VBA (as opposed to from Excel formulas). It is populated with the contents of the header row, with no type conversion, though leading and trailing spaces are removed. |
Creates a comma-separated file on disk containing Data
. Any existing file of the same name is overwritten. If successful, the function returns FileName
, otherwise an "error string" (starts with #
, ends with !
) describing what went wrong.
Public Function CSVWrite(ByVal Data As Variant, Optional ByVal FileName As String, _
Optional ByVal QuoteAllStrings As Variant = True, Optional ByVal DateFormat As String = "YYYY-MM-DD", _
Optional ByVal DateTimeFormat As String = "ISO", Optional ByVal Delimiter As String = ",", _
Optional ByVal Encoding As String = "ANSI", Optional ByVal EOL As String = vbNullString, _
Optional TrueString As String = "True", Optional FalseString As String = "False") As String
Argument | Description |
---|---|
Data |
An array of data, or an Excel range. Elements may be strings, numbers, dates, Booleans, empty, Excel errors or null values. Data typically has two dimensions, but if Data has only one dimension then the output file has a single column, one field per row. |
FileName |
The full name of the file, including the path. Alternatively, if FileName is omitted, then the function returns Data converted CSV-style to a string. |
QuoteAllStrings |
If TRUE (the default) then elements of Data that are strings are quoted before being written to file, other elements (Numbers, Booleans, Errors) are not quoted. If FALSE then the only elements of Data that are quoted are strings containing Delimiter , line feed, carriage return or double quote. In both cases, double quotes are escaped by another double quote. If "Raw" then no strings are quoted. Use this option with care, the file written may not be in valid CSV format. |
DateFormat |
A format string that determines how dates, including cells formatted as dates, appear in the file. If omitted, defaults to yyyy-mm-dd . |
DateTimeFormat |
Format for datetimes. Defaults to ISO which abbreviates yyyy-mm-ddThh:mm:ss . Use ISOZ for ISO8601 format with time zone the same as the PC's clock. Use with care, daylight saving may be inconsistent across the datetimes in data. |
Delimiter |
The delimiter string, if omitted defaults to a comma. Delimiter may have more than one character. |
Encoding |
Allowed entries are ANSI (the default), UTF-8 , UTF-16 , UTF-8NOBOM and UTF-16NOBOM . An error will result if this argument is ANSI but Data contains characters with code point above 127. |
EOL |
Sets the file's line endings. Enter Windows , Unix or Mac . Also supports the line-ending characters themselves (ascii 13 + ascii 10, ascii 10, ascii 13) or the strings CRLF , LF or CR . The default is Windows if FileName is provided, or Unix if not. The last line of the file is written with a line ending. |
TrueString |
How the Boolean value True is to be represented in the file. Optional, defaulting to "True". |
FalseString |
How the Boolean value False is to be represented in the file. Optional, defaulting to "False". |
You can call CSVRead
and CSVWrite
both from spreadsheets (best with Excel 365 and dynamic array formulas) and from VBA. When an error occurs, the functions return a string starting with #
and ending with !
which gives an explanation of what went wrong.
So, to get robust error handling from VBA, you should wrap calls to CSVRead
and CSVWrite
inside calls to a function ThrowIfError
:
FileContents = ThrowIfError(CSVRead("c:\path\filename.csv"))
An alternative approach is to change the constant m_ErrorStyle
(at the top of module modCSVRead
) from , es_ReturnString
to es_RaiseError
, but in that case calls from spreadsheet formulas will return #VALUE!
if any error happens, with no description provided.
CSVRead
is tested prior to release against a large collection of test files with comparisons carried out between expected and observed results. You can look at the test code here, or run it yourself if you download and unzip the source code from the latest version, open the workbook VBA-CSV.xlsm from the workbooks folder, and click the "Run Tests" button on the "Tests" worksheet. The tests cover almost 100% of the code in modCSVReadWrite.bas.
CSVRead
automatically detects a file's line endings, whether they be Windows (CRLF
), Unix (LF
) or (pre-OSX) Mac (CR
). Files with mixed line-endings are handled by taking any instance of either CRLF
, CR
or LF
(outside quoted regions) to indicate a line ending.
During type conversion, CSVRead
accepts dates and times where the number of seconds includes a decimal fraction. For example the time10:13:20.500
is half a second later than the time 10:13:20
. This contrasts with the VBA function CDate
where executing
CDate("10:13:20.500")
results in an error.
If you use CSVRead
to parse files that represent dates with only two digits for the year then you might wish to change the values of two constants in
modCSVReadWrite
that determine how such dates are interpreted:
'The two constants below set the range of years to which date strings with two-digit years are converted.
Private Const m_2DigitYearIsFrom As Long = 1930
Private Const m_2DigitYearIsTo As Long = 2029
The return from CSVRead
is an array with lower bounds of one. If you prefer array lower bounds to be zero, then edit the constant m_LBound
(at the top of modCSVReadWrite.bas
) to be 0 rather than 1.
source
There is a limit on the total number of characters that an Excel cell can contain of 32,767. Therefore, when CSVWrite
is called from a worksheet formula with the FileName
argument omitted, the function will return an error string if the return would otherwise be longer than 32,767 characters. Similarly CSVRead
, when called from a worksheet formula, will return an error string if any individual field to be returned is longer than 32,767.
VBA-CSV works on Windows, but not on Mac, iOS or Android. The host application must be either Excel, Word or PowerPoint. It has been tested on modern (Microsoft 365) versions of these applications, both 64-bit and 32-bit, and on Excel 2013, 32 bit. It should work on all other Office versions (2007 or later) but has not been tested on them.