This is the dataset used in the paper *Enhanced indexation using both equity assets and index options*.
The dataset covers the period from 17/03/2017 until 01/08/2025. There are two sections in this documentation:
- Construction of an option strategy: This section provides a detailed example of how we calculate the returns of an illustrative option strategy.
- Data: This section explains all data files provided, including prices of equities, ETFs and indices, as well as option prices calculated with Black-Scholes.
Before describing the dataset, it is important to explain how an option strategy is constructed. The idea is that, from options that either exist or existed in the past, we build a long-term time series of returns. This time series can be interpreted as a synthetic asset which mirrors returns obtained by investing continuously in options following a given policy.
To explain how we construct an option strategy, we use an illustration based on data actually used in the paper. The illustration is given in the spreadsheet Option strategy.xlsx, included in this repository. The spreadsheet was built with Google Sheets, hence it might not be entirely compatible with other spreadsheet software. It can also be accessed via this link:
https://docs.google.com/spreadsheets/d/1ejGgL8SAfKnR7vxi8WDHrjzuEAUksXU8sEHWWSJXuuE/edit?usp=sharing
We are going to build the following option strategy (OS): buy or keep a position in 3% Out-of-the-Money (OTM) puts if the return of the S&P500 over the past 30 business days is less than -5%; otherwise, invest at the risk-free rate.
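The entry condition of this OS can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the function name `os_condition` is hypothetical, prices are assumed to be daily closes ordered oldest first, and "the past 30 business days" is assumed to mean comparing today's close with the close 30 business days earlier.

```python
def os_condition(prices, lookback=30, threshold=-0.05):
    """Return True if the S&P500 return over the last `lookback` business days is below `threshold`.

    `prices` holds daily closes ordered oldest first and ends with the current day.
    Assumption: the trailing return compares today's close with the close
    `lookback` business days earlier, i.e. prices[-1 - lookback].
    """
    if len(prices) <= lookback:
        raise ValueError("need at least lookback + 1 prices")
    trailing_return = prices[-1] / prices[-1 - lookback] - 1.0
    return trailing_return < threshold


# Made-up prices: a 6% fall over 30 business days triggers the put position.
print(os_condition([100.0] * 30 + [94.0]))  # True
print(os_condition([100.0] * 30 + [96.0]))  # False
```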
In the spreadsheet, we construct the returns of the option strategy above from 03/10/2022 until 25/10/2022. A step-by-step explanation of the spreadsheet is given below:
- Columns A and B contain the S&P500 prices for the 30 business days before the range above. This information is used to evaluate the condition that defines the OS, as described above.
- Columns C-F contain the main data from which option returns are built: the S&P500 price (column D), the risk-free rate (column E) and the implied volatility (using VIX as a proxy).
- Columns G and H indicate whether the condition established for the OS is true or false.
- Columns J-R build a continuous time series for a PUT OTM with a $3\%$ moneyness target. This is not yet the OS; rather, it is a component that will later be used to build the OS. The continuous time series (continuous meaning without interruptions) assumes an automatic rollover to a different option whenever one of these two conditions is met:
  - The current date is less than 20 days before the current expiration.
  - The current exercise $E$ has deviated by $3\%$ or more from the target exercise $F'$, obtained from the option-implied forward price $F$ of the S&P500 and the moneyness target ($F' = 0.97F$ in this example), meaning $\left|\frac{F' - E}{F'}\right| \geq 3\%$.
- In Row 3, the current date is 2022-10-03. Following the rules described in the paper for the most liquid S&P500 options, the next available option expires on the third Friday of the month, 2022-10-21. However, this is only 18 days to expiration, so we assume that the valid option to buy/hold is the one expiring on the last business day of the month, 2022-10-31. In other words, if we were to buy an option on 2022-10-03, we would buy the one expiring on 2022-10-31.
- In Row 10 (2022-10-12), we change the expiration date because 2022-10-31 is less than 20 days away; the next expiration is 2022-11-18, the third Friday of November.
- With this decision we can calculate the time to expiration in years, $L = 0.0767$ (column J), which corresponds to the 28 calendar days from 2022-10-03 to 2022-10-31 divided by 365.
- Let $S$ be the current S&P500 price (column D) and $r$ the annualised risk-free rate (column E). Given $L$, we calculate the forward price $F$ in spreadsheet column L with the formula $F = S \times e^{rL}$.
- In Row 3, choosing the exercise is straightforward given the rollover rule above: we pick the multiple of 5 closest to $F' = F \times 0.97$ (a $3\%$ target moneyness, column M), here $3575$.
- In Row 4, the exercise $3575$ has deviated from $F'$ by more than $3\%$ (the deviation is shown in column N). We therefore change the option from exercise 3575 to 3685 to bring it back to the target moneyness.
- This change is not needed in Row 5, since $F'$ did not deviate enough from 3685. Overall, the exercise changes more often than the expiration date. In practice, each change means liquidating the position in the currently held option and buying the new one. This incurs transaction costs and liquidity penalties, which we ignore in this paper for simplicity.
- On 2022-10-03, we bought the option with the expiration/exercise given in Row 3 at the end of the day. According to Black-Scholes, we paid \$72.158 per unit of that option (column Q).
- On 2022-10-04 (Row 4), we held the option from Row 3 until the end of the day and sold it for \$35.813 (column P), a return of approximately -50%.
- On the same day, we used the proceeds from selling the Row 3 option to buy the Row 4 option for \$68.018 per unit.
- We did not need to change the option held in Row 5, hence the prices in columns P and Q are identical in that row.
- With the logic above, we construct a long-term series of returns for the PUT OTM 3% component in column R.
- Columns T and U indicate whether we should buy the option or invest in the risk-free rate, given the condition in column H.
- On 2022-10-18 (Row 14) the condition is false, hence we should hold the risk-free rate.
- On that day, we still held the option bought at the end of the previous day (Row 13) and sold it at the end of Row 14. Hence the return for that day is the option return, -23%.
- At the end of the day, we "bought" the risk-free rate.
- The return on 2022-10-19 (Row 15) is the risk-free return.
- On 2022-10-19, the condition is true. At the end of the day (Row 15) we buy the option again (with expiration 2022-11-18 and exercise 3620).
- The return on Row 16 is the option return, 8.96%.
- With this logic, we construct a long-term series of returns for the OS, which alternates between risk-free investment and the PUT OTM 3%.
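The switching rule illustrated in Rows 14 to 16 can be summarised in a short Python sketch. It assumes, as in the walkthrough above, that the position for a day is decided at the end of the previous day, so the return earned on day $t$ is the component return if the condition held on day $t-1$ and the risk-free return otherwise. The function name and list-based inputs are hypothetical; in the example, -23% and 8.96% come from the spreadsheet walkthrough, while the other values are made up.

```python
def option_strategy_returns(component_returns, conditions, rf_returns):
    """Daily OS returns that switch between an option component and the risk-free asset.

    The position for day t is chosen at the end of day t-1, so day t earns the
    component return if conditions[t-1] is True and the risk-free return otherwise.
    The first day has no prior decision and is returned as None.
    """
    if not (len(component_returns) == len(conditions) == len(rf_returns)):
        raise ValueError("inputs must have the same length")
    os_returns = [None]
    for t in range(1, len(conditions)):
        held_option = conditions[t - 1]
        os_returns.append(component_returns[t] if held_option else rf_returns[t])
    return os_returns


# Rows 13 to 16: the condition is true, false, true (Row 16's value does not matter here).
conditions = [True, False, True, True]
component = [0.01, -0.23, 0.05, 0.0896]  # 0.01 and 0.05 are made-up values
risk_free = [0.0001] * 4  # made-up daily risk-free return
print(option_strategy_returns(component, conditions, risk_free))
# [None, -0.23, 0.0001, 0.0896]
```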
All the price data used in the paper is located in the folder dataFiles. The following files are available.
This file contains market data for three benchmarks: the S&P500, the VIX index and the IRX. All three are used as input in the Black-Scholes formula for calculating the prices of options.
The IRX data is provided as annualised yields in percentage form (i.e. the annual rate multiplied by 100). In Black-Scholes, these values are interpreted as yearly interest rates in percent, so a value of 3.5 corresponds to a rate of 3.5% (0.035).
This file contains market data for all equities and ETFs. All stocks that were part of the S&P500 at any point during the covered period, for all or part of it, are included in this file.
To account for survivorship bias, we use negative prices to indicate the days when an asset had a price history but was not part of the S&P500 index. Empty cells mean that the price is missing, e.g. because the asset did not exist yet or no price was found for it at that time.
By "accounting for survivorship bias" we mean selecting for the optimisation only the assets that were part of the S&P500 on the "day" the optimiser is run. The idea is to prevent unrealistically positive out-of-sample results from choosing assets before they joined the S&P500: since "in the future" they will meet the requirements for joining the index, they will most probably grow in value from "now" to "then".
For instance, DDOG (Datadog) was listed in the US market on September 19, 2019, and joined the S&P500 on July 9, 2025. In the file equitiesAndETFs.csv on GitHub, its time series is in column VN. The series is empty until September 19, 2019, when the company was listed, and its prices are negative until July 2025, when the company joined the index.
If we choose a portfolio in April 2025, DDOG is not in the asset universe. If we choose a portfolio in August 2025, DDOG is in the asset universe, and we take the absolute value of its recent price history as input to the optimisation.
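The sign convention above can be applied to a single date with a small Python sketch. This is a hedged illustration rather than the paper's code: the function names are hypothetical, prices are passed as plain dictionaries and lists instead of being read from equitiesAndETFs.csv, an empty cell is represented as None, and the tickers other than DDOG and all prices are made up. Dropping empty cells in the history is a simplification; a real pipeline would keep dates aligned.

```python
def eligible_assets(prices_on_date):
    """Tickers in the S&P500 on a given date, following the file's sign convention.

    `prices_on_date` maps ticker -> price for one row of the file, with None for an
    empty cell. Positive: in the index. Negative: trading but outside the index.
    """
    return sorted(t for t, p in prices_on_date.items() if p is not None and p > 0)


def usable_history(series):
    """Recent price history of an eligible asset: absolute values, empty cells dropped."""
    return [abs(p) for p in series if p is not None]


# Hypothetical row: "AAA" is in the index, "DDOG" is listed but not yet in it,
# and "ZZZ" has no price on that day (all values made up).
row = {"AAA": 52.3, "DDOG": -120.5, "ZZZ": None}
print(eligible_assets(row))  # ['AAA']
print(usable_history([None, -100.0, -101.5, 102.0]))  # [100.0, 101.5, 102.0]
```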
For the 12 option strategies used in the paper, the following files are necessary:
In order to generate the time series of each option, we assume the following rules:
- Options exist at strike prices that are multiples of 5,
- Options expire either on the third Friday of the month or on the last business day of the month.
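The two monthly expirations can be generated with Python's standard library. This sketch treats every weekday as a business day, i.e. it ignores exchange holidays (an assumption; the dataset may handle holidays differently), and the function names are hypothetical.

```python
import calendar
import datetime as dt

FRIDAY = 4  # date.weekday(): Monday is 0, Friday is 4


def third_friday(year, month):
    """Third Friday of the month: first Friday plus two weeks."""
    first = dt.date(year, month, 1)
    days_to_first_friday = (FRIDAY - first.weekday()) % 7
    return first + dt.timedelta(days=days_to_first_friday + 14)


def last_business_day(year, month):
    """Last weekday of the month; exchange holidays are ignored in this sketch."""
    day = dt.date(year, month, calendar.monthrange(year, month)[1])
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= dt.timedelta(days=1)
    return day


def monthly_expirations(year, month):
    """The two expirations assumed to exist in a month, in chronological order."""
    return [third_friday(year, month), last_business_day(year, month)]


print(monthly_expirations(2022, 10))
# [datetime.date(2022, 10, 21), datetime.date(2022, 10, 31)]
```

For October 2022 this yields 2022-10-21 and 2022-10-31, the two expirations discussed in the spreadsheet walkthrough.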
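The option moneyness is measured against the forward price, assuming no dividends, a constant risk-free rate and a frictionless market. The forward price is given by:

$$F = S \times e^{rL},$$

where $S$ is the S&P500 price, $r$ is the annualised risk-free rate and $L$ is the time to expiration in years. An exercise price $E$ is chosen as the multiple of 5 closest to the target implied by the moneyness target (e.g. $F' = 0.97F$ for a 3% OTM put).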
As an option strategy needs a continuous time series, we simulate the rollover from one option to the next based on the following rules:
- Rollover to the next expiration when the currently held option reaches 20 days to expiration.
- Rollover to a new exercise price when the exercise price $E$ of the currently held option deviates by $3\%$ or more from the moneyness target.
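The forward price, the choice of exercise and the two rollover rules can be sketched in Python as follows. Several details are assumptions of this sketch: time to expiration is counted in calendar days divided by 365 (consistent with $L = 0.0767$ for the 28 days from 2022-10-03 to 2022-10-31 in the spreadsheet example), IRX is converted from percent to a decimal, the $3\%$ deviation is measured relative to the target exercise $F'$, and the sign convention for moneyness (negative for an OTM put) is the sketch's own. All function names are hypothetical, and the spot and IRX values in the example are illustrative, not taken from the dataset.

```python
import datetime as dt
import math


def time_to_expiry(today, expiry):
    """Time to expiration L in years, counted as calendar days / 365."""
    return (expiry - today).days / 365.0


def forward_price(spot, irx_percent, years):
    """Forward price F = S * exp(r * L), with r = IRX / 100 (no dividends)."""
    return spot * math.exp(irx_percent / 100.0 * years)


def target_exercise(forward, moneyness):
    """Multiple of 5 closest to F' = F * (1 + moneyness).

    moneyness = -0.03 gives the 3% OTM put target F' = 0.97 F; 0.0 gives ATM.
    Exact ties follow Python's round-half-to-even.
    """
    return 5 * round(forward * (1.0 + moneyness) / 5)


def needs_rollover(today, expiry, exercise, target, min_days=20, tolerance=0.03):
    """True if the held option must be replaced under either rollover rule."""
    too_close_to_expiry = (expiry - today).days < min_days
    # Assumption: the deviation is measured relative to the target exercise F'.
    drifted = abs(target - exercise) / target >= tolerance
    return too_close_to_expiry or drifted


# Illustrative inputs: spot 4000 and IRX 3.0 are made up.
L = time_to_expiry(dt.date(2022, 10, 3), dt.date(2022, 10, 31))
F = forward_price(4000.0, 3.0, L)
E = target_exercise(F, -0.03)
print(round(L, 4), E)  # 0.0767 3890
```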
An option strategy may combine investment in one or more individual options, the underlying asset and an asset mimicking the risk-free rate (in our paper we did not employ option strategies that invest in the underlying). Our 12 option strategies use one or two of four individual components, each with its own time series of returns: the ATM put, the ATM call, and the OTM put and call with a moneyness target of 3%.
For each file, we include everything necessary to calculate the prices with Black-Scholes, as well as the formula results:
- the price of the underlying (SP500),
- the annualised risk-free rate, equivalent to symbol IRX,
- an approximation for the implied volatility (given by the VIX index, regardless of moneyness),
- the expiration of the option to be held at the end of the current day,
- the forward price,
- the exercise price of the option to be held at the end of the current day,
- the price, according to Black-Scholes, of the option to be held at the end of the current day,
- the price today, according to Black-Scholes, of the option held at the end of the previous day (identical to the previous item when neither the exercise nor the expiration has changed),
- the return of the component, calculated as (PreviousPrice - price of the day before) / (price of the day before), where PreviousPrice is the price today of the option held at the end of the previous day and the price of the day before is that option's price at the end of the previous day.
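A minimal Python sketch of the standard Black-Scholes formula for a European option without dividends, together with the component return defined above. It assumes that both IRX and VIX are converted from percent by dividing by 100 and that the time to expiration is in years; the function names are hypothetical. The example uses the prices \$72.158 and \$35.813 from the spreadsheet walkthrough.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, exercise, irx_percent, vix, years, kind):
    """Black-Scholes price of a European call or put on a non-dividend-paying index.

    The risk-free rate and volatility come from IRX and VIX, both quoted in percent.
    """
    r = irx_percent / 100.0
    sigma = vix / 100.0
    vol_sqrt_t = sigma * math.sqrt(years)
    d1 = (math.log(spot / exercise) + (r + 0.5 * sigma ** 2) * years) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    discounted_exercise = exercise * math.exp(-r * years)
    if kind == "call":
        return spot * norm_cdf(d1) - discounted_exercise * norm_cdf(d2)
    if kind == "put":
        return discounted_exercise * norm_cdf(-d2) - spot * norm_cdf(-d1)
    raise ValueError("kind must be 'call' or 'put'")


def component_return(previous_price, price_day_before):
    """(PreviousPrice - price of the day before) / (price of the day before)."""
    return (previous_price - price_day_before) / price_day_before


# Row 4 of the spreadsheet example: bought at 72.158, worth 35.813 one day later.
print(round(component_return(35.813, 72.158), 4))  # -0.5037
```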
Each component file provides a continuous time series for a single option position, based on a moneyness target and assuming rollover whenever the necessary conditions are met. From these components, we define the 12 option strategies used in the paper.
Data for each one of the 12 option strategies (OS). For each OS, we include:
- The OS returns, calculated as the sum of the returns of each individual component weighted by the proportion invested in it,
- w_SP500: the weight of the OS invested in the underlying at each day (zero every day in all 12 OS),
- w_c1: the weight of the OS invested in the first component, the PUT ATM,
- w_c2: the weight of the OS invested in the second component, the CALL ATM,
- w_c3: the weight of the OS invested in the third component, the PUT OTM,
- w_c4: the weight of the OS invested in the fourth component, the CALL OTM,
- w_RFR: the weight of the OS invested in an asset mimicking the IRX (daily) returns.
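The weighted sum can be sketched for a single day in Python. The weight names follow the columns listed above; the return keys and all numbers are hypothetical, and the 50/50 split is made up rather than one of the 12 OS. The sketch assumes the weights stored on a row are the ones that earn that row's returns; if the files instead record the position taken at the end of each day, the weights would need to be lagged by one row.

```python
COMPONENTS = ("SP500", "c1", "c2", "c3", "c4", "RFR")


def os_return(weights, returns):
    """One day's OS return: sum over components of weight * component return.

    `weights` uses the column names of the OS files (w_SP500, w_c1, ..., w_RFR);
    `returns` maps the hypothetical keys SP500, c1, ..., RFR to that day's returns.
    Assumption: the weights on a row apply to the component returns of the same row.
    """
    return sum(weights["w_" + name] * returns[name] for name in COMPONENTS)


# Made-up day: half in the PUT OTM component, half in the risk-free asset.
w = {"w_SP500": 0.0, "w_c1": 0.0, "w_c2": 0.0, "w_c3": 0.5, "w_c4": 0.0, "w_RFR": 0.5}
r = {"SP500": 0.01, "c1": 0.02, "c2": -0.01, "c3": 0.0896, "c4": 0.03, "RFR": 0.0002}
print(round(os_return(w, r), 6))  # 0.0449
```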