Trading-Crab

Predict market conditions, best portfolios, and stock picks

Concepts / Main Approach Outline:

Scrape public datasets and use free APIs to obtain macro financial data over a 50-year period, ensuring each metric is still published today so that I could score a model on current data.
Assumption: one of the most predictive features in any financial model will be the market conditions. Are we in a recession? A market boom? A bubble? A slowly forming top? High or low inflation? Stagflation? Therefore we want to CLASSIFY our time series datasets at quarterly resolution (i.e., apply unsupervised learning). The idea is to get roughly equally sized clusters that have distinct behaviors.
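A minimal sketch of the quarterly clustering step, assuming the features are already one row per quarter. k-means is used here because the outline names it later; the function name `cluster_regimes`, the choice of 5 regimes, and the feature columns are all hypothetical. Note that k-means does not force equal-sized clusters, so the balance has to be checked (and `n_regimes` tuned) by hand.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_regimes(features: pd.DataFrame, n_regimes: int = 5, seed: int = 0) -> pd.Series:
    """Label each quarter (row of `features`) with a k-means cluster id."""
    # Standardize so no single series dominates the Euclidean distance.
    scaled = StandardScaler().fit_transform(features.to_numpy())
    model = KMeans(n_clusters=n_regimes, n_init=10, random_state=seed)
    labels = model.fit_predict(scaled)
    return pd.Series(labels, index=features.index, name="regime")


# Hypothetical demo data: 200 quarters of three made-up derived features.
rng = np.random.default_rng(42)
quarters = pd.period_range("1975Q1", periods=200, freq="Q")
demo = pd.DataFrame(
    rng.normal(size=(200, 3)),
    index=quarters,
    columns=["gdp_growth", "inflation_chg", "spx_in_gold_chg"],
)
regimes = cluster_regimes(demo, n_regimes=5)

# k-means does not enforce equal cluster sizes, so check the balance explicitly.
print(regimes.value_counts().sort_index())
```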
Once the time series has been classified using these clustering techniques, we want to PREDICT today's classification using data available to us TODAY. This means constructing a SUPERVISED learning model that, given only features known at that time (nothing forward-looking or revised), tells us which market regime we are in.
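One way to keep the evaluation honest is walk-forward validation, where every test fold lies strictly after its training data. The sketch below uses scikit-learn's `TimeSeriesSplit` with a random forest; the helper name `walk_forward_scores`, the feature names, and the synthetic regime labels are hypothetical stand-ins for real point-in-time features and the cluster ids from the unsupervised step.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_scores(X: pd.DataFrame, y: pd.Series, n_splits: int = 5) -> list:
    """Accuracy per fold; each test fold comes strictly after its training window."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X.iloc[train_idx], y.iloc[train_idx])
        scores.append(clf.score(X.iloc[test_idx], y.iloc[test_idx]))
    return scores


# Hypothetical point-in-time features (rows in chronological order) and regime ids.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(160, 4)),
    columns=["yield_curve", "unemp_chg", "credit_spread", "oil_in_gold_chg"],
)
y = pd.Series((X["yield_curve"] < 0).astype(int) + (X["credit_spread"] > 1).astype(int))
scores = walk_forward_scores(X, y)
print([round(s, 2) for s in scores])
```

Shuffled k-fold would let the model train on quarters that come after the ones it is tested on, which overstates accuracy for a time series.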
Even more powerfully, we can construct supervised learning models to predict whether certain regimes will occur in the next quarter, next year, next 2 years, etc. For example, if we are in a boom period, what are the chances that we'll experience a recession in the next 2 years?
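The forward-looking targets can be built directly from the regime labels. A minimal sketch, assuming quarterly regime ids in chronological order; the function name and the idea that cluster 2 means "recession" are hypothetical. With `horizon=8` this would answer the "recession within the next 2 years" question.

```python
import numpy as np
import pandas as pd


def regime_within_horizon(regimes: pd.Series, target: int, horizon: int) -> pd.Series:
    """1.0 if the `target` regime occurs in any of the next `horizon` quarters, else 0.0.

    The current quarter is excluded. The final `horizon` quarters lack a full
    look-ahead window, so they are left as NaN instead of being guessed.
    """
    hit = (regimes == target).astype(float)
    # shift(-k) brings the quarter k steps ahead onto the current row.
    ahead = pd.concat([hit.shift(-k) for k in range(1, horizon + 1)], axis=1)
    label = ahead.max(axis=1)
    label.iloc[-horizon:] = np.nan
    return label


# Hypothetical regime ids; suppose cluster 2 turned out to look like "recession".
regimes = pd.Series([0, 0, 1, 2, 0, 0, 0, 2, 1, 0])
label = regime_within_horizon(regimes, target=2, horizon=2)
print(label.tolist())
```

The unlabeled tail rows should be dropped before training, not filled with 0, since "no recession observed yet" is not the same as "no recession".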
Once we have good predictions for current market conditions and rough models for future conditions, we can try to predict the value of various asset classes (or ETFs), each measured either relative to cash (USD) or relative to one another (e.g., the S&P500 priced in gold, or TLT bonds priced in USO oil). This tells us which assets do best in each PREDICTED market regime: we should be able to rank the assets by which out-perform or under-perform the others, including cash. We can use these relative performance models to construct rough portfolio mixes.
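A simple historical version of the per-regime ranking, assuming quarterly USD closes per asset and one regime id per quarter. The function name, asset columns, and prices below are made up. Setting `numeraire` to an asset column re-prices everything in that asset (e.g., the S&P priced in gold), and holding the numeraire itself earns 0 by construction.

```python
import pandas as pd


def rank_assets_by_regime(prices: pd.DataFrame, regimes: pd.Series, numeraire: str = "USD") -> pd.DataFrame:
    """Rank assets (1 = best) by mean next-quarter return within each regime.

    prices: quarterly USD closes, one column per asset; regimes: cluster id per quarter.
    If `numeraire` names a column (e.g. "GOLD"), every asset is re-priced in it.
    """
    if numeraire == "USD":
        relative = prices
    else:
        relative = prices.div(prices[numeraire], axis=0).drop(columns=numeraire)
    fwd_ret = relative.shift(-1) / relative - 1  # return over the following quarter
    fwd_ret[numeraire] = 0.0  # holding the numeraire itself (cash if USD) earns 0
    mean_by_regime = fwd_ret.groupby(regimes).mean()
    return mean_by_regime.rank(axis=1, ascending=False)


# Made-up quarterly closes and regime ids.
idx = pd.period_range("2000Q1", periods=5, freq="Q")
prices = pd.DataFrame(
    {"SPX": [100, 110, 121, 108.9, 98.01], "GOLD": [10, 10, 10, 11, 12.1]},
    index=idx,
)
regimes = pd.Series([0, 0, 1, 1, 1], index=idx)
print(rank_assets_by_regime(prices, regimes))
print(rank_assets_by_regime(prices, regimes, numeraire="GOLD"))
```

This ranks by realized history per regime; the predictive models described above would replace the historical means with forecasts.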
Putting it all together (Part I): modeling individual asset performance
- Using predicted current market conditions, predicted future market conditions, and all historical and derived data (e.g., the smoothed first derivative of oil prices measured in gold), predict the likelihood that a given ETF will be +X% at Y quarters in the future.
- For example, we might be interested in the likelihood that the S&P will at some point in the next 2 years crash 20%, or separately, be 20% higher.
- Note that these models are somewhat independent, particularly in volatile markets, so their probabilities need not sum to 100%. You could simultaneously predict that the S&P500 will crash with 80% probability AND rebound to +20% with 80% probability (and you won't know the order: it might have a blow-off top and THEN crash).
- Use these models to build a "stoplight" dashboard showing, for every asset, the probabilities of the asset going up or down as measured in dollars (or relative to another asset).
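The "+20% / -20% at some point in the next N quarters" targets can be labeled as two independent binary flags, which is exactly why their probabilities need not sum to 100%. A minimal sketch, assuming quarterly closes; it only sees quarter-end prices, so intra-quarter extremes are missed (daily data would be needed for a true "at some point"). The function names and the stoplight thresholds are hypothetical.

```python
import numpy as np
import pandas as pd


def barrier_labels(prices: pd.Series, pct: float, horizon: int) -> pd.DataFrame:
    """Flag whether price closes >= +pct ("up") or <= -pct ("down") versus today
    at any of the next `horizon` quarterly closes. The flags are independent, so both can be 1.
    """
    ahead = pd.concat([prices.shift(-k) for k in range(1, horizon + 1)], axis=1)
    ratio = ahead.div(prices, axis=0)
    labels = pd.DataFrame({
        "up": (ratio.max(axis=1) >= 1 + pct).astype(float),
        "down": (ratio.min(axis=1) <= 1 - pct).astype(float),
    })
    labels.iloc[-horizon:] = np.nan  # incomplete look-ahead window
    return labels


def stoplight(p_up: float, p_down: float, threshold: float = 0.5) -> str:
    """Hypothetical dashboard rule: red if a drop is likely, green if only a rise is likely."""
    if p_down >= threshold:
        return "red"
    if p_up >= threshold:
        return "green"
    return "yellow"


spx = pd.Series([100.0, 125.0, 75.0, 80.0, 85.0, 90.0])  # made-up quarterly closes
labels = barrier_labels(spx, pct=0.20, horizon=2)
print(labels)
print(stoplight(p_up=0.8, p_down=0.8))
```

The first quarter in the demo gets both flags, mirroring the blow-off-top-then-crash case. A classifier trained separately on each flag would supply the `p_up` and `p_down` inputs to the stoplight.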
Putting it all together (Part II): Final project conclusion = actual trading recommendations
- Given a portfolio of X assets at Y percentages, the market regime, the recommended portfolio mix, and the projected performance of each asset (i.e., which indicators have recently turned on warning lights), should you buy, sell, or hold each asset?
- Send a weekly email (can use AI for this part!) with the final recommendations on portfolio changes: which assets need to be traded, bought, or sold THIS WEEK?
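The final buy/sell/hold decision can be reduced to comparing current weights with the recommended mix. A minimal sketch, assuming both are given as fractions of the portfolio; the function name and the 5-point tolerance band (which avoids churning on tiny drifts) are hypothetical choices, not part of the original plan.

```python
def rebalance_actions(current: dict, target: dict, band: float = 0.05) -> dict:
    """Compare current vs. target portfolio weights (fractions of total value).

    Assets more than `band` below target get BUY, more than `band` above get SELL,
    and everything else HOLD. Assets missing from one side count as weight 0.
    """
    actions = {}
    for asset in sorted(set(current) | set(target)):
        gap = target.get(asset, 0.0) - current.get(asset, 0.0)
        if gap > band:
            actions[asset] = "BUY"
        elif gap < -band:
            actions[asset] = "SELL"
        else:
            actions[asset] = "HOLD"
    return actions


# Made-up weights for illustration.
current = {"SPY": 0.60, "TLT": 0.30, "cash": 0.10}
target = {"SPY": 0.40, "TLT": 0.32, "GLD": 0.18, "cash": 0.10}
actions = rebalance_actions(current, target)
print(actions)
```

The resulting dictionary is the kind of compact summary the weekly email could be generated from.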

To Do:

Reduce the number of rows in the initial dataset, since many do not span the required time range
Add historical Gold, Oil, TLT, etc. prices to the datasets (see https://www.macrotrends.net/)
Standardize the time range (1950-2025?), infer missing data, and throw away or fix anything that looks odd
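A sketch of putting every series on one quarterly grid, assuming each raw series is a column with a DatetimeIndex at its native frequency. The function name and the 90% coverage cutoff are hypothetical. Interior gaps are linearly interpolated, while leading gaps (before a series existed) are left as NaN rather than invented.

```python
import numpy as np
import pandas as pd


def to_quarterly(raw: pd.DataFrame, start: str = "1950-01-01", end: str = "2025-12-31",
                 min_coverage: float = 0.9) -> pd.DataFrame:
    """Put every series (one column each) on a common quarterly grid.

    raw: DatetimeIndex at any native frequency (daily, monthly, ...).
    """
    sliced = raw.sort_index().loc[start:end]
    # Keep the last observation reported within each calendar quarter.
    quarterly = sliced.groupby(sliced.index.to_period("Q")).last()
    # Reindex onto the full grid so missing quarters show up as NaN.
    quarterly = quarterly.reindex(pd.period_range(start, end, freq="Q"))
    # Drop series that cover too little of the chosen range.
    coverage = quarterly.notna().mean()
    kept = quarterly.loc[:, coverage >= min_coverage]
    # Fill interior gaps linearly; leading/trailing gaps stay NaN (no extrapolation).
    return kept.interpolate(limit_area="inside")


# Made-up monthly data: one long series with a gap, one that starts only in 2002.
months = pd.date_range("1950-01-01", "2025-12-01", freq="MS")
full = np.arange(len(months), dtype=float)
full[99:102] = np.nan  # hypothetical gap: Apr-Jun 1958 missing
late = np.where(months >= "2002-01-01", 1.0, np.nan)
raw = pd.DataFrame({"full": full, "late": late}, index=months)
clean = to_quarterly(raw)
print(clean.shape)
```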
Transform every exponential-looking variable into something normalized and predictive of regime, e.g., by taking a logarithm and/or using the 1st, 2nd, and 3rd derivatives. It is likely the CHANGE in a variable, or its change RELATIVE to another variable, that will be predictive. For example, the S&P500 itself is not a good signal of regime, but the S&P priced in gold or oil might be.
- Consider using smoothed variables, polynomial fits, or other parameterized versions of the features as needed, since some series may be too volatile even at quarterly resolution
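A minimal sketch of the log / smoothing / derivative transform, assuming a quarterly series. A trailing rolling mean is used instead of a centered one so the features stay usable in the later supervised phase without look-ahead; the function name, the 4-quarter window, and the growth rates in the demo are hypothetical.

```python
import numpy as np
import pandas as pd


def log_derivative_features(series: pd.Series, window: int = 4, orders: int = 3) -> pd.DataFrame:
    """Log-transform, smooth with a trailing window, then take 1st..orders-th differences."""
    logged = np.log(series)
    # Trailing (not centered) mean, so each value uses only current and past quarters.
    smoothed = logged.rolling(window).mean()
    features = {"log": logged, "log_smooth": smoothed}
    d = smoothed
    for k in range(1, orders + 1):
        d = d.diff()  # k-th difference approximates the k-th derivative per quarter
        features[f"d{k}"] = d
    return pd.DataFrame(features)


# Made-up series: S&P growing 3% per quarter, gold 1% per quarter.
t = np.arange(24)
spx = pd.Series(100 * 1.03 ** t)
gold = pd.Series(50 * 1.01 ** t)
spx_in_gold = spx / gold  # the S&P priced in gold
feats = log_derivative_features(spx_in_gold)
print(feats.tail(3).round(4))
```

On this synthetic input the first difference is the constant log growth rate of the ratio and the higher differences are about zero; on real data, changes in `d1` and `d2` are the candidate regime signals.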
For the initial unsupervised clustering phase of the project, I can consider using adjusted/revised data, since I'm only trying to identify regimes, so backward-looking features MIGHT be OK
Once we have all quarters CLASSIFIED using k-means or another clustering method, we can move on to SUPERVISED predictive modeling using other features. At this point you CANNOT use any variable that was revised or otherwise had forward knowledge of the current or future state. All features must be values known at the moment we would have been choosing a portfolio
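One way to enforce "known at the moment" is to store each observation with its release date and join on that date rather than on the period it describes. A minimal sketch using `pandas.merge_asof`; the table layout, function name, release dates, and values below are all made up. First-release (unrevised) prints would need to come from a vintage archive such as ALFRED rather than from the latest revised series.

```python
import pandas as pd


def as_known_at(decision_dates, releases: pd.DataFrame) -> pd.DataFrame:
    """Attach to each decision date the latest observation *released* on or before it.

    releases: hypothetical layout with columns "release_date", "period", "value",
    one row per first-release (unrevised) print.
    """
    left = pd.DataFrame({"decision_date": pd.to_datetime(decision_dates)}).sort_values("decision_date")
    right = releases.assign(release_date=pd.to_datetime(releases["release_date"])).sort_values("release_date")
    return pd.merge_asof(left, right, left_on="decision_date", right_on="release_date",
                         direction="backward")


releases = pd.DataFrame({
    "period": ["2024Q1", "2024Q2"],
    "release_date": ["2024-04-25", "2024-07-25"],  # hypothetical publication dates
    "value": [1.0, 2.0],  # made-up numbers
})
known = as_known_at(["2024-06-28", "2024-07-01", "2024-08-01"], releases)
print(known[["decision_date", "period", "value"]])
```

On 2024-07-01 the Q2 figure has not been published yet, so the join correctly falls back to the Q1 value.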
During the supervised learning phase, it is good practice to first compute feature importances, then reduce the number of features, then fit a single Decision Tree to capture most of the explanatory power before running a Random Forest, XGBoost, or whichever final model performs best
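The staged workflow can be sketched with scikit-learn, assuming rows are in chronological order so the last quarter of the data serves as a holdout. Impurity-based importances from a probe random forest are used for ranking here (permutation importance is a common, less biased alternative); the function name, `top_k=5`, tree depth, and the synthetic data are hypothetical. XGBoost or another model could replace the final forest if it scores better.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def staged_models(X: pd.DataFrame, y: pd.Series, top_k: int = 5, test_frac: float = 0.25):
    """Rank features, keep the top_k, then compare a single tree with a forest."""
    split = int(len(X) * (1 - test_frac))  # chronological split: no shuffling
    X_tr, X_te = X.iloc[:split], X.iloc[split:]
    y_tr, y_te = y.iloc[:split], y.iloc[split:]
    # 1) Rank features with a quick probe forest's impurity importances.
    probe = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    ranked = pd.Series(probe.feature_importances_, index=X.columns).sort_values(ascending=False)
    keep = list(ranked.index[:top_k])
    # 2) A single shallow tree on the reduced set as an interpretable baseline.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[keep], y_tr)
    # 3) A full forest on the same reduced set.
    forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr[keep], y_tr)
    return keep, tree.score(X_te[keep], y_te), forest.score(X_te[keep], y_te)


# Synthetic data: only f0 and f1 carry signal.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 10)), columns=[f"f{i}" for i in range(10)])
y = pd.Series((X["f0"] + X["f1"] > 0).astype(int))
keep, tree_acc, forest_acc = staged_models(X, y)
print(keep, round(tree_acc, 2), round(forest_acc, 2))
```

If the single tree already scores close to the forest, the extra complexity of the ensemble may not be worth it.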
