# Machines

### Netting income

For fundamental equity investors, the financial statement is the launchpad for the search for value. True, quants use financial statements too. But they spend less time on what the numbers mean than on what they are. Producing a financial statement that adequately captures the economic (not GAAP or IFRS) position of a company is no mean feat; it draws upon accounting, domain knowledge, and artistry. Data scientists and machine learning engineers are acutely aware of the chore of data processing and cleaning.

### Trees and networks

It’s been over a month since our last post and for that we must apologize. We endeavor to be more prolific, but sometimes work and life get in the way. On the work front, let’s just say we won’t have to spend as much time selling encyclopedias door-to-door, which should free up more time to dedicate to writing value-added blog posts. On the life front, we had the chance to hike several canyons in southern Utah, USA.

### Not so soft softmax

Our last post examined the correspondence between a logistic regression and a simple neural network using a sigmoid activation function. The downside with such models is that they only produce binary outcomes. Still, we argued (not very forcefully) that if investing is about assessing the probability of achieving an attractive risk-adjusted return, then it makes sense to model investment decisions as probability functions. Moreover, most practitioners would probably prefer to know whether next month’s return is likely to be positive and how confident they should be in that prediction.
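To make the title concrete, here is a minimal sketch of the softmax function, which turns a vector of raw scores into probabilities that sum to one and so extends the binary case to several outcomes. It is an illustration, not the post's own code: the three return classes and their scores are hypothetical. The final check shows that with two classes and one score pinned at zero, softmax reduces to the sigmoid.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow and leaves the result unchanged.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function used for the binary case."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw scores for three classes of next month's return.
labels = ["down", "flat", "up"]
probs = softmax([-0.5, 0.1, 1.2])
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")

# Two-class case: softmax over [z, 0] gives the same probability as sigmoid(z).
two_class = softmax([0.8, 0.0])
print(abs(two_class[0] - sigmoid(0.8)) < 1e-12)
```

The largest score receives the largest probability, but every class keeps a nonzero share, which is what lets the output be read as a confidence level rather than a hard label.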

### Activate sigmoid!

In our last post, we introduced neural networks and formulated some of the questions we want to explore over this series. We explained the underlying architecture, the basics of the algorithm, and showed how a simple neural network could approximate the results and parameters of a linear regression. In this post, we’ll show how a neural network can also approximate a logistic regression and extend our toy example. What’s the motivation behind showing the link with logistic regression?
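The link can be sketched in a few lines: a single neuron with a sigmoid activation, trained by gradient descent on the log loss, is the same model as a one-feature logistic regression. The sketch below is an assumption-laden toy, not the post's implementation: the data, the learning rate, and the `train_neuron` helper are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(xs, ys, lr=0.5, epochs=2000):
    """Fit one sigmoid neuron by gradient descent on the average log loss.

    The model, sigmoid(w * x + b), and the loss are those of a
    logistic regression with a single feature.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # For the log loss, the gradient with respect to the
            # pre-activation is simply (prediction - label).
            error = sigmoid(w * x + b) - y
            grad_w += error * x / n
            grad_b += error / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: one feature and a binary label. The classes
# overlap slightly, so the fitted weight stays finite.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_neuron(xs, ys)
preds = [sigmoid(w * x + b) for x in xs]
print(f"weight={w:.3f}, bias={b:.3f}")
```

Fitting the same data with a standard logistic regression routine should give roughly the same weight and bias, since both minimize the same loss.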

### Nothing but (neural) net

We start a new series on neural networks and deep learning. Neural networks and their use in finance are not new, but they still account for only a fraction of the research output. A recent Google Scholar search found only 6% of the articles on stock price forecasting discussed neural networks. Artificial neural networks, as they were first called, have been around since the 1940s. But development was slow until at least the 1990s when computing power rapidly increased.

### Risk-constrained optimization

Our last post parsed portfolio optimization outputs and examined some of the nuances around the efficient frontier. We noted that when you start building portfolios with a large number of assets, brute force simulation can miss the optimal weighting scheme for a given return or risk profile. While optimization finds those weights (it should!), the output can lead to infinitesimal contributions from many assets, which is impractical or silly. Placing a minimum on the weights helps a bit.

### Parsing portfolio optimization

Our last few posts on risk factor models haven’t discussed how we might use such a model in the portfolio optimization process. Indeed, although we’ve touched on mean-variance optimization, efficient frontiers, and maximum Sharpe ratios in this portfolio series, we haven’t discussed portfolio optimization and its outputs in great detail. If we mean to discuss ways to limit our exposure to certain risks (presumably identified in the risk factor model) while still shooting for a satisfactory (or optimal) risk-adjusted return, we’ll need to investigate optimization in more detail.

### More factors, more variance...explained

Risk factor models are at the core of quantitative investing. We’ve been exploring their application within our portfolio series to see if we could create such a model to quantify risk better than a simplistic volatility measure does. That is, given our four portfolios (Satisfactory, Naive, Max Sharpe, and Max Return), can we identify a set of factors that explain each portfolio’s variance relatively well? In our first investigation, we used the classic Fama-French (F-F) three-factor model plus momentum.

### Kernel of error

In our last post, we looked at a rolling average of pairwise correlations for the constituents of XLI, an ETF that tracks the industrials sector of the S&P 500. We found that spikes in the three-month average coincided with declines in the underlying index. There was some graphical evidence of a correlation between the three-month average and forward three-month returns. However, a linear model didn’t do a great job of explaining the relationship given its relatively high error rate and unstable variability.

### Corr-correlation

We recently read two blog posts from Robot Wealth and FOSS Trading on calculating rolling pairwise correlations for the constituents of an S&P 500 sector index. Both posts were very interesting and offered informative ways to solve the problem using different packages in R: tidyverse or xts. We’ll use those posts as a launchpad to explore the rolling correlation concept with respect to forecasting returns. But we’ll be using Python to do a lot of the heavy lifting.
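As a rough sketch of the rolling correlation concept, the code below averages the Pearson correlation across every pair of constituents within a trailing window. It uses only the standard library; the tickers, returns, and function names are hypothetical and this is not the approach taken in the posts mentioned above.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length lists.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rolling_avg_pairwise_corr(returns, window):
    """Average pairwise correlation over each trailing window.

    `returns` maps a ticker to a list of returns; all lists share one length.
    The result has one value per complete window.
    """
    tickers = sorted(returns)
    n_obs = len(returns[tickers[0]])
    pairs = list(combinations(tickers, 2))
    averages = []
    for end in range(window, n_obs + 1):
        start = end - window
        corrs = [
            pearson(returns[a][start:end], returns[b][start:end])
            for a, b in pairs
        ]
        averages.append(sum(corrs) / len(corrs))
    return averages


# Hypothetical daily returns for three made-up tickers.
toy_returns = {
    "AAA": [0.010, -0.020, 0.015, 0.000, -0.010, 0.020],
    "BBB": [0.012, -0.018, 0.010, 0.002, -0.012, 0.018],
    "CCC": [-0.005, 0.010, -0.010, 0.004, 0.006, -0.015],
}
avg_corr = rolling_avg_pairwise_corr(toy_returns, window=4)
print([round(c, 3) for c in avg_corr])
```

With real sector data, a dataframe library would likely be the more practical choice, but the logic is the same: one correlation per pair per window, then an average across pairs.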