Automated Trading System using Machine Learning
This thesis investigates how machine learning can be applied in automated trading systems. To this end, an automated trading system driven by machine learning algorithms is developed. The system's design is inspired by techniques presented in "Advances in Financial Machine Learning" by Marcos López de Prado (2018). The system trades on the predictions of two random forest classification models: one sets the side of each trade and the other sets its size. It was tested using a custom backtester built to simulate real market conditions. Over the 7-year test period from 2012 to 2019, the system outperformed the S&P 500 index in terms of risk-adjusted return, measured by a skewness- and kurtosis-adjusted Sharpe ratio. Compared to a randomly trading model, the system appears to exhibit stock-picking ability that significantly exceeds random selection.
In addition, machine learning is applied to the canonical problem of equity risk premium prediction. Several popular machine learning algorithms are used to regress monthly equity risk premiums on 86 predictor variables from the literature. The algorithms include principal component regression, random forests, and deep neural networks. Out-of-sample R² is used to measure performance. The results indicate that more advanced machine learning algorithms, which allow complex and non-linear interactions among predictors, are more successful than linear regressions at modeling equity risk premiums. Overall, however, the results are rather unimpressive and fail to demonstrate convincingly the added benefit of more complex ML algorithms. A random forest classifier for the sign of the equity risk premium obtained better results, yielding an accuracy of 53.33% on out-of-sample data; statistical tests indicate, with high statistical significance, that a random model is unlikely to produce this accuracy.
All machine learning models were trained on a dataset developed specifically for this project, which includes 86 predictor variables from the literature. The dataset is based on company fundamental data as well as price, volume, and dividend data. It spans more than 20 years, from 1998 to 2019, and covers over 14,000 U.S. companies.
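The two evaluation measures named in the abstract, out-of-sample R² and a significance test on classification accuracy, can be illustrated with a minimal Python sketch. This is not the thesis's own code. The function names `out_of_sample_r2` and `binomial_p_value` are hypothetical, the zero-forecast default benchmark is an assumption (a historical-mean forecast is another common choice), and the one-sided binomial test is one possible test, since the abstract does not say which tests were used.

```python
from math import comb


def out_of_sample_r2(actual, predicted, benchmark=None):
    """Return 1 - SSE(model) / SSE(benchmark) on held-out observations.

    Assumption: when no benchmark forecast is given, a forecast of zero
    is used. The value is negative when the model does worse than the
    benchmark.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have equal length")
    if benchmark is None:
        benchmark = [0.0] * len(actual)
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_bench = sum((a - b) ** 2 for a, b in zip(actual, benchmark))
    return 1.0 - sse_model / sse_bench


def binomial_p_value(n_correct, n_total, p_random=0.5):
    """One-sided P(X >= n_correct) for X ~ Binomial(n_total, p_random).

    A small value suggests the observed hit rate is unlikely under a
    model that guesses the sign with probability p_random. The exact
    sum is intended for modest sample sizes only.
    """
    return sum(
        comb(n_total, k) * p_random ** k * (1 - p_random) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


if __name__ == "__main__":
    # Hypothetical toy numbers, not data or results from the thesis.
    realized = [0.02, -0.01, 0.03, -0.02]
    forecast = [0.01, 0.00, 0.02, -0.01]
    print(out_of_sample_r2(realized, forecast))
    print(binomial_p_value(n_correct=64, n_total=120))
```

Passing an explicit `benchmark` list, for example a rolling historical mean, changes what the R² is measured against, so a reported value is only interpretable alongside the chosen benchmark.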
Master's thesis in Industrial economics