A²I: AI for stock market prediction

  • August 06, 2018

The AI era:

Machine learning and artificial intelligence algorithms have arrived, and whether we like it or not, they will continue to permeate our daily lives. This is already evident in self-driving cars, spam filters, movie recommendation systems, credit fraud detection, and geo-fenced marketing campaigns, and their use will only expand and deepen. A while ago, the renowned physicist Stephen Hawking warned that “the automation of factories has already decimated jobs in traditional manufacturing, and the rise of artificial intelligence is likely to extend this job destruction deep into the middle classes”. Whether or not we agree about the virtues of automation, the only way to make better use of its potential and avoid its dangers is to gain a deeper knowledge and appreciation of these algorithms. Moreover, the next big wave, the Internet of Things (IoT), is nearly upon us: more and more devices and common household items will be interconnected and able to stream terabytes of data. As our society is deluged with data, the critical question is whether machine learning algorithms deliver a net benefit or impose a net cost on society.

What is the Stock Market?


It is a place where shares of publicly listed companies are traded. The primary market is where companies float shares to the general public in an initial public offering (IPO) to raise capital.

Once new securities have been sold in the primary market, they are traded in the secondary market, where one investor buys shares from another at the prevailing market price or at whatever price the buyer and seller agree upon. The secondary market, that is the stock exchanges, is overseen by a regulatory authority.


A stock exchange enables stockbrokers to trade company stocks and other securities. A stock can be bought or sold on an exchange only if it is listed there. The exchange is thus the meeting place of stock buyers and sellers. India’s premier stock exchanges are the Bombay Stock Exchange and the National Stock Exchange.

Why INVEST in Stocks?

Rather than listing the reasons for investing in stocks myself, let's hear them from the legend himself, Mr. Warren Buffett.

Stock Market Prediction in R

I implemented a Machine Learning algorithm to predict stock prices, specifically S&P 500 Adjusted Close prices. I chose Artificial Neural Networks (ANNs) for a number of reasons. ANNs are known to work well on computationally intensive problems where the user has no clear hypothesis about how the inputs interact. In fact, they pick up hidden patterns in the data so well that they often overfit!

Keeping this in mind, I experimented with a technique known as a ‘sliding window’. Rather than training the model on years of S&P 500 data, I created an ANN that trains on the past 30 days (t-29, …, t) to predict the close price at t+1. A 30-day sliding window seemed a good fit: narrow enough to capture the current market atmosphere, yet wide enough not to be hypersensitive to recent erratic movements.
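The mechanics of the sliding window can be sketched in base R. This is only an illustration under stated assumptions: the prices are synthetic, `sliding_forecast` is a hypothetical helper, and a linear regression on the previous day's close stands in for the ANN, which is not shown in this post.

```r
# Synthetic random-walk prices stand in for the S&P 500 series (assumption).
set.seed(1)
price <- 1500 + cumsum(rnorm(120))

# Hypothetical helper: for each day t_end, fit a model on the 30 days ending
# at t_end and predict the close at t_end + 1. lm() is a stand-in for the ANN.
sliding_forecast <- function(price, window = 30) {
  n <- length(price)
  forecast <- rep(NA_real_, n)
  for (t_end in window:(n - 1)) {
    idx <- (t_end - window + 1):t_end            # the days inside the window
    train <- data.frame(
      y    = price[idx[-1]],                     # close on each day
      lag1 = price[idx[-window]]                 # close on the day before
    )
    fit <- lm(y ~ lag1, data = train)
    forecast[t_end + 1] <- predict(fit, newdata = data.frame(lag1 = price[t_end]))
  }
  forecast
}

forecast <- sliding_forecast(price)
```

The first 30 entries of `forecast` stay `NA` because no full window exists before day 30. Each later entry comes from a model that has only seen the 30 days immediately before it.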

Then, I had to decide on the input variables I was going to use. Many stock market models are pure time-series autoregressive functions, but the benefit of ANNs is that we can use them as a more traditional Machine Learning technique, with several inputs (and not only previous prices). I defined several of my own inputs that I thought would be significant predictors such as:

  • 28-Day Moving Average
  • 14-Day Moving Average
  • 7-Day Moving Average
  • Previous Day Close Price
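As a rough sketch, these four inputs could be built in base R as follows. The helper name `trailing_ma`, the column names, and the synthetic `close` series are assumptions for illustration, not the code used for the actual model.

```r
# Hypothetical helper: mean of the current value and the n - 1 values before it.
# stats::filter() with sides = 1 uses past values only, so there is no look-ahead.
trailing_ma <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

# Synthetic closes stand in for the S&P 500 Adjusted Close (assumption).
set.seed(42)
close <- 1500 + cumsum(rnorm(40))

features <- data.frame(
  ma28       = trailing_ma(close, 28),
  ma14       = trailing_ma(close, 14),
  ma7        = trailing_ma(close, 7),
  prev_close = c(NA, head(close, -1))   # yesterday's close
)
```

The first n - 1 rows of each moving-average column are `NA`, since a full n-day history is not yet available there.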

In addition, I did some research and introduced several other technical indicators. Because there were so many, I fit a Random Forest model and kept only the variables it ranked as most important, which reduced the dimensionality of the problem. I was left with:



  • MACD Oscillator
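The post does not show how the MACD input was computed. A minimal sketch of the conventional definition is given here: the MACD line is a fast exponential moving average minus a slow one, and the signal line is an EMA of the MACD line. The 12/26/9 periods are the usual convention and an assumption, not necessarily the settings used for this model, and `ema` and `macd` are hypothetical helper names.

```r
# Exponential moving average with smoothing factor 2 / (n + 1),
# seeded with the first observation.
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# MACD line, signal line, and their difference (often plotted as a histogram).
macd <- function(close, fast = 12, slow = 26, signal = 9) {
  line <- ema(close, fast) - ema(close, slow)
  sig  <- ema(line, signal)
  data.frame(macd = line, signal = sig, histogram = line - sig)
}

# Toy upward-trending series, not real market data.
toy <- macd(seq(100, 149, by = 1))
```

On a steadily rising series the fast EMA sits above the slow one, so the MACD line is positive after the first day.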


Finally, to obtain a stable model, I had to scale the inputs so that all variables lay in the range [0, 1]. This avoids saturating individual neurons in the ANN with large or outlying values. There are several ways to do this; I simply wrote a small function called ‘myScale’ to get the job done.

myScale <- function(x) { (x - min(x)) / (max(x) - min(x)) }
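A quick check of what this function does, as a sketch. `myScale` is min-max scaling, so its output spans [0, 1]. The `scale_sym` and `scale_like` helpers are hypothetical additions for illustration: the first stretches the result to [-1, 1], and the second scales a new observation with the training window's minimum and maximum.

```r
myScale <- function(x) { (x - min(x)) / (max(x) - min(x)) }   # as defined above

# Hypothetical variant: stretch the same min-max result to [-1, 1].
scale_sym <- function(x) 2 * myScale(x) - 1

# Hypothetical helper: scale a new value using the training window's min and max,
# so that tomorrow's input is transformed the same way as the training data.
scale_like <- function(new, train) (new - min(train)) / (max(train) - min(train))

x <- c(1400, 1450, 1500, 1600)   # toy values, not real prices
range(myScale(x))                # 0 1
range(scale_sym(x))              # -1 1
scale_like(1650, x)              # 1.25, outside [0, 1] because 1650 exceeds max(x)
```

The last line shows a practical caveat of min-max scaling inside a sliding window: a value beyond the window's range lands outside the scaled interval.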

I won’t be sharing the final model, but I did create a simulation of my methodology for the 2013 trading year.

    library(ggplot2)
    library(animation)

    # sp.2013 (2013 S&P 500 data) and pred (the model's predictions) are created
    # earlier and are not shown in this post.
    trainPlot <- function(x) {
      g <- ggplot() + geom_path(aes(x = date, y = Adj.Close), colour = "blue", data = sp.2013)
      # Dashed green lines mark the first and last day of the 30-day training window
      g <- g + geom_vline(xintercept = as.numeric(sp.2013$date[x]), colour = "green", linetype = "longdash")
      g <- g + geom_vline(xintercept = as.numeric(sp.2013$date[x + 29]), colour = "green", linetype = "longdash")
      g <- g + annotate("text", x = sp.2013$date[x + 15], y = 1575, label = "TRAIN", colour = "blue")
      # Predictions made so far, drawn in red
      g <- g + geom_path(aes(x = date, y = Adj.Close), data = pred[31:(x + 30), ], colour = "red")
      g <- g + theme_bw()
      g
    }

    plotit <- function(x) print(trainPlot(x))

    ani.options(ani.width = 900, ani.height = 650)
    # "gm convert" requires GraphicsMagick; cmd.fun = shell is only available on Windows
    saveGIF(for (i in 1:40) plotit(i), convert = "gm convert", cmd.fun = shell, clean = TRUE, outdir = getwd())

The animation demonstrates the mechanics of a Sliding Window model. The ‘Training’ window keeps moving forward, regardless of whether the previous prediction was accurate. The RMSE of the model over the 2013 dates is 1.2115, which is very small given that S&P 500 prices are above 1,000 (an error of roughly 0.1% of the index level). Both the RMSE and the animation suggest that the fit tends to be good. Even when the predicted values are slightly off, the ANN usually predicts the direction of the S&P 500 correctly for that day. Given these results, I may revisit this in the future as a classification problem instead.
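For reference, both measures mentioned here can be computed in a few lines of R. This is a sketch with toy numbers, not the 2013 data; `rmse` and `direction_hit_rate` are hypothetical helper names, and "direction" is taken to mean the sign of the move from the previous day's actual close.

```r
# Root mean squared error between actual and predicted closes.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Share of days where the predicted move from the previous actual close
# has the same sign as the actual move.
direction_hit_rate <- function(actual, predicted) {
  prev <- head(actual, -1)
  actual_move    <- sign(actual[-1] - prev)
  predicted_move <- sign(predicted[-1] - prev)
  mean(actual_move == predicted_move)
}

actual    <- c(100, 102, 101, 103, 104)   # toy numbers, not the post's data
predicted <- c(100, 101, 102, 102, 105)

rmse(actual, predicted)                 # sqrt(0.8), about 0.894
direction_hit_rate(actual, predicted)   # 0.75
```

A hit rate like this is the natural metric if the problem is recast as up/down classification.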

The beauty of this result is that, although it is a naive implementation of a sophisticated model, the results are really good. I could spend many more weeks tuning the model and experimenting with the inputs to get a better fit.