Yet another stock price prediction attempt

Daniel Cornea
5 min read · Apr 18, 2022

Project Definition

This project is part of Udacity’s Data Scientist Nanodegree; you can find all the code discussed in this article here.

Project Overview

With this project, I am attempting to build a simple stock predictor based on the power of neural networks.

Problem Statement

The problem I am attempting to solve is the difficulty of reliably predicting stock movements with acceptable accuracy. To address it, I am building a stock predictor model.

I approach this by creating a model that looks at the past n closing prices and attempts to predict price n+1.

In layman’s terms, we take, say, one year of dates and prices and attempt to predict the next single day, and for each prediction we only look back three days within that one-year period. A bit confusing, but more details will come.
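To make that concrete, here is a toy sketch of the sliding-window idea; the prices are made up, only the three-day look-back comes from the project:

```python
# Each input is the last three closing prices, the target is the next one.
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 102.7]  # made-up example prices

look_back = 3
windows, targets = [], []
for i in range(len(prices) - look_back):
    windows.append(prices[i:i + look_back])  # e.g. [100.0, 101.5, 99.8]
    targets.append(prices[i + look_back])    # e.g. 102.3

print(windows[0], "->", targets[0])  # [100.0, 101.5, 99.8] -> 102.3
```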

Metrics

The sole metric is how accurately the predictor can forecast previously unseen stock quotes from past data, expressed as the mean of the differences between actual and predicted stock prices, divided by the last actual stock price.

In other words, we take the predictions, compute the differences between the actual and predicted values, and average them. This average is then divided by the last actual share price. For example, if the price of a share is always 1000 and the model always predicts 1020, the metric for this model in this scenario is 20/1000, or 2%.
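A small sketch of that metric; treating the differences as absolute values is an assumption on my part:

```python
import numpy as np

# Mean (absolute) difference between actual and predicted prices,
# divided by the last actual price.
def relative_mean_error(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mean_diff = np.mean(np.abs(actual - predicted))
    return mean_diff / actual[-1]

# The toy scenario from the text: actuals at 1000, predictions at 1020 -> 0.02 (2%).
print(relative_mean_error([1000, 1000, 1000], [1020, 1020, 1020]))
```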

Analysis

Data Exploration

The data used in this project is obtained from Yahoo Finance via a finance package. It is a simple table of daily quotes with the usual price and volume columns.

The data is already curated and cleaned.
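Assuming the finance package in question is yfinance, pulling five years of Microsoft quotes could look roughly like this:

```python
import yfinance as yf

# Download five years of daily quotes for Microsoft.
msft = yf.download("MSFT", period="5y")

print(list(msft.columns))  # typically Open, High, Low, Close, Adj Close, Volume
print(msft.tail())
```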

Data Visualisation

In the photo below you can see what the data for Microsoft looks like. Note that this is a time series that goes back five years.
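A minimal sketch of how such a plot could be produced, reusing the msft DataFrame from the download sketch above:

```python
import matplotlib.pyplot as plt

# Plot the closing price series over the full five-year period.
msft["Close"].plot(figsize=(10, 4), title="MSFT closing price, last 5 years")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.show()
```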

Methodology

Data Preprocessing

The input data for the model consists of the scaled stock prices and the day-to-day changes in closing prices. This is then split further into training and test data sets.

The stock prices are scaled to a 0-to-1 range: the highest price gets a 1, the lowest gets a 0, and everything in between is mapped to a decimal between zero and one.

As for the changes, the percentage changes are computed first, and these are then put on a logarithmic scale.
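A sketch of these two transformations; the use of scikit-learn’s MinMaxScaler and of log1p for the logarithmic step are my assumptions about how this could be done:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# `msft` is the DataFrame from the download sketch above.
close = msft["Close"]

# Scale the closing prices to the 0-1 range: highest -> 1, lowest -> 0.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_close = scaler.fit_transform(close.values.reshape(-1, 1))

# Day-to-day percentage changes, then a logarithmic transform.
pct_change = close.pct_change().dropna()
log_change = np.log1p(pct_change)
```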

The training and test data are organised so that for every entry made of the last n scaled closings and log percentage changes there is a corresponding target value: the closing price on day n+1.

Long story short, we will use the last 3 days to predict the change in the 4th day.
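A sketch of how those windows and targets could be assembled, assuming the outputs of the preprocessing sketch above and a look-back of 3 days:

```python
import numpy as np

def make_windows(scaled_close, log_change, look_back=3):
    closes = scaled_close[1:, 0]       # align: the first change belongs to the 2nd close
    changes = log_change.to_numpy()
    X, y = [], []
    for i in range(look_back, len(changes)):
        X.append(np.column_stack([
            closes[i - look_back:i],   # last 3 scaled closings
            changes[i - look_back:i],  # last 3 log percentage changes
        ]))
        y.append(closes[i])            # the 4th day, the value we try to predict
    return np.array(X), np.array(y)

X, y = make_windows(scaled_close, log_change)  # X has shape (samples, 3, 2)
```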

Now, you may ask why we do this transformation. One reason is stationarity: across different time periods the mean and variance should not vary too much, so that the statistics the learning algorithm sees remain sound.

Before feeding the data into the model, each input therefore consists of the last 3 days of scaled closings and log changes, and the fourth day, the value we attempt to predict, is the corresponding target.

Implementation

For the implementation, a Keras Sequential model with an LSTM layer was used, together with the Adam optimiser and a batch size of 16. The initial number of epochs is 50, as seen below:
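Something along these lines; the layer sizes and the 80/20 train/test split are assumptions, only the Sequential LSTM, the Adam optimiser, the batch size of 16 and the 50 epochs come from the text:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# X, y come from the window-building sketch above.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = Sequential([
    LSTM(50, input_shape=(X_train.shape[1], X_train.shape[2])),
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")

history = model.fit(X_train, y_train, epochs=50, batch_size=16,
                    validation_data=(X_test, y_test), verbose=0)
```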

These fit parameters are the ones varied during the refinement.

Refinement

To refine this model, we run the same model multiple times while changing parameters. These parameters are: the period for which the data is retrieved (1 year, 2 years, or 5 years); the number of look-back days (3, 13, or 30); and the number of epochs (50 or 100).

All of these are run in a simple loop-in-a-loop system.

The loop-in-a-loop system simply means nested for loops, like this:
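For instance, where train_and_score is a hypothetical helper standing in for the full download, preprocess, fit and evaluate pipeline:

```python
# Iterate over every combination of the refinement parameters and keep the scores.
results = {}
for period in ["1y", "2y", "5y"]:
    for look_back in [3, 13, 30]:
        for epochs in [50, 100]:
            # train_and_score is a hypothetical helper, not part of the original code.
            score = train_and_score(period=period, look_back=look_back, epochs=epochs)
            results[(period, look_back, epochs)] = score

best = min(results, key=results.get)
print("Best combination:", best, "score:", results[best])
```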

As you can see, the loops iterate through all the possible combinations; in the next section we discuss the results.

Results

Model Evaluation and Validation

As stated in the beginning, the single metric on which this system is evaluated is the mean “error” divided by the last actual price, times 100. For MSFT this metric is 2.9964, which is quite good I would say; for other stocks, such as PFE, even with all the refinement the metric is not satisfactory.

Justification

The final result is not a model with specific parameters, but rather a framework that can be extended. By that I mean not necessarily the model itself, but the number of look-back days and the number of epochs that are used.

The result is a framework that can be tinkered with.

Conclusion

Reflection

In this project I imported, manipulated, and modelled the data in order to predict the next value of the stock price for a given listed company. The hard part was choosing the appropriate data representation for the model.

Improvement

To improve the results, I would need to extend the parameter grid used in the search for the minimum test score. More computational power would be needed to do this.
