kNN predictability analysis of stock and share closing prices
Thesis posted on 16.06.2016, 09:25 by Yanshan Shi
The k nearest neighbor (kNN) rule is a nonparametric algorithm that searches for the k nearest neighbors of a query point in another set of points. In this thesis, an application of the kNN rule to predictability analysis of stock and share returns is proposed.

The first experiment tests whether the ‘success’ (or ‘winner’) components of four stock and share market indices can be predicted over a selected time period. We have developed a method of labeling each component as either ‘winner’ or ‘loser’, and we analyze whether the initial fragments of the daily closing-price log-return time series contain information on the winner–loser separation. Leave-one-out cross-validation with the kNN algorithm is applied to the daily log-returns of the components, using two distance measures: a correlation distance and its associated proximity. The error analysis shows that, for the HANGSENG and the DAX index, there are clear signs that the probability of long-term success can be evaluated. Correlation-distance matrix histograms and 2-D/3-D elastic maps generated with ViDaExpert show that the ‘winner’ components lie closer to each other, and that ‘winner’ and ‘loser’ components are separable on the elastic maps for the HANGSENG and the DAX index, while for the indices with negative results there is no sign of separation. In the second experiment, for a selected time interval, the daily log-return time series is split into “history”, “present” and “future” parts. The kNN rule is used to search for the nearest neighbors of the “present” part in a set of windows created from the “history” part by a sliding-window strategy; the nearest neighbors are taken as the predicted “future” part. We then use ideas from dynamical systems to regenerate the “future” closing prices from the nearest neighbors’ log-returns. Sub-experiments differ in how the “history” part is generated, in the market index used, and in the distance measure.
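The classification step of the first experiment can be sketched as follows; the correlation distance and leave-one-out scheme follow the description above, while the array layout, the value of k, and the majority-vote rule are illustrative assumptions rather than the thesis's exact protocol:

```python
import numpy as np

def correlation_distance(x, y):
    # distance is small when the two log-return series are strongly correlated
    return 1.0 - np.corrcoef(x, y)[0, 1]

def knn_loocv_error(fragments, labels, k=3):
    """Leave-one-out cross-validation of the kNN rule.
    fragments: (n, T) array of log-return fragments; labels: length-n list."""
    n = len(fragments)
    errors = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        dists = [correlation_distance(fragments[i], fragments[j]) for j in others]
        nearest = np.argsort(dists)[:k]
        votes = [labels[others[j]] for j in nearest]
        predicted = max(set(votes), key=votes.count)  # majority vote
        errors += predicted != labels[i]
    return errors / n
```

A low LOOCV error on a given index would indicate, as in the thesis, that the initial fragments carry information on the winner–loser separation.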
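The second experiment's sliding-window prediction can be sketched as an analog forecast; the window length, horizon, and Euclidean matching used here are illustrative (the thesis also varies the distance measure), and the price regeneration simply inverts the log-return transform:

```python
import numpy as np

def knn_forecast(history, present, horizon, k=1):
    """Nearest-neighbor (analog) forecast: find the k sliding windows of
    `history` closest to `present` and average the segments that followed
    them as the predicted "future" log-returns."""
    w = len(present)
    starts = np.arange(len(history) - w - horizon + 1)
    dists = np.array([np.linalg.norm(history[s:s + w] - present) for s in starts])
    nearest = starts[np.argsort(dists)[:k]]
    return np.mean([history[s + w:s + w + horizon] for s in nearest], axis=0)

def returns_to_prices(last_price, log_returns):
    # regenerate closing prices from a sequence of predicted log-returns
    return last_price * np.exp(np.cumsum(log_returns))
```

For example, if the “history” series repeats a pattern, the window that follows the best match to the “present” window is returned as the forecast.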
This approach to modeling and forecasting works for both ergodic dynamical systems and random processes. The Lorenz attractor with noise is used to generate data, and these data are used in the kNN experiment with the Euclidean distance. The sliding-window strategy is applied to both the test and training sets: the kNN rule finds the k nearest neighbors, and the next ‘window’ is used as the prediction. Error analysis of the relative mean squared error (RMSE) shows that k = 1 gives the best prediction, and as k → 100 both the average RMSE values and the average standard deviation values converge. The solution Z(t) is predicted quite accurately by the kNN experiment.
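This final experiment can be sketched as below; the Euler integration step, initial conditions, noise level, and window sizes are illustrative assumptions, while the Lorenz parameters, the Euclidean matching, and the relative-MSE criterion follow the description above:

```python
import numpy as np

def lorenz_z(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
             noise=0.1, seed=0):
    """Euler-integrated Lorenz system; returns the Z(t) component with
    additive Gaussian noise (the noise model here is an assumption)."""
    rng = np.random.default_rng(seed)
    x, y, z = 1.0, 1.0, 1.0
    zs = np.empty(n_steps)
    for i in range(n_steps):
        dx = sigma * (y - x)
        dy = x * (rho - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        zs[i] = z
    return zs + noise * rng.standard_normal(n_steps)

def knn_window_forecast(train, present, horizon, k=1):
    # find the k training windows nearest (Euclidean) to `present`
    # and average the windows that followed them
    w = len(present)
    starts = np.arange(len(train) - w - horizon + 1)
    dists = np.array([np.linalg.norm(train[s:s + w] - present) for s in starts])
    nearest = starts[np.argsort(dists)[:k]]
    return np.mean([train[s + w:s + w + horizon] for s in nearest], axis=0)

def relative_mse(pred, truth):
    # relative mean squared error used in the error analysis
    return np.mean((pred - truth) ** 2) / np.mean(truth ** 2)
```

Repeating the forecast over many “present” windows and sweeping k would reproduce the kind of error curves the thesis examines as k grows toward 100.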