Applying Logistic Regression

A case study on applying logistic regression to stock market

Laxmi K Soni

38-Minute Read

1:Loading Stock data

1.1:Importing libraries and data

import investpy
from datetime import datetime
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import metrics
from scipy import stats 
import numpy as np
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt
import talib
import quandl
from sklearn.preprocessing import StandardScaler 
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler 
import math
from sklearn.metrics import mean_squared_error

1.2:Fetching the data

To fetch the stock data we use the investpy library, which retrieves data from Investing.com.

1.2.1: Determining position based on volume and close

##           Date     Open     High      Low    Close  Volume    Pos    weekday
## 961 2022-09-05  53946.0  54247.0  53855.0  54141.0   80941  Short     Monday
## 962 2022-09-06  54466.0  54947.0  53850.0  53868.0  193997  Short    Tuesday
## 963 2022-09-07  53558.0  54759.0  53431.0  54749.0  184602  Short  Wednesday
## 964 2022-09-08  54761.0  55370.0  54565.0  54940.0  186749   Long   Thursday
## 965 2022-09-09  55614.0  55750.0  55110.0  55655.0  183536  Short     Friday

1.2.2: Long positions gains

##          Date     Open     High      Low    Close  Volume    Pos    gain
## 94 2022-06-06  62093.0  63240.0  62093.0  62575.0  177618  Short   482.0
## 95 2022-06-15  59990.0  61440.0  59990.0  61009.0  182289   Long  1019.0
## 96 2022-07-19  56088.0  56536.0  56088.0  56311.0   94818  Short   223.0
## 97 2022-07-28  55750.0  58244.0  55750.0  58156.0  291360   Long  2406.0
## 98 2022-08-31  53893.0  53893.0  53893.0  53893.0       0  Short     0.0

1.2.3: Short positions gains

##          Date     Open     High      Low    Close  Volume    Pos    gain
## 92 2022-07-12  57302.0  57302.0  56547.0  56985.0  161818  Short   317.0
## 93 2022-07-21  56202.0  56202.0  55021.0  56087.0  236522  Short   115.0
## 94 2022-08-16  59550.0  59550.0  57961.0  58274.0  153425  Short  1276.0
## 95 2022-08-19  57100.0  57100.0  56060.0  56265.0  134432  Short   835.0
## 96 2022-08-31  53893.0  53893.0  53893.0  53893.0       0  Short     0.0
print(short['weekday'].value_counts())
print(lng['weekday'].value_counts())

1.2.2: Determining Average monthly closing prices

##                Open     High      Low    Close  Volume
## Date                                                  
## 2018-12-10  37950.0  38495.0  37738.0  38406.0   50671
## 2018-12-11  38351.0  38681.0  38111.0  38360.0   52349
## 2018-12-12  38393.0  38635.0  38179.0  38598.0   43108
## 2018-12-13  38501.0  38549.0  38263.0  38382.0   37475
## 2018-12-14  38399.0  38399.0  38020.0  38086.0   34827
## ...             ...      ...      ...      ...     ...
## 2022-09-05  53946.0  54247.0  53855.0  54141.0   80941
## 2022-09-06  54466.0  54947.0  53850.0  53868.0  193997
## 2022-09-07  53558.0  54759.0  53431.0  54749.0  184602
## 2022-09-08  54761.0  55370.0  54565.0  54940.0  186749
## 2022-09-09  55614.0  55750.0  55110.0  55655.0  183536
## 
## [966 rows x 5 columns]
##               Open    High     Low   Close  Volume Currency  VolChange  \
## Date                                                                     
## 2018-12-10  37.950  38.495  37.738  38.406   50671      INR  13.992427   
## 2018-12-11  38.351  38.681  38.111  38.360   52349      INR   0.033116   
## 2018-12-12  38.393  38.635  38.179  38.598   43108      INR  -0.176527   
## 2018-12-13  38.501  38.549  38.263  38.382   37475      INR  -0.130672   
## 2018-12-14  38.399  38.399  38.020  38.086   34827      INR  -0.070660   
## ...            ...     ...     ...     ...     ...      ...        ...   
## 2022-09-05  53.946  54.247  53.855  54.141   80941      INR  -0.551397   
## 2022-09-06  54.466  54.947  53.850  53.868  193997      INR   1.396770   
## 2022-09-07  53.558  54.759  53.431  54.749  184602      INR  -0.048429   
## 2022-09-08  54.761  55.370  54.565  54.940  186749      INR   0.011630   
## 2022-09-09  55.614  55.750  55.110  55.655  183536      INR  -0.017205   
## 
##             CloseChange  
## Date                     
## 2018-12-10     0.000532  
## 2018-12-11    -0.001198  
## 2018-12-12     0.006204  
## 2018-12-13    -0.005596  
## 2018-12-14    -0.007712  
## ...                 ...  
## 2022-09-05     0.006413  
## 2022-09-06    -0.005042  
## 2022-09-07     0.016355  
## 2022-09-08     0.003489  
## 2022-09-09     0.013014  
## 
## [966 rows x 8 columns]
## Date
## 2018-12-31    38.149133
## 2019-01-31    39.465522
## 2019-02-28    40.064750
## 2019-03-31    38.306857
## 2019-04-30    37.424800
## 2019-05-31    36.895739
## 2019-06-30    37.312350
## 2019-07-31    39.617000
## 2019-08-31    43.942762
## 2019-09-30    47.490810
## 2019-10-31    45.687174
## 2019-11-30    44.712571
## 2019-12-31    44.983905
## 2020-01-31    46.715435
## 2020-02-29    46.769810
## 2020-03-31    42.008682
## 2020-04-30    43.175278
## 2020-05-31    46.064714
## 2020-06-30    48.667409
## 2020-07-31    55.923957
## 2020-08-31    68.698238
## 2020-09-30    65.459455
## 2020-10-31    61.675571
## 2020-11-30    62.012955
## 2020-12-31    65.897045
## 2021-01-31    67.186000
## 2021-02-28    69.205200
## 2021-03-31    66.495957
## 2021-04-30    67.700095
## 2021-05-31    71.556381
## 2021-06-30    70.013318
## 2021-07-31    68.518000
## 2021-08-31    63.957364
## 2021-09-30    62.362182
## 2021-10-31    63.489333
## 2021-11-30    64.592091
## 2021-12-31    61.979783
## 2022-01-31    62.549050
## 2022-02-28    63.473950
## 2022-03-31    68.742739
## 2022-04-30    67.061150
## 2022-05-31    61.769045
## 2022-06-30    61.151818
## 2022-07-31    56.927857
## 2022-08-31    57.248818
## 2022-09-30    54.344857
## Freq: M, Name: Close, dtype: float64
## meanprice is  55.32045962732919

1.2.3: Technical moving averages

##   period  sma_value sma_signal   ema_value ema_signal
## 0      5   55509.60        buy  55565.2468        buy
## 1     10   55490.90        buy  55472.7658        buy
## 2     20   55345.10        buy  55336.8160        buy
## 3     50   54776.46        buy  54922.6271        buy
## 4    100   54330.16        buy  54777.7736        buy

1.2.4: Determining the z-scores

## Date
## 2022-09-05   -0.101970
## 2022-09-06   -0.125573
## 2022-09-07   -0.049406
## 2022-09-08   -0.032893
## 2022-09-09    0.028923
## Name: zscore, dtype: float64
## Date
## 2022-09-05   -0.507578
## 2022-09-06    0.820085
## 2022-09-07    0.709756
## 2022-09-08    0.734969
## 2022-09-09    0.697238
## Name: zscorevolume, dtype: float64

1.2.5: Determining the daily support and resistance levels

##         name       s3       s2       s1  pivot_points       r1       r2  \
## 0    Classic  53741.0  54153.0  54546.0       54958.0  55351.0  55763.0   
## 1  Fibonacci  54153.0  54461.0  54650.0       54958.0  55266.0  55455.0   
## 2  Camarilla  54719.0  54792.0  54866.0       54958.0  55014.0  55088.0   
## 3   Woodie's  53733.0  54149.0  54538.0       54954.0  55343.0  55759.0   
## 4   DeMark's      NaN      NaN  54752.0       55061.0  55558.0      NaN   
## 
##         r3  
## 0  56156.0  
## 1  55763.0  
## 2  55161.0  
## 3  56148.0  
## 4      NaN
##   technical_indicator    value      signal
## 0             RSI(14)   48.884     neutral
## 1          STOCH(9,6)   96.517  overbought
## 2        STOCHRSI(14)  100.000  overbought
## 3         MACD(12,26) -627.928        sell
## 4             ADX(14)   48.050         buy
## name            DeMark's
## s3                   NaN
## s2                   NaN
## s1                 54752
## pivot_points       55061
## r1                 55558
## r2                   NaN
## r3                   NaN
## Name: 4, dtype: object

1.2.6: Fibonacci retracement levels

## Retracement levels for rising price
##                               0
## 0              {'min': [55.11]}
## 1  {'level5(61.8)': [55.35448]}
## 2       {'level4(50)': [55.43]}
## 3  {'level3(38.2)': [55.50552]}
## 4  {'level2(23.6)': [55.59896]}
## 5             {'zero': [55.75]}
## Retracement levels for falling price
##                               0
## 0             {'zero': [55.75]}
## 1  {'level2(23.6)': [55.26104]}
## 2  {'level3(38.2)': [55.35448]}
## 3       {'level4(50)': [55.43]}
## 4  {'level5(61.8)': [55.50552]}
## 5              {'min': [55.11]}

As we can see, we now have a data frame with all the entries from start date to end date. We have multiple columns here and not only the closing stock price of the respective day. Let’s take a quick look at the individual columns and their meaning.

Open: That’s the share price the stock had when the markets opened that day.

Close: That’s the share price the stock had when the markets closed that day.

High: That’s the highest share price that the stock had that day.

Low: That’s the lowest share price that the stock had that day.

Volume: The number of shares that changed hands that day.

1.3:Reading individual values

Since our data is stored in a Pandas data frame, we can use ordinary Pandas indexing to get individual values. For example, we can print only the closing values using print(df['Close']).

We can also print the closing value of a specific date that we are interested in. This is possible because the date is our index column.

print(df['Close']['2020-07-14'])

But we can also use positional indexing to access a certain position.

print(df['Close'].iloc[5])
## 38.197

Here we printed the closing price of the sixth entry, since positions are counted from zero.
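The two lookup styles described above can be sketched on a small, self-contained frame. The dates and prices below are hypothetical values chosen for illustration and are not taken from the article's data set.

```python
import pandas as pd

# Hypothetical closing prices for six trading days (illustration only).
dates = pd.to_datetime(
    ["2020-07-09", "2020-07-10", "2020-07-13",
     "2020-07-14", "2020-07-15", "2020-07-16"]
)
df = pd.DataFrame({"Close": [51.2, 52.0, 52.9, 52.4, 52.7, 53.1]}, index=dates)
df.index.name = "Date"

# Label-based lookup: the date string is matched against the DatetimeIndex.
close_on_date = df.loc["2020-07-14", "Close"]

# Position-based lookup: position 5 is the sixth row, counting from zero.
sixth_close = df["Close"].iloc[5]

print(close_on_date)
print(sixth_close)
```

Using .loc for labels and .iloc for positions keeps the two meanings apart, which matters once the index itself could contain integers.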

2:Graphical Visualization

Even though tables are nice and useful, we want to visualize our financial data, in order to get a better overview. We want to look at the development of the share price.

Plotting our share price curve with Pandas and Matplotlib is very simple. Since the plotting in Pandas builds on top of Matplotlib, we can just select the column we are interested in and apply the plot method. Since the date is the index of our data frame, Matplotlib uses it for the x-axis. The y-values are then our closing values.
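As a minimal sketch of this step, the snippet below plots a closing-price column straight from a data frame. The price series is randomly generated (an assumption for illustration), and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd

# Hypothetical random-walk prices on business days (illustration only).
dates = pd.bdate_range("2022-01-03", periods=60)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Close": 55 + rng.normal(0, 0.5, 60).cumsum()}, index=dates)

# Series.plot delegates to Matplotlib and puts the index on the x-axis.
ax = df["Close"].plot(title="Closing price")
ax.set_xlabel("Date")
ax.set_ylabel("Close")
ax.get_figure().canvas.draw()  # render the figure in memory
```

In an interactive session, calling plt.show() from matplotlib.pyplot instead of drawing the canvas would display the chart.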

2.1:CandleStick Charts

The best way to visualize stock data is to use so-called candlestick charts. This type of chart gives us information about four different values at the same time, namely the high, the low, the open and the close value. In order to plot candlestick charts, we will need to import a function of the MPL-Finance library.

from mplfinance.original_flavor import candlestick_ohlc

We are importing the candlestick_ohlc function, which current releases of the library provide in the mplfinance.original_flavor module. Notice that there also exists a candlestick_ochl function that takes in the data in a different order. Also, for our candlestick chart, we will need a different date format provided by Matplotlib. Therefore, we need to import the respective module as well. We give it the alias mdates.

import matplotlib.dates as mdates

2.2: Preparing the data for CandleStick charts

Now in order to plot our stock data, we need to select the four columns in the right order.

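# .copy() makes df1 an independent frame, so assigning to it later does not raise SettingWithCopyWarning
df1 = df[['Open', 'High', 'Low', 'Close']].copy()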

Now, we have our columns in the right order but there is still a problem. Our date doesn’t have the right format and since it is the index, we cannot manipulate it. Therefore, we need to reset the index and then convert our datetime to a number.

df1.reset_index(inplace=True)
df1['Date'] = df1['Date'].map(mdates.date2num)

For this, we use the reset_index function so that we can manipulate our Date column. Notice that we are using the inplace parameter to replace the data frame by the new one. After that, we map the date2num function of the matplotlib.dates module on all of our values. That converts our dates into numbers that we can work with.
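The preparation steps can be sketched end to end on three rows copied from the tail of the data printed earlier. Taking an explicit copy of the column selection is an addition of this sketch; it is one common way to avoid the SettingWithCopyWarning that pandas emits when a slice is modified.

```python
import matplotlib.dates as mdates
import pandas as pd

# Three rows copied from the printed data (2022-09-07 to 2022-09-09).
dates = pd.to_datetime(["2022-09-07", "2022-09-08", "2022-09-09"])
df = pd.DataFrame(
    {"Open": [53.558, 54.761, 55.614],
     "High": [54.759, 55.370, 55.750],
     "Low": [53.431, 54.565, 55.110],
     "Close": [54.749, 54.940, 55.655]},
    index=dates,
)
df.index.name = "Date"

# Select the columns in OHLC order; .copy() yields an independent frame.
df1 = df[["Open", "High", "Low", "Close"]].copy()

# Turn the Date index into a regular column so that it can be modified.
df1.reset_index(inplace=True)

# Convert each date to Matplotlib's floating-point day number.
df1["Date"] = df1["Date"].map(mdates.date2num)
print(df1)
```

Consecutive calendar days differ by exactly 1.0 in this representation, which is what the candlestick function uses to place the sticks on the x-axis.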

2.3:Plotting the data

Now we can start plotting our graph. For this, we just define a subplot (because we need to pass one to our function) and call our candlestick_ohlc function.

One candlestick gives us the information about all four values of one specific day. The highest point of the stick is the high and the lowest point is the low of that day. The colored area is the difference between the open and the close price. If the stick is green, the close value is at the top and the open value at the bottom, since the close must be higher than the open. If it is red, it is the other way around.
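To make the anatomy of a candlestick concrete, here is a minimal sketch that draws the sticks with plain Matplotlib primitives instead of candlestick_ohlc. This is a swapped-in technique for illustration, not the article's plotting code; the five rows are copied from the tail of the printed data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

ohlc = pd.DataFrame(
    {"Open": [53.946, 54.466, 53.558, 54.761, 55.614],
     "High": [54.247, 54.947, 54.759, 55.370, 55.750],
     "Low": [53.855, 53.850, 53.431, 54.565, 55.110],
     "Close": [54.141, 53.868, 54.749, 54.940, 55.655]},
    index=pd.to_datetime(["2022-09-05", "2022-09-06", "2022-09-07",
                          "2022-09-08", "2022-09-09"]),
)

# Green when the close is at or above the open, red otherwise.
up = ohlc["Close"] >= ohlc["Open"]
colors = ["green" if is_up else "red" for is_up in up]

fig, ax = plt.subplots()
x = list(range(len(ohlc)))

# Wick: a thin vertical line from the day's low to the day's high.
ax.vlines(x, ohlc["Low"], ohlc["High"], colors=colors, linewidth=1)

# Body: a bar that spans the distance between open and close.
body_bottom = ohlc[["Open", "Close"]].min(axis=1)
body_height = (ohlc["Close"] - ohlc["Open"]).abs()
ax.bar(x, body_height, bottom=body_bottom, width=0.6, color=colors)

ax.set_xticks(x)
ax.set_xticklabels(list(ohlc.index.strftime("%Y-%m-%d")), rotation=45)
ax.set_ylabel("Price")
```

Only 2022-09-06 closed below its open, so it is the single red stick in this sample.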

***2.4:Analysis and Statistics ***

Now let’s get a little bit deeper into the numbers here and away from the visual. From our data we can derive some statistical values that will help us to analyze it.

PERCENTAGE CHANGE

One value that we can calculate is the percentage change of that day. It tells us by what percentage the share price increased or decreased between the open and the close.

The calculation is quite simple. We create a new column with the name PCT_Change and the values are just the difference of the closing and opening values divided by the opening values. Since the open value is the beginning value of that day, we take it as a basis. We could also multiply the result by 100 to get the actual percentage.
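The calculation described above can be sketched as follows. The open and close values are copied from the last rows of the printed data, so the results can be compared with the PCT_Change column shown further down in this section.

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [53.946, 54.466, 53.558],
     "Close": [54.141, 53.868, 54.749]},
    index=pd.to_datetime(["2022-09-05", "2022-09-06", "2022-09-07"]),
)

# Intraday change relative to the opening price, expressed as a fraction.
df["PCT_Change"] = (df["Close"] - df["Open"]) / df["Open"]

# Multiplying by 100 gives the value in percent (hypothetical column name).
df["PCT_Change_percent"] = df["PCT_Change"] * 100

print(df.round(6))
```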

##        PCT_Change
## count  966.000000
## mean     0.000131
## std      0.016432
## min     -0.107200
## 25%     -0.006861
## 50%      0.000686
## 75%      0.007341
## max      0.062274
##              Close
## Date              
## 2022-09-09  55.655

*** HIGH LOW PERCENTAGE ***

Another interesting statistic is the high low percentage. Here we just calculate the difference between the highest and the lowest value and divide it by the closing value.

By doing that we can get a feeling of how volatile the stock is.
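A minimal sketch of the high low percentage, using two rows copied from the printed data. The column name HL_PCT matches the one that appears in the output of this section.

```python
import pandas as pd

df = pd.DataFrame(
    {"High": [54.247, 54.947],
     "Low": [53.855, 53.850],
     "Close": [54.141, 53.868]},
    index=pd.to_datetime(["2022-09-05", "2022-09-06"]),
)

# Daily trading range relative to the closing price.
df["HL_PCT"] = (df["High"] - df["Low"]) / df["Close"]

print(df["HL_PCT"].round(6))
```

The second day has a range almost three times as wide as the first, which is the kind of difference this statistic is meant to expose.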

##               Open    High     Low   Close  Volume Currency  VolChange  \
## Date                                                                     
## 2022-09-05  53.946  54.247  53.855  54.141   80941      INR  -0.551397   
## 2022-09-06  54.466  54.947  53.850  53.868  193997      INR   1.396770   
## 2022-09-07  53.558  54.759  53.431  54.749  184602      INR  -0.048429   
## 2022-09-08  54.761  55.370  54.565  54.940  186749      INR   0.011630   
## 2022-09-09  55.614  55.750  55.110  55.655  183536      INR  -0.017205   
## 
##             CloseChange  usdinr  usdsilver  Open-Close  Open-Open    zscore  \
## Date                                                                          
## 2022-09-05     0.006413  79.783     18.087       0.150      0.644 -0.101970   
## 2022-09-06    -0.005042  79.870     17.795       0.325      0.520 -0.125573   
## 2022-09-07     0.016355  79.630     18.137      -0.310     -0.908 -0.049406   
## 2022-09-08     0.003489  79.667     18.442       0.012      1.203 -0.032893   
## 2022-09-09     0.013014  79.635     18.767       0.674      0.853  0.028923   
## 
##             zscorevolume  PCT_Change    HL_PCT  
## Date                                            
## 2022-09-05     -0.507578    0.003615  0.007240  
## 2022-09-06      0.820085   -0.010979  0.020365  
## 2022-09-07      0.709756    0.022238  0.024256  
## 2022-09-08      0.734969    0.003269  0.014652  
## 2022-09-09      0.697238    0.000737  0.011499

These statistical values can be combined with many others to get a lot of valuable information about specific stocks, which improves decision making.

*** MOVING AVERAGE ***

We are now going to derive different moving averages. A moving average is the arithmetic mean of all the values of the past n days. Of course this is not the only key statistic that we can derive, but it is the one we are going to use now. We can play around with other functions as well.

What we are going to do with this value is to include it into our data frame and to compare it with the share price of that day.

For this, we will first need to create a new column. Pandas does this automatically when we assign values to a column name. This means that we don’t have to explicitly define that we are creating a new column.

##              Close  5d_ma  20d_ma  50d_ma  100d_ma  200d_ma  5d_ema  20d_ema  \
## Date                                                                           
## 2022-09-01  53.265  54.40   56.87   57.35    60.17    62.28   54.25    56.06   
## 2022-09-02  53.796  54.08   56.63   57.23    60.01    62.23   54.10    55.84   
## 2022-09-05  54.141  53.92   56.44   57.11    59.86    62.19   54.11    55.68   
## 2022-09-06  53.868  53.79   56.15   56.98    59.70    62.14   54.03    55.51   
## 2022-09-07  54.749  53.96   55.92   56.87    59.56    62.11   54.27    55.44   
## 2022-09-08  54.940  54.30   55.70   56.78    59.42    62.07   54.49    55.39   
## 2022-09-09  55.655  54.67   55.53   56.71    59.31    62.04   54.88    55.42   
## 
##             50d_ema  100d_ema  200d_ema  
## Date                                     
## 2022-09-01    57.61     59.49     61.42  
## 2022-09-02    57.46     59.38     61.34  
## 2022-09-05    57.33     59.28     61.27  
## 2022-09-06    57.19     59.17     61.20  
## 2022-09-07    57.10     59.08     61.13  
## 2022-09-08    57.01     59.00     61.07  
## 2022-09-09    56.96     58.93     61.02

Here we define several new columns, such as 20d_ma, 50d_ma, 100d_ma and 200d_ma. We fill each column with the mean of the most recent n entries. The rolling function stacks a specific number of entries, in order to make a statistical calculation possible. The window parameter defines how many entries we are going to stack. But there is also the min_periods parameter, which defines how many entries we need to have as a minimum in order to perform the calculation. This is relevant because the first entries of our data frame won’t have n entries previous to them. By setting this value to zero we start the calculations already with the first number, even if there is not a single previous value. This has the effect that the first value will be just the first number, the second one will be the mean of the first two numbers and so on, until we get to n values.

By using the mean function, we are calculating the arithmetic mean. However, we can use other functions such as max, min or median if we like.
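The effect of min_periods is easiest to see on a tiny series. The numbers below are hypothetical, and the window of three is chosen only to keep the example short.

```python
import pandas as pd

close = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0], name="Close")

# Default behaviour: no result until a full window of 3 values exists.
ma_strict = close.rolling(window=3).mean()

# min_periods=0: start averaging with whatever values are available.
ma_eager = close.rolling(window=3, min_periods=0).mean()

# Other aggregations work the same way, for example the rolling maximum.
rolling_max = close.rolling(window=3, min_periods=0).max()

print(pd.DataFrame({"strict": ma_strict, "eager": ma_eager, "max": rolling_max}))
```

The eager column starts with 10.0 (the first value alone), then 11.0 (mean of the first two), and from the third row on matches the strict column.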

*** Standard Deviation ***

The standard deviation of the closing prices measures how widely prices are dispersed from the average price. If prices are trading in a narrow range, the standard deviation returns a low value that indicates low volatility. If prices are trading in a wide range, the standard deviation returns a high value that indicates high volatility.
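A short sketch contrasting a narrow and a wide trading range. Both series are hypothetical, and the five-day window is an assumption, since the window used for the Std_dev column is not stated.

```python
import pandas as pd

# Hypothetical prices: one quiet series and one that swings widely.
calm = pd.Series([100.0, 100.5, 99.8, 100.2, 100.1, 99.9])
wild = pd.Series([100.0, 104.0, 97.0, 103.0, 96.0, 105.0])

window = 5  # assumed window length

# Rolling sample standard deviation over the last `window` values.
calm_std = calm.rolling(window).std()
wild_std = wild.rolling(window).std()

print(calm_std.iloc[-1], wild_std.iloc[-1])
```

The quiet series yields a much smaller value than the volatile one, which is exactly the reading described above.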

## Date
## 2022-09-01    0.889563
## 2022-09-02    0.917957
## 2022-09-05    0.730205
## 2022-09-06    0.541825
## 2022-09-07    0.488597
## 2022-09-08    0.579435
## 2022-09-09    0.813458
## Name: Std_dev, dtype: float64

*** Relative Strength Index ***

The relative strength index (RSI) is a momentum indicator used in technical analysis that measures the magnitude of recent price changes to identify overbought or oversold conditions in the price of a stock or other asset. An RSI above 70 indicates an overbought condition. An RSI below 30 indicates an oversold condition.
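The indicator can be approximated in plain pandas. The sketch below assumes a 14-period RSI with Wilder-style exponential smoothing; library implementations may seed the averages differently, so early values can deviate from theirs. The function name rsi and the sample series are illustrative.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative strength index using Wilder-style smoothing (a sketch)."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # upward moves, zero otherwise
    loss = (-delta).clip(lower=0)   # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Hypothetical series: steadily rising, steadily falling, and mixed.
rising = pd.Series(range(1, 31), dtype=float)
falling = pd.Series(range(30, 0, -1), dtype=float)
mixed = pd.Series([50, 51, 50, 52, 53, 52, 54, 53, 55, 56,
                   55, 54, 56, 57, 58, 57, 59, 58, 60, 61], dtype=float)

print(rsi(rising).iloc[-1], rsi(falling).iloc[-1], rsi(mixed).iloc[-1])
```

A series that only rises sits at the top of the scale and one that only falls sits at the bottom, while the mixed series lands in between.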

##                   RSI
## Date                 
## 2022-09-05  34.198373
## 2022-09-06  32.078100
## 2022-09-07  44.557529
## 2022-09-08  46.935461
## 2022-09-09  55.053922

*** Average True range ***

##                    ATR     20dayEMA    ATRdiff
## Date                                          
## 2022-09-05  963.570621  1017.372726 -53.802105
## 2022-09-06  973.101291  1013.156399 -40.055108
## 2022-09-07  998.451199  1011.755904 -13.304705
## 2022-09-08  984.633256  1009.172794 -24.539538
## 2022-09-09  972.159452  1005.647714 -33.488262

*** Williams %R ***

Williams %R, or just %R, is a technical analysis oscillator showing the current closing price in relation to the high and low of the past N days. The oscillator is on a negative scale, from −100 (lowest) up to 0 (highest). A value of −100 means the close today was the lowest low of the past N days, and 0 means today’s close was the highest high of the past N days.

##             Williams %R
## Date                   
## 2022-09-01   -84.435262
## 2022-09-02   -69.807163
## 2022-09-05   -60.303030
## 2022-09-06   -55.402826
## 2022-09-07   -14.195980
## 2022-09-08   -16.104869
## 2022-09-09    -3.114754

Readings below -80 represent oversold territory and readings above -20 represent overbought.
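The definition above translates into a few lines of pandas. The function name williams_r, the toy prices and the three-day window are assumptions chosen so that both ends of the scale show up; typical settings use a longer window.

```python
import pandas as pd

def williams_r(high: pd.Series, low: pd.Series, close: pd.Series,
               period: int = 14) -> pd.Series:
    """Williams %R: position of the close within the N-day high/low range."""
    highest_high = high.rolling(period).max()
    lowest_low = low.rolling(period).min()
    return -100 * (highest_high - close) / (highest_high - lowest_low)

# Hypothetical prices (illustration only).
high = pd.Series([10.0, 11.0, 12.0, 13.0, 12.0])
low = pd.Series([8.0, 9.0, 10.0, 11.0, 9.0])
close = pd.Series([9.0, 10.0, 12.0, 11.0, 9.0])

wr = williams_r(high, low, close, period=3)
print(wr)
```

On the third day the close equals the highest high of the window, giving 0. On the last day the close equals the lowest low, giving -100.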

*** ADX ***

ADX is used to quantify trend strength. ADX calculations are based on a moving average of price range expansion over a given period of time. The average directional index (ADX) is used to determine when the price is trending strongly.

0-25: Absent or Weak Trend

25-50: Strong Trend

50-75: Very Strong Trend

75-100: Extremely Strong Trend
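The bands listed above can be turned into a small lookup helper. The function name trend_strength is hypothetical, and treating each lower bound as inclusive is an assumption, because the list does not say which band a boundary value such as 25 belongs to.

```python
def trend_strength(adx: float) -> str:
    """Map an ADX reading to the trend-strength bands listed above."""
    if not 0 <= adx <= 100:
        raise ValueError("ADX is expected to lie between 0 and 100")
    if adx < 25:
        return "Absent or Weak Trend"
    if adx < 50:
        return "Strong Trend"
    if adx < 75:
        return "Very Strong Trend"
    return "Extremely Strong Trend"

# Two readings taken from the ADX output of this section.
print(trend_strength(54.611004))  # reading for 2022-09-01
print(trend_strength(32.071938))  # reading for 2022-09-09
```

Read this way, the trend weakened from very strong to strong over the days shown.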

##                   ADX
## Date                 
## 2022-09-01  54.611004
## 2022-09-02  52.250372
## 2022-09-05  50.226973
## 2022-09-06  44.799987
## 2022-09-07  41.597722
## 2022-09-08  35.979409
## 2022-09-09  32.071938
##                    CCI
## Date                  
## 2022-09-01 -140.892347
## 2022-09-02 -104.420108
## 2022-09-05  -76.423662
## 2022-09-06  -57.056702
## 2022-09-07  -41.827823
## 2022-09-08   22.520137
## 2022-09-09   77.018705
##                  ROC
## Date                
## 2022-09-01 -6.897275
## 2022-09-02 -4.388163
## 2022-09-05 -2.762262
## 2022-09-06 -3.245622
## 2022-09-07 -0.845769
## 2022-09-08 -1.509447
## 2022-09-09  0.451223

*** MACD ***

Moving Average Convergence Divergence (MACD) is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period Exponential Moving Average (EMA) from the 12-period EMA.
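The subtraction described above can be sketched with pandas exponential moving averages. The function name macd_line and the sample series are illustrative, and using adjust=False for the recursive EMA form is an assumption; the values will not necessarily match the MACD_IND column printed in this section.

```python
import pandas as pd

def macd_line(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    """MACD line: fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

# Hypothetical series: a flat market and a steady uptrend.
flat = pd.Series([50.0] * 60)
rising = pd.Series([50.0 + 0.5 * i for i in range(60)])

print(macd_line(flat).iloc[-1], macd_line(rising).iloc[-1])
```

In a flat market both averages coincide and the line stays at zero. In an uptrend the fast average follows the price more closely than the slow one, so the line turns positive.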

##             MACD_IND
## Date                
## 2022-09-05 -0.263879
## 2022-09-06 -0.214246
## 2022-09-07 -0.107390
## 2022-09-08 -0.014058
## 2022-09-09  0.099377
## Date
## 2018-12-10           NaN
## 2018-12-11           NaN
## 2018-12-12           NaN
## 2018-12-13           NaN
## 2018-12-14           NaN
##                  ...    
## 2022-09-05     -9.817207
## 2022-09-06   -176.380356
## 2022-09-07    105.177302
## 2022-09-08    113.953202
## 2022-09-09    118.509411
## Name: STC, Length: 966, dtype: float64

*** Bollinger Bands ***

Bollinger Bands are a type of statistical chart characterizing the prices and volatility over time of a financial instrument or commodity.

## 58720.0 55530.0 52340.0
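The three numbers printed above appear consistent with an upper, middle and lower band, since the gaps above and below 55530.0 are both 3190.0. A minimal sketch of the calculation follows, assuming a 20-day window and a band width of two standard deviations. The function name bollinger_bands, the ddof setting and the random sample prices are assumptions.

```python
import numpy as np
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20,
                    num_std: float = 2.0, ddof: int = 0):
    """Return (upper, middle, lower) Bollinger Bands as three Series."""
    middle = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=ddof)
    upper = middle + num_std * std
    lower = middle - num_std * std
    return upper, middle, lower

# Hypothetical random-walk prices (illustration only).
rng = np.random.default_rng(1)
close = pd.Series(55000 + rng.normal(0, 300, 100).cumsum())

upper, middle, lower = bollinger_bands(close)
print(round(upper.iloc[-1]), round(middle.iloc[-1]), round(lower.iloc[-1]))
```

The bands widen when recent prices are volatile and narrow when they are calm, because their distance from the middle band is a multiple of the rolling standard deviation.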

If we choose a value other than zero for our min_periods parameter, we will end up with a couple of NaN values. NaN stands for "not a number"; such entries carry no usable information, so we want to delete the rows that contain them.

We do this by using the dropna function. Any rows with NaN values in any column are then deleted.
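A short sketch of how the NaN values arise and how dropna removes them. The prices, the column name 3d_ma and the window of three are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Close": [10.0, 12.0, 14.0, 16.0, 18.0]})

# Requiring a full window leaves the first two rows without a value.
df["3d_ma"] = df["Close"].rolling(window=3, min_periods=3).mean()
nan_count = int(df["3d_ma"].isna().sum())

# dropna removes every row that has a NaN in any column.
clean = df.dropna()

print(nan_count)
print(clean)
```

Note that dropna returns a new data frame. The original is left untouched unless the result is assigned back or inplace=True is passed.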

3: Predicting the movement of stock

To predict the movement of the stock we use lagged returns as the independent variables (the feature table below contains Lag1 through Lag10). The first lag is yesterday's return, the second lag is the return of the day before yesterday, and so on. The dependent variable is whether the price went up or down on that day. Other independent variables include the technical indicators, which along with the lagged returns are used to predict the movement of the stock using logistic regression.

3.1: Creating lag returns

3.2: Creating returns dataframe

3.2: create the lagged percentage returns columns

##                Today      Lag1      Lag2      Lag3      Lag4      Lag5  \
## Date                                                                     
## 2022-08-26 -0.675845  1.025065 -0.824427 -0.007184 -1.041500 -1.653528   
## 2022-08-29 -0.815811 -0.675845  1.025065 -0.824427 -0.007184 -1.041500   
## 2022-08-30 -0.829800 -0.815811 -0.675845  1.025065 -0.824427 -0.007184   
## 2022-08-31 -1.108318 -0.829800 -0.815811 -0.675845  1.025065 -0.824427   
## 2022-09-02 -0.179986 -1.108318 -0.829800 -0.815811 -0.675845  1.025065   
## 2022-09-05  0.641312 -0.179986 -1.108318 -0.829800 -0.815811 -0.675845   
## 2022-09-06 -0.504239  0.641312 -0.179986 -1.108318 -0.829800 -0.815811   
## 2022-09-07  1.635479 -0.504239  0.641312 -0.179986 -1.108318 -0.829800   
## 2022-09-08  0.348865  1.635479 -0.504239  0.641312 -0.179986 -1.108318   
## 2022-09-09  1.301420  0.348865  1.635479 -0.504239  0.641312 -0.179986   
## 
##                 Lag6      Lag7      Lag8      Lag9     Lag10     Lag11  \
## Date                                                                     
## 2022-08-26 -0.721884 -1.110272 -2.409860  1.181036 -0.733365  0.210697   
## 2022-08-29 -1.653528 -0.721884 -1.110272 -2.409860  1.181036 -0.733365   
## 2022-08-30 -1.041500 -1.653528 -0.721884 -1.110272 -2.409860  1.181036   
## 2022-08-31 -0.007184 -1.041500 -1.653528 -0.721884 -1.110272 -2.409860   
## 2022-09-02 -0.824427 -0.007184 -1.041500 -1.653528 -0.721884 -1.110272   
## 2022-09-05  1.025065 -0.824427 -0.007184 -1.041500 -1.653528 -0.721884   
## 2022-09-06 -0.675845  1.025065 -0.824427 -0.007184 -1.041500 -1.653528   
## 2022-09-07 -0.815811 -0.675845  1.025065 -0.824427 -0.007184 -1.041500   
## 2022-09-08 -0.829800 -0.815811 -0.675845  1.025065 -0.824427 -0.007184   
## 2022-09-09 -1.108318 -0.829800 -0.815811 -0.675845  1.025065 -0.824427   
## 
##                Lag12     Lag13     Lag14     Lag15  
## Date                                                
## 2022-08-26 -0.320911  2.601320 -0.939224  0.461486  
## 2022-08-29  0.210697 -0.320911  2.601320 -0.939224  
## 2022-08-30 -0.733365  0.210697 -0.320911  2.601320  
## 2022-08-31  1.181036 -0.733365  0.210697 -0.320911  
## 2022-09-02 -2.409860  1.181036 -0.733365  0.210697  
## 2022-09-05 -1.110272 -2.409860  1.181036 -0.733365  
## 2022-09-06 -0.721884 -1.110272 -2.409860  1.181036  
## 2022-09-07 -1.653528 -0.721884 -1.110272 -2.409860  
## 2022-09-08 -1.041500 -1.653528 -0.721884 -1.110272  
## 2022-09-09 -0.007184 -1.041500 -1.653528 -0.721884
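Lag columns like those shown above could be built as in the following sketch. The closing prices are copied from the printed data; only two lags are created so that the short sample still has complete rows, whereas the table above has fifteen.

```python
import pandas as pd

# Closing prices copied from the printed data (2022-09-02 to 2022-09-09).
close = pd.Series(
    [53.796, 54.141, 53.868, 54.749, 54.940, 55.655],
    index=pd.to_datetime(["2022-09-02", "2022-09-05", "2022-09-06",
                          "2022-09-07", "2022-09-08", "2022-09-09"]),
    name="Close",
)

lags = 2  # kept small for the toy sample

returns = pd.DataFrame(index=close.index)
# Daily percentage return of the close.
returns["Today"] = close.pct_change() * 100.0

# Lag i holds the return from i trading days earlier.
for i in range(1, lags + 1):
    returns[f"Lag{i}"] = returns["Today"].shift(i)

# The first rows have no history and are dropped.
returns = returns.dropna()
print(returns.round(6))
```

The row for 2022-09-09 reproduces the Today, Lag1 and Lag2 values of the table above (1.301420, 0.348865 and 1.635479).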

3.3: “Direction” column (+1 or -1) indicating an up/down day
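One way to derive such a column is to take the sign of the daily return, as sketched below. The returns are copied from the Today column above; counting a return of exactly zero as an up day is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Daily returns copied from the "Today" column (2022-09-05 to 2022-09-09).
today = pd.Series(
    [0.641312, -0.504239, 1.635479, 0.348865, 1.301420],
    index=pd.to_datetime(["2022-09-05", "2022-09-06", "2022-09-07",
                          "2022-09-08", "2022-09-09"]),
    name="Today",
)

# +1 for an up day, -1 for a down day (zero is treated as up here).
direction = pd.Series(np.where(today >= 0, 1, -1),
                      index=today.index, name="Direction")
print(direction)
```

The resulting labels agree with the y_test column printed at the end of this section for the same five dates.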

***3.4: Create the dependent and independent variables ***

##                 Lag1      Lag2      Lag3      Lag4      Lag5      Lag6  \
## Date                                                                     
## 2022-09-05 -0.179986 -1.108318 -0.829800 -0.815811 -0.675845  1.025065   
## 2022-09-06  0.641312 -0.179986 -1.108318 -0.829800 -0.815811 -0.675845   
## 2022-09-07 -0.504239  0.641312 -0.179986 -1.108318 -0.829800 -0.815811   
## 2022-09-08  1.635479 -0.504239  0.641312 -0.179986 -1.108318 -0.829800   
## 2022-09-09  0.348865  1.635479 -0.504239  0.641312 -0.179986 -1.108318   
## 
##                 Lag7      Lag8      Lag9     Lag10  usdinr  usdsilver  20d_ma  \
## Date                                                                            
## 2022-09-05 -0.824427 -0.007184 -1.041500 -1.653528  79.783     18.087   56.44   
## 2022-09-06  1.025065 -0.824427 -0.007184 -1.041500  79.870     17.795   56.15   
## 2022-09-07 -0.675845  1.025065 -0.824427 -0.007184  79.630     18.137   55.92   
## 2022-09-08 -0.815811 -0.675845  1.025065 -0.824427  79.667     18.442   55.70   
## 2022-09-09 -0.829800 -0.815811 -0.675845  1.025065  79.635     18.767   55.53   
## 
##             20d_ema  50d_ma  50d_ema  100d_ma  100d_ema  200d_ma  200d_ema  \
## Date                                                                         
## 2022-09-05    55.68   57.11    57.33    59.86     59.28    62.19     61.27   
## 2022-09-06    55.51   56.98    57.19    59.70     59.17    62.14     61.20   
## 2022-09-07    55.44   56.87    57.10    59.56     59.08    62.11     61.13   
## 2022-09-08    55.39   56.78    57.01    59.42     59.00    62.07     61.07   
## 2022-09-09    55.42   56.71    56.96    59.31     58.93    62.04     61.02   
## 
##             250d_ma  250d_ema   Std_dev        RSI  Williams %R  MACD_IND  \
## Date                                                                        
## 2022-09-05    62.43     62.43  0.730205  34.198373   -60.303030 -0.263879   
## 2022-09-06    62.40     62.40  0.541825  32.078100   -55.402826 -0.214246   
## 2022-09-07    62.38     62.38  0.488597  44.557529   -14.195980 -0.107390   
## 2022-09-08    62.36     62.36  0.579435  46.935461   -16.104869 -0.014058   
## 2022-09-09    62.34     62.34  0.813458  55.053922    -3.114754  0.099377   
## 
##             VolChange  CloseChange    zscore        CCI       ROC         DX  \
## Date                                                                           
## 2022-09-05  -0.551397     0.006413 -0.101970 -76.423662 -2.762262  36.625299   
## 2022-09-06   1.396770    -0.005042 -0.125573 -57.056702 -3.245622  23.524860   
## 2022-09-07  -0.048429     0.016355 -0.049406 -41.827823 -0.845769  27.976592   
## 2022-09-08   0.011630     0.003489 -0.032893  22.520137 -1.509447  17.257426   
## 2022-09-09  -0.017205     0.013014  0.028923  77.018705  0.451223  11.028787   
## 
##                   SAR  Open-Close  Open-Open  
## Date                                          
## 2022-09-05  55.081691       0.150      0.644  
## 2022-09-06  52.700000       0.325      0.520  
## 2022-09-07  52.744940      -0.310     -0.908  
## 2022-09-08  52.788981       0.012      1.203  
## 2022-09-09  52.892222       0.674      0.853

3.5: Create training and test sets


***3.6: Create model ***

3.7: train the model on the training set

## LogisticRegression(max_iter=1000000)

3.8: make an array of predictions on the test set

3.9: output the hit-rate and the confusion matrix for the model

## 
## Train Accuracy: 87.89%
## Test Accuracy: 61.85%
## [[203 157]
##  [  4  58]]

***3.10: Predict movement of stock for tomorrow. ***

##             y_test  y_pred
## Date                      
## 2022-09-05       1      -1
## 2022-09-06      -1      -1
## 2022-09-07       1      -1
## 2022-09-08       1      -1
## 2022-09-09       1       1
## Hourly Technical Indicators:
##         name       s3       s2       s1  pivot_points       r1       r2  \
## 0    Classic  55222.0  55321.0  55513.0       55612.0  55804.0  55903.0   
## 1  Fibonacci  55321.0  55432.0  55501.0       55612.0  55723.0  55792.0   
## 2  Camarilla  55626.0  55653.0  55679.0       55612.0  55733.0  55759.0   
## 3   Woodie's  55270.0  55345.0  55561.0       55636.0  55852.0  55927.0   
## 4   DeMark's      NaN      NaN  55563.0       55637.0  55854.0      NaN   
## 
##         r3  
## 0  56095.0  
## 1  55903.0  
## 2  55786.0  
## 3  56143.0  
## 4      NaN
##   technical_indicator    value      signal
## 0             RSI(14)   64.493         buy
## 1          STOCH(9,6)   99.414  overbought
## 2        STOCHRSI(14)   52.927     neutral
## 3         MACD(12,26)  223.038         buy
## 4             ADX(14)   35.855         buy
##   period  sma_value sma_signal   ema_value ema_signal
## 0      5   55509.60        buy  55565.2468        buy
## 1     10   55490.90        buy  55472.7658        buy
## 2     20   55345.10        buy  55336.8160        buy
## 3     50   54776.46        buy  54922.6271        buy
## 4    100   54330.16        buy  54777.7736        buy
## Daily Technical Indicators:
##         name       s3       s2       s1  pivot_points       r1       r2  \
## 0    Classic  53741.0  54153.0  54546.0       54958.0  55351.0  55763.0   
## 1  Fibonacci  54153.0  54461.0  54650.0       54958.0  55266.0  55455.0   
## 2  Camarilla  54719.0  54792.0  54866.0       54958.0  55014.0  55088.0   
## 3   Woodie's  53733.0  54149.0  54538.0       54954.0  55343.0  55759.0   
## 4   DeMark's      NaN      NaN  54752.0       55061.0  55558.0      NaN   
## 
##         r3  
## 0  56156.0  
## 1  55763.0  
## 2  55161.0  
## 3  56148.0  
## 4      NaN
##   technical_indicator    value      signal
## 0             RSI(14)   48.884     neutral
## 1          STOCH(9,6)   96.517  overbought
## 2        STOCHRSI(14)  100.000  overbought
## 3         MACD(12,26) -627.928        sell
## 4             ADX(14)   48.050         buy
##   period  sma_value sma_signal   ema_value ema_signal
## 0      5   54670.60        buy  54842.3623        buy
## 1     10   54375.70        buy  54852.8322        buy
## 2     20   55530.20        buy  55311.1386        buy
## 3     50   56707.46       sell  56923.3407       sell
## 4    100   59305.98       sell  58771.2108       sell
## Lag1            0.348865
## Lag2            1.635479
## Lag3           -0.504239
## Lag4            0.641312
## Lag5           -0.179986
## Lag6           -1.108318
## Lag7           -0.829800
## Lag8           -0.815811
## Lag9           -0.675845
## Lag10           1.025065
## usdinr         79.635000
## usdsilver      18.767000
## 20d_ma         55.530000
## 20d_ema        55.420000
## 50d_ma         56.710000
## 50d_ema        56.960000
## 100d_ma        59.310000
## 100d_ema       58.930000
## 200d_ma        62.040000
## 200d_ema       61.020000
## 250d_ma        62.340000
## 250d_ema       62.340000
## Std_dev         0.813458
## RSI            55.053922
## Williams %R    -3.114754
## MACD_IND        0.099377
## VolChange      -0.017205
## CloseChange     0.013014
## zscore          0.028923
## CCI            77.018705
## ROC             0.451223
## DX             11.028787
## SAR            52.892222
## Open-Close      0.674000
## Open-Open       0.853000
## Name: 2022-09-09 00:00:00, dtype: float64
## Bullish at daily: Buy at  55266.0
## Bullish hourly: Buy at  55723.0
## Pridiction: Buy at  55266.0

pd.set_option('display.max_columns', None)

dfusd2 = df[['Open','High','Low','Close','Volume']].copy()  # work on a copy so adding columns does not raise SettingWithCopyWarning
diffdf = dfusd2.diff()
# "Long" when both volume and close rose versus the previous day, otherwise "Short"
dfusd2['Pos'] = np.where((dfusd2['Volume'] > dfusd2['Volume'].shift(1)) & (dfusd2['Close'] > dfusd2['Close'].shift(1)), "Long", "Short")

# Days that opened at the high (short candidates) or at the low (long candidates)
short = dfusd2[dfusd2['Open'] == dfusd2['High']].copy()
lng = dfusd2[dfusd2['Open'] == dfusd2['Low']].copy()

short['gain'] = short['High'] - short['Close']
lng['gain'] = lng['Close'] - lng['Low']

dfusd2 = dfusd2.reset_index()

dfusd2['weekday'] = dfusd2['Date'].dt.day_name()

print(dfusd2.tail(5))
##           Date    Open    High     Low   Close  Volume    Pos    weekday
## 644 2022-06-30  20.725  20.785  20.145  20.352   61095  Short   Thursday
## 645 2022-07-01  20.290  20.290  19.295  19.667   75674  Short     Friday
## 646 2022-07-04  19.880  20.100  18.970  19.121   85003  Short     Monday
## 647 2022-07-06  19.155  19.325  18.705  19.159   66245  Short  Wednesday
## 648 2022-07-07  19.170  19.435  19.035  19.188   42480  Short   Thursday
short = short.reset_index()
lng = lng.reset_index()

short['weekday'] = short['Date'].dt.day_name()
lng['weekday'] = lng['Date'].dt.day_name()

1.2.2: Determining Average monthly closing prices

##               Open    High     Low   Close  Volume
## Date                                              
## 2019-12-09  16.600  16.730  16.565  16.642   53510
## 2019-12-10  16.655  16.770  16.625  16.702   50202
## 2019-12-11  16.710  17.025  16.650  16.849   62672
## 2019-12-12  16.930  17.185  16.820  16.949  113422
## 2019-12-13  16.970  17.095  16.895  17.012   77899
## ...            ...     ...     ...     ...     ...
## 2022-06-30  20.725  20.785  20.145  20.352   61095
## 2022-07-01  20.290  20.290  19.295  19.667   75674
## 2022-07-04  19.880  20.100  18.970  19.121   85003
## 2022-07-06  19.155  19.325  18.705  19.159   66245
## 2022-07-07  19.170  19.435  19.035  19.188   42480
## 
## [649 rows x 5 columns]
##               Open    High     Low   Close  Volume Currency  VolChange  \
## Date                                                                     
## 2019-12-09  16.600  16.730  16.565  16.642   53510      USD   0.058211   
## 2019-12-10  16.655  16.770  16.625  16.702   50202      USD  -0.061820   
## 2019-12-11  16.710  17.025  16.650  16.849   62672      USD   0.248396   
## 2019-12-12  16.930  17.185  16.820  16.949  113422      USD   0.809772   
## 2019-12-13  16.970  17.095  16.895  17.012   77899      USD  -0.313193   
## ...            ...     ...     ...     ...     ...      ...        ...   
## 2022-06-30  20.725  20.785  20.145  20.352   61095      USD   0.152671   
## 2022-07-01  20.290  20.290  19.295  19.667   75674      USD   0.238628   
## 2022-07-04  19.880  20.100  18.970  19.121   85003      USD   0.123279   
## 2022-07-06  19.155  19.325  18.705  19.159   66245      USD  -0.220675   
## 2022-07-07  19.170  19.435  19.035  19.188   42480      USD  -0.358744   
## 
##             CloseChange  
## Date                     
## 2019-12-09     0.000491  
## 2019-12-10     0.003605  
## 2019-12-11     0.008801  
## 2019-12-12     0.005935  
## 2019-12-13     0.003717  
## ...                 ...  
## 2022-06-30    -0.018613  
## 2022-07-01    -0.033658  
## 2022-07-04    -0.027762  
## 2022-07-06     0.001987  
## 2022-07-07     0.001514  
## 
## [649 rows x 8 columns]
## Date
## 2019-12-31    17.310813
## 2020-01-31    17.966571
## 2020-02-29    17.840947
## 2020-03-31    14.936818
## 2020-04-30    15.249619
## 2020-05-31    16.589400
## 2020-06-30    17.909955
## 2020-07-31    20.947455
## 2020-08-31    27.098571
## 2020-09-30    25.915238
## 2020-10-31    24.351182
## 2020-11-30    24.123150
## 2020-12-31    25.168500
## 2021-01-31    25.928737
## 2021-02-28    27.322211
## 2021-03-31    25.705522
## 2021-04-30    25.722000
## 2021-05-31    27.614286
## 2021-06-30    27.015333
## 2021-07-31    25.721190
## 2021-08-31    23.976864
## 2021-09-30    23.218048
## 2021-10-31    23.429000
## 2021-11-30    24.194524
## 2021-12-31    22.515273
## 2022-01-31    23.191400
## 2022-02-28    23.539789
## 2022-03-31    25.459435
## 2022-04-30    24.643900
## 2022-05-31    21.851714
## 2022-06-30    21.511000
## 2022-07-31    19.283750
## Freq: M, Name: Close, dtype: float64
## meanprice is  22.853995377503853
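A minimal sketch of how monthly average closing prices can be computed. The frame `df_demo` and its values are made up for illustration, and grouping by calendar month is only one possible approach (the month-end labels in the output above suggest the post used a resample instead).

```python
import pandas as pd

# Hypothetical daily closes standing in for the real data frame.
idx = pd.to_datetime(["2022-05-30", "2022-05-31", "2022-06-01", "2022-06-02"])
df_demo = pd.DataFrame({"Close": [21.0, 22.0, 20.0, 21.0]}, index=idx)

# Group the closes by calendar month and take the mean of each group.
monthly_mean = df_demo["Close"].groupby(df_demo.index.to_period("M")).mean()
print(monthly_mean)

# Mean over all trading days in the sample.
overall_mean = df_demo["Close"].mean()
print("overall mean is", overall_mean)
```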

1.2.3: Technical moving averages

##   period  sma_value sma_signal   ema_value ema_signal
## 0      5   54670.60        buy  54842.3623        buy
## 1     10   54375.70        buy  54852.8322        buy
## 2     20   55530.20        buy  55311.1386        buy
## 3     50   56707.46       sell  56923.3407       sell
## 4    100   59305.98       sell  58771.2108       sell

1.2.4: Determining the z-scores

## Date
## 2022-06-30   -0.666913
## 2022-07-01   -0.849501
## 2022-07-04   -0.995039
## 2022-07-06   -0.984910
## 2022-07-07   -0.977180
## Name: zscore, dtype: float64
## Date
## 2022-06-30   -0.328213
## 2022-07-01    0.042942
## 2022-07-04    0.280441
## 2022-07-06   -0.197103
## 2022-07-07   -0.802117
## Name: zscorevolume, dtype: float64
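The z-score tells us how many standard deviations a value lies from the mean of its series. The sketch below assumes the population standard deviation (`ddof=0`), which is the default of `scipy.stats.zscore`. The helper `zscore` and the sample values are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({"Close": [10.0, 12.0, 14.0],
                        "Volume": [100.0, 300.0, 200.0]})

def zscore(series: pd.Series) -> pd.Series:
    # Distance from the mean, measured in population standard deviations.
    return (series - series.mean()) / series.std(ddof=0)

df_demo["zscore"] = zscore(df_demo["Close"])
df_demo["zscorevolume"] = zscore(df_demo["Volume"])
print(df_demo)
```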

1.2.5: Determining the daily support and resistance levels

##                    PP         R1         S1         R2         S2         R3  \
## Date                                                                           
## 2022-06-30  20.427333  20.709667  20.069667  21.067333  19.787333  21.349667   
## 2022-07-01  19.750667  20.206333  19.211333  20.745667  18.755667  21.201333   
## 2022-07-04  19.397000  19.824000  18.694000  20.527000  18.267000  20.954000   
## 2022-07-06  19.063000  19.421000  18.801000  19.683000  18.443000  20.041000   
## 2022-07-07  19.219333  19.403667  19.003667  19.619333  18.819333  19.803667   
## 
##                    S3  
## Date                   
## 2022-06-30  19.429667  
## 2022-07-01  18.216333  
## 2022-07-04  17.564000  
## 2022-07-06  18.181000  
## 2022-07-07  18.603667
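The levels in the table are consistent with the classic pivot point formulas. The sketch below applies them to the high, low and close of 2022-07-07 and arrives at the same values as the last row. The function name `pivot_levels` is ours, not part of any library.

```python
import pandas as pd

def pivot_levels(high: float, low: float, close: float) -> pd.Series:
    # Classic pivot point and three support/resistance levels.
    pp = (high + low + close) / 3.0
    return pd.Series({
        "PP": pp,
        "R1": 2 * pp - low,
        "S1": 2 * pp - high,
        "R2": pp + (high - low),
        "S2": pp - (high - low),
        "R3": high + 2 * (pp - low),
        "S3": low - 2 * (high - pp),
    })

# High, low and close of 2022-07-07 from the data above.
levels = pivot_levels(high=19.435, low=19.035, close=19.188)
print(levels.round(6))
```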

1.2.6: Fibonacci retracement levels

## Retracement levels for rising price
##                              0
## 0            {'min': [19.035]}
## 1  {'level5(61.8)': [19.1878]}
## 2     {'level4(50)': [19.235]}
## 3  {'level3(38.2)': [19.2822]}
## 4  {'level2(23.6)': [19.3406]}
## 5           {'zero': [19.435]}
## Retracement levels for falling price
##                              0
## 0           {'zero': [19.435]}
## 1  {'level2(23.6)': [19.1294]}
## 2  {'level3(38.2)': [19.1878]}
## 3     {'level4(50)': [19.235]}
## 4  {'level5(61.8)': [19.2822]}
## 5            {'min': [19.035]}
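The retracement levels above can be reproduced from the day's high (19.435) and low (19.035). For a rising price each ratio is subtracted from the high, for a falling price it is added to the low. This is a sketch with a hypothetical helper `fib_retracements`, not the original code.

```python
def fib_retracements(high: float, low: float):
    span = high - low
    ratios = {"23.6": 0.236, "38.2": 0.382, "50": 0.5, "61.8": 0.618}
    # Rising price: measure the pullback down from the high.
    rising = {name: round(high - r * span, 4) for name, r in ratios.items()}
    # Falling price: measure the bounce up from the low.
    falling = {name: round(low + r * span, 4) for name, r in ratios.items()}
    return rising, falling

rising, falling = fib_retracements(high=19.435, low=19.035)
print("rising: ", rising)
print("falling:", falling)
```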

As we can see, we now have a data frame with all the entries from start date to end date. We have multiple columns here and not only the closing stock price of the respective day. Let’s take a quick look at the individual columns and their meaning.

Open: That’s the share price the stock had when the markets opened that day.

Close: That’s the share price the stock had when the markets closed that day.

High: That’s the highest share price that the stock had that day.

Low: That’s the lowest share price that the stock had that day.

Volume: The number of shares that changed hands that day.

1.3:Reading individual values

Since our data is stored in a Pandas data frame, we can use the indexing we already know to get individual values. For example, we could print only the closing values using print (df[ 'Close' ])

Also, we can go ahead and print the closing value of a specific date that we are interested in. This is possible because the date is our index column.

print (df[ 'Close' ][ '2020-07-14' ])
## 19.53

We can also access an entry by its position, using iloc.

print (df[ 'Close' ].iloc[ 5 ])
## 17.113

Here we printed the closing price of the sixth entry, because positions are counted from zero.

2:Graphical Visualization

Even though tables are nice and useful, we want to visualize our financial data, in order to get a better overview. We want to look at the development of the share price.

Plotting our share price curve with Pandas and Matplotlib is very simple. Since Pandas plotting builds on top of Matplotlib, we can just select the column we are interested in and apply the plot method. Since the date is the index of our data frame, Matplotlib uses it for the x-axis. The y-values are then our closing values.

2.1:CandleStick Charts

The best way to visualize stock data is to use so-called candlestick charts. This type of chart gives us information about four different values at the same time, namely the high, the low, the open and the close value. In order to plot candlestick charts, we will need to import a function of the mplfinance library.

import mplfinance as fplt
from mplfinance.original_flavor import candlestick_ohlc

We are importing the candlestick_ohlc function, which current mplfinance releases keep in the original_flavor module. Notice that there also exists a candlestick_ochl function that takes in the data in a different order. Also, for our candlestick chart, we will need a different date format provided by Matplotlib. Therefore, we need to import the respective module as well. We give it the alias mdates.

import matplotlib.dates as mdates

2.2: Preparing the data for CandleStick charts

Now in order to plot our stock data, we need to select the four columns in the right order.

df1 = df[[ 'Open' , 'High' , 'Low' , 'Close' ]].copy()  # copy so the later in-place changes do not raise SettingWithCopyWarning

Now, we have our columns in the right order but there is still a problem. Our date doesn’t have the right format and since it is the index, we cannot manipulate it. Therefore, we need to reset the index and then convert our datetime to a number.

df1.reset_index( inplace = True )
df1[ 'Date' ] = df1[ 'Date' ].map(mdates.date2num)

For this, we use the reset_index function so that we can manipulate our Date column. Notice that we are using the inplace parameter to replace the data frame by the new one. After that, we map the date2num function of the matplotlib.dates module on all of our values. That converts our dates into numbers that we can work with.

2.3:Plotting the data

Now we can start plotting our graph. For this, we just define a subplot (because we need to pass one to our function) and call our candlestick_ohlc function.

One candlestick gives us the information about all four values of one specific day. The highest point of the stick is the high and the lowest point is the low of that day. The colored area is the difference between the open and the close price. If the stick is green, the close value is at the top and the open value at the bottom, since the close must be higher than the open. If it is red, it is the other way around.

***2.4:Analysis and Statistics ***

Now let’s get a little bit deeper into the numbers here and away from the visual. From our data we can derive some statistical values that will help us to analyze it.

PERCENTAGE CHANGE

One value that we can calculate is the percentage change of that day. This means by how many percent the share price increased or decreased that day.

The calculation is quite simple. We create a new column with the name PCT_Change and the values are just the difference of the closing and opening values divided by the opening values. Since the open value is the beginning value of that day, we take it as a basis. We could also multiply the result by 100 to get the actual percentage.

##        PCT_Change
## count  649.000000
## mean    -0.000722
## std      0.022166
## min     -0.151258
## 25%     -0.010537
## 50%     -0.000286
## 75%      0.010432
## max      0.081467
##              Close
## Date              
## 2022-07-07  19.188
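As a small sketch of the calculation described above, the following uses the open and close of the last two trading days in our data. The frame name `df_demo` is only for illustration.

```python
import pandas as pd

# Open and close of 2022-07-06 and 2022-07-07 from the data above.
df_demo = pd.DataFrame(
    {"Open": [19.155, 19.170], "Close": [19.159, 19.188]},
    index=pd.to_datetime(["2022-07-06", "2022-07-07"]),
)

# Change from open to close, relative to the open (multiply by 100 for percent).
df_demo["PCT_Change"] = (df_demo["Close"] - df_demo["Open"]) / df_demo["Open"]
print(df_demo["PCT_Change"])
print(df_demo[["PCT_Change"]].describe())
```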

*** HIGH LOW PERCENTAGE ***

Another interesting statistic is the high low percentage. Here we just calculate the difference between the highest and the lowest value and divide it by the closing value.

By doing that we can get a feeling of how volatile the stock is.

##               Open    High     Low   Close  Volume  ... Open-Open    zscore  \
## Date                                                ...                       
## 2022-06-30  20.725  20.785  20.145  20.352   61095  ...    -0.115 -0.666913   
## 2022-07-01  20.290  20.290  19.295  19.667   75674  ...    -0.435 -0.849501   
## 2022-07-04  19.880  20.100  18.970  19.121   85003  ...    -0.410 -0.995039   
## 2022-07-06  19.155  19.325  18.705  19.159   66245  ...    -0.725 -0.984910   
## 2022-07-07  19.170  19.435  19.035  19.188   42480  ...     0.015 -0.977180   
## 
##             zscorevolume  PCT_Change    HL_PCT  
## Date                                            
## 2022-06-30     -0.328213   -0.017998  0.031447  
## 2022-07-01      0.042942   -0.030705  0.050592  
## 2022-07-04      0.280441   -0.038179  0.059097  
## 2022-07-06     -0.197103    0.000209  0.032361  
## 2022-07-07     -0.802117    0.000939  0.020846  
## 
## [5 rows x 15 columns]

These statistical values can be combined with many others to get valuable information about specific stocks, which supports better decision making.
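The high low percentage can be sketched in one line. The example uses the high, low and close of the last two trading days from our data, and the frame name `df_demo` is hypothetical.

```python
import pandas as pd

# High, low and close of 2022-07-06 and 2022-07-07 from the data above.
df_demo = pd.DataFrame(
    {"High": [19.325, 19.435], "Low": [18.705, 19.035], "Close": [19.159, 19.188]},
    index=pd.to_datetime(["2022-07-06", "2022-07-07"]),
)

# Daily trading range relative to the close: a simple volatility measure.
df_demo["HL_PCT"] = (df_demo["High"] - df_demo["Low"]) / df_demo["Close"]
print(df_demo["HL_PCT"])
```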

*** MOVING AVERAGE ***

Next, we are going to derive several moving averages. A moving average is the arithmetic mean of all the values of the past n days. Of course this is not the only key statistic that we can derive, but it is the one we are going to use now. We can play around with other functions as well.

What we are going to do with this value is to include it into our data frame and to compare it with the share price of that day.

For this, we will first need to create a new column. Pandas does this automatically when we assign values to a column name. This means that we don’t have to explicitly define that we are creating a new column.

##              Close  5d_ma  20d_ma  50d_ma  100d_ma  ...  5d_ema  20d_ema  \
## Date                                                ...                    
## 2022-06-28  20.806  21.11   21.62   22.24    23.50  ...   21.11    21.53   
## 2022-06-29  20.738  20.98   21.57   22.13    23.48  ...   20.99    21.46   
## 2022-06-30  20.352  20.84   21.49   22.03    23.46  ...   20.77    21.35   
## 2022-07-01  19.667  20.55   21.36   21.92    23.43  ...   20.41    21.19   
## 2022-07-04  19.121  20.14   21.22   21.81    23.39  ...   19.98    20.99   
## 2022-07-06  19.159  19.81   21.07   21.71    23.34  ...   19.70    20.82   
## 2022-07-07  19.188  19.50   20.92   21.62    23.30  ...   19.53    20.66   
## 
##             50d_ema  100d_ema  200d_ema  
## Date                                     
## 2022-06-28    22.22     22.88     23.43  
## 2022-06-29    22.17     22.84     23.40  
## 2022-06-30    22.10     22.79     23.37  
## 2022-07-01    22.00     22.73     23.33  
## 2022-07-04    21.89     22.66     23.29  
## 2022-07-06    21.78     22.59     23.25  
## 2022-07-07    21.68     22.52     23.21  
## 
## [7 rows x 11 columns]

Here we define several new columns, such as 20d_ma, 50d_ma, 100d_ma and 200d_ma. We fill each column with the mean of the last n entries. The rolling function stacks a specific number of entries, in order to make a statistical calculation possible. The window parameter defines how many entries we are going to stack. But there is also the min_periods parameter. This one defines how many entries we need to have as a minimum in order to perform the calculation. This is relevant because the first entries of our data frame don't have n entries before them. By setting this value to zero we start the calculations already with the first number, even if there is not a single previous value. This has the effect that the first value will be just the first number, the second one will be the mean of the first two numbers and so on, until the window holds n values.

By using the mean function, we are obviously calculating the arithmetic mean. However, we can use a bunch of other functions like max, min or median if we like to.
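To see the effect of the window and min_periods parameters, here is a minimal sketch on four made-up closing prices. A window of 3 is used only to keep the example short, and the column names are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({"Close": [10.0, 12.0, 14.0, 16.0]})

# min_periods=0: the average starts with the very first value.
df_demo["3d_ma"] = df_demo["Close"].rolling(window=3, min_periods=0).mean()

# Default min_periods (equal to the window): NaN until three values are available.
df_demo["3d_ma_strict"] = df_demo["Close"].rolling(window=3).mean()

# Any other aggregation works the same way, for example the rolling maximum.
df_demo["3d_max"] = df_demo["Close"].rolling(window=3, min_periods=0).max()
print(df_demo)
```

The first two rows of 3d_ma_strict are NaN, while 3d_ma already holds the mean of the values seen so far.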

*** Standard Deviation ***

The standard deviation of the closing prices measures how widely prices are dispersed from the average price. If prices trade in a narrow range, the standard deviation returns a low value, which indicates low volatility. If prices trade in a wide range, the standard deviation returns a high value, which indicates high volatility.

## Date
## 2022-06-28    0.334140
## 2022-06-29    0.354968
## 2022-06-30    0.349222
## 2022-07-01    0.535021
## 2022-07-04    0.770875
## 2022-07-06    0.829377
## 2022-07-07    0.756549
## Name: Std_dev, dtype: float64

*** Relative Strength Index ***

The relative strength index (RSI) is a momentum indicator used in technical analysis. It measures the magnitude of recent price changes in order to identify overbought or oversold conditions in the price of a stock or other asset. An RSI above 70 indicates an overbought condition, and an RSI below 30 indicates an oversold condition.

##                   RSI
## Date                 
## 2022-06-30  28.293451
## 2022-07-01  21.365579
## 2022-07-04  17.518977
## 2022-07-06  18.665495
## 2022-07-07  19.624711
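The following is a rough sketch of an RSI calculation in plain Pandas. It assumes a 14-day period and Wilder-style smoothing expressed as an exponential average. It is not the exact routine used for the values above, and a library such as TA-Lib may seed its averages differently, so small differences are to be expected. The function name `rsi` is ours.

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0.0)       # positive moves, zero otherwise
    loss = (-delta).clip(lower=0.0)    # size of negative moves, zero otherwise
    # Wilder's smoothing corresponds to an exponential average with alpha = 1/period.
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

rising = pd.Series([float(i) for i in range(1, 31)])
falling = rising.iloc[::-1].reset_index(drop=True)
mixed = pd.Series([10.0 + (i % 3) for i in range(30)])

print(rsi(rising).iloc[-1])   # only gains
print(rsi(falling).iloc[-1])  # only losses
print(rsi(mixed).iloc[-1])
```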

*** Average True range ***

##                    ATR    20dayEMA    ATRdiff
## Date                                         
## 2022-06-30  614.972184  643.164929 -28.192745
## 2022-07-01  646.545599  643.486898   3.058701
## 2022-07-04  681.078057  647.067008  34.011048
## 2022-07-06  676.715338  649.890659  26.824679
## 2022-07-07  656.949957  650.562973   6.386984

*** Williams %R ***

Williams %R, or just %R, is a technical analysis oscillator showing the current closing price in relation to the high and low of the past N days. The oscillator is on a negative scale, from −100 (lowest) up to 0 (highest). A value of −100 means the close today was the lowest low of the past N days, and 0 means today’s close was the highest high of the past N days.

##             Williams %R
## Date                   
## 2022-06-28   -81.619718
## 2022-06-29   -86.263345
## 2022-06-30   -86.470588
## 2022-07-01   -83.392857
## 2022-07-04   -94.113060
## 2022-07-06   -83.957597
## 2022-07-07   -81.773585

Readings below -80 represent oversold territory and readings above -20 represent overbought.
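The definition above translates into a short formula: %R = -100 * (highest high - close) / (highest high - lowest low). The sketch below is a plain Pandas version with a hypothetical function name `williams_r`. The sample data is made up, and the period of 3 is chosen only to keep it small.

```python
import pandas as pd

def williams_r(high: pd.Series, low: pd.Series, close: pd.Series,
               period: int = 14) -> pd.Series:
    highest_high = high.rolling(period).max()
    lowest_low = low.rolling(period).min()
    return -100.0 * (highest_high - close) / (highest_high - lowest_low)

# Close ends at the highest high of the window.
up = williams_r(pd.Series([10.0, 11.0, 12.0]), pd.Series([8.0, 9.0, 10.0]),
                pd.Series([9.0, 10.0, 12.0]), period=3)

# Close ends at the lowest low of the window.
down = williams_r(pd.Series([12.0, 11.0, 10.0]), pd.Series([10.0, 9.0, 8.0]),
                  pd.Series([11.0, 10.0, 8.0]), period=3)

print(up.iloc[-1], down.iloc[-1])
```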

*** ADX ***

ADX is used to quantify trend strength. ADX calculations are based on a moving average of price range expansion over a given period of time. The average directional index (ADX) is used to determine when the price is trending strongly.

0-25: Absent or Weak Trend

25-50: Strong Trend

50-75: Very Strong Trend

75-100: Extremely Strong Trend

##                   ADX
## Date                 
## 2022-06-28  27.095182
## 2022-06-29  29.305020
## 2022-06-30  33.235820
## 2022-07-01  38.740162
## 2022-07-04  43.997634
## 2022-07-06  48.898406
## 2022-07-07  51.778903
##                    CCI
## Date                  
## 2022-06-28 -114.087460
## 2022-06-29 -129.806598
## 2022-06-30 -173.455423
## 2022-07-01 -229.398740
## 2022-07-04 -205.856883
## 2022-07-06 -174.333557
## 2022-07-07 -122.900664
##                   ROC
## Date                 
## 2022-06-28  -2.112444
## 2022-06-29  -1.030829
## 2022-06-30  -4.985994
## 2022-07-01 -10.134796
## 2022-07-04 -11.423542
## 2022-07-06 -11.985483
## 2022-07-07 -10.424350

*** MACD ***

Moving Average Convergence Divergence (MACD) is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period Exponential Moving Average (EMA) from the 12-period EMA.

##             MACD_IND
## Date                
## 2022-06-30 -0.073315
## 2022-07-01 -0.127615
## 2022-07-04 -0.186573
## 2022-07-06 -0.207666
## 2022-07-07 -0.204022
## MACD_IND   -0.159838
## dtype: float64
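As a sketch of the calculation just described, the MACD line can be built from two exponential moving averages in Pandas. Whether the MACD_IND column above holds this line or a derived value such as the histogram is not shown here, so treat the function `macd_line` as an illustration only.

```python
import pandas as pd

def macd_line(close: pd.Series, fast: int = 12, slow: int = 26) -> pd.Series:
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    # Positive when the short-term average is above the long-term average.
    return ema_fast - ema_slow

flat = pd.Series([5.0] * 40)
rising = pd.Series([float(i) for i in range(40)])

print(macd_line(flat).iloc[-1])
print(macd_line(rising).iloc[-1])
```

For a flat price both averages coincide and the MACD is zero, while for a steadily rising price the faster average runs ahead and the MACD is positive.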

*** Bollinger Bands ***

Bollinger Bands are a type of statistical chart characterizing the prices and volatility over time of a financial instrument or commodity.

## 22.79 20.92 19.06
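Bollinger Bands are commonly built from a moving average plus and minus a multiple of the rolling standard deviation. The sketch below assumes a 20-day window, two standard deviations and the population standard deviation. These are common defaults, not settings confirmed by the output above, and the function name `bollinger_bands` is ours.

```python
import pandas as pd

def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0):
    middle = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)
    upper = middle + num_std * std
    lower = middle - num_std * std
    return upper, middle, lower

# Tiny made-up series with a window of 5 to keep the numbers easy to check.
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
upper, middle, lower = bollinger_bands(close, window=5)
print(round(upper.iloc[-1], 2), round(middle.iloc[-1], 2), round(lower.iloc[-1], 2))
```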

In case we choose another value than zero for our min_periods parameter, we will end up with a couple of NaN values. NaN stands for "not a number", and such entries cannot be used in later calculations. Therefore, we want to delete the rows that contain them.

We do this by using the dropna function. Any row that has a NaN value in any column is then removed.
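A minimal sketch of this clean-up step on made-up data: the strict rolling mean leaves NaN in the first two rows, and dropna removes exactly those rows. The names `df_demo` and `clean` are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame({"Close": [10.0, 12.0, 14.0, 16.0]})

# With the default min_periods the first two averages are NaN.
df_demo["3d_ma"] = df_demo["Close"].rolling(window=3).mean()

# Drop every row that contains at least one NaN value.
clean = df_demo.dropna()
print(clean)
```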

3: Predicting the movement of stock

To predict the movement of the stock we use lagged returns as independent variables. The first lag is yesterday's return, the second lag is the return of the day before yesterday, and so on. The dependent variable is whether the price went up or down on that day. The other independent variables are the technical indicators, which together with the lagged returns are used to predict the movement of the stock using logistic regression.

3.1: Creating lag returns

3.2: Creating returns dataframe

3.2: create the lagged percentage returns columns

##                Today      Lag1      Lag2      Lag3      Lag4  ...     Lag11  \
## Date                                                          ...             
## 2022-06-23 -1.769292 -1.594083  0.838468 -1.361663  2.170868  ...  0.389281   
## 2022-06-24  0.394449 -1.769292 -1.594083  0.838468 -1.361663  ... -0.378754   
## 2022-06-27  0.203550  0.394449 -1.769292 -1.594083  0.838468  ... -1.253734   
## 2022-06-28 -1.710128  0.203550  0.394449 -1.769292 -1.594083  ...  0.522528   
## 2022-06-29 -0.326829 -1.710128  0.203550  0.394449 -1.769292  ... -3.082395   
## 2022-06-30 -1.861317 -0.326829 -1.710128  0.203550  0.394449  ... -1.416137   
## 2022-07-01 -3.365763 -1.861317 -0.326829 -1.710128  0.203550  ...  2.223919   
## 2022-07-04 -2.776224 -3.365763 -1.861317 -0.326829 -1.710128  ...  2.170868   
## 2022-07-06  0.198734 -2.776224 -3.365763 -1.861317 -0.326829  ... -1.361663   
## 2022-07-07  0.151365  0.198734 -2.776224 -3.365763 -1.861317  ...  0.838468   
## 
##                Lag12     Lag13     Lag14     Lag15  
## Date                                                
## 2022-06-23  0.839876 -1.647587  1.642710  1.046662  
## 2022-06-24  0.389281  0.839876 -1.647587  1.642710  
## 2022-06-27 -0.378754  0.389281  0.839876 -1.647587  
## 2022-06-28 -1.253734 -0.378754  0.389281  0.839876  
## 2022-06-29  0.522528 -1.253734 -0.378754  0.389281  
## 2022-06-30 -3.082395  0.522528 -1.253734 -0.378754  
## 2022-07-01 -1.416137 -3.082395  0.522528 -1.253734  
## 2022-07-04  2.223919 -1.416137 -3.082395  0.522528  
## 2022-07-06  2.170868  2.223919 -1.416137 -3.082395  
## 2022-07-07 -1.361663  2.170868  2.223919 -1.416137  
## 
## [10 rows x 16 columns]
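One way to build such a table of lagged returns is to compute the daily percentage return and shift it by one to n days. This is a sketch on made-up prices with only two lags, whereas the output above shows fifteen. The names `returns` and `n_lags` are assumptions.

```python
import pandas as pd

close = pd.Series([100.0, 102.0, 101.0, 103.0, 104.0], name="Close")

returns = pd.DataFrame(index=close.index)
# Today's return in percent, relative to the previous close.
returns["Today"] = close.pct_change() * 100.0

n_lags = 2
for lag in range(1, n_lags + 1):
    # Lag k on a given day is the return observed k days earlier.
    returns[f"Lag{lag}"] = returns["Today"].shift(lag)

print(returns)
```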

3.3: “Direction” column (+1 or -1) indicating an up/down day
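The direction column can be derived from the sign of today's return. In this sketch a return of exactly zero is counted as an up day, which is an assumption on our part, and the sample returns are made up.

```python
import numpy as np
import pandas as pd

returns = pd.DataFrame({"Today": [1.2, -0.4, 0.0, 0.7]})

# +1 for an up (or flat) day, -1 for a down day.
returns["Direction"] = np.where(returns["Today"] >= 0, 1, -1)
print(returns)
```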

***3.4: Create the dependent and independent variables ***

3.5: Create training and test sets

##                 Lag1      Lag2      Lag3      Lag4      Lag5  ...         CCI  \
## Date                                                          ...               
## 2022-06-30 -0.326829 -1.710128  0.203550  0.394449 -1.769292  ... -173.455423   
## 2022-07-01 -1.861317 -0.326829 -1.710128  0.203550  0.394449  ... -229.398740   
## 2022-07-04 -3.365763 -1.861317 -0.326829 -1.710128  0.203550  ... -205.856883   
## 2022-07-06 -2.776224 -3.365763 -1.861317 -0.326829 -1.710128  ... -174.333557   
## 2022-07-07  0.198734 -2.776224 -3.365763 -1.861317 -0.326829  ... -122.900664   
## 
##                   ROC         DX  Open-Close  Open-Open  
## Date                                                     
## 2022-06-30  -4.985994  41.230925      -0.013     -0.115  
## 2022-07-01 -10.134796  53.626937      -0.062     -0.435  
## 2022-07-04 -11.423542  57.332698       0.213     -0.410  
## 2022-07-06 -11.985483  60.130396       0.034     -0.725  
## 2022-07-07 -10.424350  55.570404       0.011      0.015  
## 
## [5 rows x 34 columns]

***3.6: Create model ***

3.7: train the model on the training set

## LogisticRegression(max_iter=10000)

3.8: make an array of predictions on the test set

3.9: output the hit-rate and the confusion matrix for the model

## 
## Train Accuracy: 94.98%
## Test Accuracy: 80.26%
## [[180  70]
##  [  5 125]]
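Steps 3.4 to 3.9 can be sketched end to end as follows. The data here is synthetic (random features with a direction loosely tied to the first one), and both the feature names and the 80/20 chronological split are assumptions for illustration, not the settings used for the results above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
n = 300

# Independent variables (synthetic stand-ins for lags and indicators).
X = pd.DataFrame(rng.normal(size=(n, 3)), columns=["Lag1", "Lag2", "RSI"])
# Dependent variable: +1 / -1 direction, noisy function of Lag1.
y = pd.Series(np.where(X["Lag1"] + 0.5 * rng.normal(size=n) >= 0, 1, -1),
              name="Direction")

# Chronological split: earlier rows train, later rows test.
split = int(n * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Train Accuracy: {:.2%}".format(train_acc))
print("Test Accuracy: {:.2%}".format(test_acc))
print(cm)
```

Splitting by position instead of shuffling keeps future observations out of the training set, which matters for time series such as stock prices.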

***3.10: Predict movement of stock for tomorrow. ***

##             y_test  y_pred
## Date                      
## 2022-06-30      -1      -1
## 2022-07-01      -1      -1
## 2022-07-04      -1      -1
## 2022-07-06       1      -1
## 2022-07-07       1      -1
## [-1]
## AMO Sell  55555
