---------------------------------------------------
#                                                 #
#     Chapter 2: Typical Properties of            #
#           Financial Time Series                 #
#                                                 #
---------------------------------------------------



Returns of liquid tradable assets like stocks, stock 
indices, currencies or commodities like oil, gold or 
silver have some common typical properties which are 
usually referred to as stylized facts. In the following 
we list these stylized facts and illustrate them on 
concrete data.
 

Let S_k denote the closing price of some liquid 
tradable asset on day t_k. Its return ret(t_k) from 
day t_{k-1} to day t_k is defined by  

ret(t_k) := (S_k - S_{k-1}) / S_{k-1}                (1)

The d-day mean on day t_k is defined by 

mean_d(t_k) := 1/d sum_{j=0}^{d-1} ret(t_{k-j})      (2)

There are the following so called stylized facts:


Stylized Fact 1) For arbitrary time horizons d (larger 
than 2 or 3 weeks) the d-day means are close to zero. 


Because of stylized fact1 the d-day standard deviation 
(also called d-day volatility) on day t_k can be defined 
by

stddev_d(t_k) := sqrt{ 1/d sum_{j=0}^{d-1} ret(t_{k-j})^2 }    
                                                     (3)

Normalized quantities, in a probabilistic context, are 
defined by (quantity - mean)/stddev which reduces to 
quantity/stddev if the mean is zero. Thus, because of 
stylized fact1, we define normalized returns (for a 
d-day time horizon) through

normret_d(t_k) := ret(t_k) / stddev_d(t_{k-1})       (4)

Then one can take a look at the distribution of the 
normalized returns by making a histogram plot. One 
finds:


Stylized Fact 2a) For large time horizon d (say, d>=250, 
which corresponds to one year), the distribution of the 
normalized returns is "leptocurtic", that is, there is 
a more pronounced peak at zero and more faster decay 
in a neighborhood of zero when compared to the standard 
normal distribution. Furthermore there are "heavy tails": 
there are some very large positive or negative normalized 
returns which one would not see under the assumption of 
a normal distribution.  

Stylized Fact 2b) By making d smaller and finally put-
ting it to d=15 or d=20 (3 or 4 weeks), the distribution 
of the normalized returns approaches more and more the 
shape of a normal distribution. The occurance of heavy 
tails decreases, but the variability of the d-day stddev 
or equivalently, the variability of the d-day volatility 
increases. This observation lies at the bottom of the 
stochastic volatility models.


Furthermore one finds the following:


Stylized Fact 3) Returns for different days t_k are 
nearly uncorrelated. However, the absolute value or 
the square of returns have a significant positive 
correlation (which could be around 10%-20%).

Stylized Fact 4) "volatility clustering": returns tend 
to group themselves into clusters or phases with high 
or low volatility, there is no uniform distribution 
of the returns over time. This is also a motivation 
for stochastic volatility models. 


Let's demonstrate these facts now by looking at con-
crete data: the file DAX.txt contains the daily closing 
prices for the time period 2005-2014, the file SPX.txt 
contains the daily closings of the S&P500 for 1950-2015 
and GE.txt contains the daily closing prices for the 
General Electric stock from 1962 to 2015. This results 
in approximately
DAX:  2500
SPX:  16500
GE:   13500
observation days. All data are taken from Yahoo Finance.



#
# Start R Session:
#
dax=read.table("C:/Users/Admin/Courses/FinancialStatistics/DAX.txt",header=TRUE,sep=";")
dax
head(dax)
tail(dax)
summary(dax)
str(dax)
# let's remove all rows with NAs:
dax = na.omit(dax)
summary(dax)

# in R there are basically 4 different data types: vectors, 
# matrices, data frames and lists: 
# dax is of type data frame
# the content of each column can be written into some vector:

names(dax)
S = dax$index             # equivalent to S = dax[,2]
S
str(S)
plot(S)
plot(S,type="l")
dates = as.Date(dax$dates)
plot(dates,S,type="l",main="DAX30, 2005-2014")




S[1]
S[0]                      # vektor indices always start with 1, not 0
n=length(S)
n
S[n]
S[n+1]

# we calculate the returns:
ret = rep(0,n)            # vector of length n, all entries 0
for(i in 2:n)
{
  ret[i] = (S[i]-S[i-1])/S[i-1]
}
plot(ret)
plot(dates,ret,type="l",main="dax-returns, 2005-2014")




# we code 3 functions for the d-day means, stddevs and normrets:

dDayMean = function( d , ret )
{
  n = length(ret)
  result = rep(0,n)
  summe = 0

  for(i in 1:n)
  {
    summe = summe + ret[i]
    if(i > d) 
    {
      summe = summe - ret[i-d]
      result[i] = summe/d
    }
    else
    {
      result[i] = summe/i
    }
  }
  return(result)
}

# let's take a look at Stylized Fact 1:
mean20 = dDayMean(20,ret)
mean60 = dDayMean(60,ret)
mean180 = dDayMean(180,ret)
plot(dates,mean20,main="means of dax-returns, d = 20, 60 and 180 days")
points(dates,mean60,col="red")
points(dates,mean180,col="yellow")




dDayStdDev = function( d , ret )
{
  n = length(ret)
  result = rep(0,n)
  summe = 0

  for(i in 1:n)
  {
    summe = summe + ret[i]*ret[i]
    if(i > d) 
    {
      summe = summe - ret[i-d]*ret[i-d]
      result[i] = summe/d
    }
    else
    {
      result[i] = summe/i
    }
  }
  # a theoretical 0 could numerically become slightly negative:
  result = abs(result)     # we want to take a square root
  result = sqrt(result)    # square root element by element
  return(result)
}

# let's check:
stddev20 = dDayStdDev(20,ret)
stddev60 = dDayStdDev(60,ret)
stddev180 = dDayStdDev(180,ret)
stddev_n = dDayStdDev(n,ret)
plot(dates,stddev20,main="standard deviation of dax-returns,\n d = 20, 60, 180 and all=2497 days")
points(dates,stddev60,col="red")
points(dates,stddev180,col="yellow")
points(dates,stddev_n,col="green")
# apparently: volatility is not constant




dDayNormRet = function( d , ret )
{
  stddev = dDayStdDev(d,ret)
  # stddev could be 0 if data are constant:
  stddev = pmax(stddev,0.00000001)     # we want to devide by stddev
  result = ret/stddev                  # division element by element
  return(result)
}

normret15 = dDayNormRet(15,ret)
normret60 = dDayNormRet(60,ret)
normret250 = dDayNormRet(250,ret)
normret_n = ret/stddev_n[n]

par(mfrow=c(2,2),oma=c(0,0,2,0))       #set up 2 time 2 plot array
                                       #that is, 4 pictures at once
hist(normret_n,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 2497 (all days)")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret250,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 250")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret60,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 60")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret15,breaks=40,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 15")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
title(main="Histograms of d-Day Normalized DAX30 Returns",outer=TRUE)




# thus: if returns are normalized with more recent volatility 
# data, then normalized returns are more Gaussian. 
#
# Heavy tails are less likely to occur, since the normalization 
# can take care of a recent increase in volatility, if the nor-
# malization is calculated from more recent data instead of 
# using the whole time horizon of 10 or 20 years.




----------------------------------------------
#    let's redo this for SPX, since 1950:    #
----------------------------------------------

spx=read.table("C:/Users/Admin/Courses/FinancialStatistics/SPX.txt",header=TRUE,sep=";")
head(spx)
tail(spx)
summary(spx)
str(spx)                    # no NA's!

names(spx)
S = spx[,2]                 # equivalent to S = spx$Adj.Close
dates = as.Date(spx$Date,format="%d-%m-%y")
dates                       # dates prior to 1969 have wrong year
datecorr = as.Date("1968-12-31") - as.Date("2068-12-31")
dates[1:4750] = dates[1:4750] + datecorr
dates
plot(dates,S,type="l",main="S&P500, 1950 - 2015\n linear scale")



plot(dates,S,type="l",log="y",main="S&P500, 1950 - 2015\n log scale")  # straight line would be constant exponential growth



n=length(S)
n


# we calculate the returns:
ret = rep(0,n)       
for(i in 2:n)
{
  ret[i] = (S[i]-S[i-1])/S[i-1]
}
plot(dates,ret,type="l",main="spx-returns, 1950 - 2015")




# let's take a look at Stylized Fact 1:
mean20 = dDayMean(20,ret)
mean60 = dDayMean(60,ret)
mean180 = dDayMean(180,ret)
plot(dates,mean20,main="means of spx-returns, d = 20, 60 and 180 days")
points(mean60,col="red")
points(mean180,col="yellow")




# stddev's = daily volatilities:
stddev20 = dDayStdDev(20,ret)
stddev60 = dDayStdDev(60,ret)
stddev180 = dDayStdDev(180,ret)
stddev_n = dDayStdDev(n,ret)
plot(dates,stddev20,main="standard deviation of spx-returns,\n d = 20, 60, 180 and all=16553 days")
points(dates,stddev60,col="red")
points(dates,stddev180,col="yellow")
points(dates,stddev_n,col="green")
# apparently: volatility is not constant




# normalized returns:
normret15 = dDayNormRet(15,ret)
normret60 = dDayNormRet(60,ret)
normret250 = dDayNormRet(250,ret)
normret_n = ret/stddev_n[n]

par(mfrow=c(2,2),oma=c(0,0,2,0))       #set up 2 time 2 plot array
                                       #that is, 4 pictures at once
hist(normret_n,breaks=150,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 16553 (all days)")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret250,breaks=100,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 250")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret60,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 60")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret15,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 15")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
title(main="Histograms of d-Day Normalized S&P500 Returns",outer=TRUE)




# thus: if returns are normalized with more recent volatility 
# data, then normalized returns are more Gaussian 


-----------------------------------------------
#    finally we take a single stock:          #                              
#    General Electric Company, since 1962:    #
-----------------------------------------------
GE=read.table("C:/Users/Admin/Courses/FinancialStatistics/GE.txt",header=TRUE,sep=";")
head(GE)
tail(GE)
summary(GE)                 # no NA's
names(GE)
S = GE[,2]                  
dates = as.Date(GE$Date,format="%d-%m-%y")
dates                       # dates prior to 1969 have wrong year
datecorr = as.Date("1968-12-31") - as.Date("2068-12-31")
dates[1:1737] = dates[1:1737] + datecorr
dates
plot(dates,S,type="l",main="General Electric, 1962 - 2015\n linear scale")



plot(dates,S,type="l",log="y",main="General Electric, 1950 - 2015\n log scale")  # straight line would be constant exponential growth




n=length(S)
n

# we calculate the returns: 
ret = rep(0,n)       
for(i in 2:n)
{
  ret[i] = (S[i]-S[i-1])/S[i-1]
}
plot(dates,ret,type="l",main="GE-returns, 1962 - 2015")




# let's take a look at Stylized Fact 1:
mean20 = dDayMean(20,ret)
mean60 = dDayMean(60,ret)
mean180 = dDayMean(180,ret)
plot(dates,mean20,main="means of GE-returns, d = 20, 60 and 180 days")
points(dates,mean60,col="red")
points(dates,mean180,col="yellow")




# stddev's = daily volatilities:
stddev20 = dDayStdDev(20,ret)
stddev60 = dDayStdDev(60,ret)
stddev180 = dDayStdDev(180,ret)
stddev_n = dDayStdDev(n,ret)
plot(dates,stddev20,main="standard deviation of GE-returns,\n d = 20, 60, 180 and all=13540 days")
points(dates,stddev60,col="red")
points(dates,stddev180,col="yellow")
points(dates,stddev_n,col="green")
# apparently: volatility is not constant




# normalized returns:
normret15 = dDayNormRet(15,ret)
normret60 = dDayNormRet(60,ret)
normret250 = dDayNormRet(250,ret)
normret_n = ret/stddev_n[n]

par(mfrow=c(2,2),oma=c(0,0,2,0))       #set up 2 time 2 plot array
                                       #that is, 4 pictures at once
hist(normret_n,breaks=150,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 13540 (all days)")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret250,breaks=100,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 250")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret60,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 60")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
hist(normret15,breaks=50,xlim=c(-5,5),ylim=c(0,0.8),prob=TRUE,main="d = 15")
curve(dnorm(x),from=-5,to=5,add=TRUE,col="red")
title(main="Histograms of d-Day Normalized GE Returns",outer=TRUE)




# thus: if returns are normalized with more recent volatility 
# data, then normalized returns are more Gaussian 




-------------------------------------------
#            Main   Conclusion:           #
-------------------------------------------

The normalized returns

normret_d(t_k) = ret(t_k) / stddev_d(t_{k-1})                (4)

may be approximated pretty well by a standard normal distri-
bution, if the time horizon d is chosen not too large: d=15 
days or d=20 turned out to be quite reasonable choices.  
Thus we can write

ret(t_k) / stddev_d(t_{k-1}) = phi_k                         (5)

with phi_k being a standard normal random number.  
Equation (5) is equivalent to

S_k = S_{k-1} * [ 1 + stddev_d(t_{k-1}) * phi_k ]            (6)

Using the data which are known on day t_{k-1} and by drawing 
a normally distributed random number, this equation gives us 
a price for day t_k.

Now, is this a reasonable stochastic model? To this end we 
will simulate a couple of paths with price dynamics given 
by (6) in the next chapter and we will find that the model 
actually cannot be used as exactly given by (6), but we have 
to make some adjustments.