Do not compound log-returns multiplicatively

(with Matthias Zuckschwert)

Some students have compounded log-returns multiplicatively in their theses. This notebook illustrates with a numerical example why this is not appropriate.

Import the packages numpy, matplotlib, and pandas_datareader.

In [1]:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas_datareader
import pandas_datareader.data as web
print('matplotlib version: ' + matplotlib.__version__)
print('datareader version: ' + pandas_datareader.__version__)

matplotlib version: 3.5.0
datareader version: 0.10.0

Compare the identity $f(x)=x$ with its logarithmic approximation $g(x)=\ln(1+x)$ for $x \in (-1,1.5]$.

In [2]:

x = np.linspace(-0.99,1.5,50)
plt.plot(x,x, label='f(x)=x')
plt.plot(x,np.log(1+x), label='g(x)=ln(1+x)')
plt.legend()

Out[2]:

<matplotlib.legend.Legend at 0x7f1b17298910>

The log return $\ln(1+x)$ is only an approximation of the simple return $x$. The approximation is quite good near zero but deteriorates as $|x|$ grows, so the error from using log returns in place of simple returns is larger for more extreme returns. If you compound log returns multiplicatively in an asset's long-term performance analysis, these small per-period errors accumulate.
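
To get a feel for the size of the error, the following minimal sketch tabulates the gap between $x$ and $\ln(1+x)$ for a few return values. The values in `simple` are hypothetical and chosen only for illustration; they are not market data.

```python
import numpy as np

# Hypothetical simple returns, chosen only to illustrate the approximation error
simple = np.array([0.01, 0.05, 0.10, 0.25, 0.50, -0.25])
log_ret = np.log(1 + simple)   # log return ln(1 + r)
gap = simple - log_ret         # error of the approximation, never negative

for r, l, g in zip(simple, log_ret, gap):
    print(f"r = {r:+.2f}   ln(1+r) = {l:+.4f}   gap = {g:.4f}")
```

For a 1% return the gap is about 0.00005, whereas for a 50% return it is roughly 0.09.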

Let's look at an example using the value-weighted monthly market returns from Ken French's data library.

In [3]:

ds = web.DataReader('F-F_Research_Data_Factors', 'famafrench', start='1927-01-01', end='2020-12-31')[0]
mp = (ds['Mkt-RF']+ds['RF'])/100

Compute the logarithmic approximation of the simple returns on the market portfolio.

In [4]:

mp_log = np.log(1+mp)

Use a scatter plot to compare the simple returns to their logarithmic approximation.

In [5]:

plt.scatter(mp,mp_log, facecolors='none', edgecolors='tab:orange')
x = np.linspace(-0.3,0.4,50)
plt.plot(x,x)

Out[5]:

[<matplotlib.lines.Line2D at 0x7f1b166bc0a0>]

Note that the log return is smaller than or equal to the simple return, because $\ln(1+x) \le x$ for all $x > -1$, with equality only at $x = 0$.

How would an investment of $P_0$ currency units perform over time? Recall that the gross return $R_t$ is defined as $R_t = 1 + r_t = \dfrac{P_t}{P_{t-1}}$, where $r_t$ is the simple return. The log-return is defined as $r_t^l = \ln\left(\dfrac{P_t}{P_{t-1}}\right)$.

The value of an asset at time $t$ is $P_t = P_{t-1}(1+r_t)$. Over a longer horizon, we get $P_t = P_0 \cdot\prod\limits_{i=1}^{t}(1+r_i)$, where $P_0$ represents the initial investment.

Important: You cannot substitute the simple return $r_i$ in the equation above with the log-return $r_i^l$, that is, you cannot compound log returns multiplicatively. Most of the time, the error is non-negligible.
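
A two-period toy example may make the problem concrete. The sketch below assumes a hypothetical price path of 100, 150, 75 (a 50% gain followed by a 50% loss); the names `prices`, `correct`, and `wrong` are illustrative only.

```python
import numpy as np

# Hypothetical price path: +50% in period 1, -50% in period 2
prices = np.array([100.0, 150.0, 75.0])
simple = prices[1:] / prices[:-1] - 1          # simple returns r_i
log_ret = np.log(prices[1:] / prices[:-1])     # log returns r_i^l

p0 = prices[0]
correct = p0 * np.prod(1 + simple)    # compounds simple returns: recovers the final price
wrong = p0 * np.prod(1 + log_ret)     # incorrectly compounds log returns

print(f"correct final value: {correct:.2f}")
print(f"wrong final value:   {wrong:.2f}")
```

Compounding the simple returns recovers the actual final price of 75, while plugging the log returns into the same product gives a value of roughly 43.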

The following plot shows the performance of the investment of $P_0 = 1$ in the US market (1927-2020). We correctly use the simple returns in equation $P_t = P_0 \cdot\prod\limits_{i=1}^{t}(1+r_i)$ for the blue line. For the orange line, we incorrectly use log-returns.

In [6]:

fig1, ax1 = plt.subplots()

(mp+1).cumprod().plot(label='Gross returns')
(mp_log+1).cumprod().plot(label='Log-returns')
ax1.set_yscale('log')
ax1.get_yaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax1.grid()
ax1.set_title('Market performance')
ax1.set_ylabel('Price')
ax1.set_xlabel('Date')
ax1.legend()

Out[6]:

<matplotlib.legend.Legend at 0x7f1b16699970>

The difference in final wealth is huge: the incorrect calculation understates it by a factor of more than five.

In [7]:

# final wealth of one dollar invested, compounding simple returns
# (.iloc[-1] selects the last observation by position)
(mp+1).cumprod().iloc[-1]

Out[7]:

8531.461037022278

In [8]:

# final wealth of one dollar when log returns are (incorrectly) compounded multiplicatively
(mp_log+1).cumprod().iloc[-1]

Out[8]:

1593.325161370631

It is possible to rearrange $P_t = P_0 \cdot\prod\limits_{i=1}^{t}(1+r_i)$ to $P_t = P_0 \cdot e^{\sum_{i=1}^t r_i^l}$.
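
The rearrangement follows from the definition of the log-return, which implies $1 + r_i = e^{r_i^l}$. Spelled out as a short derivation, using only the symbols defined above:

$$P_t = P_0 \cdot \prod_{i=1}^{t}(1+r_i) = P_0 \cdot \prod_{i=1}^{t} e^{r_i^l} = P_0 \cdot e^{\sum_{i=1}^{t} r_i^l}$$

In words: log-returns are compounded by summing them and exponentiating the sum, not by multiplying terms of the form $(1+r_i^l)$.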

In [9]:

# correct final wealth using log returns: sum them, then exponentiate
np.exp(mp_log.cumsum()).iloc[-1]

Out[9]:

8531.461037022278

Conclusion: Do not compound log-returns multiplicatively.

Optional Exercise: Things become worse if returns are more volatile. Show this by comparing a more volatile asset with a less volatile one over the same time period, or by examining final wealth differences as a function of volatility in simulated data.