Fama Macbeth Module

The Fama Macbeth module contains parallel, compiled implementations of Fama Macbeth regression functions.

fama_macbeth

finance_byu.fama_macbeth.fama_macbeth_master(data, t, yvar, xvar, intercept=True)

Master function for Fama Macbeth regressions which uses the best implementation based on the size of the data provided.

Parameters:
data: pandas.core.frame.DataFrame

A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable which may be a datetime64[ns] type.

t: str

The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (e.g. as produced by pd.to_datetime()).

yvar: str

The name of the regressand variable in data.

xvar: list(str)

A list of the names of the regressor variables in data.

intercept: bool

Whether or not to regress with an intercept.

Returns:
A pandas DataFrame which contains regression coefficients for each time period in data.
finance_byu.fama_macbeth.fama_macbeth_numba(data, t, yvar, xvar, intercept=True, parallel=False)

Fama Macbeth regression implementation for small data sets using compiled machine code.

Parameters:
data: pandas.core.frame.DataFrame

A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable which may be a datetime64[ns] type.

t: str

The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (e.g. as produced by pd.to_datetime()).

yvar: str

The name of the regressand variable in data.

xvar: list(str)

A list of the names of the regressor variables in data.

intercept: bool

Whether or not to regress with an intercept.

parallel: bool

Whether or not to use a parallel numba implementation.

Returns:
A pandas DataFrame which contains regression coefficients for each time period in data.
finance_byu.fama_macbeth.fama_macbeth_parallel(data, t, yvar, xvar, intercept=True, backend='loky')

Fama Macbeth regression implementation using pandas groupby for grouping, linear algebra routines compiled with numba for regressions, and joblib for parallelization. Jobs are pre-dispatched to each core for performance.

Parameters:
data: pandas.core.frame.DataFrame

A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable which may be a datetime64[ns] type.

t: str

The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (e.g. as produced by pd.to_datetime()).

yvar: str

The name of the regressand variable in data.

xvar: list(str)

A list of the names of the regressor variables in data.

intercept: bool

Whether or not to regress with an intercept.

backend: {'loky', 'multiprocessing', 'threading'}

The joblib backend to use for parallel processing. 'loky' is used by default and is recommended.

Returns:
A pandas DataFrame which contains regression coefficients for each time period in data.
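
The description above (pandas groupby for grouping, one regression per period, joblib for parallel dispatch) can be illustrated with a minimal sketch. This is not the library's source code: the names fama_macbeth_joblib_sketch and _ols are hypothetical, plain numpy least squares stands in for the numba-compiled routines, and pre_dispatch="all" is only one way to hand every job to joblib up front.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed


def _ols(period, X, y):
    # Hypothetical helper: least-squares solution of X @ beta = y for one period.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return period, beta


def fama_macbeth_joblib_sketch(data, t, yvar, xvar, intercept=True,
                               n_jobs=2, backend="loky"):
    """Illustrative only; names and defaults are assumptions, not the library API."""
    names = (["intercept"] if intercept else []) + list(xvar)
    tasks = []
    # One task per time period (one cross-sectional regression each).
    for period, group in data.groupby(t):
        X = group[xvar].to_numpy(dtype=float)
        if intercept:
            X = np.column_stack([np.ones(len(group)), X])
        y = group[yvar].to_numpy(dtype=float)
        tasks.append(delayed(_ols)(period, X, y))
    # pre_dispatch="all" queues every task before the workers start.
    results = Parallel(n_jobs=n_jobs, backend=backend, pre_dispatch="all")(tasks)
    periods, betas = zip(*results)
    return pd.DataFrame(np.vstack(betas),
                        index=pd.Index(periods, name=t),
                        columns=names)
```

Passing only the numeric arrays to each worker, rather than whole DataFrame groups, keeps the data sent between processes small, which tends to matter more as the number of periods grows.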
finance_byu.fama_macbeth.fm_summary(p, pvalues=False)

Summary function for Fama Macbeth regression results.

Parameters:
p: pandas.core.frame.DataFrame

A DataFrame object returned by a Fama Macbeth regression function in this library.

pvalues: bool

Whether or not to include p-values in the summary table.

Returns:
A summary DataFrame with mean coefficients, Fama Macbeth standard errors, and t-statistics, plus p-values when pvalues is True.
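
As a rough illustration of what such a summary contains, the sketch below computes the usual Fama Macbeth statistics from a DataFrame of per-period coefficients: the time-series mean of each coefficient, its standard error (sample standard deviation divided by the square root of the number of periods), and the resulting t-statistic. The function name fm_summary_sketch is hypothetical, the exact conventions used by fm_summary (for example the degrees of freedom) are assumed rather than taken from its source, and p-values are omitted.

```python
import numpy as np
import pandas as pd


def fm_summary_sketch(coefs):
    """Illustrative summary of per-period coefficients (one row per period)."""
    n_periods = coefs.count()                 # non-missing periods per coefficient
    mean = coefs.mean()                       # time-series mean of each coefficient
    std_error = coefs.std(ddof=1) / np.sqrt(n_periods)
    return pd.DataFrame({"mean": mean,
                         "std_error": std_error,
                         "tstat": mean / std_error})
```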
finance_byu.fama_macbeth.fama_macbeth(data, t, yvar, xvar, intercept=True)

Basic Fama Macbeth regression implementation with regressions performed by numpy linear algebra routines and grouping performed by pandas groupby functionality.

Parameters:
data: pandas.core.frame.DataFrame

A dataframe with regressand, regressors, and time variable for Fama Macbeth regression.

t: str

The name of the time variable in data.

yvar: str

The name of the regressand variable in data.

xvar: list(str)

A list of the names of the regressor variables in data.

intercept: bool

Whether or not to regress with an intercept.

Returns:
A pandas DataFrame which contains regression coefficients for each time period in data.
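
For readers unfamiliar with the procedure, a minimal sketch of the first stage follows: one cross-sectional OLS regression per time period, with grouping by pandas groupby and the regression by numpy. It is written under the assumption that the output has one row per period and an 'intercept' column first, as in the example output on this page; the function name fama_macbeth_sketch is hypothetical and this is not the library's implementation.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(data, t, yvar, xvar, intercept=True):
    """Illustrative per-period OLS; not the library's source code."""
    names = (["intercept"] if intercept else []) + list(xvar)
    periods, betas = [], []
    for period, group in data.groupby(t):
        # Design matrix for this period's cross-section.
        X = group[xvar].to_numpy(dtype=float)
        if intercept:
            X = np.column_stack([np.ones(len(group)), X])
        y = group[yvar].to_numpy(dtype=float)
        # Least-squares solution of X @ beta = y.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        periods.append(period)
        betas.append(beta)
    return pd.DataFrame(np.vstack(betas),
                        index=pd.Index(periods, name=t),
                        columns=names)
```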

Examples

>>> from finance_byu.fama_macbeth import fama_macbeth, fama_macbeth_parallel, fm_summary, fama_macbeth_numba
>>> import pandas as pd
>>> import time
>>> import numpy as np
>>>
>>> n_jobs = 5
>>> n_firms = 1.0e2
>>> n_periods = 1.0e2
>>>
>>> def firm(fid):
...     f = np.random.random((int(n_periods),4))
...     f = pd.DataFrame(f)
...     f['period'] = f.index
...     f['firmid'] = fid
...     return f
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
>>> df.head()

        ret     exmkt       smb       hml  period  firmid
0  0.766593  0.002390  0.496230  0.992345       0       0
1  0.346250  0.509880  0.083644  0.732374       1       0
2  0.787731  0.204211  0.705075  0.313182       2       0
3  0.904969  0.338722  0.437298  0.669285       3       0
4  0.121908  0.827623  0.319610  0.455530       4       0

The following example uses the non-parallel implementation, which relies on numpy linear algebra routines.

>>> result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
>>> result.head()

        intercept     exmkt       smb       hml
period
0        0.655784 -0.160938 -0.109336  0.028015
1        0.455177  0.033941  0.085344  0.013814
2        0.410705 -0.084130  0.218568  0.016897
3        0.410537  0.010719  0.208912  0.001029
4        0.439061  0.046104 -0.084381  0.199775

The fm_summary function summarizes the per-period results.

>>> fm_summary(result)

               mean  std_error      tstat
intercept  0.506834   0.008793  57.643021
exmkt      0.004750   0.009828   0.483269
smb       -0.012702   0.010842  -1.171530
hml        0.004276   0.010530   0.406119

Speed

Smaller data sets

For smaller data sets, the numba implementation is fastest.

>>> print('Number of firms: {} \nNumber of periods: {}'.format(int(n_firms),int(n_periods)))

Number of firms: 100
Number of periods: 100

>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)

123 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)

74.5 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)

5.04 ms ± 5.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Many periods, few firms

The numba implementation slows down as the data size increases. With many periods but few firms it still outperforms the simple implementation (4.6 s versus 6.14 s), but the parallel implementation is faster than both (1.04 s), as expected given the large number of regressions.

>>> n_firms = 1.0e2
>>> n_periods = 5.0e3
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})

>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)

6.14 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)

1.04 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)

4.6 s ± 4.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Many firms, few periods

With many firms and few periods, the parallel implementation is again the fastest, while the numba and simple implementations run at similar speeds.

>>> n_firms = 5.0e3
>>> n_periods = 1.0e2
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})

>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)

165 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)

76.9 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)

175 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Large data sets

On larger data sets, the numba implementation is far slower (about four minutes here), while the parallel implementation roughly halves the run time of the simple implementation. The numba implementation is very fast on smaller data sets, but it is important to switch to the parallel implementation for larger ones, for example by using the fama_macbeth_master function, which selects an implementation based on the size of the data provided.

>>> n_firms = 5.0e3
>>> n_periods = 5.0e3
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})

>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)

8.58 s ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)

4.18 s ± 46.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)

4min 5s ± 63.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)