Fama Macbeth Module
The Fama Macbeth module contains parallel, compiled implementations of Fama Macbeth regression functions.
fama_macbeth
- finance_byu.fama_macbeth.fama_macbeth_master(data, t, yvar, xvar, intercept=True)
Master function for Fama Macbeth regressions which uses the best implementation based on the size of the data provided.
- Parameters:
- data: pandas.core.frame.DataFrame
A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable, which may be a datetime64[ns] type.
- t: str
The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (i.e. via pd.to_datetime()).
- yvar: str
The name of the regressand variable in data.
- xvar: list(str)
A list of the names of the regressor variables in data.
- intercept: bool
Whether or not to regress with an intercept.
- Returns:
- A pandas DataFrame which contains regression coefficients for each time period in data.
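The dtype requirements on data can be checked before calling any of the regression functions. The following is a minimal sketch, assuming only the constraints stated above (time variable numeric or datetime64[ns], every other column numeric). The helper name check_fm_input and the sample columns are hypothetical and are not part of finance_byu.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_fm_input(data, t, yvar, xvar):
    """Hypothetical pre-flight check; not part of finance_byu."""
    missing = [c for c in [t, yvar, *xvar] if c not in data.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # The time variable may be numeric or a datetime64 type.
    if not (is_numeric_dtype(data[t]) or is_datetime64_any_dtype(data[t])):
        raise TypeError(f"{t!r} must be numeric or datetime64[ns]")
    # Every other column in the frame must be numeric.
    bad = [c for c in data.columns if c != t and not is_numeric_dtype(data[c])]
    if bad:
        raise TypeError(f"non-numeric columns: {bad}")
    return True


raw = pd.DataFrame({
    "date": ["2020-01-31", "2020-01-31", "2020-02-29", "2020-02-29"],
    "ret": [0.01, 0.02, -0.01, 0.03],
    "exmkt": [0.5, 0.7, 0.2, 0.9],
})
# Convert the string dates to a datetime64 column before regressing.
raw["date"] = pd.to_datetime(raw["date"])
check_fm_input(raw, "date", "ret", ["exmkt"])
```

A frame that still carries a string column (a ticker, for instance) would be rejected by this check, so such columns should be dropped or encoded first.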
- finance_byu.fama_macbeth.fama_macbeth_numba(data, t, yvar, xvar, intercept=True, parallel=False)
Fama Macbeth regression implementation for small data sets using compiled machine code.
- Parameters:
- data: pandas.core.frame.DataFrame
A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable, which may be a datetime64[ns] type.
- t: str
The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (i.e. via pd.to_datetime()).
- yvar: str
The name of the regressand variable in data.
- xvar: list(str)
A list of the names of the regressor variables in data.
- intercept: bool
Whether or not to regress with an intercept.
- parallel: bool
Whether or not to use a parallel numba implementation.
- Returns:
- A pandas DataFrame which contains regression coefficients for each time period in data.
- finance_byu.fama_macbeth.fama_macbeth_parallel(data, t, yvar, xvar, intercept=True, backend='loky')
Fama Macbeth regression implementation using pandas groupby for grouping, linear algebra routines compiled with numba for regressions, and joblib for parallelization. Jobs are pre-dispatched to each core for performance.
- Parameters:
- data: pandas.core.frame.DataFrame
A dataframe with regressand, regressors, and time variable for Fama Macbeth regression. This dataframe must have strictly numeric types with the exception of the time variable, which may be a datetime64[ns] type.
- t: str
The name of the time variable in data. Note that t must be either a numeric type or datetime64[ns] type (i.e. via pd.to_datetime()).
- yvar: str
The name of the regressand variable in data.
- xvar: list(str)
A list of the names of the regressor variables in data.
- intercept: bool
Whether or not to regress with an intercept.
- backend: {'loky', 'multiprocessing', 'threading'}
The joblib backend to use for parallel processing. 'loky' is used by default and is recommended.
- Returns:
- A pandas DataFrame which contains regression coefficients for each time period in data.
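The split, regress, and reassemble pattern described for the parallel implementation can be sketched as follows. This is an illustration only: it swaps in the standard library's concurrent.futures.ThreadPoolExecutor as a stand-in for joblib and plain numpy in place of numba-compiled routines, so it shows the structure rather than the library's code or its performance. The names fm_parallel_sketch, _ols, and n_workers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def _ols(job):
    """Solve one cross-sectional least-squares problem."""
    period, X, y = job
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return period, beta


def fm_parallel_sketch(data, t, yvar, xvar, n_workers=2):
    """Hypothetical stand-in; always fits an intercept."""
    # Build every per-period job up front, before any worker starts.
    jobs = []
    for period, grp in data.groupby(t):
        X = np.column_stack(
            [np.ones(len(grp)), grp[xvar].to_numpy(dtype=float)]
        )
        jobs.append((period, X, grp[yvar].to_numpy(dtype=float)))
    # Run the regressions concurrently; map() preserves job order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(_ols, jobs))
    periods = [p for p, _ in results]
    betas = np.vstack([b for _, b in results])
    return pd.DataFrame(
        betas, index=pd.Index(periods, name=t), columns=["intercept", *xvar]
    )


demo = pd.DataFrame({
    "period": [0, 0, 0, 1, 1, 1],
    "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "y": [1.0, 3.0, 5.0, 3.0, 2.0, 1.0],  # y = 1 + 2x, then y = 3 - x
})
coefs = fm_parallel_sketch(demo, "period", "y", ["x"])
```

Because each period's regression is independent of the others, the work divides cleanly across workers, which is why the number of periods drives how much parallelism is available.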
- finance_byu.fama_macbeth.fm_summary(p, pvalues=False)
Summary function for Fama Macbeth regression results.
- Parameters:
- p: pandas.core.frame.DataFrame
A DataFrame object returned by a Fama Macbeth regression function in this library.
- pvalues: bool
Whether or not to include p-values in the summary table.
- Returns:
- A summary DataFrame with mean coefficients, Fama Macbeth standard errors, t-statistics, and (when pvalues is True) p-values.
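The summary statistics can be sketched as below, assuming the conventional Fama Macbeth definitions: the mean of each coefficient across periods, a standard error equal to the sample standard deviation across periods divided by the square root of the number of periods, and a t-statistic equal to the mean divided by the standard error. The source does not show how fm_summary computes these, so the helper fm_summary_sketch is hypothetical and is not the library's implementation. P-values are omitted here.

```python
import numpy as np


def fm_summary_sketch(coefs):
    """coefs: array of shape (T, K), one row of coefficients per period."""
    coefs = np.asarray(coefs, dtype=float)
    n_periods = coefs.shape[0]
    mean = coefs.mean(axis=0)
    # Sample standard deviation across periods, scaled by sqrt(T).
    std_error = coefs.std(axis=0, ddof=1) / np.sqrt(n_periods)
    tstat = mean / std_error
    return mean, std_error, tstat


# Four periods, two coefficients (made-up numbers for illustration).
coefs = np.array([[1.0, 0.2], [2.0, -0.1], [3.0, 0.0], [4.0, 0.3]])
mean, std_error, tstat = fm_summary_sketch(coefs)
```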
- finance_byu.fama_macbeth.fama_macbeth(data, t, yvar, xvar, intercept=True)
Basic Fama Macbeth regression implementation with regressions performed by numpy linear algebra routines and grouping performed by pandas groupby functionality.
- Parameters:
- data: pandas.core.frame.DataFrame
A dataframe with regressand, regressors, and time variable for Fama Macbeth regression.
- t: str
The name of the time variable in data.
- yvar: str
The name of the regressand variable in data.
- xvar: list(str)
A list of the names of the regressor variables in data.
- intercept: bool
Whether or not to regress with an intercept.
- Returns:
- A pandas DataFrame which contains regression coefficients for each time period in data.
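The approach described for the basic implementation, pandas groupby for grouping and numpy linear algebra for each regression, can be sketched as follows. This is a minimal illustration under those stated assumptions and not the library's source. The helper name fm_first_stage_sketch and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def fm_first_stage_sketch(data, t, yvar, xvar, intercept=True):
    """Hypothetical helper: one cross-sectional OLS per value of `t`."""
    names = (["intercept"] if intercept else []) + list(xvar)
    periods, betas = [], []
    for period, grp in data.groupby(t):
        X = grp[xvar].to_numpy(dtype=float)
        if intercept:
            # Prepend a column of ones for the intercept term.
            X = np.column_stack([np.ones(len(grp)), X])
        y = grp[yvar].to_numpy(dtype=float)
        # Least-squares solution for this period's cross-section.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        periods.append(period)
        betas.append(beta)
    return pd.DataFrame(
        np.vstack(betas), index=pd.Index(periods, name=t), columns=names
    )


toy = pd.DataFrame({
    "period": [0, 0, 0, 1, 1, 1],
    "x": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    "y": [1.0, 3.0, 5.0, 3.0, 2.0, 1.0],  # y = 1 + 2x, then y = 3 - x
})
first_stage = fm_first_stage_sketch(toy, "period", "y", ["x"])
```

Each row of the result holds one period's coefficients, which matches the shape of the output shown in the examples.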
Examples
>>> from finance_byu.fama_macbeth import fama_macbeth, fama_macbeth_parallel, fm_summary, fama_macbeth_numba
>>> import pandas as pd
>>> import time
>>> import numpy as np
>>>
>>> n_jobs = 5
>>> n_firms = 1.0e2
>>> n_periods = 1.0e2
>>>
>>> def firm(fid):
...     f = np.random.random((int(n_periods),4))
...     f = pd.DataFrame(f)
...     f['period'] = f.index
...     f['firmid'] = fid
...     return f
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
>>> df.head()
        ret     exmkt       smb       hml  period  firmid
0  0.766593  0.002390  0.496230  0.992345       0       0
1  0.346250  0.509880  0.083644  0.732374       1       0
2  0.787731  0.204211  0.705075  0.313182       2       0
3  0.904969  0.338722  0.437298  0.669285       3       0
4  0.121908  0.827623  0.319610  0.455530       4       0
The following uses the non-parallel implementation, which relies on numpy linear algebra routines.
>>> result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
>>> result.head()
        intercept      exmkt        smb       hml
period
0        0.655784  -0.160938  -0.109336  0.028015
1        0.455177   0.033941   0.085344  0.013814
2        0.410705  -0.084130   0.218568  0.016897
3        0.410537   0.010719   0.208912  0.001029
4        0.439061   0.046104  -0.084381  0.199775
The fm_summary function summarizes the results.
>>> fm_summary(result)
                mean  std_error      tstat
intercept   0.506834   0.008793  57.643021
exmkt       0.004750   0.009828   0.483269
smb        -0.012702   0.010842  -1.171530
hml         0.004276   0.010530   0.406119
Speed
Smaller data sets
For smaller data sets, the numba implementation is fastest.
>>> print('Number of firms: {} \nNumber of periods: {}'.format(int(n_firms),int(n_periods)))
Number of firms: 100
Number of periods: 100
>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
123 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
74.5 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
5.04 ms ± 5.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Many periods, few firms
The numba implementation begins to break down as the data size increases. With a large number of periods but few firms, it still outperforms the simple implementation; however, the parallel implementation outperforms both, as expected given the large number of regressions.
>>> n_firms = 1.0e2
>>> n_periods = 5.0e3
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
6.14 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
1.04 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
4.6 s ± 4.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Many firms, few periods
With a large number of firms and few periods, the parallel implementation is again the fastest, while the numba and simple implementations have similar speeds.
>>> n_firms = 5.0e3
>>> n_periods = 1.0e2
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
165 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
76.9 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
175 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Large data sets
In larger data sets, the numba implementation is far slower than the others (about 4 minutes here, versus 8.58 s for the simple implementation), while the parallel implementation roughly halves the simple implementation's time (4.18 s). While the numba implementation is very powerful in smaller data sets, it is important to switch to the parallel implementation with larger data sets, for example by using the fama_macbeth_master function, which selects an implementation based on the data provided.
>>> n_firms = 5.0e3
>>> n_periods = 5.0e3
>>> df = [firm(i) for i in range(int(n_firms))]
>>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
>>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
8.58 s ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
4.18 s ± 46.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
4min 5s ± 63.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)