Fama-MacBeth Module
===================

The Fama-MacBeth module contains several implementations of Fama-MacBeth regressions: a plain :code:`numpy` version, a parallel version, and a version compiled with :code:`numba`.

:mod:`fama_macbeth`
--------------------------

.. autofunction:: finance_byu.fama_macbeth.fama_macbeth_master

.. autofunction:: finance_byu.fama_macbeth.fama_macbeth_numba

.. autofunction:: finance_byu.fama_macbeth.fama_macbeth_parallel

.. autofunction:: finance_byu.fama_macbeth.fm_summary

.. autofunction:: finance_byu.fama_macbeth.fama_macbeth

Examples
---------

The examples use a simulated panel. Each firm has one row per period containing a return (:code:`ret`) and three factor values (:code:`exmkt`, :code:`smb`, :code:`hml`), all drawn uniformly from [0, 1).

::

    >>> from finance_byu.fama_macbeth import fama_macbeth, fama_macbeth_parallel, fm_summary, fama_macbeth_numba
    >>> import pandas as pd
    >>> import numpy as np
    >>>
    >>> n_jobs = 5
    >>> n_firms = 1.0e2
    >>> n_periods = 1.0e2
    >>>
    >>> def firm(fid):
    ...     f = np.random.random((int(n_periods),4))
    ...     f = pd.DataFrame(f)
    ...     f['period'] = f.index
    ...     f['firmid'] = fid
    ...     return f
    ...
    >>> df = [firm(i) for i in range(int(n_firms))]
    >>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
    >>> df.head()
            ret     exmkt       smb       hml  period  firmid
    0  0.766593  0.002390  0.496230  0.992345       0       0
    1  0.346250  0.509880  0.083644  0.732374       1       0
    2  0.787731  0.204211  0.705075  0.313182       2       0
    3  0.904969  0.338722  0.437298  0.669285       3       0
    4  0.121908  0.827623  0.319610  0.455530       4       0

The following example runs the non-parallel implementation, which relies on :code:`numpy` linear algebra routines, and returns one row of coefficients per period.

::

    >>> result = fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    >>> result.head()
            intercept     exmkt       smb       hml
    period
    0        0.655784 -0.160938 -0.109336  0.028015
    1        0.455177  0.033941  0.085344  0.013814
    2        0.410705 -0.084130  0.218568  0.016897
    3        0.410537  0.010719  0.208912  0.001029
    4        0.439061  0.046104 -0.084381  0.199775

The summary function :code:`fm_summary` condenses the per-period estimates into the time-series mean, standard error, and t-statistic of each coefficient.
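Each regression call above runs one cross-sectional regression per period and stacks the coefficients, one row per period; averaging those rows over time gives the Fama-MacBeth estimates. As a rough illustration of that two-step procedure, the sketch below re-implements the textbook version with plain NumPy and pandas. It assumes ordinary least squares in every period and a standard error equal to the sample standard deviation of the per-period coefficients divided by the square root of the number of periods. The names :code:`fama_macbeth_sketch` and :code:`fm_summary_sketch` are hypothetical, and the library's own functions may differ in details such as degrees-of-freedom conventions.

```python
import numpy as np
import pandas as pd


def fama_macbeth_sketch(df, t, yvar, xvars, intercept=True):
    """Hypothetical textbook Fama-MacBeth first pass (not the finance_byu code).

    Runs one OLS cross-sectional regression of ``yvar`` on ``xvars`` for
    each value of the period column ``t`` and returns one row of
    coefficients per period.
    """
    periods, betas = [], []
    for period, group in df.groupby(t):
        X = group[xvars].to_numpy(dtype=float)
        if intercept:
            # Prepend a column of ones so the first coefficient is the intercept.
            X = np.column_stack([np.ones(len(X)), X])
        y = group[yvar].to_numpy(dtype=float)
        # Least-squares solution of y ~ X @ beta for this period only.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        periods.append(period)
        betas.append(beta)
    names = (['intercept'] if intercept else []) + list(xvars)
    coefs = pd.DataFrame(np.vstack(betas), index=periods, columns=names)
    coefs.index.name = t
    return coefs


def fm_summary_sketch(coefs):
    """Second pass: time-series mean, standard error and t-statistic."""
    mean = coefs.mean()
    # Assumed convention: sample std (ddof=1) of the per-period estimates / sqrt(T).
    std_error = coefs.std(ddof=1) / np.sqrt(len(coefs))
    return pd.DataFrame({'mean': mean, 'std_error': std_error,
                         'tstat': mean / std_error})


if __name__ == '__main__':
    # Simulated panel in the spirit of the example above: 100 rows per period, 10 periods.
    rng = np.random.default_rng(0)
    panel = pd.DataFrame(rng.random((1000, 4)),
                         columns=['ret', 'exmkt', 'smb', 'hml'])
    panel['period'] = np.repeat(np.arange(10), 100)
    coefs = fama_macbeth_sketch(panel, 'period', 'ret', ['exmkt', 'smb', 'hml'])
    print(fm_summary_sketch(coefs))
```

In the example output, each :code:`tstat` equals :code:`mean` divided by :code:`std_error` (for instance, 0.506834 / 0.008793 ≈ 57.64), which matches the last column computed in the sketch. For the simulated data, :code:`fm_summary` reports: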
::

    >>> fm_summary(result)
                   mean  std_error      tstat
    intercept  0.506834   0.008793  57.643021
    exmkt      0.004750   0.009828   0.483269
    smb       -0.012702   0.010842  -1.171530
    hml        0.004276   0.010530   0.406119

Speed
------

Smaller data sets
##################

For small data sets (here 100 firms and 100 periods), the :code:`numba` implementation is fastest: about 15 times faster than the parallel implementation and about 25 times faster than the plain one.

::

    >>> print('Number of firms: {} \nNumber of periods: {}'.format(int(n_firms),int(n_periods)))
    Number of firms: 100
    Number of periods: 100
    >>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    123 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
    74.5 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    5.04 ms ± 5.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Many periods, few firms
#######################

The advantage of the :code:`numba` implementation shrinks as the data grow. With many periods (5,000) but few firms (100), it is still faster than the plain implementation, but the parallel implementation is faster than both, as expected given the large number of per-period regressions it can distribute across workers.

::

    >>> n_firms = 1.0e2
    >>> n_periods = 5.0e3
    >>> df = [firm(i) for i in range(int(n_firms))]
    >>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
    >>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    6.14 s ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
    1.04 s ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    4.6 s ± 4.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Many firms, few periods
########################

With many firms (5,000) and few periods (100), the parallel implementation is again fastest, while the :code:`numba` and plain implementations run at similar speeds.

::

    >>> n_firms = 5.0e3
    >>> n_periods = 1.0e2
    >>> df = [firm(i) for i in range(int(n_firms))]
    >>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
    >>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    165 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
    76.9 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    175 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Large data sets
########################

With 5,000 firms and 5,000 periods, the :code:`numba` implementation becomes very slow (about four minutes per run), while the parallel implementation takes roughly half the time of the plain one. Although :code:`numba` is very fast on small data sets, it is important to switch to the parallel implementation for larger ones, for example by calling :code:`fama_macbeth_master`, which selects the implementation based on the provided data.

::

    >>> n_firms = 5.0e3
    >>> n_periods = 5.0e3
    >>> df = [firm(i) for i in range(int(n_firms))]
    >>> df = pd.concat(df).rename(columns={0:'ret',1:'exmkt',2:'smb',3:'hml'})
    >>> %timeit fama_macbeth(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    8.58 s ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit fama_macbeth_parallel(df,'period','ret',['exmkt','smb','hml'],intercept=True,n_jobs=n_jobs,memmap=False)
    4.18 s ± 46.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit fama_macbeth_numba(df,'period','ret',['exmkt','smb','hml'],intercept=True)
    4min 5s ± 63.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
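These benchmarks suggest a simple rule of thumb: use :code:`numba` for small panels and the parallel implementation otherwise. The hypothetical helper :code:`choose_implementation` below encodes that rule by total row count. Its :code:`small_rows` cutoff of 50,000 is an illustrative assumption placed between the benchmarked sizes, and this is not necessarily the logic used by :code:`fama_macbeth_master`.

```python
def choose_implementation(n_firms: int, n_periods: int,
                          small_rows: int = 50_000) -> str:
    """Hypothetical rule of thumb distilled from the benchmarks above.

    ``small_rows`` is an assumed cutoff, not a tuned value: the timings only
    show that the numba version wins at 10,000 rows (100 x 100) and loses at
    500,000 rows (100 x 5,000 or 5,000 x 100) and beyond.
    """
    n_rows = n_firms * n_periods
    if n_rows <= small_rows:
        # Small panels: the compiled numba version had the lowest run time.
        return 'fama_macbeth_numba'
    # Larger panels: the parallel version was fastest in every larger benchmark.
    return 'fama_macbeth_parallel'


if __name__ == '__main__':
    # The four panel sizes benchmarked above.
    for firms, periods in [(100, 100), (100, 5000), (5000, 100), (5000, 5000)]:
        print(firms, periods, choose_implementation(firms, periods))
```

The returned name could then be resolved with :code:`getattr` on the :code:`finance_byu.fama_macbeth` module. Keying the rule on total rows is a simplification; :code:`fama_macbeth_master` is the function the module itself provides for making this choice.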