Rolling Module
================

The rolling module contains compiled rolling functions for :code:`pandas` DataFrames that suit our purposes better than the built-in :code:`pandas` rolling functions; the Speed section below compares the two. The rolling module also contains a rolling multiple regression function that employs parallel processing and :code:`numba` compiled routines for speed.

:mod:`rolling`
-----------------------

.. autofunction:: finance_byu.rolling.roll_sum

.. autofunction:: finance_byu.rolling.roll_mean

.. autofunction:: finance_byu.rolling.roll_var

.. autofunction:: finance_byu.rolling.roll_std

.. autofunction:: finance_byu.rolling.roll_cov

.. autofunction:: finance_byu.rolling.roll_idio

.. autofunction:: finance_byu.rolling.roll_beta

.. autofunction:: finance_byu.rolling.rolling_multiple

Basic Examples
---------------

Below are some basic usage examples without a :code:`groupby`.

::

    >>> import pandas as pd
    >>> import finance_byu.rolling as rolling
    >>> import numpy as np
    >>>
    >>> n_periods = 1.0e2
    >>>
    >>> df = pd.DataFrame(np.random.random((int(n_periods),2)))
    >>> df = df.rename(columns={0:'ret',1:'exmkt'})
    >>> df['roll'] = rolling.roll_mean(df['ret'],5,5)
    >>> df.head(10)
            ret     exmkt      roll
    0  0.149535  0.644943       NaN
    1  0.024654  0.624619       NaN
    2  0.083370  0.025087       NaN
    3  0.532949  0.736360       NaN
    4  0.101531  0.400754  0.178408
    5  0.819424  0.215954  0.312386
    6  0.419873  0.728983  0.391429
    7  0.552381  0.160935  0.485232
    8  0.634769  0.743071  0.505596
    9  0.730326  0.246545  0.631355
    >>> df['rollvar'] = rolling.roll_var(df['ret'],5,5,ddof=1)
    >>> df.head(10)
            ret     exmkt      roll   rollvar
    0  0.149535  0.644943       NaN       NaN
    1  0.024654  0.624619       NaN       NaN
    2  0.083370  0.025087       NaN       NaN
    3  0.532949  0.736360       NaN       NaN
    4  0.101531  0.400754  0.178408  0.041279
    5  0.819424  0.215954  0.312386  0.121358
    6  0.419873  0.728983  0.391429  0.095739
    7  0.552381  0.160935  0.485232  0.067492
    8  0.634769  0.743071  0.505596  0.071995
    9  0.730326  0.246545  0.631355  0.024035
    >>> df = df.drop(['roll','rollvar'],axis=1)
    >>> df['rollcov'] = rolling.roll_cov(df['ret'],df['exmkt'],5,5,ddof=1)
    >>> df.head(10)
            ret     exmkt   rollcov
    0  0.149535  0.644943       NaN
    1  0.024654  0.624619       NaN
    2  0.083370  0.025087       NaN
    3  0.532949  0.736360       NaN
    4  0.101531  0.400754  0.028305
    5  0.819424  0.215954  0.000486
    6  0.419873  0.728983  0.023366
    7  0.552381  0.160935 -0.020825
    8  0.634769  0.743071 -0.013283
    9  0.730326  0.246545 -0.024831

Examples with Grouping
-----------------------

Below are some examples of rolling function usage with :code:`pandas` :code:`groupby` functionality.

::

    >>> df = pd.DataFrame(np.random.random((100,3)),columns=['a','b','c']).sort_values('a').reset_index(drop=True)
    >>> df.head(10)
              a         b         c
    0  0.000421  0.328225  0.595473
    1  0.039568  0.002372  0.223387
    2  0.041261  0.826214  0.684885
    3  0.059252  0.234307  0.412450
    4  0.077423  0.616780  0.027450
    5  0.082915  0.489654  0.596222
    6  0.090510  0.981726  0.519077
    7  0.102022  0.384198  0.939078
    8  0.123865  0.475949  0.890815
    9  0.159163  0.169004  0.139885
    >>> df['ports'] = pd.qcut(df['a'],5,labels=False)
    >>> df.head(30)
               a         b         c  ports
    0   0.000421  0.328225  0.595473      0
    1   0.039568  0.002372  0.223387      0
    2   0.041261  0.826214  0.684885      0
    3   0.059252  0.234307  0.412450      0
    4   0.077423  0.616780  0.027450      0
    5   0.082915  0.489654  0.596222      0
    6   0.090510  0.981726  0.519077      0
    7   0.102022  0.384198  0.939078      0
    8   0.123865  0.475949  0.890815      0
    9   0.159163  0.169004  0.139885      0
    10  0.182324  0.114017  0.098002      0
    11  0.184595  0.712363  0.850956      0
    12  0.189484  0.482832  0.568143      0
    13  0.194572  0.822320  0.471494      0
    14  0.200897  0.091733  0.581896      0
    15  0.211877  0.613734  0.445444      0
    16  0.228115  0.863478  0.928822      0
    17  0.229405  0.070245  0.667584      0
    18  0.247503  0.816117  0.479351      0
    19  0.248632  0.698228  0.028725      0
    20  0.250610  0.597579  0.595263      1
    21  0.271620  0.112743  0.480844      1
    22  0.285232  0.946583  0.227774      1
    23  0.287593  0.354288  0.333730      1
    24  0.302387  0.145458  0.117342      1
    25  0.311219  0.283440  0.828860      1
    26  0.312818  0.953281  0.393665      1
    27  0.316154  0.113413  0.270970      1
    28  0.323806  0.596317  0.951102      1
    29  0.354629  0.834722  0.179076      1
    >>> df['grouped_rolling_sum'] = df.groupby('ports')['c'].transform(lambda x: rolling.roll_sum(x,5,5))
    >>> df.sort_values(['ports','a']).head(10)
              a         b         c  ports  grouped_rolling_sum
    0  0.000421  0.328225  0.595473      0                  NaN
    1  0.039568  0.002372  0.223387      0                  NaN
    2  0.041261  0.826214  0.684885      0                  NaN
    3  0.059252  0.234307  0.412450      0                  NaN
    4  0.077423  0.616780  0.027450      0             1.943644
    5  0.082915  0.489654  0.596222      0             1.944394
    6  0.090510  0.981726  0.519077      0             2.240084
    7  0.102022  0.384198  0.939078      0             2.494277
    8  0.123865  0.475949  0.890815      0             2.972642
    9  0.159163  0.169004  0.139885      0             3.085077
    >>> df['grouped_rolling_beta'] = df.groupby('ports')[['b','c']].apply(lambda x: rolling.roll_beta(x['b'],x['c'],5,5,ddof=1))
    >>> df.head(10)
              a         b  ...  grouped_rolling_sum  grouped_rolling_beta
    0  0.000421  0.328225  ...                  NaN                   NaN
    1  0.039568  0.002372  ...                  NaN                   NaN
    2  0.041261  0.826214  ...                  NaN                   NaN
    3  0.059252  0.234307  ...                  NaN                   NaN
    4  0.077423  0.616780  ...             1.943644              0.328457
    5  0.082915  0.489654  ...             1.944394              0.443657
    6  0.090510  0.981726  ...             2.240084              0.269092
    7  0.102022  0.384198  ...             2.494277             -0.171534
    8  0.123865  0.475949  ...             2.972642             -0.280294
    9  0.159163  0.169004  ...             3.085077              0.161115

Speed
------

Below are a few example speed comparisons with the built-in :code:`pandas` rolling functionality.

::

    >>> %timeit df['rolling_sum'] = df['c'].rolling(5).sum().reset_index(drop=True)
    671 µs ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    >>> %timeit df['rolling_sum'] = rolling.roll_sum(df['c'],5,5)
    297 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    >>> %timeit df['grouped_rolling_sum'] = df.groupby('ports')['c'].rolling(5).sum().reset_index(drop=True)
    5.39 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> %timeit df['grouped_rolling_sum'] = df.groupby('ports')['c'].apply(lambda x: rolling.roll_sum(x,5,5))
    3.34 ms ± 166 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> %timeit df['grouped_rolling_mean'] = df.groupby('ports')['c'].rolling(5).mean().reset_index(drop=True)
    5.17 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    >>> %timeit df['grouped_rolling_mean'] = df.groupby('ports')['c'].apply(lambda x: rolling.roll_mean(x,5,5))
    3.01 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Rolling Multiple Regression
----------------------------

Here are a couple of examples that use the :code:`rolling_multiple` function with random data.

::

    >>> from finance_byu.rolling import rolling_multiple
    >>> df = pd.DataFrame(np.random.random((100,4)))
    >>> df.columns = ['y','x1','x2','x3']
    >>> coeff = rolling_multiple(df,'y',['x1','x2','x3'],10)
    >>> coeff.head(20)
        intercept        x1        x2        x3
    0         NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN
    9    0.252484  0.038459  0.003861  0.490564
    10   0.299362 -0.020591  0.145126  0.368189
    11   0.323561 -0.067829  0.237643  0.311882
    12   0.329602  0.069779  0.266711  0.130818
    13   0.786978  0.107650  0.273981 -0.484018
    14   0.834109  0.139822  0.256740 -0.543779
    15   0.937403  0.302884  0.216420 -0.774488
    16   0.288141  0.101312  0.386460  0.005133
    17   0.175423 -0.180466  0.646475  0.056263
    18   0.012849  0.698391  0.531258  0.187829
    19   0.039563  0.795769  0.378595  0.165630
    >>> withresiduals = rolling_multiple(df,'y',['x1','x2','x3'],10,residuals=True)
    >>> withresiduals.head(20)
        intercept        x1        x2        x3     resid
    0         NaN       NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN       NaN
    9    0.252484  0.038459  0.003861  0.490564 -0.909839
    10   0.299362 -0.020591  0.145126  0.368189 -1.262891
    11   0.323561 -0.067829  0.237643  0.311882 -0.926025
    12   0.329602  0.069779  0.266711  0.130818 -1.365205
    13   0.786978  0.107650  0.273981 -0.484018 -1.027773
    14   0.834109  0.139822  0.256740 -0.543779 -1.044181
    15   0.937403  0.302884  0.216420 -0.774488 -0.875767
    16   0.288141  0.101312  0.386460  0.005133 -1.374425
    17   0.175423 -0.180466  0.646475  0.056263 -0.739793
    18   0.012849  0.698391  0.531258  0.187829 -0.678277
    19   0.039563  0.795769  0.378595  0.165630 -1.312702
    >>> df['groups'] = [1 for i in range(33)]+[2 for i in range(33)]+[3 for i in range(34)]
    >>> grouped = df.groupby('groups').apply(lambda x: rolling_multiple(x,'y',['x1','x2','x3'],10)).reset_index(drop=True)
    >>> grouped.head(50)
        intercept        x1        x2        x3
    0         NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN
    9    0.188011  0.172650 -0.036465  0.201428
    10   0.127681  0.530099 -0.256984  0.447311
    11   0.211469  0.585754 -0.356648  0.407959
    12   0.309164  0.466337 -0.486869  0.486394
    13   0.336976  0.541512 -0.683834  0.616734
    14   0.242381  0.762214 -0.718999  0.539597
    15   0.534390  0.592484 -0.882821  0.436392
    16   0.475962  0.516497 -0.833246  0.695213
    17   0.445435  0.300559 -0.683455  0.858445
    18   0.424248  0.233874 -0.676784  0.879286
    19   0.528965 -0.061176 -0.462627  0.834017
    20   0.793513 -0.285642 -0.570421  0.456161
    21   0.771210 -0.472012 -0.427634  0.489903
    22   0.690572 -0.394875 -0.350176  0.697332
    23   0.652243 -0.487154 -0.253501  0.716340
    24   0.758479 -0.765630 -0.202638  0.606289
    25   0.929314 -0.889273 -0.357062  0.522449
    26   0.875805 -0.869253 -0.272132  0.482170
    27   0.527850 -0.532585  0.178658  0.156535
    28   0.592946 -0.613963  0.239219  0.101630
    29   0.670114 -0.794297  0.576524 -0.187906
    30   0.580165 -0.608576  0.466616 -0.258936
    31   0.517314 -0.609204  0.438855 -0.067928
    32   0.370829 -0.338844  0.511657 -0.235225
    33        NaN       NaN       NaN       NaN
    34        NaN       NaN       NaN       NaN
    35        NaN       NaN       NaN       NaN
    36        NaN       NaN       NaN       NaN
    37        NaN       NaN       NaN       NaN
    38        NaN       NaN       NaN       NaN
    39        NaN       NaN       NaN       NaN
    40        NaN       NaN       NaN       NaN
    41        NaN       NaN       NaN       NaN
    42   0.952411  0.355941 -0.646408 -0.469534
    43   0.993868  0.143298 -0.498905 -0.381196
    44   0.921548  0.157348 -0.543432 -0.169947
    45   1.023682 -0.092272 -0.521587 -0.093258
    46   0.802979 -0.343775 -0.115053  0.043999
    47   0.654425 -0.186637 -0.095179  0.112743
    48   0.747796 -0.250212 -0.252141  0.067891
    49   0.784292 -0.308890 -0.303901  0.072922
    >>> df['resid'] = df.groupby('groups').apply(lambda x: rolling_multiple(x,'y',['x1','x2','x3'],10,residuals=True)).reset_index(drop=True)['resid']
    >>> df.head(50)
               y        x1        x2        x3  groups     resid
    0   0.292707  0.527416  0.352366  0.942830       1       NaN
    1   0.009256  0.056392  0.577220  0.316502       1       NaN
    2   0.254997  0.303502  0.312090  0.450186       1       NaN
    3   0.630078  0.485770  0.652494  0.219974       1       NaN
    4   0.806714  0.117059  0.544942  0.752988       1       NaN
    5   0.259738  0.332130  0.041728  0.038601       1       NaN
    6   0.186582  0.601384  0.903792  0.943630       1       NaN
    7   0.058661  0.537003  0.749255  0.333435       1       NaN
    8   0.693882  0.829473  0.949468  0.602170       1       NaN
    9   0.107025  0.314554  0.881373  0.008589       1 -1.104885
    10  0.985555  0.898102  0.889376  0.836676       1 -0.763909
    11  0.844293  0.883783  0.097372  0.114230       1 -0.896729
    12  0.110527  0.617211  0.714501  0.467282       1 -1.365879
    13  0.991384  0.615916  0.331154  0.338925       1 -0.661689
    14  0.858861  0.166270  0.221500  0.783701       1 -0.773878
    15  0.679131  0.070385  0.024244  0.108567       1 -0.922935
    16  0.083172  0.633883  0.981583  0.209229       1 -1.047747
    17  0.105980  0.755695  0.346644  0.005208       1 -1.334142
    18  0.169534  0.580690  0.323765  0.146509       1 -1.300227
    19  0.602168  0.027302  0.991393  0.273679       1 -0.694735
    20  0.824959  0.517080  0.378158  0.273586       1 -0.729945
    21  0.284846  0.843899  0.496191  0.116380       1 -0.932860
    22  0.808722  0.096899  0.268888  0.228202       1 -0.908562
    23  0.885483  0.522229  0.584660  0.652320       1 -0.831425
    24  0.020493  0.806937  0.321778  0.547365       1 -1.386828
    25  0.127404  0.705129  0.991105  0.478141       1 -1.070777
    26  0.296545  0.565533  0.330202  0.286323       1 -1.135866
    27  0.022179  0.223031  0.029895  0.350958       1 -1.447167
    28  0.971971  0.092243  0.829478  0.411799       1 -0.804620
    29  0.732190  0.646287  0.889335  0.333274       1 -0.874680
    30  0.412803  0.168067  0.896444  0.563714       1 -1.337410
    31  0.162741  0.939633  0.546790  0.941365       1 -0.958161
    32  0.325315  0.259795  0.818819  0.846654       1 -1.177284
    33  0.154722  0.702591  0.791131  0.961928       2       NaN
    34  0.491407  0.518855  0.231681  0.955700       2       NaN
    35  0.159072  0.036437  0.620747  0.555733       2       NaN
    36  0.891759  0.617571  0.187414  0.401648       2       NaN
    37  0.481304  0.750031  0.900041  0.354723       2       NaN
    38  0.669076  0.110193  0.764837  0.087526       2       NaN
    39  0.522653  0.313408  0.491214  0.584310       2       NaN
    40  0.576749  0.762469  0.646008  0.033188       2       NaN
    41  0.755683  0.401390  0.045780  0.468417       2       NaN
    42  0.939899  0.806682  0.387371  0.694695       2 -0.723060
    43  0.921698  0.033224  0.613996  0.359314       2 -0.633637
    44  0.365582  0.376107  0.799801  0.158204       2 -1.153621
    45  0.401404  0.592845  0.700801  0.967785       2 -1.111793
    46  0.225042  0.907086  0.020412  0.483045       2 -1.285008
    47  0.325359  0.003486  0.641433  0.148209       2 -1.284073
    48  0.239367  0.753501  0.730812  0.733136       2 -1.185399
    49  0.221450  0.631365  0.604590  0.536159       2 -1.223183

Rolling Multiple Regression Speed
----------------------------------

Here are some timings for the :code:`rolling_multiple` function.

Scaling with number of observations

::

    >>> def produce_data(nobs,nx):
    ...     return pd.DataFrame(np.random.random((nobs,nx+1)))
    >>> df = produce_data(100,3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],10)
    24.2 ms ± 440 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(1000,3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    184 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    2.04 s ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e5),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120,predispatch='auto')
    18.9 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e6),3)
    >>> %timeit -r2 -n1 rolling_multiple(df,0,[1,2,3],120,predispatch='auto')
    3min 10s ± 3.18 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

Scaling with number of regressors

::

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    1.94 s ± 31.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e4),20)
    >>> %timeit rolling_multiple(df,0,[i for i in range(1,21)],120)
    2.31 s ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e4),100)
    >>> %timeit rolling_multiple(df,0,[i for i in range(1,101)],120)
    5.71 s ± 1.25 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

Scaling with window size

::

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],10)
    1.82 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],50)
    1.88 s ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],100)
    1.91 s ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],500)
    2.01 s ± 28.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
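
For readers who want to see what a rolling multiple regression computes, the following is a minimal reference sketch in plain :code:`numpy`. It is not the :code:`finance_byu` implementation, which relies on :code:`numba` and parallel processing. The function name :code:`rolling_ols_sketch` is hypothetical, and the window alignment (a trailing window, with :code:`NaN` for the first ``window - 1`` rows) is an assumption inferred from the example output above.

```python
import numpy as np
import pandas as pd


def rolling_ols_sketch(df, y, xs, window):
    """Hypothetical helper: OLS with an intercept on each trailing window.

    Row t holds coefficients fit on rows t-window+1 .. t; earlier rows are NaN.
    """
    cols = ['intercept'] + list(xs)
    out = pd.DataFrame(np.nan, index=df.index, columns=cols)
    yv = df[y].to_numpy(dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(df)), df[list(xs)].to_numpy(dtype=float)])
    for end in range(window, len(df) + 1):
        start = end - window
        # Least-squares solution for this window only.
        beta, *_ = np.linalg.lstsq(X[start:end], yv[start:end], rcond=None)
        out.iloc[end - 1] = beta
    return out


# Synthetic data with a known exact relationship, so the fit is easy to check.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((30, 3)), columns=['y', 'x1', 'x2'])
demo['y'] = 1.0 + 2.0 * demo['x1'] - 3.0 * demo['x2']
coeff = rolling_ols_sketch(demo, 'y', ['x1', 'x2'], 10)
print(coeff.tail(3))
```

Because the loop refits every window from scratch in Python, this sketch is useful for checking results on small inputs rather than as a substitute for the compiled routine.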