Rolling Module
================

The rolling module contains compiled rolling-window functions for :code:`pandas` objects.
They are meant as faster replacements for the built-in :code:`pandas` rolling methods in the
use cases this package targets (see the speed comparisons below).
The module also contains a rolling multiple regression function that uses parallel
processing and :code:`numba`-compiled routines for speed.

:mod:`rolling`
-----------------------

.. autofunction:: finance_byu.rolling.roll_sum

.. autofunction:: finance_byu.rolling.roll_mean

.. autofunction:: finance_byu.rolling.roll_var

.. autofunction:: finance_byu.rolling.roll_std

.. autofunction:: finance_byu.rolling.roll_cov

.. autofunction:: finance_byu.rolling.roll_idio

.. autofunction:: finance_byu.rolling.roll_beta

.. autofunction:: finance_byu.rolling.rolling_multiple


Basic Examples
---------------

Below are some basic usage examples without a :code:`groupby`.

::

    >>> import pandas as pd
    >>> import finance_byu.rolling as rolling
    >>> import numpy as np
    >>> 
    >>> n_periods = 1.0e2
    >>> 
    >>> df = pd.DataFrame(np.random.random((int(n_periods),2)))
    >>> df = df.rename(columns={0:'ret',1:'exmkt'})
    >>> df['roll'] = rolling.roll_mean(df['ret'],5,5)
    >>> df.head(10)

            ret     exmkt      roll
    0  0.149535  0.644943       NaN
    1  0.024654  0.624619       NaN
    2  0.083370  0.025087       NaN
    3  0.532949  0.736360       NaN
    4  0.101531  0.400754  0.178408
    5  0.819424  0.215954  0.312386
    6  0.419873  0.728983  0.391429
    7  0.552381  0.160935  0.485232
    8  0.634769  0.743071  0.505596
    9  0.730326  0.246545  0.631355
    
    >>> df['rollvar'] = rolling.roll_var(df['ret'],5,5,ddof=1)
    >>> df.head(10)
    
            ret     exmkt      roll   rollvar
    0  0.149535  0.644943       NaN       NaN
    1  0.024654  0.624619       NaN       NaN
    2  0.083370  0.025087       NaN       NaN
    3  0.532949  0.736360       NaN       NaN
    4  0.101531  0.400754  0.178408  0.041279
    5  0.819424  0.215954  0.312386  0.121358
    6  0.419873  0.728983  0.391429  0.095739
    7  0.552381  0.160935  0.485232  0.067492
    8  0.634769  0.743071  0.505596  0.071995
    9  0.730326  0.246545  0.631355  0.024035
    
    >>> df = df.drop(['roll','rollvar'],axis=1)
    >>> df['rollcov'] = rolling.roll_cov(df['ret'],df['exmkt'],5,5,ddof=1)
    >>> df.head(10)
    
            ret     exmkt   rollcov
    0  0.149535  0.644943       NaN
    1  0.024654  0.624619       NaN
    2  0.083370  0.025087       NaN
    3  0.532949  0.736360       NaN
    4  0.101531  0.400754  0.028305
    5  0.819424  0.215954  0.000486
    6  0.419873  0.728983  0.023366
    7  0.552381  0.160935 -0.020825
    8  0.634769  0.743071 -0.013283
    9  0.730326  0.246545 -0.024831
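
The numbers above can be cross-checked with the built-in :code:`pandas` rolling methods. The sketch below is only a sanity check, not the library's implementation. It assumes that the two positional arguments :code:`5,5` are the window length and the minimum number of observations, which is what the four leading :code:`NaN` rows suggest. The input values are the first ten rows printed above.

```python
import pandas as pd

# First ten rows of the 'ret' and 'exmkt' columns printed above.
df = pd.DataFrame({
    'ret':   [0.149535, 0.024654, 0.083370, 0.532949, 0.101531,
              0.819424, 0.419873, 0.552381, 0.634769, 0.730326],
    'exmkt': [0.644943, 0.624619, 0.025087, 0.736360, 0.400754,
              0.215954, 0.728983, 0.160935, 0.743071, 0.246545],
})

# Assumption: a 5-row window that requires all 5 observations.
window = 5
roll = df['ret'].rolling(window, min_periods=window)

check = pd.DataFrame({
    'roll': roll.mean(),                           # rolling mean
    'rollvar': roll.var(ddof=1),                   # sample variance
    'rollcov': roll.cov(df['exmkt'], ddof=1),      # sample covariance with exmkt
})
print(check.round(6))
```

The row at index 4 should agree with the first non-missing row of each table above (0.178408, 0.041279 and 0.028305), up to rounding of the printed inputs.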
    
    
Examples with Grouping
-----------------------

Below are some examples of using the rolling functions together with :code:`pandas` :code:`groupby`.

::

    >>> df = pd.DataFrame(np.random.random((100,3)),columns=['a','b','c']).sort_values('a').reset_index(drop=True)
    >>> df.head(10)

              a         b         c
    0  0.000421  0.328225  0.595473
    1  0.039568  0.002372  0.223387
    2  0.041261  0.826214  0.684885
    3  0.059252  0.234307  0.412450
    4  0.077423  0.616780  0.027450
    5  0.082915  0.489654  0.596222
    6  0.090510  0.981726  0.519077
    7  0.102022  0.384198  0.939078
    8  0.123865  0.475949  0.890815
    9  0.159163  0.169004  0.139885

    >>> df['ports'] = pd.qcut(df['a'],5,labels=False)
    >>> df.head(30)

               a         b         c  ports
    0   0.000421  0.328225  0.595473      0
    1   0.039568  0.002372  0.223387      0
    2   0.041261  0.826214  0.684885      0
    3   0.059252  0.234307  0.412450      0
    4   0.077423  0.616780  0.027450      0
    5   0.082915  0.489654  0.596222      0
    6   0.090510  0.981726  0.519077      0
    7   0.102022  0.384198  0.939078      0
    8   0.123865  0.475949  0.890815      0
    9   0.159163  0.169004  0.139885      0
    10  0.182324  0.114017  0.098002      0
    11  0.184595  0.712363  0.850956      0
    12  0.189484  0.482832  0.568143      0
    13  0.194572  0.822320  0.471494      0
    14  0.200897  0.091733  0.581896      0
    15  0.211877  0.613734  0.445444      0
    16  0.228115  0.863478  0.928822      0
    17  0.229405  0.070245  0.667584      0
    18  0.247503  0.816117  0.479351      0
    19  0.248632  0.698228  0.028725      0
    20  0.250610  0.597579  0.595263      1
    21  0.271620  0.112743  0.480844      1
    22  0.285232  0.946583  0.227774      1
    23  0.287593  0.354288  0.333730      1
    24  0.302387  0.145458  0.117342      1
    25  0.311219  0.283440  0.828860      1
    26  0.312818  0.953281  0.393665      1
    27  0.316154  0.113413  0.270970      1
    28  0.323806  0.596317  0.951102      1
    29  0.354629  0.834722  0.179076      1

    >>> df['grouped_rolling_sum'] = df.groupby('ports')['c'].transform(lambda x: rolling.roll_sum(x,5,5))
    >>> df.sort_values(['ports','a']).head(10)

              a         b         c  ports  grouped_rolling_sum
    0  0.000421  0.328225  0.595473      0                  NaN
    1  0.039568  0.002372  0.223387      0                  NaN
    2  0.041261  0.826214  0.684885      0                  NaN
    3  0.059252  0.234307  0.412450      0                  NaN
    4  0.077423  0.616780  0.027450      0             1.943644
    5  0.082915  0.489654  0.596222      0             1.944394
    6  0.090510  0.981726  0.519077      0             2.240084
    7  0.102022  0.384198  0.939078      0             2.494277
    8  0.123865  0.475949  0.890815      0             2.972642
    9  0.159163  0.169004  0.139885      0             3.085077

    >>> df['grouped_rolling_beta'] = df.groupby('ports')[['b','c']].apply(lambda x: rolling.roll_beta(x['b'],x['c'],5,5,ddof=1))
    >>> df.head(10)

              a         b  ...  grouped_rolling_sum  grouped_rolling_beta
    0  0.000421  0.328225  ...                  NaN                   NaN
    1  0.039568  0.002372  ...                  NaN                   NaN
    2  0.041261  0.826214  ...                  NaN                   NaN
    3  0.059252  0.234307  ...                  NaN                   NaN
    4  0.077423  0.616780  ...             1.943644              0.328457
    5  0.082915  0.489654  ...             1.944394              0.443657
    6  0.090510  0.981726  ...             2.240084              0.269092
    7  0.102022  0.384198  ...             2.494277             -0.171534
    8  0.123865  0.475949  ...             2.972642             -0.280294
    9  0.159163  0.169004  ...             3.085077              0.161115
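
The grouped results can be reproduced with plain :code:`pandas` to see what is being computed inside each group. This is a sketch for checking purposes only, not the library's code. The helper :code:`rolling_beta_check` is a hypothetical name, and the data are rows 0 to 9 (portfolio 0) and rows 20 to 25 (portfolio 1) of the frame printed above. Judging from the printed output, the beta is the rolling covariance of the first argument with the second, divided by the rolling variance of the second.

```python
import pandas as pd

# Rows 0-9 (ports 0) and rows 20-25 (ports 1) of the frame printed above.
df = pd.DataFrame({
    'b': [0.328225, 0.002372, 0.826214, 0.234307, 0.616780,
          0.489654, 0.981726, 0.384198, 0.475949, 0.169004,
          0.597579, 0.112743, 0.946583, 0.354288, 0.145458, 0.283440],
    'c': [0.595473, 0.223387, 0.684885, 0.412450, 0.027450,
          0.596222, 0.519077, 0.939078, 0.890815, 0.139885,
          0.595263, 0.480844, 0.227774, 0.333730, 0.117342, 0.828860],
    'ports': [0] * 10 + [1] * 6,
})

# Rolling sum computed separately inside each portfolio:
# the window restarts at every group boundary.
df['grouped_rolling_sum'] = df.groupby('ports')['c'].transform(
    lambda x: x.rolling(5, min_periods=5).sum())

def rolling_beta_check(group, window=5):
    """Hypothetical helper: rolling cov(b, c) / var(c) within one group."""
    cov = group['b'].rolling(window, min_periods=window).cov(group['c'], ddof=1)
    var = group['c'].rolling(window, min_periods=window).var(ddof=1)
    return cov / var

# Concatenating the per-group results keeps the original row index,
# so the assignment aligns row by row.
df['grouped_rolling_beta'] = pd.concat(
    [rolling_beta_check(g) for _, g in df.groupby('ports')])
print(df.round(6))
```

Rows 10 to 13 are missing again because they are the first four rows of portfolio 1, which shows that no window spans two groups.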

Speed
------

Below are a few example speed comparisons with the built-in :code:`pandas` rolling functionality. 

::

    >>> %timeit df['rolling_sum'] = df['c'].rolling(5).sum().reset_index(drop=True)
    671 µs ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)    

    >>> %timeit df['rolling_sum'] = rolling.roll_sum(df['c'],5,5)    
    297 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    >>> %timeit df['grouped_rolling_sum'] = df.groupby('ports')['c'].rolling(5).sum().reset_index(drop=True)
    5.39 ms ± 159 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    >>> %timeit df['grouped_rolling_sum'] = df.groupby('ports')['c'].apply(lambda x: rolling.roll_sum(x,5,5))
    3.34 ms ± 166 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    >>> %timeit df['grouped_rolling_mean'] = df.groupby('ports')['c'].rolling(5).mean().reset_index(drop=True)
    5.17 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    >>> %timeit df['grouped_rolling_mean'] = df.groupby('ports')['c'].apply(lambda x: rolling.roll_mean(x,5,5))
    3.01 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
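
:code:`%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent with the standard-library :code:`timeit` module is sketched below. It times only the built-in :code:`pandas` call so that it runs without this package installed; the function name :code:`builtin_rolling_sum` and the repeat and loop counts are illustrative choices.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((100, 3)), columns=['a', 'b', 'c'])

def builtin_rolling_sum():
    # The pandas baseline from the comparison above.
    return df['c'].rolling(5).sum()

# 5 runs of 100 loops each; %timeit picks these counts automatically.
n_loops = 100
runs = timeit.repeat(builtin_rolling_sum, repeat=5, number=n_loops)

# Convert each run's total time to microseconds per loop.
per_loop_us = [1e6 * t / n_loops for t in runs]
print(f"{np.mean(per_loop_us):.0f} µs ± {np.std(per_loop_us):.0f} µs per loop "
      f"(mean ± std. dev. of {len(runs)} runs, {n_loops} loops each)")
```

To time the compiled version the same way, replace the body of the function with the corresponding call from the comparison above, for example :code:`rolling.roll_sum(df['c'],5,5)`.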


Rolling Multiple Regression
----------------------------

Here are a couple of examples, using random data, of the :code:`rolling_multiple` function.

::

    >>> from finance_byu.rolling import rolling_multiple
    >>> df = pd.DataFrame(np.random.random((100,4)))
    >>> df.columns = ['y','x1','x2','x3']
    >>> coeff = rolling_multiple(df,'y',['x1','x2','x3'],10)
    >>> coeff.head(20)
    
        intercept        x1        x2        x3
    0         NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN
    9    0.252484  0.038459  0.003861  0.490564
    10   0.299362 -0.020591  0.145126  0.368189
    11   0.323561 -0.067829  0.237643  0.311882
    12   0.329602  0.069779  0.266711  0.130818
    13   0.786978  0.107650  0.273981 -0.484018
    14   0.834109  0.139822  0.256740 -0.543779
    15   0.937403  0.302884  0.216420 -0.774488
    16   0.288141  0.101312  0.386460  0.005133
    17   0.175423 -0.180466  0.646475  0.056263
    18   0.012849  0.698391  0.531258  0.187829
    19   0.039563  0.795769  0.378595  0.165630
    
    >>> withresiduals = rolling_multiple(df,'y',['x1','x2','x3'],10,residuals=True)
    >>> withresiduals.head(20)
    
        intercept        x1        x2        x3     resid
    0         NaN       NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN       NaN
    9    0.252484  0.038459  0.003861  0.490564 -0.909839
    10   0.299362 -0.020591  0.145126  0.368189 -1.262891
    11   0.323561 -0.067829  0.237643  0.311882 -0.926025
    12   0.329602  0.069779  0.266711  0.130818 -1.365205
    13   0.786978  0.107650  0.273981 -0.484018 -1.027773
    14   0.834109  0.139822  0.256740 -0.543779 -1.044181
    15   0.937403  0.302884  0.216420 -0.774488 -0.875767
    16   0.288141  0.101312  0.386460  0.005133 -1.374425
    17   0.175423 -0.180466  0.646475  0.056263 -0.739793
    18   0.012849  0.698391  0.531258  0.187829 -0.678277
    19   0.039563  0.795769  0.378595  0.165630 -1.312702
    
    >>> df['groups'] = [1 for i in range(33)]+[2 for i in range(33)]+[3 for i in range(34)]
    >>> grouped = df.groupby('groups').apply(lambda x: rolling_multiple(x,'y',['x1','x2','x3'],10)).reset_index(drop=True)
    >>> grouped.head(50)
    
        intercept        x1        x2        x3
    0         NaN       NaN       NaN       NaN
    1         NaN       NaN       NaN       NaN
    2         NaN       NaN       NaN       NaN
    3         NaN       NaN       NaN       NaN
    4         NaN       NaN       NaN       NaN
    5         NaN       NaN       NaN       NaN
    6         NaN       NaN       NaN       NaN
    7         NaN       NaN       NaN       NaN
    8         NaN       NaN       NaN       NaN
    9    0.188011  0.172650 -0.036465  0.201428
    10   0.127681  0.530099 -0.256984  0.447311
    11   0.211469  0.585754 -0.356648  0.407959
    12   0.309164  0.466337 -0.486869  0.486394
    13   0.336976  0.541512 -0.683834  0.616734
    14   0.242381  0.762214 -0.718999  0.539597
    15   0.534390  0.592484 -0.882821  0.436392
    16   0.475962  0.516497 -0.833246  0.695213
    17   0.445435  0.300559 -0.683455  0.858445
    18   0.424248  0.233874 -0.676784  0.879286
    19   0.528965 -0.061176 -0.462627  0.834017
    20   0.793513 -0.285642 -0.570421  0.456161
    21   0.771210 -0.472012 -0.427634  0.489903
    22   0.690572 -0.394875 -0.350176  0.697332
    23   0.652243 -0.487154 -0.253501  0.716340
    24   0.758479 -0.765630 -0.202638  0.606289
    25   0.929314 -0.889273 -0.357062  0.522449
    26   0.875805 -0.869253 -0.272132  0.482170
    27   0.527850 -0.532585  0.178658  0.156535
    28   0.592946 -0.613963  0.239219  0.101630
    29   0.670114 -0.794297  0.576524 -0.187906
    30   0.580165 -0.608576  0.466616 -0.258936
    31   0.517314 -0.609204  0.438855 -0.067928
    32   0.370829 -0.338844  0.511657 -0.235225
    33        NaN       NaN       NaN       NaN
    34        NaN       NaN       NaN       NaN
    35        NaN       NaN       NaN       NaN
    36        NaN       NaN       NaN       NaN
    37        NaN       NaN       NaN       NaN
    38        NaN       NaN       NaN       NaN
    39        NaN       NaN       NaN       NaN
    40        NaN       NaN       NaN       NaN
    41        NaN       NaN       NaN       NaN
    42   0.952411  0.355941 -0.646408 -0.469534
    43   0.993868  0.143298 -0.498905 -0.381196
    44   0.921548  0.157348 -0.543432 -0.169947
    45   1.023682 -0.092272 -0.521587 -0.093258
    46   0.802979 -0.343775 -0.115053  0.043999
    47   0.654425 -0.186637 -0.095179  0.112743
    48   0.747796 -0.250212 -0.252141  0.067891
    49   0.784292 -0.308890 -0.303901  0.072922    
    
    >>> df['resid'] = df.groupby('groups').apply(lambda x: rolling_multiple(x,'y',['x1','x2','x3'],10,residuals=True)).reset_index(drop=True)['resid']
    >>> df.head(50)
    
               y        x1        x2        x3  groups     resid
    0   0.292707  0.527416  0.352366  0.942830       1       NaN
    1   0.009256  0.056392  0.577220  0.316502       1       NaN
    2   0.254997  0.303502  0.312090  0.450186       1       NaN
    3   0.630078  0.485770  0.652494  0.219974       1       NaN
    4   0.806714  0.117059  0.544942  0.752988       1       NaN
    5   0.259738  0.332130  0.041728  0.038601       1       NaN
    6   0.186582  0.601384  0.903792  0.943630       1       NaN
    7   0.058661  0.537003  0.749255  0.333435       1       NaN
    8   0.693882  0.829473  0.949468  0.602170       1       NaN
    9   0.107025  0.314554  0.881373  0.008589       1 -1.104885
    10  0.985555  0.898102  0.889376  0.836676       1 -0.763909
    11  0.844293  0.883783  0.097372  0.114230       1 -0.896729
    12  0.110527  0.617211  0.714501  0.467282       1 -1.365879
    13  0.991384  0.615916  0.331154  0.338925       1 -0.661689
    14  0.858861  0.166270  0.221500  0.783701       1 -0.773878
    15  0.679131  0.070385  0.024244  0.108567       1 -0.922935
    16  0.083172  0.633883  0.981583  0.209229       1 -1.047747
    17  0.105980  0.755695  0.346644  0.005208       1 -1.334142
    18  0.169534  0.580690  0.323765  0.146509       1 -1.300227
    19  0.602168  0.027302  0.991393  0.273679       1 -0.694735
    20  0.824959  0.517080  0.378158  0.273586       1 -0.729945
    21  0.284846  0.843899  0.496191  0.116380       1 -0.932860
    22  0.808722  0.096899  0.268888  0.228202       1 -0.908562
    23  0.885483  0.522229  0.584660  0.652320       1 -0.831425
    24  0.020493  0.806937  0.321778  0.547365       1 -1.386828
    25  0.127404  0.705129  0.991105  0.478141       1 -1.070777
    26  0.296545  0.565533  0.330202  0.286323       1 -1.135866
    27  0.022179  0.223031  0.029895  0.350958       1 -1.447167
    28  0.971971  0.092243  0.829478  0.411799       1 -0.804620
    29  0.732190  0.646287  0.889335  0.333274       1 -0.874680
    30  0.412803  0.168067  0.896444  0.563714       1 -1.337410
    31  0.162741  0.939633  0.546790  0.941365       1 -0.958161
    32  0.325315  0.259795  0.818819  0.846654       1 -1.177284
    33  0.154722  0.702591  0.791131  0.961928       2       NaN
    34  0.491407  0.518855  0.231681  0.955700       2       NaN
    35  0.159072  0.036437  0.620747  0.555733       2       NaN
    36  0.891759  0.617571  0.187414  0.401648       2       NaN
    37  0.481304  0.750031  0.900041  0.354723       2       NaN
    38  0.669076  0.110193  0.764837  0.087526       2       NaN
    39  0.522653  0.313408  0.491214  0.584310       2       NaN
    40  0.576749  0.762469  0.646008  0.033188       2       NaN
    41  0.755683  0.401390  0.045780  0.468417       2       NaN
    42  0.939899  0.806682  0.387371  0.694695       2 -0.723060
    43  0.921698  0.033224  0.613996  0.359314       2 -0.633637
    44  0.365582  0.376107  0.799801  0.158204       2 -1.153621
    45  0.401404  0.592845  0.700801  0.967785       2 -1.111793
    46  0.225042  0.907086  0.020412  0.483045       2 -1.285008
    47  0.325359  0.003486  0.641433  0.148209       2 -1.284073
    48  0.239367  0.753501  0.730812  0.733136       2 -1.185399
    49  0.221450  0.631365  0.604590  0.536159       2 -1.223183
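
For checking coefficients on small inputs, a slow reference can be written with plain NumPy. The sketch below assumes that each output row holds an ordinary least squares fit, with an intercept, over the trailing window ending at that row, which is what the column layout and the nine leading :code:`NaN` rows above suggest. It is not the library's implementation, :code:`rolling_ols_reference` is a hypothetical name, and it returns coefficients only.

```python
import numpy as np
import pandas as pd

def rolling_ols_reference(df, y, xs, window):
    """Hypothetical reference: one least-squares fit per full trailing window."""
    yv = df[y].to_numpy(dtype=float)
    # Design matrix with a leading column of ones for the intercept.
    X = np.column_stack([np.ones(len(df)), df[xs].to_numpy(dtype=float)])
    out = np.full((len(df), X.shape[1]), np.nan)
    for end in range(window, len(df) + 1):
        rows = slice(end - window, end)
        beta, *_ = np.linalg.lstsq(X[rows], yv[rows], rcond=None)
        out[end - 1] = beta  # store at the last row of the window
    return pd.DataFrame(out, index=df.index, columns=['intercept'] + list(xs))

# Noise-free data, so every full window should recover the true coefficients.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((30, 2)), columns=['x1', 'x2'])
demo['y'] = 1.0 + 2.0 * demo['x1'] - 3.0 * demo['x2']

coeff = rolling_ols_reference(demo, 'y', ['x1', 'x2'], 10)
print(coeff.head(12))
```

Because the demonstration data contain no noise, every row from index 9 onward should show an intercept of 1 and slopes of 2 and -3. The Python loop makes this far slower than a compiled routine, so it is useful only as a check.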
    
    
Rolling Multiple Regression Speed
----------------------------------

Here are some timings for the :code:`rolling_multiple` function.

Scaling with number of observations

::

    >>> def produce_data(nobs,nx):
    ...     return pd.DataFrame(np.random.random((nobs,nx+1)))
        
    >>> df = produce_data(100,3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],10)
    24.2 ms ± 440 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
     
    >>> df = produce_data(1000,3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    184 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
     
    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    2.04 s ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
     
    >>> df = produce_data(int(1.0e5),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120,predispatch='auto')
    18.9 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
     
    >>> df = produce_data(int(1.0e6),3)
    >>> %timeit -r2 -n1 rolling_multiple(df,0,[1,2,3],120,predispatch='auto')
    3min 10s ± 3.18 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
    
    
Scaling with number of regressors

::

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],120)
    1.94 s ± 31.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    >>> df = produce_data(int(1.0e4),20)
    >>> %timeit rolling_multiple(df,0,[i for i in range(1,21)],120)
    2.31 s ± 24.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    >>> df = produce_data(int(1.0e4),100)
    >>> %timeit rolling_multiple(df,0,[i for i in range(1,101)],120)
    5.71 s ± 1.25 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


Scaling with window size

::

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],10)
    1.82 s ± 31.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],50)
    1.88 s ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],100)
    1.91 s ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    >>> df = produce_data(int(1.0e4),3)
    >>> %timeit rolling_multiple(df,0,[1,2,3],500)
    2.01 s ± 28.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)