Monday, 22 July 2013

Python for Quant Research

I have been a daily user of Matlab for the last five years and find it to be an excellent tool for carrying out quantitative research. However, having recently started to learn Python, I am now officially a Python convert and favour this intuitive language as the tool of choice for quickly prototyping ideas. Python is open source, and its community is growing quickly, with no shortage of excellent packages on offer. Python also interoperates well with C/C++, and with R via the RPy2 package.

There are a number of open source Machine Learning packages available, including Statsmodels, PyML, PyBrain, MLPy, milk, and scikit-learn. These packages come with implementations of various regression and classification algorithms, from Logistic Regression and Support Vector Machines to Random Forests and Neural Networks. Each package has its own pros and cons, and some packages focus on particular algorithms. scikit-learn is my personal favourite ML Python package. Some other useful packages that I have installed include SciPy, a library of scientific computing routines; NumPy, an N-dimensional array package; Matplotlib for 2D plotting; Pandas for data structures and analysis; and IPython, an interactive shell for running Python code.

R has a mature set of statistical packages on offer which can be called from Python. Thus, I set about installing R, then RPy2, and finally Rmetrics, which is the most comprehensive R package for analysing financial time series. I decided to build R from source rather than download a binary. Use wget to download the source.

-bash$ wget http://ftp.heanet.ie/mirrors/cran.r-project.org/src/base/R-3/R-3.0.1.tar.gz

Extract using tar, and enter the source directory.

-bash$ tar -zxvf R-3.0.1.tar.gz ; cd R-3.0.1

You must configure the build with the --enable-R-shlib option, which builds R as a shared library (libR). This is a prerequisite for the RPy2 installation.

-bash$ ./configure --prefix=$HOME/.local  --enable-R-shlib

The R make process can take a while, so I run it in the background and detach it from the shell with disown so that it does not terminate if I close my shell. I redirect both stdout and stderr to a text file.

-bash$ make  &> make.txt &
-bash$ disown -h

I can then keep track of the make progress by tailing this file.

-bash$ tail -f make.txt
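
The same pattern (start a long build in the background, send stdout and stderr to one log file, then look at the end of the log) can also be sketched in Python. This is only an illustration: run_logged and tail are hypothetical helper names, and start_new_session is used as a rough, POSIX-only stand-in for what disown -h achieves in the shell. In a real build the command would be ["make"] run from the R-3.0.1 directory; a harmless Python one-liner stands in for it here.

```python
import os
import subprocess
import sys
import tempfile
from collections import deque


def run_logged(command, log_path, cwd=None):
    """Start command in the background with stdout and stderr sent to log_path."""
    log_file = open(log_path, "w")
    process = subprocess.Popen(
        command,
        stdout=log_file,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, like `&>` in bash
        cwd=cwd,
        start_new_session=True,  # POSIX only: run the child in its own session
    )
    log_file.close()  # the child process keeps its own copy of the file handle
    return process


def tail(log_path, n=10):
    """Return the last n lines of the log file as a list of strings."""
    with open(log_path) as handle:
        return list(deque(handle, maxlen=n))


if __name__ == "__main__":
    # Stand-in for the real build command, written to a temporary log file.
    log = os.path.join(tempfile.mkdtemp(), "make.txt")
    proc = run_logged([sys.executable, "-c", "print('building')"], log)
    proc.wait()
    print("".join(tail(log)), end="")
```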

Once the make is complete I install.

-bash$ make install
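
Since RPy2 depends on the shared library produced by --enable-R-shlib, it may be worth confirming that the library actually landed under the install prefix before moving on. The sketch below is a minimal check, not part of the original procedure: find_shared_r_library is a hypothetical helper, and the candidate sub-directories and file names are assumptions about a typical layout that can differ by platform and configure options.

```python
from pathlib import Path

# Assumed locations of R's library directory relative to the install prefix.
CANDIDATE_DIRS = ("lib/R/lib", "lib64/R/lib")
# Assumed shared-library file names (Linux and macOS respectively).
CANDIDATE_NAMES = ("libR.so", "libR.dylib")


def find_shared_r_library(prefix):
    """Return the path to R's shared library under prefix, or None if absent."""
    prefix = Path(prefix).expanduser()
    for subdir in CANDIDATE_DIRS:
        for name in CANDIDATE_NAMES:
            candidate = prefix / subdir / name
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # ~/.local matches the --prefix used in the configure step.
    found = find_shared_r_library("~/.local")
    if found is None:
        print("No shared R library found; was R configured with --enable-R-shlib?")
    else:
        print(f"Shared R library found at {found}")
```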

With R successfully installed I download the latest rpy2 package and extract.

-bash$ wget -O rpy2-2.3.1.tar.gz "http://sourceforge.net/projects/rpy/files/latest/download?source=files"
-bash$ tar -zxvf rpy2-2.3.1.tar.gz ; cd rpy2-2.3.1

Next, update the relevant environment variables in your .bash_profile. These will vary depending on your installation, so check the rpy2 installation guidelines for details. Finally, install!

-bash$ python setup.py install
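
As a rough illustration of the environment-variable step, the sketch below prints export lines that could be pasted into .bash_profile for a given install prefix. Everything in it is an assumption rather than the exact set I used: r_export_lines is a hypothetical helper, the choice of PATH, R_HOME and LD_LIBRARY_PATH reflects the variables most commonly involved, and the lib/R location of R's home directory may be lib64/R on some systems.

```python
from pathlib import PurePosixPath


def r_export_lines(prefix, r_home_subdir="lib/R"):
    """Build bash export lines for an R install under prefix (assumed layout)."""
    prefix = PurePosixPath(prefix)
    bin_dir = prefix / "bin"          # where the R executable is installed
    r_home = prefix / r_home_subdir   # assumed R home directory
    lib_dir = r_home / "lib"          # assumed location of the shared library
    return [
        f'export PATH="{bin_dir}:$PATH"',
        f'export R_HOME="{r_home}"',
        f'export LD_LIBRARY_PATH="{lib_dir}:$LD_LIBRARY_PATH"',
    ]


if __name__ == "__main__":
    # $HOME/.local matches the --prefix used when configuring R.
    for line in r_export_lines("$HOME/.local"):
        print(line)
```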

I then ran a test Python script from the rpy2 introduction, saved as rp2_test.py.

import rpy2.robjects as robjects
pi = robjects.r['pi']
print(pi[0])

And the script output the value of pi as expected!

-bash$ python rp2_test.py
3.141592653589793

Next I installed the excellent Rmetrics from the R shell.

-bash$ R
> source("http://www.rmetrics.org/Rmetrics.R")
> install.Rmetrics()
