Monday, November 17, 2008

Call C from R and R from C

Several years ago, while a research associate at the University of Chicago, I had the privilege of sitting in on a course taught by Peter Rossi: Bayesian Applications in Marketing and MicroEconometrics. This course -- one I recommend to anyone at U Chicago who is interested in statistics -- was an incredibly clear treatment of Bayesian statistics, but the aspect I appreciated most was Peter's careful demonstration of Bayesian theory and methods using R.

One feature of R that I had not used up to that point was the ability to call compiled C and Fortran functions from within R (this makes loop-heavy Metropolis-Hastings samplers much, much faster). It turns out that you can also include the R headers in C source code so that R functions (e.g., random number generators) can be easily accessed. The CRAN website has an excellent tutorial on how to develop R extensions (here), but I wanted to share an example Peter used in class because it is extremely brief, and for 95% of what I do, this is all I need.

As Peter writes, this is an incredibly inefficient way of simulating from the chi-square distribution, but it demonstrates the point. His more extensive writeup is located here.

Save the following as testfun.c:
#include <R.h>
#include <Rmath.h>
#include <math.h>
/* include standard C math library and R internal function declarations */
/* Function written by Peter Rossi from the University of Chicago GSB */

void mychisq(double *vec, double *chisq, int *nu)
/* void means return nothing */
{
  int i, iter; /* declare local vars */
  /* all statements end in ; */
  GetRNGstate(); /* set random number seed */
  for (i = 0; i < *nu; ++i)
  /* loop over elements of vec */
  /* *nu "dereferences" the pointer */
  { /* vectors start at 0 location! */
    vec[i] = rnorm(0.0, 1.0); /* use R function to draw normals */
    Rprintf("%ith normal draw= %lf \n", (i + 1), vec[i]);
    /* print out results for "debugging" */
  }
  iter = 0; /* the counter must be initialized before the loop */
  while (iter < *nu) /* "while" version of a loop */
  {
    if (iter == iter) /* redundant if stmnt */
    { *chisq = *chisq + vec[iter] * vec[iter]; }
    iter = iter + 1; /* note: can't use ** */
    /* if you want to be "cool" use iter += 1 */
  }
  PutRNGstate(); /* write back ran number seed */
}

To call this function in R, you first need to compile it. To do this you need all the standard compilers and libraries for your operating system. For Debian or Ubuntu, this should do it (if I missed a package, let me know in the comments):

$ sudo aptitude update
$ sudo aptitude install build-essential r-base-dev

Now, you should be able to compile the function:

$ R CMD SHLIB testfun.c

If all goes well, you should see the object file testfun.o and the compiled shared library in the directory. To test the function, we will source the following script into R:

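##This function is just a wrapper for .C
call_mychisq <- function(nu) {
  vector=double(nu); chisq=0
  ## .C returns a list of its arguments after the C code has run;
  ## the second element holds the accumulated sum of squares
  .C("mychisq", as.double(vector), as.double(chisq), as.integer(nu))[[2]]
}

##Load the compiled code (you may need to include
## the explicit file path if it is not local)
## NOTE: for Windows machines, you will want to load testfun.dll
dyn.load("")
result<-call_mychisq(10)
result

##If you re-compile testfun.c, you must unload it
## and then re-load it:
## dyn.unload("")
## dyn.load("")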

And get the following output:

> dyn.load("")
> result<-call_mychisq(10)
1th normal draw= -1.031170
2th normal draw= -1.214103
3th normal draw= 0.002335
4th normal draw= 0.296146
5th normal draw= -0.908862
6th normal draw= -1.567820
7th normal draw= -0.079227
8th normal draw= -1.404414
9th normal draw= 0.616567
10th normal draw= -0.007855
> result
[1] 8.268028

Wednesday, November 5, 2008

Using subversion to manage code

I have finally come to terms with the fact that I need some kind of version control for the projects I am working on, and the best bet these days is Subversion (svn). I have been using svn for some time now via a GUI client (Linux: kdesvn, Windows: TortoiseSVN); however, it turns out that working with svn from the command line is pretty easy and far more versatile. For a complete treatment of this subject, check out the online documentation here and an excellent cheat sheet here. What follows is a quick primer on the very basics of setting up and managing your svn repositories on a local machine or server.

For ease of use I will describe the creation of a single repository for a single project. This means a little more overhead; however, it makes the repository more portable and flexible in the long run.

1) First we set up the repository structure in a temporary folder (either on the server or locally):
$ mkdir ~/tmp
$ mkdir ~/tmp/project1
$ mkdir ~/tmp/project1/trunk
$ mkdir ~/tmp/project1/branches
$ mkdir ~/tmp/project1/tags
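
The same skeleton can be created in one call. This is only a minimal sketch: the make_layout helper name is my own (hypothetical), and it relies on nothing more than the standard -p flag of mkdir, which creates missing parent directories and does not complain if a directory already exists.

```shell
# Hypothetical helper: create the trunk/branches/tags skeleton
# under the directory given as the first argument.
make_layout() {
    mkdir -p "$1/trunk" "$1/branches" "$1/tags"
}

# Usage (same location as in step 1): make_layout ~/tmp/project1
```

Wrapping it in a function is handy if you create one repository per project, since the layout is identical each time.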

2) Now, make a folder to hold your repositories and create an empty repository for your project.
$ mkdir ~/svnroot
$ svnadmin create ~/svnroot/project1

3) Import the folder structure into the empty repository. After this import, the folders in tmp can be removed -- they are only there to make the creation of the folder structure easier.
$ svn import ~/tmp/project1 file:///home/myusername/svnroot/project1 -m "Initial import"

4) Finally, make your current project folder a "working copy" of the repository. Check out the trunk (or head) of the repository to the folder where project1 currently resides (in this example, the existing project files are located at ~/working/project1).
If you created your repository on a local folder:
$ svn checkout file:///home/myusername/svnroot/project1/trunk ~/working/project1

Alternatively, if you created your repository on a remote server:
$ svn checkout svn+ssh:// ~/working/project1

Because the repository is empty at this stage, all the above commands do is create a .svn folder in the ~/working/project1 directory. The following command will show that there are folders and files in the project directory that are not currently part of the repository:
$ svn st
? somefolder
? someotherfolder
? somefile.txt

Now you need to add all of the files and folders in this directory to the repository. This is easily accomplished using a bit of awk code (modified from a post here):

$ svn status | awk '/^\?/ {sub(/^\? +/, ""); print}' | tr "\n" "\0" | xargs -0 svn add
$ svn st
A somefolder
A somefolder/file1.txt
A somefolder/file2.txt
A someotherfolder
A someotherfolder/file3.txt
A somefile.txt

Now you just need to commit these changes and your working directory is up to date:
$ svn commit -m "Adding original files to repository."

5) When you are ready to commit new changes to the repository, make sure that all new files/folders are added and all deleted files/folders are removed:
$ svn status | awk '/^\?/ {sub(/^\? +/, ""); print}' | tr "\n" "\0" | xargs -0 svn add
$ svn status | awk '/^!/ {sub(/^! +/, ""); print}' | tr "\n" "\0" | xargs -0 svn remove
$ svn commit -m "Some comment to remind you why you are committing changes..."
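
To see what the filtering step does without touching a real working copy, here is a small sketch that runs it on canned text. The paths_with_status helper name and the sample file names are hypothetical; the only assumption about svn is the one the commands above already make, namely that each line of "svn status" output starts with a one-character status flag ("?" for unversioned, "!" for missing) followed by spaces and then the path.

```shell
# Hypothetical helper: read `svn status` output on stdin and print the
# paths whose first status column equals $1, one path per line.
paths_with_status() {
    awk -v flag="$1" 'substr($0, 1, 1) == flag { sub(/^. +/, ""); print }'
}

# Sample input modelled on `svn status` output; only the "?" line is kept.
printf '?       new file.txt\n!       gone.txt\nM       edited.txt\n' \
    | paths_with_status '?'
# prints: new file.txt
```

Because the whole remainder of the line is printed, a path containing spaces survives intact, which is why the output is then passed on NUL-separated to xargs -0.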

6) Finally, if for some reason you want to remove a working directory from versioning (i.e., get rid of the .svn folders that are placed in every folder and subfolder), use the following:
$ cd ~/working/project1
$ find . -type d -name .svn -prune -exec rm -rf {} +

Update: If your svn repository changes to a new server name, use the following syntax to update your working directory:
$ cd ~/working/project1
$ svn info
[the current URL and other info are printed to the screen]
$ svn switch --relocate svn+ssh://OLD.URL/path/to/svnrepo svn+ssh://NEW.URL/path/to/svnrepo .
$ svn commit -m "new update"

Note: If you mis-type the old URL, "svn switch" will fail silently. So make sure to check that it has updated by using the "svn info" command.
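
One way to make that check less error-prone is to pull just the URL out of the "svn info" output and compare it with what you expect. This is a rough sketch: the repo_url helper name is hypothetical, and it assumes the output contains a line beginning with "URL: ", as "svn info" prints for a working copy.

```shell
# Hypothetical helper: print the repository URL found in `svn info`
# output supplied on stdin.
repo_url() {
    awk '/^URL: / { print substr($0, 6) }'
}

# Sample input modelled on `svn info` output.
# In a real working copy you would run: svn info | repo_url
printf 'Path: .\nURL: svn+ssh://NEW.URL/path/to/svnrepo\nRevision: 12\n' | repo_url
# prints: svn+ssh://NEW.URL/path/to/svnrepo
```

If the printed URL still shows the old server name, the relocate did not take effect.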