Thursday, November 29, 2007

Convert factors to numbers

If you have a vector of factors it is easy to get the factor level; however, I always forget how to extract the factor value. I ran into the answer here.

> x<-factor(c(round(rnorm(10),2),"A","B",NA))
> x
[1] 1.61 1.12 1.26 0.09 -0.13 0.16 -0.03 -0.1 0.09 -0.47
[11] A B
Levels: -0.03 0.09 -0.1 -0.13 0.16 -0.47 1.12 1.26 1.61 A B
> as.numeric(x)
[1] 9 7 8 2 4 5 1 3 2 6 10 11 NA
> as.numeric(levels(x)[x])
[1] 1.61 1.12 1.26 0.09 -0.13 0.16 -0.03 -0.10 0.09 -0.47
[11] NA NA NA
Warning message:
NAs introduced by coercion



**UPDATE**
And another method mentioned in the comments (that I like better):

> as.numeric(as.character(x))

Thursday, November 15, 2007

Preparing plots for publication

The plotting capabilities of R are excellent; however, when I am preparing a figure for publication, I often need to combine multiple plots or add objects (e.g., arrows or text) to an existing plot. While this can be accomplished in R, my patience for tweaking layout parameters tends to run out quickly. I searched around and found a nice solution using open source software (an example with Matlab plots is described here).


1) Create the desired figure in R and export as an eps file:



library(ggplot2)
postscript("myfig.eps",horizontal=FALSE, paper="special",height=10,width=10)
qplot(x,y,data=mydat)
dev.off()


2) Install the necessary software on your computer, and then convert the image file to one more easily edited.


#install software
sudo aptitude install pstoedit tgif

#clean up the eps file so it is more easily read
eps2eps myfig.eps myfig2.eps

#convert the eps file to a tgif object
pstoedit -f tgif myfig2.eps myfig2.obj


3) Edit the new file myfig2.obj in tgif.

Thursday, October 18, 2007

Approximate sunrise and sunset times

This function is not perfect, but it does a reasonable job estimating sunrise and sunset times for my field site. If more accurate data are required, try here. Note, the command to calculate day of year is: strptime(x, "%m/%d/%Y")$yday+1
suncalc<-function(d,Lat=48.1442,Long=-122.7551){
  ## d is the day of year
  ## Lat is latitude in decimal degrees
  ## Long is longitude in decimal degrees (negative == West)
  
  ##This method is copied from:
  ##Teets, D.A. 2003. Predicting sunrise and sunset times.
  ##  The College Mathematics Journal 34(4):317-321.
 
  ## At the default location the estimates of sunrise and sunset are within
  ## seven minutes of the correct times (http://aa.usno.navy.mil/data/docs/RS_OneYear.php)
  ## with a mean of 2.4 minutes error.

  ## Function to convert degrees to radians
  rad<-function(x)pi*x/180
  
  ##Radius of the earth (km)
  R=6378
  
  ##Radians between the xy-plane and the ecliptic plane
  epsilon=rad(23.45)

  ##Convert observer's latitude to radians
  L=rad(Lat)

  ## Calculate offset of sunrise based on longitude (min)
  ## If Long is negative, then the mod represents degrees West of
  ## a standard time meridian, so timing of sunrise and sunset should
  ## be made later.
  timezone = -4*(abs(Long)%%15)*sign(Long)

  ## The earth's mean distance from the sun (km)
  r = 149598000

  theta = 2*pi/365.25*(d-80)

  z.s = r*sin(theta)*sin(epsilon)
  r.p = sqrt(r^2-z.s^2)

  t0 = 1440/(2*pi)*acos((R-z.s*sin(L))/(r.p*cos(L)))
  
  ##a kludge adjustment for the radius of the sun
  that = t0+5 

  ## Adjust "noon" for the fact that the earth's orbit is not circular:
  n = 720-10*sin(4*pi*(d-80)/365.25)+8*sin(2*pi*d/365.25)

  ## now sunrise and sunset are:
  sunrise = (n-that+timezone)/60
  sunset = (n+that+timezone)/60

  return(list("sunrise" = sunrise,"sunset" = sunset))
}

Sunday, October 14, 2007

Convert polar coordinates to Cartesian

When I want to calculate the coordinates of a location (e.g., a nest or burrow) based on distance and bearing from a grid point, this function helps me avoid writing down SOH-CAH-TOA every time. Just note that the bearing in this case is from the grid point (known location) to the unknown location.

polar2cart<-function(x,y,dist,bearing,as.deg=FALSE){
## Translate Polar coordinates into Cartesian coordinates
## based on starting location, distance, and bearing
## as.deg indicates if the bearing is in degrees (T) or radians (F)

if(as.deg){
##if bearing is in degrees, convert to radians
bearing=bearing*pi/180
}

newx<-x+dist*sin(bearing) ##X
newy<-y+dist*cos(bearing) ##Y
return(list("x"=newx,"y"=newy))
}


##Example

oldloc=c(0,5)
bearing=200 #degrees
dist=5

newloc<-polar2cart(oldloc[1],oldloc[2],dist,bearing,TRUE)
plot(oldloc[1],oldloc[2],xlim=c(-10,10),ylim=c(-10,10))
points(newloc$x,newloc$y,col="red")

Wednesday, October 3, 2007

Reorder factor levels

Very often, especially when plotting data, I need to reorder the levels of a factor because the default order is alphabetical. There must be many ways of reordering the levels; however, I always forget which package to look in. A direct way of reordering, using standard syntax is as follows:
## generate data
x = factor(sample(letters[1:5],100, replace=TRUE))

print(levels(x)) ## This will show the levels of x are "Levels: a b c d e"

## To reorder the levels:
## note, if x is not a factor use levels(factor(x))

x = factor(x,levels(x)[c(4,5,1:3)])

print(levels(x)) ## Now "Levels: d e a b c"
In this case, the level order could be set in the first line; however, if there are many levels (and you don't want to type out all of the levels explicitly), the above method is preferred. Again, if there is a better way to do this, please let me know in the comments.

Wednesday, September 5, 2007

Extract objects from a list

When using Rmpi to send processes to many nodes, it is convenient to create a list of tasks that are assigned to nodes as they become available. In my case, I was working through a large factorial set of simulations and needed to use a unique set of variable values for each task. Rather than assign these variables individually within the slave function, I decided to extract the variables directly from the list using a variation of the following code:

mylist=(list(a=1,b=2,c="string1",d=list("r"=2,"z"="string2")))

for(i in 1:length(mylist)){
##first extract the object value
tempobj=mylist[[i]]
##now create a new variable with the original name of the list item
eval(parse(text=paste(names(mylist)[[i]],"= tempobj")))
}

> print(a)
[1] 1
> print(b)
[1] 2
> print(c)
[1] "string1"
> print(d)
$r
[1] 2

$z
[1] "string2"

If there is a more elegant way to do this, please let me know in the comments.

Parallel computing in R

Lately I have been running R in parallel on a Beowulf cluster using the Rmpi package. Instrumental in my ability to do this were two excellent tutorials:

The Division of Biostatistics at UCSF provides some of the background required to get things running on the Linux side of things. (Note: sometimes the url for this site comes up with an extra "biostat" in the address -- just delete that and the page should load.

The Acadia Centre for Mathematical Modelling and Computation has a variety of sample scripts to point you in the right direction. Once you read through the code, it becomes very easy to set up a template you can re-use in the future.

Thursday, August 23, 2007

Execute system commands within an R Script

To execute a system command from within an R script, just use system(). This is handy for zipping large output files on the fly.

write.csv(mydat, "mydat.csv")
system("gzip mydat.csv", wait=FALSE)

Friday, August 17, 2007

Offset in glm ()

To add an offset to the linear predictor of a generalized linear model (or models from the survival package such as coxph and clogit), use offset(x) in the formula. This will add an offset to the linear predictor with known coefficient 1.

Thursday, August 16, 2007

Including arguments in R CMD BATCH mode

When you have multiple computers or processors at your disposal and wish to run the same script with different arguments, use the following at the command line (here described for Linux; remove the linebreak, it is just there for display purposes):

$ R CMD BATCH --no-save --no-restore '--args a=1 b=c(2,5,6)'
test.R test.out &

Where test.R is the R script file you wish to run and test.out is a text file to include the screen output of the R terminal. A key point here is that each argument must have no spaces because --args is space delimited.

To include the variables listed in --args, adapt the following code from test.R:

##First read in the arguments listed at the command line
args=(commandArgs(TRUE))

##args is now a list of character vectors
## First check to see if arguments are passed.
## Then cycle through each element of the list and evaluate the expressions.
if(length(args)==0){
print("No arguments supplied.")
##supply default values
a = 1
b = c(1,1,1)
}else{
for(i in 1:length(args)){
eval(parse(text=args[[i]]))
}
}
print(a*2)
print(b*3)

This produces the following in test.out:

> print(a*2)
[1] 2
> print(b*3)
[1] 6 15 18

Friday, July 20, 2007

Variance-Covariance Matrix in glm

This is a small function Venables and Ripley provide in their MASS book. You don't need it anymore because vcov() has a method for the glm class. However, it is useful to see how to extract bits from a fitted model object.


vcov.glm<-function(obj){
#return the variance-covariance matrix of a glm object
#from p. 188 in Venables and Ripley. 2002.
#Modern Applied Statistics With S. Springer. New York.
so <- summary(obj,corr=F)
so$dispersion * so$cov.unscaled
}


Wednesday, July 18, 2007

Visualizing Data

Sometimes it is helpful to see the plots created from the example
files within R libraries. Of course you can paste them directly
into R, but this web page allows you to view the images within the
context of the help files:

R Graphical Manuals

Also, check out this link for the R Graph Gallery.

Thursday, May 17, 2007

Random number generation

Luc Devroye from McGill University's School of Computer Science has provided pdf copies of his book, "Non-Uniform Random Variate Generation," on-line.

http://cg.scs.carleton.ca/~luc/rnbookindex.html

Most common distributions are already built into R, but sometimes you need to build your own -- Luc provides the algorithms to do it.

Repeat elements of a vector

Two methods. The first produces a vector, the second a one column matrix.

x=1:10

# Method 1
rep(x,each=3)

# Method 2
matrix(t(matrix(x,length(x),3)))

Convert image to matrix in R

This is how to use the Pixmap library to read in an image as a matrix.


> library(pixmap)
# the next command may only work on Linux
> system("convert foo.tiff foo.ppm")
> img <- read.pnm("foo.ppm")


To get info on your new object:

> str(img)

Although included in the previous output, the size of the image can be extracted by:

>img@size

Then to extract the red channel from the image for the first ten rows:

> myextract <- img@red[1:10,]

Or to extract the entire red channel to an actual matrix:

> red.mat<-matrix(NA,img@size[1],img@size[2])
> red.mat<-img@red

Wednesday, May 16, 2007

Calculate turning angles and step lengths from location data

anglefun <- function(xx,yy,bearing=TRUE,as.deg=FALSE){
  ## calculates the compass bearing of the line between two points
## xx and yy are the differences in x and y coordinates between two points
## Options:
## bearing = FALSE returns +/- pi instead of 0:2*pi
## as.deg = TRUE returns degrees instead of radians
c = 1
if (as.deg){
c = 180/pi
}

b<-sign(xx)
b[b==0]<-1 #corrects for the fact that sign(0) == 0
tempangle = b*(yy<0)*pi+atan(xx/yy)
if(bearing){
#return a compass bearing 0 to 2pi
#if bearing==FALSE then a heading (+/- pi) is returned
tempangle[tempangle<0]<-tempangle[tempangle<0]+2*pi
}
return(tempangle*c)
}

bearing.ta <- function(loc1,loc2,loc3,as.deg=FALSE){
## calculates the bearing and length of the two lines
## formed by three points
## the turning angle from the first bearing to the
## second bearing is also calculated
## locations are assumed to be in (X,Y) format.
## Options:
## as.deg = TRUE returns degrees instead of radians
if (length(loc1) != 2 | length(loc2) != 2 | length(loc3) !=2){
print("Locations must consist of either three vectors, length == 2,
or three two-column dataframes"
)
return(NaN)
}
c = 1
if (as.deg){
c = 180/pi
}

locdiff1<-loc2-loc1
locdiff2<-loc3-loc2
bearing1<-anglefun(locdiff1[1],locdiff1[2],bearing=F)
bearing2<-anglefun(locdiff2[1],locdiff2[2],bearing=F)

if(is.data.frame(locdiff1)){
dist1<-sqrt(rowSums(locdiff1^2))
dist2<-sqrt(rowSums(locdiff2^2))
}else{
dist1<-sqrt(sum(locdiff1^2))
dist2<-sqrt(sum(locdiff2^2))
}

ta=(bearing2-bearing1)

ta[ta < -pi] = ta[ta < -pi] + 2*pi
ta[ta > pi] = ta[ta > pi] - 2*pi
return(list(bearing1=unlist(bearing1*c),bearing2=unlist(bearing2*c),
ta=unlist(ta*c),dist1=unlist(dist1),dist2=unlist(dist2)))
}