Tuesday, December 13, 2016

Recursively search for text in R scripts.

Often I find myself needing to search for .R or or .Rmd files in which I have used a specific function. There are a variety of searches that you can do; however, I wanted something that would work recursively at the command line of a Linux or Mac terminal. I found the answer here and have modified it a bit so that it works as a function in Bash (in this example I search through .R and . Rmd files).

Add the following to your ~/.bash_aliases file and then source ~/.bashrc:


findinRfun() {

if [ -z "$1" ]
 then
    echo "No argument supplied"
 else
    #search text is passed as argument $1
    find . -type f \( -name "*.R" -o -name "*.Rmd" -o -name "*.r" -o -name "*.rmd" \) -print0 | xargs --null grep --with-filename --line-number --no-messages --color --ignore-case "$1"
fi
            }

alias findinR=findinRfun


Now, at the command line, you can search through all the files below the current location that contain your search text.

findinR mysearchtext

Friday, May 16, 2014

Sample uniformly within a fixed radius.

I was asked how to do this today and thought that I would share the answer:
## Sample points uniformly within a fixed radius

nrand=1000
maxstep=10

## Sample data  
## NB: To get a truly uniform sample over the circle, you must 
##     sample the square of the distance and then transform back.
tempdat<-data.frame(X0=0,Y0=0, bearing0=0, 
                    bad.dist= runif(nrand)*maxstep, 
                    dist2=sqrt(runif(nrand)*maxstep^2),
                    turningangle=runif(nrand)*2*pi-pi)

##convert Turning angle to bearing (in this case no change)
tempdat$bearing=tempdat$bearing0+tempdat$turningangle

## Convert from polar to cartesian coordinates
tempdat$X<-tempdat$X0+tempdat$dist2*sin(tempdat$bearing)
tempdat$Y<-tempdat$Y0+tempdat$dist2*cos(tempdat$bearing)

tempdat$Xbad<-tempdat$X0+tempdat$bad.dist*sin(tempdat$bearing)
tempdat$Ybad<-tempdat$Y0+tempdat$bad.dist*cos(tempdat$bearing)


##make plots
png(filename="sampleplots.png",width=500,height=1000)
par(mfrow=c(2,1))
plot(Ybad~Xbad, data=tempdat, asp=1, main="Center is oversampled")
plot(Y~X, data=tempdat, asp=1, main="Uniform across space")
dev.off()

Monday, April 28, 2014

RStudio and X2Go

After a recent system upgrade (to Linux Mint LMDE), I was no longer able to run RStudio through the remote desktop application X2Go. It turns out that this is due to a problem with the Qt libraries (see this website). As suggested here, I just deleted all the Qt libraries used by RStudio:

sudo rm  /usr/lib/rstudio/bin/libQt* 
 

and all was well. (NB, the specific link to these files will vary depending on the flavor of Linux you use.)

**EDIT: If you are installing this on pure Debian (I just confirmed this with a new Jessie install) you will need to install appropriate libraries too:

sudo aptitude install libqtwebkit4
 




Wednesday, November 10, 2010

At last... I have been suffering with XEmacs displaying odd characters instead of the quotation marks that are used in R help files. This was driving me up the wall because it makes the files (and R output in general) very hard to read; however, I finally diagnosed the problem: Xemacs was not recognizing UTF-8 encoding. Below is a quote from Marjan Parsa that describes how to set up Emacs and XEmacs to automatically detect UTF-8 files. My quality of life has already improved.


How can I get XEmacs to work with UTF-8 files?

* Set up XEmacs so that it autodetects UTF-8 encoded files.
* In the case of starting a new file in a non-UTF-8 locale, set the file coding system to UTF-8 using C-x RET f.
* If running XEmacs in non-graphical mode in a UTF-8 xterm, set the terminal coding system to UTF-8 using C-x RET t.

If you want XEmacs to load UTF-8 files correctly, add the following lines to your ~/.xemacs/init.el:

(require 'un-define)
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8)

Note that Emacs does not deal well with these additions, so if you also run Emacs, then adding the following will keep Emacs from complaining:

;; Are we running XEmacs or Emacs?
(defvar running-xemacs (string-match "XEmacs\\|Lucid" emacs-version))

...

(if (not running-xemacs) nil
;; enable Mule-UCS
(require 'un-define)

;; by default xemacs does not autodetect Unicode
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8))

These lines will get XEmacs to load UTF-8 files in UTF-8 mode (it will display a "u" in the bottom left corner of your status bar). If you have already loaded a file and would like to start inputting UTF-8, you can use C-x RET f, to set the file coding system to UTF-8. Note that you may additionally have to set the terminal coding system to UTF-8. This seems to be necessary, for example, in the case where XEmacs is run in non-graphical mode inside a UTF-8 enabled xterm. You can set the terminal encoding using C-x RET t.

Caution: I have had problems with XEmacs double encoding in the case where 1) the file contains UTF-8, 2) the file is loaded in non-UTF-8 mode, 3) the user switches to UTF-8 mode (using C-x RET f), 4) enters some text, and 5) saves. In other words, if your file already contains UTF-8 characters, make sure that it is loaded in UTF-8 mode before editing it.

Thursday, October 22, 2009

ISO week

I am working with a model that produces estimates of snow water equivalent through time. Because I deal with large spatial extents, I decided to have the model produce weekly averages. The problem with this is knowing which file to access for a given date. The snow files are saved using an ISO-8061 week number (I had no idea how many date "standards" there were, but this is a common one). I thought that getting the week number out of a date format in R would be easy, but I was wrong. It turns out to be platform specific and regardless, as far as I could tell, there is no built-in way to calculate the ISO week. Thus the following function.

ISOweek<-function(date,format="%Y-%m-%d",return.val="weekofyear"){
##converts dates into "dayofyear" or "weekofyear", the latter providing the ISO-8601 week
##date should be a vector of class Date or a vector of formatted character strings
##format refers to the date form used if a vector of
## character strings is supplied

##convert date to POSIXt format
if(class(date)[1]%in%c("Date","character")){
date=as.POSIXlt(date,format=format)
}
if(class(date)[1]!="POSIXt"){
print("Date is of wrong format.")
break
}else if(class(date)[2]=="POSIXct"){
date=as.POSIXlt(date)
}


if(return.val=="dayofyear"){
##add 1 because POSIXt is base zero
return(date$yday+1)
}else if(return.val=="weekofyear"){
##Based on the ISO8601 weekdate system,
## Monday is the first day of the week
## W01 is the week with 4 Jan in it.
year=1900+date$year
jan4=strptime(paste(year,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4$wday

wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7

##calculate the date of the first week of the year
weekstart=jan4-(wday-1)*86400
weeknum=ceiling(as.numeric((difftime(date,weekstart,units="days")+0.1)/7))

#########################################################################
##calculate week for days of the year occuring in the next year's week 1.
#########################################################################
mday=date$mday
wday=date$wday
wday[wday==0]=7
year=ifelse(weeknum==53 & mday-wday>=28,year+1,year)
weeknum=ifelse(weeknum==53 & mday-wday>=28,1,weeknum)

################################################################
##calculate week for days of the year occuring prior to week 1.
################################################################

##first calculate the numbe of weeks in the previous year
year.shift=year-1
jan4.shift=strptime(paste(year.shift,1,4,sep="-"),format="%Y-%m-%d")
wday=jan4.shift$wday
wday[wday==0]=7 ##convert to base 1, where Monday == 1, Sunday==7
weekstart=jan4.shift-(wday-1)*86400
weeknum.shift=ceiling(as.numeric((difftime(date,weekstart)+0.1)/7))

##update year and week
year=ifelse(weeknum==0,year.shift,year)
weeknum=ifelse(weeknum==0,weeknum.shift,weeknum)

return(list("year"=year,"weeknum"=weeknum))
}else{
print("Unknown return.val")
break
}
}




Of course after I wrote this function, I found this thread in R-help. Gustaf Rydevik provided a function that is similar to mine (his function is also about twice as fast); however, on my computer it was giving incorrect years and week numbers for days in January that occur in the previous year's final week (e.g., 1 January 2010 should be week 53 of 2009). Reading through the rest of the thread indicates that the confusion involving week numbers is widespread. I guess the moral of the story is pick a standard and stick to it.

Please let me know in the comments if you encounter any problems running this function (or if you can suggest ways to make it more efficient).

Leap years

A quick function that when provided a numeric vector of years returns a boolean vector where TRUE == Leap year.

is.leapyear=function(year){
#http://en.wikipedia.org/wiki/Leap_year
return(((year %% 4 == 0) & (year %% 100 != 0)) | (year %% 400 == 0))
}

Thursday, September 3, 2009

Truncated Normal Distribution

Many distributions may be used to describe patterns that are non-negative; however, there are not as many choices when an upper bound is also needed (although the beta distribution is very flexible). For various reasons, truncated distributions are sometimes preferred, and the truncated normal is particularly popular. While R has a package that includes the standard functions for this distribution (see rtnorm, dtnorm, etc. in the msm pacakge), the true expectation and variance of the distribution may be of interest. It turns out that the first two moments of the truncated normal are not too hard to calculate (but worth writing functions for):

mean.tnorm<-function(mu,sd,lower,upper){
##return the expectation of a truncated normal distribution
lower.std=(lower-mu)/sd
upper.std=(upper-mu)/sd
mean=mu+sd*(dnorm(lower.std)-dnorm(upper.std))/
(pnorm(upper.std)
-pnorm(lower.std))
return(mean)
}

var.tnorm<-function(mu,sd,lower,upper){
##return the variance of a truncated normal distribution
lower.std=(lower-mu)/sd
upper.std=(upper-mu)/sd
variance=sd^2*(1+(lower.std*dnorm(lower.std)-upper.std*dnorm(upper.std))/
(pnorm(upper.std)-pnorm(lower.std))-((dnorm(lower.std)-dnorm(upper.std))/
(pnorm(upper.std)-pnorm(lower.std)))^2)
return(variance)
}

###Testing
> library(msm)
> a=rtnorm(1000000,-5,2,1,3)
> paste(mean(a),var(a))
[1] "1.52135857341077 0.197281057170982"
> paste(mean.tnorm(-5,2,1,3),var.tnorm(-5,2,1,3))
[1] "1.52090857118 0.197111175109889"