[Notes] sqldf performance question

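Original poster: kenshin528 (成立奥凶帝国!!)   2014-07-24 13:30:54
[Keywords]: sqldf tapply
[Summary]:
When I first started learning R I was not familiar with its commands, so I mostly wrote my queries with sqldf.
Now that I am more comfortable with R, I have been trying R's built-in functions for querying data,
so I wanted to compare the performance of the two approaches.
The test data is very simple (it looks roughly like this; the experiment consists of increasing the number of rows):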
Category FREQ
T 0.2
T 0.3
T 0.4
F 0.5
F 0.6
F 0.7
The goal is to sum FREQ grouped by Category.
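To make the target concrete, here is a minimal sketch that applies the grouped sum to the six sample rows above. The data frame name `toy` and the variable `sums` are made up for this illustration and are not part of the original test.

```r
# Toy data copied from the six sample rows above ("toy" is a hypothetical name)
toy <- data.frame(
  Category = c("T", "T", "T", "F", "F", "F"),
  FREQ     = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7)
)

# Sum FREQ within each Category; the result is a named vector,
# with one entry per category (F = 1.8, T = 0.9 up to floating-point rounding)
sums <- tapply(toy$FREQ, toy$Category, FUN = sum)
print(sums)
```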
Source code
# Load sqldf (must be installed beforehand)
library(sqldf)
# Generate the dataset; c("T","F") is recycled across all rows
x <- data.frame(Freq = runif(1000000, 0, 1), Category = c("T", "F"))
## Time sqldf
ptm_sql <- proc.time()
result <- sqldf("SELECT Category, sum(Freq)
FROM x
GROUP BY Category
")
ptm_sql <- proc.time() - ptm_sql
ptm_sql
## Time tapply
ptm_tapply <- proc.time()
result <- tapply(x$Freq, x$Category, FUN = sum)
ptm_tapply <- proc.time() - ptm_tapply
ptm_tapply
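Since the experiment is about increasing the number of rows, the same comparison can be wrapped in a function and run at several sizes. This is only a sketch under a few assumptions: the helper `time_group_sum`, the `reps` parameter, and the list of sizes are made up here. Taking the median of a few repetitions is my own addition, so that a one-off setup cost in a single run does not dominate the number. If sqldf is not installed, only tapply is timed and the sqldf column is NA.

```r
# Load sqldf if it is installed; otherwise only tapply is timed.
have_sqldf <- suppressWarnings(
  require("sqldf", quietly = TRUE, character.only = TRUE)
)

# Hypothetical helper: median elapsed seconds of both approaches for n_rows rows.
time_group_sum <- function(n_rows, reps = 3) {
  x <- data.frame(Freq = runif(n_rows, 0, 1),
                  Category = rep(c("T", "F"), length.out = n_rows))
  t_sql <- rep(NA_real_, reps)
  t_tapply <- numeric(reps)
  for (i in seq_len(reps)) {
    if (have_sqldf) {
      start <- proc.time()
      sqldf("SELECT Category, sum(Freq) FROM x GROUP BY Category")
      t_sql[i] <- (proc.time() - start)[["elapsed"]]
    }
    start <- proc.time()
    tapply(x$Freq, x$Category, FUN = sum)
    t_tapply[i] <- (proc.time() - start)[["elapsed"]]
  }
  c(rows = n_rows, sqldf = median(t_sql), tapply = median(t_tapply))
}

# Assumed row counts; adjust to taste.
sizes <- c(1e4, 1e5, 1e6)
timings <- do.call(rbind, lapply(sizes, time_group_sum))
print(timings)
```

Each row of `timings` holds the row count and the median elapsed time in seconds for each approach, which makes it easier to see how the gap changes as the data grows.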
Test results:
With rows = 10,000 (times in seconds)
         user  system  elapsed
SQLDF    0.05    0.00     0.94
TAPPLY   0.00    0.00     0.34
