[Question] caret error when the data frame includes Date variables

OP: babysian7 (Babysian)   2015-11-03 04:18:09
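[Question type]:
Programming help (I want to do something in R, but I don't know how to write it in R)
[Familiarity with the software]:
Beginner
[Problem description]:
I have a data frame that contains date variables: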
'data.frame': 1000 obs. of 49 variables:
$ estate_Post : int 10069 10065 10044 10044 10044 10045 10044 10045 10044 10045 ...
$ estate_TransType : int 3 1 4 2 4 4 4 4 4 4 ...
$ estate_LandArea : num 15.54 47.3 20.89 1.99 23.98 ...
$ estate_ZoneUse : int 2 2 3 3 3 3 3 3 3 3 ...
$ estate_TransDate : Date, format: "1989-03-01" "1998-01-01" "2015-01-01" "2015-01-01" ...
$ estate_Land : int 1 1 1 0 1 1 1 1 1 1 ...
$ estate_House : int 1 0 1 0 1 1 1 1 1 1 ...
$ estate_ParkingLot : int 0 0 2 2 2 1 3 3 4 3 ...
$ estate_TransFloor : int 5 -99 17 -4 11 6 6 5 15 5 ...
$ estate_TotalFloor : int 5 -99 31 31 31 31 31 31 31 31 ...
$ estate_HouseType : int 1 12 2 12 2 2 2 2 2 2 ...
$ estate_HouseUse : int 1 -99 1 3 1 1 1 1 1 1 ...
$ estate_HouseMaterials: int 5 -99 13 13 13 13 13 13 13 13 ...
$ estate_HouseDate : Date, format: "1967-05-19" NA "2013-11-29" "2013-11-29" ...
$ estate_HouseArea : num 35.1 0 442.7 62.1 507.1 ...
$ estate_HouseRoom_1 : int 1 0 5 0 5 4 4 4 3 4 ...
$ estate_HouseRoom_2 : int 1 0 2 0 2 2 2 2 2 2 ...
$ estate_HouseRoom_3 : int 1 0 6 0 6 3 3 3 3 3 ...
$ estate_HouseRoom_4 : int 1 1 1 1 1 1 1 1 1 1 ...
$ estate_Guards : int 2 2 2 2 2 2 2 2 2 2 ...
$ estate_Price : int 3535 54299 164882 -99 195808 181428 174799 175356 190717 165250 ...
$ estate_ParkingType : int -99 -99 3 4 3 4 4 4 4 4 ...
$ estate_ParkingArea : num 0 0 13.2 32.2 27.5 ...
$ estate_ParkingPrice : int 0 0 0 5600000 0 0 0 0 8400000 0 ...
$ estate_Lng : num 122 122 122 122 122 ...
$ estate_Lat : num 25 25 25 25 25 ...
$ Aport_Distance : num 7.3 6.7 5.3 5.3 5.3 5.3 5.3 5.3 5.3 5.3 ...
$ ParkB_Distance : num 0.29 0.785 0.214 0.217 0.215 ...
$ Univ_Distance : num 1.7 1 1 1 1 1 1 1 1 1 ...
$ ParkR_Distance : num 1.4 2 1.7 1.7 1.7 1.6 1.7 1.7 1.7 1.6 ...
$ MRT_StationDistance : num 0.914 0.327 0.403 0.401 0.402 ...
$ MRT_LineDistance : num 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_EntranceDistance: int 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_LineDistance : int 999 999 999 999 999 999 999 999 999 999 ...
$ TRA_StationDistance : num 1 1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ THSR_StationDistance : num 3.1 2.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ River_Distance : num 999 1.84 1.49 1.48 1.49 ...
$ Schools_Distance : num 0.2 0.2 0.7 0.7 0.7 0.8 0.7 0.7 0.7 0.8 ...
$ Lib_Distance : num 0.8 0.9 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 ...
$ Sport_Distance : num 2.4 1.8 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ ParkS_Distance : num 0.6 1 0.6 0.6 0.6 0.7 0.6 0.6 0.6 0.7 ...
$ Hyper_Distance : num 1.3 0.6 1.2 1.2 1.2 1.1 1.2 1.2 1.2 1.1 ...
$ Shop_Distance : num 1.7 1 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Post_Distance : num 0.5 0.2 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Hosp_Distance : num 0.7 0.4 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ Gas_Distance : num 0.5 0.4 1.4 1.4 1.4 1.4 1.4 1.5 1.4 1.4 ...
$ Incin_Distance : num 10.9 10.2 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 ...
$ Mort_Distance : num 6.3 5.7 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3 ...
$ estate_TotalPrice : num 124117 2568347 73000000 5600000 99300000 ...
After I convert the date variables with as.Date(), feature selection fails with this error:
Error in { :
task 1 failed - "rfe is expecting 48 importance values but only has 46"
In addition: Warning messages:
1: In predict.lm(object, x) :
prediction from a rank-deficient fit may be misleading
How should I change my code to fix this?
[Code example]:
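A plausible reading of this error, assumed rather than confirmed against the poster's data or caret version: the warning says the lm() fit is rank-deficient, meaning some predictors are exact linear combinations of others (a constant column, such as one filled with 999, is aliased with the intercept). lm() reports NA for those coefficients and summary() drops them, so a variable ranking built from the coefficient table ends up with fewer entries than there are predictors, like 46 instead of 48. The base-R sketch below reproduces that mismatch on made-up data; all column names are hypothetical.

```r
# Toy data (hypothetical, not the poster's file): 5 predictors, two of which
# cannot be estimated separately.
set.seed(42)
n <- 40
x <- data.frame(
  area    = runif(n, 10, 100),
  rooms   = sample(1:6, n, replace = TRUE),
  parking = runif(n, 0, 30)
)
x$total_area  <- x$area + x$parking  # exact linear combination of two columns
x$placeholder <- 999                 # constant column, aliased with the intercept
y <- 3 * x$area + 5 * x$rooms + rnorm(n)

fit <- lm(y ~ ., data = cbind(x, y = y))

n_predictors <- ncol(x)
# coef() keeps the aliased terms, but with NA values
n_coef <- sum(names(coef(fit)) != "(Intercept)")
# summary() drops the aliased terms from its coefficient table
coef_table  <- summary(fit)$coefficients
n_estimable <- sum(rownames(coef_table) != "(Intercept)")

cat("predictors:", n_predictors, "\n")
cat("coefficients (incl. NA):", n_coef, "\n")
cat("estimable coefficients:", n_estimable, "\n")
```

Here 5 predictors yield only 3 estimable coefficients, the same kind of gap as 48 versus 46 in the error message.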
library(mlbench)
library(caret)
library(maps)
library(rgdal)
library(raster)
library(sp)
library(spdep)
library(GWmodel)
library(e1071)
library(plyr)
library(kernlab)
library(zoo)
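mydata <- read.csv("E:/SupportVectorRegression/Realestatedata_1000_delete_date.csv",
                   header = TRUE)
# estate_TransDate appears to hold year-month values, so append day 1 before parsing
mydata$estate_TransDate <- as.Date(paste(mydata$estate_TransDate, 1, sep = "-"),
                                   format = "%Y-%m-%d")
mydata$estate_HouseDate <- as.Date(mydata$estate_HouseDate, format = "%Y-%m-%d")
# Recursive feature elimination with 10-fold CV, using caret's linear-model helpers
rfectrl <- rfeControl(functions = lmFuncs, method = "cv", number = 10,
                      verbose = TRUE, returnResamp = "final")
# Predictors: columns 1-48; response: column 49 (estate_TotalPrice).
# Note: lmFuncs fits lm(), so method = "svmRadial" has no effect here.
results <- rfe(mydata[, 1:48], mydata[, 49], sizes = 1:48,
               rfeControl = rfectrl, method = "svmRadial")
#metric = "Rsquared"
print(results)
predictors(results)
plot(results, type = c("g", "o"))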
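How the two date conversions in this code behave, sketched on hypothetical values. The `paste(..., 1, sep = "-")` step suggests estate_TransDate was stored as year-month strings; that is an assumption about the CSV. Missing values survive as NA, and model code such as lm() works with the underlying number of days since 1970-01-01.

```r
# Hypothetical raw values, assuming year-month strings for the transaction date
trans_raw <- c("1989-03", "1998-01", "2015-01", NA)
house_raw <- c("1967-05-19", NA, "2013-11-29", "2013-11-29")

# Append day 1, then parse; paste() turns NA into "NA-1", which parses to NA
trans <- as.Date(paste(trans_raw, 1, sep = "-"), format = "%Y-%m-%d")
house <- as.Date(house_raw, format = "%Y-%m-%d")
print(trans)
print(house)

# The numeric value models see: days since 1970-01-01
trans_num <- as.numeric(trans)
house_num <- as.numeric(house)
print(trans_num)
print(house_num)
```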
[Environment]:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
[Keywords]:
caret, dataframe, date
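Reply: celestialgod (天)   2015-11-03 08:40:00
Compute the correlations and check whether two of the variables are very highly correlated with the others. This really looks like real-price registration data. My first feeling was that the date input was the problem; is the date one of your variables?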
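A minimal base-R sketch of the suggested correlation check, run on made-up data (the column names and the 0.95 cutoff are illustrative choices, not from the thread). Date columns are converted to numbers first, and zero-variance columns are reported separately because their correlations are NA. caret's findCorrelation() and findLinearCombos() cover similar ground.

```r
# Hypothetical toy data standing in for mydata
set.seed(1)
n <- 50
toy <- data.frame(
  area   = runif(n, 10, 100),
  storey = sample(1:30, n, replace = TRUE),
  trans  = as.Date("2015-01-01") + sample(0:364, n, replace = TRUE)
)
toy$area_x2 <- toy$area * 2  # perfectly correlated with area
toy$const   <- 999           # constant, like a column of 999 placeholders

# cor() needs numeric input, so convert Date columns to days since 1970-01-01
num <- as.data.frame(lapply(toy, function(col) {
  if (inherits(col, "Date")) as.numeric(col) else col
}))

# Zero-variance columns give NA correlations; report and drop them first
sds <- vapply(num, sd, numeric(1))
zero_var <- names(sds)[sds == 0]
num <- num[, sds > 0, drop = FALSE]

# List every pair of columns whose absolute correlation exceeds the cutoff
cutoff <- 0.95
r <- cor(num)
idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, 1]],
  var2 = colnames(r)[idx[, 2]],
  r    = r[idx]
)

print(zero_var)
print(high_pairs)
```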
OP: babysian7 (Babysian)   2015-11-03 13:42:00
Hi, two of the variables are of type Date. I want to use them as inputs, but I can't tell what is going wrong.
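Reply: celestialgod (天)   2015-11-03 14:08:00
http://tinyurl.com/p6hbvjy agrees with what I was thinking XDD. I generated dates myself and ran it with no problem; the dates are treated as integers when the model runs. My guess is that part of your data is linearly dependent. I also tried including NA values, and that worked too.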
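If part of the data is linearly dependent, the dependent columns can be located from an lm() fit directly. A sketch on hypothetical data: coefficients that come back NA mark the aliased predictors, alias() shows which combination of other columns reproduces them, and dropping them gives a full-rank fit.

```r
# Toy data (hypothetical): total_area is area + parking, so lm() cannot
# estimate a separate coefficient for it.
set.seed(7)
n <- 40
x <- data.frame(
  area    = runif(n, 10, 100),
  parking = runif(n, 0, 30),
  age     = sample(0:40, n, replace = TRUE)
)
x$total_area <- x$area + x$parking
y <- 2 * x$area - 0.5 * x$age + rnorm(n)

fit <- lm(y ~ ., data = cbind(x, y = y))

# Aliased (linearly dependent) predictors get NA coefficients
cf <- coef(fit)
aliased <- setdiff(names(cf)[is.na(cf)], "(Intercept)")
print(aliased)

# alias() shows how each aliased term is built from the other columns
print(alias(fit)$Complete)

# Drop the aliased columns and refit; no coefficient is NA any more
x_reduced <- x[, setdiff(names(x), aliased), drop = FALSE]
fit2 <- lm(y ~ ., data = cbind(x_reduced, y = y))
print(anyNA(coef(fit2)))
```

Which column of a dependent group receives the NA depends on column order, so deciding which variable to keep is a judgment call about which one is more meaningful.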
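OP: babysian7 (Babysian)   2015-11-06 16:58:00
Hi, thank you for your answer. While making the changes I ran into a new problem: after replacing all the NA values, the error message is now
missing value where TRUE/FALSE needed
In addition: There were 20 warnings (use warnings() to see them)
I don't really understand it, because all my data are continuous numeric values, with no TRUE/FALSE...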
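Where exactly this message comes from can't be determined from the thread, since the code for this run wasn't posted. The message itself does not mean the data are logical: R raises it when the condition of an if() (or while()) evaluates to NA, which can happen inside model code when a value such as NA, NaN, or Inf slips through. The sketch below shows the message and counts the non-finite values left in each column; count_bad() and the toy data frame are hypothetical.

```r
# The message comes from an if() whose condition is NA
msg <- tryCatch(
  { if (NA) "yes" else "no" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical helper: count NA/NaN/Inf per column (NA only for non-numeric columns)
count_bad <- function(df) {
  vapply(df, function(col) {
    if (is.numeric(col) || inherits(col, "Date")) sum(!is.finite(col)) else sum(is.na(col))
  }, integer(1))
}

# Hypothetical data with one missing or infinite value in each column
df <- data.frame(
  price = c(3535, 54299, NA, 195808),
  ratio = c(1, Inf, 0.5, 2),
  built = as.Date(c("1967-05-19", NA, "2013-11-29", "2013-11-29"))
)
bad <- count_bad(df)
print(bad[bad > 0])
```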
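Reply: celestialgod (天)   2015-11-07 11:25:00
Without seeing your code I can't diagnose this from a distance. If you attach the data as well, I can reproduce the error and try to find a fix.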
OP: babysian7 (Babysian)   2015-11-11 13:35:00
Hi, I've put the data together here:
https://www.dropbox.com/sh/u62abna1cp4fw8n/AAC9EXdhNN8GKdVqkgOM6OQ-a?dl=0
Thanks.
Reply: celestialgod (天)   2015-11-12 21:45:00
I give up ~"~ No idea what to do qq. Maybe write to the author and ask QQ
OP: babysian7 (Babysian)   2015-11-13 13:00:00
Thanks anyway for taking the time to help :)
