Re: [Question] Stacking data

Original poster: celestialgod (天)   2015-08-07 17:07:29
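※ Quoting naturalsmen (日日夜夜):
: ※ Quoting celestialgod (攸蓝):
: (snipped)
: Let me use this post to ask a question.
: How can I stop R from automatically sorting the columns after spread?
: Say the num in spread(num, sth.) holds names that contain numbers greater than 9,
: for example: student1~student10
: R orders them on its own as student1 student10 student2 ... student9
: How can this be solved?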
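I have two clumsy methods: one is to rearrange the columns, the other is to rename them.
There is also a good method: factorize, i.e. convert the key column to a factor.
Easier-to-read version: http://pastebin.com/BPWGgByi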
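Before the full examples, a minimal base R sketch of why the order comes out this way, assuming spread simply sorts a character key as text. The variable names (stu, sorted_chr, stu_fct) are only for illustration.

```r
# Names are character strings, so sorting compares them character by character:
# "stu10" comes before "stu2" because "1" sorts before "2".
stu <- paste0("stu", 1:10)
sorted_chr <- sort(stu)
print(sorted_chr)

# A factor keeps whatever level order it is given, which is what the
# factorize approach relies on.
stu_fct <- factor(stu, levels = stu)
print(levels(stu_fct))
```

The first print shows the text order from the question (stu1, stu10, stu2, ...), the second the intended order stu1 to stu10.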
library(data.table)
library(dplyr)
library(tidyr)
library(magrittr)
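# Build a long-format example: 20 students with two measures (X, Y) each
DT = data.table(stu = paste0("stu", 1:20),
                X = rnorm(20), Y = rnorm(20)) %>%
  gather(cate, values, -stu)
# Default spread: the character key is ordered as text (stu1, stu10, stu11, ...)
# tbl_dt(FALSE) wraps the result without copying, for compact printing
DT %>% spread(stu, values) %>% tbl_dt(FALSE)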
# cate stu1 stu10 stu11 stu12 stu13 stu14
# 1 X -0.08476976 0.5428922 1.9929332 -0.6145632 -0.06098296 -1.066283
# 2 Y 0.59710869 -1.0037766 0.3508158 0.4587201 -0.13639207 1.385517
# Variables not shown: stu15 (dbl), stu16 (dbl), stu17 (dbl), stu18 (dbl),
# stu19 (dbl), stu2 (dbl), stu20 (dbl), stu3 (dbl), stu4 (dbl), stu5 (dbl),
# stu6 (dbl), stu7 (dbl), stu8 (dbl), stu9 (dbl)
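## factorize: fix the column order through the factor levels
# (the example data is regenerated here, so the values differ from the output above)
DT = data.table(stu = paste0("stu", 1:20),
                X = rnorm(20), Y = rnorm(20)) %>%
  gather(cate, values, -stu)
DT %>% mutate(stu = factor(stu, levels = paste0("stu", 1:20))) %>%
  spread(stu, values) %>% tbl_dt(FALSE)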
# cate stu1 stu2 stu3 stu4 stu5 stu6
# 1 X 1.6890231 -1.300332 -1.378376 -1.874321 0.54141060 -1.2848391
# 2 Y 0.2796895 1.635385 1.048334 0.424909 0.09111916 -0.4147811
# Variables not shown: stu7 (dbl), stu8 (dbl), stu9 (dbl), stu10 (dbl), stu11
# (dbl), stu12 (dbl), stu13 (dbl), stu14 (dbl), stu15 (dbl), stu16 (dbl),
# stu17 (dbl), stu18 (dbl), stu19 (dbl), stu20 (dbl)
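## method 1: reorder the columns with select
DT_spread = DT %>% spread(stu, values)
# columns that are not student columns (here only cate)
sele_names = setdiff(names(DT_spread), paste0("stu", 1:20))
# positions of all columns in the desired order
col_num = match(c(sele_names, paste0("stu", 1:20)), names(DT_spread))
DT_spread %>% select(col_num) %>% tbl_dt(FALSE)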
# cate stu1 stu2 stu3 stu4 stu5 stu6
# 1 X 1.6890231 -1.300332 -1.378376 -1.874321 0.54141060 -1.2848391
# 2 Y 0.2796895 1.635385 1.048334 0.424909 0.09111916 -0.4147811
# Variables not shown: stu7 (dbl), stu8 (dbl), stu9 (dbl), stu10 (dbl), stu11
# (dbl), stu12 (dbl), stu13 (dbl), stu14 (dbl), stu15 (dbl), stu16 (dbl),
# stu17 (dbl), stu18 (dbl), stu19 (dbl), stu20 (dbl)
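## method 2: rename with zero-padded numbers so that text order matches numeric order
# %<>% (magrittr) pipes DT through the chain and assigns the result back to DT
DT %<>% mutate(stu_num = as.integer(gsub("stu(\\d*)", "\\1", stu)),
               stu = sprintf("stu%02i", stu_num)) %>% select(-stu_num)
DT %>% spread(stu, values) %>% tbl_dt(FALSE)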
# cate stu01 stu02 stu03 stu04 stu05 stu06
# 1 X 1.6890231 -1.300332 -1.378376 -1.874321 0.54141060 -1.2848391
# 2 Y 0.2796895 1.635385 1.048334 0.424909 0.09111916 -0.4147811
# Variables not shown: stu07 (dbl), stu08 (dbl), stu09 (dbl), stu10 (dbl),
# stu11 (dbl), stu12 (dbl), stu13 (dbl), stu14 (dbl), stu15 (dbl), stu16
# (dbl), stu17 (dbl), stu18 (dbl), stu19 (dbl), stu20 (dbl)
Author: naturalsmen (日日夜夜)   2015-08-08 08:47:00
It took me a while to understand methods 1 and 2, hahaha. Thanks, c!
