Re: [Question] Scraping earthquake data from the Central Weather Bureau

Author: celestialgod (天)   2019-09-26 23:43:55
※ Quoting hanglong (小焕):
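: [Question type]:
: Programming consultation (I want to do something in R, but I don't know how to write it in R)
: For every earthquake in Taiwan, I want to scrape the seismic intensity at a fixed location.
: [Software familiarity]:
: Beginner (never written a program before; R is my first language)
: [Problem description]:
: Take the earthquake at 05:24 on 09-11 (Taiwan time) as an example. Its URL is:
: https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636
: From this URL I can extract the data for a fixed location, e.g. 玉山 (Yushan).
: But there are many earthquakes, each with a different URL, and I can't copy every URL by hand
: and then use R to get the Yushan intensity.
: On the earthquake data page, I can see the link to each earthquake in the page source:
: https://scweb.cwb.gov.tw/zh-tw/earthquake/data/
: So if there were a way to extract each earthquake's URL from this page,
: that should solve my problem, but I don't know how to extract the URLs.
: I'd appreciate help from everyone on the board. Thanks.
: [Code example]: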
:
: On https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636,
: the following code shows the intensity at Yushan:
: library(xml2)
: data <- read_html("https://scweb.cwb.gov.tw/zh-tw/earthquake/details/2019091105245636")
: ths <- xml_find_all(data, "//div/ul/li")
: xml_text(ths)[substring(xml_text(ths), 1, 2) == "玉山"]
: [1] "玉山 1"
: But on https://scweb.cwb.gov.tw/zh-tw/earthquake/data/,
: when I use the same approach to at least extract the URLs, I get nothing...
: Here is the code:
: data <- read_html("https://scweb.cwb.gov.tw/zh-tw/earthquake/data/")
: ths <- xml_find_all(data, "//div/table/tbody/tr/td/a")
: xml_text(ths)
: character(0)
: [Environment]:
:
: [Keywords]:
: Central Weather Bureau, earthquake
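On the fixed-location filter in the question: `substring(..., 1, 2) == "玉山"` only works for two-character station names, and it also matches any name that merely begins with those two characters. A minimal base-R sketch that compares the full station name instead, assuming each `<li>` text has the form `"<station> <intensity>"` like the `"玉山 1"` output above. `station_intensity` is a hypothetical helper, not part of either post.

```r
# Hypothetical helper (not from the original posts).
# Assumes each <li> text looks like "<station> <intensity>", e.g. "玉山 1".
station_intensity <- function(li_text, station) {
  li_text <- trimws(li_text)
  # keep only entries made of a name, whitespace, then a final token
  li_text <- li_text[grepl("\\S\\s+\\S+$", li_text)]
  name_part  <- sub("\\s+\\S+$", "", li_text)  # everything before the last token
  value_part <- sub("^.*\\s", "", li_text)     # the last whitespace-separated token
  hit <- value_part[name_part == station]      # exact station-name match
  if (length(hit) == 0L) NA_character_ else hit[1L]
}

# Made-up sample text; returns "1"
yushan <- station_intensity(c("Yushan 1", "Alishan 2", "Header"), "Yushan")
```

With the question's code this would be called as `station_intensity(xml_text(ths), "玉山")`; a station that is not listed gives `NA` rather than a zero-length result.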
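It looks like the OP hasn't solved the problem yet...
so here's how I did it.
Result screenshot first: https://i.imgur.com/0PWmvmf.png
How it works:
Basically, if you look at the Network tab in the browser's developer tools, you can find the relevant IDs.
The page fetches the earthquake list through an API called ajaxhandler after it loads, so the table rows are not in the HTML that read_html downloads; that is why the XPath in the question returns character(0).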
So in the Network tab you can see this URL:
https://scweb.cwb.gov.tw/zh-tw/earthquake/ajaxhandler
The Network tab also shows that the request is a POST carrying a very long form (the POST body in the code below).
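Most of that long form is seven `columns[i][...]` groups that differ only in index, name, and the `searchable` flag (the parameter names resemble those of the DataTables plugin's server-side mode, but that is an inference). A minimal base-R sketch that generates those fields instead of typing them out; `columns_fields` is a hypothetical helper, and the column names and flags are copied from the POST body in the code below.

```r
# Hypothetical helper: builds the repetitive "columns[i][...]" form fields.
# Names and searchable flags are copied from the POST body in the post
# (column 0 is "false", the rest "true").
columns_fields <- function(col_names,
                           searchable = c("false", rep("true", length(col_names) - 1L))) {
  fields <- list()
  for (i in seq_along(col_names)) {
    prefix <- sprintf("columns[%d]", i - 1L)  # the form uses 0-based indices
    fields[[paste0(prefix, "[data]")]]          <- as.character(i - 1L)
    fields[[paste0(prefix, "[name]")]]          <- col_names[i]
    fields[[paste0(prefix, "[searchable]")]]    <- searchable[i]
    fields[[paste0(prefix, "[orderable]")]]     <- "true"
    fields[[paste0(prefix, "[search][value]")]] <- ""
    fields[[paste0(prefix, "[search][regex]")]] <- "false"
  }
  fields
}

cols <- columns_fields(c("EventNo", "MaxIntensity", "OriginTime", "MagnitudeValue",
                         "Depth", "Description", "Description"))
# 7 columns x 6 fields = 42 entries
```

The result could then be spliced into the body with `c()`, e.g. `body = c(list("draw" = "1"), cols, list("order[0][column]" = "2", ...))`, since `c()` on named lists keeps the names and their order.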
Just reproduce that form exactly and you get 10 records.
Change the length parameter to get more records (September currently has at most 25).
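Instead of guessing one large `length`, records could also be requested page by page. A sketch, assuming `start` is the 0-based offset of the first record and `length` the page size (inferred from the defaults `start = "0"`, `length = "10"` in the form, not confirmed by the post) and that the total count is known, e.g. the 25 mentioned for September. `page_params` is a hypothetical helper.

```r
# Hypothetical helper: the (start, length) pairs needed to cover `total` records.
# Assumes "start" is a 0-based offset and "length" is the page size.
page_params <- function(total, page_size = 10L) {
  if (total <= 0) {
    return(data.frame(start = character(0), length = character(0)))
  }
  starts <- seq(0L, total - 1L, by = page_size)
  # the form sends every value as a string
  data.frame(start = as.character(starts),
             length = as.character(page_size),
             stringsAsFactors = FALSE)
}

pages <- page_params(25)  # starts "0", "10", "20"
```

Each row would replace the `"start"` and `"length"` entries of the body for one POST request.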
Change Search to switch months. That's all there is to it... here's the code.
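To query several months in a row, the `Search` values can be generated rather than typed. A sketch assuming every month uses the same `<year>年<month>月` format as `"2019年9月"` in the post, with the month not zero-padded; the two Chinese characters are written as `\u{...}` escapes so the script stays plain ASCII. `search_months` is a hypothetical helper.

```r
# Hypothetical helper: "Search" values for every month from `from` to `to`.
# Assumes the "<year>年<month>月" format seen in the post, e.g. "2019年9月".
search_months <- function(from, to) {
  months <- seq(as.Date(from), as.Date(to), by = "month")
  sprintf("%d\u{5e74}%d\u{6708}",             # \u{5e74} = 年, \u{6708} = 月
          as.integer(format(months, "%Y")),
          as.integer(format(months, "%m")))   # as.integer drops the leading zero
}

labels <- search_months("2019-07-01", "2019-09-01")
```

Each label would go into `"Search"` for a separate POST request.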
Code:
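# httr: HTTP requests; pipeR: the %>>% pipe; xml2: HTML parsing; stringr: string cleanup
library(httr)
library(pipeR)
library(xml2)
library(stringr)
# AJAX endpoint that returns the earthquake list (as JSON)
url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/ajaxhandler"
# The data page is sent as the Referer header, as the browser does
referer_url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/data/"
# Replay the form copied from the Network tab. Fields worth changing:
#   "length" = number of records returned (default 10)
#   "Search" = month to query, e.g. "2019年9月"
response_data <- POST(url, body = list("draw" = "1",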
"columns[0][data]" = "0",
"columns[0][name]" = "EventNo",
"columns[0][searchable]" = "false",
"columns[0][orderable]" = "true",
"columns[0][search][value]" = "",
"columns[0][search][regex]" = "false",
"columns[1][data]" = "1",
"columns[1][name]" = "MaxIntensity",
"columns[1][searchable]" = "true",
"columns[1][orderable]" = "true",
"columns[1][search][value]" = "",
"columns[1][search][regex]" = "false",
"columns[2][data]" = "2",
"columns[2][name]" = "OriginTime",
"columns[2][searchable]" = "true",
"columns[2][orderable]" = "true",
"columns[2][search][value]" = "",
"columns[2][search][regex]" = "false",
"columns[3][data]" = "3",
"columns[3][name]" = "MagnitudeValue",
"columns[3][searchable]" = "true",
"columns[3][orderable]" = "true",
"columns[3][search][value]" = "",
"columns[3][search][regex]" = "false",
"columns[4][data]" = "4",
"columns[4][name]" = "Depth",
"columns[4][searchable]" = "true",
"columns[4][orderable]" = "true",
"columns[4][search][value]" = "",
"columns[4][search][regex]" = "false",
"columns[5][data]" = "5",
"columns[5][name]" = "Description",
"columns[5][searchable]" = "true",
"columns[5][orderable]" = "true",
"columns[5][search][value]" = "",
"columns[5][search][regex]" = "false",
"columns[6][data]" = "6",
"columns[6][name]" = "Description",
"columns[6][searchable]" = "true",
"columns[6][orderable]" = "true",
"columns[6][search][value]" = "",
"columns[6][search][regex]" = "false",
"order[0][column]" = "2",
"order[0][dir]" = "desc",
"start" = "0",
"length" = "10",
"search[value]" = "",
"search[regex]" = "false",
"Search" = "2019年9月",
"txtSDate" = "",
"txtEDate" = "",
"txtSscale" = "",
"txtEscale" = "",
"txtSdepth" = "",
"txtEdepth" = "",
"txtLonS" = "",
"txtLonE" = "",
"txtLatS" = "",
"txtLatE" = "",
"ddlCity" = "",
"ddlCitySta" = "",
"txtIntensityB" = "",
"txtIntensityE" = "",
"txtLon" = "",
"txtLat" = "",
"txtKM" = "",
"ddlStationName" = ""),
encode = "form",                      # send the body as a URL-encoded form
add_headers(Referer = referer_url)) %>>%
  content                             # parse the JSON response into an R list
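# Each row of response_data$data is one earthquake; its first field is the ID
# used in the details-page URL
data_ids <- sapply(response_data$data, `[[`, 1L)
details_url <- "https://scweb.cwb.gov.tw/zh-tw/earthquake/details/"
detail_urls <- paste0(details_url, data_ids)
# Fetch each details page and pull the <li> items from the result box
eq_details <- lapply(detail_urls, function(url){
  GET(url) %>>% content %>>%
    xml_find_all("//ul[@class='eqResultBoxRight BulSet BkeyinList']") %>>%
    xml_find_all("li") %>>%
    xml_text %>>%
    str_replace_all("[\\s]", "") %>>%  # remove all whitespace
    `[`(2L:6L)                          # keep items 2 to 6
})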
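`eq_details` ends up as a list of character vectors, one per earthquake, each of length 5 (`[`(2L:6L) pads with `NA` when a page has fewer items). A base-R sketch for collecting them into one data frame keyed by event ID, plus a wrapper that pauses between requests and turns a failed page into `NA`s instead of aborting the whole `lapply`. The helper names, the one-second default delay, and the generic column names `item1` to `item5` are assumptions; the post does not say what the five items contain.

```r
# Hypothetical helpers (not from the original post).

# Apply `f` to each element of `xs`, sleeping `delay` seconds between calls
# and returning `fallback` for any call that throws an error.
polite_lapply <- function(xs, f, delay = 1, fallback = rep(NA_character_, 5L)) {
  out <- vector("list", length(xs))
  for (i in seq_along(xs)) {
    if (i > 1L) Sys.sleep(delay)
    out[[i]] <- tryCatch(f(xs[[i]]), error = function(e) fallback)
  }
  out
}

# Stack equal-length character vectors into a data frame, one row per event.
details_table <- function(ids, details) {
  m <- do.call(rbind, details)
  colnames(m) <- paste0("item", seq_len(ncol(m)))
  data.frame(event_id = ids, m, stringsAsFactors = FALSE)
}

# Stand-in fetch function instead of real pages: "bad" simulates a failed request
fake_fetch <- function(id) if (id == "bad") stop("HTTP error") else paste(id, 1:5)
ids <- c("a", "bad", "b")
tab <- details_table(ids, polite_lapply(ids, fake_fetch, delay = 0))
```

With the real code, `polite_lapply(detail_urls, function(url) {...})` would replace the `lapply` call and `details_table(data_ids, eq_details)` would build the table.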
If you have any questions, ask in the replies.
