R - GET {httr} returns a Bad Request response
I am trying to scrape HTML elements from the URL stored in searchlink. The method that has worked for me is htmlTreeParse {XML}. However, it is not returning the elements I'm looking for, for example img[@title='add compare']:
library(XML)
searchlink <- "http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1"
doc <- htmlTreeParse(searchlink, useInternalNodes = TRUE)
classes <- xpathSApply(doc, "//img[@title='add compare']", function(x) { xmlGetAttr(x, 'class') })
The result of running the code above is that classes is empty:
list()
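An empty list() means the XPath matched no nodes in the parsed document. A quick way to tell whether the XPath or the HTML itself is at fault is to search the raw text for the attribute. This is a minimal base-R sketch: page_has_compare_button is a hypothetical helper, and the two sample strings stand in for the downloaded HTML.

```r
# Hypothetical helper: does the raw HTML contain an element whose
# title attribute is "add compare" (single or double quotes, any case)?
page_has_compare_button <- function(html) {
  grepl("title=[\"']add compare[\"']", html, ignore.case = TRUE)
}

# Made-up stand-ins for the text of a downloaded page
with_button    <- '<img class="compare" title="add compare" src="x.png">'
without_button <- '<div id="map"></div>'

page_has_compare_button(with_button)     # TRUE
page_has_compare_button(without_button)  # FALSE
```

If this is FALSE for the real page, the XPath is not the problem: the element is simply not in the HTML the server sent (for example, if it is added later by JavaScript).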
I have also tried readLines and GET {httr}; both returned an error when reading the URL. I'm guessing it's because of special characters in the URL, but I don't know how to go about fixing it. The response is given below:
response [http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1]
  date: 2014-12-01 16:46
  status: 400
  content-type: text/html; charset=us-ascii
  size: 324 b
<!doctype html public "-//w3c//dtd html 4.01//en""http://www.w3.org/tr/html4/strict.dtd">
<html><head><title>bad request</title>
<meta http-equiv="content-type" content="text/html; charset=us-ascii"></head>
<body><h2>bad request - invalid url</h2>
<hr><p>http error 400. request url invalid.</p>
</body></html>
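The # marks the start of a URL fragment. Fragments are meant to stay on the client side (here they are presumably read by the page's JavaScript), so all of the search parameters in searchlink sit in the fragment rather than in a query string the server would use. httr's parse_url() makes that split visible. This is a sketch using a shortened copy of searchlink.

```r
library(httr)

# Shortened copy of searchlink; the real URL has the same shape
searchlink <- "http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&recordsperpage=9"

parts <- parse_url(searchlink)
parts$path      # "map.aspx"
parts$query     # NULL: there is no ?-query string
parts$fragment  # "cultureid=1&applicationid=1&recordsperpage=9"
```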
Try replacing the # in the URL with ?, so that the parameters are sent to the server as a query string:
library("httr")
url <- "http://www.realtor.ca/map.aspx?cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1"
res <- GET(url)
tt <- content(res)
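Instead of editing the long string by hand, the same # to ? swap can be done in code. A minimal base-R sketch; fragment_to_query is a hypothetical helper name, and it assumes the URL contains no ? before the #.

```r
# Hypothetical helper: replace the first "#" with "?" so the parameters
# become a query string. fixed = TRUE treats "#" and "?" literally.
fragment_to_query <- function(url) {
  sub("#", "?", url, fixed = TRUE)
}

fragment_to_query("http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1")
# "http://www.realtor.ca/map.aspx?cultureid=1&applicationid=1"
```

A URL without a # is returned unchanged, so the helper is safe to apply to every link.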
Then parse the HTML content in tt.
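A sketch of that last step with the XML package from the question. To stay self-contained it parses a small made-up HTML string rather than the live response; with the real page you would pass content(res, as = "text") instead. If the listings are inserted by JavaScript after the page loads, they will not be in the server's HTML and the result can still be an empty list.

```r
library(XML)

# Made-up stand-in for content(res, as = "text")
html_text <- '<html><body>
  <img class="compare-btn" title="add compare" src="a.png">
  <img class="photo" title="listing photo" src="b.png">
</body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# Same XPath and attribute extraction as in the question
classes <- xpathSApply(doc, "//img[@title='add compare']",
                       function(x) xmlGetAttr(x, "class"))
classes  # "compare-btn"
```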