r - GET {httr} returns a Bad Request response -


I am trying to scrape HTML elements from the URL stored in searchlink. The one method that has worked for me is htmlTreeParse {XML}. However, it is not returning the elements I'm looking for, for example: img[@title='add compare']

library(XML)

searchlink <- "http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1"

# Parse the page into an internal document so XPath queries can be run on it
doc <- htmlTreeParse(searchlink, useInternalNodes = TRUE)

# Collect the class attribute of every matching <img> element
classes <- xpathSApply(doc, "//img[@title='add compare']", function(x) xmlGetAttr(x, "class"))

The value of classes after running the code above:

list() 

I have also tried readLines and GET {httr}; both returned an error when reading the URL. I'm guessing it's because of the special characters in the URL, but I don't know how to go about fixing it. The response from GET is given below:

Response [http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1]
  Date: 2014-12-01 16:46
  Status: 400
  Content-Type: text/html; charset=us-ascii
  Size: 324 B
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>Bad Request</title>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head>
<body><h2>Bad Request - Invalid URL</h2>
<hr><p>HTTP Error 400. The request URL is invalid.</p>
</body></html>

Try removing the # in the URL and replacing it with ?, so that the parameters are sent to the server as an ordinary query string:

library("httr")

url <- "http://www.realtor.ca/map.aspx?cultureid=1&applicationid=1&recordsperpage=9&maximumresults=9&propertytypeid=300&transactiontypeid=2&sortorder=a&sortby=1&longitudemin=-114.52066040039104&longitudemax=-113.60536193847697&latitudemin=50.94776904194829&latitudemax=51.14246522072541&pricemin=0&pricemax=0&bedrange=0-0&bathrange=0-0&parkingspacerange=0-0&viewstate=m&longitude=-114.063011169434&latitude=51.0452194213867&zoomlevel=11&currentpage=1"

# Request the page (note the upper-case GET; R is case-sensitive)
res <- GET(url)

# Extract the body of the response
tt <- content(res)
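The replacement can also be done in code rather than by editing the string by hand. This is a minimal sketch using base R's sub(), assuming the URL contains a single # that should become the start of the query string. The shortened URL is only for illustration.

```r
# Shortened version of the URL from the question (illustration only).
searchlink <- "http://www.realtor.ca/map.aspx#cultureid=1&applicationid=1&currentpage=1"

# sub() replaces only the first match; fixed = TRUE treats "#" as a literal
# character rather than as a regular expression.
url <- sub("#", "?", searchlink, fixed = TRUE)
print(url)
```

The resulting url can then be passed to GET() in place of the hand-edited string.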

Then parse the HTML content in tt.
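As a rough sketch of that last step, the XPath query from the question can be run on the downloaded HTML with the XML package. The inline HTML string below is a made-up stand-in for the page, so the example runs without a network connection; the class names in it are hypothetical. With a real response, the text could be obtained with content(res, as = "text"). Whether the live page actually contains matching img elements is not something this sketch can confirm.

```r
library(XML)

# Hypothetical stand-in for the downloaded page. With httr this would be:
#   html <- content(res, as = "text")
html <- '<html><body>
  <img title="add compare" class="compareIcon" src="a.png"/>
  <img title="other" class="otherIcon" src="b.png"/>
</body></html>'

# Parse the HTML string into an internal document.
doc <- htmlParse(html, asText = TRUE)

# Collect the class attribute of every <img> whose title matches.
classes <- xpathSApply(doc, "//img[@title='add compare']", xmlGetAttr, "class")
print(classes)
```

Passing the text through htmlParse(asText = TRUE) keeps the example independent of what content() returns by default, which may differ between httr versions.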

