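Some sites probably filter out this kind of "web scraper" client. Setting the request headers so that the request looks like it comes from a browser should work.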


In short, you need to set the headers.


The code is as follows:

Cookies, HTTP headers, and other details to watch when simulating browser behavior with httpclient

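commons-httpclient is an open-source Apache project that provides a pure-Java HTTP client. It makes it easy to send HTTP requests, receive HTTP responses, manage cookies automatically, and so on.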

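The contact-list library needs the following features: automatic cookie management, setting HTTP headers, sending HTTP requests, receiving HTTP responses, following HTTP redirects, and logging HTTP requests and responses. Each is explained below: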

1. Automatic cookie management

public EmailImporter(String email, String password, String encoding) {
    ......
    client = new HttpClient();
    client.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
    client.getParams().setParameter("http.protocol.single-cookie-header", true);
}
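This sets the HttpClient cookie policy to CookiePolicy.BROWSER_COMPATIBILITY, which means the Java client handles cookies automatically the way a browser does. You can also adjust cookies by hand at run time, for example: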

Before logging in to hotmail, a cookie carrying the current time has to be set:
client.getState().addCookie(new Cookie("login.live.com", "CkTst", "G" + new Date().getTime())); 

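However, httpclient does not seem to offer a way to delete individual cookies, so I added two cookie-management methods: one keeps only the named cookies, the other removes the named cookies: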

protected void retainCookies(String[] cookieNames) {
    Cookie[] cookies = client.getState().getCookies();
    ArrayList<Cookie> retainCookies = new ArrayList<Cookie>();
    for (Cookie cookie : cookies) {
        // contains() works for any order; Arrays.binarySearch is only correct on a sorted array
        if (Arrays.asList(cookieNames).contains(cookie.getName())) {
            retainCookies.add(cookie);
        }
    }
    // HttpState has no per-cookie delete, so clear everything and add back the keepers
    client.getState().clearCookies();
    client.getState().addCookies(retainCookies.toArray(new Cookie[0]));
}

protected void removeCookies(String[] cookieNames) {
    Cookie[] cookies = client.getState().getCookies();
    ArrayList<Cookie> retainCookies = new ArrayList<Cookie>();
    for (Cookie cookie : cookies) {
        // keep every cookie whose name is NOT listed; contains() does not need a sorted array
        if (!Arrays.asList(cookieNames).contains(cookie.getName())) {
            retainCookies.add(cookie);
        }
    }
    client.getState().clearCookies();
    client.getState().addCookies(retainCookies.toArray(new Cookie[0]));
}
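
The name-matching step of these two methods can be tried out without httpclient on the classpath. The following is only a sketch under assumptions: the class CookieNameFilter and its methods are hypothetical, and plain strings stand in for Cookie objects. It uses a HashSet for the lookup, so the order of the names passed in does not matter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of commons-httpclient or contact-list. */
final class CookieNameFilter {

    /** Returns the names from cookieNames that are listed in wanted. */
    static List<String> retain(List<String> cookieNames, String[] wanted) {
        return filter(cookieNames, wanted, true);
    }

    /** Returns the names from cookieNames that are NOT listed in unwanted. */
    static List<String> remove(List<String> cookieNames, String[] unwanted) {
        return filter(cookieNames, unwanted, false);
    }

    private static List<String> filter(List<String> cookieNames, String[] listed, boolean keepListed) {
        // A hash set gives constant-time lookups and needs no sorting.
        Set<String> listedSet = new HashSet<>(Arrays.asList(listed));
        List<String> result = new ArrayList<>();
        for (String name : cookieNames) {
            if (listedSet.contains(name) == keepListed) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For example, CookieNameFilter.retain(names, new String[] {"lang", "CkTst"}) keeps only those two names, in the order they appear in names.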

2. Setting HTTP headers:

Setting the HTTP headers makes the mail server believe it is talking to a browser, which avoids the risk of the request being refused.

private void setHeaders(HttpMethod method) {
    // no trailing ';' after the last media type: a ';' must be followed by a parameter
    method.setRequestHeader("Accept", "text/html,application/xhtml+xml,application/xml");
    method.setRequestHeader("Accept-Language", "zh-cn");
    method.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3");
    method.setRequestHeader("Accept-Charset", encoding);
    method.setRequestHeader("Keep-Alive", "300");
    method.setRequestHeader("Connection", "Keep-Alive");
    method.setRequestHeader("Cache-Control", "no-cache");
}
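
The same idea can be sketched with only the JDK, for readers who do not have commons-httpclient at hand. This is an illustration under assumptions, not the contact-list code: the class BrowserLikeHeaders and the method prepare are hypothetical, and java.net.HttpURLConnection is swapped in for HttpClient. Connection and Keep-Alive are left out because the JDK handles connection reuse itself. No request is sent; the connection is only prepared.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: prepares a JDK connection with browser-like headers. */
final class BrowserLikeHeaders {

    // Same User-Agent string as in setHeaders above.
    private static final String USER_AGENT =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3";

    /** Opens (but does not connect) a connection and sets the headers on it. */
    static HttpURLConnection prepare(String url, String referer) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml");
            conn.setRequestProperty("Accept-Language", "zh-cn");
            conn.setRequestProperty("User-Agent", USER_AGENT);
            conn.setRequestProperty("Cache-Control", "no-cache");
            if (referer != null) {
                conn.setRequestProperty("Referer", referer);
            }
            return conn;
        } catch (IOException e) {
            // Malformed URL or unsupported protocol; rethrown unchecked to keep callers short.
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling prepare("http://example.com/inbox", "http://example.com/login") returns a connection whose headers can be inspected with getRequestProperty before anything goes over the network; the example.com URLs are placeholders.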

In addition, set the Referer header on both GET and POST requests, and set Content-Type on POST requests:

protected String doPost(String actionUrl, NameValuePair[] params, String referer) throws HttpException, IOException {
    ......
    method.setRequestHeader("Referer", referer);
    method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    ......
}

3. Sending HTTP requests and receiving HTTP responses. contact-list only uses GET and POST requests, which I wrapped in simple helpers:

protected String doGet(String url, String referer) throws HttpException, IOException {
    GetMethod method = new GetMethod(url);
    setHeaders(method);
    method.setRequestHeader("Referer", referer);
    // log request
    try {
        client.executeMethod(method);
        String responseStr = readInputStream(method.getResponseBodyAsStream());
        // log response
        lastUrl = method.getURI().toString();
        return responseStr;
    } finally {
        // always give the connection back, even if the request or the read fails
        method.releaseConnection();
    }
}

protected String doPost(String actionUrl, NameValuePair[] params, String referer) throws HttpException, IOException {
    PostMethod method = new PostMethod(actionUrl);
    setHeaders(method);
    method.setRequestHeader("Referer", referer);
    method.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    method.setRequestBody(params);
    // log request
    String responseStr;
    try {
        client.executeMethod(method);
        responseStr = readInputStream(method.getResponseBodyAsStream());
    } finally {
        // release before any redirect is followed, even if the request or the read fails
        method.releaseConnection();
    }
    // log response
    if (method.getResponseHeader("Location") != null) {
        // do redirect (see item 4)
    } else {
        lastUrl = method.getURI().toString();
        return responseStr;
    }
}
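
The readInputStream helper called in doGet and doPost is not shown in the article. A minimal sketch of what such a helper might look like follows; the class name StreamText, the extra encoding parameter, and the choice to wrap IOException in UncheckedIOException are all assumptions made for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helper: reads a whole response body into a String. */
final class StreamText {

    /** Reads the stream to the end and decodes the bytes with the given charset name. */
    static String readInputStream(InputStream in, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            // read() returns -1 once the end of the stream is reached
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Decode only after all bytes are collected, so multi-byte characters
        // that straddle two chunks are not corrupted.
        return new String(buffer.toByteArray(), Charset.forName(encoding));
    }
}
```

Collecting the bytes first and decoding once matters for encodings such as UTF-8 or GBK, where a single character can span a chunk boundary.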

4. HTTP redirects. There are two main kinds. The first follows the Location HTTP header:

if (method.getResponseHeader("Location").getValue().startsWith("http")) {
    return doGet(method.getResponseHeader("Location").getValue());
} else {
    return doGet("http://" + getResponseHost(method) + method.getResponseHeader("Location").getValue());
}

The second follows window.location.replace in JavaScript.
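
Both kinds of redirect target can be worked out with the JDK alone. The sketch below is an assumption-laden illustration rather than the contact-list implementation: the class RedirectTargets and its methods are hypothetical, java.net.URI.resolve is used in place of the manual "http://" + host concatenation, and the regular expression only recognizes the simple case of a quoted string literal passed to window.location.replace.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: computes the URL a redirect points to. */
final class RedirectTargets {

    // Matches window.location.replace('...') or window.location.replace("...").
    private static final Pattern JS_REPLACE = Pattern.compile(
        "window\\.location\\.replace\\(\\s*[\"']([^\"']+)[\"']\\s*\\)");

    /** Resolves a Location header value (absolute or relative) against the request URL. */
    static String resolveLocation(String requestUrl, String location) {
        // URI.resolve returns absolute locations unchanged and
        // resolves relative ones against the request URL.
        return URI.create(requestUrl).resolve(location).toString();
    }

    /** Finds the first window.location.replace target in a page, if there is one. */
    static Optional<String> findJsRedirect(String html) {
        Matcher m = JS_REPLACE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Resolving against the full request URL also keeps the original scheme and port, and handles targets such as "next.html" that do not start with a slash. The value returned by findJsRedirect may itself be relative, so it can be passed through resolveLocation as well.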

5. Logging requests and responses, which is very important for debugging:

private void logGetRequest(GetMethod method) throws URIException {
    logger.debug("do get request: " + method.getURI().toString());
    logger.debug("header:\n" + getHeadersStr(method.getRequestHeaders()));
    logger.debug("cookie:\n" + getCookieStr());
}

private void logGetResponse(GetMethod method, String responseStr) throws URIException {
    logger.debug("do get response: " + method.getURI().toString());
    logger.debug("header:\n" + getHeadersStr(method.getResponseHeaders()));
    logger.debug("body:\n" + responseStr);
}

private void logPostRequest(PostMethod method) throws URIException {
    logger.debug("do post request: " + method.getURI().toString());
    logger.debug("header:\n" + getHeadersStr(method.getRequestHeaders()));
    logger.debug("body:\n" + getPostBody(method.getParameters()));
    logger.debug("cookie:\n" + getCookieStr());
}

private void logPostResponse(PostMethod method, String responseStr) throws URIException {
    logger.debug("do post response: " + method.getURI().toString());
    logger.debug("header:\n" + getHeadersStr(method.getResponseHeaders()));
    logger.debug("body:\n" + responseStr);
}

// One "Name: value" line per header
private String getHeadersStr(Header[] headers) {
    StringBuilder builder = new StringBuilder();
    for (Header header : headers) {
        builder.append(header.getName()).append(": ").append(header.getValue()).append("\n");
    }
    return builder.toString();
}

// One "name:value" line per POST parameter
private String getPostBody(NameValuePair[] postValues) {
    StringBuilder builder = new StringBuilder();
    for (NameValuePair pair : postValues) {
        builder.append(pair.getName()).append(":").append(pair.getValue()).append("\n");
    }
    return builder.toString();
}

// One "domain:name=value;path;expiry;secure;" line per cookie held by the client
private String getCookieStr() {
    Cookie[] cookies = client.getState().getCookies();
    StringBuilder builder = new StringBuilder();
    for (Cookie cookie : cookies) {
        builder.append(cookie.getDomain()).append(":")
               .append(cookie.getName()).append("=").append(cookie.getValue()).append(";")
               .append(cookie.getPath()).append(";")
               .append(cookie.getExpiryDate()).append(";")
               .append(cookie.getSecure()).append(";\n");
    }
    return builder.toString();
}


