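Web Crawling: Simulating HTTP Requests with HttpClient
1 Anatomy of an HTTP request
Before you write code that simulates requests, you first need to know which parts of a request you have to care about.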
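1.1 URL
Nothing much to say here: it is the address the request is sent to.
1.2 Request method
Common ones are GET, PUT, DELETE and POST.
1.3 Request headers
The header fields sent along with the request, such as User-Agent, which crawlers set all the time.
1.4 Request body
Mostly the parameters attached to a POST request, such as a submitted form.
1.5 Cookie
Small pieces of state (a session ID, for example) that the server asks the browser to store and that the browser sends back with later requests.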
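If two HTTP requests are identical in all five parts above, the server sees them as equivalent at the HTTP level. So to simulate a request, you need to be able to set all five. The rest of this post shows how to do that with HttpClient.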
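To make the five parts concrete, here is a minimal sketch that uses only the JDK to render them into the text of an HTTP/1.1 request. The class name `RawRequest`, its `render` method, and the sample values are illustrative assumptions and not part of HttpClient. A real client adds more headers on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: shows where each of the five parts ends up on the wire.
class RawRequest {
    static String render(String method, String host, String path,
                         Map<String, String> headers, String cookie, String body) {
        StringBuilder sb = new StringBuilder();
        // Method and URL (path plus Host header)
        sb.append(method).append(' ').append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(host).append("\r\n");
        // Request headers
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        // Cookies travel in a header of their own
        if (cookie != null) {
            sb.append("Cookie: ").append(cookie).append("\r\n");
        }
        // Request body, preceded by its length and a blank line
        if (body != null) {
            int length = body.getBytes(StandardCharsets.UTF_8).length;
            sb.append("Content-Length: ").append(length).append("\r\n");
        }
        sb.append("\r\n");
        if (body != null) {
            sb.append(body);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "demo-agent"); // placeholder value
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        System.out.print(render("POST", "mp.csdn.net", "/", headers, "co=ba", "a=b"));
    }
}
```

Two calls with the same five inputs produce the same text, which is the sense in which the server cannot tell the requests apart.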
2 Getting the HttpClient jars
You can download the latest jars from http://hc.apache.org/downloads.cgi
or build with Maven, using this dependency:
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.5</version>
    </dependency>
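3 Common operations
3.1 Building different kinds of requests
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.client.methods.HttpPut;

    String url = "https://mp.csdn.net/";
    // Build a GET request
    HttpGet httpGet = new HttpGet(url);
    // Build a POST request
    HttpPost httpPost = new HttpPost(url);
    // Build a PUT request
    HttpPut httpPut = new HttpPut(url);
    // ...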
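3.2 Setting request headers
    // Set request headers
    httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/60.0");
    // HttpClient derives Host from the URL when it is not set; set it only to override that value
    httpGet.setHeader("Host", "mp.csdn.net");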
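3.3 Setting the request body
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.http.HttpEntity;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.entity.UrlEncodedFormEntity;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.StringEntity;
    import org.apache.http.message.BasicNameValuePair;

    // The body is set through an HttpEntity. A request has one body,
    // so use option 1 or option 2: a second setEntity call replaces the first.

    // Option 1: form data, URL-encoded as UTF-8
    List<NameValuePair> paramList = new ArrayList<>();
    paramList.add(new BasicNameValuePair("a", "b"));
    HttpEntity formData = new UrlEncodedFormEntity(paramList, StandardCharsets.UTF_8);
    httpPost.setEntity(formData);

    // Option 2: a JSON string (keys and string values must be quoted to be valid JSON)
    StringEntity stringEntity = new StringEntity("{\"a\":\"b\"}", ContentType.APPLICATION_JSON);
    httpPost.setEntity(stringEntity);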
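It can help to see what a form body looks like once it is encoded. The sketch below uses the JDK's `URLEncoder` as a stand-in to build an `application/x-www-form-urlencoded` string. The class `FormBodySketch` and the sample parameters are assumptions for illustration, and the escaping of a few punctuation characters may differ slightly from what `UrlEncodedFormEntity` emits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: approximates the body a URL-encoded form entity sends.
class FormBodySketch {
    static String encode(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&'); // pairs are joined with '&'
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "b");
        params.put("q", "hello world");        // the space becomes '+'
        params.put("name", "\u5f20\u4e09");    // non-ASCII becomes %XX bytes of its UTF-8 form
        System.out.println(encode(params));
    }
}
```

This is why the charset argument matters: the same non-ASCII value encodes to different bytes under a different charset, and the server will misread it if the two sides disagree.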
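3.5 Setting cookies and sending the request
    import java.io.IOException;
    import org.apache.http.client.CookieStore;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.impl.client.BasicCookieStore;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.cookie.BasicClientCookie;

    // Create a client backed by our own cookie store
    CookieStore cookieStore = new BasicCookieStore();
    CloseableHttpClient httpClient = HttpClients.custom()
            .setDefaultCookieStore(cookieStore)
            .build();
    BasicClientCookie cookie = new BasicClientCookie("co", "ba");
    cookie.setDomain("mp.csdn.net"); // host name only, not the full URL
    cookie.setPath("/");
    cookieStore.addCookie(cookie);
    // Send the request
    CloseableHttpResponse httpResponse = null;
    try {
        httpResponse = httpClient.execute(httpPost);
    } catch (IOException e) {
        e.printStackTrace();
    }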
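A cookie is only sent when its domain matches the host of the request, so the domain should be a host name such as mp.csdn.net rather than a full URL. The sketch below derives the host with `java.net.URI` and uses the JDK's own `HttpCookie.domainMatches` as a stand-in to show the difference. HttpClient has its own matching code, so treat this as an illustration of the rule rather than of HttpClient's internals. The class name `CookieDomainSketch` is an assumption.

```java
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical helper: derives a cookie domain from a request URL.
class CookieDomainSketch {
    static String hostOf(String url) {
        // URI.getHost() strips the scheme, port and path
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        String url = "https://mp.csdn.net/";
        String host = hostOf(url);
        System.out.println("host: " + host);
        // Host name used as the cookie domain
        System.out.println("host as domain matches: " + HttpCookie.domainMatches(host, host));
        // Full URL used as the cookie domain
        System.out.println("url as domain matches:  " + HttpCookie.domainMatches(url, host));
    }
}
```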
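3.6 Reading the response
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // HTTP status code of the response
    int statusCode = httpResponse.getStatusLine().getStatusCode();
    // Response body, decoded as UTF-8 (adjust if the server declares another charset).
    // EntityUtils.toString(httpResponse.getEntity(), StandardCharsets.UTF_8) is a shorter alternative.
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(httpResponse.getEntity().getContent(), StandardCharsets.UTF_8))) {
        StringBuilder sb = new StringBuilder();
        String line;
        String NL = System.getProperty("line.separator");
        while ((line = bufferedReader.readLine()) != null) {
            sb.append(line).append(NL);
        }
        String res = sb.toString();
    } catch (IOException e) {
        e.printStackTrace();
    }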
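The charset used to decode the body matters as soon as the page contains non-ASCII text. The sketch below reads a stream into a string with an explicit charset and shows what happens when the wrong one is used. It runs on an in-memory byte array instead of a real response, and the class `BodyReaderSketch` with its `readAll` method is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes an entire stream with the given charset.
class BodyReaderSketch {
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the stream) even on failure
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a response body: HTML containing two Chinese characters
        byte[] bytes = "<p>\u4e2d\u6587</p>".getBytes(StandardCharsets.UTF_8);
        // Decoded with the charset it was written in: text is intact
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
        // Decoded with the wrong charset: the non-ASCII part turns into mojibake
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1));
    }
}
```

With a real response, the stream would come from `httpResponse.getEntity().getContent()`, and the charset is best taken from the response's Content-Type header when the server provides one.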
Original article: https://blog.csdn.net/qq_34661726/article/details/80599488