吾爱破解 (52pojie) - LCG - LSG | Android cracking | Virus analysis | www.52pojie.cn

[Other reposts] VBS can crawl too: scraping photos from 美图查 with VBS

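xiaomingtt posted on 2020-2-21 15:02
Python crawlers have been popular for the past couple of years. I have only scratched the surface of Python and cannot write a crawler with it yet, and I don't know how Python crawlers work either. I do know how to batch-download what I need with VBS, though, and that should count as crawling too.
A couple of days ago I came across 美图查 on the 52 forum, found that the photo quality was quite high, and wanted to scrape it, which led to the code below.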
[Visual Basic]
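' Example of logging in with cookies (not needed for 美图查, so it is commented out):
' POST the login form and collect the cookies, then send them along with the next GET.
'u = "https://wx.xxx.com/action.php"
'cookies=HttpRequest(u, "POST", "getcookie", "mode=login&u=adminn&p=123", "")
'msgbox cookies
'u = "https://wx.xxx.com/nowyuyue.php"
's = HttpRequest(u, "GET", "setcookie", "", cookies)
'wscript.echo s

' Gallery URLs run from the first to the second address below.
'http://meitucha.com/a/1
'http://meitucha.com/a/32157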
For i = 1 To 32157
	j = 1
	Do
		u = "http://meitucha.com/a/" & i & "?page=" & j
		s = HttpRequest(u, "GET", "", "", "")

		' Parse the page with the htmlfile object instead of regular expressions.
		Set html = CreateObject("htmlfile")
		html.designMode = "on"
		html.write s
		' The page title becomes the folder name, minus characters Windows forbids in paths.
		title = html.title
		title = Replace(Replace(Replace(Replace(title, "\", ""), "/", ""), "*", ""), "?", "")
		title = Replace(Replace(Replace(Replace(title, "<", ""), ">", ""), "|", ""), ":", "")
		title = Replace(Replace(Replace(title, Chr(34), ""), vbLf, ""), vbCr, "")
		' Download every image whose class is "tupian_img".
		For Each M In html.getElementsByTagName("img")
			If M.className = "tupian_img" Then
				img = M.getAttribute("src")
				filename = Right(img, Len(img) - InStrRev(img, "/"))
				download img, "D:\" & title & "\" & filename
				WScript.Sleep 500
			End If
		Next
		' Look for a "下一页" (next page) link. The page counter advances only once,
		' even if the link appears several times on the page.
		nextpage = False
		For Each A In html.getElementsByTagName("a")
			If A.className = "nxt" And A.innerText = "下一页" Then nextpage = True
		Next
		If nextpage Then j = j + 1
		WScript.Sleep 1000
	Loop Until nextpage = False
	WScript.Sleep 1000
Next


Function HttpRequest(url, mode, cook, data, cookies)
    ' Parameters: URL, "GET"/"POST", "getcookie"/"setcookie", request data, cookies.
    ' Returns the response text, or the collected cookies when cook = "getcookie".
    ' Returns "" when the status is not 200.
    HttpRequest = ""
    Set cHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    With cHttp
        .SetTimeouts 80000, 80000, 80000, 80000
        .Open mode, url, False
        If cook = "setcookie" Then .SetRequestHeader "Cookie", cookies
        .SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
        ' Content-Length is left to WinHTTP: Len(data) counts characters, not bytes.
        If mode = "POST" Then .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .SetRequestHeader "Referer", url
        .SetRequestHeader "Cache-Control", "no-cache"
        .SetRequestHeader "Connection", "keep-alive"
        .Send data
        If .WaitForResponse Then
            If .Status = 200 Then
                If cook = "getcookie" Then
                    HttpRequest = GetCookie(.GetAllResponseHeaders)
                Else
                    HttpRequest = .ResponseText
                End If
            End If
        End If
    End With
End Function
 
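Function GetCookie(str)
    ' Collect the name=value part of every Set-Cookie response header
    ' into a single string that can be sent back as a Cookie request header.
    ' A Set-Cookie line without any ";" is handled as well.
    Dim headerLine, pair, p, Cookie
    Cookie = ""
    For Each headerLine In Split(str, vbCrLf)
        If LCase(Left(headerLine, 11)) = "set-cookie:" Then
            pair = Trim(Mid(headerLine, 12))
            p = InStr(pair, ";")
            If p > 0 Then pair = Left(pair, p - 1)
            If Cookie <> "" Then Cookie = Cookie & "; "
            Cookie = Cookie & pair
        End If
    Next
    GetCookie = Cookie
End Function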


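Sub download(url, target)
	' Save the resource at url to the local path target, creating the folder if needed.
	Const adTypeBinary = 1
	Const adSaveCreateOverWrite = 2
	Dim fso, path, http, ado
	Set fso = CreateObject("Scripting.FileSystemObject")
	path = Left(target, InStrRev(target, "\"))
	If Not fso.FolderExists(path) Then fso.CreateFolder path
	Set http = CreateObject("Msxml2.XMLHTTP")
	http.open "GET", url, False
	http.send
	' Skip failed requests so error pages are not saved as images.
	If http.Status <> 200 Then Exit Sub
	Set ado = CreateObject("Adodb.Stream")
	ado.Type = adTypeBinary
	ado.Open
	ado.Write http.responseBody
	' Overwrite an existing file so the script can be run again.
	ado.SaveToFile target, adSaveCreateOverWrite
	ado.Close
End Sub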
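
The nested Replace calls in the main loop strip the characters that Windows does not allow in folder names, and the Right/InStrRev expression takes the file name from the image URL. As a rough sketch, these two steps could be factored into small helpers. SafeName and FileNameFromUrl are hypothetical names that do not appear in the original script, and the sample inputs are made up.

```vbscript
Option Explicit

' Hypothetical helper: remove characters that Windows forbids in file and folder names,
' plus line breaks, then trim surrounding spaces.
Function SafeName(text)
    Dim ch, result
    result = text
    For Each ch In Array("\", "/", ":", "*", "?", Chr(34), "<", ">", "|", vbCr, vbLf)
        result = Replace(result, ch, "")
    Next
    SafeName = Trim(result)
End Function

' Hypothetical helper: return everything after the last "/" in a URL.
Function FileNameFromUrl(url)
    FileNameFromUrl = Mid(url, InStrRev(url, "/") + 1)
End Function

WScript.Echo SafeName("a/b: c?")                                ' prints "ab c"
WScript.Echo FileNameFromUrl("http://example.com/img/001.jpg")  ' prints "001.jpg"
```

In the main loop this would read title = SafeName(html.title). If an image URL carried a query string, passing the result of FileNameFromUrl through SafeName as well would keep the "?" out of the file name.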

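Speaking of VBS crawlers, this goes back 10 years. Back then the blog of the magazine 《电脑爱好者》 was of high quality. I had a computer but no way to get online, so I wrote a bit of VBS to grab all of the blog posts and read them later at my own pace. That script was even published in the magazine (under the name of an ex-girlfriend I had already broken up with by then).
[Attached image: 450219954839b82c7bf4800f.jpg]
Both crawlers are written in VBS, but the techniques differ. Back then I used XMLHTTP and regular expressions: logging in with cookies was not possible, and the regular expressions were tedious to write. Now I use WinHttpRequest and the htmlfile object: forging headers and logging in with cookies are both no problem, and htmlfile is far more convenient than regular expressions for processing HTML. 美图查 does not involve logging in, so the script only issues a GET. The other usages are shown in the script (the commented-out lines at the top) for reference.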
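
The difference between the two approaches can be sketched on a small, made-up HTML fragment. The snippet is an illustration, not part of the original script: the sample markup and the regular expression are assumptions, while the htmlfile calls are the same ones the script uses.

```vbscript
Option Explicit

Dim sample, re, m, html, img

' Made-up HTML fragment standing in for a downloaded page.
sample = "<html><head><title>Demo</title></head><body>" & _
         "<img class=""tupian_img"" src=""http://example.com/img/001.jpg"">" & _
         "<img class=""logo"" src=""http://example.com/logo.png"">" & _
         "</body></html>"

' Older approach: a regular expression. It only matches when the attributes
' appear in exactly this order and with this quoting.
Set re = New RegExp
re.Global = True
re.IgnoreCase = True
re.Pattern = "<img\s+class=""tupian_img""\s+src=""([^""]+)"""
For Each m In re.Execute(sample)
    WScript.Echo "regex:    " & m.SubMatches(0)
Next

' Current approach: let the htmlfile object parse the markup, then walk the DOM.
Set html = CreateObject("htmlfile")
html.designMode = "on"   ' same setting as in the script
html.write sample
For Each img In html.getElementsByTagName("img")
    If img.className = "tupian_img" Then
        WScript.Echo "htmlfile: " & img.getAttribute("src")
    End If
Next
```

Both loops should report only the first image. The regular expression stops matching as soon as the site reorders the attributes or adds a new one, whereas the DOM lookup depends only on the class name.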


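shi128862 posted on 2020-2-21 15:27
Nice post. I have been writing a crawler in VB these past few days, so I learned something!
JuncoJet posted on 2020-2-21 15:27
With XMLHTTP and WinINet, cookies are managed by the system.
With WinHTTP you can control the cookies yourself (often used for automated order brushing).
xunfish posted on 2020-2-22 09:12
[OP] xiaomingtt posted on 2020-2-22 11:35
Quote: xunfish posted on 2020-2-22 09:12
This site is mine...

Oh? Haha, I didn't really crawl it, I only tried two. If you shared the pictures yourself, nobody would need to go to the trouble.
caozb posted on 2020-3-13 12:23
What causes this error?
[Attached image: QQ截图20200313122255.jpg]
ljm009 posted on 2020-4-21 01:24
Trying it out.
LeavesKing posted on 2020-5-18 18:11
Could the OP export this as an exe? Thanks.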