
《阿七美图馆》 Scraper

January 7, 2019 • Resources & Tutorials

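  • I came across an image gallery site run by a friend from my group chat, so I wrote a scraper to download its images.
  • Runtime environment: python3 (the script imports the requests and lxml packages)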
#!/usr/bin/env python3
import requests
from lxml import etree
import os

def get_requests(url):
    """Download every image on one post page into 阿七美图馆/<album>/."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"
    }

    # A timeout keeps one stalled request from hanging the whole crawl.
    response = requests.get(url=url, headers=headers, timeout=10)
    response.raise_for_status()
    html = response.content.decode()

    result = etree.HTML(html)
    img_list = result.xpath("//div[@class='post row']/div/img/@data-original")
    name_list = result.xpath("//div[@class='post row']/div/img/@title")
    # Renamed from `dir`, which shadowed the built-in of the same name.
    album = ''.join(result.xpath("//div[@class='post-info']/div[1]/span[3]/text()"))
    path = os.path.join(os.getcwd(), '阿七美图馆', album)

    for img, name in zip(img_list, name_list):
        # img = 'https:' + img
        img_data = requests.get(url=img, headers=headers, timeout=10).content
        print("Downloaded image: %s   URL: %s" % (name, img))

        os.makedirs(path, exist_ok=True)
        file_path = os.path.join(path, name + '.jpg')
        with open(file_path, 'wb') as file:
            file.write(img_data)


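if __name__ == '__main__':
    # Try post IDs in order; IDs that do not exist or fail to download are skipped.
    for item in range(3, 1000):
        url = "http://a7a7.net/index.php/archives/{}/".format(item)
        try:
            get_requests(url)
        except Exception as exc:
            # Catch Exception rather than everything, so Ctrl+C still stops the crawl.
            print("Skipped %s: %s" % (url, exc))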
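
The script above saves each image under 阿七美图馆/<album>/ in the current working directory, using the image title as the file name. Titles may contain characters that are not valid in file names, and the commented-out 'https:' line suggests the image URLs may sometimes be protocol-relative. A minimal sketch of helpers that could cover both cases follows; the names safe_name, build_save_path and absolute_url are hypothetical and not part of the original script, and the set of rejected characters is an assumption based on common Windows restrictions.

```python
import os
import re
from urllib.parse import urljoin

# Assumption: characters commonly rejected in file names, mainly on Windows.
_UNSAFE = re.compile(r'[\\/:*?"<>|\r\n\t]+')


def safe_name(text, fallback="untitled"):
    """Replace runs of unsafe characters with '_' and trim spaces and dots."""
    cleaned = _UNSAFE.sub("_", text).strip(" .")
    return cleaned or fallback


def build_save_path(base_dir, album, title, ext=".jpg"):
    """Return base_dir/阿七美图馆/<album>/<title><ext> with sanitized parts."""
    return os.path.join(base_dir, "阿七美图馆", safe_name(album), safe_name(title) + ext)


def absolute_url(page_url, src):
    """Resolve a relative or protocol-relative image URL against the page URL."""
    return urljoin(page_url, src)
```

Inside get_requests, the image URL could be passed through absolute_url(url, img) before downloading, and the target file could be computed with build_save_path(os.getcwd(), album, name), where album is the text the script extracts for the folder name.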



7 Comments
  1. 苏     Windows 10 /    Google Chrome

    When will you get some HD images?

  2. 克莱     Windows 10 /    FireFox

    Where do the downloaded files end up?

  3. 心语难诉     Windows 10 /    Google Chrome

    Dropping by to say hi~ ::aru:pouting::

    1. 左岸     Android Pie /    Google Chrome

      @心语难诉 Hehe, I visit your site every day ::quyin:1huaji::

  4. 左岸     Windows 10 /    Google Chrome

    ::quyin:1huaji::

  5. 难拥你i     Windows 7 /    Google Chrome

    Awesome work, 666

    1. 左岸     Windows 10 /    Google Chrome

      @难拥你i ::aru:cryingface:: ::aru:cryingface:: ::aru:cryingface::