How to Scrape Goat.com Fashion Apparel Data with Python

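Goat.com is an up-and-coming global apparel platform covering both new and used fashion items. Goat is known for its huge product dataset, which is constantly updated and easy to scrape. In this tutorial, we'll take a look at how to scrape goat.com using Python. We'll be using the hidden web data scraping technique, which makes this task remarkably simple. Let's get started!

Why Scrape Goat.com?

Goat.com is quickly becoming one of the largest fashion retailers in the world. It hosts a massive public dataset of fashion items that includes product prices, images, features, descriptions and performance. Scraping Goat.com is a good way to understand the apparel market and its trends, and to gain a competitive edge in this data-driven market.

Goat.com Scrape Preview

In this tutorial, we'll focus on scraping product data and product search. Because we'll be using the hidden web data scraping technique, we get access to the entire product dataset. An example of a full Goat.com product dataset: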

{
  "brandName": "Air Jordan",
  "careInstructions": "",
  "color": "White",
  "composition": "",
  "designer": "Tinker Hatfield",
  "details": "Summit White/Fire Red/Black/Cement Grey",
  "fit": "",
  "forAuction": false,
  "gender": [
    "men"
  ],
  "id": 1101598,
  "internalShot": "taken",
  "maximumOfferCents": 200000,
  "midsole": "Air",
  "minimumOfferCents": 2500,
  "modelSizing": "",
  "name": "Air Jordan 3 Retro 'White Cement Reimagined'",
  "nickname": "White Cement Reimagined",
  "productCategory": "shoes",
  "productType": "sneakers",
  "releaseDate": "2023-03-11T23:59:59.999Z",
  "releaseDateName": "",
  "silhouette": "Air Jordan 3",
  "sizeBrand": "air_jordan",
  "sizeRange": [
    3.5,
    4,
    4.5,
    5,
    5.5,
    6,
    6.5,
    7,
    7.5,
    8,
    8.5,
    9,
    9.5,
    10,
    10.5,
    11,
    11.5,
    12,
    12.5,
    13,
    14,
    15,
    16,
    17,
    18
  ],
  "sizeType": "numeric_sizes",
  "sizeUnit": "us",
  "sku": "DN3707 100",
  "slug": "air-jordan-3-retro-white-cement-reimagined-dn3707-100",
  "specialDisplayPriceCents": 21000,
  "specialType": "standard",
  "status": "active",
  "upperMaterial": "Leather",
  "availableSizesNew": [],
  "availableSizesNewV2": [],
  "availableSizesNewWithDefects": [],
  "availableSizesUsed": [],
  "lowestPriceCents": 0,
  "newLowestPriceCents": 0,
  "usedLowestPriceCents": 0,
  "productTaxonomy": [],
  "localizedSpecialDisplayPriceCents": {
    "currency": "USD",
    "amount": 21000,
    "amountUsdCents": 21000
  },
  "category": [
    "Lifestyle"
  ],
  "micropostsCount": 0,
  "sellingCount": 0,
  "usedForSaleCount": 0,
  "withDefectForSaleCount": 0,
  "isWantable": true,
  "isOwnable": true,
  "isResellable": true,
  "isOfferable": true,
  "directShipping": false,
  "isFashionProduct": false,
  "isRaffleProduct": false,
  "renderImagesInOrder": false,
  "applePayOnlyPromo": false,
  "singleGender": "men",
  "storyHtml": "<p>The Air Jordan 3 Retro 'White Cement Reimagined' brings back one of the original colorways, celebrating the 35th anniversary of the AJ3. Designed by Tinker Hatfield, the Reimagined iteration is built to the original 1988 specs. Returning true to form, the white leather upper pairs with elephant print overlays on the heel and toe while hits of Varsity Red appear on the lace loops and the Jumpman logo on the tongue. Nike Air is emblazoned on the back heel tab, while a visible Air unit provides lightweight cushioning. A slight hint of yellowing appears on the back heel and midsole, giving the shoe a vintage look.</p>\n",
  "story": "The Air Jordan 3 Retro 'White Cement Reimagined' brings back one of the original colorways, celebrating the 35th anniversary of the AJ3. Designed by Tinker Hatfield, the Reimagined iteration is built to the original 1988 specs. Returning true to form, the white leather upper pairs with elephant print overlays on the heel and toe while hits of Varsity Red appear on the lace loops and the Jumpman logo on the tongue. Nike Air is emblazoned on the back heel tab, while a visible Air unit provides lightweight cushioning. A slight hint of yellowing appears on the back heel and midsole, giving the shoe a vintage look.",
  "pictureUrl": "https://image.goat.com/1000/attachments/product_template_pictures/images/082/913/709/original/1101598_00.png.png",
  "mainGlowPictureUrl": "https://www.jingzhengli.com/wp-content/uploads/2023/06/1101598_00.png.png",
  "mainPictureUrl": "https://image.goat.com/750/attachments/product_template_pictures/images/082/913/709/original/1101598_00.png.png",
  "gridGlowPictureUrl": "https://www.jingzhengli.com/wp-content/uploads/2023/06/1101598_00.png.png",
  "gridPictureUrl": "https://image.goat.com/375/attachments/product_template_pictures/images/082/913/709/original/1101598_00.png.png",
  "sizeOptions": [
    {
      "presentation": "3.5",
      "value": 3.5
    },
    {
      "presentation": "4",
      "value": 4
    },
    {
      "presentation": "4.5",
      "value": 4.5
    },
    {
      "presentation": "5",
      "value": 5
    },
    {
      "presentation": "5.5",
      "value": 5.5
    },
    {
      "presentation": "6",
      "value": 6
    },
    {
      "presentation": "6.5",
      "value": 6.5
    },
    {
      "presentation": "7",
      "value": 7
    },
    {
      "presentation": "7.5",
      "value": 7.5
    },
    {
      "presentation": "8",
      "value": 8
    },
    {
      "presentation": "8.5",
      "value": 8.5
    },
    {
      "presentation": "9",
      "value": 9
    },
    {
      "presentation": "9.5",
      "value": 9.5
    },
    {
      "presentation": "10",
      "value": 10
    },
    {
      "presentation": "10.5",
      "value": 10.5
    },
    {
      "presentation": "11",
      "value": 11
    },
    {
      "presentation": "11.5",
      "value": 11.5
    },
    {
      "presentation": "12",
      "value": 12
    },
    {
      "presentation": "12.5",
      "value": 12.5
    },
    {
      "presentation": "13",
      "value": 13
    },
    {
      "presentation": "14",
      "value": 14
    },
    {
      "presentation": "15",
      "value": 15
    },
    {
      "presentation": "16",
      "value": 16
    },
    {
      "presentation": "17",
      "value": 17
    },
    {
      "presentation": "18",
      "value": 18
    }
  ],
  "robotAssets": [],
  "productTemplateExternalPictures": [
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/710/medium/1101598_01.jpg.jpeg?1672441264",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/710/grid/1101598_01.jpg.jpeg?1672441264",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 1
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/707/medium/1101598_02.jpg.jpeg?1672441263",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/707/grid/1101598_02.jpg.jpeg?1672441263",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 2
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/708/medium/1101598_03.jpg.jpeg?1672441263",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/708/grid/1101598_03.jpg.jpeg?1672441263",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 3
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/704/medium/1101598_04.jpg.jpeg?1672441261",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/704/grid/1101598_04.jpg.jpeg?1672441261",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 4
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/701/medium/1101598_05.jpg.jpeg?1672441261",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/701/grid/1101598_05.jpg.jpeg?1672441261",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 5
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/696/medium/1101598_06.jpg.jpeg?1672441260",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/696/grid/1101598_06.jpg.jpeg?1672441260",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 6
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/697/medium/1101598_07.jpg.jpeg?1672441260",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/697/grid/1101598_07.jpg.jpeg?1672441260",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 7
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/694/medium/1101598_08.jpg.jpeg?1672441260",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/694/grid/1101598_08.jpg.jpeg?1672441260",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 8
    },
    {
      "mainPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/695/medium/1101598_11.jpg.jpeg?1672441260",
      "gridPictureUrl": "https://image.goat.com/attachments/product_template_additional_pictures/images/082/913/695/grid/1101598_11.jpg.jpeg?1672441260",
      "dominantColor": "#000000",
      "sourceUrl": "https://www.goat.com",
      "attributionUrl": "GOAT",
      "aspect": 1.5,
      "order": 11
    }
  ],
  "offers": [
    {
      "size": 11,
      "gmcSku": "196153288270",
      "price": "300.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 10.5,
      "gmcSku": "196153288263",
      "price": "301.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 12.5,
      "gmcSku": "196153288300",
      "price": "273.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 11.5,
      "gmcSku": "196153288287",
      "price": "304.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 12,
      "gmcSku": "196153288294",
      "price": "298.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 13,
      "gmcSku": "196153288317",
      "price": "298.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 15,
      "gmcSku": "196153288331",
      "price": "255.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 9,
      "gmcSku": "196153288232",
      "price": "273.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 8,
      "gmcSku": "196153288218",
      "price": "257.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 14,
      "gmcSku": "196153288324",
      "price": "275.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 5,
      "gmcSku": "196155622850",
      "price": "249.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 7,
      "gmcSku": "196153288195",
      "price": "225.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 5.5,
      "gmcSku": "196155622867",
      "price": "215.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 4,
      "gmcSku": "196155622836",
      "price": "194.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 3.5,
      "gmcSku": "196155622829",
      "price": "187.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 4.5,
      "gmcSku": "196155622843",
      "price": "188.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 8.5,
      "gmcSku": "196153288225",
      "price": "242.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 9.5,
      "gmcSku": "196153288249",
      "price": "276.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 10,
      "gmcSku": "196153288256",
      "price": "293.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 6,
      "gmcSku": "196155622874",
      "price": "240.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 6.5,
      "gmcSku": "196155622881",
      "price": "235.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 7.5,
      "gmcSku": "196153288201",
      "price": "256.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 16,
      "gmcSku": "196153288348",
      "price": "197.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 18,
      "gmcSku": "196153288362",
      "price": "189.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    },
    {
      "size": 17,
      "gmcSku": "196153288355",
      "price": "175.00",
      "priceCurrency": "USD",
      "itemCondition": "NewCondition",
      "availability": "http://schema.org/InStock"
    }
  ]
}
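As a minimal sketch of how this dataset can be used once it has been scraped, the snippet below summarizes the offers of a single product. The field names (specialDisplayPriceCents, offers, price, size) come from the example above; the function name summarize_offers and the trimmed-down sample are illustrative only and not part of the original scraper.

```python
from typing import Any, Dict, List


def summarize_offers(product: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the retail price and offer prices (in dollars) of one product."""
    offers: List[Dict[str, Any]] = product.get("offers", [])
    # offer prices are strings such as "300.00", so convert them to floats
    priced = [(float(offer["price"]), offer["size"]) for offer in offers]
    summary: Dict[str, Any] = {
        # fields ending in "Cents" are integers in cents: 21000 -> 210.0
        "retail_usd": product.get("specialDisplayPriceCents", 0) / 100,
        "offer_count": len(priced),
    }
    if priced:
        summary["cheapest"] = min(priced)  # (price, size) of the lowest offer
        summary["priciest"] = max(priced)  # (price, size) of the highest offer
    return summary


# trimmed-down sample that reuses a few values from the dataset above
sample_product = {
    "specialDisplayPriceCents": 21000,
    "offers": [
        {"size": 11, "price": "300.00"},
        {"size": 11.5, "price": "304.00"},
        {"size": 17, "price": "175.00"},
    ],
}
summary = summarize_offers(sample_product)
print(summary)
```

Keeping the size next to the price makes it easy to see which size carries the lowest or highest offer.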

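Project Setup

To scrape Goat.com, we only need a few Python packages that are commonly used in web scraping. Since we'll be using the hidden web data scraping approach, all we need is an HTTP client (httpx) and a CSS selector engine (parsel).

To install these packages, we can use Python's pip console command: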

$ pip install httpx parsel

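Scraping Goat.com Product Data

To start scraping Goat.com, let's first take a look at a single product page. For this, let's pick an example product: goat.com/sneakers/air-jordan-3-retro-white-cement-reimagined-dn3707-100

We could use traditional scraping techniques and parse the product data out of the page HTML using XPath and CSS selectors, but since Goat.com uses the Next.js framework, we can extract the product dataset directly. If we look at the page source (right click -> View Page Source), we can see that the product dataset is hidden inside the <script id="__NEXT_DATA__"> tag.

Illustration of hidden web data in the page source of a goat.com product page

So, to scrape it, all we have to do is:

  1. Retrieve the product HTML page.
  2. Load the HTML using parsel.Selector.
  3. Use a CSS selector to find the <script id="__NEXT_DATA__"> dataset.
  4. Load the JSON dataset as a Python object using json.loads() and find the product data.

In actual Python, this is quite simple: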

import asyncio
import json

import httpx
from parsel import Selector

# create HTTP client with web-browser like headers and http2 support
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use CSS selectors to find script tag with data
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_product(url: str):
    """scrape goat.com product page"""
    # retrieve page HTML
    response = await client.get(url)
    assert response.status_code == 200, "request was blocked, see blocking section"
    # find hidden web data
    data = find_hidden_data(response.text)
    # extract only product data from the page dataset
    product = data["props"]["pageProps"]["productTemplate"]
    product["offers"] = data["props"]["pageProps"]["offers"]["offerData"]
    return product


# example scrape run:
print(asyncio.run(scrape_product("https://www.goat.com/sneakers/air-jordan-3-retro-white-cement-reimagined-dn3707-100")))
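The find_hidden_data function above assumes that the script tag is always present. When a block page or an error page comes back instead, the selector returns None and json.loads() raises a TypeError. The sketch below shows the same extraction idea with that case handled. To keep it runnable without any installed packages, it swaps parsel for the standard library's html.parser; the class name NextDataParser, the function name find_hidden_data_safe and the sample HTML are all made up for illustration.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class NextDataParser(HTMLParser):
    """Collects the text found inside <script id="__NEXT_DATA__">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def find_hidden_data_safe(html: str) -> Optional[dict]:
    """Return the hidden dataset, or None when the script tag is missing."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks).strip()
    return json.loads(raw) if raw else None


# made-up page that mimics the structure used by the product scraper
sample_html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"productTemplate": {"name": "demo product"}}}}
</script>
</body></html>
"""
data = find_hidden_data_safe(sample_html)
print(data["props"]["pageProps"]["productTemplate"]["name"])
print(find_hidden_data_safe("<html><body>blocked</body></html>"))
```

Returning None lets the caller decide whether to retry, skip the page or raise a clearer error.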

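With just a few lines of code, we've managed to retrieve the entire product dataset, including pricing, size and variant data, images and more. Next, let's see how to scale up and scrape the whole product dataset.

There are several ways to discover Goat.com products. One of the most popular is the search bar. Goat.com uses a backend search API for dynamic search. For example, let's explore a search results page like goat.com/search?query=jordans:

Screenshot of the goat.com Air Jordans products page

If we watch the Network tab of the browser developer tools while scrolling to load the second page of results, we can see a backend request being made.

So, to scrape Goat.com search, all we have to do is replicate these hidden search API requests in our Python scraper. To scrape search, we'll structure our scraper like this:

  1. Create a search page URL for the first page of search results.
  2. Scrape the first page of search results.
  3. Find the total number of pages.
  4. Scrape the remaining pages concurrently.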
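Steps 3 and 4 boil down to a small calculation that can be checked on its own. The sketch below assumes 24 results per page (the page size requested from the search API in this tutorial) and an optional cap on the number of pages; the helper name remaining_pages is hypothetical.

```python
import math
from typing import List, Optional


def remaining_pages(total_results: int, per_page: int = 24, max_pages: Optional[int] = 10) -> List[int]:
    """Return the page numbers left to scrape after the first page."""
    total_pages = math.ceil(total_results / per_page)
    # apply the optional cap so that a broad query does not fetch hundreds of pages
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    return list(range(2, total_pages + 1))


print(remaining_pages(100))                # 100 results -> 5 pages -> [2, 3, 4, 5]
print(remaining_pages(1000, max_pages=3))  # capped at 3 pages -> [2, 3]
print(remaining_pages(10))                 # everything fits on page 1 -> []
```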

This is the most common pagination scraping pattern, and it is easy to implement in Python. Let's look at the code:

import asyncio
import json
import math
from datetime import datetime
from typing import Dict, List
from urllib.parse import quote, urlencode
from uuid import uuid4

import httpx
from parsel import Selector

# create HTTP client with web-browser like headers and http2 support
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)


def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use CSS selectors to find script tag with data
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)


async def scrape_search(query: str, max_pages: int = 10) -> List[Dict]:
    def make_page_url(page: int = 1):
        params = {
            "c": "ciojs-client-2.29.12",  # this is hardcoded API version
            "key": "key_XT7bjdbvjgECO5d8",  # API key which is hardcoded in the client
            "i": str(uuid4()),  # unique id for each request, generated by UUID4
            "s": "2",
            "page": page,
            "num_results_per_page": "24",
            "sort_by": "relevance",
            "sort_order": "descending",
            # repeated query parameters are given as lists; a dict cannot hold duplicate keys
            "fmt_options[hidden_fields]": [
                "gp_lowest_price_cents_3",
                "gp_instant_ship_lowest_price_cents_3",
            ],
            "fmt_options[hidden_facets]": [
                "gp_lowest_price_cents_3",
                "gp_instant_ship_lowest_price_cents_3",
            ],
            "_dt": int(datetime.now().timestamp() * 1000),  # current timestamp in milliseconds
        }
        # doseq=True expands each list into one query parameter per value
        return f"https://ac.cnstrc.com/search/{quote(query)}?{urlencode(params, doseq=True)}"

    url_first_page = make_page_url(page=1)
    print(f"scraping product search paging {url_first_page}")
    # scrape first page
    result_first_page = await client.get(url_first_page)
    assert result_first_page.status_code == 200, "request was blocked, see blocking section"
    first_page = json.loads(result_first_page.content)["response"]
    results = [result["data"] for result in first_page["results"]]

    # find total page count
    total_pages = math.ceil(first_page["total_num_results"] / 24)
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # scrape remaining pages
    print(f"scraping remaining total pages: {total_pages-1} concurrently")
    to_scrape = [make_page_url(page=page) for page in range(2, total_pages + 1)]
    to_scrape = [asyncio.create_task(client.get(url)) for url in to_scrape]
    for response in await asyncio.gather(*to_scrape, return_exceptions=True):
        if isinstance(response, Exception) or response.status_code != 200:
            print(f"skipping page {response} - got blocked")
            continue
        data = json.loads(response.content)
        items = [result["data"] for result in data["response"]["results"]]
        results.extend(items)
    return results


# example scrape run:
search_scrape = scrape_search("puma dark", max_pages=3)
print(asyncio.run(search_scrape))

Example output

[
  {
    "id": "1095150",
    "sku": "360248 58",
    "slug": "epic-flip-v2-sandal-dark-slate-green-360248-58",
    "color": "black",
    "category": "shoes",
    "image_url": "https://image.goat.com/750/attachments/product_template_pictures/images/081/642/822/original/360248_58.png.png",
    "release_date": 20220520,
    "product_type": "sneakers",
    "release_date_year": 2022,
    "retail_price_cents": 0,
    "variation_id": "4afdc132-8b6d-4f59-8fc0-883f57be54a5",
    "product_condition": "new_no_defects",
    "lowest_price_cents": 10200,
    "lowest_price_cents_krw": 13624300,
    "lowest_price_cents_cny": 71000,
    "lowest_price_cents_hkd": 81000,
    "lowest_price_cents_eur": 9500,
    "lowest_price_cents_twd": 314900,
    "lowest_price_cents_cad": 14000,
    "lowest_price_cents_jpy": 1377000,
    "lowest_price_cents_sgd": 13800,
    "lowest_price_cents_myr": 45500,
    "lowest_price_cents_aud": 15600,
    "lowest_price_cents_gbp": 8400,
    "count_for_product_condition": 0
  },
  ...
]

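Above, with very little code, we wrote a scraper that returns goat.com search results for a query (up to max_pages pages)! We can then use the product scraper we wrote earlier to collect the full product data (see the slug field for the product URL).

For another way to discover products on Goat.com, see the Goat sitemap directory, which contains all product pages.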
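As a rough sketch of how the two scrapers can be connected, the helper below turns search results into product page URLs. It assumes that sneaker pages live under goat.com/sneakers/<slug>, as in the example product URL used earlier; other product types may use a different path, so they are skipped here. The function name product_urls and the "apparel" product type in the sample are hypothetical.

```python
from typing import Dict, Iterable, List


def product_urls(
    search_results: Iterable[Dict],
    base_url: str = "https://www.goat.com/sneakers/",  # assumption: valid for sneakers only
) -> List[str]:
    """Build unique product page URLs from search results."""
    urls = []
    seen = set()
    for item in search_results:
        slug = item.get("slug")
        # skip entries without a slug, non-sneaker products and repeated slugs
        if not slug or item.get("product_type") != "sneakers" or slug in seen:
            continue
        seen.add(slug)
        urls.append(base_url + slug)
    return urls


# sample entries shaped like the search output above
sample_results = [
    {"slug": "epic-flip-v2-sandal-dark-slate-green-360248-58", "product_type": "sneakers"},
    {"slug": "epic-flip-v2-sandal-dark-slate-green-360248-58", "product_type": "sneakers"},
    {"slug": "some-hoodie", "product_type": "apparel"},  # hypothetical non-sneaker entry
]
urls = product_urls(sample_results)
print(urls)
```

Each resulting URL can then be passed to scrape_product() from the product section.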

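FAQ

To wrap up this guide on web scraping Goat.com, let's take a look at some frequently asked questions.

Is it legal to scrape Goat.com?

Yes! All of the data on goat.com is publicly available (no login is required), so scraping it is perfectly legal. As long as we don't harm the website by scraping too fast, scraping Goat.com is legal and ethical.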
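One way to avoid scraping too fast is to cap how many requests run at once and to pause briefly after each one. The sketch below shows this with asyncio.Semaphore. The names run_throttled and fake_scrape are hypothetical, and fake_scrape only simulates a request so that the example runs without touching the network; in a real scraper, the worker would be something like scrape_product.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_throttled(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrency: int = 3,
    delay: float = 0.5,
) -> List[Any]:
    """Run worker(item) for every item with limited concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item: Any) -> Any:
        async with semaphore:
            result = await worker(item)
            await asyncio.sleep(delay)  # pause before freeing the slot
            return result

    # gather returns the results in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    active = 0
    peak = 0

    async def fake_scrape(url: str) -> str:
        # stand-in for a real request; tracks how many calls overlap
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return url.upper()

    urls = [f"page-{i}" for i in range(6)]
    results = await run_throttled(urls, fake_scrape, max_concurrency=2, delay=0.0)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

The right values for max_concurrency and delay depend on the target site; lower concurrency and longer delays are the safer starting point.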

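Can Goat.com be crawled?

Yes. Crawling is a form of web scraping in which the scraper discovers the data to scrape on its own by exploring the website. Goat.com offers many exploration features (such as related and recommended products and catalogs) that make crawling easy. However, crawling is not recommended because, as this tutorial shows, scraping directly is more efficient.

Goat.com Scraping Summary

In this quick tutorial, we learned how to scrape Goat.com using Python. We started by looking at how to scrape a single product page dataset using the hidden web data approach. Then, we extended the scraper with search scraping by using Goat's hidden search API.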

Written by 河小马

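河小马 is a leader in the digital marketing industry and a prominent member of 广告中国论坛 (the Advertising China forum), with expertise spanning PPC advertising, domain parking, website development, affiliate marketing and cross-border e-commerce consulting. A seasoned software developer, he has more than 13 years of experience in overseas online marketing.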