
How to Scrape Ebay Product Data in Bulk with Python


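In this web scraping tutorial, we'll take a look at how to scrape Ebay, the world's biggest peer-to-peer e-commerce marketplace. We'll scrape product details such as pricing, variant information, features and descriptions. To scrape Ebay data with Python, we'll use a few popular community packages and some clever parsing techniques. We'll also look at how to scrape Ebay's search system to discover new item listings, so we can be the first to know about new deals.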

Why Scrape Ebay?

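Ebay is one of the world's biggest product marketplaces, especially for niche and rare items. This makes Ebay an important target for e-commerce data analytics. Scraping Ebay data (such as seller reviews) also lets Ebay sellers easily perform market and competitor analysis.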

Available Ebay Data Fields

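In this Ebay web scraping tutorial, we'll scrape common product data such as pricing, stock, features and performance metadata. For more, see this example output: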

Example product dataset
{
  "url": "https://www.ebay.com/itm/393531906094",
  "id": "393531906094",
  "price": "C $579.00",
  "price_converted": "US $427.32",
  "name": "Apple iPhone 11 Pro Max - Unlocked - 64GB / 256GB / 512GB - CA - Grade A",
  "seller_name": "device_care",
  "seller_url": "https://www.ebay.com/str/devicecare",
  "photos": [
    "https://i.ebayimg.com/images/g/93cAAOSwvEJgbLW8/s-l64.jpg",
    "https://www.jingzhengli.com/wp-content/uploads/2023/06/s-l64.jpg",
    "https://www.jingzhengli.com/wp-content/uploads/2023/06/s-l64-1.jpg",
    "https://www.jingzhengli.com/wp-content/uploads/2023/06/s-l64-2.jpg",
    "https://www.jingzhengli.com/wp-content/uploads/2023/06/s-l64-3.jpg",
    "https://www.jingzhengli.com/wp-content/uploads/2023/06/s-l500.jpg"
  ],
  "description_url": "https://vi.vipr.ebaydesc.com/ws/eBayISAPI.dll?ViewItemDescV4&item=393531906094&t=1631237959000&category=9355&seller=device_care&excSoj=1&excTrk=1&lsite=2&ittenable=true&domain=ebay.com&descgauge=1&cspheader=1&oneClk=2&secureDesc=1",
  "features": {
    "Condition": "Excellent - Refurbished: The item is in like-new condition, backed by a one year warranty. It has ... Read moreExcellent - Refurbished: The item is in like-new condition, backed by a one year warranty. It has been professionally refurbished, inspected and cleaned to excellent condition by qualified sellers. The item includes original or new accessories and will come in new generic packaging. See the seller's listing for full details. See all condition definitions",
    "Camera Resolution": "12.0 MP",
    "Operating System": "iOS",
    "Contract": "Without Contract",
    "Connectivity": "5G, Bluetooth, GPS, Lightning",
    "Features": "4K Video Recording, Accelerometer, Bluetooth Enabled, Camera, Facial Recognition",
    "Model Number": "A2161 (CDMA + GSM)",
    "RAM": "4 GB",
    "Lock Status": "Factory Unlocked",
    "Network": "1&1, Unlocked",
    "SIM Card Slot": "Dual SIM (SIM + eSIM)",
    "Brand": "Apple",
    "Processor": "Hexa Core",
    "Screen Size": "6.5 in"
  },
  "variants": {
    "Apple iPhone 11 Pro Max 512 GB Midnight Green": {
      "id": "662315637180",
      "price": "C $779.00",
      "price_converted": "US $574.93",
      "vat_price": null,
      "quantity": 5,
      "in_stock": false,
      "sold": 5,
      "available": 0,
      "watch_count": 27,
      "epid": "9034209121",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "512 GB",
        "Color": "Midnight Green"
      }
    },
    "Apple iPhone 11 Pro Max 512 GB Gold": {
      "id": "662315637181",
      "price": "C $779.00",
      "price_converted": "US $574.93",
      "vat_price": null,
      "quantity": 5,
      "in_stock": false,
      "sold": 5,
      "available": 0,
      "watch_count": 10,
      "epid": "9034209182",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "512 GB",
        "Color": "Gold"
      }
    },
    "Apple iPhone 11 Pro Max 512 GB Space Gray": {
      "id": "662315637182",
      "price": "C $779.00",
      "price_converted": "US $574.93",
      "vat_price": null,
      "quantity": 9,
      "in_stock": true,
      "sold": 4,
      "available": 5,
      "watch_count": 29,
      "epid": "19034211488",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "512 GB",
        "Color": "Space Gray"
      }
    },
    "Apple iPhone 11 Pro Max 256 GB    Midnight Green": {
      "id": "662315637176",
      "price": "C $639.00",
      "price_converted": "US $471.60",
      "vat_price": null,
      "quantity": 134,
      "in_stock": false,
      "sold": 134,
      "available": 0,
      "watch_count": 165,
      "epid": "11037566785",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "256 GB   ",
        "Color": "Midnight Green"
      }
    },
    "Apple iPhone 11 Pro Max 256 GB    Gold": {
      "id": "662315637177",
      "price": "C $639.00",
      "price_converted": "US $471.60",
      "vat_price": null,
      "quantity": 77,
      "in_stock": false,
      "sold": 77,
      "available": 0,
      "watch_count": 104,
      "epid": "27041453299",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "256 GB   ",
        "Color": "Gold"
      }
    },
    "Apple iPhone 11 Pro Max 256 GB    Space Gray": {
      "id": "662315637178",
      "price": "C $639.00",
      "price_converted": "US $471.60",
      "vat_price": null,
      "quantity": 161,
      "in_stock": false,
      "sold": 161,
      "available": 0,
      "watch_count": 169,
      "epid": "10057225571",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "256 GB   ",
        "Color": "Space Gray"
      }
    },
    "Apple iPhone 11 Pro Max 512 GB Silver": {
      "id": "662315637179",
      "price": "C $779.00",
      "price_converted": "US $574.93",
      "vat_price": null,
      "quantity": 4,
      "in_stock": true,
      "sold": 3,
      "available": 1,
      "watch_count": 10,
      "epid": "9034209212",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "512 GB",
        "Color": "Silver"
      }
    },
    "Apple iPhone 11 Pro Max 64 GB Midnight Green": {
      "id": "662315637172",
      "price": "C $579.00",
      "price_converted": "US $427.32",
      "vat_price": null,
      "quantity": 236,
      "in_stock": true,
      "sold": 199,
      "available": 37,
      "watch_count": 183,
      "epid": "19042851646",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "64 GB",
        "Color": "Midnight Green"
      }
    },
    "Apple iPhone 11 Pro Max 64 GB Gold": {
      "id": "662315637173",
      "price": "C $579.00",
      "price_converted": "US $427.32",
      "vat_price": null,
      "quantity": 257,
      "in_stock": true,
      "sold": 211,
      "available": 46,
      "watch_count": 161,
      "epid": "21042400312",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "64 GB",
        "Color": "Gold"
      }
    },
    "Apple iPhone 11 Pro Max 64 GB Space Gray": {
      "id": "662315637174",
      "price": "C $579.00",
      "price_converted": "US $427.32",
      "vat_price": null,
      "quantity": 279,
      "in_stock": true,
      "sold": 221,
      "available": 58,
      "watch_count": 226,
      "epid": "7034220649",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "64 GB",
        "Color": "Space Gray"
      }
    },
    "Apple iPhone 11 Pro Max 256 GB    Silver": {
      "id": "662315637175",
      "price": "C $639.00",
      "price_converted": "US $471.60",
      "vat_price": null,
      "quantity": 19,
      "in_stock": false,
      "sold": 19,
      "available": 0,
      "watch_count": 35,
      "epid": "23034220736",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "256 GB   ",
        "Color": "Silver"
      }
    },
    "Apple iPhone 11 Pro Max 64 GB Silver": {
      "id": "662315637171",
      "price": "C $579.00",
      "price_converted": "US $427.32",
      "vat_price": null,
      "quantity": 72,
      "in_stock": true,
      "sold": 53,
      "available": 19,
      "watch_count": 55,
      "epid": "9034209203",
      "top_product": false,
      "traits": {
        "Model": "Apple iPhone 11 Pro Max",
        "Storage Capacity": "64 GB",
        "Color": "Silver"
      }
    }
  }
}

We'll also scrape Ebay's search, which provides a dataset of product previews. See this example output:

Example search dataset
[
  {
    "url": "https://www.ebay.com/itm/394406593931",
    "title": "iPhone 11 Pro Max 256GB Space Gray (Unlocked) CRACKED FRONT BACK CLEAN ESN",
    "price": "$289.94",
    "shipping": "Free shipping",
    "list_date": "Jan-3 20:32",
    "subtitles": [
      "Apple iPhone 11 Pro Max",
      "256 GB",
      "Unlocked"
    ],
    "condition": "Parts Only",
    "photo": "https://i.ebayimg.com/thumbs/images/g/74AAAOSwRahjtQBg/s-l225.webp",
    "rating": "4.5 out of 5 stars.",
    "rating_count": "80 product ratings"
  },
...
]

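Ebay's pages contain a lot of data. In our example Ebay web scraper we'll stick to the most important data fields, but the techniques covered in this guide can be used to scrape any other part of Ebay.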

Setup

In this tutorial, we'll be using Python with two important community libraries:

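  • httpx – an HTTP client library that will let us communicate with ebay.com's servers and retrieve the raw page data.
  • parsel – an HTML parsing library that will help us parse the raw scraped HTML data using CSS selectors and XPath.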

These packages can be easily installed via the pip install command:

$ pip install httpx parsel

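Alternatively, feel free to swap httpx out for any other HTTP client package, such as requests, as we only need basic HTTP features that are interchangeable across almost every library. As for parsel, another great alternative is the beautifulsoup package.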

Scraping Ebay Listings

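Let's start by parsing a single Ebay listing page. For this, we'll use httpx to retrieve the product's HTML page and parsel to parse it with CSS selectors. To start, we can split Ebay listings into two types: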

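  • Listings with multiple variants - e.g. tech devices, clothes, shoes: things that come with multiple options.
  • Listings with a single variant - usually simple products without options, like toys or second-hand items.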

Let's start with single-variant listings, as they're much simpler. For this example, let's use this product: ebay.com/itm/332562282948

Markup of the fields we'll scrape from Ebay.com
We'll capture the most important fields: pricing, description, and product and seller details

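In the image above, we've marked our fields. To build CSS selectors for these fields, we can use the browser's developer tools (the F12 key, or right-click -> Inspect). With that, our single-listing Ebay scraper in Python will look something like this: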

from parsel import Selector
import httpx

def parse_item(sel: Selector):
    # parsing shortcuts to avoid repetition:
    css_join = lambda css: "".join(sel.css(css).getall()).strip()  # join all selected elements
    css = lambda css: sel.css(css).get("").strip()  # take first selected element and strip of leading/trailing spaces

    item = {}
    item["url"] = css('link[rel="canonical"]::attr(href)')
    item["id"] = item["url"].split("/itm/")[1].split("?")[0]  # we can take ID from the URL
    item["price"] = css('span[itemprop="price"] .ux-textspans ::text')
    item["price_converted"] = css("span.x-price-approx__price ::text")  # ebay automatically converts price for some regions

    item["name"] = css_join("h1 span::text")
    item["seller_name"] = css_join("div[data-testid=str-title] a ::text")
    item["seller_url"] = css("div[data-testid=str-title] a::attr(href)").split("?")[0]
    item["photos"] = sel.css('.ux-image-filmstrip-carousel-item.image img::attr("src")').getall()  # carousel images
    item["photos"].extend(sel.css('.ux-image-carousel-item.image img::attr("src")').getall())  # main image
    # description is an iframe (an independent page). We can keep it as a URL or scrape it later.
    item["description_url"] = css("div.d-item-description iframe::attr(src)")
    if not item["description_url"]:  # description can be in 2 locations - check both
        item["description_url"] = css("div#desc_div iframe::attr(src)")
    # feature details from the description table:
    feature_table = sel.css("div.ux-layout-section__item--table-view")
    features = {}
    for ft_label in feature_table.css(".ux-labels-values__labels"):
        # iterate through each label of the table and select first sibling for value:
        label = "".join(ft_label.css(".ux-textspans::text").getall()).strip(":\n ")
        ft_value = ft_label.xpath("following-sibling::div[1]")
        value = "".join(ft_value.css(".ux-textspans::text").getall()).strip()
        features[label] = value
    item["features"] = features
    return item

response = httpx.get("https://www.ebay.com/itm/332562282948", follow_redirects=True)
selector = Selector(response.text)
item = parse_item(selector)
print(item)
Example output
{
  "url": "https://www.ebay.com/itm/332562282948",
  "id": "332562282948",
  "price": "US $13.94",
  "price_converted": "",
  "name": "Sanei Kirby 5.5\" Plush Stuffed Doll (KP01) - Kirby Adventure All Star Collection",
  "seller_name": "ToysCollections",
  "seller_url": "https://www.ebay.com/str/huskylover228",
  "photos": [
    "https://i.ebayimg.com/images/g/ITEAAOSw9p9ajK16/s-l500.jpg"
  ],
  "description_url": "https://vi.vipr.ebaydesc.com/ws/eBayISAPI.dll?ViewItemDescV4&item=332562282948&t=1653362457000&category=69528&seller=the_northeshop&excSoj=1&excTrk=1&lsite=0&ittenable=true&domain=ebay.com&descgauge=1&cspheader=1&oneClk=2&secureDesc=1",
  "features": {
    "Condition": "New: A brand-new, unused, unopened, undamaged item (including handmade items). See the seller's ... Read moreNew: A brand-new, unused, unopened, undamaged item (including handmade items). See the seller's listing for full details. See all condition definitions",
    "Brand": "unbranded",
    "Type": "Plush",
    "UPC": "4905330122810",
    "Featured Refinements": "Kirby Plush",
    "Recommended Age Range": "4+",
    "Gender": "Boys & Girls",
    "Character Family": "Kirby Adventure"
  }
}

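In the example above, we did some basic HTML parsing based on CSS selectors to extract item details such as price, name, features and photos. Next, for products with variants, we have to go a step further and extract hidden web data. Let's take a look at how to do that.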

Ebay Variant Data

Ebay listings can contain multiple products through a feature called variants. For example, let's take this iPhone listing: ebay.com/itm/393531906094

Markup of ebay.com variant options
Listings with variants have multiple selection options

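We can see several variant options: model, storage capacity and color. Every time we select a different option, the price updates. How can we scrape this? Each time we select a different option, Ebay uses javascript to update the page with a different price. This means the price data is available somewhere in a javascript variable, and all we have to do is extract this variable to scrape the variant dataset. So, to capture the variants, we'll extract the hidden variant data and pair it with our earlier HTML scraper: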

import json
from collections import defaultdict

from parsel import Selector


def find_json_objects(text: str, decoder=json.JSONDecoder()):
    """Find JSON objects in text, and generate decoded JSON data"""
    pos = 0
    while True:
        match = text.find("{", pos)
        if match == -1:
            break
        try:
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            pos = match + 1


def parse_variants(sel: Selector) -> dict:
    # find the script that contains the itemVariationsMap variable:
    script = sel.xpath('//script[contains(., "itemVariationsMap")]/text()').get()
    if not script:
        return {}

    # find all JSON objects in the script text
    all_data = list(find_json_objects(script))
    # pick the JSON object that contains the itemVariationsMap variable:
    variants_data = next((d for d in all_data if "itemVariationsMap" in str(d)), None)
    if variants_data is None:
        return {}
    variants = variants_data["itemVariationsMap"]

    # extract option values for mapping variant trait ids to human labels
    selections = defaultdict(dict)
    for selection in sel.css(".x-msku__box-cont select"):
        name = selection.xpath("@selectboxlabel").get()
        for option in selection.xpath("option"):
            value = int(option.xpath("@value").get())
            if value == -1:  # that's the placeholder
                continue
            label = option.xpath("text()").get().strip()
            # drop the "(Out ..." stock suffix, then trim the leftover whitespace
            label = label.split("(Out ")[0].strip()
            selections[name][value] = label

    # map variant trait ids to human labels
    for variant_id, variant in variants.items():
        for trait, trait_id in variant["traitValuesMap"].items():
            variant["traitValuesMap"][trait] = selections[trait][trait_id]

    # parse variants to something more usable
    parsed_variants = {}
    for variant_id, variant in variants.items():
        label = " ".join(variant["traitValuesMap"].values())
        parsed_variants[label] = {
            "id": variant_id,
            "price": variant["price"],
            "price_converted": variant["convertedPrice"],
            "vat_price": variant["vatPrice"],
            "quantity": variant["quantity"],
            "in_stock": variant["inStock"],
            "sold": variant["quantitySold"],
            "available": variant["quantityAvailable"],
            "watch_count": variant["watchCount"],
            "epid": variant["epid"],
            "top_product": variant["topProduct"],
            "traits": variant["traitValuesMap"],
        }
    return parsed_variants
Running the code (it produces a dataset like the example product dataset at the start of this article)
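import httpx

# parse_item() and Selector come from the single-listing scraper above
response = httpx.get("https://www.ebay.com/itm/393531906094", follow_redirects=True)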
selector = Selector(response.text)
item = parse_item(selector)
item['variants'] = parse_variants(selector)
print(item)
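The find_json_objects() helper works by jumping to every `{` in the script text and asking json.JSONDecoder.raw_decode() to decode one object starting there. On success it skips past the decoded object; on failure (plain javascript such as a function body) it moves one character forward. Here is that behaviour on a made-up script string; real Ebay scripts are much larger and shaped differently.

```python
import json


def find_json_objects(text: str, decoder=json.JSONDecoder()):
    """Yield every top-level JSON object embedded in a string of javascript."""
    pos = 0
    while True:
        match = text.find("{", pos)
        if match == -1:
            break
        try:
            # decode one object starting at this brace; index is where it ends
            result, index = decoder.raw_decode(text[match:])
            yield result
            pos = match + index
        except ValueError:
            # not valid JSON (e.g. a javascript block), try the next brace
            pos = match + 1


# A made-up script body, not real Ebay markup
script = (
    'var cfg = {"page": "item"};'
    '$MSKU = new MSKU({"itemVariationsMap": {"111": {"price": "C $579.00"}}});'
    'if (x) { doSomething(); }'
)
objects = list(find_json_objects(script))
variants = next(d for d in objects if "itemVariationsMap" in d)["itemVariationsMap"]
print(variants)
```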

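In this Ebay scraper example, we used the hidden web data parsing technique to extract the javascript variable that contains the listing's variant data. We further extended it with option names and cleaned it up using basic Python data types to make it easier to present. Next, let's take a look at how to find listings on Ebay using its search system.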
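The step that maps trait ids to option names can be tried in isolation. In this sketch the variants dict and the selections lookup are hypothetical inputs shaped like the ones parse_variants() builds: itemVariationsMap stores numeric option ids, while the `<select>` boxes hold the human-readable labels. Unlike the scraper, this version builds a new traits dict instead of overwriting traitValuesMap in place, which keeps the raw data intact.

```python
from collections import defaultdict

# Hypothetical inputs: numeric trait ids per variant, as found in itemVariationsMap
variants = {
    "662315637172": {"price": "C $579.00", "traitValuesMap": {"Storage Capacity": 1, "Color": 10}},
    "662315637176": {"price": "C $639.00", "traitValuesMap": {"Storage Capacity": 2, "Color": 10}},
}
# Hypothetical option labels, as read from the <select> boxes
selections = defaultdict(dict)
selections["Storage Capacity"].update({1: "64 GB", 2: "256 GB"})
selections["Color"].update({10: "Midnight Green"})

parsed = {}
for variant_id, variant in variants.items():
    # swap each numeric trait id for its human-readable option label
    traits = {trait: selections[trait][trait_id] for trait, trait_id in variant["traitValuesMap"].items()}
    label = " ".join(traits.values())
    parsed[label] = {"id": variant_id, "price": variant["price"], "traits": traits}

print(list(parsed))
```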

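Scraping Ebay Search

To start scraping Ebay's search, let's first take a look at how it works. When we enter a search keyword, Ebay redirects us to a different URL where the search results are located. For example, if we search for the term iphone, we'll be taken to a URL similar to ebay.com/sch/i.html?_nkw=iphone&_sacat=0. This page uses several URL parameters to define the search query: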

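  • _nkw is the search keyword.
  • _sacat is the category restriction.
  • _sop is the sorting type.
  • _pgn is the page number.
  • _ipg is the number of listings per page (default 60).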

We can find more parameters by clicking around and exploring the search, but for this example, let's stick to these 5 parameters.
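Put together, the five parameters form the search URL. Here is a minimal sketch with a hypothetical build_search_url() helper. The value 12 for `_sop` mirrors the best_match entry in the scraper's SORTING_MAP, and `_sacat=0` matches the default search URL above. Note that urlencode() encodes spaces as `+`.

```python
from urllib.parse import urlencode


def build_search_url(query: str, page: int = 1, category: int = 0,
                     items_per_page: int = 60, sort: int = 12) -> str:
    """Assemble an Ebay search URL from the five parameters described above."""
    params = {
        "_nkw": query,           # search keyword
        "_sacat": category,      # category restriction (0 as in the default search URL)
        "_sop": sort,            # sorting type (12 = best_match in SORTING_MAP)
        "_pgn": page,            # page number
        "_ipg": items_per_page,  # listings per page
    }
    return "https://www.ebay.com/sch/i.html?" + urlencode(params)


print(build_search_url("iphone 11", page=2))
```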

import asyncio
import math
import httpx
from typing import TypedDict, List, Literal
from urllib.parse import urlencode

from parsel import Selector


session = httpx.AsyncClient(follow_redirects=True)


class ProductPreviewResult(TypedDict):
    """type hint for search scrape results for product preview data"""

    url: str  # url to full product page
    title: str
    price: str
    shipping: str
    list_date: str
    subtitles: List[str]
    condition: str
    photo: str  # image url
    rating: str
    rating_count: str


def parse_search(sel: Selector) -> List[ProductPreviewResult]:
    """parse ebay's search page for listing preview details"""
    previews = []
    # each listing has its own HTML box where all of the data is contained
    listing_boxes = sel.css(".srp-results li.s-item")
    for box in listing_boxes:
        # quick helpers to extract first element and all elements
        css = lambda css: box.css(css).get("").strip()
        css_all = lambda css: box.css(css).getall()
        previews.append(
            {
                "url": css("a.s-item__link::attr(href)").split("?")[0],
                "title": css(".s-item__title>span::text"),
                "price": css(".s-item__price::text"),
                "shipping": css(".s-item__shipping::text"),
                "list_date": css(".s-item__listingDate span::text"),
                "subtitles": css_all(".s-item__subtitle::text"),
                "condition": css(".s-item__subtitle .SECONDARY_INFO::text"),
                "photo": css("img.s-item__image-img::attr(src)"),
                "rating": css(".s-item__reviews .clipped::text"),
                "rating_count": css(".s-item__reviews-count span::text"),
            }
        )
    return previews


SORTING_MAP = {
    "best_match": 12,
    "ending_soonest": 1,
    "newly_listed": 10,
}


async def scrape_search(
    query,
    max_pages=1,
    category=0,
    items_per_page=240,
    sort: Literal["best_match", "ending_soonest", "newly_listed"] = "newly_listed",
) -> List[ProductPreviewResult]:
    """Scrape Ebay's search for product preview data for the given query"""

    def make_request(page):
        return "https://www.ebay.com/sch/i.html?" + urlencode(
            {
                "_nkw": query,
                "_sacat": category,
                "_ipg": items_per_page,
                "_sop": SORTING_MAP[sort],
                "_pgn": page,
            }
        )

    first_page = await session.get(make_request(page=1))
    # httpx responses have no .selector attribute, so we parse the HTML text ourselves
    first_sel = Selector(first_page.text)
    results = parse_search(first_sel)
    if max_pages == 1:
        return results
    # find total amount of results for concurrent pagination
    total_results = first_sel.css(".srp-controls__count-heading>span::text").get("0")
    total_results = int(total_results.replace(",", ""))
    total_pages = math.ceil(total_results / items_per_page)
    if total_pages > max_pages:
        total_pages = max_pages
    other_pages = [session.get(make_request(page=i)) for i in range(2, total_pages + 1)]
    for response in asyncio.as_completed(other_pages):
        response = await response
        try:
            results.extend(parse_search(Selector(response.text)))
        except Exception as e:
            print(f"failed to scrape search page {response.url}: {e}")
    return results
Running the code and example output
# uses the module-level `session` (created with follow_redirects=True) from the scraper above

async def run():
    search_results = await scrape_search("iphone", items_per_page=60, max_pages=2)
    print(search_results)


if __name__ == "__main__":
    asyncio.run(run())
This will produce a dataset similar to the following:

[
    {
        "url": "https://www.ebay.com/itm/354493525522",
        "title": "Apple iPhone 11 - 128GB - Black (Unlocked) A2111 (CDMA + GSM)",
        "price": "$1,200.99",
        "shipping": "+$25.00 shipping",
        "list_date": "Jan-3 04:32",
        "subtitles": [
            "Apple iPhone 11",
            "128 GB",
            "Unlocked"
        ],
        "condition": "Pre-Owned",
        "photo": "https://i.ebayimg.com/thumbs/images/g/m5QAAOSwrsxjtB~R/s-l225.webp",
        "rating": "4.5 out of 5 stars.",
        "rating_count": "68 product ratings"
    },
    ...  # truncated for the blog
]

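In the example above, we wrote a small scraper for Ebay's search. We built a search URL by using Python's urlencode function to turn a dictionary of parameters into URL parameters. Then, we parsed the scraped data with CSS selectors: first, we selected all listing box containers and iterated through them to safely extract each listing's details. If we wanted to extend this search dataset, we could go further and use the listing scraper from the previous section to extract the full listing details.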
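The search scraper also decides how many extra pages to request: it reads the total result count from the first page, divides it by the page size, rounds up and caps the result at max_pages. Here is that arithmetic on its own. count_pages() is a hypothetical helper, and the count strings are example inputs rather than real Ebay numbers.

```python
import math


def count_pages(count_heading: str, items_per_page: int, max_pages: int) -> int:
    """Turn a result count such as "1,234" into the number of pages to request."""
    total_results = int(count_heading.replace(",", ""))
    total_pages = math.ceil(total_results / items_per_page)  # a partial page still counts
    return min(total_pages, max_pages)


print(count_pages("1,234", items_per_page=60, max_pages=10))  # 1234 / 60 -> 21 pages, capped at 10
print(count_pages("95", items_per_page=60, max_pages=10))
```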

FAQ

To wrap up this guide, let's take a look at some frequently asked questions about scraping data from Ebay:

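Is it legal to scrape Ebay?

Yes. Ebay's data is publicly available, and scraping Ebay at a slow, respectful rate falls under the definition of ethical scraping. That being said, be mindful of GDPR compliance in the EU when storing personal data, such as sellers' personal details like names or locations. For more, see our Is Web Scraping Legal? article.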
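One way to keep the concurrent search scraper at a slow, respectful pace is to cap the number of requests in flight with asyncio.Semaphore and pause after each one. This is a minimal sketch: fetch_all, fake_fetch and the limits (2 concurrent requests, 0.5 s delay) are illustrative assumptions, not limits published by Ebay. In real use, fetch could be the get method of the httpx.AsyncClient session from the search scraper.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency: int = 2, delay: float = 0.5):
    """Run fetch(url) for every url with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def polite_fetch(url):
        async with semaphore:  # wait for a free slot
            result = await fetch(url)
            await asyncio.sleep(delay)  # pause before handing the slot to the next request
            return result

    # gather() returns results in the same order as urls
    return await asyncio.gather(*(polite_fetch(url) for url in urls))


# Offline stand-in for a real request; it records how many calls overlap
stats = {"active": 0, "peak": 0}


async def fake_fetch(url):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return url.upper()


results = asyncio.run(fetch_all(["a", "b", "c"], fake_fetch, delay=0.01))
print(results, stats["peak"])
```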

How to crawl Ebay.com?

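To web crawl Ebay, we can apply the scraping techniques covered in this article. Every Ebay listing contains related products, which we can extract and feed into our scraping loop, turning our scraper into a crawler that keeps discovering new details to scrape.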
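The crawling idea above can be sketched as a breadth-first loop with a "seen" set, so each listing is queued only once. This is an illustration under assumptions: get_related is a hypothetical callback standing in for code that extracts related-item IDs from a listing page (this guide does not show selectors for that section), and RELATED is made-up data so the example runs offline.

```python
from collections import deque


def crawl(start_ids, get_related, max_items: int = 100):
    """Breadth-first crawl: scrape an item, then queue its related items."""
    seen = set(start_ids)
    queue = deque(start_ids)
    scraped = []
    while queue and len(scraped) < max_items:
        item_id = queue.popleft()
        scraped.append(item_id)  # a real crawler would fetch and parse the listing here
        for related_id in get_related(item_id):
            if related_id not in seen:  # skip listings that were already queued
                seen.add(related_id)
                queue.append(related_id)
    return scraped


# Hypothetical "related products" graph standing in for real listing pages
RELATED = {"1": ["2", "3"], "2": ["3", "4"], "3": ["1"], "4": []}
print(crawl(["1"], lambda item_id: RELATED.get(item_id, [])))
```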

Is there an Ebay API?

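No. While Ebay does have a private catalog API, it only contains metadata fields such as product IDs. For product prices and other details, the only way is to scrape Ebay as described in this guide.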

Ebay Scraping Summary

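In this guide, we wrote a Python Ebay scraper for product listing data using only Python and a couple of community packages: httpx for retrieving the content and parsel for parsing it. We discovered two types of product listings: single-variant and multi-variant. For the former, we parsed the listing data from the HTML using CSS selectors. For the latter, however, we had to use the hidden web data scraping technique to extract the variant data from a hidden javascript variable. To find listings on Ebay, we looked at how the search system works and how we can scrape it by replicating its behavior.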

Written by 河小马

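河小马 is a distinguished leader in the digital marketing industry and a key member of the 广告中国 (Advertising China) forum. His expertise spans PPC advertising, domain parking, website development, affiliate marketing and cross-border e-commerce consulting. As a seasoned developer, he has not only strong technical skills but also more than 13 years of experience in overseas online marketing.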