Scrapy

Title:

Scrapy

License:

BSD 2-Clause "Simplified" or "FreeBSD" License (BSD-2-Clause), Open Source

Owner:

Scrapy developers

Keywords:

Cataloging, Discovery

Contributor:

Preserveware Editor

Date:

08/06/2017

Metrics:

2,082 views, 7 downloads

Scrapy is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival.

Even though Scrapy was originally designed for web scraping, it can also be used to extract data using APIs (such as Amazon Associates Web Services) or as a general purpose web crawler.

In order to show you what Scrapy brings to the table, we’ll walk you through an example Scrapy spider, using the simplest way to run one.

Here’s the code for a spider that scrapes famous quotes from the website http://quotes.toscrape.com, following the pagination:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/tag/humor/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.xpath('span/small/text()').extract_first(),
            }

        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
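To illustrate what the parse callback extracts, here is a minimal sketch of the same idea using only the Python standard library, so it runs without Scrapy or network access. The HTML fragment is a hand-written sample mimicking the quote markup on quotes.toscrape.com, not content fetched from the site:

```python
# Stdlib-only sketch of the extraction done in QuotesSpider.parse:
# for each div.quote, collect the quote text and the author name.
from html.parser import HTMLParser

# Hand-written sample imitating quotes.toscrape.com's markup (an assumption,
# not fetched from the live site).
SAMPLE = """
<div class="quote">
  <span class="text">“A day without sunshine is like, you know, night.”</span>
  <span><small class="author">Steve Martin</small></span>
</div>
<div class="quote">
  <span class="text">“Think left and think right and think low and think high.”</span>
  <span><small class="author">Dr. Seuss</small></span>
</div>
"""

class QuoteExtractor(HTMLParser):
    """Builds {'text': ..., 'author': ...} dicts, like the spider's yields."""

    def __init__(self):
        super().__init__()
        self.quotes = []
        self._field = None  # 'text' or 'author' while inside the matching tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "quote":
            self.quotes.append({"text": "", "author": ""})
        elif tag == "span" and attrs.get("class") == "text":
            self._field = "text"
        elif tag == "small" and attrs.get("class") == "author":
            self._field = "author"

    def handle_endtag(self, tag):
        if tag in ("span", "small"):
            self._field = None

    def handle_data(self, data):
        # Accumulate character data only while inside a text/author element.
        if self._field and self.quotes:
            self.quotes[-1][self._field] += data

parser = QuoteExtractor()
parser.feed(SAMPLE)
for quote in parser.quotes:
    print(quote)
```

In the real spider, Scrapy's CSS and XPath selectors do this matching far more conveniently; the sketch only shows the shape of the items the spider yields.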

Code repository

https://github.com/scrapy/scrapy

Installation instructions

Scrapy runs on Python 2.7 and Python 3.3 or above.

If you’re using Anaconda or Miniconda, you can install the package from the conda-forge channel, which has up-to-date packages for Linux, Windows and OS X.

To install Scrapy using conda, run:

conda install -c conda-forge scrapy

Alternatively, if you’re already familiar with installation of Python packages, you can install Scrapy and its dependencies from PyPI with:

pip install Scrapy

Note that this may sometimes require solving compilation issues for some Scrapy dependencies, depending on your operating system, so be sure to check the platform-specific installation notes.

We strongly recommend that you install Scrapy in a dedicated virtualenv, to avoid conflicting with your system packages.
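On Python 3, a dedicated environment can be created with the standard-library venv module; the environment name scrapy-env below is just an example:

```shell
# Create and activate an isolated virtual environment for Scrapy.
python3 -m venv scrapy-env
. scrapy-env/bin/activate
# Scrapy would then be installed inside it with: pip install Scrapy
python -c "import sys; print(sys.prefix)"
```

Packages installed while the environment is active go into scrapy-env/ rather than the system site-packages, so they cannot conflict with system packages.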
