Frameworks
There are several popular web scraping frameworks of varying complexity, and whether to use one at all comes down to a few key trade-offs:
Pros
- Frameworks come with many batteries-included features, such as automatic request header configuration, rate limiting and proxy switching.
- Community plugins and documentation help solve common problems.
- Easy to scale up.
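To show what "batteries-included" means in practice, here is a sketch of a Scrapy `settings.py` that turns on default headers, rate limiting and retries with nothing but configuration. The setting names are Scrapy's own; the values (including the example User-Agent string) are illustrative assumptions, not recommendations. Scrapy does not rotate proxies out of the box, so that part is only noted in a comment.

```python
# settings.py: sketch of a Scrapy project's settings module (values are examples).
# Scrapy reads these module-level constants at startup; no extra code applies them.

# Request headers sent with every request unless a spider overrides them.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Rate limiting: base delay between requests to the same site, plus a concurrency cap.
DOWNLOAD_DELAY = 1.0              # seconds
RANDOMIZE_DOWNLOAD_DELAY = True   # wait a random 0.5x to 1.5x of DOWNLOAD_DELAY
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# AutoThrottle adapts the delay to the server's observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0

# Transient failures are retried by the built-in RetryMiddleware.
RETRY_ENABLED = True
RETRY_TIMES = 2

# Proxies: the built-in HttpProxyMiddleware routes a request via request.meta["proxy"];
# rotating across a proxy pool typically needs a custom or third-party middleware.
```

The same behaviour written by hand would mean implementing delays, throttling and retries yourself, which is exactly the work a framework saves.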
Cons
- Learning curve.
- Frameworks are often very opaque, which makes it harder to debug and understand the scraping process.
- Weak points that get a scraper blocked are hard to patch.
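The flip side of that opacity is that a framework-free scraper keeps every request detail in plain sight. Below is a minimal standard-library sketch, assuming a hypothetical `PoliteRequester` helper and placeholder proxy URLs: it attaches fixed headers, rotates through a proxy list round-robin and enforces a minimum delay between requests. It only prepares requests; sending one would be `opener.open(request, timeout=10)`. The clock and sleep functions are injectable so the timing logic can be tested without real waiting.

```python
from __future__ import annotations

import itertools
import time
import urllib.request
from typing import Callable, Iterable


class PoliteRequester:
    """Hypothetical helper: fixed headers, round-robin proxies, minimum delay."""

    def __init__(
        self,
        proxies: Iterable[str],
        headers: dict[str, str],
        min_delay: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._proxies = itertools.cycle(list(proxies))  # endless round-robin
        self._headers = dict(headers)
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last_request: float | None = None

    def _wait_for_slot(self) -> None:
        # Sleep just long enough that consecutive requests are >= min_delay apart.
        now = self._clock()
        if self._last_request is not None:
            remaining = self._min_delay - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now

    def prepare(self, url: str) -> tuple[urllib.request.Request, urllib.request.OpenerDirector]:
        """Return a request with our headers and an opener bound to the next proxy."""
        self._wait_for_slot()
        proxy = next(self._proxies)
        request = urllib.request.Request(url, headers=self._headers)
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        return request, opener
```

Because nothing is hidden behind middleware, swapping the rotation strategy or the header set to get around a specific block is a one-line change, at the cost of writing retries, concurrency and scaling yourself.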
In summary, frameworks are best suited to medium-sized, general-purpose web scrapers. Here's a list of popular web scraping frameworks:
| language | framework | highlights |
|---|---|---|
| Python | scrapy | most popular web scraping framework, big community, feature-rich |
| Python | autoscraper | automatic parsing via fuzzy matching |
| Go | colly | simple, aimed at crawling |
| Go | gospider | similar to colly |
| Go | dataflowkit | integrated browser automation |
| Go | ferret | custom DSL, integrated browser automation (Chrome) |
| Go | geziyor | scrapy-like |
| PHP | panther | integrated browser automation |
| PHP | php-spider | extensible |
| Ruby | spidr | simple, aimed at crawling |
| Ruby | wombat | custom DSL |
| NodeJS | ayakashi | custom DSL, extensible |