
Ideas

This is a list of ideas for applications.

If you wish to apply, learn how to participate!

Scrapy

Handle 429s properly

Easy
Description

Scrapy currently does not handle 429 (Too Many Requests) responses properly. Whenever we get a 429 response code, we should update the throttling settings and concurrency to adapt to the new rate.

Expected Result

A new middleware/extension that will handle 429 response codes and adjust request rates properly.
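To make the idea concrete, here is a minimal sketch of the rate-adjustment logic only, written in plain Python and not tied to any Scrapy API. The function name `next_delay`, its parameters, and the backoff factors are assumptions made for illustration; a real middleware or extension would also need to adjust concurrency and track state per domain.

```python
from typing import Optional


def next_delay(
    current_delay: float,
    status: int,
    retry_after: Optional[str] = None,
    min_delay: float = 0.5,
    max_delay: float = 60.0,
) -> float:
    """Return the delay (seconds) to wait before the next request.

    Hypothetical helper: names and constants are illustrative assumptions.
    """
    if status == 429:
        # Honor Retry-After when it is given as a number of seconds.
        # (The HTTP-date form of the header is ignored in this sketch.)
        if retry_after is not None and retry_after.strip().isdigit():
            proposed = float(retry_after.strip())
        else:
            # No usable hint from the server: back off exponentially.
            proposed = current_delay * 2
        # Never speed up on a 429, and stay within the configured bounds.
        return min(max(proposed, current_delay, min_delay), max_delay)
    # Any other response: relax slowly back toward the minimum delay.
    return max(min_delay, current_delay * 0.9)


print(next_delay(1.0, 429))        # no header: doubles the delay
print(next_delay(1.0, 429, "10"))  # server asked for 10 seconds
print(next_delay(1.0, 200))        # success: delay shrinks slightly
```

Slowing down quickly and speeding up slowly is one possible policy among several; the right trade-off is part of what the project would need to work out.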

Expected Time

175 hours

Required Skills: HTTP
Mentors: Adrian, Andrey
GitHub Issue: #4424

Static Analysis Tooling

Medium
Description

Some common mistakes in code that uses Scrapy are hard to detect, for example a typo in the name of a setting.

Expected Result

Build a list of common issues in code using Scrapy that could be detected using static code analysis, and build a tool or extend an existing tool to detect those.
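As an illustration of the kind of check such a tool could perform, the sketch below uses Python's `ast` module to look for likely typos in the keys of a spider's `custom_settings` dictionary. The function name `find_setting_typos` and the short `KNOWN_SETTINGS` list are assumptions for this example; a real tool would need the full list of settings and would cover more patterns than a single dictionary literal.

```python
import ast
import difflib

# A small sample of Scrapy setting names; a real tool would use the full list.
KNOWN_SETTINGS = sorted({
    "AUTOTHROTTLE_ENABLED",
    "CONCURRENT_REQUESTS",
    "DOWNLOAD_DELAY",
    "ROBOTSTXT_OBEY",
    "USER_AGENT",
})


def find_setting_typos(source: str):
    """Return (line, found_name, suggested_name) for suspicious setting keys."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        names = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if "custom_settings" not in names or not isinstance(node.value, ast.Dict):
            continue
        for key in node.value.keys:
            # Only literal string keys can be checked statically here.
            if not (isinstance(key, ast.Constant) and isinstance(key.value, str)):
                continue
            if key.value in KNOWN_SETTINGS:
                continue
            # Report only names that closely resemble a known setting, so that
            # project-specific settings are not flagged.
            close = difflib.get_close_matches(key.value, KNOWN_SETTINGS, n=1)
            if close:
                findings.append((key.lineno, key.value, close[0]))
    return findings


example = '''
class MySpider:
    custom_settings = {"DOWNLOAD_DELY": 2, "USER_AGENT": "example"}
'''
for line, found, suggestion in find_setting_typos(example):
    print(f"line {line}: unknown setting {found!r}, did you mean {suggestion!r}?")
```

Matching by similarity keeps false positives low, at the cost of missing misspellings that are too far from any known name.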

Expected Time

175 hours or 350 hours depending on the chosen scope

Required Skills: Abstract Syntax Tree, Regular Expressions
Mentors: Adrian, Andrey
GitHub Issue: #4421

Improve cookie handling

Hard
Description

There are different aspects of cookie handling in Scrapy that we should improve.

Expected Result

Update the handling of cookies by Scrapy to meet modern web standards followed by web browsers, and make it easier for Scrapy users to work with cookies.

Expected Time

175 hours or 350 hours depending on the chosen scope

Required Skills: HTTP
Mentors: Adrian, Andrey
GitHub Issue: #5431

Add TLS 1.3 support

Hard
Description

Scrapy does not support TLS 1.3. Supporting it is important to keep up with servers that drop support for older TLS versions.

Expected Result

Update our HTTP/1.1 downloader to support TLS 1.3 connections.

Expected Time

350 hours

Stretch Goals

Update our HTTP/2 downloader to support TLS 1.3 connections as well.

Required Skills: HTTP, Twisted
Mentors: Adrian, Andrey
GitHub Issue: #4821

Parsel

HTML5 Support

Easy
Description

When you inspect a website element in a web browser, you get a DOM-based HTML tree that is different from the actual, underlying HTML tree. This makes it difficult to translate what you find in a web browser into an XPath or CSS expression that works in Parsel, and it is even harder when the underlying HTML is broken.

Expected Result

Extend Parsel to support different HTML parsers, and add support for additional HTML parsers.
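One possible shape for such an extension point is a small registry of parser backends that callers select by name. The sketch below is not Parsel's API: `register_backend`, `parse`, and the backend name `"strict-xml"` are hypothetical, and the standard library's `xml.etree.ElementTree.fromstring` stands in for a real HTML parser only so that the example is self-contained.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# A backend turns markup text into the root element of a tree.
ParserBackend = Callable[[str], ET.Element]

_BACKENDS: Dict[str, ParserBackend] = {}


def register_backend(name: str, backend: ParserBackend) -> None:
    """Make a parser backend available under the given name."""
    _BACKENDS[name] = backend


def parse(text: str, parser: str = "strict-xml") -> ET.Element:
    """Parse text with the named backend (hypothetical interface)."""
    try:
        backend = _BACKENDS[parser]
    except KeyError:
        raise ValueError(f"unknown parser backend: {parser!r}") from None
    return backend(text)


# Stand-in backend: accepts only well-formed markup. An HTML5-compliant
# parser would be registered the same way under its own name.
register_backend("strict-xml", ET.fromstring)

root = parse("<html><body><p>Hello</p></body></html>")
print(root.find("./body/p").text)
```

Keeping the backend behind a single callable makes it straightforward to add parsers later, though a real design would also have to decide how selectors behave when backends produce different trees for the same input.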

Expected Time

175 hours

Required Skills: HTML, Interface Design
Mentors: Andrey, Adrian
GitHub Issue: #83