Ideas
This is a list of ideas for applications.
If you wish to apply, learn how to participate!
Scrapy
Handle 429s properly
Description | Scrapy does not currently handle HTTP 429 (Too Many Requests) responses properly. Whenever Scrapy receives a 429 response, it should update its throttling configuration and concurrency to adapt to the new rate. |
Expected Result | A new middleware/extension that will handle 429 response codes and adjust request rates properly. |
Expected Time | 175 hours |
Required Skills | HTTP |
Mentors | Adrian, Andrey |
GitHub Issue | #4424 |
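As a rough illustration of the adjustment described above, the sketch below backs off when a 429 arrives and slowly recovers on success. It is not Scrapy code: `ThrottleState`, `parse_retry_after`, and `on_response` are hypothetical names, and the doubling/halving policy and the default limits are assumptions chosen only for the example. A real middleware or extension would still need to wire this into Scrapy's downloader.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThrottleState:
    """Hypothetical per-domain throttling state (not a Scrapy class)."""
    delay: float = 0.0       # seconds to wait between requests
    concurrency: int = 8     # parallel requests currently allowed
    min_delay: float = 0.0   # assumed lower bound for the delay
    max_delay: float = 60.0  # assumed upper bound for the delay


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Return Retry-After as seconds when it is a plain integer, else None.

    Retry-After may also be an HTTP-date; this sketch ignores that form.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    return None


def on_response(state: ThrottleState, status: int,
                retry_after: Optional[str] = None) -> ThrottleState:
    """Adjust delay and concurrency after each response (assumed policy)."""
    if status == 429:
        hinted = parse_retry_after(retry_after)
        # Double the delay (at least 1 second), but honor a larger server hint.
        backoff = max(state.delay * 2, 1.0)
        state.delay = min(max(backoff, hinted or 0.0), state.max_delay)
        # Halve concurrency, never going below one request at a time.
        state.concurrency = max(1, state.concurrency // 2)
    elif 200 <= status < 300:
        # Recover gradually while the server keeps accepting requests.
        state.delay = max(state.min_delay, state.delay * 0.9)
    return state
```

Keeping the policy in a small pure function like this makes it easy to unit-test independently of any crawler; how quickly to recover after a 429 is a design decision the project would need to settle.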
Static Analysis Tooling
Description | Certain common issues in code that uses Scrapy are hard to detect, for example a typo in the name of a setting. |
Expected Result | Build a list of common issues in code using Scrapy that could be detected using static code analysis, and build a tool or extend an existing tool to detect those. |
Expected Time | 175 hours or 350 hours depending on the chosen scope |
Required Skills | Abstract Syntax Tree, Regular Expressions |
Mentors | Adrian, Andrey |
GitHub Issue | #4421 |
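To make the setting-name typo example concrete, here is a minimal sketch of one possible check, built on Python's standard `ast` and `difflib` modules. The function name `find_setting_typos`, the 0.8 similarity cutoff, and the tiny `KNOWN_SETTINGS` sample are assumptions made for illustration; a real tool would need the full list of Scrapy settings and would likely plug into an existing linter.

```python
import ast
import difflib
from typing import List, Tuple

# A small sample of real Scrapy setting names; a real tool needs the full list.
KNOWN_SETTINGS = {
    "CONCURRENT_REQUESTS",
    "DOWNLOAD_DELAY",
    "ROBOTSTXT_OBEY",
    "USER_AGENT",
}


def find_setting_typos(source: str) -> List[Tuple[int, str, str]]:
    """Return (line, name, suggestion) for likely misspelled settings.

    Only module-level assignments to UPPER_CASE names are inspected,
    which is how a settings module typically defines its values.
    """
    findings = []
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if not isinstance(target, ast.Name) or not target.id.isupper():
                continue
            name = target.id
            if name in KNOWN_SETTINGS:
                continue
            # Flag names that are very close to a known setting (assumed cutoff).
            close = difflib.get_close_matches(
                name, sorted(KNOWN_SETTINGS), n=1, cutoff=0.8
            )
            if close:
                findings.append((node.lineno, name, close[0]))
    return findings
```

For a settings file containing `DOWNLOAD_DELY = 2`, this would report line 1 and suggest `DOWNLOAD_DELAY`, while leaving unrelated custom names alone.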
Improve cookie handling
Description | There are different aspects of cookie handling in Scrapy that we should improve. |
Expected Result | Update the handling of cookies by Scrapy to meet modern web standards followed by web browsers, and make it easier for Scrapy users to work with cookies. |
Expected Time | 175 hours or 350 hours depending on the chosen scope |
Required Skills | HTTP |
Mentors | Adrian, Andrey |
GitHub Issue | #5431 |
Add TLS 1.3 support
Description | Scrapy does not support TLS 1.3. Supporting it is important to keep up with servers that drop support for older TLS versions. |
Expected Result | Update our HTTP/1.1 downloader to support TLS 1.3 connections. |
Expected Time | 350 hours |
Stretch Goals | Update our HTTP/2 downloader to support TLS 1.3 connections as well. |
Required Skills | HTTP, Twisted |
Mentors | Adrian, Andrey |
GitHub Issue | #4821 |
Parsel
HTML5 Support
Description | When you inspect a website element in a web browser, you get a DOM-based HTML tree that is different from the actual, underlying HTML tree. This makes it difficult to translate what you find in a web browser into an XPath or CSS expression that can work in Parsel, especially when the underlying HTML is broken. |
Expected Result | Extend Parsel to support different HTML parsers, and add support for additional HTML parsers. |
Expected Time | 175 hours |
Required Skills | HTML, Interface Design |
Mentors | Andrey, Adrian |
GitHub Issue | #83 |
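One well-known instance of the browser-versus-source mismatch described above is that browsers insert a `tbody` element into tables that lack one, so a path copied from the browser inspector may not match the raw markup. The sketch below uses the standard-library `xml.etree.ElementTree` purely as a stand-in to show the effect on a well-formed snippet; it does not use Parsel, and the sample markup is made up for the example.

```python
import xml.etree.ElementTree as ET

# Raw markup as served: the table has no <tbody>.
raw = "<table><tr><td>1</td></tr></table>"
table = ET.fromstring(raw)

# Path as it would appear in a browser inspector, where the DOM
# contains an implied <tbody> between <table> and <tr>.
browser_style = table.findall("./tbody/tr")

# Path that matches the markup actually received.
source_style = table.findall("./tr")

print(len(browser_style), len(source_style))
```

An HTML5-compliant parser would build the same tree the browser does, which is why support for additional parsers would make expressions copied from the browser more likely to work unchanged.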