-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP cache not working in some cases #125
Comments
I got someone else last week reporting the same issue. I could not reproduce it either. I wonder if it is a simple matter of getting a random empty body response due to an undetected ban. The default cache policy will not ignore an HTTP 200 response even if its body is empty, so it will be cached and fail as reported when trying to use In those cases, creating a custom cache policy that ignores empty responses should be relatively easy (subclass the DummyPolicy, override |
One thing that does happen is that responses from the cache do not use the original response class, so you cannot access |
@kmike @wRAR I see 2 possible approaches here:
The first approach seems the best to me, but I am not 100% sure how to best implement such an API in a way that plays well with the existing storage classes (DBM and file system) in a backward-compatible way. The second approach would be easier to implement, and cover most practical scenarios, but make corner case scenarios rather difficult (e.g. supporting additional Response classes beyond those from scrapy-zyte-api, or storing the cache somewhere other than the file system). Any thoughts? |
@Gallaecio I'm hitting this exact issue right now, and would love to know if going with option 2 is the preferred option for now. If yes, where to start? i see the raw_api_response is not cached here, so it should be straightforward? https://github.com/scrapy/scrapy/blob/67ab8d4650c1e9212c9508803c7b5265e166cbaa/scrapy/extensions/httpcache.py#L371-L382 |
Yes, you could implement option 2 on your own, subclass
|
I have seen 2 people now having trouble with HTTP cache in combination with scrapy-zyte-api.
They set
HTTPCACHE_ENABLED
toTrue
, and they getNotSupported("Response content isn't text")
.I could not reproduce the issue myself with the tutorial spider, so I am not sure why it happens.
If someone affected by this issue can provide a minimal, reproducible example, that would be great!
The text was updated successfully, but these errors were encountered: