Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metadata deserialization in async mode for PGVector #125

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

shamspias
Copy link

Problem

When using asynchronous methods with PGVector (async_mode=True), the metadata field retrieved from the database may be of type Fragment (from asyncpg) or other non-dict types. This causes a ValidationError when the Document class expects metadata to be a dictionary.

Solution

This pull request modifies the _results_to_docs_and_scores method to ensure that metadata is correctly converted into a dictionary before creating Document instances. The method now handles different possible types of metadata and attempts to deserialize it into a dict.

Changes

  • Updated _results_to_docs_and_scores method in PGVector class to handle metadata deserialization for different types (e.g., dict, str, Fragment).

Testing

  • Tested with async_mode=True and confirmed that the metadata field is correctly deserialized and no longer causes validation errors.
  • Ensured that the change does not affect the behavior when async_mode=False.

Related Issues

@jaimeescano
Copy link

Facing this very same issue. Wondering when this commit could be merged.
Big credits to @shamspias for providing the fix.

Regards

@eyurtsev eyurtsev self-assigned this Oct 15, 2024
@@ -1058,17 +1058,38 @@ async def asimilarity_search_with_score_by_vector(

def _results_to_docs_and_scores(self, results: Any) -> List[Tuple[Document, float]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a relevant unit test(s) for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @eyurtsev,

Thank you for your feedback. I've added unit tests to cover metadata deserialization in asynchronous operations, specifically targeting the _results_to_docs_and_scores method. The new tests ensure that metadata is correctly deserialized and that Document instances receive the proper metadata when using async_mode=True.

Please review the updated tests and let me know if you have any further comments or suggestions.

Best regards,

@shamspias
Copy link
Author

@eyurtsev, please let me know if there is anything else I need to do. If not, please merge it.

@simadimonyan
Copy link

Did you fix the issue? I have related: langchain-ai/langchain#28029

@shamspias
Copy link
Author

Did you fix the issue? I have related: langchain-ai/langchain#28029

yes I did fixed the issue you can use the branch if you like to

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants