Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Generic dataclasses #259

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

mvanderlee
Copy link
Contributor

@mvanderlee mvanderlee commented Mar 11, 2024

Fixes: #230, possbily #267, #269

Building on the fantastic work of @onursatici and @dairiki in PRs #172 and #232

Currently broken on py3.6, but once #257 is merged I'll rebase and thus remove py3.6 and py3.7 support.

I've tested this with tox on python versions 38, 39, 310, 311, 312

@mvanderlee mvanderlee mentioned this pull request Apr 5, 2024
@Spoonrad
Copy link

Exciting PR, I've been waiting a long time for generic support in dataclasses :) Do you have any update please?

@mvanderlee mvanderlee marked this pull request as ready for review June 24, 2024 23:07
@mvanderlee
Copy link
Contributor Author

@dairiki @lovasoa This is ready for review.

@Spoonrad Consider this the update

@mvanderlee
Copy link
Contributor Author

mvanderlee commented Jun 25, 2024

I came up with 1 additional usecase, with 3 solutions that are somewhat related to Annotated and Generics.
I have existing NewTypes like this:

DelimitedListStr = NewType('DelimitedListStr', list[str], field=lambda *args, **kwargs: DelimitedList(mf.String(), *args, **kwargs))

I came up with these 3 solutions for this. One uses generics, the other doesn't.

  1. Using Generics:
DelimitedListStr = Annotated[List[str], DelimitedList[mf.String]]

Edit: In the following comments we've decided against supporting callables like options 2 and 3.

  1. Using partial:
DelimitedListStr = Annotated[List[str], functools.partial(DelimitedList, mf.String)]
  1. Using callable:
DelimitedListStr = Annotated[List[str], lambda *args, **kwargs: DelimitedList(mf.String, *args, **kwargs)]

Support for each has been added, including unittests.
I also broke out the generic handling into it's own file.

@dairiki
Copy link
Collaborator

dairiki commented Jun 26, 2024

  1. Using partial:
DelimitedListStr = Annotated[List[str], functools.partial(DelimitedList, mf.String)]
  1. Using callable:
DelimitedListStr = Annotated[List[str], lambda *args, **kwargs: DelimitedList(mf.String, *args, **kwargs)]

As API choices, these make me nervous.

Recognizing generic callables as (potential) schema field annotations, and then testing them with a zero-arg call, as is done here

def _is_callable_marshmallow_field(obj) -> bool:
"""Checks if the object is a callable and if the callable returns a marshmallow field"""
if callable(obj):
try:
potential_field = obj()
return _is_marshmallow_field(potential_field)
except Exception:
return False

seems both confusing and dangerous.

  • The test call uses potentially different arguments than the callable is intended to be called with.
    • The programmer must know to support zero-arg calls
    • There is no guarantee that the real call (with arguments) will return the same thing as the zero-arg test call
  • The callable (which could be an annotation unrelated to marshmallow_dataclass) might have side-effects

Can the same functionality be obtained by creating a custom Marshmallow Field subclass? E.g. something like:

class DelimitedStrListField(DelimitedList):
    def __init__(self, *args, **kwargs):
        super().__init__(mf.String, *args, **kwargs)

DelimitedListStr = Annotated[List[str], DelimitedStrListField]

@mvanderlee
Copy link
Contributor Author

Right.. We don't yet know if the annotation is supposed to be a marshmallow Field compatible callable. So we could be calling anything including eraseDiskDrive().

The class approach certainly works, I was trying to use a one liner. But the partial approach works for that, and is safer from an API perspective.
We could introduce a special callable wrapper, so the user can use a callable but has to explicitly tell us about it.

DelimitedListStr = Annotated[List[str], MaFieldCallable(lambda *args, **kwargs: DelimitedList(mf.String, *args, **kwargs))]

but I don't see the value in that. over using either the partial or custom class approach.

This approach was unsafe.
See PR lovasoa#259 for more details
@dairiki
Copy link
Collaborator

dairiki commented Jun 26, 2024

The class approach certainly works, I was trying to use a one liner. But the partial approach works for that, and is safer from an API perspective.

I hadn't noticed that you explicitly check for a partial applied to a Field (here). That is safe.

I still think it adds unnecessary clutter to the API.

Also, allowing a partial but not an equivalent arbitrary callable seems potentially confusing.
It doesn't add any capability to the API, other than to enable specific one-liners.

We could introduce a special callable wrapper, so the user can use a callable but has to explicitly tell us about it.

A value of that would be that it is more explicitly readable: it makes the intent of the annotation more clear to the unindoctrinated reader.


I'm in favor of stopping at recognizing annotations that are either subclasses or instances of Field. That's an understandable, simple-to-explain API that's fairly self-evident, even to those unfamiliar with it.

I don't really see the extra three lines required to define a custom Field subclass as too verbose. (As well, doing so provides a good place to add a docstring, if that would help clarify the intent.)

@mvanderlee
Copy link
Contributor Author

mvanderlee commented Jun 26, 2024

I'll switch to using the generic approach myself. So I'm fine with removing the partial approach.
I mostly wanted to support an undocumented feature that works with NewType.

allowing a partial but not an equivalent arbitrary callable seems potentially confusing.

Agreed. I did investigate further but couldn't find a satisfactor way to get typehints of lambdas, even when using the Callable typehint.

While I agree that 3 lines isn't bad, given that I have multiple one lines currently, each would take up 3 + 2 blank lines (style guide) So instead of 5 lines, it'd be 25 lines. Which is harder to read at a glance.

But again, generics fixes my usecase and I can't find a satisfactory answer to the valid concerns being brought up.

Copy link
Collaborator

@dairiki dairiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this review is complete, but I'm about to leave on a road-trip, so wanted to get this out.

marshmallow_dataclass/__init__.py Outdated Show resolved Hide resolved
marshmallow_dataclass/generic_resolver.py Outdated Show resolved Hide resolved
marshmallow_dataclass/__init__.py Outdated Show resolved Hide resolved
marshmallow_dataclass/generic_resolver.py Outdated Show resolved Hide resolved
obj,
schema_ctx: _SchemaContext,
) -> type:
"""typing.get_type_hints doesn't work with generic aliasses. But this 'hack' works."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this function should be renamed _resolve_forward_type_refs (or similar).

Get_type_hints returns a dict containing the type hints for all members of a class (while resolving forward type references found along the way).
The purpose of this function, by way of contrast, is to resolve forward type references in a single type hint. (If there are no forward type references in obj, this function (I think) returns obj unchanged.)

(Spelling nit: "aliases")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the forward refs have already been resolved. This really exists purely to get the typehint for generics.
maybe _get_type_hint_of_generic_object ?

import typing

T = typing.TypeVar('T')

class A(Generic[T]):
  a: T

class B(A[int]):
  pass

print(typing.get_type_hints(B))
print(typing.get_type_hints(A[int]))

>=========================
{'a': ~T}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[51], [line 12](vscode-notebook-cell:?execution_count=51&line=12)
      [9](vscode-notebook-cell:?execution_count=51&line=9)   pass
     [11](vscode-notebook-cell:?execution_count=51&line=11) print(typing.get_type_hints(B))
---> [12](vscode-notebook-cell:?execution_count=51&line=12) print(typing.get_type_hints(A[int]))

File ~\.pyenv\pyenv-win\versions\3.11.3\Lib\typing.py:2347, in get_type_hints(obj, globalns, localns, include_extras)
   ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2345)         return {}
   ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2346)     else:
-> ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2347)         raise TypeError('{!r} is not a module, class, method, '
   ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2348)                         'or function.'.format(obj))
   ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2349) hints = dict(hints)
   ~/.pyenv/pyenv-win/versions/3.11.3/Lib/typing.py:2350) for name, value in hints.items():

TypeError: __main__.A[int] is not a module, class, method, or function.
def _get_generic_type_hints(obj) -> type:
    """typing.get_type_hints doesn't work with generic aliases. But this 'hack' works."""

    class X:
        x: obj  # type: ignore[name-defined]

    return typing.get_type_hints(X)['x']

print(_get_generic_type_hints(B))
print(_get_generic_type_hints(A[int]))

>=========================
<class '__main__.B'>
__main__.A[int]

Copy link
Collaborator

@dairiki dairiki Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your last examples, _get_generic_type_hints is just the identity, right?

>>> _get_generic_type_hints(B) is B
True
>>> _get_generic_type_hints(A[int]) is A[int]
True

But, the value of _get_generic_type_hints comes in resolving forward references:

>>> _get_generic_type_hints(A["int"])
__main__.A[int]
>>> _get_generic_type_hints(A["int"]) is A[int]
False

Copy link
Collaborator

@dairiki dairiki Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what is really wanted is (untested)

def _get_generic_type_hints(generic_alias):
    class X(generic_alias): pass

    return typing.get_type_hints(X)

Or, in one line:

    return typing.get_type_hints(types.new_class("_", bases=(generic_alias,)))

That version of _get_generic_type_hints then matches the signature of typing.get_type_hints. As things stand, it does not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, good catch! You're absolutely right.

  1. This just worked 🤦‍♂️ I clearly spend too long trying to make it work to see the obvious solution.
(
    type_hints[field.name]
    if not is_generic_type(clazz)
    else field.type
)
  1. But didn't work for A["int"] and no tests caught that yet. So I'll add a tests and rename the function as per your recommendation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, if that works, maybe just roll this into _get_type_hints so that there's a single _get_type_hints function that can be used for any dataclass, be it generic or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def _get_generic_type_hints(generic_alias):
    class X(generic_alias): pass

    return typing.get_type_hints(X)

Does not work.

print(_get_generic_type_hints(A[int]))

>=====================
{'a': ~T}

Instead of {'a': __main__.A[int]}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not work.

That's too bad.

I still think that refactoring so that _get_type_hints will work with generic aliases of dataclasses as well as plain dataclasses is probably worth it. It will move all the is_generic_type special-casing out of

type_hints = {}
if not is_generic_type(clazz):
type_hints = _get_type_hints(clazz, schema_ctx)
attributes.update(
(
field.name,
_field_for_schema(
(
type_hints[field.name]
if not is_generic_type(clazz)
else _get_generic_type_hints(field.type, schema_ctx)
),
_get_field_default(field),
field.metadata,
base_schema,
),
)
for field in fields
if field.init or include_non_init
)

Copy link
Contributor Author

@mvanderlee mvanderlee Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't disagree but I've already sunk too many hours into this particular problem. If it is even possible to do, it'll have to be someone else as I don't have the knowledge to take this further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dairiki I found a way around it by resolving the forward refs when we get the dataclass fields. We now no longer call get_type_hints at all anymore as it's relevant code is internalized.

We were already looping over the mro and fields just like get_type_hints.

marshmallow_dataclass/generic_resolver.py Outdated Show resolved Hide resolved
* Differentiate between Generics and container type functions
* Tie get_args and get_origin functions to Annotated import.
* Rename function and add test to clarify forward ref use case
I don't want to loop over the fields multiple times so internalized the relevant code from typing.get_type_hints into the generic_resolver
@mvanderlee
Copy link
Contributor Author

Bump.

@Syntaf
Copy link

Syntaf commented Aug 1, 2024

+1 to this PR, love this @mvanderlee -- we were just recently dealing with this issue and ended up creating our own bespoke field that handled this, but if this gets merged in we'd love to bump our version and use this.

@lovasoa
Copy link
Owner

lovasoa commented Aug 1, 2024

@mvanderlee I'm really sorry, I don't have a lot of bandwidth.

@dairiki , maybe you can review this?

@dairiki
Copy link
Collaborator

dairiki commented Aug 1, 2024

@mvanderlee I'm really sorry, I don't have a lot of bandwidth.

@dairiki , maybe you can review this?

I apologize as well. For the past month I've been mostly out of town and swamped with other concerns and activities.
I'm home again now, but still trying to catch up. I will try to find time for this in the next couple of weeks (but no guarantees).

@mvanderlee
Copy link
Contributor Author

Gentle bump

@lovasoa
Copy link
Owner

lovasoa commented Sep 22, 2024

@dairiki , feel free to merge if you think it's okay

@dairiki
Copy link
Collaborator

dairiki commented Sep 23, 2024

@lovasoa @mvanderlee I apologize profusely for my lack of attention to this. (In my defense, I have been busy: travel, family and pet medical issues, and a lost month due to COVID infection.)

I've attempted to start a review on this a couple of times. Each time I get bogged down by what seems to be mostly a re-implementation of typing.get_type_hints into the new code in generic_resolver.py. (The re-implementation of get_type_hints, I think, is in _resolve_typevars. I haven't had to time to fully digest all of what's in generic_resolver.py, so could be wrong.)

I haven't found a great way to phrase this, so I will just say it:
My main concern is that of maintenance headache. I fear that the code in generic_resolver.py will be fragile with respect to upstream changes in implementation details of typing. (Where our current use of typing.get_type_hints, as well as the typing-inspect, and typeguard packages mostly — or, at least, partially — isolate us from those changes.)

I could be wrong about that. (I haven't yet fully digested the new code in generic_resolver.py.)

Or perhaps, the rest of you are not as bothered as I am with attempting to keep a level of abstraction (that is not under our maintenance) between our code and the implementation details of typing.

Feel free to discuss... But that is what I've been getting stuck on.

Copy link
Collaborator

@dairiki dairiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other minor comments I've had in the queue for awhile.

marshmallow_dataclass/__init__.py Outdated Show resolved Hide resolved
marshmallow_dataclass/__init__.py Outdated Show resolved Hide resolved
@@ -268,6 +289,8 @@ def add_schema(_cls=None, base_schema=None, cls_frame=None, stacklevel=1):
"""

def decorator(clazz: Type[_U], stacklevel: int = stacklevel) -> Type[_U]:
_check_decorated_type(clazz)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_check_decorate_type requires only that isinstance(clazz, type).
Do we want to require that clazz is a dataclass (isinstance(clazz, type) and dataclasses.is_dataclass(clazz)) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so because we already have a big warning when a class is not a dataclass in _internal_class_schema
So non-dataclasses are allowed, but not supported

Copy link
Collaborator

@dairiki dairiki Sep 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just run a simple test.

You are correct in that decorating non-data class classes with add_schema works — at least in simple cases. (That doesn't necessarily mean that we should allow it.)

It appears, however, that, as things stand in this PR, no big warning is emitted in that case. Further investigation reveals that get_resolved_dataclass_fields "just works" (with no warnings emitted) even if its argument is not a dataclass. (I have some suspicion that perhaps fields from the classes __mro__ are not properly handled in that case, but I haven't looked close enough to say for sure.)

In any case, I think that either:

  • We should disallow using the add_schema decorator on non-dataclasses. (Why allow it if it's unsupported/untested?)
  • Or, we need a test to ensure the warning is, in fact, emitted when it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had to do some digging.

  1. In today's version, 8.7.1, I can call class_schema(NonDataclass) just fine. It only shows the warning if one of it's fields is not a dataclass. i.e.: Nested non dataclasses.
  2. This goes back to when the warning was originally added: e31faa8
  3. The behaviour still works the same with this PR.

I don't disagree with removing support for non-dataclasses, but don't see why that should be part of this PR.

@dairiki
Copy link
Collaborator

dairiki commented Sep 28, 2024

I fear that the code in generic_resolver.py will be fragile with respect to upstream changes in implementation details of typing.

As an example, how well does this PR cope with PEP 696 (coming October 7 in Python 3.13)?
(I don't know the answer. It may work fine.)

In any case, since the 3.13 release is imminent, we should be testing with that, too.

@mvanderlee
Copy link
Contributor Author

The generic_resolver itself does a lot more than get_type_hints Without it only shallow generics can be supported and any nesting or overrides can't be resolved.

The merging with get_type_hints was an optimization to remove some looping over the fields, but can be easily reverted. mvanderlee@4ef0bdf#diff-b59699b116d22f6ae7d49d5fc2084e62a695db0849c341c1a0840b7ce2202eefR139

The above 'merge' does break in 3.13 due to a warning being raised. The added type_params from 3.12 that will be required as an arg in typing._eval_type in 3.15 and trigger a warning in 3.13. It worked fine after I reverted to before that commit.

Adding support for TypeVars with defaults would probably be a fair bit of work mainly because of the recursion caused by allowing TypeVars as defaults. Or if the there are multiple defaults that should override each other.

@mvanderlee
Copy link
Contributor Author

mvanderlee commented Sep 29, 2024

@dairiki I have a second branch with the get_type_hints merge commit reverted and py 3.13 TypeVar defaults added:
https://github.com/mvanderlee/marshmallow_dataclass/tree/generic-dataclasses-get-type-hints

If that matches your expectations I can update this PR with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError when creating a marshmallow schema from dataclass inherited from generic class
6 participants