Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate treenode module. #8757
Migrate treenode module. #8757
Changes from 6 commits
065db0e
f167b99
6bd492f
b62a21a
32053b6
32e7453
7f2a178
b4fb773
548536b
b9685d7
59da654
b821fca
4530973
c03d373
ccd5374
4830cd4
e8459bc
4238ce2
db264c0
b4acb0d
0f4b38a
e7ea2b4
f2f327f
9cbaf3b
9db7040
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're exposing it (and keep it as a pure path, which I think we should), should we move this class to a separate module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we exposing
NodePath
? I suggested it, but after talking with @etienneschalk decided maybe it wasn't particularly helpful xarray-contrib/datatree#205.What's the logic for making it a pure path? The posix path is to ensure the slashes are in a consistent direction (remember this isn't a real filepath so shouldn't change automatically on windows systems).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this, but I believe the difference is that
PurePosixPath
doesn't have methods that require read/write access (liketouch
,remove
,mkdir
,iterdir
, ...), whilePosixPath
does.The main difference between filesystems and
datatree
is that there's only a single filesystem instance at a time, while you can have multipleDataTree
objects at the same time. This means that aNodePath
inheriting fromPosixPath
would have to be created from theDataTree
object and would require a lot of discipline to not be confusing.On the other hand, a
NodePath
that inherits fromPurePosixPath
would help a bit with path manipulations (name
,root
,joinpath()
,parts
,parent
,parents
, ...) that we would otherwise have to useposixpath
for. I didn't check ifposixpath
has the same functionality asPurePosixPath
, though, and they might also not be as convenient to use.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I wrote my above comment too quickly - we both agree it should stay as a pure path.
We could also consider adding extra methods to this
NodePath
class if it simplifies other parts of the code.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related issue: xarray-contrib/datatree#283 (Consistency between DataTree methods and pathlib.PurePath methods)
To me the graal would be to have datatree more and more compatible with
PurePosixPath
(whatNodePath
already does by inheriting from it, but we can imagine having an API accepting both strings andPurePosixPath
, without the need to exposeNodePath
publicly).As @keewis mentioned , there is a single instance of the filesystem whereas there can be multiple DataTrees. So a single DataTree could act like a "small filesystem on its own" and implement the
PosixPath
methods.The idea here is to delegate to
pathlib
the diffcult task of choosing the best names for methods that help manipulating trees. For an audience of developers that are already used to writing scripts withpathlib
, this would make working with datatree automatic! And this also gives plenty of ideas of features. Also, with hierarchical-formats such as Zarr that rely on the filesystem, openable by datatree, the proximity is innate.Concrete example: working with multi-resolution rasters (a usecase of datatree). By using f-strings and glob, you can write resolution independant code once, like you would have done by manipulating a data structure on the file system
The schema on the pathlib doc (arrows are inheritance):
How I see it:
(I removed
Path
for simplification)I am not 100% sure it fully makes sense... The nuance is that when you instanciate a Path, it is always implicit that it is bound to the filesystem, while each DataTree instantiation creates its own data representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @etienneschalk for all your thoughts here!
I strongly feel that DataTree / NodePath objects should not be associated with concrete filesystem paths.
However this
seems like a reasonable idea, and I think we should continue the discussion of that in xarray-contrib/datatree#283.
For now I don't think there is anything we need to do in this PR specifically - we can revisit the API choices / accepting path-like types in a future PR.