top-level functions for reading, creating data #2463

Open
wants to merge 14 commits into
base: main


Contributor

@d-v-b d-v-b commented Nov 4, 2024

Adds async and sync read_array, read_group, and read functions. read_array reads an array from storage, read_group does the same for groups, and read tries to read whatever is at the storage path, whether that is an array or a group. These functions wrap their respective creation functions, passing mode="r" in each case.
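To make the wrapping pattern concrete, here is a minimal sketch using a toy in-memory store rather than zarr itself. The `Array` and `Group` classes, the `Store` type, and `open_node` are hypothetical stand-ins invented for illustration; only the names read_array, read_group, and read, and the fixed mode="r", come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Toy stand-ins for array and group handles; these are NOT zarr's classes.
@dataclass
class Array:
    path: str
    read_only: bool


@dataclass
class Group:
    path: str
    read_only: bool


# Hypothetical store: maps a path to the kind of node stored there.
Store = Dict[str, str]


def open_node(store: Store, path: str, mode: str) -> Union[Array, Group]:
    """Illustrative mode-based opener that the read functions wrap."""
    if path not in store:
        raise FileNotFoundError(path)
    cls = Array if store[path] == "array" else Group
    return cls(path=path, read_only=(mode == "r"))


def read(store: Store, path: str) -> Union[Array, Group]:
    """Return whatever lives at `path`, always with mode fixed to "r"."""
    return open_node(store, path, mode="r")


def read_array(store: Store, path: str) -> Array:
    """Like `read`, but fail if the node is not an array."""
    node = read(store, path)
    if not isinstance(node, Array):
        raise TypeError(f"expected an array at {path!r}")
    return node


def read_group(store: Store, path: str) -> Group:
    """Like `read`, but fail if the node is not a group."""
    node = read(store, path)
    if not isinstance(node, Group):
        raise TypeError(f"expected a group at {path!r}")
    return node
```

In this sketch the caller never passes a mode, so there is no way to ask for write access by accident.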

I still need to add tests.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b d-v-b changed the title from "Feat/read funcs" to "functions for reading data" Nov 4, 2024
Contributor Author

d-v-b commented Nov 4, 2024

in writing this PR I noticed that we don't have a create_array function, but we do have create_group. I don't like the asymmetry. We should fix that (by adding a create_array func).

Contributor Author

d-v-b commented Nov 5, 2024

> in writing this PR I noticed that we don't have a create_array function, but we do have create_group. I don't like the asymmetry. We should fix that (by adding a create_array func).

I was wrong: we don't have create_group either, but we should. I think we should actually deprecate file-mode semantics across the board in favor of create_x and read_x functions.

@d-v-b d-v-b marked this pull request as ready for review November 5, 2024 16:44
Contributor Author

d-v-b commented Nov 5, 2024

This is ready for review. To summarize:

I created 5 new top-level exports:

  • read_array
  • read_group
  • read
  • create_array
  • create_group

The goal for these functions is to directly support common access patterns without forcing users to worry about the semantics of the mode keyword, and to guard against accessing zarr data with an unintended access mode (e.g., getting mutable access to an array you only want to read).
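As a rough illustration of the guard described above, the sketch below uses a toy list-backed handle instead of a real zarr array. `ArrayHandle`, the dict-based store, and the function signatures are assumptions made for this example; the point is only that a handle obtained through a read function rejects writes, while one obtained through a create function accepts them.

```python
from typing import Dict, List


class ArrayHandle:
    """Toy array handle (hypothetical); not a zarr class."""

    def __init__(self, data: List[int], read_only: bool) -> None:
        self._data = data
        self.read_only = read_only

    def __getitem__(self, index: int) -> int:
        return self._data[index]

    def __setitem__(self, index: int, value: int) -> None:
        # The access mode is decided by the function that produced the handle.
        if self.read_only:
            raise PermissionError("array was opened for reading only")
        self._data[index] = value


def create_array(store: Dict[str, List[int]], name: str, length: int) -> ArrayHandle:
    """Create a zero-filled array and return a writable handle (sketch)."""
    store[name] = [0] * length
    return ArrayHandle(store[name], read_only=False)


def read_array(store: Dict[str, List[int]], name: str) -> ArrayHandle:
    """Return a read-only handle to an existing array (sketch)."""
    return ArrayHandle(store[name], read_only=True)
```

With this shape, the intent (create versus read) is visible in the function name at the call site rather than hidden in a mode string.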

@d-v-b d-v-b changed the title from "functions for reading data" to "top-level functions for reading, creating data" Nov 5, 2024
@d-v-b d-v-b requested review from jhamman and TomAugspurger and removed request for jhamman November 5, 2024 17:33
src/zarr/core/array.py (outdated, resolved)
src/zarr/core/group.py (outdated, resolved)
tests/test_api.py (outdated, resolved)
@d-v-b d-v-b requested a review from dstansby November 7, 2024 19:58
Contributor Author

d-v-b commented Nov 12, 2024

Some feedback from the community meeting last week:

  • One point of contention in this PR was the polymorphic read function, which returns either an array or group, depending on what's in the hierarchy. Personally I think this is invaluable for navigating unknown Zarr hierarchies, but others thought it was unnecessary or confusing. Does anyone else have thoughts?
  • Should this PR contain deprecation warnings for functions that are being made less essential by this PR (e.g., open and create)?
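If deprecation warnings were added, one common way to do it in Python is a thin shim that warns and then delegates to the new function. The sketch below is an assumption about how that could look, not what this PR does: `legacy_open` is a hypothetical stand-in for an older mode-based function, and `read` here is a placeholder that returns a string instead of a real array or group.

```python
import warnings


def read(path: str) -> str:
    """Placeholder for the new read-only entry point (hypothetical)."""
    return f"read-only handle for {path}"


def legacy_open(path: str, mode: str = "r") -> str:
    """Stand-in for an older mode-based function slated for deprecation."""
    warnings.warn(
        "legacy_open is deprecated; use read, create_array, or create_group",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    if mode != "r":
        # Only the read path is sketched here.
        raise NotImplementedError(f"mode {mode!r} is not covered by this sketch")
    return read(path)
```

Existing callers keep working, but they see a warning that names the replacement functions.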

Contributor Author

d-v-b commented Nov 22, 2024

From the community meeting:

  • create_array should alias create
  • we should use these new API functions to compactly express sharding, e.g. create_array(shards=..., chunks=...)
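A minimal sketch of what the two points above could look like together, using a toy function that only validates and records the layout. The signature, the returned dict, and the rule that each shard dimension must be a whole multiple of the corresponding chunk dimension are assumptions for illustration; the actual create_array signature was still under discussion in this thread.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def create_array(shape: Shape, chunks: Shape, shards: Optional[Shape] = None) -> Dict[str, object]:
    """Toy create_array: validates chunk and shard shapes, returns the layout."""
    if len(chunks) != len(shape):
        raise ValueError("chunks must have one entry per array dimension")
    layout: Dict[str, object] = {"shape": shape, "chunks": chunks, "shards": shards}
    if shards is None:
        return layout
    if len(shards) != len(shape):
        raise ValueError("shards must have one entry per array dimension")
    # Assumption: a shard holds a whole number of chunks along every axis.
    for shard_dim, chunk_dim in zip(shards, chunks):
        if shard_dim % chunk_dim != 0:
            raise ValueError("each shard dimension must be a multiple of the chunk dimension")
    layout["chunks_per_shard"] = tuple(s // c for s, c in zip(shards, chunks))
    return layout


# Keep the old name working by making it an alias of the new one.
create = create_array
```

For example, `create_array((100, 100), chunks=(10, 10), shards=(50, 50))` records 5 chunks per shard along each axis in this sketch, and `create` refers to the same function object.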
