Data access: add and clarify the documentation #27

Open
troyraen opened this issue Apr 26, 2024 · 2 comments

@troyraen
Contributor

Specific suggestions and questions to think about:

  1. State clearly that all (public) NASA data is accessible from Fornax, regardless of where the data lives (an archive's in-house storage or cloud storage).
  2. Explain when and why users should care about where the datasets live.
  3. Use case 1: "I really care about X dataset and already know how to access it."
    • More than likely, they're accessing in-house data. What basic instructions can we provide to help them determine when and why they should go to the effort of looking for it somewhere else (i.e., in cloud storage)?
  4. Use case 2: "I really care about Y targets (stars, galaxies, ...) and am doing a mass search for data in NASA archives."

Some background:

Currently ~all of the data being put in cloud storage by NASA archives is a copy of what they're already serving from their in-house storage. If the user wants to access the cloud copy, they'll usually have to make an explicit choice to do this. But they've probably never had to make this kind of choice before and the current documentation is not very clear about what is available from where. Users may assume that if the data is available in cloud storage, that's what they'll automatically be accessing without having to do anything different or proactive.

This confusion is compounded when we point users to the NASA-NAVO Workshops Notebooks. They are a very useful overview of NASA data, but AFAIK they don't contain any information about accessing data from cloud storage. Since the Fornax documentation emphasizes cloud-hosted data and the NAVO documentation doesn't even mention it, users may assume that by following the NAVO tutorials they are already accessing cloud-hosted data.

@jkrick
Collaborator

jkrick commented Apr 29, 2024

> Users may assume that if the data is available in cloud storage, that's what they'll automatically be accessing without having to do anything different or proactive.

Can we make it so that all users in Fornax automatically access cloud data if it exists? Maybe with something like an environment variable? I think that would let us keep much of this information away from the user.

@troyraen
Contributor Author

troyraen commented Apr 30, 2024

That would be great, though I don't see a solution right now. The ways I know of to tell people to load data all require knowing the path (and thus the location), like pd.read_parquet('path-to-catalog') or astropy.io.fits.open('path-to-image'). Also, accessing cloud storage sometimes requires different arguments to handle the different filesystem and/or the permissions/credentials.
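
To make the "different arguments" point concrete, here is a minimal sketch. The helper name `cloud_read_kwargs` is hypothetical, and it assumes a public S3 bucket that allows anonymous access. The only real library features it relies on are the `storage_options` argument of `pd.read_parquet` and the `use_fsspec` / `fsspec_kwargs` arguments of `astropy.io.fits.open`.

```python
from urllib.parse import urlparse


# Hypothetical helper: not part of pandas, astropy, or Fornax.
def cloud_read_kwargs(path, reader):
    """Return the extra keyword arguments needed to read `path` with `reader`.

    `reader` is "pandas" or "astropy". Assumes a public S3 bucket that
    allows anonymous access; private buckets need real credentials instead.
    """
    if urlparse(path).scheme != "s3":
        # Local or in-house path: the plain call works unchanged.
        return {}
    if reader == "pandas":
        # pandas forwards storage_options to fsspec/s3fs.
        return {"storage_options": {"anon": True}}
    if reader == "astropy":
        # astropy.io.fits.open forwards fsspec_kwargs to fsspec.
        return {"use_fsspec": True, "fsspec_kwargs": {"anon": True}}
    raise ValueError(f"unknown reader: {reader!r}")


# Usage (placeholder paths, as in the comment above):
#   pd.read_parquet(path, **cloud_read_kwargs(path, "pandas"))
#   astropy.io.fits.open(path, **cloud_read_kwargs(path, "astropy"))
```

Even in this small case the user still has to know which path to pass in, which is the underlying problem.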

I don't think any one solution will ever work for all use cases because there are so many different ways to access data. But potentially in a more narrow context... I know there is ongoing cloud-access related work happening in the astropy + VO universe. Maybe some option for this is envisioned? I'm not up on the details enough to know.
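
For a narrow context, the environment-variable idea might look something like the sketch below. Everything in it is an assumption made for illustration: the variable name `PREFER_CLOUD_DATA`, the lookup table, and the example locations are all invented, and nothing like this exists in Fornax today as far as I know.

```python
import os

# Hypothetical lookup table from in-house locations to their cloud copies.
# The entries are invented placeholders, not real archive paths.
CLOUD_COPIES = {
    "https://archive.example/data/catalog.parquet": "s3://example-bucket/catalog.parquet",
}


def resolve_location(in_house_path, env=os.environ):
    """Return the cloud copy of `in_house_path` when cloud access is preferred
    and a copy is known; otherwise return the path unchanged."""
    # PREFER_CLOUD_DATA is a made-up variable name for this sketch.
    prefer_cloud = env.get("PREFER_CLOUD_DATA", "0") == "1"
    if prefer_cloud:
        return CLOUD_COPIES.get(in_house_path, in_house_path)
    return in_house_path
```

This only covers datasets someone has listed in the table, and the caller still has to pass the right reader arguments for the returned location, so it would not remove the need for documentation.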
