diff --git a/README.md b/README.md index ca1a00d..99491f1 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Fornax Science Console -##What is the Fornax Science Console? +## What is the Fornax Science Console? The Fornax Science Console is a NASA-funded web-based application that provides access to a limited amount of cloud computing via JupyterLab, which offers access to Jupyter Notebooks, Jupyter Console, and the terminal (command line). Users will need to register to login to the system, but usage is free. Once logged in, users will have access to data sets curated by repositories around the world, and can upload moderate amounts of private data. To get started quickly, users can choose from a variety of example Jupyter notebooks as well as pre-installed software environments. These can be modified to suit user needs. @@ -27,134 +27,182 @@ Users of the Fornax Science Console will have access to data curated and publish Under construction: How can users get a list of pre-installed software without logging into the Fornax Science Console? ## Getting started -### How to get an account? - * The platform is currently available by invitation only. -### How to Log in? - * Log in at https://daskhub.fornaxdev.mysmce.com/ - ### How to choose an instance? - There are several options for the size of the compute. Please select the smallest that you can use for testing and exploration. Do not use the larger images unless you have already tested a smaller subset of the analysis on a smaller compute instance. -### How to end a session? - * Before loggin out, please shut down your server. This is an important step which insures the server you are using doesn't keep running in the background, thereby wasting resources. - * Go to `File` Menu and click on `hub control panel` as in the below image, which will bring up the option to `stop my server`(in red). After stopping the server, please `logout` in the upper right of the jupyterhub window. 
+ +### How can I get an account on the Fornax Science Console? + +The platform is currently available by invitation only. + +### How do I log in to the Fornax Science Console? + +Once you have your login credentials, enter them at: + +https://daskhub.fornaxdev.mysmce.com/ + +You will be offered several options for the size of the compute. Please use `mini` or `standard` size for writing, debugging, or testing code before switching to larger sizes for full runs of code at scale. `On demand` means an AWS server that starts when you ask for it and runs as long as you continue to use and pay for it. This is in contrast to the `spot` servers at AWS, which are used for short runs and are cheaper, but can be revoked at any time (per AWS needs). Some of the options are marked as "Use with approval". Please contact the person who invited you to the platform to obtain permission to use these instances. + +### How do I end a JupyterHub session? + * Before logging out, please shut down your server. This is an important step which ensures the server you are using doesn't keep running in the background, thereby wasting resources. + * Go to the `File` Menu and click on `hub control panel` as shown in the image below, which will bring up the option to `stop my server` (in red). After stopping the server, please `logout` in the upper right of the JupyterHub window. ![ ](./static/images/hub_control_panel.png) -### How to choose which size server to open upon login? - * Make sure to use `mini` or `standard` size for writing/debugging/testing before switching to larger sizes for full runs of code at scale - * `On demand` means an AWS server that starts when the user asks for it, and runs as long as you continue to use and pay for it. This is in contrast to the `spot` servers at AWS which are used for short runs and are cheaper, but can be revoked at any time (per AWS needs) - * 128 core: do not use unless given permission + ### What is a kernel and how to choose one?
* In Jupyter, kernels are the background processes that execute cells and return results for display. * To select the kernel on which you want to run your Notebook, go to the Kernel menu and choose Change Kernel. You can also click directly on the name of the active kernel to switch to another. + ### How will my analysis be limited by Memory? *If your workload exceeds your server size, your server may be allowed to use additional resources temporarily. This can be convenient but should not be relied on. In particular, be aware that your job may be killed automatically and without warning if its RAM needs exceed the alloted memory. This behavior is not specific to Fornax or AWS, but users may encounter it more often on the science console due to the flexible machine sizing options. (Your laptop needs to have the max amount of memory that you will ever use while working on it. On the science console, you can choose a different server size every time you start it up -- this is much more efficient, but also requires you to be more aware of how much CPU and RAM your tasks need.) -## Navigating jupyter lab -### How to start a new notebook? +## Navigating JupyterLab + +### How do I start a new notebook? * The blue `+` in the upper left brings you to the launcher where you can start a new, empty notebook or open a terminal window ![new launcher](./static/images/new_launcher.png) -### How to get a terminal window? + +### How do I open a terminal window? * The blue `+` in the upper left brings you to the launcher where you can start a new notebook or open a terminal window ![terminal](./static/images/terminal.png) -### How to upload data into Fornax? + +### How do I upload data into Fornax? * The `uparrow` in the upper left allows you to upload data. If it is a large amount of data, consider creating a zip or tar archive first. 
![upload_button](./static/images/upload_button.png) * From within Jupyter Lab, you can also use a terminal to transfer data with the usual methods (`scp`, `wget`, `curl` should all work). -### What are our storage limits for uploaded data? + +### What is the storage limit for uploaded data? * Current default is 10GB (Feb 2024) -### How to download data from the plaltform to my local machine? + +### How do I download data from the Fornax Science Console to my local machine? * If it is a large amount of data, consider creating a zip or tar archive first. If it is a small file, you can right click on the file name in the file browser and scroll to `Download` ![right_click_download](./static/images/right_click_download.png) + ### Home directory * When you log into the science console for the first time, the active directory is your `$HOME` directory. It contains preexisting folders like `efs/` and `s3/` with shared data. You may also create your own directories and files here. Your edits outside of the shared folders are not visible to other users. + ### Does work persist between sessions? * Files in your home directory will persist between sessions. * pip installs will persist across kernel restarts, but not across logging out and back in. * If you want software installs to be persistent, consider setting up an environment: See below under "Making a conda environment that persists across sessions" -### What is the info at the bottom of the jupyterlab window + +### What is the information at the bottom of the JupyterLab window? * The github branch is listed as well as the name of the kernel in use - * the kernel is listed as either 'idle' or 'busy' which is useful to know if your kernel is working or has crashed. -### How to share data from inside Fornax with (international) collaborators?
- * Download them to favorite storage place (university Box account) or put in AWS cloud (put $$ in your proposals to cover this) + * The kernel is listed as either 'idle' or 'busy', which is useful for knowing whether your kernel is working or has crashed. + +### How do I share data from inside Fornax with collaborators? + +Download the data to your preferred storage location (e.g., a university Box account) or put it in the AWS cloud. + ### Is there a way to go directly from Fornax to a University's Box account? - * Any publicly accessible web service can be reached from Fornax through the HTTPS protocol, e.g., APIs, wget, etc. + * Any publicly accessible web service can be reached from Fornax through the HTTPS protocol, e.g., APIs, wget, etc. + ### Is there a way to go directly from Fornax into a different AWS bucket that a project may pay for? - * Any publicly available bucket is visible from Fornax as it would be on your laptop. If you require an access key to see into the bucket from your laptop, you will also need that on Fornax. + * Any publicly available bucket is visible from Fornax as it would be on your laptop. If you require an access key to see into the bucket from your laptop, you will also need that on Fornax. + ### How to know what computing resources are available on Fornax? * in jupyter hub - open a terminal window by going to the file folder in the upper left, clicking on the plus sign * `nproc` will give you the number of processors * `cat /proc/cpuinfo` will give you more detailed info on the processors * `free -h` will give the amount of RAM available/used * `cat /proc/meminfo` will give more detailed info on the amount of RAM available/used -### How to save my notebook as a python script? + +### How can I save my notebook as a Python script? * from the command line: `jupyter nbconvert --to script notebookname.ipynb` + ### Save your work!
- * the Fornax Science Console will cull servers after a user is inactive for a certain amount of time - + * the Fornax Science Console will cull servers after a user is inactive for a certain amount of time. + ### How long will the server stay active if not in use? -### How to run a notebook non-interactively? - * We are working on providing a job queue. -### How to open a plot (png, pdf, etc.) either generated by a notebook or uploaded ? + +Under Construction. + +### How can I run a notebook non-interactively? + +Under Construction. + +### How can I open a plot (e.g. png, pdf) that I generated in a notebook or uploaded? * double clicking on them in the file browser will open them in a new tab + ### Will notebooks that run on Fornax also work on my laptop? * In general, yes, but you need to have a python environment setup in the same way as on it is on Fornax. * see below under "Can I run the container from Fornax on my own personal computer/laptop?" + ### Who covers costs when working in Fornax? * NASA will pay for the work that you do, but please be mindful of those costs. + ### How to know what costs are being incurred? - * We are working on a cost dashboard. + * We are working on a cost dashboard. + ### Is it possible to do code development in emacs or vi or some other IDE? * Emacs or vi is possible from the terminal * The JupyterLab interface also has its own editor. - * If you prefer to develop elsewhere, you can push your changes to a publicly available repo (e.g., GitHub) and synchronize that to a location on your home directory on Fornax. -### Is there a limit to the number of packages a user can install? + * If you prefer to develop elsewhere, you can push your changes to a publicly available repo (e.g., GitHub) and synchronize that to a location on your home directory on Fornax. + +### Is there a limit to the number of packages I can install? 
* There is a limit on the space a user has access to, but not the number of packages, and packages are usually small. ## Data Access + ### How to add my own data? * see [above](#How-to-upload-data-into-fornax?) + ### Where should data be stored on Fornax? * See "Home Directory" [above](#Home-directory) -### How to access images in the cloud? + +### How can I access cloud-hosted images? * [Tutorial](https://github.com/spacetelescope/tike_content/blob/main/content/notebooks/data-access/data-access.ipynb) notebook on STScI data - * Where is Abdu's similar notebook with pyvo tools that was used for the July2023 HQ demo? - * placeholder for Brigitta's SIA access notebook -### How to access catalogs in the cloud? + * Under Construction: Where is Abdu's similar notebook with pyvo tools that was used for the July2023 HQ demo? + * Under Construction: placeholder for Brigitta's SIA access notebook + +### How can I access cloud-hosted catalogs? * [Tutorials on IRSA](https://irsa.ipac.caltech.edu/docs/notebooks/) ## Managing Software + ### Making a conda environment that persists across sessions * If the pre-installed environments don't have the software you need, you can create your own persistent environment available across multiple sessions. * follow [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html) * specifically [managing environments](https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-envs) + ### How to get a list of what software is already pre-installed on the Fornax Science Console? - * Software is installed in miniconda environments. You can use "[conda list](https://conda.io/projects/conda/en/latest/commands/list.html)" to list the contents of each. + * Software is installed in miniconda environments. You can use "[conda list](https://conda.io/projects/conda/en/latest/commands/list.html)" to list the contents of each. + ### How to install my own software? 
* Persistent User-Installed Software * See above("Making a conda environment that persists across sessions") * Non-persistent User-Installed Software * you can !pip install your favorite software from inside a notebook. This installed software will stay through kernel restarts, but will not be persistent if you stop your server and restart it (logging out and back in) unless you specify the - - user option, which will put the software in your home directory. Note that an install done in one compute environment may or may not work in a container opened using another environment, even if the directory is still there. Conda environments are useful to manage these. * For the tutorial notebooks we tend to have a requirements.txt file in the repo which lists all the software dependencies. Then the first line in the notebook is `!pip install -r requirements.txt` That way other people can run the notebook and will know which software is involved. + ### What is the terminal command to list package version info using pip? * `pip show packagname` + ### Is it possible to launch apps from icons? Like MOPEX or SPICE * These apps are unavailable in Fornax + ### Is it possible to run licensed software (IDL) in Fornax? * licensed software is not possible in Fornax + ### Is it possible to bring my own docker image? - * This is not currently possible. + * This is not currently possible. + ### Is it possible to run the container from Fornax on my own personal computer/laptop? - * Yes. The images are all on the AWS Elastic Container Registry. **Need a link and more instructions** + * Yes. The images are all on the AWS Elastic Container Registry. 
+ * + * Under Construction: Need a link and more instructions ## [Examples and Tutorials](https://fornax-navo.github.io/fornax-demo-notebooks/) + ### Fully worked science use cases * [Forced photometry](https://github.com/fornax-navo/fornax-demo-notebooks/tree/main/forced_photometry/) * [Light curves](https://github.com/fornax-navo/fornax-demo-notebooks/tree/main/light_curves/) * [ML dimensionality reduction](https://github.com/fornax-navo/fornax-demo-notebooks/blob/main/light_curves/ML_AGNzoo.md) + * ### Cloud * [STScI](https://github.com/spacetelescope/tike_content/blob/main/content/notebooks/data-access/data-access.ipynb) * [IRSA Cloud Access Introduction](https://irsa.ipac.caltech.edu/docs/notebooks/cloud-access-intro.html) * [Parquet info from IRSA](https://irsa.ipac.caltech.edu/docs/notebooks/wise-allwise-catalog-demo.html) * [Image cutouts](https://docs.astropy.org/en/stable/io/fits/usage/cloud.html#using-cutout2d-with-cloud-hosted-fits-files) + * ### Optimizing code for CPU usage (CPU profiling) * profiliing within Fornax is possible, however vizualizing the profile is not yet possible * profiling needs to be done on a .py script, and not a jupyter notebook @@ -163,21 +211,28 @@ Under construction: How can users get a list of pre-installed software without l * On your local computer command line: `python -m snakeviz output_profile_name.prof` * documentation for snakeviz: https://jiffyclub.github.io/snakeviz/ * This really only looks at CPU usage + ### Optimizing code for memory usage [(memory profiling)](https://towardsdatascience.com/profile-memory-consumption-of-python-functions-in-a-single-line-of-code-6403101db419) * inside the notebook: * `pip install -U memory_profiler` * `from memory_profiler import profile` * above the function you want to check add this line: @profile * run the script: python -m memory_profiler .py > mem_prof.txt + ### Optimizing code for multiple CPUs with parallelization * Python built in 
[multiprocessing](https://irsa.ipac.caltech.edu/docs/notebooks/Parallelize_Convolution.html) * [Dask gateway](https://gateway.dask.org) - * How to [scale up](Troy's new notebook) a notebook to big data + * How to [scale up] a notebook to big data + ### [MAST science examples](https://github.com/spacetelescope/tike_content/blob/main/markdown/science-examples.md) + ### HEASARC [sciserver_cookbooks](https://github.com/HEASARC/sciserver_cookbooks/blob/main/Introduction.md) + ### [Cross matching two large catalogs](https://github.com/IPAC-SW/ipac-sp-notebooks/blob/main/gaia_cross_SEIP/gaia_cross_SEIP.ipynb) + ### [Work with theoretical catalogs](https://irsa.ipac.caltech.edu/data/theory/Cosmosims/gator_docs/CosmoDC2_Mock_V1_Catalog.html) -### How should users contribute to tutorials? + +### How can I contribute to existing open-source Fornax notebook tutorials? * open issue or PR on Fornax Github [repo](https://github.com/fornax-navo/fornax-demo-notebooks) ## Troubleshooting @@ -193,10 +248,11 @@ Under construction: How can users get a list of pre-installed software without l ## Parallel and Distributed Processing Since one of the main drivers for using Fornax is the advantage of multiple CPUs, we provide here additional information on how to efficiently use those CPUs. + ### Terminology -* CPU - * Processing chip. Quality and number of CPU determines compute power -- the rate at which computations can be performed. +* CPU: + * Central Processing Unit. The quality and number of CPUs determine compute power -- the rate at which computations can be performed. * Node * A single machine within a network of machines used for distributed processing. * Parallel or Distributed Processing @@ -206,7 +262,7 @@ Since one of the main drivers for using Fornax is the advantage of multiple CPUs * Worker * An entity that completes a chunk of work (data + instructions). It runs in the background and must be managed using (e.g.,) python ``multiprocessing`` or Dask.
-### When to use distributed or parallel processing? +### When should I use distributed or parallel processing? 1. Your dataset is very large, but could be split into subsets that can be processed individually. * The [forced photometry notebook](https://github.com/fornax-navo/fornax-demo-notebooks/blob/main/forced_photometry/multiband_photometry.md) is an example of this. It gathers a large number of images and then processes them all using the same piece of code (photometry extraction). The pipeline is parallelized by running workers that execute the same code on different images.