Galaxy Tool Report XML API #19039

Open
qchiujunhao opened this issue Oct 21, 2024 · 2 comments

@qchiujunhao

qchiujunhao commented Oct 21, 2024

Objective

Develop a new feature for the Galaxy platform that enables tool developers to create a standard HTML report by specifying a folder containing CSV and image files. This feature will extend the Galaxy tool development XML API, allowing tool developers to generate visually appealing reports automatically.

Feature Description

This feature will introduce a new attribute in the tool definition XML file that enables developers to specify a folder containing the output files. The API will then parse the files from the provided folder path and generate a report that includes tables (from CSV files) and images (e.g., plots). The goal is to offer an easy, flexible way for developers to produce comprehensive tool reports without writing additional custom reporting code.

Tool XML API Addition

A new <report> tag, nested under <output>, will be added to the XML API for tool development. Tool developers will specify the folder that holds the report's source files as follows:

<output>
    <report name="tool_report" format="html" label="Custom Tool Report" input_folder="/path/to/folder_containing_files" ... />
</output>

name: Name of the report (e.g., “tool_report”).
format: Specifies the report format (currently supporting only HTML).
label: Custom label for the report.
input_folder: Path to the folder containing CSV files and images to be included in the report.

Report Generation Workflow

Tool Developer Configuration: The developer defines the output report by adding the <report> tag to the tool XML file, specifying the folder containing the output files.

API File Processing:
  • The API accesses the folder path given in the tag's input_folder attribute.
  • It searches that folder for CSV files and image files.

HTML Report Construction:
  • CSV files are rendered as data tables in the report.
  • Image files are embedded in the report to provide visual insight into the tool’s output.

Output Rendering: The HTML report is generated and saved according to the developer’s specifications in the XML file.

Possible Implementation

Code Placement in Galaxy Code Base:

  • The XML parsing and configuration handling code belongs in the part of the Galaxy codebase responsible for XML parsing and tool definitions, specifically the tool_util/parser module.
  • The file handling, data extraction, and HTML report generation logic should go in a new module under lib/galaxy/tool_util/report, which keeps the report generation code modular and maintainable.
  • (Needs more research.) The integration code should hook into the Galaxy job execution system so that report generation is triggered after job execution completes.
  1. XML Parsing and Configuration Handling:

    • Extend the existing Galaxy XML parser to include support for the new <report> tag under <output>.
    • Extract attributes such as name, format, label, and input_folder from the XML configuration.
    
    def parse_tool_xml(xml_path):
        ...
        return report_config
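
As a rough illustration of the extraction step, here is a minimal standalone sketch using only the standard library. It does not use Galaxy's actual parser classes, and the function name parse_report_config and the sample tool XML are assumptions made for the example; the attribute names come from the proposal above.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the proposed <report> tag.
REPORT_ATTRIBUTES = ("name", "format", "label", "input_folder")


def parse_report_config(tool_xml):
    """Return the attributes of the first <report> element under <output>."""
    root = ET.fromstring(tool_xml)
    report = root.find(".//output/report")
    if report is None:
        raise ValueError("tool XML has no <output><report> element")
    config = {key: report.get(key) for key in REPORT_ATTRIBUTES}
    missing = [key for key, value in config.items() if value is None]
    if missing:
        raise ValueError(f"<report> is missing attributes: {', '.join(missing)}")
    return config


# Hypothetical tool definition, for illustration only.
example_xml = """
<tool id="demo" name="Demo">
    <output>
        <report name="tool_report" format="html" label="Custom Tool Report"
                input_folder="/path/to/folder_containing_files" />
    </output>
</tool>
"""
config = parse_report_config(example_xml)
print(config["label"])
```

Rejecting a <report> tag with missing attributes up front is one possible choice; falling back to defaults would be equally plausible.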
    
  2. File Handling and Data Extraction:
    • Collect the CSV and image files from the specified folder.
    • Convert images to base64 format so that they can be embedded directly into the HTML report and be viewable anywhere.

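import base64
from pathlib import Path

import pandas as pd


def get_files_from_folder(folder_path):
    """Return the CSV files and PNG/JPG images found directly in folder_path."""
    folder = Path(folder_path)
    # Sort so the order of tables and images in the report is deterministic.
    csv_files = sorted(folder.glob("*.csv"))
    image_files = sorted(folder.glob("*.png")) + sorted(folder.glob("*.jpg"))
    return csv_files, image_files


def convert_image_to_base64(image_path):
    """Read an image and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def process_csv_to_html(csv_path):
    """Render a CSV file as an HTML table headed by the file's stem."""
    df = pd.read_csv(csv_path)
    table_name = Path(csv_path).stem
    return f"<h2>{table_name}</h2>" + df.to_html(index=False)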
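
Since the template marks each table as safe, the table HTML has to be escaped before it reaches the template. A dependency-free variant of the CSV step could look like the sketch below; the function name csv_to_html_table is an assumption for illustration, and it treats the first row as the header.

```python
import csv
import html
from pathlib import Path


def csv_to_html_table(csv_path):
    """Render a CSV file as an HTML table, escaping the heading and every cell."""
    path = Path(csv_path)
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    heading = f"<h2>{html.escape(path.stem)}</h2>"
    if not rows:
        return heading + "<p>(empty file)</p>"
    header, body = rows[0], rows[1:]
    head_html = "".join(f"<th>{html.escape(cell)}</th>" for cell in header)
    body_html = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in body
    )
    return (
        heading
        + f"<table><thead><tr>{head_html}</tr></thead>"
        + f"<tbody>{body_html}</tbody></table>"
    )
```

This trades the formatting options of pandas for explicit control over what ends up in the markup.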
  3. HTML Report Generation:
    • Use Python's jinja2 templating engine to generate the HTML report.
    • Create a reusable HTML template that can dynamically include tables and images.
import base64
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

# Relies on process_csv_to_html and convert_image_to_base64 from the previous step.


def generate_html_report(report_config, csv_files, image_files):
    # Load the template from the local "templates" directory.
    env = Environment(loader=FileSystemLoader("templates"))
    template = env.get_template("report_template.html")

    tables = [process_csv_to_html(csv) for csv in csv_files]
    images = [
        {"data": convert_image_to_base64(image), "name": Path(image).stem}
        for image in image_files
    ]

    report_html = template.render(
        label=report_config["label"],
        tables=tables,
        images=images,
    )
    # Write the report to the working directory, named after the report.
    with open(f"{report_config['name']}.html", "w", encoding="utf-8") as f:
        f.write(report_html)

  • Example HTML template (templates/report_template.html):
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>{{ label }}</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css">
    <style>
        ...
    </style>
</head>
<body>
    <h1>{{ label }}</h1>
    {% for table in tables %}
        <div class="table-container">
            {{ table | safe }}
        </div>
    {% endfor %}
    {% for image in images %}
        <div class="image-container">
            <h2>{{ image.name }}</h2>
            <img src="data:image/png;base64,{{ image.data }}" alt="{{ image.name }}">
        </div>
    {% endfor %}
</body>
</html>
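
One detail worth noting: the template labels every image as image/png, while the file discovery step also collects .jpg files. A possible way around this is to build the whole data URI in Python with a MIME type guessed from the file name. The sketch below uses only the standard library; the function name image_to_data_uri is an assumption for illustration.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(image_path):
    """Return a data URI whose MIME type matches the image's file extension."""
    path = Path(image_path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

With this approach the template would use the returned string directly as the src attribute of the img tag instead of prefixing a fixed MIME type.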
  4. Integration with Galaxy:
    • The generated HTML report should be saved in a location accessible to the Galaxy instance.
    • Trigger the report generation function after the tool's execution completes.

Considerations

The implementation will leverage existing Galaxy workflow report code where possible to streamline development and avoid reinventing features that are already supported.

  • Templating System
  • ...
  • ...

However, the overall functionality required for this feature differs significantly from the existing workflow report code, which is tightly integrated with the Galaxy workflow system and designed for entire workflows rather than individual tool outputs. Therefore, a separate implementation is recommended to meet the requirements of this new feature.

Next Steps:

  • Define a mechanism for handling errors (e.g., missing folder, unsupported file types).
  • Test the feature in a variety of tool development scenarios to ensure robustness.
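
To make the error-handling item more concrete, here is one possible sketch, assuming a missing folder or a folder without usable files should fail loudly while unsupported files are skipped and reported back. The exception class ReportInputError and the function collect_report_inputs are hypothetical names; the supported extensions mirror the globs in the file-handling code above.

```python
from pathlib import Path

# Extensions mirror the globs used in the file-handling step.
SUPPORTED_SUFFIXES = {".csv", ".png", ".jpg"}


class ReportInputError(Exception):
    """Raised when the report folder cannot be used to build a report."""


def collect_report_inputs(folder_path):
    """Return (supported, skipped) file lists for a report folder."""
    folder = Path(folder_path)
    if not folder.is_dir():
        raise ReportInputError(f"report folder does not exist: {folder}")
    supported, skipped = [], []
    for entry in sorted(folder.iterdir()):
        if not entry.is_file():
            continue  # ignore subdirectories
        if entry.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(entry)
        else:
            skipped.append(entry)
    if not supported:
        raise ReportInputError(f"no CSV or image files found in {folder}")
    return supported, skipped
```

Returning the skipped files lets the caller decide whether to log them or surface them in the report.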
@jmchilton
Member

I really think we don't want to do HTML for this. We have an ecosystem around Galaxy Flavored Markdown (slides: https://docs.google.com/presentation/d/1hftxpWrzKrNaPZV0VIn6bdKPPrAApWpp02VAWL-fig8/edit#slide=id.g63815beb57_0_311 ). I made these slides years ago, but on the last slide I started thinking about this application and the tool-side implications of the Markdown we allow for workflows and pages.

Markdown has several features that I think would make it better for this application. One is security: we are confident about rendering the Markdown safely, so I think we can bypass all the security implications that this idea would entail. Beyond that, the feature I really want out of this is to be able to construct bits and pieces of "reporting" content in tools and then paste them together in a workflow report. The syntax for the Markdown references IDs in the page format but references workflow inputs and outputs in the workflow version. We could reuse the same parsing and ideas that we use for referencing workflow inputs and outputs for tool inputs and outputs; the symmetry would be really nice.

@bgruening
Member

@qchiujunhao this tool might actually help you for your use-case: galaxyproject/tools-iuc#6460

3 participants