Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utils/html: Provide full control of allowed HTML elements via the configuration file #1007

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pkvach
Copy link
Contributor

@pkvach pkvach commented Apr 16, 2024

Checklist

  • All new and existing tests are passing
  • (If docs changes needed:) I have updated the documentation accordingly.
  • I have added an entry to CHANGES.rst because this is a user-facing change or an important bugfix
  • I have written proper commit message(s)

What changes does this Pull Request introduce?

These changes provide full control over the management of allowed-elements and allowed-attributes through the configuration file.

  • Removed the hard-coded list of allowed HTML elements and attributes in the sanitizer, making it fully configurable through the configuration file.
  • Updated the default list of allowed elements and attributes in isso-dev.cfg, server-config.rst, and isso.cfg.
  • Adjusted tests in test_html.py to reflect the new configurable approach for allowed elements and attributes.

Why is this necessary?

Fixes #751

@pkvach pkvach changed the title utils/html: Remove the hard-coded list of allowed elements and attributes utils/html: Provide full control over the management of allowed-elements and allowed-attributes through the configuration file Apr 16, 2024
@pkvach pkvach force-pushed the fix/remove-hardcoded-allowed-elements-attributes branch from b2136e1 to fd92ccb Compare April 16, 2024 07:35
@pkvach pkvach changed the title utils/html: Provide full control over the management of allowed-elements and allowed-attributes through the configuration file utils/html: Provide full control of allowed-elements and allowed-attributes via the configuration file Apr 16, 2024
@pkvach pkvach force-pushed the fix/remove-hardcoded-allowed-elements-attributes branch from fd92ccb to 5cd17a7 Compare April 23, 2024 21:47
Copy link
Member

@ix5 ix5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for tackling this. I will write my more detailed thoughts on this down later.

For now:

  1. Keep in mind that this is a massive footgun for people who used this feature as intended and now all their styling is gone.
  2. The whole markdown rendering configuration is extremely un-intuitive and error prone. We essentially only pass through tunables to misaka and bleach instead of providing the admin with sane options.
  3. People expecting GitHub-flavored Markdown (GHFM) to just work are massively let down.

contrib/isso-dev.cfg Outdated Show resolved Hide resolved
docs/docs/reference/server-config.rst Outdated Show resolved Hide resolved
isso/utils/html.py Outdated Show resolved Hide resolved
@pkvach pkvach force-pushed the fix/remove-hardcoded-allowed-elements-attributes branch from 5cd17a7 to 1cd4a0c Compare April 24, 2024 18:59
@pkvach
Copy link
Contributor Author

pkvach commented Apr 25, 2024

Thank you for tackling this. I will write my more detailed thoughts on this down later.

For now:

  1. Keep in mind that this is a massive footgun for people who used this feature as intended and now all their styling is gone.
  2. The whole markdown rendering configuration is extremely un-intuitive and error prone. We essentially only pass through tunables to misaka and bleach instead of providing the admin with sane options.
  3. People expecting GitHub-flavored Markdown (GHFM) to just work are massively let down.

Thanks for the thoughtful feedback! I appreciate you taking the time to consider the impact on existing users.

Backwards compatibility is a valid concern. If you have any ideas on how we might mitigate disruption or implement the changes more seamlessly, I'd be happy to hear them.

@ix5
Copy link
Member

ix5 commented Apr 29, 2024

Backwards compatibility is a valid concern. If you have any ideas on how we might mitigate disruption or implement the changes more seamlessly, I'd be happy to hear them.

My biggest concern with this particular PR would be users who have copied a sample config file a few years ago that contained allowed-elements = and now nothing renders at all.

I have a few ideas, none of them perfect:

  1. Use a different parameter name (e.g. allowed-html-elements) for the "new" behavior and use it only if the new config key is non-empty
  2. Compare user-supplied allowed-elements against the default list and warn (in the the server logs) if the user-supplied list is missing default items, or if it is significantly shorter (e.g. it only contains 3 items, then the user most likely wanted the old behavior)

The config parser silently merging the default config with a user-supplied one also doesn't make things easier here.


As for general thoughts on our markdown rendering: Do we really want to make incremental improvements to this pile of hacks or would energy better be spent overhauling the thing and making it radically more user-friendly?

I'm happy to review and merge PRs for incremental fixes, but it feels more like treading water than tackling the real issue. Note also that both misaka as well as bleach (see here, thanks to #936 (comment)) are deprecated and might not survive the jump to Python 3.13 or whatever comes next.

@pkvach
Copy link
Contributor Author

pkvach commented May 3, 2024

Thank you for your thoughts on this.

  1. Use a different parameter name (e.g. allowed-html-elements) for the "new" behavior and use it only if the new config key is non-empty

I've been thinking about that option as well, and I'm leaning more towards it. I will try to move in that direction.

@pkvach pkvach marked this pull request as draft May 3, 2024 05:18
@pkvach pkvach marked this pull request as ready for review May 6, 2024 12:57
@pkvach pkvach changed the title utils/html: Provide full control of allowed-elements and allowed-attributes via the configuration file utils/html: Provide full control of allowed HTML elements via the configuration file May 6, 2024
@pkvach
Copy link
Contributor Author

pkvach commented May 6, 2024

Made changes to PR with alternative option: added new configuration option strictly-allowed-html-elements to specify only allowed HTML tags in generated output.

@ix5
Copy link
Member

ix5 commented May 22, 2024

Hey @pkvach, your solution is well-written code-wise, but I still would like to receive some (more) input from actual users regarding their experiences and expectations with markdown/HTML allowlist handling.

How do other commenting systems (Commento, Remark42, ...) handle element/attribute whitelisting, if at all?

@pkvach
Copy link
Contributor Author

pkvach commented May 24, 2024

Thanks for the feedback, @ix5 .
I agree that we can take our time with these changes, as there is no rush here.
As for other commenting systems, I can do some research to see how they handle whitelisting.

- Added new configuration option "strictly-allowed-html-elements" to specify only allowed HTML tags in the generated output.
- Allowed "mark" and "u" elements for "highlight" and "underline" Markup extensions.
- Updated "allowed-elements" in configuration files to include "tr".

Fixes isso-comments#751
@pkvach pkvach force-pushed the fix/remove-hardcoded-allowed-elements-attributes branch from 3cab0c4 to 863c4b7 Compare May 26, 2024 09:49
@pkvach
Copy link
Contributor Author

pkvach commented May 28, 2024

@ix5
Copy link
Member

ix5 commented Jun 30, 2024

Thank you for your extensive research! At the moment, I'm too much occupied with IRL things to delve deeper into this and make a decision. Maybe one of @jelmer, @BBaoVanC, @stefangehn @fluffy-critter would care to comment and try to form some kind of consensus.

Copy link
Contributor

@stefangehn stefangehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change in general but keeping both configuration variants around looks confusing to me. I would rather suggest to deprecate the old setting and replace it with the new one.

what about this logic:

  • introduce allowed-html-elements key and have a default value for it based on the hardcoded list in the middle of the code
  • add the list of allowed-elements on top of what the
    allowed-html-elements already provided
  • warn when any items from allowed-elements were added, pointing users to migrate their config to allowed-html-elements in the future
  • drop the allowed-elements key from the default/example configuration file so no new users start using it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[config] "allowed-elements", "allowed-markup" should replace, not amend default allowlists
3 participants