option to disable escaping #33

Closed
4 tasks done
ninja7dev opened this issue Jul 12, 2024 · 10 comments
Labels
🤷 no/invalid This cannot be acted upon 👎 phase/no Post cannot or will not be acted on

Comments

@ninja7dev

Initial checklist

Problem

I'm trying to extract the text from a markdown chunk and getting "H\&M" instead of "H&M".
The actual code:

import { remark } from 'remark';
import strip from 'strip-markdown';

const content = remark()
  .use(strip)
  .processSync('**H&M**');
console.log(String(content));

Solution

Could we please make an option to disable the char escaping, for those who need it?
I saw some older issues/discussions created around this repo but couldn't find a working solution.

Alternatives

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Jul 12, 2024
@wooorm
Member

wooorm commented Jul 12, 2024

Why do you want what you want?

Indeed, this plugin still generates markdown. That is intended. It removes the working things, so headings turn into paragraphs.

Some humans want what you want. Some users want what is happening right now.

It’s not possible to do what you want here. This is not the place that deals with escapes.
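
To illustrate why a markdown serializer escapes at all, here is a toy sketch. It is not remark's actual logic; the function name `escapeAmpersand` and the rule it applies are assumptions made for illustration. In markdown, `&amp;` is a character reference that reads back as `&`, so a serializer that wants literal text to survive a round trip has to put a backslash before an `&` that could start such a reference. A conservative rule cannot tell `&M` apart from the start of `&Mu;`, which is one way text such as `H&M` can end up as `H\&M`.

```javascript
// Toy rule (an assumption, not the library's implementation):
// escape "&" whenever the next character could begin a character reference.
function escapeAmpersand(text) {
  return text.replace(/&(?=[#A-Za-z])/g, '\\&')
}

console.log(escapeAmpersand('H&M')) // H\&M
console.log(escapeAmpersand('&amp; stays literal')) // \&amp; stays literal
console.log(escapeAmpersand('Q & A')) // Q & A
```

The escaped form is only needed when the result is parsed as markdown again; text that is shown to users directly never needs it.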

@ninja7dev
Author

ninja7dev commented Jul 12, 2024

> Why do you want what you want?

I need to display to the users a plain-text representation from the markdown. Hence, it should be user-friendly and easy to read, with the text from each heading and paragraph on its own row, etc.

> Some humans want what you want. Some users want what is happening right now.

I understand and agree. In fact this is exactly why I was asking for a configurable option, so the default behavior doesn't need to be changed. I went through the older discussions that were related to escaping.

> It’s not possible to do what you want here. This is not the place that deals with escapes.

Okay. Could you please point me in the right direction: how could I achieve what I need using remark and maybe some of the related plugins?
I find it a bit hard to believe that this isn't doable with such a variety of tools.

@wooorm
Member

wooorm commented Jul 15, 2024

> I find it a bit hard to believe that this isn't doable with such a variety of tools.

It’s vague what “this” is. You call it “plain-text representation”. HTML is a standard. Markdown mostly has standards, but also at least reference implementations and agreements. Plain text is not a standard.

For me to answer you thoroughly, we’d need to cover all possible markdown input cases. Talk about what output you want for each possibility. Then we can make that. Now I don’t know what “representation” you have in your head.

> I need to display to the users a plain-text representation from the markdown.

As you say “display”, why not get HTML out:

import { unified } from "unified";
import remarkParse from "remark-parse";
import stripMarkdown from "strip-markdown";
import remarkRehype from "remark-rehype";
import rehypeStringify from "rehype-stringify";

const file = unified()
  .use(remarkParse)
  .use(stripMarkdown)
  .use(remarkRehype)
  .use(rehypeStringify)
  .processSync("**H&M**");

console.log(String(file));
<p>H&#x26;M</p>

@ninja7dev
Author

Let me try to clarify: what I mean by plain-text representation / version is, actually, only the text with no markup (be that markdown or HTML).
Examples:
markdown heading -> normal text
bold text -> normal text
italic text -> normal text
blockquote -> normal text
paragraph -> normal text
image -> removed
link -> normal text

However, as I mentioned in a previous message, a block part like the heading should be on its own row and not get mixed up with the next paragraph, for example.

@wooorm
Member

wooorm commented Jul 15, 2024

Well, that’s all a bit short, but I think that matches the HTML example I have, as it gives you HTML that you can display to users, those <p>s are the paragraphs you talk about.

@remcohaszing
Member

@ninja7dev It looks like you’re not trying to produce markdown, so you don’t need to handle escaping. I think you’re looking for something like this:

import {fromMarkdown} from 'mdast-util-from-markdown'
import {toString} from 'mdast-util-to-string'

const ast = fromMarkdown('**H&M**')
const content = toString(ast)

@ninja7dev
Author

> Well, that’s all a bit short, but I think that matches the HTML example I have, as it gives you HTML that you can display to users, those <p>s are the paragraphs you talk about.

It seems like this still returns markup (HTML).
I was looking to obtain, as I wrote in my previous message, only the text with no markup (be that markdown or HTML). (I meant only the text with neither markdown, nor HTML, nor any other type of markup.)

> @ninja7dev It looks like you’re not trying to produce markdown, so you don’t need to handle escaping. I think you’re looking for something like this:
>
> import {fromMarkdown} from 'mdast-util-from-markdown'
> import {toString} from 'mdast-util-to-string'
>
> const ast = fromMarkdown('**H&M**')
> const content = toString(ast)

Thank you, but I had previously tried this solution. It results in inlining headings, paragraphs, and lists into one big paragraph of text.
It doesn't keep the text from the block-type parts on their own rows.

@remcohaszing
Member

> Thank you, but I had previously tried this solution. It results in inlining headings, paragraphs, and lists into one big paragraph of text. It doesn't keep the text from the block-type parts on their own rows.

So I understand it’s close to what you want? It’s just not exactly what you want. mdast-util-to-string is not very complex. You can create your own implementation and use that as a base.
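
A minimal sketch of what such a custom implementation could look like, assuming the goal described earlier in the thread: inline markup collapses to its text, images are dropped, and each block ends up on its own row. The name `toPlainText` and the choice of block node types are illustrative assumptions, not part of mdast-util-to-string. The tree is hand-built so the snippet runs without dependencies; in practice it would come from `fromMarkdown(...)`.

```javascript
// Hypothetical helper, not part of any unified package.
// Node types whose text should land on its own row (an assumption).
const blockTypes = new Set(['heading', 'paragraph', 'code'])

function toPlainText(tree) {
  const rows = []
  collect(tree, rows)
  return rows.join('\n')
}

// Walk the tree and push one row per block-level node.
function collect(node, rows) {
  if (blockTypes.has(node.type)) {
    const text = inlineText(node)
    if (text) rows.push(text) // skip blocks that end up empty
    return
  }
  for (const child of node.children || []) collect(child, rows)
}

// Concatenate the text of inline nodes, skipping images entirely.
function inlineText(node) {
  if (node.type === 'image' || node.type === 'imageReference') return ''
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(inlineText).join('')
}

// Hand-built mdast: a heading, a paragraph with bold text and a link,
// a paragraph holding only an image, and a blockquote.
const tree = {
  type: 'root',
  children: [
    {type: 'heading', depth: 1, children: [{type: 'text', value: 'Brands'}]},
    {
      type: 'paragraph',
      children: [
        {type: 'strong', children: [{type: 'text', value: 'H&M'}]},
        {type: 'text', value: ' is a '},
        {type: 'link', url: 'https://example.com', children: [{type: 'text', value: 'store'}]},
        {type: 'text', value: '.'}
      ]
    },
    {type: 'paragraph', children: [{type: 'image', url: 'logo.png', alt: 'logo'}]},
    {
      type: 'blockquote',
      children: [{type: 'paragraph', children: [{type: 'text', value: 'Quoted text'}]}]
    }
  ]
}

console.log(toPlainText(tree))
// Brands
// H&M is a store.
// Quoted text
```

Because no markdown is serialized, nothing gets escaped: `H&M` comes out as written. Separators between rows, list markers, and similar details are the kind of per-case decisions that would still need to be specified.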

@wooorm wooorm closed this as not planned Won't fix, can't repro, duplicate, stale Sep 19, 2024
@wooorm wooorm added the 🤷 no/invalid This cannot be acted upon label Sep 19, 2024


@wooorm
Member

wooorm commented Sep 19, 2024

Closing as this issue seems not actionable after discussing it for a while.

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Sep 19, 2024

3 participants