Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: More lenient parsing #200

Open
Alfus opened this issue Nov 11, 2023 · 6 comments
Open

Feature Request: More lenient parsing #200

Alfus opened this issue Nov 11, 2023 · 6 comments

Comments

@Alfus
Copy link
Contributor

Alfus commented Nov 11, 2023

To provide good completion suggestions, an ast is needed to know if the cursor is in an option or not. However common completion cases do not parse in protocompile, for example:

  • An empty option block: int32 foo [<cursor>]; (done)
  • An option without an "= ": int32 foo [bar<cursor>]; (done)
  • An option with a trailing ",": int32 foo [deprecated = true, <cursor>]; (done)
  • An option without a trailing semicolon: int32 foo [bar<cursor>] (done)
  • An option name with a trailing ".": int32 foo [(bar.<cursor>)]; or int32 foo [foo.<cursor>] (done)
  • A type reference with a trailing ".": foo.<cursor>
@Alfus
Copy link
Contributor Author

Alfus commented Nov 13, 2023

Import cases:

  • empty import statement: import "<cursor>"; (done)
  • import statement without a trailing ';': import "<cursor>" (done)

Alfus added a commit that referenced this issue Dec 1, 2023
While still producing an error, allowing the rpc or option ast nodes to
still be generated. See
#200

Will follow up with diffs of a similar pattern for the other body
contexts.
@kralicky
Copy link

kralicky commented Dec 2, 2023

Hey @Alfus 👋
I have an implementation of this in my fork of protocompile, but I went about it differently, and I'm curious what your thoughts are.

I decided on changing the parser grammar to require semicolons to terminate most declarations, then inserting them from the lexer wherever they are technically required by the grammar. I found that having the grammar be as unambiguous as possible made it much less likely I'd run into shift/reduce problems, especially when combining this with other unrelated grammar changes like permitting trailing commas, extension names with mismatched parentheses ([foo = bar, (<cursor>]) etc.

IIRC there were also a couple places where I was unable to make the grammar unambiguous without doing something like treating '\n' as a token, which gets super weird.

There are some drawbacks to this method, mostly that handling syntax errors when they aren't actually syntax errors is trickier. But you might find this strategy easier overall.

Technically the "best" solution is rolling your own parser, but that's obviously a lot of work.

What are your thoughts?

Alfus added a commit that referenced this issue Dec 4, 2023
While still producing an error. See
#200
Alfus added a commit that referenced this issue Dec 4, 2023
Alfus added a commit that referenced this issue Dec 5, 2023
While still producing an error, see
#200.
Alfus added a commit that referenced this issue Dec 7, 2023
While still reporting an error, see:
#200
A little different from the others, as oneof and extensions don't
support empty decls.
Alfus added a commit that referenced this issue Dec 7, 2023
While still producing an error, see:
#200
Alfus added a commit that referenced this issue Dec 7, 2023
While still producing an error, see:
#200
@Alfus
Copy link
Contributor Author

Alfus commented Dec 8, 2023

I considered several techniques like this (for example injecting a special cursor token), though anything that modifies the input will also make source positions inaccurate.

Alfus added a commit that referenced this issue Dec 8, 2023
While still producing an error, see
#200.
Alfus added a commit that referenced this issue Dec 8, 2023
While still producing an error, see:
#200
Alfus added a commit that referenced this issue Dec 8, 2023
While still reporting an error. The important case for code completion
is extension type names. See
#200
@Alfus
Copy link
Contributor Author

Alfus commented Dec 8, 2023

Only remaining issue from the original list is making a field with only a type work:

message Foo {
  my.type.<cursor>
}

@kralicky
Copy link

kralicky commented Dec 8, 2023

Here is how I implemented that one:

messageFieldDecl : fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, nil, $6)
	}
	| fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, $6, $7)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, nil, $5)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, $5, $6)
	}
// new code below
	| fieldCardinality notGroupElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, $4, nil, nil, $5)
	}
	| fieldCardinality notGroupElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, nil, nil, nil, $4)
	}
	| fieldCardinality notGroupElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, nil, nil, nil, nil, $3)
	}
	| msgElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, $3, nil, nil, $4)
	}
	| msgElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, nil, nil, nil, $3)
	}
	| msgElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, nil, nil, nil, nil, $2)
	}

(NewIncompleteFieldNode here returns the same *ast.FieldNode, but handles missing nodes differently)

Deciding what to do with invalid fields is tricky, I decided to skip them in the parser so they don't end up in descriptors. This means I can't ctrl-click or hover on them etc, but I can still handle them in the formatter, generate semantic tokens for them, use them in completion logic, etc.

Regarding source positions, I was initially concerned about that too, but so far I have not encountered any issues.

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
While still producing an error, allowing the rpc or option ast nodes to
still be generated. See
bufbuild#200

Will follow up with diffs of a similar pattern for the other body
contexts.
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
…fbuild#216)

While still reporting an error, see:
bufbuild#200
A little different from the others, as oneof and extensions don't
support empty decls.
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024
While still reporting an error. The important case for code completion
is extension type names. See
bufbuild#200
@Alfus
Copy link
Contributor Author

Alfus commented Feb 29, 2024

New use case:

  • Missing field id: int32 foo;

Alfus added a commit that referenced this issue Mar 1, 2024
While still returning an error. See
#200
Note that the error occurs during validation instead of parsing to
eventually allow algorithmic field tag assignments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants