Feature Request: More lenient parsing #200

Alfus · 2023-11-11T23:32:04Z

To provide good completion suggestions, an AST is needed to know whether the cursor is inside an option. However, several common completion cases do not parse in protocompile, for example:

An empty option block: int32 foo [<cursor>]; (done)
An option without an "= ": int32 foo [bar<cursor>]; (done)
An option with a trailing ",": int32 foo [deprecated = true, <cursor>]; (done)
An option without a trailing semicolon: int32 foo [bar<cursor>] (done)
An option name with a trailing ".": int32 foo [(bar.<cursor>)]; or int32 foo [foo.<cursor>] (done)
A type reference with a trailing ".": foo.<cursor>
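To make the goal concrete, here is a minimal sketch in Go of what "lenient" means for the compact-option cases above: the parser still reports an error, but it also returns whatever nodes it could recognize. This is a toy hand-written parser, not protocompile's goyacc grammar. The names `Option` and `parseCompactOptions` are hypothetical, and splitting on "," is a simplification that ignores commas inside string or message literals.

```go
package main

import (
	"fmt"
	"strings"
)

// Option is a hypothetical AST node for one compact option.
// Value is empty when the "= value" part is missing.
type Option struct {
	Name  string
	Value string
}

// parseCompactOptions parses the text between '[' and ']' leniently.
// It returns every option it could recognize, plus the errors a strict
// parser would have reported.
func parseCompactOptions(body string) ([]Option, []error) {
	if strings.TrimSpace(body) == "" {
		return nil, []error{fmt.Errorf("compact options must not be empty")}
	}
	var opts []Option
	var errs []error
	parts := strings.Split(body, ",")
	for i, part := range parts {
		part = strings.TrimSpace(part)
		if part == "" {
			if i == len(parts)-1 {
				errs = append(errs, fmt.Errorf("unexpected trailing ','"))
			} else {
				errs = append(errs, fmt.Errorf("unexpected ','"))
			}
			continue
		}
		name, value, found := strings.Cut(part, "=")
		name, value = strings.TrimSpace(name), strings.TrimSpace(value)
		if !found || value == "" {
			errs = append(errs, fmt.Errorf("option %q is missing a value", name))
		}
		// Keep the node even when it is incomplete so completion can use it.
		opts = append(opts, Option{Name: name, Value: value})
	}
	return opts, errs
}

func main() {
	for _, body := range []string{"", "bar", "deprecated = true, "} {
		opts, errs := parseCompactOptions(body)
		fmt.Printf("%q -> %d option(s), %d error(s)\n", body, len(opts), len(errs))
	}
}
```

A completion engine can then look at the returned options (for example the partial name `bar`) even though the file as a whole is still invalid.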

Alfus · 2023-11-13T20:54:38Z

Import cases:

empty import statement: import "<cursor>"; (done)
import statement without a trailing ';': import "<cursor>" (done)

While still producing an error, allowing the rpc or option ast nodes to still be generated. See #200 Will follow up with diffs of a similar pattern for the other body contexts.

kralicky · 2023-12-02T01:56:04Z

Hey @Alfus 👋
I have an implementation of this in my fork of protocompile, but I went about it differently, and I'm curious what your thoughts are.

I decided on changing the parser grammar to require semicolons to terminate most declarations, then inserting them from the lexer wherever they are technically required by the grammar. I found that having the grammar be as unambiguous as possible made it much less likely I'd run into shift/reduce problems, especially when combining this with other unrelated grammar changes like permitting trailing commas, extension names with mismatched parentheses ([foo = bar, (<cursor>]) etc.

IIRC there were also a couple places where I was unable to make the grammar unambiguous without doing something like treating '\n' as a token, which gets super weird.

There are some drawbacks to this method, mostly that handling syntax errors when they aren't actually syntax errors is trickier. But you might find this strategy easier overall.

Technically the "best" solution is rolling your own parser, but that's obviously a lot of work.

What are your thoughts?
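For readers unfamiliar with the idea, lexer-level semicolon insertion can be sketched roughly as follows. This is not the rule used in the fork: it is an assumption modeled on Go's automatic semicolon insertion, where a virtual ';' is added after the last token of a line if that token could end a declaration. `Token`, `canEndDecl`, and `insertSemicolons` are hypothetical names.

```go
package main

import "fmt"

// Token is a hypothetical lexer token. Virtual marks tokens synthesized by
// the lexer that do not appear in the source text.
type Token struct {
	Text    string
	Line    int
	Virtual bool
}

// canEndDecl reports whether a token may be the last token of a
// declaration, so a missing ';' after it can be filled in.
func canEndDecl(text string) bool {
	switch text {
	case ";", "{", "}", ",", "=", "[", "(", ".":
		return false
	}
	return true
}

// insertSemicolons appends a virtual ';' after every token that is the last
// one on its line and that can end a declaration.
func insertSemicolons(toks []Token) []Token {
	out := make([]Token, 0, len(toks))
	for i, t := range toks {
		out = append(out, t)
		lastOnLine := i == len(toks)-1 || toks[i+1].Line > t.Line
		if lastOnLine && canEndDecl(t.Text) {
			out = append(out, Token{Text: ";", Line: t.Line, Virtual: true})
		}
	}
	return out
}

func main() {
	// Tokens for:
	//   import "a.proto"      <- missing ';'
	//   message Foo {
	//     int32 foo = 1;
	//   }
	toks := []Token{
		{Text: "import", Line: 1}, {Text: `"a.proto"`, Line: 1},
		{Text: "message", Line: 2}, {Text: "Foo", Line: 2}, {Text: "{", Line: 2},
		{Text: "int32", Line: 3}, {Text: "foo", Line: 3}, {Text: "=", Line: 3},
		{Text: "1", Line: 3}, {Text: ";", Line: 3},
		{Text: "}", Line: 4},
	}
	for _, t := range insertSemicolons(toks) {
		fmt.Printf("%s (virtual=%v)\n", t.Text, t.Virtual)
	}
}
```

Because the virtual token is flagged, a later stage can still report "missing ';'" as an error. A line-based rule like this one also misfires on declarations that are legitimately split across lines, which is one form of the drawback mentioned above.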

Alfus · 2023-12-08T02:13:15Z

I considered several techniques like this (for example injecting a special cursor token), though anything that modifies the input will also make source positions inaccurate.
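The position problem can be illustrated with a small Go sketch. Once a placeholder is spliced into the source, every offset the parser reports at or after the cursor is shifted by the placeholder's length and has to be mapped back. The marker text and the helper names `injectCursor` and `originalOffset` are hypothetical, and nobody in this thread describes doing this remapping.

```go
package main

import (
	"fmt"
	"strings"
)

// cursorMarker is a hypothetical placeholder identifier.
const cursorMarker = "__CURSOR__"

// injectCursor returns src with the marker inserted at byte offset cursor.
func injectCursor(src string, cursor int) string {
	return src[:cursor] + cursorMarker + src[cursor:]
}

// originalOffset maps a byte offset in the modified text back to the
// corresponding offset in the original text.
func originalOffset(modified, cursor int) int {
	switch {
	case modified <= cursor:
		return modified // before the marker: unchanged
	case modified < cursor+len(cursorMarker):
		return cursor // inside the marker: collapse to the cursor
	default:
		return modified - len(cursorMarker) // after the marker: shift back
	}
}

func main() {
	src := "int32 foo [];"
	cursor := strings.Index(src, "]")
	modified := injectCursor(src, cursor)
	semi := strings.Index(modified, ";")
	fmt.Println(modified)
	fmt.Println("';' offset in modified text:", semi)
	fmt.Println("';' offset in original text:", originalOffset(semi, cursor))
}
```

Every consumer of source positions (errors, comments, formatter output) would need to apply such a mapping consistently, which is the cost being weighed here.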

Alfus · 2023-12-08T17:59:41Z

The only remaining issue from the original list is making a field that consists of just a type work:

message Foo {
  my.type.<cursor>
}

kralicky · 2023-12-08T20:28:40Z

Here is how I implemented that one:

messageFieldDecl : fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, nil, $6)
	}
	| fieldCardinality notGroupElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode($1.ToKeyword(), $2, $3, $4, $5, $6, $7)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, nil, $5)
	}
	| msgElementTypeIdent identifier '=' _INT_LIT compactOptions ';' {
		$$ = ast.NewFieldNode(nil, $1, $2, $3, $4, $5, $6)
	}
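	// New code below: each alternative accepts a field that is missing
	// the tag number, the '=' as well, or everything after the type.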
	| fieldCardinality notGroupElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, $4, nil, nil, $5)
	}
	| fieldCardinality notGroupElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, $3, nil, nil, nil, $4)
	}
	| fieldCardinality notGroupElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode($1.ToKeyword(), $2, nil, nil, nil, nil, $3)
	}
	| msgElementTypeIdent identifier '=' ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, $3, nil, nil, $4)
	}
	| msgElementTypeIdent identifier ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, $2, nil, nil, nil, $3)
	}
	| msgElementTypeIdent ';' {
		$$ = ast.NewIncompleteFieldNode(nil, $1, nil, nil, nil, nil, $2)
	}

(NewIncompleteFieldNode here returns the same *ast.FieldNode, but handles missing nodes differently)

Deciding what to do with invalid fields is tricky. I decided to skip them in the parser so they don't end up in descriptors. This means I can't ctrl-click or hover on them, but I can still handle them in the formatter, generate semantic tokens for them, and use them in completion logic.
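The "keep in the AST, skip for descriptors" split can be sketched in Go as below. This is only an illustration under assumed types: `FieldNode` here is a simplified, hypothetical struct, not protocompile's `ast.FieldNode`, and `forDescriptors` is an invented helper.

```go
package main

import "fmt"

// FieldNode is a hypothetical, simplified field node. Empty strings and a
// nil Tag stand for parts that were missing from the source.
type FieldNode struct {
	Type string
	Name string
	Tag  *int
}

// Incomplete reports whether any required part of the field is missing.
func (f *FieldNode) Incomplete() bool {
	return f.Type == "" || f.Name == "" || f.Tag == nil
}

// forDescriptors returns only the complete fields. Incomplete fields stay
// in the original slice, where tooling can still see them.
func forDescriptors(fields []*FieldNode) []*FieldNode {
	var out []*FieldNode
	for _, f := range fields {
		if !f.Incomplete() {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	one := 1
	fields := []*FieldNode{
		{Type: "int32", Name: "foo", Tag: &one},
		{Type: "my.type."}, // only a type so far, as in "my.type.<cursor>"
	}
	fmt.Println("in AST:", len(fields), "in descriptors:", len(forDescriptors(fields)))
}
```

The formatter, semantic tokens, and completion would walk the full list, while descriptor generation would only see the filtered one.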

Regarding source positions, I was initially concerned about that too, but so far I have not encountered any issues.

Alfus · 2024-02-29T15:57:16Z

New use case:

Missing field id: int32 foo;
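The commit for this case (#246) notes that the error is raised during validation rather than parsing. A minimal Go sketch of that separation, using hypothetical `Field` and `validate` names rather than protocompile's real types, could look like this:

```go
package main

import "fmt"

// Field is a hypothetical parsed field. Tag is nil when the source omitted
// "= <number>", as in "int32 foo;".
type Field struct {
	Name string
	Tag  *int
}

// validate is a separate pass that runs after parsing. The parser has
// already produced a node for every field; this pass only reports the ones
// that lack a tag.
func validate(fields []Field) []error {
	var errs []error
	for _, f := range fields {
		if f.Tag == nil {
			errs = append(errs, fmt.Errorf("field %s: missing field tag", f.Name))
		}
	}
	return errs
}

func main() {
	one := 1
	fields := []Field{{Name: "foo"}, {Name: "bar", Tag: &one}}
	for _, err := range validate(fields) {
		fmt.Println(err)
	}
}
```

Keeping the check out of the grammar means a different pass could later choose to handle tagless fields some other way without touching the parser.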

Alfus mentioned this issue Nov 17, 2023

Tolerate missing semicolons in the service body #206

Merged

This was referenced Dec 1, 2023

Tolerate empty compact options #211

Merged

Tolerate missing ';' after package, imports, file options, and method options #212

Merged

Tolerate missing ';' in enum elements #213

Merged

Alfus added a commit that referenced this issue Dec 4, 2023

Tolerate empty compact options (#211)

ec8c634

While still producing an error. See #200

Alfus added a commit that referenced this issue Dec 4, 2023

Tolerate missing ';' after package, imports, file options, and method…

e79e10e

… options (#212) While still producing an error. See #200

Alfus added a commit that referenced this issue Dec 5, 2023

Tolerate missing ';' in enum elements (#213)

a343bd3

While still producing an error, see #200.

This was referenced Dec 5, 2023

Tolerate missing ';' in oneof options/fields and extension fields #216

Merged

Tolerate missing ';' in message body #218

Merged

Tolerate missing '=' value in compact options #219

Merged

Alfus added a commit that referenced this issue Dec 7, 2023

Tolerate missing ';' in oneof options/fields and extension fields (#216)

340bf80

While still reporting an error, see: #200 A little different from the others, as oneof and extensions don't support empty decls.

Alfus added a commit that referenced this issue Dec 7, 2023

Tolerate missing ';' in message body (#218)

ebf3519

While still producing an error, see: #200

Alfus added a commit that referenced this issue Dec 7, 2023

Tolerate missing '=' value in compact options (#219)

1cb5ed7

While still producing an error, see: #200

This was referenced Dec 7, 2023

Tolerate trailing ',' in compact options #221

Merged

Tolerate trailing '.' in option names #222

Merged

Tolerate trailing '.' in unambiguous type name cases #224

Merged

Alfus added a commit that referenced this issue Dec 8, 2023

Tolerate trailing ',' in compact options (#221)

508b83b

While still producing an error, see #200.

Alfus added a commit that referenced this issue Dec 8, 2023

Tolerate trailing '.' in option names (#222)

4c57c26

While still producing an error, see: #200

Alfus added a commit that referenced this issue Dec 8, 2023

Tolerate trailing '.' in unambiguous type name cases (#224)

93a9ef1

While still reporting an error. The important case for code completion is extension type names. See #200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate empty compact options (bufbuild#211)

3dbf5f1

While still producing an error. See bufbuild#200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate missing ';' after package, imports, file options, and method…

de1e17f

… options (bufbuild#212) While still producing an error. See bufbuild#200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate missing ';' in enum elements (bufbuild#213)

885d1ff

While still producing an error, see bufbuild#200.

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate missing ';' in message body (bufbuild#218)

93dcc4a

While still producing an error, see: bufbuild#200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate missing '=' value in compact options (bufbuild#219)

7a7c2a1

While still producing an error, see: bufbuild#200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate trailing ',' in compact options (bufbuild#221)

7dcb3dc

While still producing an error, see bufbuild#200.

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate trailing '.' in option names (bufbuild#222)

7186508

While still producing an error, see: bufbuild#200

kralicky pushed a commit to kralicky/protocompile that referenced this issue Feb 7, 2024

Tolerate trailing '.' in unambiguous type name cases (bufbuild#224)

3e95be1

While still reporting an error. The important case for code completion is extension type names. See bufbuild#200

Alfus mentioned this issue Feb 29, 2024

Tolerate missing field tags #246

Merged

Alfus added a commit that referenced this issue Mar 1, 2024

Tolerate missing field tags (#246)

f4c4a6f

While still returning an error. See #200 Note that the error occurs during validation instead of parsing to eventually allow algorithmic field tag assignments.
