Correct datatypes for string expressions #1636
Co-authored-by: Johannes Kalmbach <[email protected]>
Update from original Qlever
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1636      +/-   ##
==========================================
- Coverage   89.25%   89.21%   -0.05%
==========================================
  Files         372      374       +2
  Lines       34818    35768     +950
  Branches     3931     4044     +113
==========================================
+ Hits        31076    31909     +833
- Misses       2470     2548      +78
- Partials     1272     1311      +39
☔ View full report in Codecov by Sentry.
Hi, You can above click at
auto resultLiteral = ExportQueryExecutionTrees::idToLiteralOrIri(
    qec->getIndex(), id, LocalVocab{});
EXPECT_EQ(resultLiteral.value().toStringRepresentation(),
          "\"some^^<http://www.w3.org/2001/XMLSchema#string>\"");
You have a typo here: the `some` must be in quotes, so "\"some\"^^<http...>" (that is, "some"^^ followed by the datatype, without the surrounding technical quotes of the C++ string). Please don't forget to also adapt this in your input KG.
          "\"some^^<http://www.w3.org/2001/XMLSchema#string>\"");
}

// TODO: Case Literal With Datatype not equal String
Exactly the same thing: use "\"some\"^^<http://www.dadudeldu.com/NoSuchDatatype>". It was only a typo in the escaping of your literals.
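The escaping issue from the two comments above is easier to see when the escaped form is placed next to a raw string literal. The following is a minimal sketch in plain standard C++; the function names `escapedForm`, `rawForm`, and `typoForm` are made up for illustration and are not part of QLever.

```cpp
#include <cassert>
#include <string>

// The RDF literal "some"^^<http://www.w3.org/2001/XMLSchema#string>, written
// with escaped inner quotes. The outermost quotes belong to C++ only.
inline std::string escapedForm() {
  return "\"some\"^^<http://www.w3.org/2001/XMLSchema#string>";
}

// The same literal as a raw string literal, which needs no escaping at all.
inline std::string rawForm() {
  return R"("some"^^<http://www.w3.org/2001/XMLSchema#string>)";
}

// The form with the typo: the closing quote of the literal ends up after the
// datatype IRI, so the datatype becomes part of the literal's content.
inline std::string typoForm() {
  return "\"some^^<http://www.w3.org/2001/XMLSchema#string>\"";
}
```

A raw string literal might make such test expectations harder to get wrong, since the RDF quotes are then the only quotes that have to be balanced by hand.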
Conformance check passed ✅
Quality Gate passed
A first round on everything but the tests.
Work on my comments, and contact me once you are done or are left with questions.
// Converts an Id to a LiteralOrIri based on its type and value.
// For VocabIndex or LocalVocabIndex: Return Literal or Iri. If
// `onlyReturnLiteralsWithXsdString` is true, return only literals (no IRIs)
// with no datatype or datatype `xsd:string`; otherwise, return any literal,
// but strip datatypes other than `xsd:string`. For Double, Int, Bool, Date,
// or GeoPoint: Return the literal without the datatype. If
// `onlyReturnLiteralsWithXsdString` is true return `std::nullopt`. For
// Undefined Id: Always return `std::nullopt`
This comment can be shortened to be more concise and precise:
- You never return the datatypes (you always strip them) unless they are `xsd:string`. So if the literal has any other datatype (this includes IDs that directly store their value, like Doubles), the datatypes are always empty.
- If the `onlyReturn...` is set, then all IRIs and all literals that have a datatype that is not `string` (again including the encoded IDs) become `nullopt`.
Then also add that these semantics are useful for the string expressions in `StringExpressions.cpp`.
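The two rules above can be sketched as a small standalone function. This is only an illustration under simplifying assumptions: `Kind`, `Term`, `Result`, `kXsdString`, and `toLiteralOrIri` are hypothetical stand-ins and not QLever's actual `Id` or `LiteralOrIri` types, and language tags are ignored.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for QLever's `Id` and `LiteralOrIri`.
enum class Kind { Undefined, Iri, Literal, EncodedValue };

struct Term {
  Kind kind;
  std::string content;                  // IRI, literal content, or printed value
  std::optional<std::string> datatype;  // datatype IRI, if any
};

struct Result {
  bool isIri;
  std::string content;
  std::optional<std::string> datatype;
};

inline const std::string kXsdString =
    "http://www.w3.org/2001/XMLSchema#string";

inline std::optional<Result> toLiteralOrIri(
    const Term& term, bool onlyReturnLiteralsWithXsdString) {
  switch (term.kind) {
    case Kind::Undefined:
      return std::nullopt;
    case Kind::Iri:
      // IRIs are dropped when only string literals are requested.
      if (onlyReturnLiteralsWithXsdString) return std::nullopt;
      return Result{true, term.content, std::nullopt};
    case Kind::Literal:
    case Kind::EncodedValue: {
      const bool isStringLiteral =
          term.kind == Kind::Literal &&
          (!term.datatype || *term.datatype == kXsdString);
      // Plain and `xsd:string` literals are returned unchanged.
      if (isStringLiteral) return Result{false, term.content, term.datatype};
      // Everything else either becomes nullopt or loses its datatype.
      if (onlyReturnLiteralsWithXsdString) return std::nullopt;
      return Result{false, term.content, std::nullopt};
    }
  }
  return std::nullopt;
}
```

With the flag unset, an encoded integer `42` would come back as the plain literal `42` without a datatype; with the flag set, it would come back as `std::nullopt`.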
// thrown.
// If `onlyReturnLiteralsWithXsdString` is `true`, returns `std::nullopt`.
// If `onlyReturnLiteralsWithXsdString` is `false`, removes datatypes from
// literals (e.g., `42^^xsd:integer` becomes `"42"`).
- // literals (e.g., `42^^xsd:integer` becomes `"42"`).
+ // literals (e.g. the integer `42` is converted to the plain literal `"42"`).
static std::optional<LiteralOrIri> idToLiteralOrIriForEncodedValue(
    Id id, bool onlyReturnLiteralsWithXsdString = false);

// Checks and processes a LiteralOrIri based on the given flags.
- // Checks and processes a LiteralOrIri based on the given flags.
+ // A helper function for the `<InsertNameHere>` function. ... remainder of comment.
ExportQueryExecutionTrees::idToLiteralOrIriForEncodedValue(
    Id id, bool onlyReturnLiteralsWithXsdString) {
  auto optionalStringAndType = idToStringAndTypeForEncodedValue(id);
  if (!optionalStringAndType || onlyReturnLiteralsWithXsdString) {
First check for `onlyReturn` before you call the somewhat expensive `idTo...` function.
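The reordering asked for above could look roughly like the following sketch. `expensiveConversion` and `convertEncodedValue` are hypothetical stand-ins (an `int` plays the role of the `Id`), and the call counter exists only to make the saved work observable.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Counts calls so that the effect of the reordering is observable.
inline int expensiveCalls = 0;

// Hypothetical stand-in for the somewhat expensive `idTo...` conversion.
inline std::optional<std::string> expensiveConversion(int id) {
  ++expensiveCalls;
  if (id < 0) return std::nullopt;
  return std::to_string(id);
}

inline std::optional<std::string> convertEncodedValue(
    int id, bool onlyReturnLiteralsWithXsdString) {
  // Cheap check first: in this case nothing has to be converted at all.
  if (onlyReturnLiteralsWithXsdString) return std::nullopt;
  // Only now pay for the conversion; a failed conversion stays nullopt.
  return expensiveConversion(id);
}
```

The result is the same as with the combined condition in the snippet under review; the only difference is that the conversion is skipped when the flag already decides the outcome.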
std::string_view(
    reinterpret_cast<const char*>(word.getDatatype().data()),
    word.getDatatype().size()) == XSD_STRING;
Isn't this exactly the `asStringViewUnsafe(word.getDatatype()) == XSD_STRING` function?
// Value getter for `isBlank`.
struct IsBlankNodeValueGetter : Mixin<IsBlankNodeValueGetter> {
  using Mixin<IsBlankNodeValueGetter>::operator();
  Id operator()(ValueId id, const EvaluationContext*) const {
    return Id::makeFromBool(id.getDatatype() == Datatype::BlankNodeIndex);
  }
Is this an artifact of you not having merged in the master branch for quite some time?
struct LiteralOrIriValueGetter : Mixin<LiteralOrIriValueGetter> {
  using Mixin<LiteralOrIriValueGetter>::operator();

  std::optional<LiteralOrIri> operator()(ValueId,
                                         const EvaluationContext*) const;

  std::optional<LiteralOrIri> operator()(const LiteralOrIri& s,
                                         const EvaluationContext*) const {
    return s;
  }
};
You also need a second value getter which has the semantics of `onlyReturnXsdStrings` (so one value getter for each of the configurations of your functions in `ExportQueryExecutionTrees.cpp`).
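One possible shape for such a pair of value getters is a shared implementation that is parameterized by the flag. This is a sketch under stated assumptions: `SimpleLiteral`, `GetterImpl`, `AnyLiteralGetter`, and `XsdStringOnlyGetter` are hypothetical names, and the sketch leaves out the `Mixin` base, the `EvaluationContext`, and IRIs.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified input: literal content plus optional datatype IRI.
struct SimpleLiteral {
  std::string content;
  std::optional<std::string> datatype;
};

inline bool isPlainOrXsdString(const SimpleLiteral& lit) {
  return !lit.datatype ||
         *lit.datatype == "http://www.w3.org/2001/XMLSchema#string";
}

// Shared implementation; the flag is fixed at compile time per getter.
template <bool onlyXsdString>
struct GetterImpl {
  std::optional<SimpleLiteral> operator()(const SimpleLiteral& lit) const {
    // Plain and `xsd:string` literals pass through unchanged.
    if (isPlainOrXsdString(lit)) return lit;
    if constexpr (onlyXsdString) {
      return std::nullopt;  // other datatypes are rejected
    } else {
      return SimpleLiteral{lit.content, std::nullopt};  // datatype stripped
    }
  }
};

// One getter per configuration.
using AnyLiteralGetter = GetterImpl<false>;
using XsdStringOnlyGetter = GetterImpl<true>;
```

Making the flag a template parameter keeps both getters as distinct types, which is what an expression template such as `FV<...>` in the snippets here appears to select on.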
std::optional<LiteralOrIri> LiteralOrIriValueGetter::operator()(
    Id id, const EvaluationContext* context) const {
  // true means that immediately returns nullopt for everything that is not a
  // literal
The comment is out of sync: it says `true means`, but the only thing I see in the call is a `false`.
if (s->isLiteral()) {
  if (s->hasLanguageTag()) {
    descriptor = std::string(asStringViewUnsafe(s->getLanguageTag()));
  } else if (s->hasDatatype()) {
    descriptor =
        ad_utility::triple_component::Iri::fromIrirefWithoutBrackets(
            asStringViewUnsafe(s->getDatatype()));
  }
} else {
It would be more efficient (you get your own copy of the LiteralOrIri after all) not to create a new LiteralOrIri, but to implement the `SubStr` directly inside the `Literal` class, s.t. you can reuse the allocation etc. This should make things somewhat faster. So a method `Literal::setSubstr(...)` would be the way to go.
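A minimal sketch of such an in-place substring, assuming a simplified literal that stores only its content in a `std::string`. The class `SimpleLiteral` is hypothetical, and the sketch works on 0-based byte offsets, whereas a real SPARQL `SUBSTR` would have to handle 1-based positions and UTF-8 code points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified literal that stores its content without quotes.
class SimpleLiteral {
 public:
  explicit SimpleLiteral(std::string content) : content_(std::move(content)) {}

  // Keep only `length` bytes starting at byte offset `start` (0-based).
  // The stored string is modified in place instead of building a new object.
  void setSubstr(std::size_t start, std::size_t length) {
    // Clamp both values so that out-of-range requests yield a shorter result.
    start = std::min(start, content_.size());
    length = std::min(length, content_.size() - start);
    content_.erase(start + length);  // drop the tail
    content_.erase(0, start);        // drop the head
  }

  const std::string& content() const { return content_; }

 private:
  std::string content_;
};
```

Because `erase` only shrinks the existing string, typical standard library implementations keep the original buffer, which is the allocation reuse that the comment above is after.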
using SubstrExpression = NARY<3, FV<SubstrImpl, LiteralOrIriValueGetter,
                                    NumericValueGetter, NumericValueGetter>>;
To make the case `SUBSTR(STR(?something))` more efficient, you can implement something like the `StringExpressionImpl`, which does exactly the same thing, but uses one of your two value getters (one of which is still missing), depending on whether the child is a string or not.
Unfortunately, I still don't know how to create a literal with a datatype for testing.
That is, in ExportExecutionTreesTest.cpp, line 1678 and line 1686.
Is there a way here on GitHub to add comments (questions for you) at individual places in the code, so that things are easier to follow? I have only found this feature in the individual commits, but those take a while to find. So what is the best way to do this?