Correct datatypes for string expressions #1636

Draft
wants to merge 16 commits into
base: master
from


DuDaAG
Contributor

DuDaAG commented Nov 22, 2024

Unfortunately, I still don't know how to create a literal with a datatype for testing.

Specifically, in ExportExecutionTreesTest.cpp, lines 1678 and 1686.

Is there a way here on GitHub to add comments (questions to you) to individual places in the code, so that things stay clearer? I only found this feature in the individual commits, but it takes a while to find them there. So what's the best way to do it?


codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 78.40909% with 19 lines in your changes missing coverage. Please review.

Project coverage is 89.21%. Comparing base (6fba76f) to head (52ef1f5).
Report is 12 commits behind head on master.

Files with missing lines | Patch % | Lines
src/engine/ExportQueryExecutionTrees.cpp | 81.81% | 3 Missing and 7 partials ⚠️
...e/sparqlExpressions/SparqlExpressionValueGetters.h | 33.33% | 6 Missing ⚠️
src/engine/sparqlExpressions/StringExpressions.cpp | 85.00% | 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1636      +/-   ##
==========================================
- Coverage   89.25%   89.21%   -0.05%     
==========================================
  Files         372      374       +2     
  Lines       34818    35768     +950     
  Branches     3931     4044     +113     
==========================================
+ Hits        31076    31909     +833     
- Misses       2470     2548      +78     
- Partials     1272     1311      +39     


@joka921
Member

joka921 commented Nov 22, 2024

Hi, you can click on "Files changed" above. This shows the changed files (by default from all commits, which is fine for what you want). There you can select lines and attach comments to them. Make sure to finish your (self-)review if the comments are marked as "pending", because otherwise we cannot see them.

auto resultLiteral = ExportQueryExecutionTrees::idToLiteralOrIri(
qec->getIndex(), id, LocalVocab{});
EXPECT_EQ(resultLiteral.value().toStringRepresentation(),
"\"some^^<http://www.w3.org/2001/XMLSchema#string>\"");
Member

You have a typo here: the `some` must be in quotes, so `"\"some\"^^<http...>"` (i.e. `"some"^^` without the surrounding technical quotes of the C++ string). Please don't forget to also adapt this in your input knowledge graph.
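To make the escaping issue concrete, here is a small C++ sketch comparing the string from the test with the corrected one. `kBuggyLiteral`, `kFixedLiteral`, and the `lexicalForm` helper are hypothetical names for illustration only; they are not part of the test file.

```cpp
#include <string>

// The literal from the test, with the typo: the closing quote of the lexical
// form comes after the datatype IRI, so `^^<...>` ends up inside the string.
const std::string kBuggyLiteral =
    "\"some^^<http://www.w3.org/2001/XMLSchema#string>\"";

// The corrected literal: the escaped `\"` closes the lexical form before `^^`.
const std::string kFixedLiteral =
    "\"some\"^^<http://www.w3.org/2001/XMLSchema#string>";

// Hypothetical helper: returns the text between the first quote and the next
// quote, i.e. the lexical form of a quoted literal (empty if malformed).
std::string lexicalForm(const std::string& literal) {
  const auto open = literal.find('"');
  if (open == std::string::npos) return "";
  const auto close = literal.find('"', open + 1);
  if (close == std::string::npos) return "";
  return literal.substr(open + 1, close - open - 1);
}
```

With the typo, `lexicalForm(kBuggyLiteral)` yields the whole `some^^<...>` text, i.e. the datatype is swallowed into the lexical form, while the corrected literal yields just `some`.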

"\"some^^<http://www.w3.org/2001/XMLSchema#string>\"");
}

// TODO: Case Literal With Datatype not equal String
Member

Exactly the same thing here: use `"\"some\"^^<http://www.dadudeldu.com/NoSuchDatatype>"`. It was only a typo in the escaping of your literals.

@sparql-conformance

Conformance check passed ✅

Test Status Changes 📊

Number of Tests | Previous Status | Current Status
2 | Failed | Intended

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=52ef1f59404472318302664cdde0a3b45e97d91f&prev=5f28e832043b22102ad6e6e2a90bf93ff6846169


sonarcloud bot commented Nov 24, 2024

Member

joka921 left a comment


A first round on everything except the tests.
Please work on my comments, and contact me once you are done or if you have any questions left.

Comment on lines +73 to +80
// Converts an Id to a LiteralOrIri based on its type and value.
// For VocabIndex or LocalVocabIndex: Return Literal or Iri. If
// `onlyReturnLiteralsWithXsdString` is true, return only literals (no IRIs)
// with no datatype or datatype `xsd:string`; otherwise, return any literal,
// but strip datatypes other than `xsd:string`. For Double, Int, Bool, Date,
// or GeoPoint: Return the literal without the datatype. If
// `onlyReturnLiteralsWithXsdString` is true return `std::nullopt`. For
// Undefined Id: Always return `std::nullopt`
Member

This comment can be shortened to be more concise and precise:

  1. You never return the datatypes (you always strip them) unless they are xsd:string, so if the literal has any other datatype (this includes IDs that directly store their value, like Doubles), the datatype is always empty.
  2. If `onlyReturn...` is set, then all IRIs and all literals that have a datatype other than string (again including the encoded IDs) become `nullopt`. Then also add that these semantics are useful for the string expressions in `StringExpressions.cpp`.
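A minimal C++ sketch of the semantics in points 1 and 2, assuming a simplified term type. `SimpleTerm` and `toTermForStringExpression` are hypothetical and do not mirror QLever's `LiteralOrIri` API; encoded values such as doubles are modeled here as literals with a non-string datatype.

```cpp
#include <optional>
#include <string>

// Simplified, hypothetical stand-in for `LiteralOrIri`.
struct SimpleTerm {
  bool isIri = false;
  std::string content;                  // lexical form or IRI
  std::optional<std::string> datatype;  // absent for plain literals and IRIs
};

const std::string kXsdString = "http://www.w3.org/2001/XMLSchema#string";

// Point 1: datatypes other than xsd:string are always stripped.
// Point 2: with the flag set, IRIs and literals whose datatype is not
// xsd:string (including encoded values like doubles) become std::nullopt.
std::optional<SimpleTerm> toTermForStringExpression(
    const SimpleTerm& term, bool onlyReturnLiteralsWithXsdString) {
  const bool isStringLiteral =
      !term.isIri && (!term.datatype || *term.datatype == kXsdString);
  if (onlyReturnLiteralsWithXsdString && !isStringLiteral) {
    return std::nullopt;
  }
  SimpleTerm result = term;
  if (!isStringLiteral) {
    result.datatype.reset();  // strip every datatype except xsd:string
  }
  return result;
}
```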

// thrown.
// If `onlyReturnLiteralsWithXsdString` is `true`, returns `std::nullopt`.
// If `onlyReturnLiteralsWithXsdString` is `false`, removes datatypes from
// literals (e.g., `42^^xsd:integer` becomes `"42"`).
Member

Suggested change
// literals (e.g., `42^^xsd:integer` becomes `"42"`).
// literals (e.g. the integer `42` is converted to the plain literal `"42"`).

static std::optional<LiteralOrIri> idToLiteralOrIriForEncodedValue(
Id id, bool onlyReturnLiteralsWithXsdString = false);

// Checks and processes a LiteralOrIri based on the given flags.
Member

Suggested change
// Checks and processes a LiteralOrIri based on the given flags.
// A helper function for the `<InsertNameHere>` function. ... remainder of comment.

ExportQueryExecutionTrees::idToLiteralOrIriForEncodedValue(
Id id, bool onlyReturnLiteralsWithXsdString) {
auto optionalStringAndType = idToStringAndTypeForEncodedValue(id);
if (!optionalStringAndType || onlyReturnLiteralsWithXsdString) {
Member

First check for `onlyReturn...` before you call the somewhat expensive `idTo...` function.
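A minimal sketch of the suggested ordering, with hypothetical names: `expensiveConversion` stands in for the `idTo...` call, and the counter only exists to make the skipped call observable.

```cpp
#include <optional>
#include <string>

// Counts how often the (hypothetical) expensive conversion runs.
int expensiveCallCount = 0;

// Hypothetical stand-in for the "somewhat expensive" `idTo...` conversion.
std::optional<std::string> expensiveConversion(long long encodedValue) {
  ++expensiveCallCount;
  return std::to_string(encodedValue);
}

// Check the cheap flag first and return early, so the expensive conversion
// is skipped whenever its result would be discarded anyway.
std::optional<std::string> encodedValueToLiteral(
    long long encodedValue, bool onlyReturnLiteralsWithXsdString) {
  if (onlyReturnLiteralsWithXsdString) {
    return std::nullopt;  // encoded values never have datatype xsd:string
  }
  return expensiveConversion(encodedValue);
}
```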

Comment on lines +370 to +372
std::string_view(
reinterpret_cast<const char*>(word.getDatatype().data()),
word.getDatatype().size()) == XSD_STRING;
Member

Isn't this exactly what `asStringViewUnsafe` does, i.e. `asStringViewUnsafe(word.getDatatype()) == XSD_STRING`?
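For illustration, a hypothetical stand-in for such a helper; the real `asStringViewUnsafe` in QLever may have a different signature. It performs the same `reinterpret_cast` as the three quoted lines, so the comparison shrinks to a single call.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: views a contiguous range of byte-sized elements as
// characters without copying. The caller must keep `bytes` alive while the
// returned view is in use.
template <typename Container>
std::string_view asStringViewSketch(const Container& bytes) {
  static_assert(sizeof(typename Container::value_type) == 1,
                "elements must be byte-sized");
  return std::string_view(reinterpret_cast<const char*>(bytes.data()),
                          bytes.size());
}
```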

Comment on lines +158 to +163
// Value getter for `isBlank`.
struct IsBlankNodeValueGetter : Mixin<IsBlankNodeValueGetter> {
using Mixin<IsBlankNodeValueGetter>::operator();
Id operator()(ValueId id, const EvaluationContext*) const {
return Id::makeFromBool(id.getDatatype() == Datatype::BlankNodeIndex);
}
Member

Is this an artifact of you not having merged in the master branch for quite some time?

Comment on lines +146 to +156
struct LiteralOrIriValueGetter : Mixin<LiteralOrIriValueGetter> {
using Mixin<LiteralOrIriValueGetter>::operator();

std::optional<LiteralOrIri> operator()(ValueId,
const EvaluationContext*) const;

std::optional<LiteralOrIri> operator()(const LiteralOrIri& s,
const EvaluationContext*) const {
return s;
}
};
Member

You also need a second value getter that has the `onlyReturnXsdStrings` semantics
(so one value getter for each of the configurations of your functions in ExportQueryExecutionTrees.cpp).
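One way to get one value getter per configuration is a template over the flag, sketched here with hypothetical names and a heavily simplified value type; QLever's actual `Mixin` and `EvaluationContext` plumbing is omitted.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified input: `isXsdStringLiteral` is true for literals
// without datatype or with datatype xsd:string; `text` is the lexical form
// (datatype already stripped) or the IRI.
struct SimpleValue {
  bool isXsdStringLiteral;
  std::string text;
};

// Shared conversion, parametrized by the flag.
std::optional<std::string> convert(const SimpleValue& v,
                                   bool onlyReturnLiteralsWithXsdString) {
  if (onlyReturnLiteralsWithXsdString && !v.isXsdStringLiteral) {
    return std::nullopt;
  }
  return v.text;
}

// One value getter per configuration; the flag is a compile-time constant.
template <bool OnlyReturnLiteralsWithXsdString>
struct LiteralOrIriValueGetterSketch {
  std::optional<std::string> operator()(const SimpleValue& v) const {
    return convert(v, OnlyReturnLiteralsWithXsdString);
  }
};

// Hypothetical names for the two getters.
using StrictStringValueGetter = LiteralOrIriValueGetterSketch<true>;
using LenientStringValueGetter = LiteralOrIriValueGetterSketch<false>;
```

Fixing the flag as a template parameter keeps each getter a distinct type, so an expression can select its semantics through the type it is instantiated with.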

std::optional<LiteralOrIri> LiteralOrIriValueGetter::operator()(
Id id, const EvaluationContext* context) const {
// true means that immediately returns nullopt for everything that is not a
// literal
Member

The comment is out of sync: it says "true means ...", but the only thing I see in the call is a `false`.

Comment on lines +216 to +224
if (s->isLiteral()) {
if (s->hasLanguageTag()) {
descriptor = std::string(asStringViewUnsafe(s->getLanguageTag()));
} else if (s->hasDatatype()) {
descriptor =
ad_utility::triple_component::Iri::fromIrirefWithoutBrackets(
asStringViewUnsafe(s->getDatatype()));
}
} else {
Member

It would be more efficient (you get your own copy of the LiteralOrIri after all) not to create a new LiteralOrIri,
but to implement SUBSTR directly inside the Literal class, so that you can reuse the allocation etc. This should make things somewhat faster. So a method `Literal::setSubstr(...)` would be the way to go.
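A minimal sketch of the in-place idea, assuming a simplified literal class that owns its lexical form. `LiteralSketch` is hypothetical, and the sketch uses zero-based byte offsets, whereas SPARQL's SUBSTR uses one-based character positions, so a real implementation would first have to map UTF-8 code points to byte offsets.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, simplified literal that owns its lexical form.
class LiteralSketch {
 public:
  explicit LiteralSketch(std::string content) : content_(std::move(content)) {}

  // Keep `length` bytes starting at zero-based byte offset `start`, modifying
  // the existing string instead of constructing a new one. Out-of-range
  // arguments are clamped, yielding a shorter or empty result.
  void setSubstr(std::size_t start, std::size_t length) {
    start = std::min(start, content_.size());
    length = std::min(length, content_.size() - start);
    content_.erase(0, start);  // drop the prefix in place
    content_.resize(length);   // cut off the suffix
  }

  const std::string& content() const { return content_; }

 private:
  std::string content_;
};
```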

Comment on lines +248 to +249
using SubstrExpression = NARY<3, FV<SubstrImpl, LiteralOrIriValueGetter,
NumericValueGetter, NumericValueGetter>>;
Member

To make the case `SUBSTR(STR(?something))` more efficient, you can implement something like the `StringExpressionImpl`, which does exactly the same thing but uses your two value getters (one of which is still missing), depending on whether the child is a string or not.
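A rough sketch of the dispatch, assuming the optimization detects a `STR()` child once, when the expression is built, and then reads the grandchild with the datatype-stripping value getter instead of evaluating `STR()`. All types and names are hypothetical, not QLever's classes.

```cpp
#include <memory>
#include <utility>

// Hypothetical, simplified expression tree.
struct Expr {
  virtual ~Expr() = default;
};
struct StrExpr : Expr {
  std::unique_ptr<Expr> child;
  explicit StrExpr(std::unique_ptr<Expr> c) : child(std::move(c)) {}
};
struct VariableExpr : Expr {};

// Which of the two (hypothetical) value getters SUBSTR should use.
enum class Getter { OnlyXsdStrings, StripDatatypes };

struct SubstrPlan {
  const Expr* input;  // expression whose values SUBSTR reads
  Getter getter;      // value getter used to read them
};

// If the child is STR(x), skip the STR() and read x with the getter that
// accepts everything and strips datatypes; otherwise read the child with the
// getter that only accepts plain or xsd:string literals.
SubstrPlan planSubstr(const Expr& child) {
  if (const auto* str = dynamic_cast<const StrExpr*>(&child)) {
    return {str->child.get(), Getter::StripDatatypes};
  }
  return {&child, Getter::OnlyXsdStrings};
}
```

Deciding this once at construction time avoids materializing the intermediate `STR()` results for every row.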

Labels
None yet
Projects
None yet

2 participants