Release v9 · kernelci/kcidb

Another major release. The most visible changes are listed below.

After this release we'll be improving our CI/CD to shorten our development cycle, so we can make smaller and more frequent releases.

Schema

Switch to using v4 schema, released with kcidb-io v3. Changes from v3 schema include:
- Rename revisions to checkouts to better represent what is actually submitted, improve correlation, and prevent data loss. Checkouts are identified purely by origin-generated IDs, similarly to builds and tests. The commit hash now appears only in the git_commit_hash field, and the patchset hash gets its own field.
  
  NOTE: submitting CI systems that test and send revisions more than once are urged to upgrade to the v4 schema, to avoid revision ID-inherited checkouts overwriting each other.
- Add patchset_hash field to checkouts to store the patchset hash, which was previously a part of revision ID.
  
  NOTE: if you have no patches applied on top of the commit you checked out, you need to set patchset_hash to an empty string. Otherwise your data might not appear in reports and dashboards.
- Rename the checkout's patch_mboxes field to patchset_files to better correspond to the new patchset_hash field.
- Rename all description fields to comment. The name "description" implied describing each object overall. However, we have other, dedicated fields describing objects in detail, and we'd rather use those to generate our own description consistently, regardless of the submitter, and use the comment field to augment that description.
- Add log_url field to tests. It is meant to contain the URL pointing to a plain-text log file with the highest-level overview of the test's execution, similar to the log_url field in builds and checkouts. All the other log and output files should go into output_files.
- Add log_excerpt field to all objects, meant to contain the part of the object's log (normally referenced by log_url), that was most relevant to its status. E.g. patch errors for a failed checkout, compiler errors for a failed build, error messages for a failed test. It could also be git am output for a successful checkout, the last hundred lines of a successful build, or a test suite summary for a successful test.
- Remove the publishing_time field from checkouts: nobody is sending it, it's not really possible to know a commit's publishing time in git, and no mailing-list-posted patches, for which that could be possible, are being submitted yet.
Support validating I/O JSON against a specific schema version with kcidb-validate. Thank you, @pawiecz!
Support outputting a specific version of the schema with kcidb-schema. Thank you, @effulgentstar!
Support specifying the version of the schema to upgrade I/O data to, with kcidb-upgrade.
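
The field changes listed above can be pictured with a minimal sketch. The helper below is hypothetical and is not the kcidb-io upgrade code; it only shows the renames on a single object. It assumes that a v3 revision ID carried the commit hash optionally followed by "+" and the patchset hash, and it keeps the ID unchanged, whereas a v4 submitter would generate its own checkout IDs.

```python
# Hypothetical helper (not part of kcidb-io): illustrates the v3 -> v4
# field changes on one object.
def revision_to_checkout(revision):
    """Convert a v3-style revision dict into a v4-style checkout dict."""
    checkout = dict(revision)
    # Assumption: the v3 revision ID is "<commit hash>[+<patchset hash>]".
    commit_hash, _, patchset_hash = checkout["id"].partition("+")
    checkout.setdefault("git_commit_hash", commit_hash)
    # An empty string means no patches were applied on top of the commit.
    checkout["patchset_hash"] = patchset_hash
    # patch_mboxes was renamed to patchset_files.
    if "patch_mboxes" in checkout:
        checkout["patchset_files"] = checkout.pop("patch_mboxes")
    # description was renamed to comment.
    if "description" in checkout:
        checkout["comment"] = checkout.pop("description")
    # publishing_time was removed from checkouts.
    checkout.pop("publishing_time", None)
    return checkout


example = revision_to_checkout({
    "id": "abc123+def456",          # shortened, made-up hashes
    "origin": "example",
    "patch_mboxes": [],
    "description": "Nightly run",
})
print(example["patchset_hash"])  # prints "def456"
```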

Database

Separate the database client and database drivers. This allows implementing support for more databases, and pseudo-databases.

Switch the library to accepting a single string specifying the driver and its parameters for opening a database, instead of BigQuery-specific project ID and dataset name. Switch all the database-accessing command-line tools to accepting just one option: -d/--database, specifying the driver and its parameters, instead of the two BigQuery-specific options: -p/--project and -d/--dataset.

E.g. instead of running: kcidb-query -p kernelci-production -d kernelci05 -c redhat:122398712, run: kcidb-query -d bigquery:kernelci-production.kernelci05 -c redhat:122398712.
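
As a rough illustration of the new single-string form, the sketch below splits a database specification into a driver name and its parameter string, and builds a BigQuery specification from the old project/dataset pair. Both function names are hypothetical and are not the kcidb library API; treating a bare driver name with no parameters as valid is also an assumption.

```python
# Hypothetical helpers (not the kcidb API): show the shape of the
# "<driver>:<parameters>" database specification string.
def parse_database_spec(spec):
    """Split "<driver>:<parameters>" into a (driver, parameters) tuple."""
    driver, sep, params = spec.partition(":")
    if not driver:
        raise ValueError(f"Missing driver name in {spec!r}")
    # Assumption: a driver may be given without any parameters.
    return driver, (params if sep else None)


def bigquery_spec(project, dataset):
    """Build the new-style string from the old -p/-d option values."""
    return f"bigquery:{project}.{dataset}"


spec = bigquery_spec("kernelci-production", "kernelci05")
print(spec)                       # bigquery:kernelci-production.kernelci05
print(parse_database_spec(spec))  # ('bigquery', 'kernelci-production.kernelci05')
```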

Use the --database-help option with any database-accessing tool to print documentation on all drivers and their parameters (thank you, @amfelso).
Add null driver, which just discards loaded data, and returns no data for queries, which is useful for testing and development.
Add SQLite database driver (sqlite), supporting all the operations we use on BigQuery. This simplifies development and testing of subscriptions and notifications by removing the need for BigQuery access.
Add json database driver - an extension of the SQLite driver, always storing the database in-memory, and pre-loading it with JSON I/O data from stdin. This lets us implement command-line tools simulating notification generation directly from the JSON generated by a CI system, without the need to create or access a database explicitly.
Add object de-duplication when either loading into, or querying from the database. As before, if two objects with the same ID are being loaded into, or queried from the database, and a field's value is present (not NULL) in both of them, then the value used is picked from the two non-deterministically.
Replace BigQuery tables with views returning de-duplicated objects. Prefix the original table names with _. This makes querying the BigQuery database easier in code, manually, and in our Grafana dashboards.
Remove support for querying database objects using LIKE patterns matching their IDs, from both the library and the command-line tools, since nothing and nobody was using that, and since that simplifies the code.
Remove the kcidb-db-complement tool, since the "complement" operation is no longer required by the new ORM. Thank you, @mharyam!
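
The de-duplicating views mentioned above can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions: the column set is reduced to a few fields, and MAX() is used as a deterministic stand-in for picking a non-NULL value, whereas the release notes say the actual pick is non-deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw table, prefixed with "_", may hold several rows per object ID.
    CREATE TABLE _checkouts (
        id TEXT, origin TEXT, git_commit_hash TEXT, comment TEXT
    );
    -- View with the original name, returning one row per object ID.
    -- MAX() ignores NULLs, so each field takes a non-NULL value if any exists.
    CREATE VIEW checkouts AS
        SELECT id,
               MAX(origin) AS origin,
               MAX(git_commit_hash) AS git_commit_hash,
               MAX(comment) AS comment
        FROM _checkouts
        GROUP BY id;
""")
# Two partial submissions of the same (made-up) checkout.
conn.executemany(
    "INSERT INTO _checkouts VALUES (?, ?, ?, ?)",
    [
        ("redhat:1", "redhat", None, "first submission"),
        ("redhat:1", "redhat", "deadbeef", None),
    ],
)
rows = conn.execute("SELECT * FROM checkouts").fetchall()
print(rows)  # [('redhat:1', 'redhat', 'deadbeef', 'first submission')]
```

Queries against the view never see the duplicate rows, which is what makes manual and dashboard queries simpler.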

ORM

Implement a new ORM layer to support representing results of any query as Python objects (e.g. revisions aggregated from checkouts), and summarizing results (e.g. giving a build/test PASS/FAIL for a revision). Use a custom "pattern" syntax inside the ORM and with command-line tools, to specify the objects to query or notify about.

E.g. the >checkout[redhat:12398712]#>*# pattern matches the checkout with ID redhat:12398712 and all its child objects (builds and tests), and the >test[kernelci:8768ad33f]<*$ pattern matches the ultimate parent (revision) of the test with ID kernelci:8768ad33f.
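
To make the first example more concrete, here is a toy sketch of what "a checkout and all its child objects" means. It does not parse or implement the pattern syntax, and it is not the ORM's code. It assumes that each build refers to its checkout and each test to its build; the function name and the sample IDs other than the checkout ID are made up.

```python
# Toy illustration (not the kcidb ORM): collect a checkout's children.
def checkout_with_children(checkout_id, build_parents, test_parents):
    """Return the checkout ID plus the IDs of its builds and their tests.

    build_parents maps build ID -> checkout ID (assumed relationship).
    test_parents maps test ID -> build ID (assumed relationship).
    """
    build_ids = sorted(
        b for b, parent in build_parents.items() if parent == checkout_id
    )
    test_ids = sorted(
        t for t, parent in test_parents.items() if parent in build_ids
    )
    return {"checkouts": [checkout_id], "builds": build_ids, "tests": test_ids}


# Made-up sample data.
builds = {"redhat:b1": "redhat:12398712", "redhat:b2": "redhat:999"}
tests = {"redhat:t1": "redhat:b1", "redhat:t2": "redhat:b2"}
matched = checkout_with_children("redhat:12398712", builds, tests)
print(matched["builds"], matched["tests"])  # ['redhat:b1'] ['redhat:t1']
```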

Use the --pattern-help option with any ORM-using tool (e.g. kcidb-notify) to print the pattern's ABNF syntax and some examples.
Add kcidb-oo-query tool, which outputs the internal object-oriented representation of database objects matching the specified ORM "pattern", and is useful for debugging and developing the ORM layer.

Notifications

Rework our notifications to aggregate results coming from multiple CI systems for the same revision, and to summarize build and test results into a compact message. Support subscription-specific notification templates, allowing sharing and reusing of various pieces and macros with others.
Add a minimal HTML version to notification messages, to force some clients (e.g. Gmail and groups.io) to use fixed-width fonts, for correct formatting. Thank you, @effulgentstar!
Remove the kcidb-summarize and kcidb-describe tools, since the notion of "canonical" text rendering of database objects has been removed from the new ORM.
Add kcidb-ingest tool, which generates notifications for objects created or modified by loading the input data into a (temporary) database. This emulates the notification-generation process deployed to Google Cloud without requiring it, and helps with developing and testing subscriptions and notifications.
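
The aggregation idea behind the reworked notifications can be sketched roughly as follows. This is not the kcidb implementation: the function name and the input shapes are hypothetical, and the rule "a revision passes only if every build passed" is an assumption, since the release notes do not state the summarization rules. It groups checkouts from different CI systems by commit hash and patchset hash, then reduces their build outcomes to one verdict per revision.

```python
from collections import defaultdict


# Hypothetical sketch (not the kcidb notification code).
def summarize_revisions(checkouts, build_results):
    """Map (git_commit_hash, patchset_hash) to "PASS" or "FAIL".

    checkouts: dicts with "id", "git_commit_hash" and "patchset_hash".
    build_results: (checkout_id, passed) tuples, with passed a bool.
    """
    # Checkouts sharing both hashes are treated as the same revision.
    revision_of = {
        c["id"]: (c["git_commit_hash"], c["patchset_hash"]) for c in checkouts
    }
    outcomes = defaultdict(list)
    for checkout_id, passed in build_results:
        outcomes[revision_of[checkout_id]].append(passed)
    # Assumed rule: any failed build fails the whole revision.
    return {
        revision: "PASS" if all(flags) else "FAIL"
        for revision, flags in outcomes.items()
    }


# Made-up data: two CI systems checked out the same revision.
sample_checkouts = [
    {"id": "redhat:1", "git_commit_hash": "deadbeef", "patchset_hash": ""},
    {"id": "kernelci:7", "git_commit_hash": "deadbeef", "patchset_hash": ""},
]
summary = summarize_revisions(
    sample_checkouts, [("redhat:1", True), ("kernelci:7", False)]
)
print(summary)  # {('deadbeef', ''): 'FAIL'}
```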

Miscellaneous

Fold the kcidb-mq-publisher-* and kcidb-mq-subscriber-* tools into kcidb-mq-io-publisher and kcidb-mq-io-subscriber respectively. This reduces the number of KCIDB executables.
Add kcidb-mq-pattern-publisher and kcidb-mq-pattern-subscriber tools for managing ORM Pattern message queues used in our Google Cloud deployment.
Automate Google Cloud deployment and start doing test deployments in CI.
