Another major release. The most visible changes are listed below.
After this release we'll be improving our CI/CD to shorten our development cycle, so we can make smaller and more frequent releases.
Schema
- Switch to using v4 schema, released with kcidb-io v3. Changes from v3 schema include:
  - Rename `revisions` to `checkouts` to better represent what is actually submitted, improve correlation, and prevent data loss. The checkouts are identified purely by origin-generated IDs, similarly to builds and tests. The commit hash now only appears in the `git_commit_hash` field, and the patchset hash gets its own field.

    NOTE: the submitting CI systems that test and send revisions more than once are urged to upgrade to v4 schema to avoid revision ID-inherited checkouts overwriting each other.
  - Add `patchset_hash` field to checkouts to store the patchset hash, which was previously a part of revision ID.

    NOTE: you need to set `patchset_hash` to an empty string if you have no patches applied on top of the commit you checked out, otherwise your data might not appear in reports and dashboards.
  - Rename the checkout's `patch_mboxes` field to `patchset_files` to better correspond to the new `patchset_hash` field.
  - Rename all `description` fields to `comment`. The `description` name had the meaning of describing each object overall. However, we have other, dedicated fields describing objects in detail, and we'd rather use those to generate our own description, consistently, regardless of the submitter, and use the `comment` field to augment that description.
  - Add `log_url` field to tests. It is meant to contain the URL pointing to a plain-text log file with the highest-level overview of the test's execution, similar to the `log_url` field in builds and checkouts. All the other log and output files should go into `output_files`.
  - Add `log_excerpt` field to all objects, meant to contain the part of the object's log (normally referenced by `log_url`) that was most relevant to its status. E.g. patch errors for a failed checkout, compiler errors for a failed build, error messages for a failed test. It could also be `git am` output for a successful checkout, the last hundred lines of a successful build, or a test suite summary for a successful test.
  - Remove the `publishing_time` field from checkouts, as nobody is sending it, it's not really possible to know a commit's publishing time in git, and there are no maillist-posted patches being submitted yet, for which that could be possible.
- Support validating I/O JSON against a specific schema version with `kcidb-validate`. Thank you, @pawiecz!
- Support outputting a specific version of the schema with `kcidb-schema`. Thank you, @effulgentstar!
- Support specifying the version of the schema to upgrade I/O data to, with `kcidb-upgrade`.
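The field changes listed above can be illustrated with a minimal Python sketch that turns a v3-style revision into a v4-style checkout. This is not the conversion performed by `kcidb-upgrade`: the `revision_to_checkout` function is hypothetical, the caller is assumed to supply the origin-generated checkout ID, and the sketch assumes that a v3 revision ID is the commit hash optionally followed by `+` and the patchset hash.

```python
def revision_to_checkout(revision, checkout_id):
    """Convert a v3-style revision dict into a v4-style checkout dict.

    Hypothetical helper for illustration only. Assumes the v3 revision ID
    is "<commit hash>" or "<commit hash>+<patchset hash>".
    """
    # Split the old revision ID into its two parts. With no "+" in the ID,
    # the patchset hash becomes an empty string (no patches applied).
    commit_hash, _, patchset_hash = revision["id"].partition("+")
    checkout = {
        "id": checkout_id,  # origin-generated, no longer derived from hashes
        "origin": revision["origin"],
        "git_commit_hash": commit_hash,
        "patchset_hash": patchset_hash,
    }
    # Fields renamed in v4
    renames = {"patch_mboxes": "patchset_files", "description": "comment"}
    # Fields handled above, or removed in v4
    dropped = {"id", "origin", "publishing_time"}
    for name, value in revision.items():
        if name not in dropped:
            checkout[renames.get(name, name)] = value
    return checkout


# Made-up sample data
revision = {
    "id": "0123abcd+4567ef89",
    "origin": "redhat",
    "description": "Nightly run",
    "patch_mboxes": [{"name": "fix.mbox", "url": "https://example.com/fix.mbox"}],
    "publishing_time": "2021-01-01T00:00:00+00:00",
}
checkout = revision_to_checkout(revision, "redhat:122398712")
```

Because the checkout ID no longer depends on the hashes, sending the same revision twice under two different checkout IDs yields two separate checkouts rather than one overwriting the other.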
Database
- Separate the database client and database drivers. This allows implementing support for more databases, and pseudo-databases.

  Switch the library to accepting a single string specifying the driver and its parameters for opening a database, instead of BigQuery-specific project ID and dataset name. Switch all the database-accessing command-line tools to accepting just one option: `-d`/`--database`, specifying the driver and its parameters, instead of the two BigQuery-specific options: `-p`/`--project` and `-d`/`--dataset`.

  E.g. instead of running: `kcidb-query -p kernelci-production -d kernelci05 -c redhat:122398712`, run: `kcidb-query -d bigquery:kernelci-production.kernelci05 -c redhat:122398712`.

  Use the `--database-help` option with any database-accessing tool to print documentation on all drivers and their parameters (thank you, @amfelso).
- Add `null` driver, which just discards loaded data, and returns no data for queries, which is useful for testing and development.
- Add SQLite database driver (`sqlite`), supporting all the operations we use on BigQuery. This simplifies development and testing of subscriptions and notifications by removing the need for BigQuery access.
- Add `json` database driver - an extension of the SQLite driver, always storing the database in-memory, and pre-loading it with JSON I/O data from stdin. This lets us implement command-line tools simulating notification generation directly from the JSON generated by a CI system, without the need to create or access a database explicitly.
- Add object de-duplication when either loading into, or querying from the database. As previously, if there are two objects with the same ID being loaded into, or queried from the database, and a field's value is present in both of them (is not NULL in both of them), then the used value will be picked out of those two non-deterministically.
- Replace BigQuery tables with views returning de-duplicated objects. Prefix the original table names with `_`. This makes querying the BigQuery database easier in code, manually, and in our Grafana dashboards.
- Remove support for querying database objects using LIKE patterns matching their IDs, from both the library and the command-line tools, since nothing and nobody was using that, and since that simplifies the code.
- Remove the `kcidb-db-complement` tool, since the "complement" operation is no longer required by the new ORM. Thank you, @mharyam!
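The idea of keeping raw rows in a `_`-prefixed table and exposing de-duplicated objects through a view can be sketched with Python's standard `sqlite3` module. This is only an illustration, not KCIDB's actual schema or driver code: the three-column table layout and the sample rows are made up, and `MAX()` stands in for "pick any non-NULL value" (it ignores NULLs, but unlike the behaviour described above it is deterministic).

```python
import sqlite3

# In-memory database, so nothing needs to be created on disk
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw table: may hold several rows for the same checkout ID
    CREATE TABLE _checkouts (id TEXT, git_commit_hash TEXT, comment TEXT);

    -- View: one row per ID, each field taken from a row where it is
    -- not NULL (aggregate MAX() skips NULL values)
    CREATE VIEW checkouts AS
        SELECT id,
               MAX(git_commit_hash) AS git_commit_hash,
               MAX(comment) AS comment
        FROM _checkouts
        GROUP BY id;
""")

# Two partial submissions of the same checkout, plus an unrelated one
conn.executemany(
    "INSERT INTO _checkouts VALUES (?, ?, ?)",
    [
        ("redhat:122398712", "0123abcd", None),
        ("redhat:122398712", None, "Nightly run"),
        ("kernelci:1", "89efcdab", None),
    ],
)

# Queries go against the view and see merged objects
rows = conn.execute("SELECT * FROM checkouts ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Querying the view rather than the raw table means dashboards and ad-hoc queries need no de-duplication logic of their own.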
ORM
- Implement a new ORM layer to support representing results of any query as Python objects (e.g. revisions aggregated from checkouts), and summarizing results (e.g. giving a build/test PASS/FAIL for a revision). Use a custom "pattern" syntax inside the ORM and with command-line tools, to specify the objects to query or notify about.

  E.g. the `>checkout[redhat:12398712]#>*#` pattern matches the checkout with ID `redhat:12398712` and all its child objects (builds and tests), and e.g. `>test[kernelci:8768ad33f]<*$` matches the ultimate parent (revision) of a test with ID `kernelci:8768ad33f`.

  Use the `--pattern-help` option with any ORM-using tool (e.g. `kcidb-notify`) to print the pattern's ABNF syntax and some examples.
- Add `kcidb-oo-query` tool, which outputs the internal object-oriented representation of database objects matching the specified ORM "pattern", and is useful for debugging and developing the ORM layer.
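To give a rough idea of what "revisions aggregated from checkouts" and a per-revision PASS/FAIL could look like, here is a small Python sketch working on plain dictionaries. It does not use or reproduce the ORM's classes: the `aggregate_revisions` function is hypothetical, a revision is assumed to be keyed by the commit hash and patchset hash pair, and builds are assumed to carry a boolean `valid` field and a `checkout_id` reference.

```python
from collections import defaultdict


def aggregate_revisions(checkouts, builds):
    """Group checkouts into revisions and summarize each revision's builds.

    Hypothetical helper for illustration; not part of the KCIDB ORM.
    """
    revisions = defaultdict(lambda: {"checkouts": [], "builds": []})
    key_by_checkout = {}
    # Checkouts of the same code (possibly from different CI systems)
    # land in the same revision.
    for checkout in checkouts:
        key = (checkout["git_commit_hash"], checkout["patchset_hash"])
        revisions[key]["checkouts"].append(checkout["id"])
        key_by_checkout[checkout["id"]] = key
    # Attach each build to the revision of its checkout
    for build in builds:
        key = key_by_checkout.get(build["checkout_id"])
        if key is not None:
            revisions[key]["builds"].append(build)
    summary = {}
    for key, revision in revisions.items():
        results = [build.get("valid") for build in revision["builds"]]
        if any(result is False for result in results):
            status = "FAIL"  # any failed build fails the revision
        elif any(result is True for result in results):
            status = "PASS"
        else:
            status = None  # no conclusive build results
        summary[key] = {"checkouts": revision["checkouts"], "builds": status}
    return summary


# Made-up sample data: two CI systems checked out the same revision
checkouts = [
    {"id": "redhat:1", "git_commit_hash": "0123abcd", "patchset_hash": ""},
    {"id": "kernelci:7", "git_commit_hash": "0123abcd", "patchset_hash": ""},
]
builds = [
    {"id": "redhat:b1", "checkout_id": "redhat:1", "valid": True},
    {"id": "kernelci:b9", "checkout_id": "kernelci:7", "valid": False},
]
summary = aggregate_revisions(checkouts, builds)
```

In this sample both checkouts end up under one revision, and the single failed build is enough to mark that revision's builds as failed.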
Notifications
- Rework our notifications to aggregate results coming from multiple CI systems for the same revision, and to summarize build and test results into a compact message. Support subscription-specific notification templates, allowing sharing and reusing of various pieces and macros with others.
- Add a minimal HTML version to notification messages, to force some clients (e.g. GMail and groups.io) to use fixed-width fonts, for correct formatting. Thank you, @effulgentstar!
- Remove the `kcidb-summarize` and `kcidb-describe` tools, since the notion of "canonical" text rendering of database objects has been removed from the new ORM.
- Add `kcidb-ingest` tool, which generates notifications for objects created or modified by loading the input data into a (temporary) database. This emulates the notification-generation process deployed to Google Cloud without requiring it, and helps with developing and testing subscriptions and notifications.
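The "minimal HTML version" approach mentioned above can be sketched with Python's standard `email` package: the plain-text body is kept, and an HTML alternative wrapping the same text in `<pre>` is attached so that clients preferring HTML still render it in a fixed-width font. This is a sketch of the general technique, not KCIDB's notification code; the `make_message` function and the sample subject and body are made up.

```python
from email.message import EmailMessage
from html import escape


def make_message(subject, text):
    """Build a message with a plain-text body and a minimal HTML twin.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    # Plain-text part, for clients which display it as-is
    msg.set_content(text)
    # HTML part: the same text, escaped and wrapped in <pre>, so that
    # column alignment survives in clients defaulting to HTML
    msg.add_alternative(
        "<html><body><pre>" + escape(text) + "</pre></body></html>",
        subtype="html",
    )
    return msg


msg = make_message(
    "Test results for 0123abcd",
    "builds: 3 passed, 1 failed\ntests:  <none>\n",
)
parts = list(msg.iter_parts())
```

The resulting message is `multipart/alternative`, with the plain-text part first and the HTML part second, which is the order mail clients expect when choosing the richest format they support.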
Miscellaneous
- Fold the `kcidb-mq-publisher-*` and `kcidb-mq-subscriber-*` tools into `kcidb-mq-io-publisher` and `kcidb-mq-io-subscriber` respectively. This reduces the number of KCIDB executables.
- Add `kcidb-mq-pattern-publisher` and `kcidb-mq-pattern-subscriber` tools for managing ORM Pattern message queues used in our Google Cloud deployment.
tools for managing ORM Pattern message queues used in our Google Cloud deployment. - Automate Google Cloud deployment and start doing test deployments in CI.