I have a many-to-many relationship between releases and artifacts, where a given release is associated with multiple artifacts, and a given artifact is associated with multiple releases.
I understand how to model this: I have a releases table with an ID column:
CREATE TABLE releases (
release_uuid uuid PRIMARY KEY
);
and an artifacts table with an ID column:
CREATE TABLE artifacts (
artifact_uuid uuid PRIMARY KEY,
hash bytea
-- other data
);
and a joining table release_artifacts that has foreign key columns from each of the others:
CREATE TABLE release_artifacts (
id serial PRIMARY KEY,
release_uuid uuid REFERENCES releases(release_uuid) NOT NULL,
artifact_uuid uuid REFERENCES artifacts(artifact_uuid) NOT NULL,
UNIQUE (release_uuid, artifact_uuid)
);
What I want to do is find a release "containing" a given set of artifacts, so that I can warn about duplicate releases. That is, for artifacts A1, A2, and A3, what release(s) Rx is defined by exactly those three artifacts? More visually, given the release_artifacts table:
release ID | artifact ID
-----------+------------
R1 | A1
R1 | A2
R1 | A3
R2 | A4
R2 | A2
R2 | A3
what search can I perform with A1, A2, A3 as the input that would give me back R1? A search on A2, A3 would return NULL. Or do I need a different model? I assume this would be easier if the release_artifacts table mapped a release to an array of artifact IDs, but then I lose the referential integrity with the artifact table.
I don't need maximum performance or maximal concurrency protection, but I'd be happy if those things don't significantly increase the complexity of the query. This is in a Postgres 9.6 database, though I'd consider that a version floor.