-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the relationship between Sample and Investigation many-to-many #231
Comments
I agree with your proposed changes except in the area of constraints. I don't think a Sample needs to have any constraints. Regarding a new constraint on Facility, I agree that most other ICAT entities have this but this is (mostly!) because that makes sense. With an archaeological sample, for example, wouldn't it make sense that the sample might be taken to a number of facilities for analysis and is therefore not tied to any of them? I can see that this is unlikely to ever happen in a single ICAT, but nonetheless the data model should be right and we should allow for that possibility. That just leaves name as a constraint and there are likely to be duplicates of the names that people give to their samples (which is fine) so that constraint is not required either. |
Interesting point! In fact, I suggested this constraint not because it would be particularly sensible, but in the lack of any better solution. With the relation to investigation changed to be many-to-many, we can't have investigation any more in the constraint, so I removed it from the existing constraint and that's it. In general, it is beneficial to have a constraint in all entity types, because it allows to reliably refer to an object in ICAT based on attribute values. That makes many things easier, including serializing and deserializing ICAT content. Regarding the facility, that is a strange thing anyway. The schema imposes to have one facility object in each ICAT, but it does not serve any particular purpose. I assume there is no production ICAT having more then one facility object. Most entity classes directly or indirectly depend on the facility and those that relate directly to it also have it in the constraint. So I also kept the facility here, just for the sake of consistency with the remaining schema. But I do appreciate your argument. And I admit that I'm not very happy with that constraint either. |
Maybe an alternative solution would be to take the opportunity to make a clear cut: set the constraint to be just The only drawback I can see is that we would need to change the |
I agree with @kevinphippsstfc. I was also thinking along the same lines as @RKrahl to use the |
I agree that if you are using PIDs or want to start using PIDs for Samples then this is a sensible approach. I can also see that getting an upgrade script to assign the ID attribute to the PID will work. However, what happens for creating Samples once the upgrade has been done? Every piece of software creating samples is forced to implement something to create this (unique) field. This can of course be done and could be as simple as just calling a library method to generate a UUID, but it is not as simple as doing nothing, where your software continues to work because PID is an optional field that you don't bother setting. Or am I missing something obvious? |
No I don't think you are missing something. That is indeed a drawback of using this constraint. The benefit is that we have a way to reliably refer to sample objects in the ICAT later on. |
… in other words: either way you will break some piece of software. Either you break the software creating samples that is relying upon not having to bother setting the |
We may have the following case:
Then how do you distinguish between the two? |
It would make sense in any case to prefix a PID with a scheme, e.g. to set values like pid_scheme, pid_value = sample.pid.split(':', maxsplit=1) If you do this, you only need to define a dedicated scheme prefix for local, 'artificial' PIDs, such as |
In this case, does this make sense to have a field called PIDScheme or similar for example like IdentifierScheme used in DataCite (for Person PID, organization PID). This will be may be clearer that the we need to specify which scheme we are using. |
I don't think so. It would make things more complicated and I don't see an advantage over my suggestion to use a prefix. |
Well, from my experience free text field is very problematic in production systems. Having a way to specify how this field should be used would help. For example, the PID may not be spelled correctly and it will become very difficult to find the issue. |
I think this schema change will need more internal discussion within DLS. |
Update: the current proposal as implemented in #294 is: Add the following class:
|
Card | Class | Field |
---|---|---|
1,1 | Investigation | investigation |
1,1 | Sample | sample |
Change Sample
to:
Sample
A sample to be used in one or more investigations
Constraint: pid
Relationships:
Card | Class | Field |
---|---|---|
0,* | Dataset | datasets |
0,1 | SampleType | type |
0,* | SampleParameter | parameters |
0,* | InvestigationSample | investigationSamples |
Other fields:
Field | Type | Description |
---|---|---|
name | String[255] NOT NULL | |
pid | String[255] NOT NULL | A persistent identifier attributed to this sample |
Obviously: change the one-to-many relationship in Investigation
from Sample
to InvestigationSample
.
As discussed in the ICAT Schema Discussion on April 2nd and in the collaboration meeting today, this change should be in a version 7.0 release that we aim to make in the second half of this year. |
We have the
Sample
class in our schema to represent "a sample to be used in an investigation". It has many-to-one relationship withInvestigation
, e.g. each sample must be related to one and only one investigation.While this is certainly suitable for most situations, it may be problematic in some cases: if one individual sample has been the subject of more then one investigation, there is currently no way to properly represent this in our schema. I'd like to remind that we occasionally also have rather prominent samples such as archeological artifacts or pieces of art and it is not so improbable that the same item might be investigated more then once by independent groups using different techniques trying to answer unrelated questions.
I therefore suggest to add the following class to the schema:
InvestigationSample
Represents a many-to-many relationship between an investigation and a sample that has been used in that investigation
Constraint:
investigation
,sample
Relationships:
As a result, the
samples
relationship inInvestigation
would be changed to have the typeInvestigationSample
rather thenSample
. TheSample
class would need to be modified as follows:Sample
A sample to be used in one or more investigations
Constraint:
facility
,name
Relationships:
Other fields:
E.g. the many-to-one relation with
Investigation
would be replaced by a one-to-many relation withInvestigationSample
and I suggest to add a relationship withFacility
just for the sake of consistency with the current schema where most things are related directly or indirectly to the facility on top. The uniqueness constraint would need to be adapted as well. The attributes ofSample
would remain unchanged.The text was updated successfully, but these errors were encountered: