Improve design to accommodate Osrank & Registry #10

adinapoli-mndc · 2019-10-31T14:01:53Z

PR #8 made certain types more polymorphic but it also reshuffled a bit some of the existing traits. In articular, we added two new methods to the Node and Edge traits, respectively fn node_type(&self) -> types::NodeType and fn edge_type(&self) -> types::EdgeType, which return the concrete types.

This is something that @cloudhead wasn't very fond of, but that was initially justified by the fact that for Osrank is very convenient to be able to use the concrete types. The same concrete types are also used in the EdgeRef struct:

pub struct EdgeRef<'a, NodeId, EdgeId> {
    pub from: &'a NodeId,
    pub to: &'a NodeId,
    pub id: &'a EdgeId,
    pub edge_type: &'a EdgeType,
}

Here I am going to explain in full detail why I have done this, to leave a testament behind of my thought process so that this design can be improved.

The main reason why I did come up with those extra methods/fields in the first place is because I was under the impression the Engineering team settled on a design where we were going to use the concrete types as the "meeting point" between the Registry & Osrank; in such scenario is extremely useful for Osrank & Registry to share the same types. Not only that, but let's take as an example a real piece of code from Osrank:

impl<W, R> DynamicWeights for Network<W, R>
where
    W: Clone + Mul<Output = W> + Div<Output = W> + From<Weight>,
    R: Clone + Zero,
{
    fn dynamic_weight(
        &self,
        edge: &impl Edge<Self::Weight, <Self::Node as GraphObject>::Id, Self::EdgeData>,
        hyperparams: &types::HyperParameters<Self::Weight>,
    ) -> Self::Weight {
        let e_type = edge.edge_type();

        // Let's start by assigning this edge the stock default value, by
        // reading it from the hyperparams.
        let mut weight: Self::Weight = (hyperparams.get_param(&e_type.to_tag())).clone();

        // others can't be zero as there is at least one edge, i.e. the
        // input one.
        let others = edges_of_same_type(self, edge, Direction::Outgoing, e_type);

        let source_node = self
            .get_node(edge.source())
            .expect("dynamic_weight: source node not found.");

        // Then we need to do something different based on the type of edge.
        match e_type.to_tag() {
            types::EdgeTypeTag::ProjectToUserContribution => {
                // contrib is multiplied by the number of contributions of
                // the account to the project, divided by the total number
                // of contributions in the project.
                let total_project_contrib = source_node.node_type().total_contributions();
                let user_contribs = edge.edge_type().total_contributions();

                weight = weight * Weight::new(user_contribs, total_project_contrib).into()
            }
            types::EdgeTypeTag::UserToProjectContribution => {
                // contrib* and maintain* are multiplied by the number of
                // contributions of the account to the project, divided by
                // the total number of contributions of the account.
                let total_account_contrib = source_node.node_type().total_contributions();
                let user_contribs = edge.edge_type().total_contributions();

                weight = weight * Weight::new(user_contribs, total_account_contrib).into()
            }
            types::EdgeTypeTag::UserToProjectMembership => {
                // The weight is divided by the corresponding count of
                // outgoing edges of the same type on the node.
                weight = weight / others.into()
            }
            types::EdgeTypeTag::ProjectToUserMembership => {
                // contrib* and maintain* are multiplied by the number of
                // contributions of the account to the project, divided by
                // the total number of contributions of the account.
                let total_account_contrib = source_node.node_type().total_contributions();
                let user_contribs = edge.edge_type().total_contributions();

                weight = weight * Weight::new(user_contribs, total_account_contrib).into()
            }
            types::EdgeTypeTag::Dependency => {
                // The weight is divided by the corresponding count of
                // outgoing edges of the same type on the node.
                weight = weight / others.into()
            }
        }

        weight
    }
}

Here I was able to write this trait implementation in a fairly polymorphic way by the virtue of the
fact I could rely on calling .edge_type() and .node_type() and be sure they would return what I was expecting. In particular, I was able to write impl Edge<Self::Weight, <Self::Node as GraphObject>::Id, Self::EdgeData> and pass any type which implements that trait. If we were going to remove those node_type/edge_type methods (by the virtue of the fact the concrete types will probably be inside the NodeData/EdgeData, we would have to write something like this, at the very minimum:

impl<W, R> DynamicWeights for Network<W, R>
where
    W: Clone + Mul<Output = W> + Div<Output = W> + From<Weight>,
    R: Clone + Zero,
    Self::EdgeData: Into<types::EdgeType>,
    Self::NodeData: Into<types::NodeType>,
{
...

Then add proper std::convert::Into instances and finally in the code call:

let edge_type: types::EdgeType = edge.data().into();

Which is still do-able, albeit unfortunate.

As regards the EdgeRef type, note that we cannot write the following:

pub struct EdgeRef<'a, NodeId, EdgeId> {
    pub from: &'a NodeId,
    pub to: &'a NodeId,
    pub id: &'a EdgeId,
    pub edge_type: &'a EdgeData,
}

This is because an EdgeData exist only in the context of a Graph. We have two options here:

We make the EdgeRef polymorphic over the graph:

pub struct EdgeRef<'a, G> 
where
  G: Graph
{
    pub from: &'a Id<G::Node>,
    pub to: &'a Id<G::Node>,
    pub id: &'a Id<G::Edge>,
    pub edge_type: &'a G::EdgeData,
}

We make EdgeData (or even EdgeType at this point?) an extra type parameter:

pub struct EdgeRef<'a, NodeId, EdgeId, EdgeType> {
    pub from: &'a NodeId,
    pub to: &'a NodeId,
    pub id: &'a EdgeId,
    pub edge_type: &'a EdgeType, // This is not the concrete one but a free variable
}

I don't have an intuition on which method is better, but it looks like option 2. feels unnatural, as it makes sense to talk about a specific EdgeRef over a Graph G. Last but not least, the reason why I have added this edge_type to EdgeRef in the first place is because it's very handy to have it "pre-computed" in a situation like this:

                    for eref in network.edges_directed(&current_node_id, Direction::Outgoing) {
                        possible_edge_types.insert(eref.edge_type);
                    }

If we didn't have this, we would have to fetch the info from the graph, which is very inefficient:

                    for eref in network.edges_directed(&current_node_id, Direction::Outgoing) {
                        let edge_type = network.get_edge(eref.id()).unwrap().data().into();
                        possible_edge_types.insert(edge_type);
                    }

And that obviously also requires the Into<types::EdgeType> constraint.

I hope this is useful as a testament for @MeBrei and the rest of the crew :)

The text was updated successfully, but these errors were encountered:

adinapoli-mndc · 2019-10-31T14:25:40Z

Something else to consider: At the moment, we the following types:

pub enum NodeType {
    /// A user, eg. contributor, project member etc.
    User { contributions_to_all_projects: u32 },
    /// A project with users as members and contributors.
    Project { contributions_from_all_users: u32 },
}

The idea here is to store at the graphnode-level the number of contributions, to later use this information in the dynamic weight calculation in osrank. However, having this static information is not-so-great for when we actively prune the graph, namely during the TrustRank phase.

In the "real world" is conceivable we will prune the graph (via the TrustRank phase based on the seed set) and we would yield some sort of GraphView which holds into the references of just the nodes we are retaining. At this point we have a problem though, because the TrustRank phase might eliminate neighbours nodes of one of more nodes contained in the GraphView, which means the total number of contributions has now changed, which means it would be wrong to use the static information, and we cannot mutate the graph from the algorithm because we are passed a &'a G.

Now, truth to be told we could always clone the graph (as an interim hack) and mutate it inside the algorithm, but this is not very scalable. Rather, we should do what @MeBrei suggested in another comment: somehow (read, in an efficient way) we should compute this "total number of contributions for project X" and "all contributions of user U to all projects" on the fly, as in both cases this is just a matter of grabbing the outgoing/incoming edges of the right EdgeType and perform the sum over the contribution value of each Edge.

In such scenario it's perhaps convenient to have the concrete EdgeType and NodeType in scope (to perform such filtering on the graph-api side instead of leaving the burden of implementing this on the osrank side), which might tip the scale towards one or the other proposed-above design.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve design to accommodate Osrank & Registry #10

Improve design to accommodate Osrank & Registry #10

adinapoli-mndc commented Oct 31, 2019 •

edited

Loading

adinapoli-mndc commented Oct 31, 2019

Improve design to accommodate Osrank & Registry #10

Improve design to accommodate Osrank & Registry #10

Comments

adinapoli-mndc commented Oct 31, 2019 • edited Loading

adinapoli-mndc commented Oct 31, 2019

adinapoli-mndc commented Oct 31, 2019 •

edited

Loading