Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support dagitty objects #183

Closed
wants to merge 5 commits into from

Conversation

grasshoppermouse
Copy link

resolve #179

I'm an anthropologist, not a software engineer, so I hope I'm doing this right

@thomasp85
Copy link
Owner

Does dagitty objects not contain any node and edge attributes? (asking in earnest as I'm not familiar with the package)

@grasshoppermouse
Copy link
Author

I'm just learning about the package myself. dagitty is used to specify causal relationships among variables to aid designs of scientific studies. It is an R wrapper around a javascript library. Graphs are specified using the Graphviz dot language, where node and edge attributes are set with square brackets. Ideally, some R function already exists to parse a dot string and convert it to igraph, but I couldn't find one.

As far as I can tell, dagitty nodes can optionally have one of these 3 attributes: "exposure", "outcome", "latent". There are getter functions for those. Nodes can also optionally have x, and y coordinates for layout, and there is a function to get those.

Edges have 3 types: "->", "<->", "--" (directed, bidirectional, undirected), and there is a function to get those. But in tidygraph/igraph, don't all edges have to be either directed or undirected?

Edges can optionally have a "beta" attribute that sets the strength the causal relationship. As far as I can tell, there is no function to get those values. However, there are internal functions dagitty:::.vertexAttributes(g, a) and dagitty:::.edgeAttributes(g, a), where "g" is a dagitty object and "a" is any user-specified attribute.

So, I could add the node attributes exposure/outcome/latent, if they are present, I could add the node x & y coordinates, if present, I could add the edge direction attributes, and I could add node_attribute and edge_attribute arguments to get user-specified attributes, with defaults node_attribute=NULL and edge_attribute="beta".

How does that sound?

@szhorvat
Copy link

This is up to dagitty of course, but I wanted to say that DOT is a rather poor choice for data exchange. It is a format that is specific to a single software, Graphviz, and has features that make sense only for that one software. I once looked into how easy it would be to write a robust parser for this format for igraph. It would not be easy, and most likely it will never happen. The solution for these sorts of problems is to use a format which was designed for data exchange, not as data description language of one specific software. GraphML is a good choice for this. So are various formats built on JSON.

@thomasp85
Copy link
Owner

Yeah, we should get as much info into the tbl_graph as possible. It is true that tidygraph/igraph doesn't support bi-directional edges. And edges can't be undirected in a directed graph. Maybe it is best to add a "type" attribute to edges that encode if it is undirected, directed, or bidirectional since we can't really capture that information otherwise.

It seems weird that there are attributes in the structure that the user cannot get to, but if that is so we shouldn't extract them as we don't want to take a potentially breaking dependency on a package.

As for their choice of DOT, not much we can do about that. The whole point of tidygraph is to some extend to save people from relying on packages with questionable structure when possible :-)

@szhorvat
Copy link

szhorvat commented Nov 17, 2023

That comment was also partly an explanation of why igraph is unlikely to ever support reading DOT files. As for writing DOT files, that's already supported, and I consider that essential. We want to make it easy to plot igraph graphs with Graphviz.

…s to get an optional user specified attribute for nodes and edges (with a default of "beta" for edges)
@grasshoppermouse
Copy link
Author

I pushed an update to add node attributes that are accessible with dagitty functions, as well as one user-defined attribute for nodes and one for edges, with the latter set to default to "beta", which is an important, commonly used edge attribute.

@thomasp85
Copy link
Owner

Is there only ever one user-defined attribute for nodes and edges?

@grasshoppermouse
Copy link
Author

In principle, I believe there can be arbitrary numbers of user defined node and edge attributes. In practice, I'm not sure. The attributes are meant to specify aspects relevant to the theory of causal diagrams, and almost all have dedicated functions in dagitty (I think beta might be the only exception).

R/dagitty.R Outdated
Comment on lines 26 to 29
for (a in node_attr){
if (vctrs::vec_as_names(a, repair = 'unique') != a) stop('each node_attr must be a string of length > 0')
nodes[a] <- dagitty:::.vertexAttributes(x, a)$a
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no way to extract these without using a non-exported function we should simply ignore any additional attributes all-together. Same with edge attributes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that the "beta" edge attributes are important. Perhaps @jtextor, maintainer of the dagitty package, can suggest a solution, e.g., modifying the edges function to also extract the beta values.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would be necessary on daggity's side. Using internal functions from other packages is not allowed, nor wise

R/dagitty.R Outdated

edges <- dagitty::edges(x)
if (is_empty(edges)){
edges <- tibble::tibble(from = int(), to = int())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
edges <- tibble::tibble(from = int(), to = int())
edges <- tibble::tibble(from = integer(), to = integer())

@grasshoppermouse
Copy link
Author

I found this fairly extensive package, which adds a basic as_tbl_graph method for dagitty objects, but doesn't add node or edge attributes to the tbl_graph object. It also converts dagitty objects to its tidy_dag objects, which preserve some attributes:

https://github.com/r-causal/ggdag

@thomasp85
Copy link
Owner

So, would you think there is still need for a method in tidygraph proper?

@jtextor
Copy link

jtextor commented Nov 30, 2023

Hi all, interesting discussion here. I'm certainly willing to update the package to export the edge attributes; I was planning to do this anyway.

Regarding my "poor choice" to use the dot syntax, I want to give a few arguments why I did this:

  • For causal inference applications it's very important to have flexibility with respect to edge types. There are at least 8 different types of edges supported by dagitty and more may be needed in the future.
  • The syntax can be extremely concise and readable. For instance "a->{b->{c->{d->{e->f}}}}" describes the complete graph with 6 nodes.

So for input by a human, I still feel it's really useful. I agree that it's less suitable as an exchange format and is complicated to parse (I had to write my own parser)

However, dagitty can already export to various other formats, and it would be quite simple to add a GraphML exporter. There is a function "toString" in the dagitty package that implements various output formats already. I could just add GraphML as a further option. The only obstacle is that GraphML only supports directed and undirected edges, as far as I can tell ... so I would have to add some custom nonstandard attributes.

Does your package support edges other than undirected and directed?

@grasshoppermouse
Copy link
Author

So, would you think there is still need for a method in tidygraph proper?

I'm curious to get @jtextor 's opinion, since he knows the dagitty ecosystem best: where should an as_tbl_graph.dagitty method live? In dagitty?, ggdag (which already has a barebones one)? or tidygraph?

@jtextor
Copy link

jtextor commented Nov 30, 2023

I have no strong opinion on this. But having now read the entire discussion I think it's the easiest solution to just export to GraphML, which then can be read by this package if developers are open to it. But then we'd have to agree on a way to encode the different type of edges, which is not part of the GraphML standard.

@grasshoppermouse
Copy link
Author

@jtextor, would you be willing to add an as.igraph.dagitty method to dagitty? I see you already suggest igraph, which has a huge number of useful functions, and tidygraph is based on igraph.

@thomasp85
Copy link
Owner

If daggity were to get an as.igraph() method it would work out of the box in tidygraph

@jtextor
Copy link

jtextor commented Dec 1, 2023

To some extent, this is already available. The R package "causaleffect" is based on igraph as well, and there is a function that converts to an igraph object that can be understood by causaleffect. E.g.,

dagitty::convert(dagitty::getExample("Shrier"),"causaleffect")

However the way in which <-> edges are represented in causaleffect is not straightforward. These are converted to two directed edges, both of which are given a special attribute.

I looked around the igraph manual and now they seem to have a somewhat-standard way to represent mixed graphs with directed, undirected and bi-directed edges. This covers most of what the causal inference community needs. So I'll add a method "as.igraph" that will follow that convention.

@thomasp85
Copy link
Owner

@jtextor thank you! As this solves support for as_tbl_graph() completely I'll be closing this PR. Still, thank you @grasshoppermouse for pushing this forward 🙏

@thomasp85 thomasp85 closed this Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: add support for class "dagitty" to as_tbl_graph
4 participants