Optimal adjustment sets
In this article, we show how to implement an algorithm for constructing the optimal adjustment set using CIfly. Covariate adjustment is a fundamental technique for identifying causal effects. Applying covariate adjustment to CPDAGs (a class of causal graphs that can be learned from data), allows us to learn causal effects from observational data by combining causal discovery and causal effect estimation. As there may be many different adjustment sets in a causal graph, it is an important question which sets are preferable over others. Henckel et al. (2022) provide a graphical criterion for an adjustment set guaranteed to be valid if there is a valid adjustment set for the target causal effect. Furthermore, the corresponding estimator is guaranteed to have the smallest asymptotic variance among all estimators corresponding to valid adjustment sets.
The main result in this regard is stated in the following theorem. Note that we focus here purely on the graphical aspects of this problem, for more statistical and causal background, we refer the reader to the original paper.
Theorem [Henckel et al. (2022)]
Consider disjoint node sets and in a CPDAG such that and the node set Then the following three statements hold:
- If there exists a valid adjustment set relative to in than so is .
- The causal effect estimator corresponding to has the smallest asymptotic variance among all estimators corresponding to valid adjustment sets for all causal models with CPDAG .
- Any other valid adjustment set with Property 2 is a superset of .
The criterion is formulated for CPDAGs, which may contain both directed and undirected edges. They do not contain directed cycles and satisfy additional properties, which we, however, do not discuss in this article. The graphical terminology in the criterion is introduced below.
Graphical terminology
Let us first revisit the notion of possible descendants, causal and forbidden sets. A path is possibly directed from to if every edge on it is either undirected or points towards the node in . For example, is, for and , a possibly directed path from to . We call a node a possible descendant of if there exists a possibly directed path from to . The set is the set of all possible descendants. In the criterion above we require that . This may seem like a cumbersome requirement but if a node is not a possible descendant of then there is no causal effect of on , which is why we may disregard such nodes.
The causal set relative to in , denoted , consists of all nodes lying on possibly directed paths from to excluding itself. The forbidden set relative to in , denoted , is the set of all possible descendants of the causal set.
Finally, we introduce one further notion which we will use to verify whether valid adjustment set relative to in exists. A proper path between and is one which contains exactly one node from (the starting point of the path). A valid adjustment set relative to in does not exist if there exists a proper possibly directed path starting with an undirected edge from a node to . We call nodes to which there exists such a path from non-amenable. In our documentation articles for ciflypy and ciflyr, we already used the rule table below for finding all non-amenable nodes as a running example.
Checking whether valid adjustment sets exist
Since the optimal adjustment set is only valid if a valid adjustment set exists we first check whether this is the case. To do so we rely on Theorem 5 and Corollary 27 of Perković et al. (2018) which states that a valid adjustment set exists if no node in is non-amenable and no node in is forbidden. For finding all non-amenable nodes we use the rule table below that already served as a running example in our documentation. It finds all nodes for which there exists a proper possibly directed path starting with an undirected edge from a node to . Effectively, these are the nodes that cannot be in set .
EDGES --> <--, ---
SETS X
COLORS init, yield
START ... [init] AT X
OUTPUT ... [yield]
... [init] | --- [yield] | next not in X
... [yield] | ---, --> [yield] | next not in X
To check amenability, we use this ruletable for set and check whether the intersection of returned nodes with is empty.
For verifying the other condition of the adjustment criterion, we need to compute the forbidden set which we do in multiple steps. First, we find all on a proper possibly directed path from to . Afterwards, we find the possible descendants of these . The first step can be done by finding all nodes on a proper possibly directed path from that does not contain a node in and, vice versa, all nodes from which there exists a proper possibly directed path to not containing a node in . By definition, the intersection of these two node sets are precisely the nodes which lie on a proper possibly directed path from to , that is, the causal set.
Hence, we need a table for finding nodes on proper possibly directed paths from with the option to stop before nodes of a passed set are reached. Clearly, this table can also be used to find possible descendants.
EDGES --> <--, ---
SETS X, W
START --> AT X
OUTPUT ...
... | -->, --- | next not in W
We also need to find the nodes from which there exists a proper possibly directed path to . For this, we can simply start the reachability algorithm at and traverse these paths the opposite way.
EDGES --> <--, ---
SETS X, W
START <-- AT X
OUTPUT ...
... | <--, --- | next not in W
Using these tables, we can compute the forbidden set (and the causal set) as follows.
anc = cf.reach(cpdag, {"X": Y, "W": X}, "./possible_ancestors_cpdag.txt")
des = cf.reach(cpdag, {"X": X, "W": []}, "./possible_descendants_cpdag.txt")
cn = set(anc).intersection(des)
forb = cf.reach(cpdag, {"X": cn}, "./possible_descendants_cpdag.txt")
anc <- reach(cpdag, list("X" = Y, "W" = X), "./possible_ancestors_cpdag.txt")
des <- reach(cpdag, list("X" = X, "W" = c()), "./possible_descendants_cpdag.txt")
cn <- intersect(anc, des)
forb <- reach(cpdag, list("X" = cn), "./possible_descendants_cpdag.txt")
To check that does not contain forbidden nodes, we use this code to compute the forbidden set and then check whether the intersection with is empty.
Constructing the optimal adjustment set
To construct the optimal set itself we need precisely the causal set and the forbidden set. We have already shown how to obtain those above. The only step that remains is computing the parents of the causal set, i.e., all nodes such that there exists an edge to some causal node . This is an easy task but it is nonetheless possible to inadvertantly implement a super-linear algorithm for it. In CIfly, however, it is rather easy to implement a linear-time algorithm by using the following rule table.
EDGES --> <--, ---
SETS X
COLORS init, yield
START <-- [init] AT X
OUTPUT <-- [yield]
<-- [init] | <-- [yield] | next not in X
Full implementation
In the final implementation, we use the four tables introduced above to check whether consists of possible descendants of , to verify whether a valid adjustment set exists and finally to construct the optimal set simultanously. In your code, assign the ruletables
variable according to the location of the rule tables in your project (in Python, the code uses a Path
object from the pathlib
library; in R, ruletables
is a string holding the path to the folder).
import ciflypy as cf
import ciflypy_examples.utils as utils
ruletables = utils.get_ruletable_path()
not_amenable_table = cf.Ruletable(ruletables / "not_amenable_cpdag.txt")
possible_ancestors_table = cf.Ruletable(ruletables / "possible_ancestors_cpdag.txt")
possible_descendants_table = cf.Ruletable(ruletables / "possible_descendants_cpdag.txt")
parents_table = cf.Ruletable(ruletables / "parents_cpdag.txt")
def optimal_adjustment(cpdag, X, Y):
descendants = cf.reach(cpdag, {"X": X}, possible_descendants_table)
if set(Y).difference(descendants):
return None
not_amenable = cf.reach(cpdag, {"X": X}, not_amenable_table)
ancestors = cf.reach(cpdag, {"X": Y, "W": X}, possible_ancestors_table)
cn = set(ancestors).intersection(descendants)
forbidden = cf.reach(cpdag, {"X": cn}, possible_descendants_table)
if set(forbidden).intersection(X) or set(Y).intersection(not_amenable):
return None
pre_opt = cf.reach(cpdag, {"X": cn}, parents_table)
return set(pre_opt).difference(set(forbidden).union(X))
library("ciflyr")
library("here")
source(here("R", "utils.R"))
ruletables <- getRuletablePath()
notAmenable <- parseRuletable(file.path(ruletables, "not_amenable_cpdag.txt"))
possibleAncestors <- parseRuletable(file.path(ruletables, "possible_ancestors_cpdag.txt"))
possibleDescendants <- parseRuletable(file.path(ruletables, "possible_descendants_cpdag.txt"))
parentsTable <- parseRuletable(file.path(ruletables, "parents_cpdag.txt"))
optimalAdjustment <- function(cpdag, X, Y){
des <- reach(cpdag, list("X" = X), possibleDescendants)
if (length(intersect(Y, des)) < length(Y)) {
return (NULL)
}
nam <- reach(cpdag, list("X" = X), notAmenable)
anc <- reach(cpdag, list("X" = Y, "W" = X), possibleAncestors)
cn <- intersect(anc, des)
forb <- reach(cpdag, list("X" = cn), possibleDescendants)
if (length(intersect(forb, X)) != 0 | length(intersect(nam, Y)) != 0) {
return (NULL)
}
pre_opt <- reach(cpdag, list("X" = cn), parentsTable)
return (setdiff(pre_opt, union(forb, X)))
}
You can also find this code in the CIfly Github repository, see the Python and R version. Finally, the following short code snippet shows how this function can be called:
# find an optimal adjustment set for nodes x = 2 and y = 4
# in the CPDAG with the specified directed and undirected edges
cpdag = {"-->": [(1, 4), (3, 4), (2, 4)], "---": [(0, 1), (0, 3), (1, 3)]}
print(optimal_adjustment(cpdag, [2], [4]))
# find an optimal adjustment set for nodes x = 3 and y = 5
# in the CPDAG with the specified directed and undirected edges
cpdag <- list("-->" = rbind(c(2, 5), c(4, 5), c(3, 5)), "---" = rbind(c(1, 2), c(1,4), c(2,4)))
print(optimalAdjustment(cpdag, c(3), c(5)))