This project is joint work with Yuan Yuan, PhD candidate at MIT, and the Facebook Core Data Science team. Learn more about CDS on the CDS team page.
Randomized experiments, or A/B tests, remain the gold standard for evaluating the causal effect of a policy intervention or product change. However, experimental settings, such as social networks, where users are interacting with and influencing one another, may violate conventional assumptions of no interference for credible causal inference. Existing solutions to the network setting include accounting for the fraction or count of treated neighbors in a user’s network, yet most current methods do not account for the local network structure beyond simply counting the number of neighbors.
Our study provides an approach that accounts for both the local structure in a user’s social network via motifs as well as the treatment assignment conditions of neighbors. We propose a two-part approach. We first introduce and employ causal network motifs, which are network motifs that characterize the assignment conditions in local ego networks. Then, we propose a tree-based algorithm for identifying different network interference conditions and estimating their average potential outcomes. Our approach can account for social network theories, such as structural diversity and echo chambers, and also can help specify network interference conditions that are suitable for each experiment. We test our method on a synthetic network setting and on a real-world experiment on a large-scale network, which highlight how accounting for local structures can better account for different interference patterns in networks.
As an example, Figure 1 illustrates four examples of network interference that could be captured by the local network structures and treatment assignment. The first two conditions are simply the cases where all neighbors are treated or nontreated, followed by the important network interference conditions suggested by structural diversity and complex contagion, respectively. In the case of structural diversity and echo chamber settings, the ego nodes in (c) and (d) have 1/2 neighbors treated but exhibit very different local structures, and the ego’s outcome may be different in these settings. We do not know which one is the dominant factor that drives most of the variance in the outcome.
Figure 1: Examples of network interference conditions across different local network structures. The star indicates a user and a circle represents a user’s friends. Solid circles indicate that a friend is in treatment and hollow circles indicate a friend is in control. For stars, the shaded indicates that it could be treated or control.
Given the large number of researcher degrees of freedom in existing approaches for network interference, such as choosing the threshold for an exposure condition, our approach provides a simple way to automatically specify exposure conditions. In this way, researchers no longer need to define exposure conditions a priori, and the exposure conditions generated by the algorithm are suitable for the given data and experiment. We believe that methodological innovation for addressing network interference concerns in A/B tests on networks will continue to be an important area for development, and accounting for network motifs with treatment assignment conditions provides a useful way to detect heterogeneous network interference effects.
Our study provides a two-step solution to automatically identify different exposure conditions while overcoming selection bias concerns, as will be explained in more detail in the section after Figure 2. First, for an A/B test on a network, we construct network motif features with treatment assignment conditions (i.e., causal network motifs) to provide a fine-grained characterization of the local network structure and potential interference conditions. Second, using the network motif characterization as input, we develop a tree-based algorithm to perform clustering and define the set D rather than allowing practitioners to explore that.
We introduce causal network motifs, which differ from conventional network motifs in two primary aspects. First, we focus on (1-hop) ego networks that include the ego node, with the methods generalizing to higher 𝑛-hop ego networks for 𝑛>1. Second, we consider the treatment assignment conditions of the user and their 𝑛-hop connections. We use the term “network motifs” to refer to conventional motifs without treatment assignment labels (or assignment conditions) and “causal network motifs” to refer to those with assignment conditions. Examples of network motifs are illustrated in Figure 2. We use these counts on an 𝑛-hop ego network to characterize the exposure condition of each observation.
Figure 2: Examples of causal network motifs. Stars represent egos and circles represent alters. Solid indicates the node being treated, hollow indicates control, and shaded indicates that it could be treated or control. The first patterns in each row are conventional network motifs without assignment conditions, or just called network motifs, followed by corresponding network motifs. Our interference vector is constructed by dividing the count of a causal network motif by the count of the corresponding causal network motif. The labels below each network motif indicate the naming: for example, an open triad where one neighbor is treated is named 3o-1.
After counting causal network motifs for each ego node in our network, our next step is to convert the counts to features, which will be used in the next section. Let X𝑖 denote an 𝑚-dimensional random vector, referred to as interference vector. The interference vector has an important requirement: Each element of the random vector is intervenable — that is, the random treatment assignment affects the value of each element of the vector. The requirement addresses the selection bias issue when we estimate the average potential outcomes.
We construct the interference vector in the following way. For each observation, for the count for each causal network motif (e.g., 2-1, 2-0, …, 3o-2, 3o-1, …), we normalize it by the count of the corresponding network motifs (e.g., dyads, open triads, closed triads, …). In this way, each element of X𝑖 is intervenable, and the support for each element is in [0, 1]. Note that when considering a network motif with many nodes, some observations may not have certain network motifs, and normalization cannot be performed. In these scenarios, we can either exclude this network motif from the interference vector or drop these observations if they take a really small proportion. Please refer to Figure 3 for an illustration of constructing the interference vector.
Figure 3: An example of ego network with treatment assignments and the corresponding interference vector. Stars represent egos and circles represent alters. Solid indicates the node being treated, hollow indicates control, and shaded indicates that it could be treated or control.
Then, our approach partitions [0, 1]m+1 and determines exposure conditions based on a decision tree regression. Decision trees can be used for clustering [1] and typically have good interpretability in the decision-making process [2]. Thus, it is a proper machine learning algorithm to solve the partitioning problem. Each leaf of the decision tree corresponds to a unique exposure condition (partition). Compared with conventional decision tree regression, we have several revisions to accommodate honest splitting, positivity, and so on.
Network interference is much more complicated than simply being described as the indirect effect. To examine and analyze heterogeneity of indirect effects in experimental data sets, we provide a two-step solution. We first propose and employ the causal network motifs to characterize the network interference conditions, and then develop a tree-based algorithm for partitioning. Our tree-based algorithm is interpretable in terms of highlighting which exposure conditions are important for defining potential outcomes, it addresses selection bias and positivity issues, and it avoids incorrect standard error concerns via honest splitting.
Practitioners using our approach may obtain important insights. For example, they could understand how to utilize social contagion for product promotion when they have constraints on the number of promos. Researchers may identify important network interference conditions that are not theorized in certain experimental settings.
Causal network motifs: Identifying heterogeneous spillover effects in A/B tests
Check out our open source implementation on GitHub.
Watch our presentation at the Web Conference 2021.
[1] Bing Liu, Yiyuan Xia, and Philip S Yu. 2000. Clustering through decision tree construction. In CIKM. 20–29.
[2] J. Ross Quinlan. 1986. Induction of decision trees. Mach Learn (1986).