Notes from Causal AI 2022 | Leonardo Mazzone

I am almost a year late. The next instalment of the Causal AI conference is happening in about a month.

Yet, I have only now managed to rework my notes on the 2022 edition so they are kind of readable by other humans. That’s how life goes sometimes.

The conference is organised by causaLens, a London-based company, among the first endeavours to make money by selling causal-inference-as-a-service.

Even if slightly out of date, every talk offers an interesting angle on where the field of causality is theoretically and how its results are being applied in practice. I didn’t watch all the talks (available on demand) and I don’t make any claim to accurately represent the point of view of the speakers, or even of having actually understood it. In fact, my notes mix without an ounce of shame biased summaries of what was said with personal reflections. What are you even doing here?

causaLens and its place in the community

Let’s start by discussing a broken promise. In this final keynote, causaLens announced they will soon release an open source library called cl-interfaces, which aspires to be a framework to harmonise many different implementations of algorithms across causal graph modelling, causal inference, counterfactual analysis and causal discovery. I found this to be very exciting news. There are now a few Python libraries out there, but they don’t talk to each other. I think DoWhy is an interesting project, which also attempts to provide a vendor-neutral interface, but only for causal inference, so that you need (for example) to pass a full DAG as a pre-requisite for any further analysis.

Alas, as of August 2023 no such package can be found on their GitHub page. Maybe next year is the year?

They talk about decisionOS, their commercial product which enables the implementation of higher-order components, basically chainable causal models for decision support. They discuss some of the research they have been driving, which is actually really good to see. Example: how to embed domain knowledge in the causal discovey process.

The idea of decisionOps is mentioned as the next leap following the model and data -centric approach to MLOps. Instead of just looking at prediction performance and data drift, the suggestion is to start tracking decisions and their impact with an engineering hat on.

The first half of the talk is marketing boilerplate about why causal AI is important. You’d expect this audience to already be on board with the whole idea. More generally, it feels like the community is still having to spend a lot of effort preaching the gospel to try and make it mainstream, hoping that the critical mass will eventually reduce the technical and cultural friction to adopting causal methods.

Comparing causal discovery methods

Nima Safaei from Scotia Bank introduces the concept of a Rashomon gap. I could not find the term referenced in the literature so it might be his own invention in this context. Epistemologically, the Rashomon effect arises when there are two conflincting explanations for a phenomenon. In data science that means two models, both accurate in their predictions, associated with inconsistent explanations of some data. Thus, the Rashomon gap identifies the delta between what a model thinks it’s learning, which determines how it predicts (explainability), and the causal functions that determine the true, actionable, relationships between variables (interpretability). He talks about measuring the gap in practice as 1 minus the correlation between the causal score (?) of a variable and the effective slope of that variable in the model (i.e. a function of Shapley values). The larger the correlation, the lower the gap. We want to minimise that gap.

He then talks about a framework he applied to compare the key drivers identified by various causal discovery methods in a project they worked on. Many different families of methods exist, e.g. graphical models, pair-wise dependency tests, etc. It appears like the various methods fundamentally disagree with each other (not great). I wish he had linked some papers or at least published a blog post to expand on this.

He makes the subtle point that in his domain (finance) they work with many unobserved (latent) variables, and that the causal structure of those variables is not understood. This doesn’t allow to establish whether or not the Markov condition holds. Again, I wish more detail had been provided.

Defining causality, intention, responsibility, blame

Joe Halpern discusses the problem of even defining causality in the first place, focussing not on type causality, i.e. the global causal relationship between variables (e.g. “smoking causes cancer”), but on token / actual causality, defined over specific events (“the fact that Willard smoked for 30 years caused him to get cancer”).

The jist is that even for token causality, a good definition has eluded humanity for millennia. He and Judea Pearl (the latter having won the Turing Award for basically inventing the field of causal AI) came up with a first definition in 2001 and new ones have had to be developed since then, over and over, to adapt to edge cases as they were surfaced. On one hand there seems to be a strong, coherent signal in whether we should consider A to be the cause of B when we ask groups of people. On the other hand, coming up with a definition that’s mathematically rigorous, simple to express, easy to apply, and capturing all the nuances and variety of factors expressed by human judgement, that’s really hard. Here are a few factors that are mentioned:

Edge cases in the mechanical view of causality: what if both Alice and Bob try to hit a bottle with a stone, but Alice’s gets to the bottle first. If Alice hadn’t thrown the stone, the bottle would have been toppled by Bob. So in a sense, Alice’s actions are irrelevant towards whether the bottle is or isn’t standing after someone has thrown. And yet, we want to somehow capture that Alice is indeed the cause.
Humans are not just interested in the binary notion of whether A is a cause or B, or its probabilistic equivalent. They are interested in the degree of blame, expressing the extent to which responsibility is shared, weighed by the contribution towards the final effect. So members of an execution squad have the same degree of cause towards death, but typically this is not the case. For example, one of the participants in an action could have more “power”.
Let’s say both A and B are a cause of C. If A happens more frequently and it happening is somehow “normalised” (perhaps even encoded, or a more “lawful” occurrence than B), then B should be seen as “more of a cause” than A.
Intentionality and knowledge play a role too. A doctor caused a patient to die by prescribing the wrong drug, but would they reasonably have been expected to know the result of their actions? Note how far this is from the purely mechanical idea of causation.

Causal inference has applications in many fields, and while some of the nuances above won’t be relevant for e.g. program verification, they will for any system taking decisions involving humans, and naturally for anything interacting with legal systems.

The journey to causal inference of an energy giant

Total talk about their journey to increasingly complex causal problems. The justification: “In operational settings, most of the questions that people have are causal questions”. Fair enough.

They started with the simplest setting of a binary treatment, which is addressed in the most standard way: propensity score matching.

Then they looked at continuous treatments, and heterogenous effects, i.e. how does your treatment affect differently various sub-populations? They compared multiple meta-learners and submitted a paper about this.

Unlike more traditional machine learning approaches, you can’t just evaluate your models with cross-validation. You don’t have data on your counterfactual worlds. So what they did is use semi-synthetic datasets simulated using physics-based engines / equations.

They also looked at multiple treatments (optimise multiple parameters at once), that’s the really complex setting one at which point they contacted causaLens.

Finally, the most interesting scenario: root cause analysis of system failures from sensor time series. On this, here is another paper they published. Prolific!

They have been doing all the work above without specifying DAGs explicitly, saying they haven’t needed to use that framework yet. A brave thing to utter with Joe Halpern in the room.

Question from the audience: how do you make people care about causal AI?

“It feels like talking about a third dimension to people who can only see two”

Answer: There’s a hierarchy of needs. If the organisation is still trying to nail the descriptive layer of data work, i.e. getting the data management and analytics right, no way you’ll get them on board. You need to already be doing predictive modelling somewhat successfully and not satisfied with where you are. Then you can go straight to the domain experts and convince them their most pressing questions are causal.

Nestle talks about Nestle

This was a surprisingly uninformative talk. Main messages: Nestle is a big company, and has started exploring Causal AI. And, they have big DAGs (think 50 variables), informed by subject matter experts. Because of their scale they can afford to throw in the model economic variables like GDP or unemployment rate for a country. They only perform causal discovery to check the DAG they have already constructed, they don’t think the algorithms are ready for determining the causal relationships fully. Also, the current open source offering in this space does not deal well with multi-treatment problems (which echoes the point from Total’s presentation).

Some applications they have tested:

optimising production and distribution to maximise sales, which involves modelling the effect of different (potentially overlapping) marketing promotions
the impact of different marketing budget allocation decisions on sales (measured as money spent on various channels given a fixed budget)

How causality can help decision-makers

Annie Hou talks about the success (or lack thereof) machine learning systems when concretely implemented in businesses, and how causality can help address some of the key challenges.

Interaction with business processes can corrupt the intended application of a model beyond recognition. As illustration she presents an anecdote from Models and Managers: The Concept of a Decision Calculus, a paper from the 1970s that is definitely worth a read.

A key dynamic is the obsession with big data followed by the inevitable disillusion. Most organisations never get data that is big enough on its own. We live in a world that’s constantly evolving, thus you need adaptable models, informed by domain knowledge, that can make the most of average or even small quantities of data.

Another element is the importance of a good interface. In her work with various organisations, “no matter how great the model was, if the results were just numbers, and were not visualised in a nice way, it didn’t really go anywhere”. Additionally, simpler, less accurate models that were more actionable brought much more value since they had a chance to be actually used to make decisions. Alas, one of the core properties of a successful machine learning system is how “shiny” it is, as non-technical colleagues won’t understand the details of the underlying machinery and thus won’t draw excitement from it. Some of the excitement needs to be manufactured with a good story in order to generate enough interest and recognition of your hard modelling work.

My final takeaway (which is well aligned with the causal focus on intervention/counterfactuals over the modelling of association) is the importance of better model designs to generate an output which can be embedded in effective decision-making processes: “It’s not about giving stakeholders an answer, it’s about giving the landscape of possible solutions”. Also: “help them build a defensible position”, i.e. empower leaders to use evidence-based work to strengthen their argument. Even this framing can present some challenges: “as much as you trust the data, people always trust their gut much more”. And apparently, of the many sectors she had worked with, clients in the defense space were the only ones who already understood there are many possible scenarios and were used at looking at evidence from multiple sides.