Machine learning is usually dichotomized into two categories, passive (e.g., supervised learning) and active (e.g., reinforcement learning), which, by and large, are studied separately. Reality is more demanding. Passive and active modes of operation are but two extremes of a rich spectrum of data-collection modes (also called research designs) that generate the bulk of the data available in practical, large-scale situations. In typical medical explorations, for example, data from multiple observational studies and experiments are collected, coming from distinct experimental setups, different sampling conditions, and heterogeneous populations. Similarly, in a more basic setting, a baby learns both by passively observing others and by actively performing interventions on its environment.
In this tutorial, we will review concepts, principles, and mathematical tools that have proven useful in reasoning about passive and active modes of interaction, how these modes relate to causal and counterfactual reasoning, and how these results have been used in data-intensive sciences. In particular, we will introduce the data-fusion problem, which is concerned with piecing together multiple datasets collected under heterogeneous conditions (to be defined) so as to obtain valid answers to queries of interest. The tutorial will include discussions of confounding control, policy analysis, misspecification tests, heterogeneity, selection bias, generalizability, and reinforcement learning.
The following topics will be emphasized:
Presenter: Elias Bareinboim
Time: Tuesday, May 8, 9am-1pm
Location: Room XX
Slides: 1pp, 3pp
The tutorial follows our recent survey on data fusion, which contains pointers to the key results on the topic.