VII. The Contrived Counter-Factual

Populations and Variables

There would not be much point to science unless it produced knowledge that was otherwise not available to us. The goal of science is to generalize into the past, present and future so as to reliably infer things that we cannot observe directly. Yes, some areas of science are of a localized, fine-grained variety, but even there the ultimate aim is to detect patterns that latch onto system regularities, allowing us to re-engineer Nature in adaptive, self-serving ways.

The brain categorizes objects and events into "populations". Each member of a population may differ along an indefinite number of dimensions, or "variables". The distribution of variable states in the population determines a certain state's long-term frequency – in other words, its probability. By probing a subset of the population (a sample) we hope to estimate this probability.

As part of this pragmatic means-end analysis, and in a truly astounding feat of human cognition, we mentally parse our events into meaningful categories, called populations, upon which we project variables – possibility spaces of potential states. “Mankind” is an example of a population, with a variable such as “gender”, but so is “upper-class Norwegians”, “swordfish”, “all earthquakes” and “all theoretically possible coin tosses”.

Moreover, member states can be distinguished categorically – based on how you interact with them in qualitatively different ways – or numerically, if they differ on a quantitative dimension. Statisticians have tried to erect complex taxonomies for the different types of variables, but since variables are products of our brains, there is a degree of arbitrariness in how we frame them. For example, the variable “color” can be considered either categorical (including blue, green, red…) or numerical, based on electromagnetic frequency. Indeed, any continuous measurement can be chunked into separate groups, like “tall” and “short”, to simplify analysis, at the cost of less fine-grained results.
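For instance, the chunking of a continuous measurement into categories can be sketched in a few lines of Python (the heights and the cutoff are hypothetical):

```python
# Collapsing a numerical variable (height in cm) into a categorical one
# ("tall"/"short"). The data and the 175 cm cutoff are hypothetical.
heights = [152.0, 168.5, 171.2, 180.3, 190.1, 165.7]

def categorize(height_cm, cutoff=175.0):
    """Chunk a continuous measurement into two coarse groups."""
    return "tall" if height_cm >= cutoff else "short"

labels = [categorize(h) for h in heights]
print(labels)  # ['short', 'short', 'short', 'tall', 'tall', 'short']
```

The categorization simplifies analysis, but note how the fine-grained differences within each group are lost.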

In a population, a variable’s possible states may vary in their relative frequencies. Based on an observed subset of the population, known as a sample, we hope to find a way to predict what state a system will assume or, at the very least, how certain we should be about a particular outcome, to support our decision-making. We are, in other words, looking for probability distributions and how different factors change them.
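Estimating a probability distribution from a sample amounts to little more than counting relative frequencies. A minimal sketch, using a hypothetical sample of eye colors:

```python
from collections import Counter

# A hypothetical sample of a categorical variable (eye color).
sample = ["blue", "brown", "brown", "green", "brown", "blue", "brown", "blue"]

# The relative frequency of each state serves as our estimate of its
# probability in the population.
counts = Counter(sample)
n = len(sample)
estimated_distribution = {state: count / n for state, count in counts.items()}
print(estimated_distribution)
# {'blue': 0.375, 'brown': 0.5, 'green': 0.125}
```

The larger the sample, the closer these frequencies tend to sit to the population's true probabilities.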

Probability Distributions

The shape of a probability distribution reflects – albeit very indirectly and cryptically – logical properties about the generating mechanism underneath. An economist may infer from a national income distribution whether the economic system is socialist or capitalist. A gambler may infer that a roulette wheel is biased. Some shapes are rather ubiquitous and mathematically elegant, indicating general organizing principles in Nature. Two populations could generate probability distributions of the same shape, though one may be wider and the other taller, and they may apply to different scales. Quantities that define the particulars of a distribution, apart from its general shape, are known as parameters. Based on corresponding quantities of the sample, known as statistics, the hope is to infer the population parameter, which, given the shape, holds the key to the probability distribution we are looking for.
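A small simulation can illustrate the distinction: two normally distributed populations share a shape but differ in their parameters, and sample statistics are our estimates of those parameters. The parameter values below are, of course, hypothetical:

```python
import random
import statistics

random.seed(0)

# Two populations with the same shape (normal) but different parameters:
# one wide (standard deviation 15), one narrow (standard deviation 5).
wide = [random.gauss(100.0, 15.0) for _ in range(10_000)]
narrow = [random.gauss(100.0, 5.0) for _ in range(10_000)]

# Sample statistics (mean, standard deviation) estimate the corresponding
# population parameters.
print(round(statistics.mean(wide), 1), round(statistics.stdev(wide), 1))
print(round(statistics.mean(narrow), 1), round(statistics.stdev(narrow), 1))
```

The printed statistics land close to the parameters used to generate the data, which is precisely what licenses the inference from sample to population.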

A scientist's task of charting the causal structure of a system through experimentation can be compared to understanding the workings of a complex contraption by identifying changes in its state (e.g. a sound) and linking them to its parameters (the settings).

Given a set of samples of a variable, we are curious about whether or not we should re-carve our reality and regard them as separate categories from different populations. As usual, whether this is meaningful depends on whether it affords us any predictive power. Because being members of different categories means having different parameters, parameters can be thought of as manipulable knobs, whose settings remain fixed as the system changes dynamically. An experiment effectively asks whether an observed change is attributable to different knob-settings.

Indeed, the metaphor of a man-made contraption has strong appeal: we may imagine ourselves as archeologists who unearth a complex, mechanical device that has no obvious purpose. We search for ways of adjusting it (we distinguish and alter its “parameters”) and observe the effects (the population is its behavior at each instant in time). We may turn a knob and find that the machine emits a sound, suggesting that the alteration changed a parameter value. Or it could be due to some other change: maybe, just as we turned the knob, a confluence of events independent of our action caused the sound. Ideally, therefore, we would like to rewind time and see whether the sound would occur in the absence of our turning the knob, but because the laws of spacetime won’t allow such counterfactual exploration, we let the times we do not turn the knob represent this scenario.
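The logic can be mimicked in a toy simulation: a hypothetical machine emits a sound at some background rate, and at a higher rate when the knob is turned. The un-turned trials stand in for the counterfactual, and the gap between the two observed frequencies is what points to the knob being a genuine parameter (both rates below are made up):

```python
import random

random.seed(1)

def machine(knob_turned: bool) -> bool:
    """A toy contraption: sounds sometimes occur on their own (background
    rate 0.1), but more often when the knob is turned (rate 0.6)."""
    p_sound = 0.6 if knob_turned else 0.1
    return random.random() < p_sound

trials = 10_000
turned = sum(machine(True) for _ in range(trials)) / trials
untouched = sum(machine(False) for _ in range(trials)) / trials

# The un-turned trials serve as our fake counter-factual; the difference in
# sound frequency is attributed to the knob.
print(turned, untouched)
```

In a real experiment we do not know the two rates; we only observe the frequencies and must decide whether their difference exceeds what chance alone would produce.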

Experimental Designs

This idea – of creating “fake counter-factuals” – is central to experimental design. The manipulation made by the researcher is known as the “independent variable”, while the results are measured along a “dependent variable”. To fake parallel universes, either the same subjects undergo all the different treatments at different points in time, making it a “within-subject variable”, or different subjects each undergo only one treatment, which can be done simultaneously, making it a “between-subject variable” with no dependencies between data points. Which design is preferable depends on the variables – in general, the former yields less noise (since the subject is constant across conditions) while the latter has fewer confounds (since there are no practice effects and the like).
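As a sketch of the difference in data layout (subject labels and reaction times, in milliseconds, are hypothetical):

```python
# Within-subject: every subject undergoes every treatment, at different
# points in time, so each subject serves as their own counter-factual.
within = {
    "subject_1": {"treatment_A": 312, "treatment_B": 298},
    "subject_2": {"treatment_A": 351, "treatment_B": 330},
}

# Between-subject: each subject undergoes exactly one treatment; the groups
# can run simultaneously and the data points are independent.
between = {
    "treatment_A": {"subject_1": 312, "subject_2": 351},
    "treatment_B": {"subject_3": 305, "subject_4": 322},
}

# In the within-subject layout, treatment effects can be computed per subject:
diff_subject_1 = within["subject_1"]["treatment_A"] - within["subject_1"]["treatment_B"]
print(diff_subject_1)  # 14
```

The per-subject differences in the within design cancel out stable individual quirks, which is why it tends to produce less noise.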

The members of a population differ along an indefinite number of variables. By choosing members randomly we – on average, without guarantees – preserve the relative frequencies of different states in the sample without ever specifying the variables.
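A quick sketch of this: in a population where 30% of members happen to be in some state “X”, a random sample tends to preserve that frequency without our ever naming the underlying variable (the figures are hypothetical):

```python
import random

random.seed(5)

# A population where 30% of members are in state "X" on some unspecified
# variable. Random sampling preserves this relative frequency on average.
population = ["X"] * 3_000 + ["Y"] * 7_000
sample = random.sample(population, 500)

print(sample.count("X") / len(sample))  # close to 0.3, on average
```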

While it is practically impossible to account for all conceivable influences, our failure to do so won’t pose a problem if their aggregated influence is balanced across all conditions. This is ensured by randomization, in which an experimenter assigns subjects to conditions using some form of random number generator, so that each subject has an equal probability of ending up in either condition. No category of subjects will be systematically biased to receive one treatment over the other. Uncontrolled, “extraneous” differences in gender, mood, height, etcetera, would thus be cancelled out by chance. Hence, the mere existence of an uncontrolled extraneous variable is not a valid criticism of a randomized experiment.
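Random assignment itself is trivial to implement, for example by shuffling the subject list and splitting it in half (the subject labels are hypothetical):

```python
import random

random.seed(7)

# Assign subjects to two conditions at random, so that no category of
# subjects is systematically biased toward one treatment.
subjects = [f"subject_{i}" for i in range(20)]
random.shuffle(subjects)

half = len(subjects) // 2
treatment_group, control_group = subjects[:half], subjects[half:]
print(len(treatment_group), len(control_group))  # 10 10
```

Any extraneous variable, measured or not, is thereby balanced across the groups in expectation.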

Balancing-by-randomness, however, is not guaranteed, so group allocations are typically examined afterwards to make sure that they are not grossly imbalanced on the most plausible confounds. Sometimes random assignment is impossible. If, for example, gender is manipulated as a between-subject variable, height and other variables will co-vary with it, since men generally are taller, making the design “quasi-experimental”. If height is a priori judged to be a potential confound, the groups will therefore have to be matched on it instead.
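Such an after-the-fact balance check might, as a sketch, look like this, using height as the plausible confound (all values are hypothetical):

```python
import random
import statistics

random.seed(3)

# After random assignment, examine whether the groups are grossly imbalanced
# on a plausible confound (here: heights in cm). If they are, the groups
# would have to be matched on it instead.
heights = {f"subject_{i}": random.gauss(175.0, 10.0) for i in range(40)}
subjects = list(heights)
random.shuffle(subjects)
group_a, group_b = subjects[:20], subjects[20:]

mean_a = statistics.mean(heights[s] for s in group_a)
mean_b = statistics.mean(heights[s] for s in group_b)
print(round(mean_a, 1), round(mean_b, 1))  # ideally close to each other
```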


For a better understanding of aggregated randomness, let us now focus on natural populations with the most famous distribution of all – the normal distribution.