In Word and Object, W.V.O. Quine talks about the way in which the mind revises its web of beliefs as if this process occurs in an unconscious way:
[…] Prediction is in effect the conjectural anticipation of further sensory evidence for a foregone conclusion. When a prediction comes out wrong, what we have is a divergent and troublesome sensory stimulation that tends to inhibit that once foregone conclusion, and so to extinguish the sentence-to-sentence conditionings that led to the prediction. Thus it is that theories wither when their predictions fail.
In an extreme case, the theory may consist in such firmly conditioned connections between two sentences that it withstands the failure of a prediction or two. We find ourselves excusing the failure of prediction as a mistake in observation or a result of unexplained interference. The tail thus comes, in an extremity, to wag the dog.
The sifting of evidence would seem from recent remarks to be a strangely passive affair, apart from the effort to intercept helpful stimuli: we just try to be as sensitively responsive as possible to the ensuing interplay of chain stimulations. What conscious policy does one follow, then, when not simply passive toward this interanimation of sentences? Consciously the quest seems to be for the simplest story. Yet this supposed quality of simplicity is more easily sensed than described. Perhaps our vaunted sense of simplicity, or of likeliest explanation, is in many cases just a feeling of conviction attaching to the blind resultant of the interplay of chain stimulations in their various strengths. (§5)
Mercier & Sperber’s argumentative theory of reasoning could offer an answer to this conundrum. To be sure, Quine is not suggesting an argumentative theory – by “story” he means theory, not argument. But he is only able to hesitantly claim that the conscious part of cognition has the function of preserving the simplicity of theories. Even this operation appears to occur at the level of intuition, and what purpose conscious reasoning has left is unclear. In the argumentative theory of reasoning, the “interplay of chain stimulations” by which contrary evidence tugs at our theoretical ideas would be a part of the intuitive track of cognition. The function of conscious reasoning would be not to oversee this intuitive process, but to come up with good ways of verbalizing its results. Conscious reasoning would not, in the normal course of things, involve changing the web of belief at all – instead its purpose would be to look for paths along the web that link the particular beliefs one anticipates having to defend to sentences that others might be willing to take as premises.
The argumentative theory claims to explain the confirmation bias by thus reconceiving the function of conscious reasoning, but Quine suggests (in the second paragraph I quoted) that a confirmation bias of sorts can occur in what I have assigned to the intuitive track of cognition as well. Sometimes our theoretical ideas have become so ingrained that we “excuse” contrary observations. As far as I can tell, Mercier & Sperber’s argumentative theory would not explain this sort of confirmation bias.
To the extent that it serves the preference for simplicity, an intuitive confirmation bias is not fundamentally irrational, because at least in certain situations selectively ignoring evidence in the name of simplicity can result in better predictions. This has proven true experimentally in the field of machine learning. Suppose that we have a plane on which each point is either red or green. Given a finite number of observations about the colors of particular points, we wish to come up with a way of predicting the color of any point on the plane. One way of doing this is to produce a function that divides the plane into two sections, red and green. If we can draw a straight line that correctly divides all of our observations, red on one side and green on the other, then we have a very simple model that, assuming that the set of observations we used to derive it is representative and sufficiently large, is likely to work well. However, if it is necessary to draw a very complex, squiggly line to correctly account for all of the observations (if we are required to use a learning machine with a high VC dimension), then it is often better to choose a simpler function even if it makes the wrong prediction for a few of our observed cases. Overfitting can lead to the creation of models that deviate from the general pattern in order to account for what might actually be random noise in the observational data. In the same way, if we attempted to account for every possible bit of contrary evidence in the revision of our mental theories, our ability to make useful predictions with them would be confounded. We will always encounter deviations from what we expect, and at least some of these will be caused by factors that we will never come across enough data to model correctly. In such cases, we are better off allowing our too-simple theories to stand.
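The red/green example can be tried out in a few lines of Python. What follows is only a rough sketch under assumptions of my own, none of which come from Quine or from Mercier & Sperber: the true pattern is a straight diagonal boundary, one observation in five is recorded with the wrong color, the "simple" model is a nearest-centroid rule (whose boundary is a straight line), and the "squiggly" model is a one-nearest-neighbor rule that reproduces every observation exactly. All function names are hypothetical.

```python
import random

def true_color(x, y):
    # Assumed ground truth: a straight boundary, red above the diagonal.
    return "red" if y > x else "green"

def sample(n, noise, rng):
    """Draw n points in the unit square; flip each label with probability `noise`."""
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = true_color(x, y)
        if rng.random() < noise:
            label = "green" if label == "red" else "red"
        data.append(((x, y), label))
    return data

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_centroids(train):
    """Simple model: one mean point per color; the implied boundary is a straight line."""
    centroids = {}
    for color in ("red", "green"):
        pts = [p for p, c in train if c == color]
        centroids[color] = (sum(p[0] for p in pts) / len(pts),
                            sum(p[1] for p in pts) / len(pts))
    return centroids

def predict_simple(centroids, p):
    # Predict the color whose centroid is closest to p.
    return min(centroids, key=lambda color: sq_dist(p, centroids[color]))

def predict_memorizer(train, p):
    """Complex model: copy the color of the single closest observation."""
    return min(train, key=lambda item: sq_dist(p, item[0]))[1]

def accuracy(predict, data):
    return sum(predict(p) == c for p, c in data) / len(data)

def run_experiment(seed=0, n_train=400, n_test=1000, noise=0.2):
    rng = random.Random(seed)
    train = sample(n_train, noise, rng)   # noisy observations
    test = sample(n_test, 0.0, rng)       # noise-free: the general pattern
    centroids = fit_centroids(train)
    simple = lambda p: predict_simple(centroids, p)
    memorizer = lambda p: predict_memorizer(train, p)
    return {
        "simple_train": accuracy(simple, train),
        "simple_test": accuracy(simple, test),
        "memorizer_train": accuracy(memorizer, train),
        "memorizer_test": accuracy(memorizer, test),
    }

if __name__ == "__main__":
    for name, value in run_experiment().items():
        print(f"{name}: {value:.3f}")
```

The memorizing rule accounts for every observation it was given, noise included, while the straight-line rule "excuses" the observations that do not fit. On fresh points drawn from the underlying pattern, it is the rule that excused some of its evidence that should predict better, at least under the assumptions stated here.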