Social Simulations

The idea of predicting how society at large will behave in extreme situations using computer simulations entered public discourse in 2020, during the COVID-19 pandemic, when many epidemiologists built computer models to predict how people would behave in a pandemic with or without certain government policies. Of course, nearly all of them significantly overestimated the empirically measured effects of government policies, and almost none of them predicted things such as deaths from traffic accidents increasing in 2020 compared to 2019.


Why you shouldn't trust computer simulations in social sciences

One of the first lessons about epistemology you learn when you start studying natural sciences is that mathematics is a useful tool for telling apart simple explanations (those favored by Occam's Razor) from ad-hoc hypotheses. Take, for example, Flat-Earthism and the illusion that ships appear to sink as they recede over the horizon. Flat-Earthers generally offer three explanations for that illusion:
  1. It's an illusion caused by waves.
  2. It's an illusion caused by the fact that the human eye cannot resolve arbitrarily small angles, so the lower part of the ship merges with the sea sooner than the top of the ship does.
  3. It's an illusion caused by light being affected by gravity. Gravity pulls the light downwards, so it should not be surprising that there comes a distance at which the light from the top of the ship reaches the eye while the light from the bottom of the ship doesn't.
Now, if you ask Flat-Earthers for a mathematical formula approximating how far away a ship needs to be for that effect to start occurring, as a function of the observer's elevation, none of them will be able to give one. For the first explanation, anyone who tries to do the mathematics will quickly realize the waves would need to be higher than your eye level for that to happen. For the second explanation, one will quickly realize that it predicts, if anything, that the horizon is nearer the higher the observer is, whereas we empirically see that it's further away the higher the observer is. As for the third explanation: really, if there were gravity, we would expect it to crush a disc-shaped planet into a sphere. You almost certainly cannot have a coherent-enough theory of gravity on a flat Earth to derive a formula for that effect. And yet, the Earth being round lets you predict how far a ship has to be for that to occur with no more advanced mathematics than the Pythagorean Theorem. That's why Occam's Razor favors the explanation that the Earth is round: it allows you to predict things with extraordinarily simple math.
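To show just how simple that math is, here is a minimal sketch (assuming a spherical Earth of mean radius 6371 km and ignoring atmospheric refraction, which shifts the numbers slightly in reality):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed spherical)

def horizon_distance(eye_height_m: float) -> float:
    """Distance to the horizon for an observer at the given eye height.

    From the Pythagorean Theorem: (R + h)^2 = R^2 + d^2,
    so d = sqrt(2*R*h + h^2).
    """
    return math.sqrt(2 * EARTH_RADIUS_M * eye_height_m + eye_height_m ** 2)

def hidden_height(eye_height_m: float, ship_distance_m: float) -> float:
    """How much of the ship, from the waterline up, is hidden by the curve.

    Zero until the ship passes the observer's horizon; past that point,
    the line of sight is tangent to the sea at the horizon, so the hidden
    height follows from the same right-triangle geometry.
    """
    beyond = ship_distance_m - horizon_distance(eye_height_m)
    if beyond <= 0:
        return 0.0
    return math.sqrt(beyond ** 2 + EARTH_RADIUS_M ** 2) - EARTH_RADIUS_M

# For an observer whose eyes are 2 m above the water, the hull starts
# disappearing at roughly 5 km.
print(round(horizon_distance(2)))  # ~5048 (metres)
```

Note how the prediction is also quantitative: raise the observer and the horizon recedes as the square root of the height, exactly the opposite of what the second Flat-Earth explanation predicts.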
From that, it's easy to conclude that social sciences should use as much mathematics as possible so as not to, well, become like Flat-Earthism: full of explanations which sound simple but in fact go wildly against Occam's Razor. Now, obviously, mathematics as simple as the Pythagorean Theorem is rarely directly applicable in social sciences. However, it often seems that computer simulations can be useful there, by providing numbers for how likely some outcome of a social process is. That approach appealed to me because, well, I got burned by Flat-Earthism once and I don't want to make the same mistake again, and also because I am a computer engineer: I know much more about computer simulations than the average person, so no wonder they seem appealing to me. However, I would now argue that relying on computer simulations in social sciences is almost never the right thing to do.
I will give you two examples of how I tried to use computer simulations in historical linguistics and got things wildly wrong:
  1. The Etymology Game
    When I was a high-school senior, I read about the task "Tocharian" from the 2003 International Olympiad in Linguistics. In it, you are supposed to guess which word in each pair belongs to Tocharian A and which belongs to Tocharian B. It seemed like a fun puzzle to solve, so I asked myself whether I could make a computer program that generates such tasks automatically, using simulated languages. I did, and I linked to it in the navigation, the 5th link from the top. Of course, in order to do that, I needed to make a whole bunch of assumptions about how languages really behave. So, when I was a third-year computer engineering student, I decided to test those assumptions: I tested how well the algorithm implemented in the Etymology Game predicts how the names of the numbers one to ten (data easily found on the Internet) evolved in various languages. The results were far from impressive: it guessed the correct result 0.56027% of the time, whereas a random algorithm which does not even ensure the result is pronounceable guessed it 0.4797% of the time. What is going on here? Well, first of all, sometimes the words which my algorithm considers unpronounceable are the correct result. For instance, my algorithm considers words starting in mr- to be unpronounceable, but quite a few languages do allow them. A random algorithm will easily derive the Tamazight word "mraw" from Proto-Berber *meraw, whereas my algorithm assigns that word a probability of zero. Furthermore, my algorithm assumes that languages generally respond to a difficult-to-pronounce consonant cluster at the end of a word by adding a paragogic vowel after it. In reality, languages are much more likely to insert an epenthetic vowel between the two consonants.
From Indo-European *dekjm (ten), my algorithm claims that results of phonetic evolution such as *decme are far more likely than Latin decem, even though the empirical data show otherwise. And it's not just Latin decem; it's also Gothic taihun and Armenian tasun. That skews the results a lot.
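The difference between the two repair strategies can be illustrated with a toy sketch (the vowel inventory and the repair rules here are simplifications of mine, not the Etymology Game's actual code):

```python
VOWELS = set("aeiou")

def repair_final_cluster(word: str, vowel: str = "e") -> dict:
    """Two possible repairs for a word ending in a consonant cluster.

    epenthesis: insert a vowel between the last two consonants
                (what languages usually do, cf. *dekm -> Latin decem);
    paragoge:   append a vowel after the cluster
                (what my algorithm wrongly assumed, giving *dekme).
    """
    if len(word) < 2 or word[-1] in VOWELS or word[-2] in VOWELS:
        # No word-final consonant cluster: nothing to repair.
        return {"epenthesis": word, "paragoge": word}
    return {
        "epenthesis": word[:-1] + vowel + word[-1],
        "paragoge": word + vowel,
    }

print(repair_final_cluster("dekm"))
# {'epenthesis': 'dekem', 'paragoge': 'dekme'}
```

An algorithm that assigns nearly all the probability mass to the paragoge branch will systematically miss outcomes like decem, taihun, and tasun.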
    Beginners in historical linguistics often think that such a computer model, if it were close to accurate, would be useful to historical linguistics. But once you think this through, you will quickly come to the conclusion that such a computer model of the evolution of languages, even if it were completely accurate, would be little more than a curiosity. You can nearly always surmise some sound changes for whatever extinct language you are talking about, and sound changes do not occur independently of one another (due to the balance of the phonological system). That's another reason why using computer models to argue whether some phonological evolution of a word is likely or unlikely would be little more than misleading nonsense.
  2. My paper Etimologija Karašica
    In 2022, I published a paper called Etimologija Karašica ("The Etymology of Karašica") in two peer-reviewed (or at least nominally peer-reviewed) journals: Valpovački Godišnjak and Regionalne Studije. In it, I argued that the p-value of the k-r pattern in Croatian river names (Krka, Krapina, Krbavica, Kravarščica, Korana, and two rivers named Karašica) can be calculated using computer simulations to be between 1/300 and 1/17; that, therefore, there must have been a word such as *karr~kurr meaning "to flow" in Illyrian; and that Karašica comes from Illyrian *Kurr-urr-issia (flow-water-suffix). The responses I got from experts in informatics were overwhelmingly positive, whereas the responses I received from linguists were mixed. Then, in mid-2025, an expert on Semitic languages on an Internet forum (so, someone knowing almost nothing about Croatian place names) asked me whether I had taken into account the Law of Sonority (also known as the Sonority Sequencing Principle), as it seemed to her that it could have giant effects. She pointed out that I was assuming the collision entropy of consonant pairs to be distributed equally among the consonant pairs in a word, which is not the case. In languages such as English or Croatian (which allow many consonant clusters at the beginnings and ends of words), word-initial and word-final consonant pairs have around 1 bit per pair lower collision entropy than consonant pairs in the middle of a word, because of the Law of Sonority: consonant clusters (and therefore, to some degree, consonant pairs) which are likely at the beginning of a word are unlikely at the end of it, and vice versa. I knew about the Law of Sonority; I just hadn't made that connection. So, when I made a computer simulation that controls for the Law of Sonority (by tokenizing the Aspell word-list), I got the result that the p-value is around 85%. I will try to publish a paper correcting myself.
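The general shape of such a Monte Carlo p-value estimate, heavily simplified, looks something like this. The consonant inventory, its weights, and the "at least 7 of 100 names show the pattern" framing are toy assumptions of mine, not the actual model from the paper, and that is precisely the point: as the sonority correction showed, the answer hinges entirely on how realistic those assumptions are.

```python
import random

# Toy consonant inventory with made-up weights; the real distribution of
# consonant skeletons in Croatian names is exactly the kind of assumption
# that is easy to get badly wrong (e.g. by ignoring the Law of Sonority).
CONSONANTS = list("kgtdpbszrlmnvjc")
WEIGHTS = [3, 1, 4, 2, 2, 2, 3, 2, 3, 3, 3, 3, 2, 2, 1]

def random_name_has_kr(rng: random.Random) -> bool:
    """Does a randomly drawn two-consonant skeleton come out as k...r?"""
    first, second = rng.choices(CONSONANTS, weights=WEIGHTS, k=2)
    return first == "k" and second == "r"

def p_value(n_names: int, observed: int, trials: int = 20_000,
            seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least `observed` out of `n_names`
    random names show the k...r pattern purely by chance)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        count = sum(random_name_has_kr(rng) for _ in range(n_names))
        if count >= observed:
            hits += 1
    return hits / trials

print(p_value(n_names=100, observed=7))
```

Change the weights, or make word-initial pairs obey a sonority constraint, and the estimated p-value can move by orders of magnitude, which is exactly what happened between my 2022 paper and the corrected simulation.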
See, the problem with computer simulations in social sciences is that it is very easy to tacitly make an assumption that is very far from reality. Sometimes you are even aware of the confounding factor, and it would be easy to control for it, but you somehow miss the connection.
I think that relying on computer simulations in social sciences is a fallacy closely related to the Ludic fallacy.
Though, it makes me wonder: what if my computer models were proven wrong precisely because they were presented in a way that can be falsified? What if everybody else who isn't using computer models is every bit as wrong as I was, and it's just that their "theories" aren't even presented in a way that can be easily evaluated?