Indo-Austronesian

The Indo-Austronesian Hypothesis

I've decided to make a web page presenting my proposal that the Indo-European and Austronesian language families share a common ancestor. I realize my arguments border on pseudoscience, but, let's face it, all the great scientific ideas start that way, especially in linguistics. Maybe some expert will read this and pursue my ideas.

ATTENTION: Some of the opinions stated in the following text are contrary to mainstream science. I would not advise you to read it unless you have a substantial background in linguistics. I am not a conspiracy theorist who wants to bombard people with controversial statements they don't know how to evaluate, and I do not deny that my work may be to historical linguistics what Anatoly Fomenko's work is to history.

Here are the words from the Swadesh list which I consider to be potential cognates between Proto-Indo-European and Proto-Austronesian:
*treys (three)-*telu (three)
*ronk (hand; compare Lithuanian "ranka", hand)-*lima (hand/five)
*ser (to flow)-*qalur (to flow)
*skend (skin)-*qanic (skin)
*stemb^h (to walk)-*qaqay (foot)
*smew (smoke)-*qabu (ash)
*serw (to watch; compare Latin "servus", watchman)-*qalayaw (day)
*sal (salt)-*qasira (salt)
*b^has (to talk)-*baqbaq (mouth)
*b^hewg^h (bow)-*busur (bow)
*b^her (to give birth)-*bay (woman)
*dwoh₁ (two)-*dusa (two)
*dyews (sky)-*daya (upwards/height/sky)
*danu (river)-*danaw (lake)
*dng^jhuh₂ (tongue)-*dilaq (tongue)
*delh₁ (wide)-*dempad (wide)
*eg^joh₂ (I)-*aku (I)
*men (to think)-*nemnen (to think)
*h₁nomn (name) could perhaps be cognate with Proto-Polynesian *hingoa (also meaning "name"), but that root isn't attested outside of Polynesian.
The regular sound correspondences are somewhat hard to establish. Ones that appear obvious are s:q, r:l, b^h:b and d:d.
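The counts behind these correspondences can be tallied mechanically from the list above. The following is only a rough sketch: the helper names are made up for illustration, the pairs are typed in by hand from the list (asterisks dropped, the Proto-Polynesian item left out), and plain substring and prefix checks are a crude stand-in for real correspondence analysis, since they ignore where in the root a sound occurs.

```python
# Candidate pairs copied from the list above as (PIE form, PAN form).
PAIRS = [
    ("treys", "telu"), ("ronk", "lima"), ("ser", "qalur"),
    ("skend", "qanic"), ("stemb^h", "qaqay"), ("smew", "qabu"),
    ("serw", "qalayaw"), ("sal", "qasira"), ("b^has", "baqbaq"),
    ("b^hewg^h", "busur"), ("b^her", "bay"), ("dwoh₁", "dusa"),
    ("dyews", "daya"), ("danu", "danaw"), ("dng^jhuh₂", "dilaq"),
    ("delh₁", "dempad"), ("eg^joh₂", "aku"), ("men", "nemnen"),
]


def pairs_with(pairs, pie_sound, pan_sound):
    """Pairs whose PIE form contains pie_sound and whose PAN form contains pan_sound."""
    return [(pie, pan) for pie, pan in pairs if pie_sound in pie and pan_sound in pan]


def pairs_starting_with(pairs, pie_onset, pan_onset):
    """Pairs whose PIE form starts with pie_onset and whose PAN form starts with pan_onset."""
    return [(pie, pan) for pie, pan in pairs
            if pie.startswith(pie_onset) and pan.startswith(pan_onset)]


if __name__ == "__main__":
    print("s:q   ", len(pairs_with(PAIRS, "s", "q")))
    print("d:d   ", len(pairs_starting_with(PAIRS, "d", "d")))
    print("b^h:b ", len(pairs_starting_with(PAIRS, "b^h", "b")))
```

On this list the s:q check matches seven pairs, six with both sounds root-initial plus *b^has-*baqbaq.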
People might think that there are no actual correspondences and that what I have found are just coincidences. But think of it this way: I have found a simple sound law (that PIE *s corresponds to PAN *q), and there are seven examples of it on the Swadesh list. What's the probability of that if those languages are not actually related? We are mostly dealing with two-consonant roots here. Let's be generous and say I allowed myself semantic drift across 3 words. Both proto-languages have about 20 consonants. So, if the word in PIE has an *s, the probability that I find a word in PAN in which PIE *s corresponds to PAN *q is 1-((1-(1/20))^3)^2=26%. The Swadesh list has 100 words, so we can expect 100/20=5 words where the first consonant in PAN is *q, and 5 words where the second consonant is *q, so that there are 10 words where PIE *s can potentially correspond to PAN *q. The probability that the rule coincidentally works in 7/10 words is 1-((1-0.26^(7-1))^10)=0.31%. Since the same could have happened with any of the 20 consonants, the probability of finding such a pattern in two truly unrelated languages is 1-(1-0.0031)^20=6%. That's pretty low.
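The arithmetic in this paragraph can be re-run with a short script. This is only a sketch under stated assumptions: the function names are made up for illustration, every consonant slot is treated as an independent 1-in-20 draw, and the first step is written as the chance that at least one of 3 candidate words has *q in at least one of 2 consonant positions, which is the reading that gives the 26% figure. The later steps reuse the rounded inputs 0.26 and 0.0031, as the text does.

```python
def p_single_word(n_consonants=20, candidates=3, positions=2):
    """Chance that at least one of `candidates` words has the target
    consonant in at least one of `positions` slots, each slot matching
    independently with probability 1/n_consonants."""
    miss_everywhere = (1 - 1 / n_consonants) ** (candidates * positions)
    return 1 - miss_everywhere


def p_rule_in_list(p_word, hits=7, slots=10):
    """The text's expression 1-((1-p^(hits-1))^slots), reproduced as given."""
    return 1 - (1 - p_word ** (hits - 1)) ** slots


def p_any_consonant(p_rule, n_consonants=20):
    """Chance that at least one of n_consonants shows such a pattern."""
    return 1 - (1 - p_rule) ** n_consonants


if __name__ == "__main__":
    print(f"single word:   {p_single_word():.2%}")
    print(f"7 of 10 words: {p_rule_in_list(0.26):.2%}")
    print(f"any consonant: {p_any_consonant(0.0031):.2%}")
```

With these inputs the three steps come out at roughly 26.5%, 0.31% and 6.0%.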

That would be it! If you want to discuss my theory, go to the "Croatian Toponyms" forum thread I've linked to on the left. I'd like to have some educated supporters (though I realize that's unlikely). Remember, it's easy to criticize science; most scientific theories are wrong. What's hard is to contribute to a scientific theory so that it gains explanatory power, or to make a new one with more explanatory power than the existing ones.

UPDATE on 05/04/2025: You know what, I withdraw the statements I made above. I think the calculation commits the fallacy of relying on wildly unrealistic linguistic assumptions. For example, it assumes that the collision entropy of a consonant in an average language is around log₂(20) bits per symbol. It is not; it is far from it. The collision entropy of a single consonant in the English language, as can be measured using the methods I described elsewhere, is around log₂(11) bits per symbol. Just by taking that into account, the probability of finding a single word in which Proto-Indo-European *s corresponds to Proto-Austronesian *q increases from 26% to 93.82%. And the probability of finding 7 such words on the Swadesh List increases from 0.31% to 99%. The p-value of the pattern that PIE *s corresponds to PAN *q is almost 100%. However, I still think that my alternative interpretation of the Croatian toponyms might be right, because the p-value of the k-r pattern in the Croatian river names does seem to be between 1/300 and 1/17, as computer simulations adjusted for real-world data suggest.
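The figures in the update can be checked the same way. A collision entropy of log₂(n) bits means that two independently drawn consonants coincide with probability 1/n, so the correction amounts to replacing 20 with 11. The sketch below uses hypothetical function names and is written without access to the original script, so treat it as a check on the arithmetic only. It evaluates two ways of combining 3 candidate words and 2 consonant positions: `product_form` gives about 26% at n=20, while `nested_form` gives 93.82% at n=11.

```python
import math


def collision_entropy(probs):
    """Renyi entropy of order 2 in bits: -log2 of the sum of squared probabilities."""
    return -math.log2(sum(p * p for p in probs))


def product_form(n, candidates=3, positions=2):
    """1 - ((1 - 1/n)^candidates)^positions: no slot in any candidate matches."""
    return 1 - (1 - 1 / n) ** (candidates * positions)


def nested_form(n, candidates=3, positions=2):
    """1 - (1 - (1 - 1/n)^candidates)^positions."""
    return 1 - (1 - (1 - 1 / n) ** candidates) ** positions


def p_rule_in_list(p_word, hits=7, slots=10):
    """The expression 1-((1-p^(hits-1))^slots) used for the 7-of-10 step."""
    return 1 - (1 - p_word ** (hits - 1)) ** slots


if __name__ == "__main__":
    # A uniform 20-consonant inventory has collision entropy log2(20).
    print(f"uniform 20: {collision_entropy([1 / 20] * 20):.3f} bits")
    for n in (20, 11):
        print(f"n={n}: product {product_form(n):.2%}, nested {nested_form(n):.2%}")
    print(f"7 of 10 at p=0.9382: {p_rule_in_list(0.9382):.4%}")
```

Both forms give a larger per-word probability at n=11 than the 26% used originally, which is the direction the update argues for. Which form matches the intended model is not something this sketch can settle.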