Croatian Toponyms

Toponyms are often affected by the linguistic phenomenon called tautology: a toponym is often composed of several words from different languages that all mean exactly the same thing. A famous example of that is Torpenhow Hill.

My Interpretation of the Croatian Toponyms

UPDATE on 10/05/2022: If you speak Croatian, you may be interested in watching this YouTube video in which I summarize my ideas about toponyms (or, in case your browser cannot stream it, download the MP4 and open it in VLC or something like that).

Sunset on the island called Mljet.
The Salt Lake on the island of Mljet. It's sometimes suggested that the islands were once the places richest in toponyms, because people had to use every single source of fresh water and every single piece of fertile land.

I was asked to create a web page summarizing my alternative interpretation of the Croatian toponyms, which I have defended on many Internet forums and at some conferences (the full text is available in this PDF on page 70), so that we have everything about it on one page. Here we go!

ATTENTION: Some of the opinions stated in the following text are contrary to mainstream science. I would not advise you to read it if you don't have a substantial background in linguistics. I am not a conspiracy theorist who wants to bombard people with controversial statements they don't know how to evaluate, and I am not denying that my work may be to historical linguistics what Anatoly Fomenko's work is to history. If you are ready to read it, click here.

The remains of the Roman thermae in Issa.
The Roman thermae in Issa (Vis) got their water from a mineral spring that no longer exists. However, it's possible that Issa was named after it, from the Indo-European root *yos (spring).

That would be it! If you want to discuss my theory, go to the "Croatian Toponyms" forum thread I've linked to on the left. I'd like to have some sane opposition there, because I think my interpretation may be right. Ideas are correct or incorrect independently of their creators. The fact that I am not a linguist specializing in those things doesn't mean my ideas are wrong. I've used methods that are well accepted in linguistics (apart from applying statistics to the toponyms, which is for some reason very rarely done); I've just come to conclusions that differ from the mainstream ones.
UPDATE on 09/07/2018: You can download my Illyrian-Croatian dictionary here (it's a .DOCX file!).
UPDATE on 11/04/2021: I managed to install MatLab on my computer. So, here is that Octave program related to entropies modified so that it can be run in MatLab:
% This MATLAB program compares the results given by my entropy-estimation
% algorithm with the results given by Shannon's algorithm.
suglasnici = 'bcdfghjklmnpqrstvwxyz';
testni_stringovi = cell(100 - length(suglasnici) + 1, 1);
% Test string number k is 100 characters long: k letters 'b' followed by
% 100 - k other consonants spread evenly over 'c'..'z'.
for koliko_cemo_staviti_b_ova = 100 - length(suglasnici) + 1 : -1 : 1
  for i = 1 : koliko_cemo_staviti_b_ova
    testni_stringovi{koliko_cemo_staviti_b_ova} = [
      testni_stringovi{koliko_cemo_staviti_b_ova} 'b'
    ];
  end
  for i = 1 : 100 - koliko_cemo_staviti_b_ova
    testni_stringovi{koliko_cemo_staviti_b_ova} = [
        testni_stringovi{koliko_cemo_staviti_b_ova} suglasnici(int32(floor((i - 1) / (100 - koliko_cemo_staviti_b_ova) * (length(suglasnici) - 1))) + strfind(suglasnici, 'c'))
    ];
  end
end
samarzijine_entropije = [];
shannonove_entropije = [];
for i = 1 : length(testni_stringovi)
  str = testni_stringovi{i};
  samarzijine_entropije = [samarzijine_entropije samarzijina_entropija(str)];
  shannonove_entropije = [shannonove_entropije shannonova_entropija(str, suglasnici)];
end
sgtitle('Usporedba Shannonove i Samarzijine entropije generiranih stringova');
% NOTE: the two subplot calls and the two plot calls in the second panel are
% a reconstruction. This listing had lost them, leaving two sets of axis
% labels and a two-entry legend without the matching plot commands.
subplot(1, 2, 1);
plot(shannonove_entropije, samarzijine_entropije);
xlabel('Shannonova entropija');
ylabel('Samarzijina entropija');
subplot(1, 2, 2);
hold on;
plot(shannonove_entropije);
plot(samarzijine_entropije);
xlabel('Broj b-ova u stringu');
ylabel('Entropija (bit/simbol)');
legend('Shannonova entropija', 'Samarzijina entropija');

% Shannon entropy of the string, in bits per symbol.
function ret = shannonova_entropija(str, suglasnici)
  apsolutne_frekvencije = [];
  for i = 1 : length(suglasnici)
    apsolutne_frekvencije = [apsolutne_frekvencije 0];
  end
  for i = 1 : length(str)
    znak = str(i);
    apsolutne_frekvencije(strfind(suglasnici, znak)) = apsolutne_frekvencije(strfind(suglasnici, znak)) + 1;
  end
  relativne_frekvencije = apsolutne_frekvencije / length(str);
  ret = 0;
  for relativna_frekvencija = relativne_frekvencije
    if relativna_frekvencija > 0
      ret = ret - log2(relativna_frekvencija) * relativna_frekvencija;
    end
  end
end

% My estimate: pick two random positions many times and take -log2 of the
% share of attempts in which the two symbols were equal.
function ret = samarzijina_entropija(str)
  broj_pokusaja = 10000;
  broj_pogodaka = 0;
  for i = 1 : broj_pokusaja
    prvi = int32(floor(rand() * length(str) + 1));
    drugi = int32(floor(rand() * length(str) + 1));
    if str(prvi) == str(drugi)
      broj_pogodaka = broj_pogodaka + 1;
    end
  end
  omjer_pogodaka = broj_pogodaka / broj_pokusaja;
  ret = -log2(omjer_pogodaka);
end
Here is what it outputs:
The output of the MatLab program above.
UPDATE on 14/04/2021: I have found out why the Octave program and the MatLab program give wildly different results. Namely, there was a syntax error in my program. MatLab refused to parse it, but Octave apparently performed automatic semicolon insertion (as JavaScript engines do). That resulted in incorrect test strings in testni_stringovi. This is what the test strings look like in MatLab when exported to a CSV file, and this is what they look like when exported from Octave.
Anyway, since the number of possible consonant pairs in Croatian is 26*26=676, the maximal possible entropy of a consonant pair in the Croatian language is log2(676)=9.4 bits/symbol. We have measured the Shannon entropy to be log2(229)=7.839 bits/symbol. So, assuming the curve relating Samaržija's entropy to the Shannon entropy does not change its shape between individual consonants and consonant pairs, but only scales uniformly (which I have no idea how to test), we can estimate Samaržija's entropy of the consonant pairs in Croatian as follows. We can assume the entropy of pairs of consonants is log2(676)/log2(21)=2.14 times bigger than the corresponding entropy of individual consonants. The ratio between the measured Shannon entropy and the maximal possible entropy is in this case 7.839/9.4=0.834. Thus, the corresponding point on the curve in the above diagram is where the Shannon entropy equals 0.834*log2(21)=3.663 bits/symbol. Samaržija's entropy at that point, as can be read from the diagram, is around 2.8 bits/symbol. Thus, we can expect Samaržija's entropy of the pairs of consonants in Croatian to be around 2.14*2.8=5.992 bits/symbol, and the probability of two random words beginning with the same pair of consonants to be around 1/(2^5.992)=1/63.65=1.57%. If that is true, then the p-value of the pattern of Croatian river names starting with *karr~kurr is only 5.9% (the highest estimate I got by running the birthday-paradox calculation written in C a few times), rather than around 1/500. Well, I guess it is always like that in the social sciences: if you think you have a good p-value, you are probably calculating something incorrectly.
Of course, whether that is a correct estimate of the p-value depends on where the entropy of the language goes. If it is mostly syntax and morphology that decrease the entropy of the language, then those decreases do not matter in toponyms borrowed from an ancient language. They matter only if they come from the phonology. See the paper I linked below for a lengthy discussion of that, including my attempts to estimate how much of the decrease in entropy each part of the grammar is responsible for.
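The chain of arithmetic behind the 5.992 bits/symbol figure can be retraced with a short script. This is only a sketch: the function name and the parameter names are made up for illustration, every input is a number quoted in the text, and the value 2.8 is still the one read off the diagram by eye rather than something the script computes.

```javascript
// Retraces the scaling estimate step by step. Every input is a figure quoted
// in the text; nothing here is measured anew.
function estimatePairCollisionEntropy({
  consonantsInLanguage, // 26 in the text
  consonantsInTestStrings, // 21, the length of 'bcdfghjklmnpqrstvwxyz'
  shannonPairEntropy, // measured: log2(229)
  collisionEntropyFromDiagram, // about 2.8 bits/symbol, read off by eye
}) {
  const maxPairEntropy = Math.log2(consonantsInLanguage ** 2);
  const maxSingleEntropy = Math.log2(consonantsInTestStrings);
  // How much bigger pair entropies are assumed to be than single-consonant ones.
  const scale = maxPairEntropy / maxSingleEntropy;
  // Where on the single-consonant curve the measured pair entropy lands.
  const pointOnCurve = (shannonPairEntropy / maxPairEntropy) * maxSingleEntropy;
  const pairCollisionEntropy = scale * collisionEntropyFromDiagram;
  // Chance that two random words begin with the same pair of consonants.
  const sameStartProbability = 2 ** -pairCollisionEntropy;
  return {
    maxPairEntropy,
    scale,
    pointOnCurve,
    pairCollisionEntropy,
    sameStartProbability,
  };
}

const estimate = estimatePairCollisionEntropy({
  consonantsInLanguage: 26,
  consonantsInTestStrings: 21,
  shannonPairEntropy: Math.log2(229),
  collisionEntropyFromDiagram: 2.8,
});
for (const [name, value] of Object.entries(estimate))
  console.log(`${name}: ${value.toFixed(4)}`);
```

Because the script does not round the intermediate results, its last digits can differ slightly from the rounded figures in the paragraph.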

UPDATE on 29/04/2021: You can see the draft of the next paper about linguistics I am planning to publish.

UPDATE on 14/09/2021: I have written a paper explaining what I think about the name Karašica, summarizing many of the things explained in the paper linked above. If you cannot open it, try opening this HTML file.

UPDATE on 06/10/2021: I asked a professional historical linguist, Dubravka Ivšić, via e-mail what she thinks about my text on the river name Karašica, and I am posting her answer here (translated from Croatian) because, as I have said, I am not a conspiracy theorist who wants people not to hear both sides of the story: Dear Teo,
thank you for your e-mail and for your interest in pre-Slavic toponymy.
Synchronically speaking, the name Karašica is Slavic, given that it is derived with the Slavic suffix -ica. The question is where the stem (karas- or karaš-) comes from, but that does not change the first fact (just as, for example, Jurica is a name derived with a Croatian suffix from a stem of Greek origin, which makes it a Croatian name). As far as I know, the hydronym Karašica was first recorded only in the 17th century; in Hungarian it is called Karassó. If you truly want to respect scientific methodology, you would need to collect the historical attestations of the hydronym Karašica (from written sources and from old maps) and determine which form is the oldest. Given that the Danubian Karašica also flows through Hungary, for it one must also consider that the name was borrowed into Hungarian from Croatian and vice versa, from Croatian into Hungarian. Also, the stem karaš- is productive in other toponyms as well (including outside Croatia), so one would need to determine whether they are all related, i.e. whether it is the same onomastic stem.
Formally speaking, there is no obstacle to the hydronym Karašica being derived from the fish name karas or karaš (that name does not refer only to the goldfish), and the deeper origin of the fish name is in this case not relevant to the hydronym (just as Krapina is most probably derived from the fish name krap).
As for the other rivers you mention whose names contain k-r: Krka could indeed be a pre-Slavic name, Korana is of uncertain origin, Krbavica is derived from Krbava, and Kravarščica is derived from Kravarsko (which is derived from kravar).
The Indo-European root you mention is reconstructed as *k(')ers- with the meaning 'to run', and there are opinions that the word for horse in the Germanic languages developed from it. The Indo-European word for horse is reconstructed as *h1ek'u-. The argument you begin with "many Illyrian inscriptions begin with" completely misses the mark, given that there are no inscriptions written in the "Illyrian language".
Mathematical methods can be useful in linguistics in some cases, but they cannot replace classical linguistic methods. There are no shortcuts in historical toponymy.
Kind regards,
Dubravka Ivšić Majić
Anyway, what do you think: who is really being more scientific here? Is it me, who has attempted to measure the collision entropy of different parts of the Croatian grammar and has done numerical calculations showing that the probability of that k-r pattern occurring by chance is somewhere between 1/300 and 1/17? Or is it her, who makes arguments from silence (that the name Karašica is unlikely to date back to antiquity because its first known attestation is as late as the 17th century; that is also historically inaccurate, since the name Karašica is first mentioned in a document from the year 1228, together with a dubious piece of information that it used to be called Mogioros in antiquity; and even if it were true, it would be much like saying that Marco Polo never really went to China because he did not mention the Great Wall or tea), does some intricate theoretical reasoning that overshadows my experimental results (like the contemporary response to Ignaz Semmelweis's experiment showing that puerperal fever was caused almost exclusively by uncleanliness), and asserts that traditional methods are superior to mathematical methods?

UPDATE on 18/12/2021: I have made a LibreOffice presentation about my alternative interpretation of the names of places in Croatia.

UPDATE on 26/12/2021: I have written a short summary of the ideas presented in the presentation: To summarize, I think I have found a way to measure the collision entropy of the different parts of the grammar. The entropy of the syntax can be measured by measuring the entropy of a spell-checker word list, such as that of Aspell, and subtracting from it the entropy of a long text in the same language. I got that, for example, the entropy of the syntax of the Croatian language is log2(14)-log2(13)=0.107 bits per symbol, that the entropy of the syntax of the English language is log2(13)-log2(11)=0.241 bits per symbol, and that the entropy of the syntax of the German language is log2(15)-log2(12)=0.3219 bits per symbol. It was rather surprising to me that the entropy of the syntax of German is larger than that of English, given that German syntax seems simpler (it relies on morphology more than English does, which somewhat simplifies the syntax), but you cannot argue with hard data. The entropy of the phonotactics of a language can, I guess, be measured by measuring the entropy of consonant pairs (with or without a vowel inside them) in a spell-checker word list, then measuring the entropy of single consonants in that same word list, and then subtracting the former from twice the latter. I measured the entropy of the phonotactics of the Croatian language to be 2*log2(14)-5.992=1.623 bits per consonant pair. Now, I have taken the entropy of the phonotactics to be the lower bound of the entropy of the phonology, which is the only entropy that matters in ancient toponyms (the entropy of the syntax and of the morphology does not matter there, because the toponym was created in a foreign language).
Given that the Croatian language has 26 consonants, the upper bound of the entropy of the morphology, which does not matter when dealing with ancient toponyms, can be estimated as log2(26*26)-1.623-2*0.107-5.992=1.572 bits per pair of consonants. So, to estimate the p-value of the pattern that many names of rivers in Croatia begin with the consonants 'k' and 'r' (Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica), I have done some birthday calculations, first setting the simulated entropy of the phonology to 1.623 bits per consonant pair, and then setting it to 1.623+1.572=3.195 bits per consonant pair. The former gave me a probability of that k-r pattern occurring by chance of 1/300, and the latter gave me a probability of 1/17. So the p-value of that k-r pattern is somewhere between 1/300 and 1/17. I therefore concluded that the simplest explanation is that the river names Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica are related and all come from the Indo-European root *kjers, meaning horse (in Germanic languages) or to run (in Celtic and Italic languages). Do those arguments sound compelling to you?
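The bound on the entropy of the morphology and the kind of birthday calculation described above can be sketched in a few lines. Treat this as an illustration under stated assumptions, not as the program that produced 1/300 and 1/17: the function names are invented, the translation of a simulated phonology entropy h into 2^(log2(676) - h) equally likely consonant pairs is my reading of the text, and the 100 rivers and the threshold of 7 collisions are the values used in the NodeJS program further down this page.

```javascript
// Figures quoted in the summary above.
const phonotacticsEntropy = 1.623; // bits per consonant pair
const syntaxEntropy = 0.107; // bits per symbol
const pairCollisionEntropy = 5.992; // bits per consonant pair
const maxPairEntropy = Math.log2(26 * 26);

// Upper bound of the entropy of the morphology, as in the text.
const morphologyUpperBound =
  maxPairEntropy - phonotacticsEntropy - 2 * syntaxEntropy - pairCollisionEntropy;

// ASSUMPTION: a phonology entropy of h bits leaves 2 ** (maxPairEntropy - h)
// equally likely consonant pairs for a name to begin with.
function equallyLikelyPairs(phonologyEntropy) {
  return 2 ** (maxPairEntropy - phonologyEntropy);
}

// Monte Carlo estimate of the chance that at least `threshold` of `riverCount`
// names begin with the same pair, when pairs are drawn uniformly at random.
function collisionProbability(pairCount, riverCount, threshold, trials) {
  const buckets = Math.ceil(pairCount); // pairCount need not be an integer
  let hits = 0;
  for (let trial = 0; trial < trials; trial++) {
    const counts = new Array(buckets).fill(0);
    for (let river = 0; river < riverCount; river++) {
      const pair = Math.floor(Math.random() * pairCount);
      if (++counts[pair] >= threshold) {
        hits++;
        break; // one qualifying pair is enough for this trial
      }
    }
  }
  return hits / trials;
}

console.log("Upper bound of morphology entropy:", morphologyUpperBound.toFixed(3));
for (const phonologyEntropy of [1.623, 1.623 + morphologyUpperBound]) {
  const pairs = equallyLikelyPairs(phonologyEntropy);
  const p = collisionProbability(pairs, 100, 7, 20000);
  console.log(`h = ${phonologyEntropy.toFixed(3)}: ${pairs.toFixed(1)} pairs, p = ${p}`);
}
```

The estimates vary from run to run, and with 20000 trials a probability near 1/300 is resolved only roughly, so more trials are needed before comparing the output with the bounds quoted above.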
UPDATE on 16/09/2021: The Etruscan letters are apparently flipped left-to-right on Android; I have started a Reddit thread about that.

UPDATE on 06/01/2022: A lot of the responses I get on Internet forums when I share my ideas boil down to "You should not use mathematics in this part of linguistics.". Well, here is how I will respond to them (translated from Croatian): Just go on blindly believing that statistics and computer science have nothing to say about Croatian toponyms. You know so little about computer science and statistics that you deny they are useful at all. That is the first stage of the Dunning-Kruger effect, when you deny that a skill is useful. Actually, there is a better description of what is happening to you: you are in the position of Darwin when he commented on Mendel's work: "Mathematics is in biology what a scalpel is in a carpenter's workshop; it has no business being there.". Today that sounds ridiculous. Actually, forget it, you are not even at that level; you are at the level of those who rejected Indo-European linguistics because of their blind faith in the story of the Tower of Babel and their belief that linguistics has nothing to say about it. And I am sorry that in the 21st century there are people who think that way, as if the last few centuries of scientific progress had taught them nothing. There is no real science without statistics. Whether or not my theories are correct, "You should not use mathematics in this part of linguistics." is a ridiculous argument and deserves such a response.

UPDATE on 13/01/2022: Here is the table with the data about collision entropy of various languages, which I have measured for purposes of my experiment:
Language name | Collision entropy of consonants in a long text | The most common consonant in a long text | Collision entropy of consonants in the Aspell word-list | The most common consonant in the Aspell word-list | Collision entropy of the syntax

UPDATE on 21/03/2022: I have written a NodeJS program that does all the calculations described here automatically, with no need to copy results from one program into another:
"use strict";
let suglasnici = "bcčćdđfghjklmnpqrsštvwxyzž"; // NodeJS supports non-ASCII (Croatian...)
// characters in strings.
suglasnici += suglasnici.toUpperCase();
const datotecniSustav = require("fs");
const dugacakTekst = datotecniSustav.readFileSync("tekst.txt", {
  encoding: "utf-8",
  flag: "r"
});
let mapaSaSuglasnicima = new Map();
for (const znak of dugacakTekst)
  if (suglasnici.indexOf(znak.toLowerCase()) !== -1)
    mapaSaSuglasnicima.set(
      znak.toLowerCase(),
      (mapaSaSuglasnicima.get(znak.toLowerCase()) | 0) + 1
    );
let zbroj = 0;
for (const apsolutna_frekvencija of mapaSaSuglasnicima.values())
  zbroj += apsolutna_frekvencija;
let kolizijskaEntropijaSuglasnikaUDugackomTekstu = 0;
for (const apsolutna_frekvencija of mapaSaSuglasnicima.values())
  kolizijskaEntropijaSuglasnikaUDugackomTekstu +=
    (apsolutna_frekvencija / zbroj) ** 2;
kolizijskaEntropijaSuglasnikaUDugackomTekstu = -Math.log2(
  kolizijskaEntropijaSuglasnikaUDugackomTekstu
);
const rjecnik = datotecniSustav.readFileSync("croatian.wl", {
  encoding: "utf-8",
  flag: "r"
});
mapaSaSuglasnicima = new Map();
for (const znak of rjecnik)
  if (suglasnici.indexOf(znak.toLowerCase()) !== -1)
    mapaSaSuglasnicima.set(
      znak.toLowerCase(),
      (mapaSaSuglasnicima.get(znak.toLowerCase()) | 0) + 1
    );
zbroj = 0;
for (const apsolutna_frekvencija of mapaSaSuglasnicima.values())
  zbroj += apsolutna_frekvencija;
let kolizijskaEntropijaSuglasnikaURjecniku = 0;
for (const apsolutna_frekvencija of mapaSaSuglasnicima.values())
  kolizijskaEntropijaSuglasnikaURjecniku +=
    (apsolutna_frekvencija / zbroj) ** 2;
kolizijskaEntropijaSuglasnikaURjecniku = -Math.log2(
  kolizijskaEntropijaSuglasnikaURjecniku
);
let mapaSParovimaSuglasnika = new Map();
for (const prvi of suglasnici)
  for (const drugi of suglasnici)
    mapaSParovimaSuglasnika.set((prvi + drugi).toLowerCase(), 0);
let prethodni,
  sadasnji,
  brojac = 0;
for (const znak of rjecnik) {
  if (suglasnici.indexOf(znak) !== -1) {
    prethodni = sadasnji;
    sadasnji = znak.toLowerCase();
    if (prethodni !== undefined) {
      brojac++;
      mapaSParovimaSuglasnika.set(
        prethodni + sadasnji,
        mapaSParovimaSuglasnika.get(prethodni + sadasnji) + 1
      );
    }
  }
}
let shannonovaEntropijaParovaSuglasnika = 0,
  kolizijskaEntropijaParovaSuglasnika = 0;
for (const apsolutnaFrekvencija of mapaSParovimaSuglasnika.values())
  if (apsolutnaFrekvencija) {
    shannonovaEntropijaParovaSuglasnika -=
      (apsolutnaFrekvencija / brojac) *
      Math.log2(apsolutnaFrekvencija / brojac);
    kolizijskaEntropijaParovaSuglasnika += (apsolutnaFrekvencija / brojac) ** 2;
  }
kolizijskaEntropijaParovaSuglasnika = -Math.log2(
  kolizijskaEntropijaParovaSuglasnika
);
console.log(
  "Kolizijska entropija suglasnika u dugačkom tekstu: " +
    kolizijskaEntropijaSuglasnikaUDugackomTekstu +
    "=log2(" +
    2 ** kolizijskaEntropijaSuglasnikaUDugackomTekstu +
    ")"
);
console.log(
  "Kolizijska entropija suglasnika u rječniku: " +
    kolizijskaEntropijaSuglasnikaURjecniku +
    "=log2(" +
    2 ** kolizijskaEntropijaSuglasnikaURjecniku +
    ")"
);
console.log(
  "Kolizijska entropija sintakse: " +
    (kolizijskaEntropijaSuglasnikaURjecniku -
      kolizijskaEntropijaSuglasnikaUDugackomTekstu)
);
console.log(
  "Shannonova entropija parova suglasnika u rječniku: " +
    shannonovaEntropijaParovaSuglasnika
);
console.log(
  "Kolizijska entropija parova suglasnika u rječniku: " +
    kolizijskaEntropijaParovaSuglasnika
);
console.log(
  "Kolizijska entropija fonotaktike: " +
    (2 * kolizijskaEntropijaSuglasnikaURjecniku -
      kolizijskaEntropijaParovaSuglasnika)
);
let iznad_koliko_kolizija_brojimo = 7, // That is how many rivers in Croatia, as far as I know, begin with k-r: Karašica (twice, one flows into the Drava and the other into the Danube), Krka, Korana, Krbavica, Krapina and Kravarščica.
  koliko_ima_rijeka_u_Hrvatskoj = 100, // If anybody has an idea how to estimate this more accurately, feel free to contact me.
  koliko_smo_puta_dobili_toliko_kolizija = 0,
  koliko_smo_puta_izvrtili_simulaciju = 1_000_000;
for (let brojac = 0; brojac < koliko_smo_puta_izvrtili_simulaciju; brojac++) {
  let koliko_rijeka_pocinje_na_taj_par_suglasnika = [];
  for (
    let brojac = 0;
    brojac <
    2 **
      (kolizijskaEntropijaParovaSuglasnika +
        2 *
          (kolizijskaEntropijaSuglasnikaURjecniku -
            kolizijskaEntropijaSuglasnikaUDugackomTekstu));
    brojac++
  )
    koliko_rijeka_pocinje_na_taj_par_suglasnika.push(0);
  for (let brojac = 0; brojac < koliko_ima_rijeka_u_Hrvatskoj; brojac++)
    koliko_rijeka_pocinje_na_taj_par_suglasnika[
      Math.floor(
        Math.random() *
          2 **
            (kolizijskaEntropijaParovaSuglasnika +
              2 *
                (kolizijskaEntropijaSuglasnikaURjecniku -
                  kolizijskaEntropijaSuglasnikaUDugackomTekstu))
      )
    ] += 1;
  let jesmo_li_nasli_potreban_broj_kolizija = false;
  for (
    let brojac = 0;
    brojac <
    2 **
      (kolizijskaEntropijaParovaSuglasnika +
        2 *
          (kolizijskaEntropijaSuglasnikaURjecniku -
            kolizijskaEntropijaSuglasnikaUDugackomTekstu));
    brojac++
  )
    if (
      koliko_rijeka_pocinje_na_taj_par_suglasnika[brojac] >=
      iznad_koliko_kolizija_brojimo
    ) {
      jesmo_li_nasli_potreban_broj_kolizija = true;
      break;
    }
  if (jesmo_li_nasli_potreban_broj_kolizija)
    koliko_smo_puta_dobili_toliko_kolizija += 1;
}
console.log(
  `Vjerojatnost da ${iznad_koliko_kolizija_brojimo} od ${koliko_ima_rijeka_u_Hrvatskoj} hidronima slučajno počinje na isti par suglasnika iznosi ${
    (koliko_smo_puta_dobili_toliko_kolizija /
      koliko_smo_puta_izvrtili_simulaciju) *
    100
  }%.`
);
This time, to calculate the collision entropy, instead of using the complicated algorithm that follows right from the definition (choose two symbols from the string at random, check whether they are equal, and repeat that many times), I used a much simpler algorithm described on Wikipedia. I must admit my understanding of the issue has improved drastically.
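To make the difference between the two ways of computing collision entropy concrete, here is a small side-by-side sketch. The function names and the sample string are made up for illustration; the sampling version mirrors the idea of the MatLab function samarzijina_entropija (two random positions, drawn independently), and the closed-form version is the usual -log2 of the sum of squared relative frequencies.

```javascript
// Relative frequencies of the symbols in a string.
function relativeFrequencies(str) {
  const symbols = Array.from(str);
  const counts = new Map();
  for (const symbol of symbols)
    counts.set(symbol, (counts.get(symbol) || 0) + 1);
  return [...counts.values()].map((count) => count / symbols.length);
}

// Closed form: -log2 of the probability that two independently drawn
// symbols are equal, which is the sum of the squared relative frequencies.
function collisionEntropyExact(str) {
  let collisionProbability = 0;
  for (const p of relativeFrequencies(str)) collisionProbability += p ** 2;
  return -Math.log2(collisionProbability);
}

// Sampling version: estimate the same probability by repeated random draws.
function collisionEntropySampled(str, trials) {
  const symbols = Array.from(str);
  let hits = 0;
  for (let i = 0; i < trials; i++) {
    const first = symbols[Math.floor(Math.random() * symbols.length)];
    const second = symbols[Math.floor(Math.random() * symbols.length)];
    if (first === second) hits++;
  }
  return -Math.log2(hits / trials);
}

// Shannon entropy, for comparison.
function shannonEntropy(str) {
  let entropy = 0;
  for (const p of relativeFrequencies(str)) entropy -= p * Math.log2(p);
  return entropy;
}

const sample = "bbbbbbcdfg"; // hypothetical test string
console.log("Closed form:", collisionEntropyExact(sample));
console.log("Sampled:    ", collisionEntropySampled(sample, 100000));
console.log("Shannon:    ", shannonEntropy(sample));
```

The sampled value fluctuates around the closed-form one and approaches it as the number of trials grows, while the closed form needs only one pass over the string. The collision entropy never exceeds the Shannon entropy of the same distribution.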

UPDATE on 23/03/2022: Here is how I responded to somebody comparing me to theologians who try to use mathematics to prove the existence of God:
I think that, if the ontological argument were good, mathematical logic would be an excellent tool for proving the existence of God. Unfortunately, the ontological argument rests on two premises that are, to say the least, highly questionable:
  1. God exists in some possible worlds. In other words, the omnipotence paradox and other a priori arguments against the existence of God are not valid.
  2. That which is perfect and exists in some possible worlds exists in all possible worlds. That is questionable because, for instance, a perfect circle seems to exist in some possible worlds, but not in ours.
Mathematical logic can at most prove that there is a form of the ontological argument that is logically valid, but, because of those questionable premises, that does not tell us that God exists. That is, mathematical logic can at most prove that Kant was wrong in claiming that the hidden premise of every form of the ontological argument is that existence is a logical predicate, but that does not remove the problem that those two premises are questionable.
All in all, the problem with any form of the ontological argument is not that it uses mathematical logic.

UPDATE on 19/04/2022: I have written a script for my new YouTube video about toponyms.

UPDATE on 19/04/2022: I have published a YouTube video about my alternative interpretation of Croatian toponyms. If you cannot open it, try opening this MP4 video in VLC or a similar program.

UPDATE on 18/06/2022: My informatics professor Anđelko Lišnjić suggested that I make a table of the frequencies of consonant pairs in the Croatian language. So I did that!
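For readers who want to build a similar table from their own text, here is a minimal sketch of how such a table can be computed. It is not the script that produced the linked table: the function name and the sample text are hypothetical, the list of consonants is copied from the NodeJS program above, and, like that program, it counts consecutive consonants whether or not a vowel stands between them and does not reset at word boundaries.

```javascript
// Croatian consonants plus q, w, x, y, as in the NodeJS program on this page.
const CONSONANTS = "bcčćdđfghjklmnpqrsštvwxyzž";

// Returns rows { pair, count, share }, most frequent pair first.
function consonantPairTable(text) {
  const counts = new Map();
  let previous;
  let total = 0;
  for (const character of text.toLowerCase()) {
    if (!CONSONANTS.includes(character)) continue; // skip vowels, spaces, etc.
    if (previous !== undefined) {
      const pair = previous + character;
      counts.set(pair, (counts.get(pair) || 0) + 1);
      total++;
    }
    previous = character;
  }
  return [...counts.entries()]
    .map(([pair, count]) => ({ pair, count, share: count / total }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical sample input: the river names discussed on this page.
console.table(
  consonantPairTable("Karašica Krka Korana Krbavica Krapina Kravarščica")
);
```

For a real table the input would be a long text or a spell-checker word list rather than six names; with a word list it may be preferable to reset `previous` at each line break so that pairs do not span two words.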