Croatian Toponyms

Toponyms are often affected by the linguistic phenomenon called tautology: a toponym is often composed of several words from different languages that all mean exactly the same thing. A famous example is Torpenhow Hill.

My Interpretation of the Croatian Toponyms

Sunset on the island called Mljet.
The Salt Lake on the Mljet Island. It's sometimes suggested that the islands were once the places richest in toponyms, because people had to use every single source of fresh water and every single piece of fertile land.

I was asked to create a web page summarizing my alternative interpretation of the Croatian toponyms, which I have defended on many Internet forums and at some conferences (the full text is available in this PDF, on page 70), so that everything about it is on one page. Here we go!

ATTENTION: Some of the opinions stated in the following text are contrary to mainstream science. I do not advise reading it unless you have a substantial background in linguistics. I am not a conspiracy theorist who wants to bombard people with controversial statements they don't know how to evaluate, and I do not deny that my work may be to historical linguistics what Anatoly Fomenko's work is to history. If you are ready to read it, click here.

The remains of the Roman thermae in Issa.
The Roman thermae in Issa (Vis) got their water from a mineral spring that doesn't exist any more. However, it's possible that Issa was named after it, from the Indo-European root *yos (spring).

That would be it! If you want to discuss my theory, go to the "Croatian Toponyms" forum thread I've linked to on the left. I'd like to have some sane opposition there, because I think my interpretation may be right. Ideas are correct or incorrect independently of their creators. The fact that I am not a linguist specializing in those things doesn't mean my ideas are wrong. I've used methods that are well accepted in linguistics (apart from applying statistics to the toponyms, which is for some reason very rarely done). I've just come to conclusions that differ from the mainstream ones.
UPDATE on 09/07/2018: You can download my Illyrian-Croatian dictionary here (it's a .DOCX file!).
UPDATE on 11/04/2021: I managed to install MATLAB on my computer. So, here is that Octave program related to entropies, modified so that it can be run in MATLAB:
% This is a MATLAB program that compares the results given by my
% entropy-estimation algorithm with the results given by Shannon's algorithm.
suglasnici = 'bcdfghjklmnpqrstvwxyz';
% testni_stringovi{k} is a 100-character string: k b's, followed by the
% consonants from 'c' onwards spread over the remaining 100 - k positions.
testni_stringovi = cell(100 - length(suglasnici) + 1, 1);
for koliko_cemo_staviti_b_ova = 100 - length(suglasnici) + 1 : -1 : 1
  for i = 1 : koliko_cemo_staviti_b_ova
    testni_stringovi{koliko_cemo_staviti_b_ova} = [
      testni_stringovi{koliko_cemo_staviti_b_ova} 'b'
    ];
  end
  for i = 1 : 100 - koliko_cemo_staviti_b_ova
    testni_stringovi{koliko_cemo_staviti_b_ova} = [
        testni_stringovi{koliko_cemo_staviti_b_ova} suglasnici(int32(floor((i - 1) / (100 - koliko_cemo_staviti_b_ova) * (length(suglasnici) - 1))) + strfind(suglasnici, 'c'))
    ];
  end
end
% Compute both entropies for every test string.
samarzijine_entropije = [];
shannonove_entropije = [];
for i = 1 : length(testni_stringovi)
  str = testni_stringovi{i};
  samarzijine_entropije = [samarzijine_entropije samarzijina_entropija(str)];
  shannonove_entropije = [shannonove_entropije shannonova_entropija(str, suglasnici)];
end
% Title: "Comparison of Shannon's and Samarzija's entropy of the generated strings".
sgtitle('Usporedba Shannonove i Samarzijine entropije generiranih stringova');
% The two subplot calls and the two plot calls of the second panel are a
% reconstruction, assumed from the duplicated axis labels and the legend.
% First panel: Samarzija's entropy as a function of Shannon's entropy.
subplot(1, 2, 1);
plot(shannonove_entropije, samarzijine_entropije);
xlabel('Shannonova entropija');
ylabel('Samarzijina entropija');
% Second panel: both entropies as a function of the number of b's.
subplot(1, 2, 2);
plot(shannonove_entropije);
hold on;
plot(samarzijine_entropije);
xlabel('Broj b-ova u stringu');      % "Number of b's in the string"
ylabel('Entropija (bit/simbol)');    % "Entropy (bits/symbol)"
legend('Shannonova entropija', 'Samarzijina entropija');

% Shannon entropy of str over the alphabet suglasnici, in bits/symbol.
function ret = shannonova_entropija(str, suglasnici)
  apsolutne_frekvencije = [];
  for i = 1 : length(suglasnici)
    apsolutne_frekvencije = [apsolutne_frekvencije 0];
  end
  for i = 1 : length(str)
    znak = str(i);
    apsolutne_frekvencije(strfind(suglasnici, znak)) = apsolutne_frekvencije(strfind(suglasnici, znak)) + 1;
  end
  relativne_frekvencije = apsolutne_frekvencije / length(str);
  ret = 0;
  for relativna_frekvencija = relativne_frekvencije
    if relativna_frekvencija > 0
      ret = ret - log2(relativna_frekvencija) * relativna_frekvencija;
    end
  end
end

% Samarzija's entropy of str: pick two random positions many times and take
% -log2 of the fraction of attempts in which both hold the same character.
function ret = samarzijina_entropija(str)
  broj_pokusaja = 10000;
  broj_pogodaka = 0;
  for i = 1 : broj_pokusaja
    prvi = int32(floor(rand() * length(str) + 1));
    drugi = int32(floor(rand() * length(str) + 1));
    if str(prvi) == str(drugi)
      broj_pogodaka = broj_pogodaka + 1;
    end
  end
  omjer_pogodaka = broj_pogodaka / broj_pokusaja;
  ret = -log2(omjer_pogodaka);
end
Here is what it outputs:
The output of the MATLAB program above.
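The sampling loop in samarzijina_entropija above picks two positions independently, so the hit ratio it measures approximates the sum of the squared relative frequencies. Its negative base-2 logarithm is what is usually called the collision entropy. Here is a minimal sketch, under that reading, that computes both entropies exactly from the letter frequencies, so that the sampled values can be checked against a closed form. The example string and the variable names are only illustrative and are not part of the original program.

```matlab
% Illustrative input: 80 b's followed by 20 c's (not one of the test
% strings generated by the program above).
str = [repmat('b', 1, 80) repmat('c', 1, 20)];

% Relative frequency of every distinct character in the string.
[~, ~, indeksi] = unique(str);
frekvencije = accumarray(indeksi(:), 1) / length(str);

% Shannon entropy: -sum(p * log2(p)).
shannonova = -sum(frekvencije .* log2(frekvencije));

% Collision entropy: -log2(sum(p^2)). sum(p^2) is exactly the probability
% that two independently chosen positions hold the same character, which
% is the quantity the sampling loop estimates.
kolizijska = -log2(sum(frekvencije .^ 2));

fprintf('Shannon entropy:   %.4f bits/symbol\n', shannonova);
fprintf('Collision entropy: %.4f bits/symbol\n', kolizijska);
```

The collision entropy never exceeds the Shannon entropy, and the two coincide only when all the characters that occur are equally frequent, which is why the measured curve stays on or below the diagonal.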
UPDATE on 14/04/2021: I have found out why the Octave program and the MATLAB program give wildly different results: there was a syntax error in my program. MATLAB refused to parse it, but Octave apparently performs automatic semicolon insertion (as JavaScript engines do). That resulted in incorrect test strings in testni_stringovi. This is what the test strings look like in MATLAB when exported to a CSV file, and this is what they look like when exported from Octave.
Anyway, since the number of possible consonant pairs in Croatian is 26*26=676, the maximal possible entropy of a consonant pair in the Croatian language is log2(676)=9.4 bits/symbol, and we have measured the Shannon entropy to be log2(229)=7.839 bits/symbol. Assuming that the curve relating Samaržija's entropy to the Shannon entropy keeps its shape between individual consonants and consonant pairs and only scales uniformly (which I have no idea how to test), we can estimate Samaržija's entropy of the consonant pairs in Croatian as follows. We assume the entropy of the pairs of consonants is log2(676)/log2(21)=2.14 times bigger than the corresponding entropy of individual consonants. The ratio between the measured Shannon entropy and the maximal possible entropy is 7.839/9.4=0.834, so the corresponding point on the curve in the diagram above is where the Shannon entropy equals 0.834*log2(21)=3.663 bits/symbol. Samaržija's entropy at that point, as can be read from the diagram, is around 2.8 bits/symbol. We can therefore expect Samaržija's entropy of the pairs of consonants in Croatian to be around 2.14*2.8=5.992 bits/symbol, and the probability of two random words beginning with the same pair of consonants to be around 1/(2^5.992)=1/63.65=1.57%.
If that is true, then the p-value of the pattern of Croatian river names starting with *karr~kurr is about 5.9% (the highest estimate I got by running the birthday-paradox calculation written in C a few times), rather than around 1/500. Well, I guess it is always like that in social sciences: if you think you have a good p-value, you are probably calculating something incorrectly.
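The chain of estimates above can be retraced as a short script. This is only a minimal sketch: every input comes from the paragraph above (including the value 2.8, which was read off the diagram by eye, and the uniform-scaling assumption, which is untested), while the variable names are illustrative. Because the script does not round the intermediate results, the last digits differ slightly from the ones quoted in the text.

```matlab
% Inputs taken from the text above.
broj_parova = 26 * 26;              % number of possible consonant pairs
broj_suglasnika = 21;               % length of 'bcdfghjklmnpqrstvwxyz'
izmjerena_shannonova = log2(229);   % measured Shannon entropy of the pairs
ocitana_samarzijina = 2.8;          % read off the diagram by eye

maksimalna_za_parove = log2(broj_parova);          % about 9.4 bits/symbol
maksimalna_za_suglasnike = log2(broj_suglasnika);  % about 4.39 bits/symbol

% Assumed uniform scaling factor between the curve for single consonants
% and the curve for consonant pairs.
faktor = maksimalna_za_parove / maksimalna_za_suglasnike;

% Fraction of the maximal possible entropy that was actually measured.
omjer = izmjerena_shannonova / maksimalna_za_parove;

% Corresponding point on the x-axis of the single-consonant diagram.
tocka_na_dijagramu = omjer * maksimalna_za_suglasnike;

% Scale the value read off the diagram back up to consonant pairs.
samarzijina_za_parove = faktor * ocitana_samarzijina;

% Probability that two random words begin with the same consonant pair.
vjerojatnost = 2 ^ (-samarzijina_za_parove);

fprintf('Scaling factor:                %.3f\n', faktor);
fprintf('Point on the diagram:          %.3f bits/symbol\n', tocka_na_dijagramu);
fprintf('Estimated entropy of pairs:    %.3f bits/symbol\n', samarzijina_za_parove);
fprintf('Probability of the same pair:  %.2f%%\n', 100 * vjerojatnost);
```

The only input that is not computed is ocitana_samarzijina, so that is the value to change when checking how sensitive the final probability is to the reading of the diagram.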
Of course, whether that is a correct estimate of the p-value depends on where the entropy of the language goes. If it is mostly syntax and morphology that decrease the entropy of the language, then those decreases do not matter for toponyms borrowed from an ancient language. They matter only if they come from phonology. See the paper I linked below for a lengthy discussion of that, including my attempts to estimate how much of the decrease in entropy each part of the grammar is responsible for.

UPDATE on 29/04/2021: You can see the draft of the next paper about linguistics I am planning to publish.

UPDATE on 14/09/2021: I have written a paper explaining what I think about the name Karašica, summarizing many of the things explained in the paper linked above.

UPDATE on 06/10/2021: I asked a professional historical linguist, Dubravka Ivšić, by e-mail what she thinks about my text about the river name Karašica. I am posting her answer here (translated from Croatian) because, as I have said, I am not a conspiracy theorist who wants people not to hear both sides of the story:
Dear Teo,
thank you for your e-mail and for your interest in pre-Slavic toponymy.
Synchronically speaking, the name Karašica is Slavic, given that it is derived with the Slavic suffix -ica. The question is where the stem (karas- or karaš-) comes from, but that does not change the first fact (just as, for example, Jurica is a name derived with a Croatian suffix from a stem of Greek origin, which makes it a Croatian name). As far as I know, the hydronym Karašica is first recorded only in the 17th century, and in Hungarian it is called Karassó. If you really want to respect scientific methodology, you would need to collect the historical attestations of the hydronym Karašica (from written sources and from old maps) and establish which form is the oldest. Given that the Danubian Karašica also flows through Hungary, for that river one must also consider that the name was borrowed into Hungarian from Croatian and vice versa, from Croatian into Hungarian [sic]. Also, the stem karaš- is productive in other toponyms as well (including outside Croatia), so it should also be established whether all of them are related, that is, whether they share the same onomastic stem.
Formally speaking, there is no obstacle to the hydronym Karašica being derived from the fish name karas or karaš (that name does not refer only to the goldfish), and the deeper origin of the fish name is not relevant to the hydronym in this case (similarly, Krapina is most likely derived from the fish name krap).
As for the other rivers you mention whose names contain k-r: Krka could indeed be a pre-Slavic name, Korana is of uncertain origin, Krbavica is derived from Krbava, and Kravarščica is derived from Kravarsko (which is derived from kravar).
The Indo-European root you mention is reconstructed as *k(')ers- with the meaning 'to run', and there are opinions that the word for horse in the Germanic languages developed from it. The Indo-European word for horse is reconstructed as *h1ek'u-. The argument you begin with "many Illyrian inscriptions begin with" completely misses the mark, given that there are no inscriptions written in the "Illyrian language".
Mathematical methods can be useful in linguistics in some cases, but they cannot replace classical linguistic methods. There are no shortcuts in historical toponymy.
Kind regards,
Dubravka Ivšić Majić
Anyway, what do you think: who is really being more scientific here? Is it me, who has attempted to measure the collision entropy of different parts of the Croatian grammar and has done numerical calculations showing that the probability of that k-r pattern occurring by chance is somewhere between 1/300 and 1/17? Or is it her, who makes arguments from silence (that the name Karašica is unlikely to date back to antiquity because its first known attestation is as late as the 17th century), does some intricate theoretical reasoning that overshadows my experimental results (like the contemporary response to Ignaz Semmelweis's experiment showing that puerperal fever was caused almost exclusively by uncleanliness), and asserts that traditional methods are superior to mathematical methods? Her claim about the first attestation is also historically inaccurate: the name Karašica is first mentioned in a document from the year 1228, together with a dubious piece of information that it used to be called Mogioros in antiquity. And even if the claim were true, the argument would be much like saying that Marco Polo never really went to China because he did not mention the Great Wall or tea.

UPDATE on 16/09/2021: The Etruscan letters are apparently flipped left-to-right on Android; I have started a Reddit thread about that.