This is my attempt to implement the Huffman's data compression algorithm in
my programming language, AEC. A few
years ago, I implemented it in JavaScript, you can see it
here.
This only works in very modern browsers. I'm not too interested in
targeting archaic browsers here. This only works in Firefox 62 and newer or
Chrome 69 and newer (those with support for
WebAssembly.Global). After a few years (if not months), all
JavaScript environments will become as capable as they are, rather than like
Internet Explorer.
AEC program output:
You can see the source code of the AEC program
here.
UPDATE on 13/03/2025: For a reason that escapes me, the Huffman Coding in
AEC and the Huffman Coding in JavaScript don't always output the same
result. For the input TEO SAMARZIJA, they do, but, for example,
for the input HENDRIK WADE BODE, they don't. The Huffman Coding
in AEC outputs:
I have warned about that problem
on Discord. The only potentially relevant difference between the way I implemented
the Huffman's Algorithm in AEC and the way I implemented it in JavaScript is
this: In AEC, I was trying to save a bit of memory by deleting the tree
nodes that have already been used (that is, the two tree nodes with the
lowest frequency in the array) from the array, whereas, in JavaScript, I put
a boolean in the tree node structure indicating whether the node has already
been used in the tree construction. But that shouldn't affect the final
results, should it?
UPDATE on 18/03/2025: Here is what my Algorithms and Data Structures
professor Alfonzo Baumgartner has to say about this problem:
Poštovani,
najčešći problemi kod Huffmanovog algoritma su u određivanju najmanjeg
čvora u nizu. Naime, može se desiti situacija da više podataka može imati
vrijednost najmanjeg, pa je pitanje kojeg odabrati. Optimalan algoritam
traženja najmanjeg će uvijek odabrati prvu pojavu najmanjeg čvora. Znači
ja bih najprije pogledao da li se u oba slučaja radi na isti način. Primjer
za niz 5,1,3,7,2,1,4, bila bi odabrana jedinica na 2. mjestu, a ne ona
jedinica koja je na predzadnjem mjestu!
Druga stvar koja može dovesti do više verzija stabla je kada dva najmanja
odabrana čvora imaju jednaku vrijednost. Kako mi imamo pravilo da zbog
jednoznačnosti stavljamo da je lijevi manji od desnog, u ovom slučaju to
pravilo nije potpuno jasno, jer su čvorovi jednake vrijednosti. Može se
desiti da Vaši algoritmi u takvom slučaju 'obrnuto' uzmu i naravno da su
onda konačni kodovi za svaki znak različiti. Ako je u ovome Vaš problem,
ja bih se držao sličnog prinicpa kao u gornjoj situaciji: uzeti za
lijevog, onog podatka kojeg sam prvo susreo u nizu. Opet primjer, ako su
težine 5,1,3,7,2,1,4 i naravno dva najmanja su dvije jedinice, ja bih uzeo
za lijevog nasljednika jedinicu koja se prvo pojavila (na 2. mjestu), a
kao desnog ovu drugu jedinicu koja se pojavila na predzadnjem mjestu.
Nadam se da Vam je ovo može pomoći, ako ni u tome nije problem, javite mi
pa bih mogao možda malo detaljnije pogledati i Vaše programske kodove.
Alfonzo.
Overall, I've diagnozed that problem more precisely by now: I think the
problem is that, in JavaScript, I "excise" both the minimum
and the second minimum from the array the Huffman Algorithm looks through
and push their sum onto the end of the array, whereas, in AEC, I replace the
first minimum with the sum of the two minimums. However, I've tried telling
the AEC program to excise the first minimum from the array and push the sum
of the two minimums at the end of the array,
but that didn't work. I've left the code to do that as dead code in the
AEC program, maybe I will figure it out later.
It looks as though it does work, that I just screwed something up while
testing.