AEC Specification
Here is a quick-and-dirty logo I made using GIMP and LibreOffice Calc. I hope it is good enough; I am not an artist...
Introduction
Compilers these days, even C compilers, have lots of features and are often smarter than the programmer when it comes to the things they are made to do. While this is usually very useful, sometimes it's counter-productive. Suppose that you are writing a program in Assembly and want to do something high-level (because correctness is far more important than speed there). You can't tell your C compiler to simply output the code that assigns sqrt(a*a+b*b) to c; if you try to, it will complain that those variables aren't declared and that you aren't inside a C function. C compilers have their own ideas about how to declare functions and variables in assembly. While these ideas usually work, sometimes you are writing something where the assembler will reject the code that C compilers produce for those things, and there is no way to modify the code the C compiler outputs for declaring a function. So, sometimes, compilers that could be useful for some task try to do too much and are thus counter-productive. In my opinion, this is especially true for the mainstream compilers targeting WebAssembly, but more on that later. Also, compilers are buggy, and nearly all compiler bugs are in the optimizer. Can we have a language with a compiler that does only what you told it to, in a very predictable way? Well, that's what inspired me to create Arithmetic Expression Compiler (AEC) a few years ago. The first programming language I learned was Microsoft Small Basic (I made a Labyrinth game in it), and it has strongly influenced the decisions I made when designing my programming language. Microsoft Small Basic is a simplified programming language that compiles to .NET bytecode (the same bytecode C# compiles to), and it is the only language compiling to .NET bytecode that I have managed to learn so far.
I have also always been interested in languages, both natural and artificial (such as programming languages), and making a programming language will certainly give me a much better insight into how programming languages really work. That is not quite true for natural languages: if you try to make your own constructed language to be spoken by humans (similar to Esperanto), and you do not know about consecutio temporum (the sequence of tenses), you will most likely not specify how consecutio temporum is supposed to work in your language, and you will probably not even realize your grammar has a huge hole in its specification. (You will assume it goes without saying that it is done the same way as in your native language, thinking, as I used to before learning about consecutio temporum in English, that the way tenses are combined in complex sentences in your native language is based on logic, rather than on essentially arbitrary rules. And in case you think all languages with tenses follow either the Croatian-like or the Latin-like consecutio temporum, as I used to think until recently, read this Reddit answer to my question about accusative with infinitive and consecutio temporum in English. If you use accusative with infinitive in English, you should use Croatian-like consecutio temporum, even though, if you use indirect speech with object clauses, you should use Latin-like consecutio temporum. And you don't even need complex sentences to show that tenses work very differently in different languages, even related ones. In English, an adverbial phrase such as "every evening" puts the verb in a simple tense: you say "He comes here every evening.", and "He is coming here every evening." is ungrammatical, or perhaps grammatical only with the meaning that the speaker is annoyed by him coming. In Croatian, exactly the opposite is true: an adverbial phrase such as "svaku večer", meaning "every evening", puts the sentence in a continuous tense. You say "On dolazi ovdje svaku večer.", and "On dođe ovdje svaku večer." sounds very ungrammatical, or perhaps grammatical only with the meaning that it is going to happen in the future even though it is not happening yet. I have asked a StackExchange question about how it is in Latin, another natural language I know somewhat. You will probably not realize that such things differ between languages by making your own language similar to Esperanto.)
But it is true that making a programming language gives you a special insight into how programming languages, and computers in general, work, especially if you are writing everything yourself (as I am doing), rather than using frameworks for tokenizing, parsing and compiling. To be clear, I am not saying that making your own programming language gives you some special insight into how languages in general work. Although in papers I have written I have often compared human languages and programming languages, I am not entirely convinced that the similarity between them is anything more than superficial. (Look up "pseudo-coordination": it is a grammatical construction found in many languages, including English, which can hardly be explained by structural or generative theories of syntax. Or, as a probably less extreme example, "donkey sentences".)
I also think that having some experience writing compilers can help you write other types of programs more effectively. If I didn't know as much about compiler theory as I do, I think I would have had much more trouble writing the web-based PicoBlaze assembler and emulator, if I had succeeded at all. My PicoBlaze Simulator solves a real-world problem: PicoBlaze is a small soft-core microcontroller designed by Xilinx that we use as an example of a simple computer in our Computer Architecture classes, and my Computer Architecture professor Ivan Aleksi asked me to create the simulator so that laboratory exercises can be done from home (in case real laboratory exercises need to be canceled due to the pandemic).
What platforms can be targeted now
So far, I've written two compilers for the AEC language. First, I wrote one targeting x86 processors (AMD and Intel). That one is written in JavaScript, and its core can be run in any browser with basic JavaScript support (even in Internet Explorer 6). To use all the features, you need to run it in NodeJS or Duktape, so that it can access the file system. Thanks to the help I received from people on the VOGONS forum, my AEC-to-x86 compiler outputs assembly code that runs both on an i486 in 32-bit mode and on x86_64 processors in 64-bit mode, without modification (it uses the ebx register for indexing arrays, which is allowed (although not recommended) in 64-bit mode, and it never pushes or pops 32-bit values on the stack, only 16-bit ones, which is allowed in both 32-bit and 64-bit mode, and so on). When I started studying at the university, many professors were impressed by my AEC-to-x86 compiler. My Algorithms and Data Structures professor Alfonso Baumgartner urged me to write a paper about it, which got published in Osječki Matematički List. The compiler targeting x86 is around 2'000 lines of code (excluding the example programs). Then I decided to extend my language so that it can also target the JavaScript Virtual Machine using WebAssembly (the JavaScript bytecode, which Mozilla pushed to get standardized, so that people can run programming languages better than JavaScript in a browser), not only x86. As targeting WebAssembly is easier than targeting x86 (or probably any physical processor, as WebAssembly was designed to be an easy target for compilers, rather than to be easy to implement in hardware or to write assembly-language code for manually), I was able to add many new features. However, I think it's still not nearly as intrusive as C compilers are.
Emscripten (the primary C and C++ compiler for WebAssembly, built around the Clang compiler) always assumes the standard C library is present on the JavaScript Virtual Machine when compiling any kind of program, so it's overkill for most cases where it could be useful. The AEC-to-WebAssembly compiler has around 5'500 lines of code, and it's written in C++ (a language much more suitable for writing compilers than JavaScript).
WebAssembly is one of the reasons I am a libertarian, because it shows that, when a private company makes a mistake, no matter how hopeless the situation seems, a solution will come... from capitalism itself. Making JavaScript, which is widely agreed to be a very poorly designed programming language, the standard language of the Internet, which is what Netscape did back when it had a near-monopoly on web browsers, for a long time seemed like a way to retard the development of the Internet forever. Fortunately, once the Internet got used more, somebody came up with the brilliant idea of WebAssembly, which seems to solve basically all the problems created by that wrong decision of Netscape. And, incidentally, that solution also significantly lowers the barrier to making a new programming language, so that many more people can experiment with those things. When governments make a mistake, quite often, there is no solution. When the UN decided back in 1947 that the solution to the Holocaust was to make Palestinians pay for Hitler's crimes with their land, it led to wars which continue to this day, and will likely continue until a nuclear holocaust destroys most life on Earth (as has almost happened a few times by now). A private company most likely cannot make a mistake with such horrible consequences.
UPDATE on 11/10/2020: Of course, my compiler is not very high-quality software. The LGTM static analyzer places it in the category, because it has found potential bugs per 5'500 lines of code, most of them being unnecessary deep copies of C++ objects (wasting time and memory), and quite a few of them being uses of potentially uninitialized variables in JavaScript. If you want to collaborate with me, perhaps one of the first things to do is to fix those bugs found by static analysis.
UPDATE on 28/08/2021: As part of a paper I have written for my economics class, I have explained why I chose to target WebAssembly and why I think WebAssembly will revolutionize compilers:
UPDATE on 17/04/2022: I have started a Reddit thread asking why there are a few C++ compilers targeting WebAssembly (JavaScript bytecode), but no C++ compilers targeting Java bytecode.
How to use the compilers
Probably the simplest way to use the AEC-to-x86 compiler on a Linux machine is to type the following code into a terminal emulator:
By the way, in case you are interested, here is how zero9178 (a friend I met on Discord, who maintains his own GCC port to Windows) explains the fact that putting -lm in front of source files causes an error on some (but, curiously, not all) versions of Linux:
A friend I met on Discord called zero9178 helped me write the CMake script for building and testing the AEC-to-WebAssembly compiler, so that you can easily use any IDE that works well with CMake (Visual Studio, QtCreator, NetBeans and CLion can import CMake projects automatically, and CMake can be made to output the configuration files necessary for Eclipse). For now, however, Visual Studio 2019 falsely reports that the tests which invoke WABT fail (claiming that the WABT executables are not proper Windows executables, although they can be run from the command line, as well as from CMake run from the command line). I do not know why. (UPDATE on 26/08/2021: As of the time of writing this, the compiler crashes with a stack overflow if it is compiled with Visual Studio.) The automated tests are integrated with GitHub Actions and GitLab CI, and they seem to work properly there. The test structureDeclarationTestCompiles seems to run around an order of magnitude faster (that is, 10 times faster) on GitHub Actions than on the laptop I am working on, and around five times faster than on GitLab CI (that is, GitLab CI seems to be around 2 times faster than my laptop). I am not sure why, as I expected my compiler to run very poorly in computer clouds, because computer clouds are, as far as I understand it, made of countless low-powered computers which can be used well only by programs that support parallel execution, which my compiler does not. I asked a question on Quora about that. Nevertheless, I think the hypothesis that the compiler does not actually run there (and that this is why it seems to run so quickly) can be eliminated, as the subsequent tests would fail if that were the case (WABT invoked in structureDeclarationTestAssembles would exit with an error message, and so would NodeJS invoked by structureDeclarationTestRuns). The tokenizer of my compiler runs very slowly, and I do not know how to make it faster. I have made a forum thread about it.
UPDATE on 28/05/2021: As a friend I met on Discord called elucent (the author of the Basil programming language) suggested, the tokenizer can be made a lot faster by using std::remove_if to erase all the whitespace tokens at once, rather than calling std::vector<T>::erase for each all-whitespace token (as zero9178 had suggested). I implemented that, and now the test structureDeclarationTestCompiles takes only 2 seconds to run on GitLab CI, whereas it previously took 6 seconds (so it is around 3 times faster). He also suggested some other ways to make both the tokenizer and the parser faster, but those are harder to implement. (UPDATE on 26/08/2021: A user of the Atheist Forums called HappySkeptic found a relatively easy way to significantly speed up the parser.)
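The speed-up elucent suggested is the well-known C++ erase-remove idiom. The following sketch contrasts the two approaches; the token type here is a plain std::string (a hypothetical stand-in, not the compiler's real token representation), and the function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token type for this sketch: just the token's text.
using Token = std::string;

// True if the token contains only whitespace (an empty token also counts).
bool isAllWhitespace(const Token &token) {
  return std::all_of(token.begin(), token.end(), [](unsigned char c) {
    return std::isspace(c) != 0;
  });
}

// Slow version: every erase shifts all later tokens one place to the left,
// so removing k tokens out of n can cost O(n * k) element moves.
void removeWhitespaceOneByOne(std::vector<Token> &tokens) {
  for (auto it = tokens.begin(); it != tokens.end();) {
    if (isAllWhitespace(*it))
      it = tokens.erase(it); // erase returns an iterator to the next token
    else
      ++it;
  }
}

// Fast version (the erase-remove idiom): std::remove_if moves every token
// that should be kept to the front in a single pass and returns the new
// logical end; one erase call then drops the leftover tail. O(n) overall.
void removeWhitespaceAtOnce(std::vector<Token> &tokens) {
  tokens.erase(std::remove_if(tokens.begin(), tokens.end(), isAllWhitespace),
               tokens.end());
}
```

Both functions leave the same tokens behind; only the amount of element shifting differs. Since C++20, std::erase_if(tokens, isAllWhitespace) expresses the second version in a single call.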
UPDATE on 06/06/2021: The AEC-to-WebAssembly compiler can now target the WebAssembly System Interface (WASI), as the example Hello World from WASI shows. Basically, you need to put #target WASI before any declarations. Unlike with the syntax gas directive in AEC-to-x86, comments can go before it. I believe this is a significant step towards making the WebAssembly dialect of AEC run on rarely used operating systems such as FreeDOS, using portable WASI environments such as Wasm3. Of course, that is assuming we also manage to compile wat2wasm from the WebAssembly Binary Toolkit to run there (which will not be easy, because CMake does not run there). Also, some blockchains support WebAssembly Binary Interface, so perhaps my compiler can now be useful there.
UPDATE on 09/01/2023: Here is how my AEC-to-WebAssembly compiler can be compiled using latest Emscripten:
UPDATE on 05/05/2023: Here is how you can cross-compile my AEC-to-WebAssembly compiler from Windows to 64-bit Linux using Docker:
Comments
In the version of AEC targeting x86, comments start with ; and end with a newline character, as in the FlatAssembler dialect of Assembly (which ArithmeticExpressionCompiler primarily targets), and there are no multi-line comments. In the version of AEC for WebAssembly, comments are as in C, C++ and JavaScript: single-line comments start with //, and multi-line comments start with /* and end with */. Multi-line comments do not nest (unlike in, for example, Swift, where they do). Many people say multi-line comments are a bad thing because bad programmers use them for versioning code (which is a very bad practice). I don't think it's the job of the compiler to enforce some particular programming style and refuse to compile code-smelling programs (though warnings are often useful).
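To make the non-nesting rule concrete, here is a minimal C++ sketch of a comment stripper for the WebAssembly dialect's comment syntax. It is not the AEC tokenizer's actual code: the function name is hypothetical, escape sequences inside strings are handled naively, and character literals such as '"' are not special-cased.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Removes // and /* */ comments. Multi-line comments do not nest: the first
// "*/" ends the comment. Comment markers inside double-quoted strings are kept.
std::string stripComments(const std::string &source) {
  std::string result;
  bool inString = false;
  for (std::size_t i = 0; i < source.size(); ++i) {
    char c = source[i];
    if (inString) {
      result += c;
      if (c == '\\' && i + 1 < source.size())
        result += source[++i]; // copy the escaped character unchanged
      else if (c == '"')
        inString = false;
      continue;
    }
    if (c == '"') {
      inString = true;
      result += c;
      continue;
    }
    if (c == '/' && i + 1 < source.size() && source[i + 1] == '/') {
      // Single-line comment: skip to the newline, but keep the newline.
      while (i < source.size() && source[i] != '\n') ++i;
      if (i < source.size()) result += '\n';
      continue;
    }
    if (c == '/' && i + 1 < source.size() && source[i + 1] == '*') {
      std::size_t end = source.find("*/", i + 2);
      if (end == std::string::npos) break; // unterminated comment: drop the rest
      i = end + 1;   // the loop's ++i then moves past the closing '/'
      result += ' '; // keep the tokens on both sides separated
      continue;
    }
    result += c;
  }
  return result;
}
```

With this rule, in x /* /* y */ z */ the first */ closes the comment, so z */ is left over as ordinary (and, in a real program, probably erroneous) text.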
Constants
In AEC for x86, a token that consists of digits and at most one decimal point is a number, and all numbers are treated as 32-bit decimal (floating-point) numbers. In both dialects of AEC, a string is a token which starts and ends with ", and strings are passed unchanged to the assembler. Adjacent string tokens are concatenated by the tokenizer into one string (as in C and C++, in contrast with JavaScript). In both dialects of AEC, a token consisting of three characters, of which both the first one and the last one are ', is a number: the tokenizer replaces it with the ASCII code of the second character (like in most dialects of x86 assembly). In AEC for WebAssembly, a token which matches the regular expression "(^\\d+$)|(^0x(\\d|[a-f]|[A-F])+$)" is of type Integer64 and is passed unchanged to the assembler (notice that this includes hexadecimal numbers starting with 0x, as in C, C++ and JavaScript). A token which matches the regular expression "^\\d+\\.\\d*$" is of type Decimal64 and is also passed unchanged to the assembler (or, if it is the initial value of a global variable, it is converted to IEEE754 hexadecimal). In AEC for x86, a token which matches the regular expression "^\\d+\\.\\d*$" is, just like all numbers, of type Decimal32, and it is always converted to IEEE754 hexadecimal (unless the compiler runs in a JavaScript environment that doesn't support ArrayBuffer; in that case, only FlatAssembler can be targeted, as FlatAssembler can convert decimal numbers to IEEE754 hexadecimal itself). Notice that in AEC for WebAssembly, 3/2=1 (as in C, C++, Java, C#, Rust and Python 2.x), while in AEC for x86, 3/2=1.5 (as in JavaScript, PHP and Python 3.x). It's hard to tell which approach is better; both can produce hard-to-find bugs.
The Pascal-like approach of using different operators for integer division and decimal division probably makes the most sense (Pascal uses / for floating-point division and div for integer division; similarly, Dart uses / for floating-point division and ~/ for integer division), but it will also undeniably feel alien to most programmers. I have started a Reddit thread about that.
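The token rules above can be sketched in C++ with std::regex, using the two regular expressions quoted for AEC for WebAssembly verbatim. The enum, the function names and the -1 "not a character literal" convention are assumptions made for this illustration, not the compiler's real interface. The last two functions simply put the two dialects' division semantics side by side.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical classification result for this sketch.
enum class LiteralType { Integer64, Decimal64, NotANumber };

// Applies the two regular expressions quoted above for AEC for WebAssembly.
LiteralType classifyLiteral(const std::string &token) {
  // Decimal integers, or hexadecimal integers with a 0x prefix.
  static const std::regex integerPattern("(^\\d+$)|(^0x(\\d|[a-f]|[A-F])+$)");
  // At least one digit, a point, then zero or more digits ("3." is allowed).
  static const std::regex decimalPattern("^\\d+\\.\\d*$");
  if (std::regex_search(token, integerPattern)) return LiteralType::Integer64;
  if (std::regex_search(token, decimalPattern)) return LiteralType::Decimal64;
  return LiteralType::NotANumber;
}

// A three-character token such as 'A' stands for the character's ASCII code.
// Returns -1 when the token does not have that form.
int characterLiteralValue(const std::string &token) {
  if (token.size() == 3 && token.front() == '\'' && token.back() == '\'')
    return static_cast<unsigned char>(token[1]);
  return -1;
}

// Integer division truncates toward zero (as WebAssembly's i64.div_s does).
long long divideLikeWebAssemblyDialect(long long a, long long b) { return a / b; }

// Every number is a 32-bit float in AEC for x86, so 3/2 keeps its fraction.
float divideLikeX86Dialect(float a, float b) { return a / b; }
```

Note that ".5" is rejected by the decimal pattern, since it requires at least one digit before the point.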
UPDATE on 17/09/2021: The AEC-to-WebAssembly compiler version v2.1.0 (not released as of the time of writing this, but it can be built from source) supports multi-line strings, the same way C++ does. An excerpt from Multi-line String Test:
UPDATE on 23/09/2021: Until the version v2.1.2 (not released as of the time of writing this, but it can be built from source), there was a bug in the tokenizer preventing multi-line strings consisting exclusively of the \ characters from being tokenized correctly.
UPDATE on 22/01/2022: AECforWebAssembly v2.3.0 and newer will support inserting underscores (_) inside number literals for better legibility, as JavaScript does. Here is an excerpt from Birthday Paradox that illustrates that:
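The exact placement rules for underscores in AEC are not spelled out here, so the following C++ sketch assumes JavaScript's rule: an underscore may appear only directly between two digits (hexadecimal digits count after a 0x prefix). The function name is hypothetical; it returns the literal with the underscores removed, or nothing if an underscore is misplaced.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Assumed JavaScript-like rule: "_" is only allowed between two digits.
std::optional<std::string> removeDigitSeparators(const std::string &literal) {
  bool isHex = literal.size() > 2 && literal[0] == '0' && literal[1] == 'x';
  auto isDigit = [isHex](char c) {
    auto u = static_cast<unsigned char>(c);
    return isHex ? std::isxdigit(u) != 0 : std::isdigit(u) != 0;
  };
  std::string result;
  for (std::size_t i = 0; i < literal.size(); ++i) {
    if (literal[i] != '_') {
      result += literal[i];
      continue;
    }
    bool digitBefore = i > 0 && isDigit(literal[i - 1]);
    bool digitAfter = i + 1 < literal.size() && isDigit(literal[i + 1]);
    if (!digitBefore || !digitAfter) return std::nullopt; // misplaced "_"
  }
  return result;
}
```

Under this rule, 1_000_000 and 0xFF_FF are accepted, while _1, 1__0 and 1_.5 are rejected.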
Variable declarations
In the version of AEC targeting x86, there are no variable declarations in the language itself: the compiler simply assumes that any token that matches the regular expression "^(_|[A-Z]|[a-z])\\w*\\[?$" and is not a keyword is the name of a variable of type 32-bit decimal number, or of a 32-bit decimal number array (if it ends with [), that has been previously declared in assembly. In the version of AEC targeting WebAssembly, variables are declared with:
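As an illustration of how that rule of the AEC-to-x86 compiler could be applied to a single token, here is a C++ sketch using the regular expression quoted above. The keyword set passed in and the NameKind enum are hypothetical; the real compiler is written in JavaScript and has its own keyword list.

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <string>

// What a token means under the x86 dialect's naming rule (sketch).
enum class NameKind { Variable, Array, Other };

NameKind classifyName(const std::string &token,
                      const std::set<std::string> &keywords) {
  static const std::regex namePattern("^(_|[A-Z]|[a-z])\\w*\\[?$");
  if (keywords.count(token) != 0 || !std::regex_search(token, namePattern))
    return NameKind::Other;
  // A trailing '[' marks the name of an array, as in "fib[".
  return token.back() == '[' ? NameKind::Array : NameKind::Variable;
}
```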
TODO: Decide what to do about aligning variables in memory (making sure that, for example, an Integer32 is at a memory address divisible by 4). Aligning variables wastes memory: sometimes around half of the allocated memory ends up unused because of the padding. On the other hand, for interoperability with other languages, it is probably desirable for variables and arrays to be aligned. JavaScript typed arrays (such as Int32Array) throw an exception if you try to create them at an unaligned byte offset, while in C and C++, unaligned access works with some compilers and optimization levels but not with others (it's undefined behavior). Right now, the AEC compiler doesn't make sure the variables are aligned, and I am not sure that is the best approach. Also, while the JavaScript Virtual Machine does allow unaligned memory access from WebAssembly, it's not guaranteed to be nearly as fast as aligned access (on x86, it usually is; on ARM, it can be many times slower).
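To put a number on that trade-off, here is a C++ sketch that lays out a sequence of variables with and without alignment. It assumes each scalar is aligned to its own size (a common convention, not something AEC currently does) and that all sizes are powers of two; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds offset up to the next multiple of alignment (a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Places variables of the given sizes one after another and returns the
// total number of bytes used. With align == true, each variable starts at
// an offset divisible by its own size; otherwise they are tightly packed.
std::size_t layoutSize(const std::vector<std::size_t> &sizes, bool align) {
  std::size_t offset = 0;
  for (std::size_t size : sizes) {
    if (align) offset = alignUp(offset, size);
    offset += size;
  }
  return offset;
}
```

For a Character, an Integer32, a Character and an Integer64 (sizes 1, 4, 1 and 8 bytes), the packed layout takes 14 bytes, while the aligned one takes 24, so 10 of the 24 bytes are padding.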
Arrays and Pointers
In the x86 dialect of AEC, you don't declare arrays in AEC itself; the compiler assumes any arrays you mention in AEC are declared in inline assembly (just like it does for variables). In the x86 dialect of AEC, you can reference an element of an array called fib either as fib(0) (as in BASIC) or as fib[0] (as in C). In the WebAssembly dialect of AEC, you can only do that as fib[0].
In the WebAssembly dialect of AEC, arrays are declared as follows:
TODO: Implement the multi-dimensional arrays. But let's not use the JavaScript-like approach to them; JavaScript really sucks in that regard. For now, only local (not yet global, the compiler crashes if you try to do that) two-dimensional arrays of characters are supported (UPDATE: AECforWebAssembly v2.4.0 supports both global and local two-dimensional arrays of characters), like this (excerpt from Roman Numerals):
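A two-dimensional array in WebAssembly's linear memory is typically stored row by row. The C++ sketch below shows that row-major addressing for a matrix of characters, such as a table of Roman numeral strings; we do not claim this is exactly the layout AECforWebAssembly uses, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Element [row][column] of a matrix with `columns` characters per row lives
// at this byte offset from the start of the block (row-major order, as in C).
std::size_t rowMajorOffset(std::size_t row, std::size_t column,
                           std::size_t columns) {
  return row * columns + column;
}

// Copies a C string into row `row`, padding the rest of the row with '\0'.
// If the text is `columns` characters or longer, the row is NOT terminated.
void storeRow(char *matrix, std::size_t row, std::size_t columns,
              const char *text) {
  std::strncpy(matrix + rowMajorOffset(row, 0, columns), text, columns);
}
```

A declaration like char numerals[3][4] in C gives exactly this addressing, which is why such arrays interoperate easily between languages.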
Assignments
For assignments, you use := operator, in both dialects of AEC. In the AEC for WebAssembly, you can nest assignment expressions, like this (excerpt from HybridSort):
TODO: Implement the assignment operators +=, -= and related ones as they are in C, C++ and JavaScript. They can make code significantly shorter. (UPDATE on 14/09/2020: They have been implemented.) (UPDATE on 23/01/2022: Actually, they are implemented incorrectly; see this GitHub issue for more information.) (UPDATE on 24/01/2022: I believe I have fixed that issue.)
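One well-known pitfall with compound assignment is desugaring a[f()] += 1 into a[f()] := a[f()] + 1, which evaluates the index expression twice; the usual fix is a temporary variable holding the index. The C++ sketch below simulates both desugarings with a hypothetical side-effecting index function (IndexCounter). It illustrates the general problem and is not a transcript of the linked GitHub issue.

```cpp
#include <cassert>
#include <vector>

// Stand-in for an index expression with a side effect: it returns a new
// index each time it is called and counts how often it was evaluated.
struct IndexCounter {
  int next = 0;
  int calls = 0;
  int operator()() {
    ++calls;
    return next++;
  }
};

// Naive desugaring of a[f()] += 1 into a[f()] := a[f()] + 1.
// The index expression runs twice, so it reads one element and writes another.
void naiveIncrement(std::vector<int> &a, IndexCounter &f) {
  int value = a[f()] + 1; // first evaluation of f()
  a[f()] = value;         // second evaluation of f()
}

// Desugaring with a temporary: evaluate the index once, then reuse it.
void incrementWithTemporary(std::vector<int> &a, IndexCounter &f) {
  int index = f();
  a[index] = a[index] + 1;
}
```

With the naive version, starting from {10, 20, 30}, the result is {10, 11, 30}: the value read from element 0 is written into element 1.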
UPDATE on 15/07/2021: In most programming languages these days, the assignment operator is =, whereas the equality-testing operator is ==. The rationale for that, dating back to the C programming language, is that programs do variable assignments more often than they test variables for equality, so it makes sense to use the shorter operator for the more common operation, assignment. While that may have made sense back then, I don't think it makes any sense with modern computers. Programming beginners in particular are often confused by = meaning assignment, rather than what they are used to from mathematics. Also, I believe everybody who has programmed in a C-like language can agree that the equality operator consisting of two assignment operators often leads to bugs. Namely, one of the most common errors in C-like languages is that the programmer mistypes if (variable==0) as if (variable=0), leading to the value in variable being lost and the block of code after that if never being executed. C# attempts to solve that problem by disallowing implicit conversions to the boolean type in if-conditions, which is an interesting solution, but it often makes the code longer, and it does not address the fact that the same problem can happen with boolean variables. (UPDATE on 05/06/2024: One hard-to-trace bug in my AEC-to-WebAssembly compiler, which made my compiler lose track of the sizes of structures and report an internal compiler error, was caused by mismatched parentheses, which the C++ compiler didn't complain about; instead, it implicitly converted an integer into a boolean. Had I written my compiler in Java or C#, rather than in C++, something like that couldn't have happened to me.) So, I think that making := the assignment operator and = the equality-testing operator is a good choice.
The AEC-to-WebAssembly compiler allows whitespace between : and = in the assignment operator :=, so that, when ClangFormat mistakes : for a label-ending sign and inserts a space after it, the code does not lose its meaning. I am not sure now whether that was a good choice. I have created a Reddit thread about it, as well as a StackExchange thread about how to tell ClangFormat that := is the assignment operator.
Operators
AEC has the following operators:
(UPDATE on 07/10/2020: The operators . and -> have the same meaning they do in C, dealing with structures and structure pointers.)
The first argument of the ternary conditional operator is converted to Integer32 (that's how WebAssembly represents Booleans), and the last two are converted to the stronger type of those two (UPDATE: In AECforWebAssembly v2.6.0 and newer, in case one of the last two arguments of the ternary conditional operator is a structure, but the other is either a structure of a different type, or not a structure at all, the semantic analyzer reports an error. In previous versions of AECforWebAssembly, that caused the compiler to crash. See this GitHub issue for more information.). The strongest types are pointers, next comes Decimal64, then Decimal32, then Integer64, then Integer32, then Integer16, while the weakest type is Character. Most binary operators convert both of their operands to the stronger type, except for the assignment operator and the "and" and "or" operators. The assignment operator converts the right-hand-side operand to the type of the left-hand-side operand. In case one of them is a pointer and the other one isn't, the compiler issues a warning. The bitwise "and" and "or" convert both operands to Integer32. There are no logical "and" and "or" operators in AEC (like the "and" and "or" of C and C++, alternative spellings of && and ||). This can lead to some confusing behavior. For instance, 1 and 2 would be 1 in C and C++, but it's 0 in AEC, because the bitwise and of 1 and 2 is 0. To convert a number to Boolean, you can use the built-in function not(x), like this (from HybridSort):
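The conversion rules for operands can be summarized as an ordering of types. In this C++ sketch, the Type enum lists AEC's types from weakest to strongest, so the stronger of two types is simply the larger enumerator; treating all pointers as one type is a simplification, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// AEC's types from weakest to strongest (all pointers lumped together).
enum class Type {
  Character, Integer16, Integer32, Integer64, Decimal32, Decimal64, Pointer
};

// The type both operands are converted to by most binary operators.
Type strongerType(Type a, Type b) { return std::max(a, b); }

// The bitwise "and" and "or" always work on Integer32.
Type bitwiseOperandType(Type, Type) { return Type::Integer32; }

// Assignment converts the right-hand side to the left-hand side's type;
// the compiler warns when exactly one of the two sides is a pointer.
bool assignmentWarns(Type left, Type right) {
  return (left == Type::Pointer) != (right == Type::Pointer);
}
```

One consequence of this ordering worth keeping in mind: combining an Integer64 with a Decimal32 yields a Decimal32, which cannot represent every 64-bit integer exactly.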
UPDATE on 30/11/2023: With the new versions of AEC (supporting easy-to-use inline assembly), there is an even simpler way to implement the xor-operation in AEC:
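Because AEC's "and" and "or" are bitwise, they agree with C's logical && and || only on operands that are already 0 or 1. This C++ sketch, using hypothetical helper names, shows the difference and how normalizing with double negation (the analogue of not(not(x)), assuming not(x) yields 1 for zero and 0 otherwise) gives a logical "and" and a logical xor.

```cpp
#include <cassert>

// AEC's "and" behaves like C's bitwise & on Integer32 values.
int aecAnd(int a, int b) { return a & b; }

// Normalizes any number to 0 or 1, like not(not(x)) in AEC.
int toBoolean(int x) { return !!x; }

// Logical "and" built from the bitwise one by normalizing both operands.
int logicalAnd(int a, int b) { return aecAnd(toBoolean(a), toBoolean(b)); }

// Logical xor: 1 when exactly one operand is non-zero.
int logicalXor(int a, int b) { return toBoolean(a) ^ toBoolean(b); }
```

Unlike C's &&, logicalAnd always evaluates both operands, just as AEC's bitwise "and" does.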
Unfortunately, the implementation is incorrect. All it does is rewrite a < b < c into a < b and b < c, which, of course, evaluates b twice. That is incorrect whenever b has side effects. You can see how it's implemented on GitHub. (UPDATE on 29/04/2024: I've asked a StackExchange question about that.)
That problem might seem similar to the problem which occurs when compiling += and similar operators, when incrementing an element of an array with the index being a function with side-effects, but that's actually not the same problem. There doesn't seem to be such a simple solution (of using a local temporary variable) to the problem with chained comparisons.
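The difference between the current rewriting and one that evaluates the middle operand only once can be simulated in C++. In the sketch below, Middle is a hypothetical stand-in for a middle operand with a side effect (it counts its evaluations). Python, for example, specifies that in a < b < c the expression b is evaluated only once (and it also skips c when a < b is false, which this sketch does not do); whether such a temporary is easy to introduce in the AEC-to-WebAssembly code generator is exactly the open question above.

```cpp
#include <cassert>

// A middle operand with a side effect: it counts its own evaluations.
struct Middle {
  int value;
  int evaluations = 0;
  int operator()() {
    ++evaluations;
    return value;
  }
};

// What rewriting a < b < c into (a < b) and (b < c) amounts to:
// the middle expression is evaluated twice.
bool chainedNaive(int a, Middle &b, int c) {
  return (a < b()) & (b() < c);
}

// Evaluating the middle expression once into a temporary.
bool chainedWithTemporary(int a, Middle &b, int c) {
  int temporary = b();
  return (a < temporary) & (temporary < c);
}
```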
UPDATE on 03/05/2024: Here is an example program that illustrates the problem with chained comparisons as they are currently implemented in AEC:
TODO: Implement a less-than-or-equal-to operator <= and a greater-than-or-equal-to operator >=. For that, we will need to modify the tokenizer, the parser and the compiler. (UPDATE on 29/04/2024: They have been implemented and they will be available in the release v2.9.0.)
UPDATE on 22/09/2024: I've received some comments on Internet forums telling me that parsing right-associative operators by scanning backwards (as I was doing both in my PicoBlaze assembler and in my AEC-to-WebAssembly compiler, but not in my AEC-to-x86 compiler) is considered an anti-pattern. So, I've opened a StackExchange question about that, which received many upvotes.
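The usual alternative to scanning backwards is recursive descent, where a right-associative operator is handled by recursing on its right-hand operand. Here is a minimal C++ sketch for a tiny hypothetical grammar with single-digit operands and one right-associative operator, ^ (exponentiation); it assumes well-formed input and is not how either AEC compiler is written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Grammar:  power := digit ( '^' power )?
// Parsing the right operand with a recursive call makes '^' right-associative.
class PowerParser {
public:
  explicit PowerParser(std::string text) : text_(std::move(text)) {}
  long long parse() { return parsePower(); }

private:
  std::string text_;
  std::size_t position_ = 0;

  long long parsePower() {
    long long base = text_[position_++] - '0'; // operand: a single digit
    if (position_ < text_.size() && text_[position_] == '^') {
      ++position_;                       // consume '^'
      long long exponent = parsePower(); // the right operand may be a power too
      long long result = 1;
      for (long long i = 0; i < exponent; ++i) result *= base;
      return result;
    }
    return base;
  }
};
```

Because the right operand is parsed by a recursive call, 2^3^2 evaluates to 2^9 = 512 rather than (2^3)^2 = 64. Left-associative operators are usually handled with a loop instead, and precedence climbing generalizes both cases.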
UPDATE on 12/01/2025: Why do many programming languages (C, JavaScript...) use the symbol || (two parallel vertical lines) to mean "or"? My guess is that it comes from the fact that two switches connected in parallel form a primitive or-gate, similar to how two switches connected in series form a primitive and-gate. I've opened a question about it on StackExchange.
Branching
Branching is supported only via If, If-Else, and If-ElseIf-Else. There is no equivalent of the switch-case statement of C, C++ and JavaScript. After the If token, the compiler expects a condition. In AEC for WebAssembly, the condition ends with a Then, while in AEC for x86, it ends with a newline character. The branching ends with EndIf. That's to make the program easier to parse, and to avoid the dangling-else problem. Between Then and EndIf, there can be an Else token. Before the Else token (if there is one; otherwise, before the EndIf), there can be an ElseIf token. An ElseIf token, much like the If token, is followed by an expression representing a condition, which is ended by the Then token. After the Then token and after the Else token, you can put a { (curly brace), which the compiler will ignore. Similarly, you can put a } before EndIf, ElseIf and Else. That is to make it easier to use text editors made primarily for C-like languages for writing AEC (to jump to the end of a code block, for example). An example from the Analog Clock in AEC:
Simple branching can also be done using the ?: operator. As it is right-associative, it can be used to concisely write ElseIf statements. An example of that from Arithmetic Operators Test (one of the first programs I wrote in AEC for WebAssembly):
TODO: Implement something like switch-case, to make it easier to write long ElseIf branchings. Most of the time, they are a sign of bad code design, but sometimes they are not, and it's not acceptable for a language to discourage them.
UPDATE on 09/07/2021: A StackOverflow user called rici warned me that the AEC-to-x86 compiler actually produces incorrect code for the ?: operator. Namely, the code it produces calculates the results of the second and the third operand before calculating the result of the first operand (the condition), which can, as rici pointed out, lead to unexpected divide-by-zero errors for expressions such as d = 0 ? 0 : n / d (if d is equal to zero). There does not seem to be a simple solution, given the way my AEC-to-x86 compiler is structured. Here is the code with which the ternary operator is implemented in the AEC-to-x86 compiler. The AEC-to-WebAssembly compiler is a lot more professionally made (I made the core of it when I was 20, while I made the core of the AEC-to-x86 compiler back when I was 18), and it does not suffer from that problem. It is also easier to implement the ternary operator when targeting WebAssembly than when targeting x86, because WebAssembly itself essentially has the ternary operator, just with a different (LISP-like) syntax. Here is how I implemented the ternary conditional operator in the AEC-to-WebAssembly compiler.
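The divide-by-zero problem rici described can be reproduced in C++ by evaluating both branches eagerly. In this sketch, checkedDivide (a hypothetical helper) throws instead of crashing, so the failure can be observed. Note that WebAssembly's select instruction likewise takes two already-evaluated operands from the stack, whereas an if/else block with a result type evaluates only the chosen branch.

```cpp
#include <cassert>
#include <stdexcept>

// Division that reports division by zero instead of crashing.
int checkedDivide(int n, int d) {
  if (d == 0) throw std::domain_error("division by zero");
  return n / d;
}

// Eager strategy (what the AEC-to-x86 compiler is described as doing):
// both branches are computed before the condition is looked at.
int eagerConditional(int d, int n) {
  int whenTrue = 0;
  int whenFalse = checkedDivide(n, d); // computed even when d == 0
  return d == 0 ? whenTrue : whenFalse;
}

// Lazy strategy: only the selected branch is evaluated.
int lazyConditional(int d, int n) { return d == 0 ? 0 : checkedDivide(n, d); }
```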
So, as far as I can see, you can run into at least two problems when implementing the ?: operator in your language: PHP parses it incorrectly, and my AEC-to-x86 compiler compiles it incorrectly. I've asked a question on Reddit and a question on StackExchange to see if there are more. (Apparently, there are, and, when making my AEC-to-WebAssembly compiler, I bumped into one of them, related to structures, which kaya3 warns about on StackExchange. What is exceptionally bad is that the error message is very misleading, and, right now, it escapes me what exactly is going on in the compiler. How can using two different structure types inside a ?: operator lead to the "Some part of the compiler tried to compile the array with a negative size." error? Well, it has been 3 years since I wrote the core of that compiler, so no wonder I don't remember the details of how it works. (UPDATE: I have diagnosed the problem. It is actually two different bugs, one in the compiler and one in the semantic analyzer. The problem is fixed in AECforWebAssembly v2.6.0 (not released as of the time of writing this, but the source code is on GitHub, GitLab and SourceForge). I have started a forum thread about absurd error messages on forum.hr, r/ProgrammingLanguages, Atheist Forums and r/CroIT.))
UPDATE on 10/09/2021: I have decided to add the left-hand-side ternary conditional operator in assignments, which exists in C++, to my programming language (for now, and presumably forever, only in the AEC-to-WebAssembly compiler, version v2.0.0 and newer, not in the no-longer-maintained AEC-to-x86 compiler). Sometimes it can make the code shorter, and it looks very cool. An excerpt from Left Hand Side Conditional Test, which illustrates it by implementing the original Euclid's algorithm (which uses subtraction):
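For comparison, here is the same subtraction-based Euclid's algorithm written in C++ itself, where the conditional operator yields an lvalue when both of its branches are lvalues of the same type. This is a sketch; the function name is hypothetical and it assumes both inputs are positive.

```cpp
#include <cassert>

// Original (subtraction-based) Euclid's algorithm with a conditional
// expression on the left-hand side of the assignment.
unsigned subtractionGcd(unsigned a, unsigned b) {
  // Assumes a > 0 and b > 0; otherwise the loop would never terminate.
  while (a != b)
    (a > b ? a : b) -= (a > b ? b : a); // subtract the smaller from the larger
  return a;
}
```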
UPDATE on 13/09/2021: As I have mentioned on Discord, the behaviour of the left-hand-side ternary conditional operator in AEC does not exactly match the behaviour of left-hand-side ternary conditional operator in C++, and I think the behaviour in AEC is better. In C++, the following program compiles:
Loops
For now, there is only one type of loop in AEC, the While-loop. A While token is followed by an expression representing a condition. That expression ends with a newline character in AEC for x86, and with a Loop token in AEC for WebAssembly. In both of them, the While-statement ends with an EndWhile token. For example, here is what the Euclid Algorithm looks like in AEC for WebAssembly (an excerpt from Euclid Test):
TODO: Think of some nice syntax for the for-loop; every remotely modern language has one. Also, implement something like the break and continue statements of C, C++ and JavaScript. Most of the time, they are a sign of bad code design, but sometimes they aren't, and it's not acceptable for a language to lack them.
Functions
AEC for x86 supports no user-defined functions. Furthermore, the parser implemented there doesn't support functions with more than 2 arguments. AEC for WebAssembly supports functions, because WebAssembly makes them easy to implement (x86 assembly has no single standardized way to call functions; the calling convention varies by operating system). Importing functions from JavaScript is done like this (excerpt from Analog Clock):
AEC supports neither function hoisting nor circularly dependent functions, for the simple reason that there is no obvious way to implement them in WebAssembly. (UPDATE: C++-style forward function declarations have been implemented, as there is a relatively simple way to do it using WebAssembly Binary Toolkit: the assembler takes care of that for you. The syntax is the same as for declaring external functions, except that you replace External with Declared.)
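For readers unfamiliar with C++-style forward declarations: in C++, declaring a function's signature before its body is what allows two functions to call each other. The example below is plain C++ with illustrative names (isEven, isOdd); whether AEC's Declared keyword also covers mutually recursive functions is not stated here.

```cpp
// Mutual recursion works only because isOdd is declared before isEven is
// defined; without this declaration, isEven could not call isOdd.
bool isOdd(unsigned n);

bool isEven(unsigned n) { return n == 0 ? true : isOdd(n - 1); }

bool isOdd(unsigned n) { return n == 0 ? false : isEven(n - 1); }
```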
TODO: Find a way to implement circularly dependent functions and, if possible, function hoisting. Also, implement function pointers. Function pointers would allow for easy-to-implement object-oriented programming (structures containing function pointers, often used in the Linux kernel instead of classes). Mozilla Developer Network recommends that compilers implement function pointers using WebAssembly Tables.
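To make "structures containing function pointers" more concrete, here is a minimal C-style C++ sketch of the pattern. The names Shape, ShapeOperations and areaOf are made up for the illustration. In a WebAssembly backend, each call through such a pointer would presumably become a call_indirect through a table, as MDN suggests.

```cpp
// A structure of function pointers acts as a shared "method table", similar
// in spirit to structures such as file_operations in the Linux kernel.
struct Shape;

struct ShapeOperations {
  double (*area)(const Shape *self);
  const char *(*name)(const Shape *self);
};

struct Shape {
  const ShapeOperations *operations; // which "class" this object belongs to
  double width, height;
};

double rectangleArea(const Shape *self) { return self->width * self->height; }
const char *rectangleName(const Shape *) { return "rectangle"; }

double triangleArea(const Shape *self) { return self->width * self->height / 2; }
const char *triangleName(const Shape *) { return "triangle"; }

const ShapeOperations rectangleOperations = {rectangleArea, rectangleName};
const ShapeOperations triangleOperations = {triangleArea, triangleName};

// A "virtual call": which function runs depends on the object's table.
double areaOf(const Shape *shape) { return shape->operations->area(shape); }
```

A Shape initialized with &triangleOperations, 3.0 and 4.0 has an area of 6, while the same dimensions with &rectangleOperations give 12, without any change at the call site.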
Named function arguments (using, for example, the := operator) might also be useful (in languages that have them, there is basically no need for the Builder design pattern with its abstract builders and directors).
UPDATE on 26/08/2021: I have implemented named function arguments, as illustrated by the Named Arguments Test:
UPDATE on 31/10/2023: Ever since the first versions of AEC-to-WebAssembly, there has been a bug in the parser preventing you from using multi-argument functions (such as pow) inside default arguments of functions. Version v2.7.0 will fix that issue, so it will be possible to write this:
Structures
The parser can parse structure declarations of the form:
However, the compiler crashes if you insert them in a real program (rather than just in a parser test).
Structures are supposed to be instantiated using a directive called InstantiateStructure, that is, an InstantiateStructure token followed by the structure name, followed by the variable name (and the variable will be of the type represented by the structure name).
TODO: Implement structures in the compiler, not just in the parser. (UPDATE on 30/09/2020: Some progress on that has been made, namely, local and global structures are now supported, as long as they are not nested. You can see an example program using them here. Structures as arguments or return types aren't supported for now, and are unlikely to be supported in the near future. I've started a Reddit thread asking for help with that.
UPDATE on 16/02/2021: My implementation of the N-Queens Puzzle in AEC uses structures. If you have trouble understanding the N-Queens Puzzle program, I'd suggest you first look at the Permutations Test, as it is based on the same algorithm.
UPDATE on 28/04/2021: I have started a forum thread asking for help with implementing structures as function arguments.)
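The N-Queens program itself is not reproduced here. Since it shares its algorithm with the Permutations Test, here is a C++ sketch of that general idea, under the assumption that the algorithm is backtracking over permutations: every queen gets its own row, so only the diagonals need to be checked. The function names are illustrative.

```cpp
#include <vector>

// row[c] is the row of the queen in column c. Because each row is used at
// most once, a complete placement is a permutation of 0..n-1, and only the
// diagonals have to be checked.
int countSolutions(std::vector<int> &row, std::vector<bool> &rowUsed, int column) {
  int n = static_cast<int>(row.size());
  if (column == n) return 1; // all queens placed
  int solutions = 0;
  for (int r = 0; r < n; r++) {
    if (rowUsed[r]) continue;
    bool safe = true;
    for (int c = 0; c < column && safe; c++)
      if (column - c == r - row[c] || column - c == row[c] - r)
        safe = false; // same diagonal as the queen in column c
    if (!safe) continue;
    row[column] = r;
    rowUsed[r] = true;
    solutions += countSolutions(row, rowUsed, column + 1);
    rowUsed[r] = false; // backtrack
  }
  return solutions;
}

int countNQueens(int n) {
  std::vector<int> row(n);
  std::vector<bool> rowUsed(n, false);
  return countSolutions(row, rowUsed, 0);
}
```

For example, countNQueens(8) returns 92, the well-known number of solutions on a standard chessboard.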
UPDATE on 03/06/2021: It is important to note that the assignment operator, when used between structures, works very differently in AEC and C++. I made it so because I found the C++ behaviour very confusing while making this compiler, as it appeared to corrupt the Abstract Syntax Tree while compiling. Here is an excerpt from Structure Declaration Test that illustrates the AEC-specific behaviour (the spaghetti function returns 2):
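For reference, here is the C++ side of that comparison: in C++, assigning one structure to another copies every member, so the two variables stay independent afterwards. This minimal example only shows the C++ behaviour; the names Point and cppAssignmentDemo are illustrative.

```cpp
// In C++, structure assignment is a memberwise copy.
struct Point {
  int x, y;
};

int cppAssignmentDemo() {
  Point a = {1, 2};
  Point b = {0, 0};
  b = a;   // copies a.x and a.y into b
  b.x = 5; // changes only b
  return a.x; // still 1
}
```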
UPDATE on 28/08/2021: There was a weird bug in my compiler causing internal compiler errors when somebody tried to instantiate an array of nested structures inside a function. I found it while writing the Named Arguments Test. For AECforWebAssembly versions v1.6.2 and older, there is a simple work-around: declare that array globally.
UPDATE on 05/05/2024: There seems to be a bug in the AEC-to-WebAssembly compiler which is triggered under certain conditions (apparently, only when compiled using Microsoft C++ Compiler) when comparing two structures using the = operator. You can read more about it on GitHub.
UPDATE on 15/12/2024: I've added support for the TypeOf operator, similar to the typeof operator in JavaScript. It returns the type of an expression as a string. Here is an example of how to use it:
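As a loose C++ analogue, and assuming (without confirmation) that AEC, being statically typed, resolves TypeOf at compile time: the compiler already knows the static type of every expression, so mapping it to a name needs no run-time information. The names typeName and TYPE_OF are invented for this example, which covers only three types.

```cpp
#include <string>
#include <type_traits>

// Maps a static type to its name; unsupported types fail to link.
template <typename T> std::string typeName();
template <> std::string typeName<int>() { return "int"; }
template <> std::string typeName<double>() { return "double"; }
template <> std::string typeName<char>() { return "char"; }

// decltype never evaluates its operand; std::decay strips references and
// const, so a variable and a temporary of the same type map to one name.
#define TYPE_OF(expression) \
  typeName<typename std::decay<decltype(expression)>::type>()
```

With a char variable c, TYPE_OF(c) gives "char", while TYPE_OF(c + 1) gives "int", because of C++'s integer promotion.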
Inline assembly
AEC for x86 passes everything between the AsmStart and AsmEnd tokens unchanged to the assembler. AEC for WebAssembly has the keywords asm(, asm_i32(, asm_i64(, asm_f32( and asm_f64(, which expect a compile-time constant string as an argument; the compiler processes that string (for example, it replaces \n with a literal newline character and \\ with \) and passes it to the assembler. asm( assumes that the inline assembly code will leave nothing on the WebAssembly stack, while the other ones assume that the inline assembly code will leave a value of a certain WebAssembly type (corresponding to some AEC type) on the stack, which they will then fetch. Since the WebAssembly type i32 corresponds to AEC Integer32 and to pointers, we can write something like this (excerpt from HybridSort):
TODO: Implement GCC-and-CLANG-style inline assembly, where you can access variables from assembly by their names (rather than having to manually calculate their memory addresses).
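The string processing described for asm( and its variants can be sketched as a small C++ pass. This is a hypothetical helper (processEscapes is an invented name) that handles only the two escape sequences mentioned above, \n and \\; copying any other backslash sequence through unchanged is an assumption of the sketch, not documented AEC behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces \n with a newline and \\ with a single
// backslash. Other backslash sequences, and a trailing lone backslash, are
// copied through unchanged.
std::string processEscapes(const std::string &input) {
  std::string output;
  for (std::size_t i = 0; i < input.size(); i++) {
    if (input[i] == '\\' && i + 1 < input.size()) {
      char next = input[i + 1];
      if (next == 'n') { output += '\n'; i++; continue; }
      if (next == '\\') { output += '\\'; i++; continue; }
    }
    output += input[i];
  }
  return output;
}
```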
UPDATE on 15/02/2023: I have made it possible to access variables from inline assembly by their names, by inserting the % character followed by the name of a variable. So, there is no need to add additional arguments to asm in order to use variables from inline assembly, like you do in CLANG or GCC. I realize there are benefits to what CLANG and GCC are doing, but their approach is both more difficult to implement in the compiler and more difficult to use. So, the code above has been changed to this:
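A minimal sketch of such %name substitution as a C++ text pass follows. The helper name substituteVariables and the addresses map are hypothetical. What a name is replaced with (here, whatever string the map holds, such as an address expression) is an assumption, as is the rule that names consist of letters, digits and underscores.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical pass: replaces every "%name" in `assembly` with the text that
// `addresses` stores for "name"; an unknown name is reported as an error.
std::string substituteVariables(const std::string &assembly,
                                const std::map<std::string, std::string> &addresses) {
  std::string result;
  std::size_t i = 0;
  while (i < assembly.size()) {
    if (assembly[i] != '%') {
      result += assembly[i++];
      continue;
    }
    std::size_t start = ++i; // skip the '%'
    while (i < assembly.size() &&
           (std::isalnum(static_cast<unsigned char>(assembly[i])) || assembly[i] == '_'))
      i++;
    std::string name = assembly.substr(start, i - start);
    auto found = addresses.find(name);
    if (found == addresses.end())
      throw std::runtime_error("Unknown variable in inline assembly: " + name);
    result += found->second;
  }
  return result;
}
```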
Built-in functions
AEC for x86 has many built-in functions, which are wrappers around x86 assembly instructions. These are: sin(, cos(, tan(, atan2(, arctan(, ctg(, arcctg(, sqrt(, arcsin(, arccos(, ln( (natural logarithm), log( (base-10 logarithm), exp( (returns Euler's number raised to the power of the argument), pow( (which, unfortunately, due to the limitations of the mathematical instructions built into x86, causes an error whenever the first argument is negative or zero, and there does not appear to be a simple solution), abs( and mod( (floating-point modulo). Unlike in most programming languages, the trigonometric functions expect arguments in degrees (not radians, as in C, C++ or JavaScript). Similarly, the inverse trigonometric functions such as atan2( return the result in degrees, rather than in radians. If you want to use them in AEC for WebAssembly, you can write them yourself, like this (excerpt from the Analog Clock):
TODO: Build some mathematical functions into AEC for WebAssembly.
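The degree convention of AEC for x86 can be reproduced on top of radian-based functions with a simple conversion. The sketch below uses C++'s cmath rather than hand-written series, and the wrapper names are illustrative; the remaining functions follow the same pattern.

```cpp
#include <cmath>

// Pi computed portably (M_PI is not guaranteed by the C++ standard).
const double PI = std::acos(-1.0);

// Trigonometric functions taking degrees instead of radians.
double sinDegrees(double degrees) { return std::sin(degrees * PI / 180.0); }
double cosDegrees(double degrees) { return std::cos(degrees * PI / 180.0); }

// An inverse trigonometric function returning degrees instead of radians.
double atan2Degrees(double y, double x) { return std::atan2(y, x) * 180.0 / PI; }
```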
UPDATE on 25/07/2021: If you just want to use the square root function in AEC for WebAssembly, you can simply invoke the f64.sqrt or f32.sqrt instruction in inline assembly. Doing that is not recommended, because it is supposedly slow (it calculates the square root to the maximum possible precision, when usually you just need the first few decimal digits; the WebAssembly specification requires that precision only to comply with the IEEE 754 standard), but, in the new version of the Analog Clock program, I find it fast enough. Here is how the square root function is implemented in the Analog Clock program:
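To illustrate the precision trade-off mentioned above, here is a C++ sketch of a square root that stops as soon as a chosen relative tolerance is reached. Newton's method is chosen here only for illustration; the Analog Clock program uses f64.sqrt, as described. The name approximateSqrt and the default tolerance are assumptions.

```cpp
#include <cmath>
#include <stdexcept>

// Approximates sqrt(x) with Newton's method, stopping once two successive
// guesses differ by less than `tolerance` (relative to the newer guess).
double approximateSqrt(double x, double tolerance = 1e-4) {
  if (x < 0) throw std::domain_error("negative argument");
  if (x == 0 || std::isnan(x) || std::isinf(x)) return x;
  // Starting at or above sqrt(x) makes the guesses decrease monotonically.
  double guess = x > 1 ? x : 1;
  while (true) {
    double next = 0.5 * (guess + x / guess);
    if (std::fabs(guess - next) <= tolerance * next) return next;
    guess = next;
  }
}
```

Because Newton's method converges quadratically, the result is usually far more accurate than the tolerance suggests; the tolerance mainly limits how many iterations are spent.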
String manipulation
AEC for x86 doesn't support string manipulation at all; it doesn't have a character type. AEC for WebAssembly is about as good at string manipulation as C is without a C library (as when doing operating system development). There are no built-in string-manipulation functions, but one can easily write them oneself, like this (excerpt from Dragon Curve):
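Without a C library, the basic operations on null-terminated strings take only a few lines each. Here is a C++ sketch of hand-written counterparts of strlen and strcat; the names stringLength and stringConcatenate are illustrative, not taken from Dragon Curve.

```cpp
#include <cstddef>

// Counts characters up to (not including) the terminating null character.
std::size_t stringLength(const char *str) {
  std::size_t length = 0;
  while (str[length] != '\0') length++;
  return length;
}

// Appends `source` to the end of `destination`. The caller must make sure
// that `destination` has room for both strings and the terminating null.
void stringConcatenate(char *destination, const char *source) {
  char *end = destination + stringLength(destination);
  while (*source != '\0') *end++ = *source++;
  *end = '\0';
}
```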
Advanced array manipulation
When it comes to sorting arrays, I've tried to efficiently implement HybridSort (a sorting algorithm I came up with, a mixture of MergeSort, QuickSort and SelectionSort) in AEC. The HybridSort algorithm is based on the fact that the number of comparisons done by MergeSort depends only on the size of the array, and is always equal to
I am not sure what causes those stairs in the measurement results. Professor Alfonzo Baumgartner thinks it has to do with cache misses; here is what he wrote when I asked him about that in an e-mail:
Example program in both dialects of AEC
So, here is an example piece of code in the x86 dialect of AEC:
Conclusion
I think AEC is a promising project, but a lot of work is still needed to make it successful, and I don't think I can do everything that's needed by myself. (UPDATE on 06/06/2021: It would, for example, be useful to make a web-based IDE for the AEC-to-WebAssembly compiler, so that somebody can try my programming language directly in the browser. I have asked a question on Quora for advice about how to do that. We would need to get the AEC-to-WebAssembly compiler, which can already run in NodeJS if compiled with EMSCRIPTEN, to run in a browser, and, to be honest, I do not know enough WebAssembly to do that by myself. Actually, I think almost no web developer these days has the knowledge needed to do that. We would also need to embed wat2wasm from WebAssembly Binary Toolkit in that web-app, as my compiler relies on it to convert the WebAssembly assembly language it outputs to the bytecode that browsers understand. Somebody has already made wat2wasm run in modern browsers, but they apparently left no instructions on how they managed to do that.)
UPDATE on 16/10/2020: I've published a YouTube video about programming for the client-side web in your own programming language. If you have trouble playing it, you can download the minified MP4 and try opening it in VLC or a similar program. If nothing else works, try opening the ZIP file containing a PDF, an ODP and a PPT file.
UPDATE on 18/02/2022: In case you are interested, here is what Stuxxnet, the moderator of a Discord server about programming, has to say about my programming language:
you're trying to compare sort functions written in an interpreted language to those provided by that language in its native VM
you're overfitting measurement data to come up with a completely insane formula for the comparisons of quicksort
you're using rand in c++ code
you implement custom math functions using highschool maths that just doesn't work if you want to be numerically efficient or just even get a sane result for the whole range of possible inputs.
I'm sorry, but I really don't have much confidence in your judgement calls...
A StackExchange user called G. Sliepen has also made a lot of negative comments about the way I structured my compiler.
I once learned that RT-RK, a Serbian company that also has an office in Osijek, where I was living at the time, was looking for a compiler developer. So I sent them my AEC-to-WebAssembly compiler on GitHub via e-mail, hoping to finally get an entry-level job after I had been learning to program for 8 years. They did not even invite me to an interview. They replied in an e-mail that they were looking for somebody who knows in detail how GCC or LLVM (preferably both) work internally, and that my project did not show them that I know that.
UPDATE on 05/01/2023: A question I often get asked on Internet forums is: if I have made my own programming language, why haven't I also made my own operating system? The answer is fairly simple: while I do have some ideas about what a good programming language would look like and how it would work internally (as you can probably tell by reading the documentation of my programming language), I have no idea how a good operating system would work. So, I haven't made my own operating system, and I probably never will.
UPDATE on 04/07/2023: I asked a question on StackExchange about why most programming languages use the same token for EndIf, EndWhile, EndFunction and EndStructure, and that question got many upvotes.
UPDATE on 24/05/2024: A question I often ask myself is what the proper way is to deal with algorithms that involve tree manipulation in languages such as AEC. Well, here is how I solved that problem in Huffman Coding:
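One common answer, sketched here in C++ rather than AEC, is to avoid pointers and heap allocation altogether: keep all tree nodes in a fixed-size array and let nodes refer to their children by index. This illustrates that general technique and is not the actual Huffman Coding program. Node, pool, buildTree and assignCodes are made-up names, and the minimum is found by a linear scan instead of a priority queue to keep the sketch small.

```cpp
#include <map>
#include <string>

struct Node {
  char symbol;    // meaningful only for leaves
  int frequency;
  int left;       // index into `pool`, -1 if there is no child
  int right;
  bool merged;    // true once the node has become a child of another node
};

const int MAX_NODES = 2 * 256 - 1; // at most 256 leaves and 255 inner nodes
Node pool[MAX_NODES];
int nodeCount = 0;

// Returns the index of the unmerged node with the lowest frequency, or -1.
int findMinimum() {
  int best = -1;
  for (int i = 0; i < nodeCount; i++)
    if (!pool[i].merged && (best == -1 || pool[i].frequency < pool[best].frequency))
      best = i;
  return best;
}

// Builds the Huffman tree for `text`; returns the root index (-1 if empty).
int buildTree(const std::string &text) {
  nodeCount = 0;
  int counts[256] = {0};
  for (unsigned char c : text) counts[c]++;
  for (int c = 0; c < 256; c++)
    if (counts[c] > 0)
      pool[nodeCount++] = {static_cast<char>(c), counts[c], -1, -1, false};
  while (true) {
    int first = findMinimum();
    if (first == -1) return -1; // empty text
    pool[first].merged = true;
    int second = findMinimum();
    if (second == -1) return first; // only the root is left
    pool[second].merged = true;
    pool[nodeCount++] = {'\0', pool[first].frequency + pool[second].frequency,
                         first, second, false};
  }
}

// Walks the tree (root must not be -1), recording the bit string of each leaf.
void assignCodes(int node, const std::string &prefix, std::map<char, std::string> &codes) {
  if (pool[node].left == -1) {
    codes[pool[node].symbol] = prefix.empty() ? "0" : prefix; // one-symbol text
    return;
  }
  assignCodes(pool[node].left, prefix + "0", codes);
  assignCodes(pool[node].right, prefix + "1", codes);
}
```

For example, for the text "aaaabbc", the sketch gives the frequent symbol a a one-bit code and the rarer b and c two-bit codes.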
Here is a quick-and-dirty logo I made using GIMP and LibreOffice Calc: I hope it is good enough, I am not an artist...
Introduction
Compilers these days, even C compilers, have lots of features and are often smarter than the programmer when it comes to things they are made to do. While this is usually very useful, sometimes it's counter-productive. Suppose that you are writing a program in Assembly and want to do something high-level (because correctness is way more important than speed). You can't tell your C compiler to simply output the code to assign sqrt(a*a+b*b) to c, if you try to, it will complain those variables aren't declared and that you aren't in a C function. C compilers have ideas how to declare functions and variables in assembly. While these ideas usually work, sometimes you are writing something where the assembler will complain if you give it the code that C compilers produced for those things, and there is no way to modify the code the C compiler outputs for declaring a function in C. So, sometimes, compilers, while they could come useful for some task, are trying to do too much and are thus counter-productive. In my opinion, this is especially true for the mainstream compilers targetting WebAssembly, but more on that later. Also, compilers are buggy, and nearly all compiler bugs are in the optimizer. Can we have a language with a compiler that does only what you told it, in a very predictable way? Well, that's what inspired me to create Arithmetic Expression Compiler (AEC) a few years ago. The first programming language I learned was Microsoft Small Basic, I have made a Labyrinth game in it, and it has influenced a lot the decisions I made when designing my programming language. Microsoft Small Basic is a simplified programming language that compiles to .NET bytecode (the same one that C# compiles to), and it is the only language that compiles to .NET bytecode that I have managed to learn so far.
I have also always been interested in languages, both natural and artificial (such as programming languages), and making a programming language will certainly give me a lot better insight into how programming languages really work. That is not quite true for natural languages: if you try to make your own constructed language to be spoken by humans (similar to Esperanto), and you do not know about consecutio temporum, you will most likely not specify how consecutio temporum is supposed to work in your language and you will probably not even realize your grammar has a huge hole in its specification (You will assume it is understood by itself it will be done the same way as in your native language, thinking, like I used to think before learning about consecutio temporum in English, that the way tenses are put together in complex sentences in your native language is based on logic, rather than that the rules for that are essentially-arbitrary. And in case you think all languages with tenses follow either the Croatian-like consecutio temporum or the Latin-like consecutio temporum, like I used to think until recently, read this Reddit answer to my question about accusative with infinitive and consecutio temporum in English. If you use accusative with infinitive in English, you should use Croatian-like consecutio temporum, even though, if you use indirect speech with object clauses, you should use Latin-like consecutio temporum. And you don't even need complex sentences to show that tenses work very differently in different languages, even related ones. In English, an adverbial phrase such as "every evening" puts the verb in a simple tense. You say "He comes here every evening.", and "He is coming here every evening." is ungrammatical, or is perhaps grammatical for the meaning that the speaker is annoyed by him coming. In Croatian, exactly the opposite is true: an adverbial phrase such as "svaku večer", meaning "every evening", puts the sentence in a continuous tense. 
You say "On dolazi ovdje svaku večer.", and "On dođe ovdje svaku večer." sounds very ungrammatical, or is perhaps grammatical for the meaning that it is going to happen in the future even though it is not happening yet. I have asked a StackExchange question about how it is in Latin, another natural language I know somewhat. You will probably not realize that such things differ between languages by making your own language similar to Esperanto.). But it is true that making a programming language gives you a special insight into how programming languages, and computers in general, work. Especially if you are writing everything yourself (like I am doing), rather than using frameworks for tokenizing, parsing and compiling. To be clear, I am not saying making your programming language gives you some special insight into how languages in general work. Although in papers I have written I have often made comparisons between human languages and programming languages, I am not entirely convinced the similarity between human languages and programming languages is anything more than superficial (Look up "pseudo-coordination" - it is a grammatical construction found in many languages, including English, which can hardly be explained by structural or generative theories of the syntax. Or, probably a less extreme example, "Donkey Sentences".).
I also think that having some experience writing compilers can help you write other types of programs more effectively. If I didn't know as much about compiler theory as I know, I think I would have much more trouble writing the web-based PicoBlaze assembler and emulator, if I would even succeed at it at all. My PicoBlaze Simulator solves a real-world issue: PicoBlaze is a small computer produced by Xilinx that we are using as an example of a simple computer at our Computer Architecture classes, and my Computer Architecture professor Ivan Aleksi asked me to create it so that laboratory exercises can be done from home (in case real laboratory exercises need to be canceled due to the pandemic).
What platforms can be targeted now
Right now, I've written two compilers for the AEC language. First, I wrote one targeting x86 processors (AMD and Intel). That one is written in JavaScript and the core of it can be run in browsers that have basic support for JavaScript (even in Internet Explorer 6). To use all the features, one needs to use NodeJS or Duktape with it, to enable it to access the file system. Thanks to the help I received from people on VOGONS forum, my AEC-to-x86 compiler outputs assembly code that runs on both i486 in 32-bit mode and on x86_64 processors in 64-bit mode, with no modification (it uses ebx register for indexing arrays, which is allowed (although not recommended) in 64-bit mode, and it does not push and pop 32-bit values from the stack, it only pushes and pops 16-bit ones, which is allowed in both 32-bit and 64-bit mode, and so on..). When I started studying at the university, many professors were impressed by my AEC-to-x86 compiler. My Algorithms and Data Structures professor Alfonso Baumgartner urged me to write a paper about it which got published in Osječki Matematički List. The compiler targeting x86 is around 2'000 lines of code (excluding the example programs). So, I decided to extend it so that my language can be used to target JavaScript Virtual Machine using WebAssembly (the JavaScript bytecode, which Mozilla has been pushing to get standardized, so that people can run programming languages better than JavaScript in a browser), and not only x86. As targeting WebAssembly is easier than targeting x86 (or probably any physical processor, as WebAssembly was designed to be an easy target for compilers, rather than to be easily implemented in hardware or easy to write assembly-language code for manually), I was able to add many new features. However, I think it's still not nearly as intrusive as C compilers are. 
Emscripten (the primary C and C++ compiler for WebAssembly, a modified version of the CLANG compiler) always assumes the standard C library is present on the JavaScript Virtual Machine when compiling any kind of program, so it's an overkill for most cases when it could come useful. The AEC-to-WebAssembly compiler has around 5'500 lines of code, and it's written in C++ (a language much more suitable for writing compilers than JavaScript).
WebAssembly is one of the reasons I am a libertarian, because it shows that, when a private company makes a mistake, no matter how hopeless the situation seems, there will come a solution... from capitalism itself. Making JavaScript, which is widely agreed to be a very poorly-designed programming language, a standard language of the Internet, which is what Netscape did back when it had a near-monopoly on the Internet browsers... for a long time, it seemed like a way to retard the development of the Internet forever. Fortunately, once the Internet got used more, somebody came up with this brilliant idea of WebAssembly, which seems to solve basically all the problems created by Netscape with that wrong decision. And, incidentally, that solution also significantly lowers the barrier towards making a new programming language, so that many more people can experiment with those things. When governments make a mistake, quite often, there is no solution. When the UN decided back in 1948 that the solution to the Holocaust is to make Palestinians pay for the Hitler's crimes with their land... it led to wars which continue to this day, and will likely continue all until a nuclear holocaust destroys most life on Earth (as has almost happened a few times by now). A private company most likely cannot make a mistake with such horrible consequences.
UPDATE on 11/10/2020: Of course, my compiler is not a very high-quality software. The LGTM static analyzer places it in the category, because it has found potential bugs per 5'500 lines of code, most of them being unnecessarily doing deep copies of C++ objects (wasting time and memory), and quite a few of them being using potentially uninitialized variables in JavaScript. If you want to collaborate with me, perhaps one of the first things to do is fix those bugs found by static analysis.
UPDATE on 28/08/2021: As a part of a paper I have written in my economics class, I have explained why I chose to target WebAssembly and why I think WebAssembly will revolutionalize compilers:
WebAssembly je dizajniran da bude jako lagan cilj za compilere, daleko
lakši nego asemblerski jezici fizičkih procesora. Kada dizajniramo
asemblerski jezik za fizički procesor, imamo tri trade-offsa
(pritisaka):
Sorry about that, but it would take me a lot of time to translate that
to English.- Moramo osigurati da taj asemblerski kod mogu lagano ispisivati compileri. Compilerima je najlakše ispisivati asemblerske kodove za mašine bazirane na stogu, i mislim da će se svatko tko je napravio i najjednostavniji compiler s time složiti. Korištenje registara kakvi postoje u fizičkim procesorima zahtijeva komplicirane algoritme. Compilerima nije važno ima li dovoljno registara da se sve varijable pohrane u registar, jer compileri lako stave varijable u memoriju i vode računa gdje su. Compilerima koji optimiziraju kod smeta postojanje nevažnih instrukcija u asemblerskom jeziku, jer im je onda teško procijeniti je li asemblerski kod koji ih koristi optimalniji ili manje optimalan.
-
Moramo osigurati da taj asemblerski kod mogu relativno lagano
pisati ljudi. Jer, suočimo se s time, neki se programi nužni za
rad računala, kao što su bootloaderi, ne mogu pisati u višim
programskim jezicima, a još su se manje mogli prije 50-ak godina
kada su osnove asemblerskih jezika kojima se služe današnji (x86,
ARM...) procesori dizajnirane. Ljudima je, mislim da će se svatko
tko je pisao asemblerski kod složiti, najlakše pisati asemblerski
kod za procesore s mnogo registara, a teško im je koristiti stog
za računanje, gotovo suprotno nego što je lagano compilerima. A
voditi računa o tome koliko je koja lokalna varijabla udaljena od
vrha sistemskog stoga, što je compilerima jako lagano, za ljude je
gotovo nemoguća misija. Ljudima koji pišu asemblerske kodove dobro
dođu compilerima nevažne instrukcije kao što su
fsin
(sporo, ali ekstremno precizno računanje sinusa), koje optimizirajućim compilerima koji pokušavaju modelirati procesor smetaju. - Procesor koji će razumjeti taj asemblerski kod (točnije, strojni kod koji odgovara tom asemblerskom kodu naredbu-po-naredbu) mora biti moguće efikasno fizički implementirati. Jer, suočimo se s time, na kraju krajeva, procesori su žice. Moderni procesori su bezbroj žica spojenih na način toliko kompliciran da ti glava pukne i da ti bude drago što o tome ne učite na fakultetu. Zapravo, vjerojatno ti ni jedna osoba ne bi mogla objasniti kako moderni procesori funkcioniraju. No, ipak, očito je da asemblerski jezici koji semantički pružaju neograničeno velik stog ili proizvoljan broj registara ne mogu biti efikasno implementirani žicama.
UPDATE on 17/04/2022: I have started a Reddit thread asking why there are a few C++ compilers targetting WebAssembly (JavaScript bytecode), but no C++ compilers targetting Java Bytecode.
How to use the compilers
Probably the simplest way to use the AEC-to-x86 compiler on a Linux machine is to type the following code into a terminal emulator:
mkdir ArithmeticExpressionCompiler cd ArithmeticExpressionCompiler if [ $(command -v wget > /dev/null 2>&1 ; echo $?) -eq 0 ] # Check if "wget" exists, see those StackOverflow answers for more details: # https://stackoverflow.com/a/75103891/8902065 # https://stackoverflow.com/a/75103209/8902065 then wget https://flatassembler.github.io/Duktape.zip else curl -o Duktape.zip https://flatassembler.github.io/Duktape.zip fi unzip Duktape.zip if [ $(command -v clang > /dev/null 2>&1 ; echo $?) -eq 0 ] # We prefer "clang" to "gcc" because... what if somebody tries to run this in CygWin terminal? GCC will not work then, CLANG might. then c_compiler="clang" else c_compiler="gcc" fi $c_compiler -o aec aec.c duktape.c -lm # The linker that comes with recent versions of Debian Linux insists that "-lm" is put AFTER the source files, or else it outputs some confusing error message. if [ "$OS" = "Windows_NT" ] then ./aec analogClockForWindows.aec $c_compiler -o analogClockForWindows analogClockForWindows.s -m32 ./analogClockForWindows else ./aec analogClock.aec $c_compiler -o analogClock analogClock.s -m32 ./analogClock fiIf everything is fine, the Analog Clock program should now print the current time in the terminal. I think this would work on the vast majority of Linux machines, as well as on many non-Linux (FreeBSD, Solaris...) machines. A potential problem is that the 32-bit libraries are not installed on a 64-bit Linux machine (so that -m32 fails), but this is rarely the case, as Linux machines today usually have WINE (a Windows compatibility layer requiring 32-bit libraries), or at least some 32-bit programs. Needless to say, this will not work on Linux running on ARM processors, such as Android or Raspberry Pi. Also, the Analog Clock program probably cannot be run on Windows (I haven't managed to try it using CygWin, but I am quite sure it wouldn't run on Windows even if I managed to install Cygwin. 
(UPDATE on 08/05/2021: I have found a relatively simple way to modify the analogClock.aec so that it can be assembled by the version of GNU Assembler that comes with 32-bit MinGW-W64, and thus run on both 32-bit and 64-bit Windows, so I saved it in analogClockForWindows.aec. Unfortunately, it still shows a lot of errors if you attempt to assemble it using the versions of GNU Assembler that come with TDM-GCC or CygWin. Apparently, the versions of GNU Assembler that come with various ports of GCC on Windows are different from each other to a greater extent than the version of GNU Assembler that runs on Linux and one that comes with MinGW-W64 are. I find that both surprising and somewhat unfortunate. What is especially unfortunate is the fact that some preprocessor directives in GNU Assembler have the same syntax, but different meanings depending on whether it is targetting Linux or Windows. With a little more modifications, which I have also done, analogClockForWindows.aec can also be assembled by the version of LLVM Assembler that comes with CLANG on Windows. However, the GNU Assembler that comes with MinGW-W64 and the LLVM Assembler that comes with CLANG on Windows do not output the same machine code. The executable produced by LLVM Assembler appears to run significantly faster on Windows 10, but it refuses to run at all on Windows XP. I have not studied it enough to explain what is going on there, and it is a bit creepy.)), but some other programs in Duktape.zip can. However, those cannot be assembled by GNU Assembler (invoked by gcc), you need to use FlatAssembler instead. See the ReadMe.html inside Duktape.zip for more details. 
The executables of Duktape for various x86 OS-es and example x86 AEC programs are available in a ZIP archive on my GitHub profile (UPDATE on 13/05/2021: Like I have said, the analogClockForWindows.exe file, assembled by CLANG on Windows, although it works on both 32-bit and 64-bit Windows 10, for some reason that escapes me, it refuses to run on Windows XP. You can assemble the assembly code produced by my compiler for the analogClockForWindows.aec, which will be called analogClockForWindows.s, using MinGW-w64, and then it will run on Windows XP, but it will be slower. Again, the explanation for that escapes me. An obvious explanation would be that MinGW-w64 includes some kind of a polyfill for functions missing on Windows XP, that make the executable able to run on it, but make it slower. But the problem with that explanation is that, actually, the executable produced by CLANG is bigger than the executable produced by MinGW-w64. If the executable by MinGW-w64 were polyfilled, we would expect it to be bigger, rather than smaller. (UPDATE on 16/05/2021: I have started a Reddit thread about it.)). You can also use SimpleCalculator, a version of the AEC-to-x86-compiler running in Rhino (a JavaScript engine written in Java, by Mozilla) and using Swing GUI. It an be used as a simple calculator, but it also supports converting AEC programs to x86 assembly. The AEC-to-x86 compiler, both when run in Rhino and when run in Duktape, can output assembly in two formats, one compatible with FlatAssembler, and one compatible with GNU Assembler. To switch between them, use syntax fasm and syntax gas. By default, it targets FlatAssembler. When targeting GNU Assembler, keep in mind that the directive syntax gas needs to be the very first directive in your program, even before any comments. Namely, the AEC-to-x86 compiler passes the comments down to the assembler, but FlatAssembler begins comments with ;, whereas GNU Assembler begins comments with #. 
To GNU Assembler, semi-colon ; means to have multiple assembly-language directives in a single line (useful for when invoked from a debugger, where you need to inline a few directives).
By the way, in case you are interested, here is how zero9178 (a friend I met on Discord, who maintains his own GCC port to Windows) explains the fact that putting -lm in front of source files causes an error on some (but, curiously, not all) versions of Linux:
Only a very educated guess, but GNU ld is very sensitive to the link order of archives and lazily loads archives.
More specifically, what is likely happening here is that it is passing -lm first, before any object files, to the linker (you can see the linker invocation GCC does by adding -v, which should give you more info).
Since at that point in time there are no undefined symbols, as object files have not been processed yet, the -lm is effectively discarded.
Later -lm flags are likely discarded too, as they are seen as duplicates of the first one.
Usually -lm would be placed further back, since by then the object files or other archives that require symbols from it will have already been processed by the linker.
Using the AEC-to-WebAssembly compiler is, on most Linux machines, a little trickier. The following code might work:
if [ $(command -v git > /dev/null 2>&1 ; echo $?) -eq 0 ]
then
  git clone https://github.com/FlatAssembler/AECforWebAssembly.git
  cd AECforWebAssembly
elif [ $(command -v wget > /dev/null 2>&1 ; echo $?) -eq 0 ]
then
  mkdir AECforWebAssembly
  cd AECforWebAssembly
  wget https://github.com/FlatAssembler/AECforWebAssembly/archive/refs/heads/master.zip
  unzip master.zip
  cd AECforWebAssembly-master
else
  mkdir AECforWebAssembly
  cd AECforWebAssembly
  curl -o AECforWebAssembly.zip -L https://github.com/FlatAssembler/AECforWebAssembly/archive/refs/heads/master.zip # Without the "-L", "curl" will store HTTP Response headers of redirects to the ZIP file instead of the actual ZIP file.
  unzip AECforWebAssembly.zip
  cd AECforWebAssembly-master
fi
if [ $(command -v g++ > /dev/null 2>&1 ; echo $?) -eq 0 ]
then
  g++ -std=c++11 -o aec AECforWebAssembly.cpp # "-std=c++11" should not be necessary for newer versions of "g++". Let me know if it is, as that probably means I disobeyed some new C++ standard (say, C++23).
else
  clang++ -o aec AECforWebAssembly.cpp
fi
cd analogClock
../aec analogClock.aec
npx -p wabt wat2wasm analogClock.wat
if [ "$OS" = "Windows_NT" ] # https://stackoverflow.com/a/75125384/8902065 https://www.reddit.com/r/bash/comments/10cip05/comment/j4h9f0x/?utm_source=share&utm_medium=web2x&context=3
then
  node_version=$(node.exe -v)
else # We are presumably running on an UNIX-like system, where storing output of some program into a variable works as expected.
  node_version=$(node -v)
fi # "node -v" outputs version in the format "v18.12.1"
node_version=${node_version:1} # Remove 'v' at the beginning
node_version=${node_version%\.*} # Remove trailing ".*".
node_version=${node_version%\.*} # Remove trailing ".*".
node_version=$(($node_version)) # Convert the NodeJS version number from a string to an integer.
if [ $node_version -lt 11 ]
then
  echo "NodeJS version is lower than 11 (it is $node_version), you will probably run into trouble!"
fi
node analogClock
Again, if everything is fine, the Analog Clock program should print the current time in the terminal. However, for this to work, you need to have NodeJS installed, which is often not the case. You also need a newer version of NodeJS than the one usually shipped with Linux: Debian-like Linux distributions today usually ship with NodeJS 10, and CentOS-like distributions with NodeJS 6, whereas the code my compiler generates relies on WebAssembly.Global being present, which is only true on NodeJS 11 and newer. For exactly the same reason, the code my compiler produces will not run in Firefox 52, which is the last version of Firefox to run on Windows XP, and the first one to support WebAssembly. I think I made the right decision not to waste time supporting the earliest implementations of WebAssembly, as it would take a lot of effort and almost nobody would notice it. Where would I draw the line there? Make my compiler output asm.js as well? (If you really need AEC programs to work in old browsers, perhaps you could try converting them to C using wasm2c from the WebAssembly Binary Toolkit and then re-compiling them using Emscripten (I haven't tried it myself, but I speculate it may be possible). Or you may try to somehow polyfill WebAssembly, which the WebAssembly FAQ says is, in principle, possible. But I am quite sure you would need to polyfill the entire WebAssembly, and not just WebAssembly.Global.) Of course, for npx -p wabt wat2wasm to work, you either need to have WABT already installed, or you need to be connected to the Internet so that npx can download it (which you probably are, as you have just downloaded the code from GitHub, but your firewall also needs to allow access to npmjs.com, which fewer firewalls do). If you want to use the AEC-to-WebAssembly compiler on an OS that's not compatible with Linux, well, good luck.
I have provided some executable files of my compiler for different OS-es, including FreeDOS (compiled using the DJGPP C++ compiler), as assets of the AECforWebAssembly releases, in case it helps. Especially good luck using WABT and NodeJS there (I believe WABT can be made to run on FreeDOS, but with a lot of difficulty, and that NodeJS cannot be made to run on FreeDOS even with a lot of effort). Sorry, but dealing with binary files is complicated, and building an assembler for WebAssembly into my compiler (instead of simply outputting assembly) would give me a lot of hassle and very little benefit. I hope you understand. And, if you are going to try to run the analogClock.html file in a browser, please note it will not work unless you serve it from a web server, such as the one started by php -S 127.0.0.1:8080. It has to do with sandboxing: contemporary web browsers do not let a page opened directly from your hard drive (through a file:// URL) load other local files, such as the .wasm file, because otherwise a page stored on your hard drive could read the data stored there. That is why contemporary web browsers disallow that (just like, when JavaScript was in its infancy, browsers did not allow JavaScript on your hard drive to be run without your explicit permission). If the page and its WebAssembly are loaded from a web server, they are sandboxed so that they cannot access your hard drive.
A friend I met on Discord called zero9178 helped me write the CMake script for building and testing the AEC-to-WebAssembly compiler, so that you can easily use any IDE that works well with CMake (Visual Studio, QtCreator, NetBeans and CLion can import CMake projects automatically, and CMake can be made to output the configuration files necessary for Eclipse). For now, however, Visual Studio 2019 falsely claims that the tests which invoke WABT fail (claiming that the WABT executables are not proper Windows executables, although they can be run from the command line, as well as from CMake run from the command line). I do not know why. (UPDATE on 26/08/2021: As of the time of writing this, the compiler crashes with a stack overflow if compiled with Visual Studio.) The automated tests are integrated with GitHub Actions and GitLab CI, and they seem to work properly there. The structureDeclarationTestCompiles test seems to run roughly an order of magnitude (about 10 times) faster on GitHub Actions than on the laptop I am working on, and around five times faster than on GitLab CI (so GitLab CI seems to be roughly twice as fast as my laptop). I am not sure why, as I expected my compiler to run very poorly in the cloud because, as far as I understand it, computer clouds are made of countless low-powered computers that can only be used well by programs that support parallel execution, which my compiler does not. I asked a question on Quora about that. Nevertheless, I think the hypothesis that it does not actually run there (and that this is why it seems to run so quickly) can be ruled out, as subsequent tests would fail if that were the case (WABT invoked in structureDeclarationTestAssembles would exit with an error message, and so would NodeJS invoked by structureDeclarationTestRuns). The tokenizer of my compiler runs very slowly, and I do not know how to make it faster. I have made a forum thread about it.
UPDATE on 28/05/2021: As a friend I met on Discord called elucent (the author of the Basil programming language) suggested, the tokenizer can be made a lot faster by using std::remove_if to erase all whitespace tokens at once, rather than calling std::vector<T>::erase for each all-whitespace token (as zero9178 had suggested). I implemented that, and now the test structureDeclarationTestCompiles takes only 2 seconds to run on GitLab CI, whereas it previously took 6 seconds (so it is around 3 times faster). He also suggested some other ways to make both the tokenizer and the parser faster, but those are harder to implement. (UPDATE on 26/08/2021: A user of the Atheist Forums called HappySkeptic found a relatively easy way to significantly speed up the parser.)
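To illustrate why erasing whitespace tokens one by one is slow, here is a minimal C++ sketch, assuming the tokens are stored as a std::vector<std::string> (the compiler's real token type is not shown here, so isAllWhitespace, removeWhitespaceOneByOne and removeWhitespaceAtOnce are hypothetical names). Each std::vector::erase call shifts every later element, whereas std::remove_if moves the kept elements forward in a single pass, after which one erase call drops the leftover tail (the so-called erase-remove idiom):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// True if a token is empty or consists only of whitespace (spaces, tabs, newlines...).
bool isAllWhitespace(const std::string& token) {
  return std::all_of(token.begin(), token.end(), [](unsigned char c) {
    return std::isspace(c) != 0;
  });
}

// Slow variant: every erase shifts all later elements, so removing k tokens
// from n tokens costs O(n * k) element moves.
void removeWhitespaceOneByOne(std::vector<std::string>& tokens) {
  for (auto it = tokens.begin(); it != tokens.end();) {
    if (isAllWhitespace(*it))
      it = tokens.erase(it);  // erase returns an iterator to the next element
    else
      ++it;
  }
}

// Erase-remove idiom: std::remove_if compacts the kept tokens to the front in
// one O(n) pass, then a single erase removes the leftover tail.
void removeWhitespaceAtOnce(std::vector<std::string>& tokens) {
  tokens.erase(std::remove_if(tokens.begin(), tokens.end(), isAllWhitespace),
               tokens.end());
}
```

Both functions keep the non-whitespace tokens in their original order; only the amount of copying differs.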
UPDATE on 06/06/2021: The AEC-to-WebAssembly compiler can now target the WebAssembly System Interface (WASI), as the example Hello World from WASI shows. Basically, you need to put #target WASI before any declarations. Unlike with the AEC-to-x86 syntax gas directive, comments can go before it. I believe this is a significant step toward making the WebAssembly dialect of AEC run on rarely used operating systems such as FreeDOS, using portable WASI environments such as Wasm3. Of course, that is assuming we also manage to compile wat2wasm from the WebAssembly Binary Toolkit to run there (which will not be easy, because CMake does not run there). Also, some blockchains support WebAssembly Binary Interface, so perhaps my compiler can now be useful there.
UPDATE on 09/01/2023: Here is how my AEC-to-WebAssembly compiler can be compiled using latest Emscripten:
emcc -o aec.js AECforWebAssembly.cpp -s NODERAWFS=1 -s ALLOW_MEMORY_GROWTH=1 -s TOTAL_STACK=4MB -O3
Previous versions of Emscripten did not require -s TOTAL_STACK=4MB, but the latest version does, because of this pull request.
UPDATE on 05/05/2023: Here is how you can cross-compile my AEC-to-WebAssembly compiler from Windows to 64-bit Linux using Docker:
PS E:\My programs\AECforWebAssembly> docker run --rm -v "$($PWD):/usr/src/myapp" -w /usr/src/myapp gcc:13.1 g++ -o aec.elf AECforWebAssembly.cpp -static -O3
Of course, you should replace 13.1 with the version of GCC you want to use. The newer the GCC version, the more optimized the output is likely to be, but you risk running into trouble because of poor backwards compatibility (C++ is not JavaScript, where you can expect a program that works now to work forever). For instance, GCC 13.1 started to require #include <cstdint> in order to use uint8_t and similar typenames, which broke my GitLab CI/CD once GitLab started using GCC 13.1 (which is how I came to know about GCC 13.1 in the first place). You need to use -static, or else your program will not run on any Linux machine whose C++ standard library is older than the one that comes with the GCC 13.1 used in Docker (which is, as of the time of writing this, every Linux distribution except Fedora). It is also a good idea to run strip aec.elf inside Docker after using GCC to produce aec.elf, to make it use less space. I have asked a question on both Quora and Reddit about why GCC run in Docker produces around 30% smaller executable files than GCC run on actual Linux.
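As a minimal sketch of the kind of fix GCC 13.1 required (the helper lowByte is hypothetical, not code from the compiler), a source file that uses the fixed-width integer types should include <cstdint> itself, rather than rely on other standard headers happening to include it:

```cpp
#include <cassert>
#include <cstdint>  // Declares std::uint8_t and friends. Since GCC 13, headers
                    // such as <string> no longer include it as a side effect.

// Hypothetical helper: keeps only the lowest 8 bits of a value.
std::uint8_t lowByte(int value) {
  return static_cast<std::uint8_t>(value & 0xFF);
}
```

Including the header explicitly works with both older and newer versions of GCC, so it costs nothing to add.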
Comments
In the version of AEC targeting x86, comments start with ; and end with a newline character, as in the FlatAssembler dialect of Assembly (which ArithmeticExpressionCompiler primarily targets), and there are no multi-line comments. In the version of AEC for WebAssembly, comments are as in C, C++ and JavaScript: single-line comments start with //, and multi-line comments start with /* and end with */. Multi-line comments do not nest (unlike in, for example, Swift, where they do). Many people say multi-line comments are a bad thing because bad programmers use them for versioning code (which is a very bad practice). I don't think the job of the compiler is to enforce some particular programming style and refuse to compile code-smelling programs (though warnings are often useful).
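To make the non-nesting rule concrete, here is a minimal C++ sketch of stripping comments in the style of the WebAssembly dialect. The function name stripComments is hypothetical, and, unlike a real tokenizer, this sketch does not know about string literals, so a // inside a string would wrongly be treated as a comment. Because a block comment ends at the first */, whatever follows it is kept, even if another /* appeared inside the comment:

```cpp
#include <cassert>
#include <string>

// Removes // line comments and /* */ block comments. Block comments do not
// nest: a block comment ends at the FIRST "*/" after its "/*".
std::string stripComments(const std::string& src) {
  std::string out;
  std::size_t i = 0;
  while (i < src.size()) {
    if (src.compare(i, 2, "//") == 0) {
      std::size_t end = src.find('\n', i);
      if (end == std::string::npos) break;  // comment runs to the end of input
      i = end;                              // keep the newline itself
    } else if (src.compare(i, 2, "/*") == 0) {
      std::size_t end = src.find("*/", i + 2);
      if (end == std::string::npos) break;  // unterminated: drop the rest
      i = end + 2;                          // skip past the first "*/"
    } else {
      out += src[i++];
    }
  }
  return out;
}
```

For example, "a /* b /* c */ d */" becomes "a  d */", whereas a nesting scheme such as Swift's would remove everything after "a ".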
Constants
In AEC for x86, a token that consists of digits and at most one decimal point is a number, and all numbers are treated as 32-bit decimal (floating-point) numbers. In both dialects of AEC, a string is a token which starts and ends with ", and strings are passed unchanged to the assembler. String tokens next to each other are concatenated by the tokenizer into one string (as in C and C++, in contrast with JavaScript). In both dialects of AEC, a token consisting of three characters, of which both the first and the last one are ', is a number: the tokenizer replaces it with the ASCII code of the second character in that token (like in most dialects of x86 assembly). In AEC for WebAssembly, a token which matches the regular expression "(^\\d+$)|(^0x(\\d|[a-f]|[A-F])+$)" is of type Integer64 and is passed unchanged to the assembler (notice that this includes hexadecimal numbers starting with 0x, as in C, C++ and JavaScript). A token which matches the regular expression "^\\d+\\.\\d*$" is of type Decimal64 and is also passed unchanged to the assembler (or, if it is to be assigned as the initial value of a global variable, it is converted to IEEE754 hexadecimal). In AEC for x86, a token which matches the regular expression "^\\d+\\.\\d*$" is, just like all numbers, of type Decimal32 and is always converted to IEEE754 hexadecimal (unless the compiler runs in a JavaScript environment that doesn't support ArrayBuffer; in that case, only FlatAssembler can be targeted, as FlatAssembler can convert decimal numbers to IEEE754 hexadecimal itself). Notice that in AEC for WebAssembly, 3/2=1 (as in C, C++, Java, C#, Rust and Python 2.x), while, in AEC for x86, 3/2=1.5 (as in JavaScript, PHP and Python 3.x). It's hard to tell which approach is better; both can produce hard-to-find bugs.
The Pascal-like approach of using different operators for integer division and decimal division probably makes the most sense (Pascal uses / for floating-point division and div for integer division. Similarly, Dart uses / for floating-point division and ~/ for integer division), but it will also undeniably feel alien to most programmers. I have started a Reddit thread about that.
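The token rules for the WebAssembly dialect described above can be sketched in C++ with std::regex, reusing the exact regular expressions quoted in the text. The enum LiteralType and the functions classifyLiteral and characterLiteralValue are hypothetical names for this illustration, not the compiler's actual code:

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class LiteralType { Integer64, Decimal64, Character, NotALiteral };

// Classifies a token by the rules of the WebAssembly dialect of AEC.
LiteralType classifyLiteral(const std::string& token) {
  // Regular expressions copied from the text (as C++ string literals).
  static const std::regex integer64("(^\\d+$)|(^0x(\\d|[a-f]|[A-F])+$)");
  static const std::regex decimal64("^\\d+\\.\\d*$");
  if (std::regex_match(token, integer64)) return LiteralType::Integer64;
  if (std::regex_match(token, decimal64)) return LiteralType::Decimal64;
  // Tokens like 'A': three characters, the first and the last being apostrophes.
  if (token.size() == 3 && token.front() == '\'' && token.back() == '\'')
    return LiteralType::Character;
  return LiteralType::NotALiteral;
}

// The tokenizer replaces a character literal with the ASCII code of its
// middle character.
int characterLiteralValue(const std::string& token) {
  return static_cast<unsigned char>(token[1]);
}
```

Note that, by these regular expressions, 0X1F (with a capital X) is not an Integer64 literal, and .5 is not a Decimal64 literal, since at least one digit is required before the point.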
UPDATE on 17/09/2021: The AEC-to-WebAssembly compiler version v2.1.0 (not released as of the time of writing this, but it can be built from source) supports multi-line strings in the same way C++ does (raw string literals). An excerpt from Multi-line String Test:
CharacterPointer first := R"(
\"Hello world!"\
)",
                 second := R"ab(
\"Hello world!"\
)ab",
                 third := R"a(
\"Hello world!"\
)a";

//Should return 1
Function multiLineStringTest() Which Returns Integer32 Does
  Return strlen(first) = strlen(second) and strlen(second) = strlen(third) and
         strlen(third) = strlen("\\\"Hello world!\"\\") + 2;
EndFunction
Support for multi-line strings will probably never be added to the AEC-to-x86 compiler, since the AEC-to-x86 compiler does not tokenize the whole program before compiling but tokenizes it line by line, so it would need to be significantly restructured to support multi-line strings. I like the way C++ supports multi-line strings more than the way JavaScript supports them. In JavaScript, namely, multi-line strings begin and end with a backtick `, which was presumably designed under the assumption that long hard-coded strings (for which multi-line strings are used) would never include a backtick. That does not seem like a reasonable assumption. C++ lets us choose the string which, placed between a closing parenthesis ) and the quote sign ", we think will never appear in the text stored as a multi-line string (in the example above, those were the empty string in first, the string ab in second, and the string a in third), and the programmer will more than likely be right about that. For a long time, Java did not support multi-line strings at all (until text blocks were added in Java 15), supposedly to discourage hard-coding large texts into a program. I think that was not the right thing to do, primarily because multi-line strings have many good uses: they arguably make the AEC-to-WebAssembly compiler, written in C++, more legible. Parser tests and large chunks of assembly code are written as multi-line strings there, and I think rightly so. I have opened a Reddit thread about that.
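For comparison, here is the same idea in C++ itself, as a small sketch (the variable names are hypothetical). The first two strings mirror the AEC test: each starts and ends with a newline, hence the + 2. The third shows why a custom delimiter helps: with the delimiter ab, the sequence )" can appear inside the string without ending it:

```cpp
#include <cassert>
#include <cstring>

// Inside a raw string, backslashes and quotes are ordinary characters, and the
// line breaks after "(" and before ")" are part of the string.
const char* first = R"(
\"Hello world!"\
)";

const char* second = R"ab(
\"Hello world!"\
)ab";

// With an empty delimiter, this literal would end at the first )" it contains.
const char* withDelimiter = R"ab(He said "hi" (loudly)" and left.)ab";
```

The same text written as an ordinary string literal needs an escape for every backslash and quote, which is exactly what raw strings avoid.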
UPDATE on 23/09/2021: Until version v2.1.2 (not released as of the time of writing this, but it can be built from source), there was a bug in the tokenizer that prevented multi-line strings consisting exclusively of \ characters from being tokenized correctly.
UPDATE on 22/01/2022: AECforWebAssembly v2.3.0 and newer will support inserting underscores (_) inside number literals for better legibility, as JavaScript does. Here is an excerpt from Birthday Paradox that illustrates that:
Integer32 nasumican_broj := 9_907;
Function nasumican_broj() Which Returns Integer16 Does
  Return nasumican_broj := mod(nasumican_broj * 48_271, 2_147_483_647); // This does the same thing as minstd_rand does in C++.
EndFunction
It implements a simple pseudo-random generator.
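As a sketch of what the comment in that excerpt refers to, here is the same recurrence in C++, checked against std::minstd_rand. C++14 also supports digit separators, but with an apostrophe instead of an underscore. The names state and nextRandom are hypothetical, and, unlike the AEC function, this sketch returns the full state instead of an Integer16:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Lehmer ("minimal standard") generator: x := (x * 48271) mod (2^31 - 1),
// the same recurrence std::minstd_rand uses.
std::int64_t state = 9'907;

std::int64_t nextRandom() {
  // 64-bit arithmetic, because x * 48'271 can exceed the range of a 32-bit integer.
  state = (state * 48'271) % 2'147'483'647;
  return state;
}
```

Seeding std::minstd_rand with the same value, 9907, yields the same sequence, which is what the test below checks.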
Variable declarations
In the version of AEC targeting x86, there are no variable declarations in the language itself, the compiler simply assumes any token that matches the regular expression "^(_|[A-Z]|[a-z])\\w*\\[?$" and is not a keyword is a name of a variable of type 32-bit decimal number or 32-bit decimal number array (if it ends with [) that's been previously declared in assembly. In the version of AEC targeting WebAssembly, variables are declared with:
DataType name_of_the_variable;
Here, DataType is Character, CharacterPointer, Integer16, Integer16Pointer, Integer32, Integer32Pointer, Decimal32, Decimal32Pointer, Decimal64 or Decimal64Pointer. (UPDATE on 28/09/2021: Since AECforWebAssembly v2.2.0 (not released as of the time of writing this), you can write PointerToCharacter instead of CharacterPointer, PointerToInteger32 instead of Integer32Pointer, and so on. I think PointerToCharacter is a name that is far harder to misinterpret, as somebody might think CharacterPointer is a character that is also a pointer.) The compiler assumes that pointers are 32-bit and characters are 8-bit, which is true in the vast majority of cases. There is also a way to initialize a variable:
DataType name_of_the_variable := initial_value;
Without that, global variables are zero by default, and local variables contain whatever happens to be on the top of the system stack at the time of their declaration (as in C or C++). In Ada and old versions of C, variables must be declared at the top of a scope, before any other statement. There is no such restriction in AEC. However, the initial values of global variables must be compile-time constants. That means you can't refer to other global variables or to AEC functions. However, when writing an initial value for a decimal variable, you can use functions and constants from the C library available to the compiler, such as sin(x), asin(x), atan2(y,x), pi or e. The same doesn't work when assigning initial values to local variables. It's possible to declare multiple variables of the same type in the same statement by separating them with a comma , (as in C, C++ and JavaScript). For example (excerpt from the Dragon Curve):
Integer32 directionX[4] := { 0, 1, 0, -1},
          directionY[4] := {-1, 0, 1, 0},
          currentX := 10,
          currentY := 250 + 490 - 410, //When set on 250, the turtle
                                       //reaches 410 and then turns
                                       //back (I know this by
                                       //experimenting).
          currentDirection := 0,
          lineLength := 5,
          lineWidth := 2,
          currentStep := 0;
I like the C-like approach to declaring multiple variables in the same statement way more than the Ada-and-VHDL-like approach, let alone the Rust-like approach. I realize the Rust-like approach makes much more sense theoretically, but it definitely feels weird to somebody coming from another programming language.
TODO: Decide what to do about aligning the variables in memory (making sure that, for example, an Integer32 is at a memory location divisible by 4). Aligning variables wastes memory: sometimes around half of the allocated memory ends up unused because of the alignment. On the other hand, for interoperability with other languages, it is probably desirable for variables and arrays to be aligned. JavaScript typed arrays, such as Int32Array, throw an exception if you try to create them at an unaligned offset, while in C and C++, unaligned access works with some compilers and optimization levels but not with others (it's undefined behavior). Right now, the AEC compiler doesn't make sure the variables are aligned, and I am not sure that is the best approach. Also, while the JavaScript Virtual Machine does allow unaligned memory access from WebAssembly, it's not guaranteed to be nearly as fast as aligned access (on x86, it usually is; on some ARM processors, it can be many times slower).
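To put a number on how much memory aligning can waste, here is a minimal C++ sketch. It assumes that each variable is a scalar whose natural alignment equals its size (1 for Character, 4 for Integer32, 8 for Decimal64); that is an assumption for illustration, not a description of what the AEC compiler does, and alignUp and layoutSize are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds offset up to the next multiple of alignment (which must be a power of two).
std::size_t alignUp(std::size_t offset, std::size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Places variables one after another, in declaration order, and returns the
// total number of bytes used. Each entry is a variable's size, which doubles
// as its alignment under the assumption stated above.
std::size_t layoutSize(const std::vector<std::size_t>& sizes, bool aligned) {
  std::size_t offset = 0;
  for (std::size_t size : sizes) {
    if (aligned) offset = alignUp(offset, size);  // insert padding if needed
    offset += size;
  }
  return offset;
}
```

With a Character, an Integer32, another Character and a Decimal64 declared in that order, the unaligned layout takes 14 bytes and the aligned one 24, so 10 bytes are padding; declaring the larger variables first would avoid the padding in this case.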
Arrays and Pointers
In the x86 dialect of AEC, you don't declare arrays in AEC itself; the compiler assumes any arrays you mention are declared in inline assembly (just like it does for variables). In the x86 dialect, you can reference an element of an array called fib either as fib(0) (as in BASIC) or as fib[0] (as in C). In the WebAssembly dialect of AEC, you can only write fib[0].
In the WebAssembly dialect of AEC, arrays are declared as follows:
DataType name_of_the_array[size];
They can be initialized as follows:
DataType name_of_the_array[size] := {first_element, second_element...};
Note that, if you put only one element in the initializer list (between { and }), only the first element is initialized with that value, while the others are left uninitialized (or are, in the case of global variables, set to zero), unlike in C, where the remaining elements are always set to zero. That makes a lot more sense to me than the complicated rules C and C++ have for initializing arrays. In C and C++, you can use array-style syntax with pointers or pointer-style syntax with arrays most of the time, except in some confusing scenarios. In AEC, you can never do those things. To the AEC compiler, the array is named name_of_the_array[ rather than name_of_the_array, and attempting to use name_of_the_array to refer to the pointer to the first element of the array (as you usually can in C or C++) leads to an "undeclared variable" error (since the compiler uses the Levenshtein distance to suggest corrections for misspelled variable names, it will almost certainly suggest the array name then, with the [ at the end, since it is at a Levenshtein distance of 1). This can be confusing to those who come from C or C++, but it makes a lot more sense to me. Or, you can make it behave in the C-like manner (if that makes your code shorter) like this (excerpt from the Analog Clock in AEC):
Character signature[100] := {0};
CharacterPointer signature := AddressOf(signature[0]);
//AEC, unlike C, always makes a clear distinction between
//arrays and pointers.
logString("Empty signature has length of: ");
logInteger(strlen(signature));
logString("\n");
strcat(signature, " Analog Clock for WebAssembly\n");
To get or assign the value of the thing a pointer points to, you use the ValueAt( operator, like this (excerpt from HybridSort in AEC):
ValueAt(originalni_niz + donja_granica) := ValueAt(originalni_niz + donja_granica + 1);
That makes a lot more sense to me than using the same operator as for multiplication, like C and C++ do (they use * for both of those things). Using the same operator for declaring a pointer as for multiplication in fact makes C and C++ significantly more difficult to parse in some cases; this is called the typedef problem. In C and C++, the statement first * second; can mean both "Declare a pointer to the type named first, and name that pointer second." and the far less sensible "Multiply the variables named first and second, store the result on the system stack and then ignore it.", depending on the context (whether first is the name of a type or the name of a variable, which cannot be determined from the syntax alone during parsing). In the dialect of AEC targeting x86, it can only mean the second one (with the modification that the result is not stored on the system stack, but to the memory location that the pointer result, which is supposed to be declared in inline assembly, points to), and, in the dialect of AEC targeting WebAssembly, the compiler complains if you write something like that, because WebAssembly itself does not let you store a result on the system stack and then pretend it is not there; that is, AEC for WebAssembly does not support expressions which are not assigned to anything. To get a pointer to something, you use the AddressOf( operator. That makes a lot more sense to me than the way C and C++ do it, using the same operator as for the bitwise and operation (they use &). AEC for x86 doesn't support pointers at all. Note that, like in C and C++ (but unlike in JavaScript or Assembly), name_of_some_integer32pointer := name_of_some_integer32pointer + 1 increases the value stored in it by 4 (the size of Integer32), rather than by 1.
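For readers coming from C++, here is how the operations above map onto C++ syntax, as a minimal sketch (the function names are hypothetical): ValueAt( corresponds to the unary *, AddressOf( to the unary &, and adding 1 to a pointer advances it by the size of the pointed-to type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// How many bytes apart two consecutive elements are when you add 1 to a
// pointer; for std::int32_t this is sizeof(std::int32_t), i.e. 4.
std::ptrdiff_t bytesBetweenConsecutiveElements(std::int32_t* pointer) {
  return reinterpret_cast<char*>(pointer + 1) - reinterpret_cast<char*>(pointer);
}

// C++ rendering of the HybridSort line above:
// ValueAt(originalni_niz + donja_granica) := ValueAt(originalni_niz + donja_granica + 1);
void copyNextIntoCurrent(std::int32_t* originalni_niz, int donja_granica) {
  *(originalni_niz + donja_granica) = *(originalni_niz + donja_granica + 1);
}
```

In C++, the same * also means multiplication and the same & also means bitwise and, which is the ambiguity AEC avoids by spelling these operations out.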
TODO: Implement the multi-dimensional arrays. But let's not use the JavaScript-like approach to them, JavaScript really sucks in that regard. For now, only
CharacterPointer elements[13] := {"I", "IV", "V", "IX", "X", "XL", "L", "XC", "C", "CD", "D", "CM", "M"};
Integer16 meaningsOfElements[13] := {1, 4, 5, 9, 10, 40, 50, 90, 100, 400, 500, 900, 1000};
UPDATE on 10/04/2023: For now, if you need to implement an algorithm that uses multi-dimensional arrays, you can write it in a relatively legible manner by writing helper functions. I was doing that when writing Hurwitz's Algorithm:
Function f(Integer16 i, Integer16 j) Which Returns Integer16 Does
  // For converting an index of a two-dimensional array into an index of a
  // one-dimensional array. Since I haven't yet implemented two-dimensional
  // arrays in my AEC compiler...
  Return 20 * i + j;
EndFunction
And, after that...
matrica[f(i, j)] := (matrica[f(i - 1, 0)] * (j + 1 < broj_stupaca ? matrica[f(i - 2, j + 1)] : 0) - (matrica[f(i - 2, 0)] * (j + 1 < broj_stupaca ? matrica[f(i - 1, j + 1)] : 0))) / matrica[f(i - 1 , 0)];
Assignments
For assignments, you use := operator, in both dialects of AEC. In the AEC for WebAssembly, you can nest assignment expressions, like this (excerpt from HybridSort):
broj_obrnuto_poredanih_podniza := broj_vec_poredanih_podniza :=
    broj_pokretanja_QuickSorta := broj_pokretanja_MergeSorta :=
    broj_pokretanja_SelectSorta := 0;
After that, all 5 variables will be 0. You can't do that in AEC for x86. In AEC for WebAssembly, assignment statements end with a semicolon ; and can span multiple lines. In AEC for x86, they end with a newline character. In AEC for x86, there is also the string-assignment operator <= (similar to the difference between signal and variable assignments in VHDL) (UPDATE on 28/04/2024: I've started a StackExchange thread about how the VHDL compiler knows when <= is a signal assignment and when it is a less-than-or-equal operator.). In AEC for WebAssembly, you use := for string assignments.
TODO:
UPDATE on 15/07/2021: In most programming languages these days, the assignment operator is =, whereas the equality-testing operator is ==. The rationale for that, dating back to the C programming language, is that programs do variable assignments more often than they test variables for equality, so it makes sense to use the shorter operator for the more common operation, assignment. While that may have made sense back then, I don't think it makes any sense with modern computers. Beginners in programming are in particular often confused by = meaning assignment, rather than what they are used to from mathematics. Also, I believe everybody who has programmed in a C-like language can agree that the equality operator being just two assignment operators often leads to bugs. Namely, one of the most common errors in C-like languages is that the programmer mistypes if (variable==0) as if (variable=0), causing the value in variable to be lost and the block of code after that if to never be executed. C# attempts to solve that problem by disallowing implicit conversions to the boolean type in if-conditions, which is an interesting solution, but it often makes the code longer, and it does not address the fact that the same problem can happen with boolean variables (UPDATE on 05/06/2024: One hard-to-trace bug in my AEC-to-WebAssembly compiler, which caused it to lose track of the sizes of structures and report an internal compiler error, was caused by mismatched parentheses, which the C++ compiler didn't complain about; instead, it implicitly converted an integer into a boolean. Had I written my compiler in Java or C#, rather than in C++, something like that couldn't have happened to me.). So, I think that making := the assignment operator and = the equality-testing operator is a good choice.
The AEC-to-WebAssembly compiler allows whitespace between : and = in the assignment operator :=, so that, when ClangFormat mistakes : for the label-ending sign and puts a whitespace after it, the code does not lose its meaning. I am not sure now whether that was a good choice. I have created a Reddit thread about it, as well as a StackExchange thread about how to explain to ClangFormat that := is the assignment operator.
Operators
AEC has the following operators:
Priority (1 = binds tightest) | Associativity | Operators |
---|---|---|
1 | left | . -> |
2 | left | * / |
3 | left | - + |
4 | left | < > = |
5 | left | and (in the x86 dialect: &) |
6 | left | or (in the x86 dialect: |) |
7 | right | ?: (ternary conditional operator) |
8 | right | := (assignment operator) |
velicina_niza / (64 * 1024 / 4) + not(not(mod(velicina_niza, 64 * 1024 / 4))) > prijasnja_velicina_niza / (64 * 1024 / 4) + not(not(mod(prijasnja_velicina_niza, 64 * 1024 / 4))) or prijasnja_velicina_niza = 0
Namely, not(not(x)) returns 1 if x is not 0, and 0 if x=0. You can also use not(x=0). There is no built-in exclusive-or function or operator in AEC, but you can easily build one like this (excerpt from Arithmetic Operators Test):
Function xor(Integer32 first, Integer32 second) Which Returns Integer32 Does
  //I hope people will like the way I named the bit-operators.
  Return (first and invertBits(second)) or (invertBits(first) and second);
EndFunction
As you can see, there is a built-in invertBits(Integer32 x) function which inverts the bits in an integer. Internally, it xor-s x with -1.
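The identity behind that xor function can be checked quickly in C++. In this sketch, invertBits is reimplemented the way the text describes it (xor with -1, that is, with all bits set), and xorViaAndOr is a hypothetical name for the AEC function above, translated to C++ bitwise operators:

```cpp
#include <cassert>
#include <cstdint>

// As described above: xor-ing with -1 (all bits set) inverts every bit.
std::int32_t invertBits(std::int32_t x) { return x ^ -1; }

// The AEC xor function with C++ operators:
// (first and not second) or (not first and second).
std::int32_t xorViaAndOr(std::int32_t first, std::int32_t second) {
  return (first & invertBits(second)) | (invertBits(first) & second);
}
```

Each result bit is 1 exactly when one input bit is 1 and the other is 0, which is the definition of exclusive or.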
UPDATE on 30/11/2023: With the new versions of AEC (supporting easy-to-use inline assembly), there is an even simpler way to implement the xor-operation in AEC:
Function xor_using_assembly(Integer32 first, Integer32 second) Which Returns Integer32 Does
  Return asm_i32(R"assembly(
(i32.xor
  (i32.load %first)
  (i32.load %second)
)
)assembly");
EndFunction
UPDATE on 28/04/2024: I've implemented chained comparison operators. For instance, this function:
Function testChainedComparisons() Which Returns Integer32 Does
  Return (1 < 2 < 3) and not(2 < 3 < 1) and (-3 < -2 < -1);
EndFunction
It returns 1 if it is compiled using AECforWebAssembly v2.8.0 (not released as of writing this), but it returns 0 if it's compiled with AECforWebAssembly v2.7.0.
Unfortunately, the implementation is incorrect. All it does is rewrite a < b < c as a < b and b < c, which will, of course, evaluate b twice. That is incorrect whenever b has side effects. You can see how it's implemented on GitHub. (UPDATE on 29/04/2024: I've asked a StackExchange question about that.)
That problem might seem similar to the problem which occurs when compiling += and similar operators, when incrementing an element of an array with the index being a function with side-effects, but that's actually not the same problem. There doesn't seem to be such a simple solution (of using a local temporary variable) to the problem with chained comparisons.
UPDATE on 03/05/2024: Here is an example program that illustrates the problem with chained comparisons as they are currently implemented in AEC:
#target WASI
Integer16 counter := 0;
Function b() Which Returns Integer32 Does
  counter += 1;
  Return counter;
EndFunction
Function test() Which Returns Integer32 Does
  // By common sense, this should return 1. However, because of the
  // current AEC semantics, this returns 2.
  counter := 0;
  Integer16 a := 0, c := 2;
  Integer16 resultOfComparison := a <= b() <= c;
  Return counter;
EndFunction
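The difference between the two possible meanings can be sketched by hand in C++. This is only an illustration of the intended semantics. It says nothing about how hard it would be to make the AEC compiler generate such code, which is the difficulty described above. In C++ itself, a <= b() <= c would parse as (a <= b()) <= c, so both lowerings are written out explicitly. The second one evaluates the middle operand once, which is how Python defines chained comparisons. All names are hypothetical.

```cpp
// Global counter incremented by the side-effecting middle operand.
int counter = 0;

int b() {
    counter += 1;
    return counter;
}

// What the current AEC rewrite amounts to: `a <= b() <= c` becomes
// `a <= b() and b() <= c`, so b() can be called twice.
int naiveChain(int a, int c) {
    return a <= b() && b() <= c;
}

// Evaluate the middle operand once, keep it in a temporary,
// then do both comparisons (still short-circuiting).
int evaluateOnceChain(int a, int c) {
    int middle = b();
    return a <= middle && middle <= c;
}
```

With counter reset to 0, naiveChain(0, 2) leaves counter at 2, like the AEC test program. evaluateOnceChain(0, 2) leaves it at 1.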
UPDATE on 22/09/2024: I've received some comments on Internet forums telling me that the scanning-backwards method of parsing right-associative operators (which I was using both in my PicoBlaze assembler and in my AEC-to-WebAssembly compiler, but not in my AEC-to-x86 compiler) is considered an anti-pattern. So, I've opened a StackExchange question about that, which received many upvotes.
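A common alternative to scanning backwards (not necessarily what the commenters had in mind) is recursive descent. Right associativity then falls out of the recursion: after parsing `condition ?`, the parser calls itself for both branches, so `a ? b : c ? d : e` groups as `a ? b : (c ? d : e)`. Below is a minimal C++ sketch of that idea for the ternary operator only. It assumes single-character operands and no whitespace. The Parser struct and the parenthesize function are hypothetical and unrelated to the code of either AEC compiler.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// A tiny recursive-descent parser for `?:` over single-character operands.
// It returns a fully parenthesized string, so the grouping is visible.
struct Parser {
    std::string input;
    std::size_t position = 0;

    char peek() const { return position < input.size() ? input[position] : '\0'; }

    std::string parsePrimary() {
        if (position >= input.size())
            throw std::runtime_error("unexpected end of input");
        return std::string(1, input[position++]);
    }

    // ternary := primary [ '?' ternary ':' ternary ]
    // Recursing on the branches is what makes the operator right-associative.
    std::string parseTernary() {
        std::string condition = parsePrimary();
        if (peek() != '?')
            return condition;
        ++position;  // consume '?'
        std::string whenTrue = parseTernary();
        if (peek() != ':')
            throw std::runtime_error("expected ':'");
        ++position;  // consume ':'
        std::string whenFalse = parseTernary();
        return "(" + condition + "?" + whenTrue + ":" + whenFalse + ")";
    }
};

std::string parenthesize(const std::string& expression) {
    Parser parser;
    parser.input = expression;
    return parser.parseTernary();
}
```

For example, parenthesize("a?b:c?d:e") gives "(a?b:(c?d:e))", and parenthesize("a?b?c:d:e") gives "(a?(b?c:d):e)".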
UPDATE on 12/01/2025: Why do many programming languages (C, JavaScript...) use the symbol || (two vertical parallel lines) to mean "or"? My guess is that the etymology of that is the fact that two switches connected in parallel form a primitive or-gate, similar to how two switches connected in series form a primitive and-gate. I've opened a question about it on StackExchange.
Branching
Branching is supported only via If, If-Else, and If-ElseIf-Else. There is no equivalent of the C, C++ and JavaScript switch-case. After the If token, the compiler expects a condition. In AEC for WebAssembly, the condition ends with a Then token, while, in AEC for x86, the condition ends with a newline character. The branching ends with EndIf. That's to make the program easier to parse, and to avoid the dangling-else problem. Between Then and EndIf, there can be an Else token. Before the Else token (if there is one, otherwise before the EndIf), there can be an ElseIf token. An ElseIf token, much like the If token, is followed by an expression representing a condition, which is ended by the Then token. After the Then token and after the Else token, you can put a { (curly brace), which the compiler will ignore. Similarly, you can put a } before EndIf, ElseIf and Else. That is to make it easier to use text editors made primarily for C-like languages for writing AEC (to jump to the end of a code block, for example). An example from the Analog Clock in AEC:
If signature[j] = '\n' Then
  i := (i / windowWidth + 1) * windowWidth;
ElseIf not(signature[j] = 0) Then
  output[i] := signature[j];
  colors[i] := modraColor;
  i := i + 1;
Else
  output[i] := ' ';
EndIf
If there is an ElseIf token inside the If-statement, there, of course, doesn't need to be an Else token:
If j < 2 and (output[i - windowWidth] = 'x' and (output[i + 1] = 'x' or output[i - 1] = 'x')) Then
  output[i] := 'x';
  colors[i] := darkGreenColor;
ElseIf j=2 and (output[i + 1]=' ' and output[i - windowWidth] = 'x') Then
  output[i] := ' ';
EndIf
Note that, unlike in Ada or VHDL, a semicolon after the EndIf token is not required: you can put one there if you want to, but you do not have to. I often put it there so that ClangFormat does not get confused. When programming in VHDL, Ada or Pascal, I wondered why I needed to put a semicolon after end if (Ada and VHDL) or end (Pascal). I mean, EndIf by itself marks the end of a statement, so why put another sign for ending a statement there? Isn't it some kind of pleonasm? I have asked a Quora question about that (apparently, it is to make recovery from parsing errors easier), and have created a Reddit thread, as well as a StackExchange question about that.
Simple branching can also be done using the ?: operator. As it is right-associative, it can be used to concisely write ElseIf statements. An example of that from Arithmetic Operators Test (one of the first programs I wrote in AEC for WebAssembly):
Function signum(Integer32 number) Which Returns Integer32 Does
  /*
   * The ternary conditional operator "?:" is right-associative,
   * as it is in C, C++ and JavaScript (unlike in PHP), which
   * makes it easy to abbreviate else-if statements using it.
   * And, as of time of writing this, I haven't yet implemented
   * the "If" statement into the AEC-to-WebAssembly compiler.
   */
  Return (number<0)? //If the number is less than 0...
           -1 //signum of that number is -1...
         : //else...
           (number=0)? //if the number is 0...
             0 //signum of that number is 0.
           : //else...
             1; //The signum of that number is 1.
EndFunction
In AEC for x86, the condition after ElseIf ends with a newline, rather than with a Then token.
TODO: Implement something like switch-case, to make it easier to write long ElseIf branchings. Most of the time, they are a sign of bad code design, but sometimes they are not, and it's not acceptable for a language to discourage them.
UPDATE on 09/07/2021: A StackOverflow user called rici warned me that the AEC-to-x86 compiler actually produces incorrect code for the ?: operator. Namely, the code it produces calculates the results of the second and the third operand before calculating the result of the first operand (the condition). According to rici, that can lead to unexpected divide-by-zero errors for expressions such as d = 0 ? 0 : n / d (if d is equal to zero). There does not seem to be a simple solution, given the way my AEC-to-x86 compiler is structured. Here is the code with which the ternary operator is implemented in the AEC-to-x86 compiler. The AEC-to-WebAssembly compiler is a lot more professionally made (I made the core of it when I was 20, while I made the core of the AEC-to-x86 compiler back when I was 18), and it does not suffer from that problem. It is also easier to implement the ternary operator when targeting WebAssembly than when targeting x86, because WebAssembly itself essentially has the ternary operator, just with a different (LISP-like) syntax. Here is how I implemented the ternary conditional operator in the AEC-to-WebAssembly compiler.
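The problem rici pointed out can be reproduced in C++. The sketch compares the real ?: operator with a function that receives both branches already evaluated, which is roughly what evaluating the second and third operands before the condition amounts to. A counter stands in for the actual division by zero, because dividing by zero is undefined behaviour in C++. All names here are hypothetical.

```cpp
int evaluationsOfDivision = 0;

// Stands in for `n / d`: counts how often it is evaluated, and
// returns 0 instead of actually dividing by zero.
int countedDivision(int n, int d) {
    ++evaluationsOfDivision;
    return d == 0 ? 0 : n / d;
}

// Function arguments are evaluated before the call, so both
// "branches" are always computed: this models eager evaluation.
int eagerSelect(bool condition, int whenTrue, int whenFalse) {
    return condition ? whenTrue : whenFalse;
}

// `d = 0 ? 0 : n / d` evaluated correctly: the condition first,
// then only the operand it selects.
int lazyGuardedDivide(int n, int d) {
    return d == 0 ? 0 : countedDivision(n, d);
}

// The same expression with both operands evaluated up front.
int eagerGuardedDivide(int n, int d) {
    return eagerSelect(d == 0, 0, countedDivision(n, d));
}
```

With d equal to 0, lazyGuardedDivide never reaches the division, while eagerGuardedDivide always does.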
So, as far as I can see, you can run into at least two problems when implementing the ?: operator in your language: PHP parses it incorrectly, and my AEC-to-x86 compiler compiles it incorrectly. I've asked a question on Reddit and a question on StackExchange to see if there are more. (Apparently, there are: while making my AEC-to-WebAssembly compiler, I bumped into one of them, related to structures, which kaya3 warns about on StackExchange. What is exceptionally bad is that the error message is very misleading, and, right now, it escapes me what exactly is going on in the compiler. How can using two different structure types inside a ?: operator lead to the "Some part of the compiler tried to compile the array with a negative size." error? Well, it has been 3 years since I wrote the core of that compiler, so no wonder I don't remember the details of how it works. UPDATE: I have diagnosed the problem. It is actually two different bugs, one in the compiler and one in the semantic analyzer. The problem is fixed in AECforWebAssembly v2.6.0 (not released as of time of writing this, but the source code is available on GitHub, GitLab and SourceForge). I have started a forum thread about absurd error messages on forum.hr, r/ProgrammingLanguages, Atheist Forums and r/CroIT.)
UPDATE on 10/09/2021: I have decided to add the left-hand-side ternary conditional operator in assignments, which exists in C++, to my programming language (for now, and presumably forever, only in the AEC-to-WebAssembly compiler, version v2.0.0 and newer, not in the no-longer-maintained AEC-to-x86 compiler). Sometimes, it can make the code shorter, and it looks very cool. Here is an excerpt from Left Hand Side Conditional Test, which illustrates it by implementing the original, subtraction-based version of Euclid's algorithm:
Function gcd(Integer32 a, Integer32 b) Which Returns Integer32 Does
  While b > 0 Loop
    ((a > b) ? a : b) := ((a > b) ? (a - b) : (b - a));
    /*
     * The compiler should transform the line above to:
     * ```
     * If a > b Then
     *   a := ((a > b) ? (a - b) : (b - a));
     *   // a := a - b;
     * Else
     *   b := ((a > b) ? (a - b) : (b - a));
     *   // b := b - a;
     * EndIf
     * ```
     */
  EndWhile
  Return a;
EndFunction
Even in C++, it is generally agreed that using the conditional operator on the left-hand side of an assignment is very bad style, which makes programs unnecessarily difficult to read, and JavaScript does not allow it at all. Nevertheless, I think allowing it is not fundamentally different from allowing the conditional operator on the right-hand side and in other expressions.
UPDATE on 13/09/2021: As I have mentioned on Discord, the behaviour of the left-hand-side ternary conditional operator in AEC does not exactly match the behaviour of left-hand-side ternary conditional operator in C++, and I think the behaviour in AEC is better. In C++, the following program compiles:
#include <iostream>
using namespace std;
int main() {
  int a=15;
  int b=25;
  while (b>0) ((a>b)?a:b)=((a>b)?(a-b):(b-a));
  cout <<a <<endl;
}
However, this program refuses to compile with some confusing error message:
#include <iostream>
using namespace std;
int main() {
  int a=15;
  float b=25;
  while (b>0) ((a>b)?a:b)=((a>b)?(a-b):(b-a));
  cout <<a <<endl;
}
The only difference between those two programs is whether we declare the variable b as float (equivalent to AEC Decimal32) or as int (on most compilers, that is equivalent to AEC Integer32). Compilers will refuse to compile the second program with some error message from which it is not obvious that the issue will be solved by changing float to int. In my opinion, that is hardly acceptable, and the AEC-to-WebAssembly compiler accepts the equivalents of both programs.
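The reason the second program is rejected lies in the C++ rules for the conditional operator. (a>b)?a:b is an lvalue only when both operands are lvalues of the same type. With an int and a float, the usual arithmetic conversions apply, and the result is a temporary float (a prvalue), which cannot be assigned to. Below is a C++ sketch of the If-Else transformation described in the comment of the gcd function above, written out by hand so that it also works with mixed types. The function name is hypothetical.

```cpp
// Subtraction-based Euclid's algorithm with `a` an int and `b` a float:
// `((a>b)?a:b) = ((a>b)?(a-b):(b-a));` expanded into an if-else,
// because with mismatched types the conditional expression is not assignable.
int gcdMixedTypes(int a, float b) {
    while (b > 0) {
        if (a > b)
            a = static_cast<int>(a - b);  // a - b is a float here
        else
            b = b - a;
    }
    return a;
}
```

With the values from the programs above, gcdMixedTypes(15, 25.0f) returns 5.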
Loops
For now, there is only one type of loop in AEC, the While-loop. A While token is followed by an expression representing a condition. That expression ends with a newline character in AEC for x86, and with a Loop token in AEC for WebAssembly. In both of them, the While-statement ends with an EndWhile token. For example, here is what the Euclid Algorithm looks like in AEC for WebAssembly (an excerpt from Euclid Test):
While not(b=0) Loop
  If a>b Then
    a:=mod(a,b);
  Else
    If a=0 Then
      Return b;
    EndIf
    b:=mod(b,a);
  EndIf
EndWhile
It uses the built-in function mod(Integer64 a, Integer64 b) to get the remainder of the division. AEC for x86 supports it both for integers and decimal numbers, whereas AEC for WebAssembly supports it only for integers (for the simple reason that WebAssembly doesn't support it for decimal numbers either). Nevertheless, there is a simple way to write it yourself in AEC for WebAssembly (excerpt from the Analog Clock):
Function fmod(Decimal32 a, Decimal32 b) Which Returns Decimal32 Does
  Return a - b * Integer32(a / b);
EndFunction
It might actually be a good idea to use Integer64( instead of Integer32(, because there is no guarantee that a / b will be in the range of Integer32. So, that's how you do casting in AEC: with the built-in functions named after the target data type, such as Integer32( and Integer64(.
TODO: Think of some nice syntax for the for-loop, every remotely modern language has a for-loop. Also, implement something like C, C++ and JavaScript break and continue. Most of the time, they are a sign of bad code design, but sometimes they aren't, and it's not acceptable for a language to discourage them.
Functions
AEC for x86 does not support user-defined functions. Furthermore, the parser implemented there doesn't support functions with more than 2 arguments. AEC for WebAssembly supports functions, because WebAssembly makes it easy to implement them (x86 assembly doesn't have a standardized way to call functions; the calling convention varies by operating system). Importing functions from JavaScript is done like this (excerpt from Analog Clock):
//Let's import some functions useful for debugging from JavaScript...
Function logString(CharacterPointer str) Which Returns Nothing Is External;
Function logInteger(Integer32 int) Which Returns Nothing Is External;
So, a function declaration starts with a Function token. After it comes the function name, followed by an opening parenthesis (. After that comes the list of arguments (it may be empty, and it often is). An element of that list consists of an argument type and an argument name. Unlike in C, the argument name is obligatory, and the parser is going to complain if you leave it out; there doesn't appear to be a simple way to change that. Arguments are separated by a comma token ,. After the closing parenthesis comes a Which token, after which comes a Returns token. Then comes the return type (which can be a data type or Nothing). If the function is being imported from JavaScript, then come the Is token, the External token and a semicolon. If the function is implemented in AEC, then comes the Does token, followed by the function body, which ends with the EndFunction token. A function exits when the control flow reaches the EndFunction token or a Return statement. If the function returns Nothing, a Return statement consists only of a Return token and a semicolon. If a function returns something, there needs to be an expression between the Return token and the semicolon, the result of which the function will return to its caller. If the control flow of a function that returns something reaches the EndFunction token, the function returns 0 (in sharp contrast with both C-like languages, where such a function returns an undefined value, and Rust, where such a function returns the value of the last expression in it). There have been a few examples of functions in this specification. Arguments to functions may have default values, specified as follows (excerpt from Empty Function Test):
Function empty_function(Character charArgument:='A',
                        Integer16 shortArgument:=4096,
                        Integer32 intArgument:=32768,
                        Integer64 longArgument:=8*exp(9*log(10)),
                        Decimal32 floatArgument:=22/7,
                        Decimal64 doubleArgument:=pi) Which Returns Nothing Does
  //It does nothing, but the compiler should still generate valid code.
EndFunction
Now, if you call such a function with fewer than 6 arguments, the compiler will not complain, but will supply the rest with default values. Note that this will not work when calling an AEC function from JavaScript. SmallBasic does not support arguments for user-made functions at all, because, as it says, "What do you need function arguments for when all variables are global?". SmallBasic is the first programming language I learned, back when I was 9 years old, from Enter, a now-shut-down Croatian website about computers, as I mentioned in the introduction. I think not supporting function arguments is not a good idea, because all variables being global and there being no function arguments makes it significantly harder to implement recursive functions. That was one of the problems I faced when making my Labyrinth game (also mentioned in the introduction) in SmallBasic. I solved it by storing what should be local variables on global stacks (which the SmallBasic standard library includes). And, as an example of how to use a stack instead of recursion when targeting x86 using AEC, compare the WebAssembly and x86 versions of the N-Queens Puzzle.
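Keeping what would be local variables on an explicit stack, instead of relying on recursion, can be sketched in C++ as follows. This is not the N-Queens program linked above, just a minimal illustration built on my own assumptions. It counts the solutions of the N-Queens Puzzle. The stack holds, for each row already processed, the column of the queen placed in that row, which a recursive version would keep in a local variable of each call. The function names are hypothetical.

```cpp
#include <cstdlib>
#include <vector>

// Can a queen go to (row, column), given queens at columns[0..row-1]?
bool isSafe(const std::vector<int>& columns, int row, int column) {
    for (int previousRow = 0; previousRow < row; ++previousRow) {
        int previousColumn = columns[previousRow];
        if (previousColumn == column ||
            std::abs(previousColumn - column) == row - previousRow)
            return false;
    }
    return true;
}

// Counts N-Queens solutions without recursion (this sketch handles n >= 1).
int countNQueensSolutions(int n) {
    if (n < 1)
        return 0;
    std::vector<int> stack;  // one entry per placed row: its queen's column
    int solutions = 0;
    int nextColumn = 0;  // first column to try in row stack.size()
    while (true) {
        int row = static_cast<int>(stack.size());
        if (row == n) {  // all queens placed: count it, then backtrack
            ++solutions;
            nextColumn = stack.back() + 1;
            stack.pop_back();
            continue;
        }
        int column = nextColumn;
        while (column < n && !isSafe(stack, row, column))
            ++column;
        if (column < n) {  // the "recursive call": descend into the next row
            stack.push_back(column);
            nextColumn = 0;
        } else {  // the "return": no column works, go back one row
            if (stack.empty())
                break;  // every possibility has been explored
            nextColumn = stack.back() + 1;
            stack.pop_back();
        }
    }
    return solutions;
}
```

For example, countNQueensSolutions(8) returns 92, the well-known number of solutions for an ordinary chessboard.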
TODO: Find a way to implement
UPDATE on 26/08/2021: I have implemented named function arguments, as illustrated by the Named Arguments Test:
// Should return 0.
Function namedArgumentsTest() Which Returns Integer32 Does
  InstantiateStructure QuadraticEquationSolution solution[1 * 2 * 3];
  solveQuadraticEquation(a := 1 , b := -1, c := -1, AddressOf(solution[0]));
  // Since 'a' is set to default to 1 in the declaration of the
  // "solveQuadraticEquation" function, it does not need to be written
  // if it is indeed 1 when calling that function. In the next line, it is not.
  solveQuadraticEquation(c := -1, b := -1, solution := AddressOf(solution[1]));
  solveQuadraticEquation(b := -1, c := -1, solution := AddressOf(solution[2]));
  solveQuadraticEquation(b := -1, c := -1, a := 1 , AddressOf(solution[3]));
  solveQuadraticEquation(c := -1, a := 1 , b := -1, AddressOf(solution[4]));
  solveQuadraticEquation(c := -1, b := -1, solution := AddressOf(solution[5]));
  Integer32 iterator := 0;
  While iterator < 1*2*3 - 1 Loop
    If not(areStructuresEqual(AddressOf(solution[iterator]),
                              AddressOf(solution[iterator + 1]),
                              SizeOf(QuadraticEquationSolution))) Then
      Return iterator + 1;
    EndIf
    iterator += 1;
  EndWhile
  iterator := 0;
  // Let's test whether structure comparisons work...
  While iterator < 1*2*3 - 1 Loop
    If not(solution[iterator] = solution[iterator + 1]) Then
      Return iterator + 7;
    EndIf
    iterator += 1;
  EndWhile
  iterator := 0;
  While iterator < 1*2*3 - 1 Loop
    /*
     * This loop should have no effect,
     * but the compiler must not crash
     * when compiling it.
     */
    solution[iterator] := solution[iterator + 1];
    iterator += 1;
  EndWhile
  If not(solution[0].firstSolution.imaginary = 0 and
         solution[0].secondSolution.imaginary = 0 and
         abs(solution[0].firstSolution.real + goldenRatio - 1) < epsilon and
         abs(solution[0].secondSolution.real - goldenRatio) < epsilon) Then
    printFloat32(solution[0].firstSolution.real);
    printFloat32(solution[0].firstSolution.imaginary);
    printFloat32(solution[0].secondSolution.real);
    printFloat32(solution[0].secondSolution.imaginary);
    Return 15;
  EndIf
  Return 0;
EndFunction
So, named function arguments use the := operator, the same as assignment. I think that makes a lot of sense. I have made a Reddit thread about it, as well as a Quora question about it. On Quora, I also asked why Python and C# allow positional function parameters before named function parameters but not after them, while my programming language allows both, and the compiler just issues a warning in case an argument has been overwritten.
UPDATE on 31/10/2023: Ever since the first versions of AEC-to-WebAssembly, there had been a bug in the parser that prevented you from using multi-argument functions (such as pow) inside default arguments of functions. Version v2.7.0 will fix that issue. So, it will be possible to write this:
Function empty_function(Character charArgument:='A',
                        Integer16 shortArgument:=4096,
                        Integer32 intArgument:=32768,
                        Integer64 longArgument:=8*pow(10,9), //This line won't compile with AECforWebAssembly v2.6.1, but it will with v2.7.0.
                        Decimal32 floatArgument:=22/7,
                        Decimal64 doubleArgument:=pi) Which Returns Nothing Does
  //It does nothing, but the compiler should still generate valid code.
EndFunction
I made that v2.7.0 release while I was high on drugs (0.75 mg of Alprazolam), so I cannot promise you I did everything properly.
Structures
The parser can parse structure declarations of the form:
Structure Point Consists Of Decimal32 x,y,z; Integer16 number_of_dimensions; EndStructure
TODO:
UPDATE on 16/02/2021: My implementation of the N-Queens Puzzle in AEC uses structures. If you have trouble understanding the N-Queens Puzzle program, I'd suggest you first look at the Permutations Test, as it is based on the same algorithm.
UPDATE on 28/04/2021: I have started a forum thread to help me with implementing structures as arguments to functions.
UPDATE on 03/06/2021: It is important to note that the assignment operator when used between structures works very differently in AEC and C++. I made it so because I found the C++ behaviour very confusing while making this compiler, as it appeared to corrupt the Abstract Syntax Tree while compiling. Excerpt from Structure Declaration Test that illustrates the AEC-specific behaviour (the spaghetti function returns 2):
Structure ListNode Consists Of
{
  ListNodePointer next;
  Integer32 value;
}
EndStructure;
Function spaghetti() Which Returns Integer32 Does
{
  // See the link above, about how C++ behaviour appeared to corrupt the AST.
  // By common sense, this should return 2. By C++ semantics, this should
  // return 3.
  InstantiateStructure ListNode list[3];
  list[0].value : = 1;
  list[1].value : = 2;
  list[2].value : = 3;
  list[0].next : = AddressOf(list[1]);
  list[1].next : = AddressOf(list[2]);
  list[0] : = ValueAt(list[0].next);
  CharacterPointer pointer : = AddressOf(list[0]);
  Return ValueAt(ListNodePointer(pointer)).value;
}
EndFunction;
The assembly code that the AEC compiler produces for structure assignments is slower than that produced by C++ compilers, but, in my opinion, the code produced by AEC behaves much more intuitively in cases like this. I've asked a Reddit question about that.
UPDATE on 28/08/2021: There
Structure ComplexNumber Consists Of
  Decimal32 real, imaginary;
EndStructure
Structure QuadraticEquationSolution Consists Of
  ComplexNumber firstSolution, secondSolution;
EndStructure
InstantiateStructure QuadraticEquationSolution solution[1 * 2 * 3];
/*
 * TODO: Figure out why the compiler crashes if `solution` is made to be a
 *       local variable. I suppose it has something to do with the fact that
 *       it is an array of nested structures.
 */
UPDATE on 31/08/2021: I have found a relatively simple fix.
UPDATE on 05/05/2024: There seems to be a bug in the AEC-to-WebAssembly compiler which is triggered under certain conditions (apparently, only when compiled using Microsoft C++ Compiler) when comparing two structures using the = operator. You can read more about it on GitHub.
UPDATE on 15/12/2024: I've added support for the TypeOf operator, similar to the one that exists in JavaScript. It returns the type of an expression as a string. Here is an example of how to use it:
#target WASI
Function areStringsEqual(PointerToCharacter first, PointerToCharacter second) Which Returns Integer32 Does
  While not(ValueAt(first)=0 and ValueAt(second)=0) Loop
    If (not(ValueAt(first)=ValueAt(second))) Then
      Return 0;
    EndIf
    first += 1;
    second += 1;
  EndWhile
  Return 1;
EndFunction
Structure Point Consists Of
  Decimal32 x, y, z;
EndStructure
Function test() Which Returns Integer32 Does
  InstantiateStructure Point myPoint;
  Return areStringsEqual(TypeOf(myPoint),"Point") and
         areStringsEqual(TypeOf("Hello world!"),"CharacterPointer");
EndFunction
Maybe it will be useful to somebody when debugging programs, or to me when debugging the compiler. However, as of version v3.1.0, there is a bug which, in some cases, crashes the compiler when AddressOf is used with pointers to functions, and there doesn't appear to be a simple solution.
Inline assembly
AEC for x86 passes everything between the AsmStart and AsmEnd tokens unchanged to the assembler. AEC for WebAssembly has the keywords asm(, asm_i32(, asm_i64(, asm_f32( and asm_f64(. They expect a compile-time constant string as an argument, which the compiler will process (for example, replace \n with a literal new-line character and \\ with \...) and pass to the assembler. asm( assumes that the inline assembly code will leave nothing on the system stack, while the other ones assume that it will leave a value of a certain WebAssembly type (corresponding to some AEC type) there, which they will then fetch. Since the WebAssembly type i32 corresponds to AEC Integer32 and to pointers, we can write something like this (excerpt from HybridSort):
//Let's now make a wrapper around the WebAssembly instruction "memory.grow"...
Function zauzmi_memorijske_stranice(Integer32 broj_stranica) Which Returns CharacterPointer Does
  Integer32 nova_adresa_u_stranicama :=
    asm_i32 //"asm_i32" tells the compiler to insert assembly code, and to
            //assume that, after it, there will be a value of type "i32" on
            //the system stack. That is obviously not true if somebody
            //switches the JavaScript Virtual Machine to 64-bit mode,
            //but I hope nobody will do that. The probability that the
            //JavaScript Virtual Machine will need more than 4GB of RAM is
            //negligible, while the probability that some useful programs
            //will crash because of switching to 64-bit mode is not
            //exactly negligible.
    ("(memory.grow\n"
     "\t(local.get 0)\n" //The first (zeroth) argument of the function,
                         //"brojStranica".
     ")\n");
  If nova_adresa_u_stranicama = -1 Then //If there is no more
                                        //free memory...
    Return -1;
  EndIf
  Return nova_adresa_u_stranicama * 64 * 1024; //On the JavaScript Virtual
                                               //Machine, one page is 64 KB.
EndFunction
Remember that string tokens put next to each other are concatenated by the tokenizer into one string token. Alternatively, in newer versions of the AEC-to-WebAssembly compiler, you can use multiline strings; in fact, that's what I recommend for inline assembly. When writing inline assembly in C or C++, it has bothered me that I cannot write something like return asm("#Assembly language expression");, and instead I have to write asm("#Store an expression into 'variable'"); return variable;. Well, in the WebAssembly dialect of AEC, you can indeed write something like this (an excerpt from Analog Clock):
Function sqrt(Decimal32 x) Which Returns Decimal32 Does { If USE_WEBASSEMBLY_SQRT_INSTRUCTION Then { Return asm_f32( R"multiline((f32.sqrt (local.get 0) ))multiline"); } EndIf;
UPDATE on 15/02/2023: I have made it possible to access variables from inline assembly by their names, by inserting the % character followed by the name of a variable. So, there is no need to add additional arguments to asm in order to use variables from inline assembly, like you do in Clang or GCC. I realize there are benefits to what Clang and GCC are doing, but their approach is both more difficult to implement in the compiler and more difficult to use. So, the code above has been changed to this:
Function sqrt(Decimal32 x) Which Returns Decimal32 Does
{
  If USE_WEBASSEMBLY_SQRT_INSTRUCTION Then
  {
    Return asm_f32(
        R"multiline((f32.sqrt
  (f32.load %x ;;The compiler will replace "%x" with assembly code representing a pointer to the variable "x".
  )
))multiline");
  }
  EndIf;
In order to make it possible to modify variables from inline assembly, the compiler does not insert assembly code loading the variable, but assembly code representing a pointer to it. In this case, that makes the inline assembly slightly more complicated, but, in many other cases, it will make it a lot simpler. In case you need to insert the actual % character inside inline assembly (which I think you never have to do in the WebAssembly Text Format), simply write %%. Due to the way the AEC-to-WebAssembly compiler is structured internally (and I still wouldn't know how to do it better), this feature of referring to variables from inline assembly is not available if the inline assembly is in the global scope (outside of a function), and the compiler issues a warning if you try to do that. I have made a Reddit thread about how to best implement inline assembly in your programming language, as well as a StackExchange thread about it.
Built-in functions
AEC for x86 has many built-in functions, wrappers around x86 assembly instructions. These are: sin(, cos(, tan(, atan2(, arctan(, ctg(, arcctg(, sqrt(, arcsin(, arccos(, ln( (natural logarithm), log( (base-10 logarithm), exp( (returns the Euler number to the power of the argument), pow( (which, unfortunately, due to the limitations of mathematical functions built into x86 assembly, causes an error whenever the first argument is negative or zero, and there does not appear to be a simple solution), abs( and mod( (floating-point modulo). Unlike in most programming languages, trigonometric functions expect arguments in degrees (not radians, as in C, C++ or JavaScript). Similarly, the cyclometric functions such as atan2( return the result in degrees, rather than in radians. If you want to use them in AEC for WebAssembly, you can write them yourself, like this (excerpt from the Analog Clock):
Function sin(Decimal32 degrees) Which Returns Decimal32 Does
  If degrees<0 Then
    Return -sin(-degrees);
  EndIf
  If degrees>90 Then
    Decimal32 sinOfDegreesMinus90 := sin(degrees - 90);
    If fmod(degrees, 360) < 180 Then
      Return sqrt(1 - sinOfDegreesMinus90 * sinOfDegreesMinus90);
    Else
      Return -sqrt(1 - sinOfDegreesMinus90 * sinOfDegreesMinus90);
    EndIf
  EndIf
  /*
   * Sine and cosine are defined in Mathematics 2 (I guess it is
   * called "Calculus 2" in the English-speaking world) using the
   * system of equations (Cauchy system):
   *
   * sin(0)=0
   * cos(0)=1
   * sin'(x)=cos(x)
   * cos'(x)=-sin(x)
   * ---------------
   *
   * Let's translate that as literally as possible to the programming
   * language.
   */
  Decimal32 radians := degrees / oneRadianInDegrees,
            tmpsin := 0,
            tmpcos := 1,
            epsilon := radians / PRECISION,
            i := 0;
  While (epsilon>0 and i<radians) or (epsilon<0 and i>radians) Loop
    tmpsin := tmpsin + epsilon * tmpcos;
    tmpcos := tmpcos - epsilon * tmpsin;
    i := i + epsilon;
  EndWhile
  Return tmpsin;
EndFunction
Or, you may use the Taylor series. But I think this obeys the KISS (keep it simple, stupid) principle better. Note that you can't call the JavaScript Math.sin and similar functions, because they are methods of the Math singleton, and there is no standardized way to call methods of JavaScript objects from WebAssembly (for a good reason).
TODO: Build some mathematical functions into AEC for WebAssembly.
UPDATE on 25/07/2021: If you just want to use the square root function in AEC for WebAssembly, you can simply invoke the f64.sqrt or f32.sqrt instruction in inline assembly. Doing so is sometimes discouraged because it is supposedly slow: it calculates the square root to the maximal possible precision, when usually you just need the first few decimal digits. It was specified that way in the WebAssembly specification only to comply with the IEEE 754 standard, which requires it. In the new version of the Analog Clock program, though, I find it fast enough. Here is how the square root function is implemented in the Analog Clock program:
Function sqrt(Decimal32 x) Which Returns Decimal32 Does
{
  If USE_WEBASSEMBLY_SQRT_INSTRUCTION Then
  {
    Return asm_f32("(f32.sqrt\n"
                   "\t(local.get 0)\n"
                   ")");
  }
  EndIf;
  // Binary Search Algorithm...
  Decimal32 max : = 80 * 80 + 24 * 24, // This function will be used for calculating the
                                       // Euclidean distance between cells in the display
                                       // grid, and there will be 80x24 cells.
      min : = 0, i : = (min + max) / 2;
  If(max * max < x) Then // Shouldn't happen, but let's deal with that anyway.
  {
    Return exp( ln(x) / 2); // Much less precise (and probably slower) than binary search.
  }
  EndIf;
  While((max - min) > 1 / PRECISION) Loop
  {
    If(i * i > x) Then
    {
      max /*
           * ClangFormat apparently misinterprets the assignment operator ":="
           * as the C label marker ':' followed by the C '=' operator,
           * there doesn't appear to be a simple solution to this problem.
           */
          : = i;
    }
    Else
    {
      min: = i;
    }
    EndIf;
    i: = (max + min) / 2;
  }
  EndWhile;
  Return i;
}
EndFunction;
UPDATE on 20/02/2022: Here is how I implemented the atan2 function (which is often useful in programming), also an excerpt from Analog Clock:
Decimal32 oneRadianInDegrees : = 180 / pi; //"180/pi" is a compile-time decimal
                                            // constant (since we are assigning an
                                            // initial value to a global variable),
                                            // and, as such, we can use "pi" to
                                            // refer to M_PI from the C library,
                                            // it's available to the compiler.
Function arctan(Decimal32 x) Which Returns Decimal32 Does
{
  // Arcus tangens is equal to the integral of 1/(1+x^2), highschool math.
  Decimal32 sum : = 0, epsilon : = x / PRECISION, i : = 0;
  While(i < x) Loop
  {
    sum += epsilon / (1 + i * i);
    i += epsilon;
  }
  EndWhile;
  Return(sum * oneRadianInDegrees);
}
EndFunction;
Function atan2(Decimal32 y, Decimal32 x) Which Returns Decimal32 Does
{
  If(y = 0) Then {
    If(x < 0) Then { Return 180; } Else { Return 0; } EndIf;
  }
  ElseIf(x = 0) Then {
    If y < 0 Then { Return 270; } Else { Return 90; } EndIf;
  }
  Else {
    If(x > 0 and y > 0) Then {
      Return arctan(y / x);
    }
    ElseIf(x < 0 and y > 0) Then {
      Return 90 + arctan(-x / y);
    }
    ElseIf(x < 0 and y < 0) Then {
      Return 180 + arctan(y / x);
    }
    Else {
      Return 270 + arctan(-x / y);
    }
    EndIf;
  }
  EndIf;
}
EndFunction;
String manipulation
AEC for x86 doesn't support string manipulation at all; it doesn't have a character type. AEC for WebAssembly is about as good at string manipulation as C is without a C library (like when doing operating system development). There are no built-in string-manipulation functions, but one can easily write them oneself, like this (excerpt from Dragon Curve):
//Again, we need to implement string manipulation functions. Like I've said,
//even though this program will be running on JavaScript Virtual Machine, it
//can't call the methods of the JavaScript "String" class.
Function strlen(CharacterPointer str) Which Returns Integer32 Does
  //We can't implement this recursively, like we did in earlier AEC
  //programs, because we will be dealing with large strings which will
  //cause stack overflow.
  Integer32 length := 0;
  While ValueAt(str + length) Loop
    length := length + 1;
  EndWhile
  Return length;
EndFunction
Function strcpy(CharacterPointer dest, CharacterPointer src) Which Returns Nothing Does
  While ValueAt(src) Loop
    ValueAt(dest) := ValueAt(src);
    dest := dest + 1;
    src := src + 1;
  EndWhile
  ValueAt(dest) := 0;
EndFunction
Function reverseString(CharacterPointer string) Which Returns Nothing Does
  CharacterPointer pointerToLastCharacter := string + strlen(string) - 1;
  While pointerToLastCharacter - string > 0 Loop
    Character tmp := ValueAt(string);
    ValueAt(string) := ValueAt(pointerToLastCharacter);
    ValueAt(pointerToLastCharacter) := tmp;
    string := string + 1;
    pointerToLastCharacter := pointerToLastCharacter - 1;
  EndWhile
EndFunction
Function strcat(CharacterPointer dest, CharacterPointer src) Which Returns Nothing Does
  strcpy(dest + strlen(dest), src);
EndFunction
Function convertIntegerToString(CharacterPointer string, Integer32 number) Which Returns Integer32 Does
  //Returns the length of the string.
  Integer32 isNumberNegative := 0;
  If number < 0 Then
    number := -number;
    isNumberNegative := 1;
  EndIf
  Integer32 i := 0;
  While number > 9 Loop
    ValueAt(string + i) := '0' + mod(number, 10);
    number := number / 10;
    i := i + 1;
  EndWhile
  ValueAt(string + i) := '0' + number;
  i := i + 1;
  If isNumberNegative Then
    ValueAt(string + i) := '-';
    i := i + 1;
  EndIf
  ValueAt(string + i) := 0;
  reverseString(string);
  Return i;
EndFunction
To delete a character from a string, you can use the following function from Havlik's Law:
Function izbrisi_znak_iz_stringa(CharacterPointer mjesto_u_stringu) Which Returns Nothing Does
  //Deletes the character at "mjesto_u_stringu" ("place in the string") by shifting
  //the rest of the string, including the terminating 0, one place to the left.
  While ValueAt(mjesto_u_stringu) Loop
    ValueAt(mjesto_u_stringu) := ValueAt(mjesto_u_stringu + 1);
    mjesto_u_stringu += 1;
  EndWhile
EndFunction

TODO: Build some useful string manipulation into AEC, at least functions for converting numbers to strings and vice versa.
Advanced array manipulation
When it comes to sorting arrays, I've tried to efficiently implement HybridSort (a sorting algorithm I came up with, a mixture of MergeSort, QuickSort and SelectionSort) in AEC. HybridSort is based on the fact that the number of comparisons done by MergeSort depends only on the size of the array, and is always equal to

$$2 \cdot n \cdot \log_2(n)$$

where n is the length of the array. QuickSort, on the other hand, is faster the more shuffled the array is, and it is slowest for already-sorted and inverse-sorted arrays. So, sometimes QuickSort is faster, and sometimes MergeSort is faster. Using a simple genetic algorithm, I came up with a formula for approximating how many comparisons QuickSort will do on a given array. That formula is:

$$e^{\left(\ln(n) + \ln(\ln(n))\right) \cdot 1.05 + \left(\ln(n) - \ln(\ln(n)) - \ln(2)\right) \cdot 0.9163 \cdot \left|2.38854 \cdot s^7 - 0.284258 \cdot s^6 - 1.87104 \cdot s^5 + 0.372637 \cdot s^4 + 0.167242 \cdot s^3 - 0.0884977 \cdot s^2 + 0.315119 \cdot s\right|}$$

Here s is the sortedness of the array: -1 when the array is inverse-sorted, 1 when it is sorted, and around 0 when it is randomly shuffled. ln is the natural logarithm, with base e ≈ 2.718. I am not sure how to test how correct that approximation is.

My algorithm is recursive. At every level of the recursion, it uses those formulas to estimate whether it should behave like QuickSort or like MergeSort. It falls back to SelectionSort in two cases: when the array is so small that SelectionSort is faster than both QuickSort and MergeSort, and when it runs out of stack memory. However, for some reason, my algorithm is significantly slower than the JavaScript Array.sort method.

I am not sure what causes the stairs in the measurement results of my algorithm, that is, the sudden jumps in the measured results as the array length grows. Professor Alfonzo Baumgartner thinks they have to do with cache misses. Here is what he wrote when I asked him about that in an e-mail:
Dear Sir,
the explanation I offer for those 'stairs' lies in the fact that all processors use cache memory, in which they store a certain number of memory pages.
If we assume that one memory page is, for example, 64K, then, as soon as our array grows by just 1 element beyond 64K, two memory pages are needed. As soon as our array spans two memory pages, so-called 'cache misses' immediately become possible: the processor will not find that page in its L1 cache, so pages within the cache have to be swapped, which slows down the execution of the program.
As you enlarge the array, more and more memory pages are used to store it, and the caching system then has more work swapping them.
That is why for some array lengths you get almost the same result, and then, as soon as we increase the array by just one element (which goes into a new memory page), we get a 'drastically' increased measurement result.
I don't know whether I've made myself clear, and I'm not 100% sure that this is the cause of those 'stairs' in your case, but at first glance it seems like something of that sort to me.
Alfonzo.
In any case, I think this is an interesting result, worth further exploration. Which sorting algorithms show those stairs when measured? Why do some show them, and some (like JavaScript Array.sort) do not, on the same array?
Example program in both dialects of AEC
So, here is an example piece of code in the x86 dialect of AEC:
i := 0
While i < n | i = n
  If i = 0
    fib(i) := 0
  ElseIf i = 1
    fib(i) := 1
  Else
    fib(i) := fib(i - 1) + fib(i - 2)
  EndIf
  i := i + 1
EndWhile
fib(n)

And here is its near-equivalent in the WebAssembly dialect of AEC:
Function fib(Integer16 n) Which Returns Decimal64 Does
  Integer16 i := 0;
  Decimal64 fib[100]; //Only works for n less than 100.
  While i < n or i = n Loop
    If i = 0 Then
      fib[i] := 0;
    ElseIf i = 1 Then
      fib[i] := 1;
    Else
      fib[i] := fib[i - 1] + fib[i - 2];
    EndIf
    i += 1;
  EndWhile
  Return fib[n];
EndFunction

You can read slightly more about it at the beginning of the Limitations section of the README.
Conclusion
I think AEC is a promising project, but a lot of work is still needed to make it successful, and I don't think I can do everything that's needed by myself. (UPDATE on 06/06/2021: It would, for example, be useful to make a web-based IDE for the AEC-to-WebAssembly compiler, so that somebody can try my programming language directly in the browser. I have opened a Quora question asking for advice about how to do that. We would need to get the AEC-to-WebAssembly compiler, which can already run in NodeJS if compiled with Emscripten, to run in a browser, and, to be honest, I do not know enough WebAssembly to do that by myself. Actually, I think almost no web developer these days has the knowledge needed to do that. We would also need to embed wat2wasm from the WebAssembly Binary Toolkit in that web app, as my compiler relies on it to convert the WebAssembly assembly language it outputs into the bytecode that browsers understand. Somebody has already made wat2wasm run in modern browsers, but they apparently left no instructions on how they managed to do that.)
UPDATE on 16/10/2020: I've published a YouTube video about programming in your own programming languages for the client-side web. If you have trouble playing it, you can download the minified MP4 and try opening it in VLC or a similar program. If nothing else works, try opening the ZIP file containing a PDF, an ODP and a PPT file.
UPDATE on 18/02/2022: In case you are interested, here is what Stuxxnet, the moderator of a Discord server about programming, has to say about my programming language:
Come on now, I have written a programming language that compiles to WebAssembly, I know that stuff [about WebAssembly].
you've also written a log function that runs in an infinite loop if the input is close to 1
you're trying to compare sort functions written in an interpreted language to those provided by that language in its native VM
you're overfitting measurement data to come up with a completely insane formula for the comparisons of quicksort
you're using rand in c++ code
you implement custom math functions using highschool maths that just doesn't work if you want to be numerically efficient or just even get a sane result for the whole range of possible inputs.
I'm sorry, but I really don't have much confidence in your judgement calls...
I once learned that a Serbian company called RT-RK, which also has an office in Osijek, where I was living at the time, was looking for a compiler developer. So I e-mailed them my AEC-to-WebAssembly compiler on GitHub, hoping to finally get an entry-level job after I had been learning to program for 8 years. They did not even invite me to an interview. They replied in an e-mail that they were looking for somebody who knows in detail how GCC or LLVM (preferably both) work internally, and that my project does not show them that I know that.
UPDATE on 05/01/2023: A question I often get asked on Internet forums is: if I have made my own programming language, why haven't I also made my own operating system? The answer is fairly simple. While I do have some ideas about what a good programming language would look like and how it would work internally (as you can probably tell by reading the documentation of my programming language), I have no idea how a good operating system would work. So, I haven't made my own operating system, and I probably never will.
UPDATE on 04/07/2023: I started a StackExchange question about why most programming languages use the same token for EndIf, EndWhile, EndFunction and EndStructure, and that question got many upvotes.
UPDATE on 24/05/2024: A question I often ask myself is what is the proper way of dealing with algorithms that involve tree manipulation in languages such as AEC. Well, here is how I solved that problem in Huffman Coding:
Structure TreeNode Consists Of {
  Character character;
  Integer16 frequencyOfCharacter;
  PointerToTreeNode leftChild, rightChild;
  Character code[16];
} EndStructure;

InstantiateStructure TreeNode treeNodes[32];
Integer16 isTreeNodeUsed[32];

Function newTreeNode() Which Returns PointerToTreeNode Does {
  Integer16 i := 0;
  While i < 32 Loop {
    If not(isTreeNodeUsed[i]) Then {
      treeNodes[i].character := 0;
      treeNodes[i].leftChild := treeNodes[i].rightChild := PointerToTreeNode(0);
      treeNodes[i].code[0] := 0;
      treeNodes[i].frequencyOfCharacter := 0;
      isTreeNodeUsed[i] := 1;
      If NDEBUG = 0 Then {
        Character stringToBePrinted[64] := {0};
        strcat(AddressOf(stringToBePrinted[0]), "NDEBUG: Allocating the TreeNode #");
        convertIntegerToString(AddressOf(stringToBePrinted[0]) + strlen(AddressOf(stringToBePrinted[0])), i);
        strcat(AddressOf(stringToBePrinted[0]), "\n");
        printString(AddressOf(stringToBePrinted[0]));
      } EndIf;
      Return AddressOf(treeNodes[i]);
    } EndIf;
    i += 1;
  } EndWhile;
  noMoreFreeMemory();
} EndFunction;

Function freeTreeNode(PointerToTreeNode treeNode) Which Returns Nothing Does {
  If not(AddressOf(treeNodes[0]) <= treeNode <= AddressOf(treeNodes[32 - 1])) Then {
    segmentationFault();
  } EndIf;
  If NDEBUG = 0 Then {
    Character stringToBePrinted[64] := {0};
    strcat(AddressOf(stringToBePrinted[0]), "NDEBUG: Freeing the TreeNode #");
    convertIntegerToString(AddressOf(stringToBePrinted[0]) + strlen(AddressOf(stringToBePrinted[0])), (treeNode - AddressOf(treeNodes[0])) / SizeOf(TreeNode));
    strcat(AddressOf(stringToBePrinted[0]), "\n");
    printString(AddressOf(stringToBePrinted[0]));
  } EndIf;
  isTreeNodeUsed[(treeNode - AddressOf(treeNodes[0])) / SizeOf(TreeNode)] := 0;
} EndFunction;

Function freeUpTheTree(PointerToTreeNode tree) Which Returns Nothing Does {
  If tree->leftChild Then {
    freeUpTheTree(tree->leftChild); // Calling `freeTreeNode` here instead of
                                    // `freeUpTheTree` causes a memory leak.
  } EndIf;
  If tree->rightChild Then {
    freeUpTheTree(tree->rightChild);
  } EndIf;
  freeTreeNode(tree);
} EndFunction;

So, in other words, I think you should make a global array of TreeNode structures, and another array to track which elements of that array are actually being used and which ones aren't. Then write functions that allocate and deallocate a TreeNode.