My advice on implementing the ternary conditional operator
So, you are making a programming language, and you would like to
implement the ternary conditional operator
?: in it? Let's
not go into discussing how it might make code written in your language
less legible, and let's focus only on the strictly technical aspect of
it. For the simple reason that the ternary conditional operator looks
simple, but it is difficult to implement properly, as quite a few things
can go wrong. Here are 5 things I think it's important to be careful
about when implementing the ternary conditional operator.
-
Make sure it's parsed as right-associative.
Because PHP famously parses it incorrectly as left-associative, preventing it from being used to shortly write if-else-if-else statements.a ? b : c ? d : eshould parse asa ? b : (c ? d : e), not as(a ? b : c) ? d : e.
Now, admittedly, it's not at all obvious how to do it nicely in the parser. In both my PicoBlaze assembler and my AEC-to-WebAssembly compiler, I am doing it by scanning backwards, which is considered to be an anti-pattern in the world of Programming Language Design and Implementation, whereas, in my AEC-to-x86 compiler, I am doing it by assuming the rest of the expression has already been parsed, because the?:is the operator with the lowest priority. That's nice to code, but it's not an option if the assignment operator is also an operator so that assignments can be chained. For those who don't know (and, in that case, you probably should not be making your programming language), in many programming languages the assigniment operator=can be chained, so that, if you writea = b = c, the value ofcwill be assigned to bothaandb. The assignment operator in that case is a part of the expression and has a lower priority than?:(and it is also a right-associative operator). I've opened a LangDev StackExchange question about this. Maybe you can do it like this in C++ (I've refactored it many times, this is an excerpt from my AEC-to-WebAssembly compiler):auto findLastIndex = [](std::vector<TreeNode> treeNode, std::function<bool(TreeNode, int)> funcPtr) { for (int i = int(treeNode.size()) - 1; i >= 0; i--) if (funcPtr(treeNode[i], i)) return i; return -1; }; // The `findLastIndex` function from the JavaScript standard library // really comes useful when parsing the ternary conditional `?:` operator, // so why not make one here in C++? // Parsing the ternary conditional `?:` operator, which is right-associative. // We will parse it by "scanning backwards", which is why we are using // `findLastIndex` instead of `std::find_if`. int lastColon; while ((lastColon = findLastIndex(parsedExpression, [](TreeNode node, int index) { index = not(index); // To silence the warning by GCC. return node.text == ":"; })) != -1) { if (lastColon == int(parsedExpression.size()) - 1) { std::cerr << "Line " << parsedExpression[lastColon].lineNumber << ", Column " << parsedExpression[lastColon].columnNumber << ", Parser error: The ternary operator \"?:\" has less " "than three operands." << std::endl; break; } int colon = lastColon; int colonCounter = 1; int questionMark = findLastIndex(parsedExpression, [&](TreeNode node, int index) { if (index >= colon) return false; if (node.text == "?") colonCounter--; if (node.text == ":") colonCounter++; return colonCounter == 0; }); if (questionMark == -1) { std::cerr << "Line " << parsedExpression.at(colon).lineNumber << ", Column " << parsedExpression[colon].columnNumber << ", Parser error: Unexpected token \":\"!" << std::endl; break; } if (questionMark == 0) { std::cerr << "Line " << parsedExpression.at(questionMark).lineNumber << ", Column " << parsedExpression[questionMark].columnNumber << ", Parser error: The ternary operator \"?:\" has less " "than three operands." << std::endl; break; } std::vector<TreeNode> nodesThatTheRecursionDealsWith( parsedExpression.begin() + questionMark + 1, parsedExpression.begin() + colon); nodesThatTheRecursionDealsWith = parseExpression(nodesThatTheRecursionDealsWith); if (nodesThatTheRecursionDealsWith.size() > 1) { std::cerr << "Line " << nodesThatTheRecursionDealsWith.at(1).lineNumber << ", Column " << nodesThatTheRecursionDealsWith.at(1).columnNumber << ", Parser error: Unexpected token \"" << nodesThatTheRecursionDealsWith.at(1).text << "\"!" << std::endl; break; } if (nodesThatTheRecursionDealsWith.size() == 0) { std::cerr << "Line " << parsedExpression.at(questionMark).lineNumber << ", Column " << parsedExpression.at(questionMark).columnNumber << ", Parser error: The ternary operator \"?:\" has less " "than three operands!" << std::endl; break; } parsedExpression[questionMark].text = "?:"; parsedExpression[questionMark].children.push_back( parsedExpression[questionMark - 1]); // Condition parsedExpression[questionMark].children.push_back( nodesThatTheRecursionDealsWith[0]); // Then-clause parsedExpression[questionMark].children.push_back( parsedExpression[colon + 1]); // Else-clause parsedExpression.erase(parsedExpression.begin() + questionMark - 1); parsedExpression.erase(parsedExpression.begin() + questionMark, parsedExpression.begin() + colon + 1); }
Now, how much is it idiomatic C++, I suppose not very much. But I believe its intent is clear.
By the way, in case you are writing an assembler or some other whitespace-sensitive programming language where a new-line token ends the expression, it's probably a good idea to check in the lambda-function in the while-loop condition whether the colon:is immediately followed by a new-line token, like this (This is an excerpt from my PicoBlaze assembler written in JavaScript, closely corresponding to the beginning of the excerpt from the AEC-to-WebAssembly compiler dumped above):if (!Array.prototype .findLastIndex) // Firefox 52 (the last version of Firefox to run on // Windows XP) does not support the findLastIndex // directive on an array. Array.prototype.findLastIndex = function(lambda) { for (let i = this.length - 1; i >= 0; i--) if (lambda(this[i], i)) return i; return -1; }; let lastColon; while ((lastColon = tokenized.findLastIndex((node, index) => { if (index > tokenized.length - 2) return false; if (node.text == ':' && tokenized[index + 1].text != '\n') return true; return false; })) != -1) {
I think the error message will be a lot more clear that way than if you put the\ntoken as the third child of the?:tree node. -
Make sure the code specifying the condition is being executed
before the second and the third operand.
In my AEC-to-x86 compiler, I screwed that one up. Why is it important? Because, sooner or later, some user of your language will try to protect themselves from a divide-by-zero error by writing something like this:d == 0 ? 0 : 1 / d. Ifdis zero, the code compiled with my AEC-to-x86 compiler will still trigger a division by zero error. So, for example, do NOT output something like this:; This is incorrect assembly code for the expresssion `d == 0 ? 0 : 1 / d`,Such an error probably cannot happen when you are targetting WebAssembly (rather than x86), as there
; as it triggers a division by zero in case `d` is zero.
finit
mov dword [result],0x3f800000 ;IEEE754 hex of 1
fld dword [result]
fld dword [d]
fdivp
mov dword [result],0x0 ;IEEE754 hex of 0
fld dword [result]
fld dword [d]
mov dword [result],0x0 ;IEEE754 hex of 0
fld dword [result]
fcomp
push ax
fstsw ax
mov al,ah
lahf
and ax,0xba45 ;https://www.vogons.org/viewtopic.php?p=1130827#p1130827
or ah,al
sahf
pop ax
fstp dword [result]
jne operandsOfTheEqualityOperatorAreNotEqualLabel883768
fld1
jmp endOfTheEqualityOperatorLabel577125
operandsOfTheEqualityOperatorAreNotEqualLabel883768:
fldz
endOfTheEqualityOperatorLabel577125:
fistp dword [result]
xor eax,eax
cmp dword [result],eax
jz firstOperandOfTheTernaryOperatorIsZeroLabel537447
fstp dword [result]
mov eax, dword [result]
fstp dword [result]
mov dword [result],eax
fld dword [result]
jmp endOfTheTernaryOperatorLabel775264
firstOperandOfTheTernaryOperatorIsZeroLabel537447:
fstp dword [result]
endOfTheTernaryOperatorLabel775264:
fstp dword [result]
?:translates quite directly into the typed if-expression(if (result i32).... -
Make sure you output a sensible error message in case somebody
puts structures of different types as the second and the third
argument.
Don't output an internal compiler error saying that some part of the compiler attempted to compile an array of negative size, like my AEC-to-WebAssembly compiler used to output, because the user won't be able to guess what's going on from that message. It took me, a person who wrote that compiler three years earlier, hours to figure out what exactly is going in the compiler in that case. -
In case you also support labels with the colon syntax, like C or
most dialects of assembly, make sure you correctly determine which
colon belongs to a label and which one to a ternary conditional
operator.
A good way to determine that is: if the colon is the second token in a statement, then it is a part of a label, otherwise it's a part of a ternary conditional. If you screw that one up, you will run into problems such as this problem in my PicoBlaze assembler. If you are parsing right-associative operators by scanning backwards, you will need to parse?:in two different passes: one to tell which:belongs to a label and which one to a ternary conditional, and the other to actually parse?:. In my PicoBlaze assembler, the tokenizer is doing the first pass (which is probably a significant violation of the separation of concerns, as that is almost certainly the parser's job, rather than the tokenizer's job). -
In case you are writing an assembler, make sure the preprocessor
can deal with labels being put as the second and the third
operand.
Because, sooner or later, somebody will try to use the ternary conditional operator to determine, based on some compile-time condition (say, whether we are in a release or a debugging mode), where the program shouldjumpto. Like this, for example:jump NDEBUG ? the_permutations_algorithm : printing_the_sorted_array
(That's an excerpt from my implementation of the permutations algorithm in PicoBlaze assembly language, line number 56. If we are in the debugging mode, soNDEBUGis zero, we will print the array to make sure BubbleSort has done its job correctly. To add some context, relying on the array being already sorted before starting the permutations algorithm is supposed to make the algorithm for finding all the permutations of the array significantly faster because then the permutations algorithm can check in a constant time whether some element from the original array has already been attempted for the adding to the current attempt instead of doing it in the linear time -- the permutations algorithm can simply check whether the next element in the sorted array is equal to the current one and conclude based on that whether the current one has already been tried, instead of iterating all the way to the end of the input array to determine that. Though, the actual wall-clock-time measurements do not really confirm that the described clever trick makes the permutations algorithm significantly faster. And it's not due to BubbleSort being slow, as the time complexity of BubbleSort, O(n2), is negligible compared to the time complexity of finding all the permutations of an array O(n · n!) that happens after it, and not outside of it so that the complexities multiply. A small constant speed-up (not quite linear, since there are other loops of length n running beside that loop -- those that check whether the count of the currently being added element is smaller in the current attempt than in the input array) of the permutations algorithm should speed up the program significantly, even if it is at the cost of a quadratic preprocessing of the input. I would really like to know why this seemingly-clever trick to speed up finding all the permutations does not seem to work.)
Your assembler must not crash on such a line of code, like mine used to.