<i>Homoiconic</i> has a pretty clear definition. It was coined by someone in reference to the property of a specific system, many decades ago. That system stored program definitions in the same form that the programmer entered them in (either just the original character-level text, or some tokenized version of it), allowing the definitions to be recalled at runtime and redefined. He turned "same form" into "homoiconic" with the help of Greek/Latin. It's all in Wikipedia.<p>Line-numbered BASIC is homoiconic: you can edit any line of code and continue the program.<p>POSIX shell lets functions be redefined. They can be listed with the <i>set</i> command executed without arguments, and copy-pasted.<p>In Common Lisp, there is a function called <i>ed</i>, support for which is implementation-defined. If support is available, it is supposed to bring up an editor of some kind to allow a function definition to be edited. That is squarely a homoiconic feature.<p>Without <i>ed</i> support or anything like it, the implementation does not retain definitions in a way that can be edited; i.e., it is not homoiconic. Some Lisps compile everything entered into them; you cannot edit a <i>defun</i> because it has been turned into machine language.
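For example (GREET is a hypothetical function name here; whether <i>ed</i> does anything at all is implementation-defined):<p><pre><code> (defun greet () (print "hello"))
 ;; implementation-defined: may bring up an editor on GREET's
 ;; definition, or may signal an error if there is no editor support
 (ed 'greet)
</code></pre>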
> Line-numbered BASIC is homoiconic: you can edit any line of code and continue the program.<p>Oh man, anyone else remember those self-modifying BASIC programs, which would:<p>1. Clear the screen<p>2. Print a bunch of new BASIC lines on the screen, with a CONTINUE command at the end, thus:<p><pre><code> 100 PRINT NEWVAR$
110 <whatever>
CONTINUE
</code></pre>
3. Position the cursor at the top of the screen<p>4. Enable some weird mode where "Enter" was considered to be pressed over and over again<p>5. Execute the BREAK command, so that the interpreter would then read the lines just printed?<p>I forget the kinds of programs that used this technique, but thinking back now as a professional developer, it seems pretty wild...
I think this comment reinforced my sense that the author wanted to drive to a destination and didn't want to divert down the road of "why LISP homoiconicity is different from eval()", which I think was... lazy.<p>The idea has merit. Having the REPL deal with the parse structure of data in such a way that taking parsed data and presenting it as code has a lower barrier to affecting the current run state than eval() is pretty big.<p>I'd say eval() isn't self-modifying. You can't come out the other side of eval() with your own future execution state changed. As I understand it, the homoiconic features of LISP mean you can.
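To make that concrete in Common Lisp terms, here is a minimal sketch (NEXT-STEP is a made-up example) of a program coming out the other side with its own future execution state changed:<p><pre><code> (defun next-step () 'old)
 ;; build a replacement definition as a plain list, then install it;
 ;; the running program has just rewritten its own future behavior
 (setf (symbol-function 'next-step)
       (coerce '(lambda () 'new) 'function))
 (next-step)   ; => NEW
</code></pre>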
><i>In Common Lisp, there is a function called ed, support for which is implementation-defined. If support is available, it is supposed to bring up an editor of some kind to allow a function definition to be edited. That is squarely a homoiconic feature.</i><p>It's enough that the language stores the current source code and can reload it. Hot code swapping/reloading suffices; no homoiconicity is needed - which makes it not so squarely a homoiconic feature.
> He turned "same form" into "homoiconic" with the help of Greek/Latin.<p>Well, sort of. Mostly that's just English.<p>There's no Latin at all, but <i>hom-</i> [same] and <i>icon</i> [image] are arguably Greek roots. The Latin equivalents would be <i>eadem</i> [same, as in "idempotent"] and <i>imago</i> [image, and the feminine gender of this word explains why we need "eadem" and not "idem"]. I'm not sure how you'd connect those. (And you might have issues turning <i>imago</i> into an adjective, since the obvious choice would be <i>imaginary</i>.)<p>However, since <i>icon</i> begins with a vowel, I don't think it's possible for <i>hom-</i> to take the epenthetic <i>-o-</i> that appears when you're connecting two Greek roots that don't have an obvious way to connect. If the word was constructed based on Greek principles, it would be <i>hom(e)iconic</i>. Treating <i>homo-</i> as a prefix that automatically includes a final O is a sign of English; in Greek they're separate things.<p>I remember that when there was a scandal around cum-ex financial instruments, a lot of people wanted to say that cum-ex was Latin for "with-without", which it isn't; it's Latin for "with-from". ("Without" in Latin is <i>sine</i>; compare French <i>sans</i> or Spanish <i>sin</i>.) Cum-ex is <i>English</i> for "with-without", and the same kind of thing is going on with <i>homoiconic</i>.
I'd like to offer some additional amateur translation options for "homoiconic" to Latin. There's already a decent word "conformis" which has the close English counterpart "conformal", but if we're inventing new words, I'd propose "coninstar", as in "con-" meaning "together in/sharing" and "instar" being "representation/form".
<i>Con-</i> before vowels is <i>co-</i>; compare <i>cohabit</i>; <i>coincide</i>.<p>(Technically, you wouldn't expect an N before vowels anyway because the root word ends in an M, so hypothetically you'd have "cominstar". But since the consonant just disappears before vowels, that's moot. [Though technically technically, disappearing when before vowels is expected of M - this is a feature of Latin pronunciation generally - and not of N.])
I'll plead ignorance here, and ask for clemency on the grounds that modern coinages like "conurbation" may be exempt, and also that there seem to be notable exceptions to this rule, like this example I've thrown together[0]:<p>"con"+"iacio" (also "jacio")
=> "conicio" (also "coicio" also "conjicio")<p>(Also "coinstar" is a trademark of those spare change gobblers you find after the register at Walmart.)<p>[0] <a href="https://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.04.0059:entry=conicio" rel="nofollow">https://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1...</a>
And Google's etymology feature says that <i>con-</i> and <i>-ation</i> are English, while <i>-urb-</i> is Latin.<p><a href="https://www.google.com/search?q=conurbation" rel="nofollow">https://www.google.com/search?q=conurbation</a>
> also "jacio"<p>It'd be a better example of an exception if it unambiguously started with a vowel. This is sort of the reverse of the case I pointed to above, where "habito" <i>does</i> start with a vowel, or rather it almost does, enough to trigger the same changes.<p><a href="https://www.etymonline.com/word/com-" rel="nofollow">https://www.etymonline.com/word/com-</a><p>> Before vowels and aspirates, it is reduced to co-; before -g-, it is assimilated to cog- or con-; before -l-, assimilated to col-; before -r-, assimilated to cor-; before -c-, -d-, -j-, -n-, -q-, -s-, -t-, and -v-, it is assimilated to con-, which was so frequent that it often was used as the normal form.<p>I and J aren't different letters in Latin, but they are different kinds of sound, if sometimes only hazily different. Same goes for U and V. By modern convention we have <i>convention</i> and <i>conjecture</i>; the hazy difference seems sufficient to explain why the Romans left us every variety of the compound, from <i>coniicio</i> through <i>conicio</i> to <i>coicio</i>. A naive analysis (the most I can really do) would say that <i>coniicio</i> comes from someone who sees <i>iacio</i> as starting with a consonant, <i>coicio</i> comes from someone who doesn't, and <i>conicio</i> is a reduced form of <i>coniicio</i>.
It seems that the Rust macro system is inspired by a similar idea: In the first step (the "reader" in this article's terminology), the source is converted into something called a <i>token tree</i>.<p>A token tree is not a full parse tree with resolved operator precedence and whatnot. It only has child nodes for bracket pairs ((), [] and {}) and their contents, in part to determine where the macro call ends. Otherwise, it's a flat list of tokens that the macro (what this article would call the "parser") can interpret in any way it wants.
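For comparison, the Lisp reader does the analogous thing: nesting comes only from the brackets, and no operator precedence is resolved at that stage. A quick Common Lisp sketch:<p><pre><code> (read-from-string "(if (< x 10) (* 2 x) x)")
 ;; => (IF (< X 10) (* 2 X) X) -- a nested list mirroring the
 ;;    parentheses, with no precedence or meaning assigned yet
</code></pre>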
If I understand the gist of this article, it goes like...<p>1. Scanner divides source-code-string into ordered chunks, each with some identifying information: what is the type and content of each chunk.<p>2. The next stage better NOT be a "Parser" but a "Reader", which assembles the chunks into a well-formed tree-structure, thus recognizing which chunks belong together in the branches of such trees.<p>3. Parser then assigns "meaning" to the nodes and branches of the tree produced by the Reader, by visiting them. "Meaning" basically means (!) what kind of calculation will be performed on some nodes of the tree.<p>4. It is beneficial if the programming language has primitives for accessing the output of the reader, so it can have macros that morph the reader-produced tree and then ask the parser to do its job on such a re-morphed tree.<p>Did I get it close?
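A rough Common Lisp illustration of steps 2-4 (a sketch of the idea, not of how macros are actually invoked):<p><pre><code> ;; reader: text -> tree (no meaning assigned yet)
 (defvar *tree* (read-from-string "(+ 1 2 3)"))   ; *TREE* is now (+ 1 2 3)
 ;; morph the tree before handing it to the evaluator
 (setf (first *tree*) '*)
 ;; parser/evaluator: assign meaning to the re-morphed tree
 (eval *tree*)   ; => 6
</code></pre>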
> 2. The next stage better NOT be a "Parser" but a "Reader", which assembles the chunks into a well-formed tree-structure, thus recognizing which chunks belong together in the branches of such trees.<p>> 3. Parser then assigns "meaning" to the nodes and branches of the tree produced by the Reader, by visiting them. "Meaning" basically means (!) what kind of calculation will be performed on some nodes of the tree.<p>So, an "AST builder" followed by a "semantic pass". That's... how most compilers have been structured, at least conceptually, since their invention. In particularly memory-starved environments those passes were actually separate programs, launched sequentially; most famously, the ancient IBM FORTRAN compilers were structured like this (they couldn't manage to fit both the program being compiled <i>and</i> the whole compiler into the core, so they split the compiler into 60-something pieces).
It helps to read the article... the author was not introducing this as a novel concept, but elaborating on how this is a better mental model for how an interpreter or compiler works. It's not Tokenize -> Parse, it's Tokenize -> Read -> Parse.<p>The article discusses this particularly with regard to the meme of LISPs being "homoiconic". The author elaborates that the difference between LISPs and other programming languages lies not in "homoiconicity" (a Javascript string can contain a program, and you can run `eval` on it, hence Javascript is "homoiconic"), but in which step of the parsing pipeline they let you access: with Javascript, it's before Tokenization happens; with LISPs, it's after Reading has happened, before the actual Parse step.
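To see the Lisp side concretely, here is a toy Common Lisp macro (SWAP-ARGS is made up for illustration) that receives the already-read tree rather than a string:<p><pre><code> (defmacro swap-args (form)
   ;; FORM arrives as a list, e.g. (- 2 10); rebuild it with the
   ;; two arguments exchanged before evaluation proceeds
   (list (first form) (third form) (second form)))
 (swap-args (- 2 10))   ; => 8, because the tree became (- 10 2)
</code></pre>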
I've actually read the article, thank you; the author also argues that this "bicameral" style is what allows one to have useful tooling, since it can now consume a tree-like AST instead of plain strings. Unfortunately, that is <i>not</i> a unique advantage of "languages with bicameral syntax", although the author appears (?) to believe it to be so. IDEs had been dealing with ASTs long before LSP was introduced, although indeed, this has only been seriously explored since the late nineties or so, I believe.<p>So here is a problem with the article: the author believes that what he calls "bicamerality" is unique to LISPs, and that it also requires some S-expr/JSON/XML-like syntax. But that's not true, is it? Java, too, has a tree-like AST which can be (very) easily produced (especially when you don't care about semantic passes such as resolving imports and binding name mentions to their definitions, etc.), and it has a decidedly non-LISP-like syntax.<p>And no, I also don't believe the author actually cares all that much about the reader/parser/eval being available inside the language itself: in fact, the article is structured in a way that mildly argues against having this requirement for a language to be said to have "bicameral syntax".
> So here is a problem with the article: the author believes that what he calls "bicamerality" is unique to LISPs, and that it also requires some S-expr/JSON/XML-like syntax.<p>I didn't find that assumption anywhere in the article. My reading is that all interpreters and compilers, for any language, are built to implement two non-intersecting sets of requirements, namely to "read" the language (build an AST) and to "parse" the language (check if the AST is semantically meaningful). Therefore, all language implementations require Tokenization, Reading and Parsing steps, but not all interpreters and compilers are structured in a way that cleanly separates the latter two of these three sets of concerns (or "chambers"), and (therefore) not all languages give the programmer access to the results of the intermediate steps. Java obviously has an AST, but a Java program, unlike a LISP program, can't use macros to modify its own AST. The programmer has no access to what the compiler "read" and can't modify it.
I liked the first half of the article, but I'm not sure I got anything from the second half. As the author notes, in order to be useful a definition must exclude something, and the "bicameral" distinction doesn't seem to exclude anything; even Python eventually gets parsed into a tree. Conceptually splitting out "parsing" into "tree validation" and "syntax validation" is slightly interesting (although isn't this now a <i>tricameral</i> system?), but in practice it just seems like a simple aid to constructing DSLs.<p><i>> These advantages are offset by one drawback: some people just don’t like them. It feels constraining to some to always write programs in terms of trees, rather than more free-form syntax.</i><p>I think this is misdiagnosing why many people are averse to Lisp. It's not that I don't like writing trees; I love trees for representing data. But I don't think that thinking of code as data is as intuitive or useful as Lisp users want me to think it is, despite how obviously powerful the notion is.
I also struggled with the "bicameral" definition. The best I could come up with is that because e.g. Scheme represents code and data in the same way (isn't there a word for this?) it's possible to represent and manipulate (semantically) invalid code. This is because the semantics are done in the other "chamber". The example given was `(lambda 1)`, which is a perfectly good sexp, but will error if you eval it.<p>This could be contrasted with C, where code (maybe more precisely program logic) is opaque (modulo preprocessor) and can only be represented by function pointers (unless you're doing shellcode). Here the chamber that does the parsing from text (if we don't look inside GCC) also does semantic "checking", and so while valid functions can be represented within C (via the memory contents at the function pointer), the unchecked AST or some partial program is not represented.<p>I've tried not to give too many parentheticals above, but I'm not sure the concept holds water if you play tricks. Any Turing machine can represent any program, presumably in a way that admits cutting it up into atoms and rearranging to an arbitrary (potentially invalid) form. I'd be surprised if this hasn't been discussed in more detail somewhere in the literature.
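In Common Lisp the same split is easy to demonstrate at the REPL (a sketch):<p><pre><code> ;; the reader happily builds the tree; no semantics are involved
 (read-from-string "(lambda 1)")   ; => (LAMBDA 1)
 ;; only evaluation objects: 1 is not a valid lambda list, so
 ;; (eval (read-from-string "(lambda 1)")) signals an error
</code></pre>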
I thought part of the beauty of homoiconicity, which doesn't seem to be mentioned here, is not just that it's natural to interpret tokens as code, but that it's possible to interpret <i>the code of the program that's currently running</i> as tokens, and manipulate them as you would any other data in the program?
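Common Lisp gestures at this with <i>function-lambda-expression</i>, though implementations are free to return NIL if they did not keep the source around:<p><pre><code> (defun square (x) (* x x))
 ;; may return the source tree (LAMBDA (X) (* X X)) as data,
 ;; or NIL -- the result is implementation-dependent
 (function-lambda-expression #'square)
</code></pre>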
It is not only Lisp. PostScript is also homoiconic; tokens have values like any other values (and procedures are just executable arrays (executing an array involves executing each element of that array in sequence), which can be manipulated like any other arrays). The {} block in PostScript is a single token that contains other tokens; the value of the token is an executable array whose elements are the values of the tokens that it contains.<p>Strings don't make it "homoiconic" in the usual way, I think; so JavaScript does not count.
I have been using Python as a syntax "carrier" for many domain languages/DSLs, re-purposing what constructs like `class ...:`, `with ...:`, certain func-calls, etc. mean within that. Works wonders... though one has to be careful, as it may not look like Python at all :/
Another language with this property is FORTH, which has many surprising similarities with LISP. I like to call it “LISP, but the other way round.” It uses RPN instead of PN, stacks/arrays instead of lists, and is procedural instead of functional.
I was thinking about this reading the article. In fact, I’ve recently seen Lisp implemented in Forth[0] and Forth implemented in Lisp[1]. In both cases, the implementations are decently complete and surprisingly efficient (i.e. not “toy” interpreters).<p>I think this is due to a significant property shared by both languages: the parser’s primary role is distinguishing between numbers and anything that’s not a number. No need to worry about operator precedence, keywords, or building complex syntax trees. Tokens are numbers and “not-numbers”, and that’s it.<p>In Forth, a “not-number” is a Word, and in Lisp a Symbol, both of which can be variables or functions. The only difference between the two is that Forth checks for Word definitions first, and Lisp checks for numbers first. If you wanted to redefine 4 to 5 for some reason, Forth’s got your back, but Lisp will save you ;). The sketch after this paragraph shows the Lisp half of that rule.<p>A Forth Dictionary is very similar to a Lisp Environment; they both serve as a lookup table for definitions, and they both allow the programmer (or program!) to redefine words/symbols.<p>They also both have REPLs that facilitate a much more dynamic development cycle than in most other languages.<p>I could go on, but on a fundamental level the similarities are striking (at least to me, anyway). It’s an interesting rabbit hole to explore, with lots of “drink me” bottles lying around. It’s fun here.<p>[0] <a href="https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/comp/lisp.txt" rel="nofollow">https://git.sr.ht/~vdupras/duskos/tree/master/item/fs/doc/co...</a><p>[1] <a href="https://github.com/gmpalter/cl-forth">https://github.com/gmpalter/cl-forth</a>
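A quick Common Lisp sketch of the number-vs-symbol rule (exact symbol printing varies by implementation):<p><pre><code> (read-from-string "42")    ; => 42, a number
 (read-from-string "42x")   ; => the symbol |42X| -- it doesn't parse
                            ;    as a number, so the reader interns it
</code></pre>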
> <i>Data are data, but programs—entities that we can run—seem to be a separate thing.</i><p>Is this a view some people actually hold? Would be interesting to see some argumentation why someone would think this is the case.
How one could have spent any time at all studying Lisp starting in the 80s (!) and not understand what the word "homoiconic" means is <i>baffling</i> to me!
The term homoiconic does not come from the Lisp culture. I think it might have been in the 1990s that it came into use as a way of describing a property of languages in the Lisp family, using a different definition from the original homoiconic, and it might have been introduced by outsiders.<p>Using Google Books search, we can identify that a 1996 book called <i>Advanced Programming Language Design</i> by Raphael A. Finkel uses the word in this new way, claiming that TCL and Lisp are homoiconic.<p>The word returns to flatlining towards the end of the 1990s, and then surges after 2000.
"We started with Lisp, so let’s go back there. What is Lisp? Lisp is a feeling, an emotion, a sentiment; Lisp is a vibe; Lisp is the dew on morning grass, it’s the scent of pine wafting on a breeze, it’s the sound of a cricket ball on a bat, it’s the…oh, wait, where was I. Sorry."<p>Leaving this here, with the deepest respect.<p>Eternal Flame - Julia Ecklar
<a href="https://www.youtube.com/watch?v=u-7qFAuFGao" rel="nofollow">https://www.youtube.com/watch?v=u-7qFAuFGao</a>