A Brutal Look at Balanced Parentheses, Computing Machines, and Pushdown Automata

(raganwald.com)

59 points by warrenm 94 days ago

9 comments

macintux 85 days ago
One of the few lessons I distinctly remember from college was finite automata in my PL class. I really enjoyed exploring the concepts and writing a grep tool; we were supposed to write either an NFA or a DFA processing application, but I decided to write both.
20 years later I got to apply some of the same ideas to a language processing application, and it was such a pleasure to actually use something conceptual like that. Made me briefly regret landing in more hybrid infrastructure/automation roles instead of pure software development.
Somewhere I may still have my copy of Preparata and Yeh that my professor recommended at the time for further reading. Like most of my books, it was never actually read, just sat around for years.
userbinator 85 days ago
> we’ll ask, “What’s the simplest possible computing machine that can recognize balanced parentheses?”
A counter. That's the difference between theory and practice. Because in practice, everything is finite.
- jibal 85 days ago
 The counter is simply the stack depth without bothering with the actual stack. If the stack is empty when you encounter a closer, then it's unbalanced. If the stack isn't empty when you reach the end of the input, then the items left on the stack are unbalanced.
 If you have multiple kinds of brackets, then you need the same number of counters. Each counter corresponds to the number of openers of that type currently on the stack. EDIT: this is wrong. Counters can't distinguish between [() and ([).
 If you're writing a parser and you want to report the location of an unclosed opening bracket, then you need the actual stack.
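A minimal C++ sketch of the single-counter recognizer jibal describes, assuming one bracket type (`(` and `)`) and that every other character is ignored; the function name `balanced_parens` is illustrative:

```cpp
#include <cstddef>
#include <string>

// One bracket type, one counter: `depth` is the size the stack would have,
// without storing what is on it.
bool balanced_parens(const std::string& text) {
    std::size_t depth = 0;
    for (char ch : text) {
        if (ch == '(') {
            ++depth;
        } else if (ch == ')') {
            if (depth == 0) return false;  // closer with no open '(' left
            --depth;
        }
        // every other character is ignored
    }
    return depth == 0;  // a non-zero depth means unclosed '(' remain
}
```

The early `return false` is what the "stack is empty when you encounter a closer" check becomes once the stack is reduced to its depth.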
- vidarh 85 days ago
 You need the actual stack, I think, in the case of multiple types of openers without additional constraints, because if you just have raw counters you'd get tripped up by ([)] or similar.
 So to generalise your point, you need a counter for each transition to a different type of opener. So (([])) needs only 2 counters, not 3.
 You could constrain it further if certain types of openers are only valid in certain cases, so you could exclude certain types of transitions.
 EDIT: ([)] could indeed be handled by just additionally tracking the current open type. (([]]) is a better example, as it shows that to handle deeper nesting you need additional pieces of data that will grow at some rate (at most by the number of opens, possibly lower depending on which types can validly appear within which types).
- agumonkey 83 days ago
 Maybe there's an encoding that can allow counting different ordered accumulations succinctly... (thinking out loud here)
 PS: apparently there's already a lot of research on multidimensional Dyck languages (somehow mentioned below):
 https://arxiv.org/pdf/2307.16522
 https://omelkonian.github.io/data/publications/d3.pdf
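One encoding along the lines agumonkey muses about: read a single unbounded integer as a base-3 number whose digits are the stack (the same idea used when counter machines simulate stacks). It is still a stack in disguise, and with a fixed-width `uint64_t` it is bounded, which is userbinator's finiteness point again. A sketch assuming two bracket types, `(` and `[`; the function name is illustrative:

```cpp
#include <cstdint>
#include <optional>
#include <string>

// A stack of bracket types packed into one unsigned integer read as a
// base-3 number: digit 1 = '(' and digit 2 = '['. Digit 0 never appears,
// so the empty stack is exactly the value 0. The top is the lowest digit.
// Returns std::nullopt if nesting gets too deep to fit in 64 bits.
std::optional<bool> balanced_packed(const std::string& text) {
    std::uint64_t stack = 0;
    for (char ch : text) {
        if (ch == '(' || ch == '[') {
            if (stack > (UINT64_MAX - 2) / 3) return std::nullopt;  // push would overflow
            stack = stack * 3 + (ch == '(' ? 1 : 2);                 // push
        } else if (ch == ')' || ch == ']') {
            std::uint64_t want = (ch == ')') ? 1 : 2;
            if (stack % 3 != want) return false;  // empty stack (0) or wrong opener on top
            stack /= 3;                           // pop
        }
    }
    return stack == 0;
}
```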
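- spyrja 85 days ago
 FWIW it's a fairly straightforward algorithm. In C++:
```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <string>

// open[i] and close[i] form the i-th bracket pair. Every closer must match
// the most recent unmatched opener of the same kind.
bool balanced(const std::string& text, const std::string& open, const std::string& close) {
    std::size_t length = text.size(), brackets = open.size();
    assert(close.size() == brackets);
    std::stack<char> buffer;  // openers not yet closed, innermost on top
    for (std::size_t index = 0; index < length; ++index) {
        char ch = text[index];
        for (std::size_t slot = 0; slot < brackets; ++slot) {
            if (ch == open[slot])
                buffer.push(ch);
            else if (ch == close[slot]) {
                if (buffer.empty() || buffer.top() != open[slot]) return false;
                buffer.pop();
            }
        }
    }
    return buffer.empty();  // leftover openers were never closed
}
```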
- gpderetta 85 days ago
 Wouldn't two counters report "([)]" as being properly balanced?
- jibal 85 days ago
 No, there's an open [ when the ) is encountered. The problem is the other way around -- my algorithm would report [() as an error. Oops, back to the drawing board. Clearly no counting can tell the difference between [() and ([).
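jibal's conclusion can be checked directly. The sketch below is a deliberately naive version (not jibal's exact algorithm; names are illustrative) that keeps one independent counter per bracket type. After `([` and after `[(` the counters are identical, yet appending `)]` turns the first into the unbalanced `([)]` and the second into the balanced `[()]`, so any checker whose only memory is these counts must give both the same verdict. This one accepts both:

```cpp
#include <string>

// Deliberately naive: one independent counter per bracket type.
struct Counters {
    int paren = 0;    // open '(' not yet closed
    int bracket = 0;  // open '[' not yet closed
    bool operator==(const Counters& other) const {
        return paren == other.paren && bracket == other.bracket;
    }
};

// Feeds `text` into `c`; returns false as soon as a closer finds its own
// counter at zero. The order of different openers is never recorded.
bool feed(Counters& c, const std::string& text) {
    for (char ch : text) {
        switch (ch) {
            case '(': ++c.paren; break;
            case '[': ++c.bracket; break;
            case ')':
                if (c.paren == 0) return false;
                --c.paren;
                break;
            case ']':
                if (c.bracket == 0) return false;
                --c.bracket;
                break;
            default: break;  // ignore other characters
        }
    }
    return true;
}

// Counter state after reading `prefix`.
Counters after(const std::string& prefix) {
    Counters c;
    feed(c, prefix);
    return c;
}

bool naive_balanced(const std::string& text) {
    Counters c;
    return feed(c, text) && c.paren == 0 && c.bracket == 0;
}
```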
- nmadden 85 days ago
 > Because in practice, everything is finite.
 Indeed! https://neilmadden.blog/2019/02/24/why-you-really-can-parse-html-and-anything-else-with-regular-expressions/
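The linked post's argument is that once nesting depth is bounded, the language becomes regular. A rough C++ illustration of that idea (not code from the post; function names are illustrative): build a pattern for balanced parentheses nested at most `max_depth` deep, where each extra level wraps the previous pattern once, and match it with `std::regex`:

```cpp
#include <regex>
#include <string>

// Pattern for balanced parentheses nested at most max_depth levels deep.
// Depth 0 allows only the empty string; each level wraps the previous one:
// R(k) = (?: \( R(k-1) \) )*
std::string bounded_pattern(int max_depth) {
    std::string r;  // depth 0: matches only ""
    for (int i = 0; i < max_depth; ++i) {
        r = "(?:\\(" + r + "\\))*";
    }
    return r;
}

bool balanced_up_to(const std::string& text, int max_depth) {
    return std::regex_match(text, std::regex(bounded_pattern(max_depth)));
}
```

The pattern grows linearly with `max_depth`: the depth bound is baked into the expression instead of being tracked at run time.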
- testaccount28 85 days ago
 You don't need a full counter: increment, decrement, and check_if_zero are enough. No need for get_value.
- stellalo 85 days ago
 You also need check_if_negative to detect close-before-open.
- jibal 85 days ago
 On close-before-open, the counter is already at 0 when the closer arrives, which indicates an error ... that plus the counter being non-zero when reaching the end of input is the entire point.
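A sketch of that exchange: a counter type offering only the three operations testaccount28 lists (the class and function names are illustrative). Checking `is_zero()` before each decrement rejects close-before-open without any negative check, which is jibal's point:

```cpp
#include <string>

// Exposes only increment, decrement, and a zero test; there is
// deliberately no way to read the value.
class MinimalCounter {
public:
    void increment() { ++n_; }
    void decrement() { --n_; }
    bool is_zero() const { return n_ == 0; }
private:
    unsigned long n_ = 0;  // the caller below never decrements it at zero
};

bool balanced_minimal(const std::string& text) {
    MinimalCounter c;
    for (char ch : text) {
        if (ch == '(') {
            c.increment();
        } else if (ch == ')') {
            // Testing for zero *before* decrementing catches close-before-open,
            // so the counter never needs to go negative.
            if (c.is_zero()) return false;
            c.decrement();
        }
    }
    return c.is_zero();
}
```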
- pfortuny 85 days ago
 Yes. Actually, a more interesting example, which does not complicate the statement (not the problem) too much, is to check for nested parentheses and brackets:
 (([[()]])) -> ok
 ((([](])) -> not ok
 Hope OP gets this message.
- _0ffh 85 days ago
 In case anybody is interested, when we generalize the concept we're talking about Dyck languages: https://en.wikipedia.org/wiki/Dyck_language
- reuben364 85 days ago
 I was surprised not to see a connection made to free groups in the article. EDIT: The Wikipedia article, that is.
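One way to see the connection reuben364 mentions, sketched for a single bracket pair and assuming the input contains only `(` and `)`: read `(` as a generator x and `)` as its inverse. Reducing a word in the free group cancels both `()` and `)(`, while checking balance only cancels `()`. Every balanced string therefore reduces to the identity, but not conversely, since `)(` also reduces to the identity. The function names are illustrative:

```cpp
#include <string>

// Free-group reduction: '(' is a generator x and ')' is x^-1, and a letter
// cancels against an adjacent inverse in EITHER order ("()" or ")(").
std::string free_reduce(const std::string& word) {
    std::string reduced;  // used as a stack
    for (char ch : word) {
        char inverse = (ch == '(') ? ')' : '(';
        if (!reduced.empty() && reduced.back() == inverse)
            reduced.pop_back();
        else
            reduced.push_back(ch);
    }
    return reduced;
}

// Balance check as a reduction: only "()" (opener, then closer) cancels.
// The word is balanced exactly when nothing is left.
std::string dyck_reduce(const std::string& word) {
    std::string reduced;  // used as a stack
    for (char ch : word) {
        if (ch == ')' && !reduced.empty() && reduced.back() == '(')
            reduced.pop_back();
        else
            reduced.push_back(ch);
    }
    return reduced;
}
```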
- immibis 85 days ago
 Your solution incorrectly fails ({}). You need the stack.
- jibal 85 days ago
 You're right ... no counter can tell the difference between ({ and {(. Oops.
praptak 85 days ago
> But on a day-to-day basis, if asked to recognize balanced parentheses?
On a day-to-day basis you will never encounter this problem in pure form. As a consequence, the solutions are not good for the day-to-day stuff.
Even if you are only writing a verifier (which is already a bit unrealistic), you'll need to say something more than "not balanced", probably something along the lines of "closing brace without a matching opening at [position]" or "[n] unclosed parentheses at <end of stream>". That rules out the simple recursive regex approach (a counter still works).
- jibal 85 days ago
 To report the location of an unclosed opener you need a stack.
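praptak's two messages can be produced with a single counter, as he notes. The sketch below (function name and message wording are illustrative; positions are 0-based) reports an unmatched closer at its position, but can only count the unclosed openers, not locate them, which is jibal's point:

```cpp
#include <cstddef>
#include <string>

// Single bracket type, single counter, human-readable result.
std::string check_parens(const std::string& text) {
    std::size_t open = 0;
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        if (text[pos] == '(') {
            ++open;
        } else if (text[pos] == ')') {
            if (open == 0)
                return "closing paren without a matching opening at " + std::to_string(pos);
            --open;
        }
    }
    if (open > 0)  // only the count survives; where they were is lost
        return std::to_string(open) + " unclosed parentheses at end of input";
    return "ok";
}
```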
- vidarh 85 days ago
 Depends. You want a stack, as it's certainly more efficient, but if you can rewind the position pointer you don't need one (you can count backwards).
 EDIT: It gets complicated if you need to count multiple different types of openers. In that case I think you need the stack, at least unless there are constraints on which openers can occur within others: you at the very least need to know which closer you're looking for right now, but if you can't deduce what is outside, you obviously then need to keep track of it.
 In practice, of course, we'll generally use a stack because it's just pointless to make life harder by not using one for this.
- jibal 85 days ago
 If you've encountered 1 million unclosed parentheses, any or all of them could be unbalanced, so to report which ones are, you need 1 million pieces of information. The obvious way to organize them is as a stack. Of course there are worse ways to do it. Rewinding the position pointer means that you've kept the entire input as a stack of characters, and now you have to keep track of all the closers on a stack in order to balance them with their openers.
 You NEED a stack.
 (And no, I didn't presume anything ... I addressed rewinding above.)
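A sketch of the stack jibal describes, storing the position of each currently open `(` instead of the character itself. It assumes a single bracket type, and a stray `)` is simply skipped to keep the example short; the function name is illustrative:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One entry per currently open '(': exactly the information needed to say
// where the unclosed openers are.
std::vector<std::size_t> unclosed_openers(const std::string& text) {
    std::vector<std::size_t> open_at;  // used as a stack of positions
    for (std::size_t pos = 0; pos < text.size(); ++pos) {
        if (text[pos] == '(') {
            open_at.push_back(pos);
        } else if (text[pos] == ')' && !open_at.empty()) {
            open_at.pop_back();  // matches the innermost open '('
        }
    }
    return open_at;  // positions of openers never closed, first to last
}
```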
- vidarh 85 days ago
 You're presuming you have only a non-rewindable stream as opposed to a file interface, which is why I was explicit about the requirement to be able to rewind the position to avoid a stack. If you only have a non-rewindable stream, then, yes, you need a stack. If you have a file handle, you do not.
 (And yes, you did presume something; if you have a rewindable file handle, you do not need to keep the characters; you can instead re-read them.)
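A sketch of vidarh's rewind approach for a single bracket type. Here random access to a `std::string` stands in for a rewindable file handle (an assumption; a real implementation would seek and re-read). Scanning backwards, each `)` waits for an opener, and a `(` seen while nothing is waiting was never closed. The only working state is one counter, although the returned positions still take one entry per unclosed opener, which is the amount of information jibal points to. The function name is illustrative:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Backward scan: `waiting` counts closers seen so far that have not yet
// met an opener to their left. No stack is kept during the scan.
std::vector<std::size_t> unclosed_by_rewinding(const std::string& text) {
    std::vector<std::size_t> result;  // output only, not working memory
    std::size_t waiting = 0;
    for (std::size_t i = text.size(); i-- > 0;) {
        if (text[i] == ')') {
            ++waiting;
        } else if (text[i] == '(') {
            if (waiting == 0)
                result.push_back(i);  // nothing to its right closes it
            else
                --waiting;
        }
    }
    return result;  // positions, last opener first
}
```

It reports the same positions as a forward stack-based scan, only in reverse order.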
senorqa 85 days ago
The pictures of Brutalist architecture are awesome!
- sevensor 85 days ago
 I was hoping for more captions on those; they’re quite fascinating. I wonder if the architects understood what a half century of weathering would do to the surface.
jgalt212 85 days ago
Bummer, I thought Reginald Braithwaite was publishing again. When I first entered the JavaScript world, I really enjoyed and benefited from his writing and talks.
- a4isms 73 days ago
 Here I am!
 I still enjoy writing code like the code in TFA, but these days people seem a lot less interested in code than in organizing their agentic LLMs, so I don't have the same incentive to share what I find interesting. And it would be terrible marketing, like showing up to audition for a job driving F1... In a Jaguar E-Type.
 Elegant and beautiful, but that isn't the game any more.
firechickenbird 85 days ago
The proof of non-regularity is a bit convoluted. You can easily apply the pumping lemma there.
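For reference, a standard pumping-lemma argument along the lines firechickenbird suggests (a sketch, not the article's proof), for the language L of balanced strings over `(` and `)`:

```latex
\begin{array}{l}
\text{Suppose } L = \{\, w \in \{\texttt{(},\texttt{)}\}^{*} : w \text{ balanced} \,\} \text{ is regular, with pumping length } p. \\
\text{Take } s = \texttt{(}^{p}\,\texttt{)}^{p} \in L \text{, so } |s| = 2p \ge p. \\
\text{Every split } s = xyz \text{ with } |xy| \le p,\ |y| \ge 1 \text{ has } y = \texttt{(}^{k},\ 1 \le k \le p. \\
\text{Pumping down: } xy^{0}z = \texttt{(}^{p-k}\,\texttt{)}^{p} \notin L \text{ since } p - k < p. \\
\text{This contradicts the pumping lemma, so } L \text{ is not regular.}
\end{array}
```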
Antibabelic 85 days ago
What is some further reading y'all could recommend on formal languages?
- praptak 85 days ago
 That's what I learnt from as part of the CS curriculum at MiMUW. Can recommend: https://en.wikipedia.org/wiki/Introduction_to_Automata_Theory,_Languages,_and_Computation
- nmadden 85 days ago
 Not sure why you're being downvoted for recommending a classic textbook!
- tehnub 85 days ago
 Sipser's Introduction to the Theory of Computation