Oops
Oops is a project that I’ve been working on in some form since May of last year. It is my passion project: my attempt to create what I would consider to be my ideal programming language. Building on my long experience with various programming languages, it makes some decisions that certainly differ from the typical choices.
The beginning
I started working on Oops at some point in May of last year. That was when the GitHub repository was created, anyways. The 1st implementation of Oops was in Python, and I don’t think any record of it remains. Even the Git branch is gone. Oops back then was also quite a different language; I hadn’t refined a lot of my thoughts on what I like in programming languages. As far as I remember, it was basically Lua with a proper object and class system. The Lua-style syntax using “end” to denote the end of blocks instead of C-style curly braces is present even in the modern version of Oops. I tend to like languages that use keywords as opposed to symbols for some reason.
I didn’t know OCaml yet, and at that time I was in Computing IV getting more practice with C++, so I don’t think I was even comfortable enough with it to write an interpreter yet. All things considered, Python isn’t a terrible language to write an interpreter in. Since Python 3.10, it’s had pattern matching, and named tuples allow you to simulate ADTs pretty well. Lark is a really great parser generator library. The obvious downside, though: it’s slow. Python’s already pretty slow as it is, so any language built on top of it will also be slow. I’ve heard about RPython, which can automatically create JIT compilers, but I’ve yet to write anything using it.
Switching to OCaml
As part of the Organization of Programming Languages class at UML, I learned OCaml, which is now my favorite programming language. At this point I highly doubt I would ever choose to write a programming language using anything else. OCaml does have some problems, like Dune being a confusing mess, the standard library not being great, and Menhir’s nonexistent error messages, but I can deal with them in exchange for static typing and functional programming.
The next iteration of Oops was a tree-walking interpreter in OCaml. As I said previously, the language itself changed quite a bit between the Python and OCaml implementations as I did further exploration into the possibility space of programming languages. I wanted 3 specific things out of Oops: a language that was 1. functional, 2. dynamically typed, and 3. object-oriented, but without all the problems and weird design patterns of conventional OOP. This space of programming languages is pretty underdeveloped, and I’d say the only languages that fit here are Lisp (and most of its derivatives), Erlang, and Elixir. Elixir in particular was undoubtedly the one I drew the most inspiration from for this iteration of Oops, and it happens to have syntax very similar to Lua, which the previous iteration drew from. The agent stuff is interesting, but I didn’t care about it much. 2 particular design choices I took from it were 1. its lack of iteration (while/for loops), which I thought would encourage a more functional, recursion-oriented style of programming, and 2. its system of protocols. Protocols are also present in Clojure, as well as in Rust under the name of “traits.” They originally come from Haskell, where they’re called “typeclasses.” The idea is essentially that a protocol/trait/typeclass/whatever specifies a list of functions/methods/whatevers that a type must have in order to “implement” the protocol. In return, the protocol provides some new behavior for the type, usually built on the required behavior the type provides. This just sounds like an interface from conventional OOP, right? But the difference is that when you create a class, you have to specify what interfaces it implements at the moment it’s created, because you can’t add methods later to implement anything else. Protocols can be implemented at any point, during or after the type’s creation. In Oops, where they’re called traits like in Rust, they work like this:
trait Stream
  # 3 required methods called head, tail, and done. Since Oops is dynamically typed, no point in requiring parameter counts or anything.
  head
  tail
  done

  # In exchange for implementing those 3 methods, your type will receive:
  def map(f)
    if self.done() then
      []
    else
      f(self.head()) :: self.tail().map(f)
    end
  end

  def filter(f)
    if self.done() then
      []
    elseif f(self.head()) then
      self.head() :: self.tail().filter(f)
    else
      self.tail().filter(f)
    end
  end

  def fold(f, i)
    if self.done() then
      i
    else
      self.tail().fold(f, f(i, self.head()))
    end
  end
end

struct FibStream
  # A type with 2 fields called current and next. No method declarations yet.
  current
  next
end

impl Stream for FibStream
  # We define the 3 methods that Stream requires of us, and in exchange...
  def head()
    self.current
  end

  def tail()
    FibStream(self.next, self.current + self.next)
  end

  def done()
    self.current >= 4000000
  end
end
# ...we receive a very easy solution to Project Euler problem 2 in O(n) time.
print(FibStream(1, 1).filter(fun(x) x % 2 == 0 end).fold(fun(x, y) x + y end, 0))
This tree-walking interpreter implementation worked perfectly well, but there was one particular feature that had to be implemented a little weirdly: pattern matching. The implementation consisted of 4 main modules: Exp, which just has the type for ASTs; Value, which defines the types of values, runtime errors, and types; Env, which is used for variable lookups; and Eval, which is where the actual interpreter is. As you’d expect for a tree-walking interpreter, it’s 1 giant pattern match on the AST. But the Match AST node invoked a function called match', which had to be defined in the Value module… why? It seems it was because pattern matching on struct constructors (expressions like FibStream(1, 1) from above) accessed some of the internal details of types which the Value module didn’t export. Looking back on this, this is absolutely a case of me trying to make it so each module reveals as little as possible about its implementation. On its face, this is a smart thing and a well-known programming principle, but there are times when it goes too far, and it’s something I’ve had to learn how to balance as I’ve written more and more OCaml. As far as I’m concerned now, it’s okay to reveal a record type’s implementation if it means not having a bunch of pointless getter/setter functions. Of course, when I wrote this I also wasn’t aware of private types, and that certainly would have made things easier to reveal.
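For what it’s worth, a private record type would have looked something like the sketch below; the module and field names here are made up for illustration, not the actual Value interface. Other modules can read and pattern match on the fields, but only Value can construct the record:

(* value.mli -- hypothetical sketch, not the real interface *)
type strct = private {
  type_name : string;
  fields : (string * int) list;  (* field name and slot index *)
}
val make : string -> string list -> strct

(* value.ml *)
type strct = {
  type_name : string;
  fields : (string * int) list;
}
let make name field_names =
  { type_name = name; fields = List.mapi (fun i f -> (f, i)) field_names }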
Looking back at the tree-walker, it’s kind of shocking how small it is. .mli files and lexer/parser stuff aside, the implementation of the 4 modules + main.ml comes out to not even 400 lines of code. This is a fully-featured programming language that includes pattern matching, error handling, new type definitions, traits, modules, and importing from other files. People like to pretend that making a programming language is some monumental, Herculean task, but it really isn’t. Making a practical, production-ready, user-friendly one that has good error messages, better performance, an LSP implementation, a package manager, etc., that’s probably a lot harder, but I haven’t tried that yet.

The bytecode compiler
With the language being basically fully implemented by the tree-walker, I decided I wanted to take it to the next step and make a language that compiled to bytecode. Step 1: read the 2nd half of Crafting Interpreters that I never got around to. Why did it have to be in C…? With Oops being a serious project of mine, I decided I didn’t want to jump right in and make it my first compilation project, because I knew I would create a bunch of really bad, hard-to-understand code that was probably based on a bad understanding of what a compiler even is. So after doing some very basic experiments, I made Mickey. Since writing that post, I’ve actually started working on Mickey again, and uh… it’s ballooned in scope quite significantly. I had to do things like write my own MIC-1 simulator, and an assembler, and a linker… and that was all just so that I could write unit tests for the compiler. Sorry to Jim Canning, his students, Richard Boccuzzi, Bill Moloney, and others, but your simulator was not good enough for my purposes. Anyways, I made Mickey and learned stuff about compilation.
I mostly followed the architecture that was specified in Crafting Interpreters for Oops’ bytecode compiler. It has a stack-based VM. Code is stored in a “chunk,” which also contains any constants, names, and exception handlers that that code might use. What the VM receives to run code is not a chunk, though, but a closure, which contains the chunk, some function data like the number of parameters and locals it has, the module (the namespace it was defined in), and the upvalues.
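Roughly, the two shapes look something like this; the field names are illustrative (and the placeholder types stand in for the real definitions), not copied from the actual source:

(* placeholders for the real definitions *)
type instruction = Opcode of int
type value = Int of int | Str of string
type module_ = { module_name : string }
type upvalue = Upval of int

(* a chunk is the code plus everything it refers to by index *)
type chunk = {
  code : instruction array;
  constants : value array;
  names : string array;
  handlers : (int * int * int) list;  (* (range start, range end, handler address) *)
}

(* a closure is what the VM actually runs *)
type closure = {
  chunk : chunk;
  arity : int;                (* number of parameters *)
  num_locals : int;
  defined_in : module_;
  upvalues : upvalue array;
}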
“What’s an upvalue?” You may ask. It’s kind of like an updog, if you know what that is.
Okay, upvalues were actually an extremely tricky concept to understand and especially to implement. When you have a lexical closure and it references a value that’s defined outside of that closure, that’s an upvalue.
def f()
  x = 5
  def g()
    y = x + 5
    y
  end
  g
end
In g, x is an upvalue. If you were to call f, you would get a function as a result. If you want to call that function, it has to keep track of x’s value, but by the time it’s called, x doesn’t exist anymore. It’s already out of scope, since it’s a local variable of f, which has returned.
In the tree-walker, closures can keep track of nested scopes like this directly. Each closure has its own local scope, plus a reference to the scope of the function it was defined within, which gives it access to that function’s variables too, and so on through any successively higher functions until eventually it reaches the global level.
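In other words, something like this (a sketch of the idea, not the actual Env module; the names are made up):

(* each scope has its own table plus an optional parent scope *)
type 'a env = {
  table : (string, 'a) Hashtbl.t;
  parent : 'a env option;
}

(* lookup walks outward until it finds the name or runs out of scopes *)
let rec lookup env name =
  match Hashtbl.find_opt env.table name with
  | Some v -> Some v
  | None ->
    (match env.parent with
     | Some p -> lookup p name
     | None -> None)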
In the bytecode compiler, the section of g’s closure that keeps track of the upvalues would have an entry that references x. Specifically, it would contain the index into f’s local table that can be used to find x.
def f()
  x = 5
  def g()
    y = x + 5
    def h()
      x + y
    end
    h
  end
  g
end
Okay, but what if the variable being referenced is defined even higher up? As described so far, an upvalue entry only lets you point to locals defined in the next-highest function. What if the variable is an upvalue itself in that function, like how x is an upvalue in both g and h? An upvalue is either a local from the surrounding function, or it’s an upvalue from the surrounding function, which is a nice recursive definition. In my implementation, an upvalue has type int * level where level = Global | Upvalue | Local. If it’s a Local, it points to the value at index n in the surrounding function’s local table. An Upvalue references upvalue #n in the surrounding function’s upvalue table. I think you can imagine what a Global points to.
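A rough sketch of how that resolution might look when a closure is built; the frame layout and names here are assumptions for illustration, not the actual implementation:

type level = Global | Upvalue | Local

type value = Int of int  (* placeholder for the real value type *)

(* the enclosing frame, as seen when the inner closure is created *)
type frame = {
  locals : value array;    (* the local table of the enclosing function *)
  upvalues : value array;  (* the upvalue table of the enclosing function *)
  globals : value array;   (* however globals are really indexed *)
}

(* resolve upvalue entry (n, level) of the new closure *)
let capture frame (n, level) =
  match level with
  | Local -> frame.locals.(n)      (* local slot n of the enclosing function *)
  | Upvalue -> frame.upvalues.(n)  (* recursively, upvalue n of the enclosing function *)
  | Global -> frame.globals.(n)    (* a global binding *)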
Problems arise when you want mutable upvalues. If you want to do something like the accumulator factory, you need a way to reference an upvalue and also change its value. OCaml doesn’t let you just make pointers/references out of any value; you have to specifically create it as a reference. So we have this very awkward bytecode instruction called MakeCell that replaces a local in some function with a “cell,” which is just a ref holding whatever its previous value was. That way, when we do something that mutates an upvalue, it’ll do it on the cell and thus change its value in its defining function too. This is exactly how Python does it. Lua, which as far as I know is the language that invented upvalues, has an instruction called SETUPVAL which I think just kind of lets you mutate the upvalues directly. Lua is written in C, so I think they can in fact just do that. (But Python is too?)
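The cell idea itself is simple enough; a minimal sketch (the variant and function names are made up, not the actual instruction handlers):

(* a value is either immediate or a mutable cell created by MakeCell *)
type value =
  | Int of int
  | Cell of value ref

(* MakeCell: replace local slot n with a cell wrapping its current value *)
let make_cell (locals : value array) n =
  locals.(n) <- Cell (ref locals.(n))

(* writes to a captured variable then go through the cell, so the closure
   and the defining function both observe the same mutation *)
let set_cell v new_value =
  match v with
  | Cell r -> r := new_value
  | _ -> failwith "not a cell"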
Exception handling is thankfully quite painless. Each chunk has a table mapping ranges of instruction addresses to handler addresses. If an error occurs while the current frame’s IP is within one of those ranges (and yes, each frame has its own instruction pointer as opposed to a global one; it makes things easier, trust me), it jumps to the indicated handler address. If it doesn’t find a suitable range, it goes up 1 frame in the stack and tries again. If it goes up the whole stack and fails, that’s an uncaught exception. This is basically how Java handles it, except I think their table has the ability to discriminate based on the thrown error’s type. I think that could be implemented in Oops, I just didn’t for some reason.
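The unwinding walk is the kind of thing that fits in a few lines; here’s a sketch with made-up names, assuming each frame carries its own ip and its chunk’s handler table:

type frame = {
  mutable ip : int;
  handlers : (int * int * int) list;  (* (range start, range end, handler address) *)
}

(* walk down the call stack; return the frames that survive plus the
   handler address to jump to, or None for an uncaught exception *)
let rec unwind stack =
  match stack with
  | [] -> None
  | frame :: rest ->
    (match
       List.find_opt (fun (lo, hi, _) -> lo <= frame.ip && frame.ip < hi) frame.handlers
     with
     | Some (_, _, addr) -> Some (frame :: rest, addr)
     | None -> unwind rest)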
Some things that changed in the language itself: modules became a file-level thing rather than something you could make anywhere (the tree-walker had things like module M exports A, B, C (and then a bunch of definitions) end); there’s better import/export control, including stuff like import X as Y and import X for Y, Z, W (from X import Y, Z, W in Python); and structs have a slightly different syntax:
struct FibStream
  current = 1 # Default values now allowed
  next = 1
end
FibStream{current: 1, next: 1} # Arguments are passed by keyword rather than by position
This actually wasn’t something I wanted to add, the compiler made me do it, I swear. Pattern matching was a huge pain to implement, and I think the problem was that I wasn’t using an intermediate representation between the AST and the bytecode; the compiler just went straight from one to the other. Pattern matching creates a really weird CFG, and the way that I compile jumps makes things like that very hard to keep track of. The specific reason I had to change the constructor syntax was similar to the reason pattern matching was weirdly implemented in the tree-walker. Structs have their values stored in an array rather than a hash table. The struct has a reference to its type, which itself has a hash table that maps field names to indices into the array. This was meant to make structs more space-efficient, even if it does probably make accessing fields a bit less time-efficient. We have instructions to reference and get values for fields by their names, but not any to retrieve them just from a raw index. I didn’t really want to add one either, so instead I just redesigned the entire syntax of constructors? Wait, what was I thinking?
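For concreteness, the layout being described is roughly this (names are made up, and a placeholder default fills empty slots; not the actual implementation):

type value = Int of int | Struct of strct

and struct_type = {
  type_name : string;
  field_index : (string, int) Hashtbl.t;  (* field name -> slot in the array *)
}

and strct = {
  struct_type : struct_type;
  slots : value array;  (* one slot per field, in declaration order *)
}

(* field access by name goes through the type, never a raw index *)
let get_field s name =
  s.slots.(Hashtbl.find s.struct_type.field_index name)

(* which is why a keyword-style constructor can fill slots by name *)
let construct ty pairs =
  let slots = Array.make (Hashtbl.length ty.field_index) (Int 0) in
  List.iter (fun (name, v) -> slots.(Hashtbl.find ty.field_index name) <- v) pairs;
  Struct { struct_type = ty; slots }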
The sheer difficulty of implementing pattern matching (and even to this day it’s not implemented 100% correctly or to the level that I want) heavily demotivated me and kind of made me give up on Oops for a good while. Compiling pattern matching is actually something that’s been written about a lot, like here or here, but those are mostly in the context of making matches efficient and optimizing them, and they involve strange things like doing linear algebra on a pattern matrix (?!). I didn’t want to do that at first; I just wanted to make it work, but I could barely even get it to work at all in the end.
As a whole though, I was pretty pleased with how far I got in the compiler. It achieved feature parity with the tree-walker in everything except pattern matching, and in some features like structs and modules I’d even say it surpassed it and added new stuff. By mid-July, this stage of Oops was pretty much finished.
And uh, that’s actually where we are today. The last commit was on July 18th with the message “pattern matching… finally.”
What now?
I haven’t touched Oops at all since then, but before I started working on Mickey again, I was making preparations for the next stage: compilation to LLVM. I now have a job working with LLVM IR, so hopefully what I learn there will help me here when I start working on Oops again. It certainly seems like quite the task though, especially given that OCaml’s LLVM bindings, while officially maintained and supported by LLVM themselves, aren’t very well documented. The last version of LLVM with ANY official OCaml documentation at all was 12. Before I look up when that was released, I’m going to take a guess and say it was in like 2008 or something.
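For my own future reference, getting off the ground with the bindings looks roughly like the sketch below; it assumes the usual Llvm entry points from the Kaleidoscope tutorial (emit a function returning 42 and print the module) and has nothing to do with Oops yet:

(* assumes the llvm opam package; names follow the Kaleidoscope tutorial *)
let () =
  let ctx = Llvm.global_context () in
  let the_module = Llvm.create_module ctx "oops" in
  let i64 = Llvm.i64_type ctx in
  let fn_ty = Llvm.function_type i64 [||] in
  let fn = Llvm.define_function "answer" fn_ty the_module in
  let builder = Llvm.builder_at_end ctx (Llvm.entry_block fn) in
  ignore (Llvm.build_ret (Llvm.const_int i64 42) builder);
  print_string (Llvm.string_of_llmodule the_module)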
