Early Literature Review

Overview

In anticipation of my upcoming MEng project, I’ve been reading about the ecosystem around language servers and related topics like compilers. Admittedly, there doesn’t seem to be much out there on language server implementation apart from @matklad’s work on rust-analyzer.

The MIT EECS website recommends that the substantial work of the thesis be done while in residence, so for now I’ve mostly been working on a literature review that will let me hit the ground running when the semester starts.

I’ve also been trying to brush up on my software engineering skills, which have gotten a little rusty from disuse. I’ll need them when I start implementing.

Language Servers and Compilers

A language server is similar to a compiler in that it takes in a set of source files as input and derives meaning from those files. The main difference is that while compilers use their powers for code generation, language servers use their powers to help the programmer be more productive.

Unlike compiler design, language server design is not yet an established academic discipline. Schools haven’t started teaching it, and there is no counterpart to the dragon book. Language servers barely existed ten years ago, and only as proprietary pieces of IDEs.

Today, language servers are everywhere as productivity tools for programmers. The rise of language servers can be tied directly to the rise of Visual Studio Code and TypeScript in the past decade, both by Microsoft. The Language Server Protocol itself is an open standard started by Microsoft and used to allow code editors and language servers to talk to each other. We could further trace language servers’ lineage to the IDE work pioneered by IntelliJ and Eclipse, a decade before VS Code.

Every modern programming language now has its language server, or several. Language servers have become so common that shipping a language without at least a modest language server is like shipping a language without a compiler.

Scoping

There is rich potential for research into language servers at the intersection of HCI, compilers, and static analysis. Developers are actively investigating ways to add productivity in the code editor, especially for language-specific features. Recent developments with generative AI have brought tools like NVIDIA’s ChipNeMo or GitHub Copilot, but by no means is machine learning the only frontier left.

With computing where it is in 2023, there is a very, very high ceiling for how sophisticated smart editing can get. All that research is at the bleeding edge of the field. For now, most of that is beyond my reach.

My project is not so ambitious. I’m just looking to make a usable language server for Bluespec, a language that doesn’t have one yet. I’m doing it to improve the classroom experience of students who are using Bluespec as a learning tool.

When students are using sophisticated code editing features in all their other classes, e.g., with Python, TypeScript, or C, it can be disheartening if they aren’t provided the same quality of tools for Bluespec. If hardware development is as important as we make it out to be, it should have dignified tools. That’s the same reason why I wrote the Bluespec syntax highlighter for VS Code.

I’m striving to provide simple quality-of-life features that are common across languages, like go-to-definition, hover, and signature help. These would be built on top of the core of the language server, which I would have to architect and build, along with a grammar of Bluespec that I would have to (re)write.

If time remains, there is ample room for useful features specific to Bluespec. A language server for Bluespec, like for any programming language, can take advantage of unique language-specific semantics. One such feature might be to surface scheduling information from the Bluespec scheduler in the editor, for example by flagging conflicting rules. I’m sure there are plenty of other opportunities.

I was first considering writing the language server in TypeScript, but I’m increasingly leaning toward writing it in Rust thanks to the ecosystem of language server implementation tools built by @matklad and the rust-analyzer team, as well as their exemplary documentation.

It’s common for software languages to have their language servers written in their own language. It should be obvious why it’s not an option for Bluespec.

Resources

The closest thing I’ve found to a canonical source of knowledge on language servers is the writing of @matklad, or Alex Kladov. He leads the rust-analyzer project, the cutting-edge language server for Rust. And he’s been very generous with sharing his experience online. I’m learning most everything I can from his writing (both from his site and the rust-analyzer blog) and videos.

I’ve also been supplementing with documentation from other well-made language server projects like the TypeScript Compiler¹ and others from the growing set of language servers.

And of course, there are the official Microsoft documents specifying the Language Server Protocol and a Visual Studio Code quick-start guide for language server development. The trouble is that the LSP is only a small part of the challenge of writing a language server, and the quick-start guide doesn’t cover the details required to make a language server useful.

Our best bet is relying on examples of good language servers to know what to build and how to build it.

Code editors, IDEs, Smart Editing

I plan on ignoring the whole “what is a code editor versus what’s an IDE (☝️🤓)” discourse. The main idea is that language servers are the programs that enable smart editing for code.

Language servers enable features like hover, go-to-definition, code folding, type hints, autocomplete, and all the good things one expects from a modern development experience. We give the engineer all the useful information that we can from statically analyzing the code. Features like these used to be exclusive to fancy programs called IDEs, but now they’re supported by all sorts of code editing programs.
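To give a feel for what one of these features boils down to, here is a minimal sketch of go-to-definition in Rust. Everything in it is an assumption made for illustration: the toy syntax (a definition is a line starting with `let NAME`), and the names `Location`, `index_definitions`, and `goto_definition`. It is not Bluespec, and a real language server would build its index from a proper syntax tree rather than from line prefixes.

```rust
use std::collections::HashMap;

/// Zero-based position of a definition in the source text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    line: usize,
    column: usize,
}

/// Build a name -> definition-site index for a toy syntax in which a
/// definition is a line of the form `let NAME = ...` (hypothetical syntax).
fn index_definitions(source: &str) -> HashMap<String, Location> {
    let mut index = HashMap::new();
    for (line_no, line) in source.lines().enumerate() {
        let trimmed = line.trim_start();
        let indent = line.len() - trimmed.len();
        if let Some(rest) = trimmed.strip_prefix("let ") {
            let name: String = rest
                .chars()
                .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
                .collect();
            if !name.is_empty() {
                // Keep the first definition if a name is defined twice.
                // The name starts right after the 4-byte prefix "let ".
                index.entry(name).or_insert(Location {
                    line: line_no,
                    column: indent + 4,
                });
            }
        }
    }
    index
}

/// Answer "where is `name` defined?" from the prebuilt index.
fn goto_definition(index: &HashMap<String, Location>, name: &str) -> Option<Location> {
    index.get(name).copied()
}

fn main() {
    let source = "let width = 8\nlet area = width * width\n";
    let index = index_definitions(source);
    println!("{:?}", goto_definition(&index, "width"));
    println!("{:?}", goto_definition(&index, "height"));
}
```

The point of the sketch is the shape of the work: the static analysis happens once, when the index is built, and the editor-facing query is a cheap lookup.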

The Language Server Protocol is one way (currently, the standard way) for language servers to communicate with different editors like Visual Studio Code. It’s up to the code editor how to expose smart editing features to the developer, and it’s up to the language server to provide the information to the editor.
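To make the protocol a little more concrete: at the wire level, LSP messages are JSON-RPC payloads preceded by a `Content-Length` header and a blank line. Below is a minimal sketch of that framing in Rust, using only the standard library. The helper names `frame_message` and `read_message` are made up for this post, and the JSON body is a hand-written string, not something produced by a JSON library.

```rust
use std::io::{self, BufRead};

/// Wrap a JSON-RPC payload in the LSP base-protocol header.
/// `frame_message` is a hypothetical helper name, not a library function.
fn frame_message(json_body: &str) -> String {
    // Content-Length counts bytes, not characters; `str::len` is in bytes.
    format!("Content-Length: {}\r\n\r\n{}", json_body.len(), json_body)
}

/// Read one framed message from a stream and return its JSON body.
/// Returns `Ok(None)` if the stream ends before a header is complete.
fn read_message<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut content_length: Option<usize> = None;
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            return Ok(None);
        }
        let line = line.trim_end(); // drops the trailing "\r\n"
        if line.is_empty() {
            break; // a blank line ends the header section
        }
        if let Some((name, value)) = line.split_once(':') {
            if name.eq_ignore_ascii_case("Content-Length") {
                content_length = value.trim().parse().ok();
            }
        }
    }
    let len = content_length
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "missing Content-Length"))?;
    let mut body = vec![0u8; len];
    reader.read_exact(&mut body)?;
    String::from_utf8(body)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // `shutdown` is a request that takes no parameters.
    let request = r#"{"jsonrpc":"2.0","id":1,"method":"shutdown"}"#;
    let framed = frame_message(request);
    let mut reader = io::Cursor::new(framed.into_bytes());
    let body = read_message(&mut reader).expect("well-formed frame");
    assert_eq!(body.as_deref(), Some(request));
}
```

In practice a language server would read these frames from standard input or a socket and hand the body to a JSON parser, so this layer stays small compared with the analysis behind it.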

A good approach, like rust-analyzer’s, may be to isolate the Language Server Protocol part of the project from the language server itself, so that we can handle other protocols should they come along.
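Here is a rough sketch of what that separation could look like, in Rust. It is my own illustration of the idea, not rust-analyzer’s actual architecture: the module names `engine` and `lsp_adapter`, the `Analysis` type, and the `word_at` query are all hypothetical. The core works in byte offsets and knows nothing about LSP; the adapter translates LSP-style (line, character) positions into offsets. For simplicity the adapter assumes ASCII text, whereas real LSP positions count UTF-16 code units by default.

```rust
/// Protocol-agnostic core: knows about text and byte offsets, nothing about LSP.
mod engine {
    pub struct Analysis {
        pub text: String,
    }

    impl Analysis {
        /// Return the identifier-like word covering `offset`, if any.
        pub fn word_at(&self, offset: usize) -> Option<&str> {
            let bytes = self.text.as_bytes();
            let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
            if offset >= bytes.len() || !is_word(bytes[offset]) {
                return None;
            }
            let mut start = offset;
            while start > 0 && is_word(bytes[start - 1]) {
                start -= 1;
            }
            let mut end = offset;
            while end < bytes.len() && is_word(bytes[end]) {
                end += 1;
            }
            Some(&self.text[start..end])
        }
    }
}

/// Thin adapter: the only place that speaks in protocol terms.
mod lsp_adapter {
    use super::engine::Analysis;

    /// Zero-based line and character, as in LSP (ASCII-only simplification).
    pub struct Position {
        pub line: usize,
        pub character: usize,
    }

    fn to_offset(text: &str, pos: &Position) -> Option<usize> {
        let mut offset = 0;
        for (i, line) in text.split('\n').enumerate() {
            if i == pos.line {
                return (pos.character <= line.len()).then(|| offset + pos.character);
            }
            offset += line.len() + 1; // +1 for the '\n' that split removed
        }
        None
    }

    pub fn hover(analysis: &Analysis, pos: &Position) -> Option<String> {
        let offset = to_offset(&analysis.text, pos)?;
        analysis.word_at(offset).map(|w| format!("identifier `{}`", w))
    }
}

fn main() {
    let analysis = engine::Analysis {
        text: "module mkTop;\n  Reg#(Bit#(8)) counter <- mkReg(0);\nendmodule".to_string(),
    };
    let pos = lsp_adapter::Position { line: 1, character: 17 };
    println!("{:?}", lsp_adapter::hover(&analysis, &pos));
}
```

If another protocol came along, only a sibling of `lsp_adapter` would need to be written; `engine` would stay as it is.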

Compiler (Non-)Reuse

Can we use the existing Bluespec Compiler? It sure seems like it’d save a lot of work! My current reading suggests probably not. We may be best served starting from scratch.

In particular, there are fault tolerance and latency demands on language servers that aren’t imposed on compilers. A language server is expected to be useful even (maybe especially) on code that doesn’t compile. It’s also expected to be responsive, sometimes at keystroke frequency. That responsiveness involves thinking about both latency and energy consumption.
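The fault tolerance requirement is easiest to see in a parser. A batch compiler can stop at the first syntax error, but a language server wants to report the error and still understand the rest of the file. Below is a minimal sketch of that idea in Rust for a toy language of `name = integer;` statements. The language, the `Binding` and `Diagnostic` types, and the strategy of recovering at each `;` are all assumptions chosen to keep the example short. This is not how Bluespec is parsed.

```rust
#[derive(Debug, PartialEq)]
struct Binding {
    name: String,
    value: i64,
}

#[derive(Debug, PartialEq)]
struct Diagnostic {
    statement_index: usize,
    message: String,
}

/// Parse every statement, collecting errors instead of stopping at the first.
/// Recovery point: each `;` starts a fresh statement.
fn parse_resilient(source: &str) -> (Vec<Binding>, Vec<Diagnostic>) {
    let mut bindings = Vec::new();
    let mut diagnostics = Vec::new();
    for (index, raw) in source.split(';').enumerate() {
        let statement = raw.trim();
        if statement.is_empty() {
            continue; // e.g. the empty tail after the final `;`
        }
        match parse_statement(statement) {
            Ok(binding) => bindings.push(binding),
            Err(message) => diagnostics.push(Diagnostic {
                statement_index: index,
                message,
            }),
        }
    }
    (bindings, diagnostics)
}

/// Parse a single `name = integer` statement (without the trailing `;`).
fn parse_statement(statement: &str) -> Result<Binding, String> {
    let (name, value) = statement
        .split_once('=')
        .ok_or_else(|| format!("expected `=` in `{}`", statement))?;
    let name = name.trim();
    if name.is_empty() || !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
        return Err(format!("invalid name `{}`", name));
    }
    let value = value
        .trim()
        .parse::<i64>()
        .map_err(|_| format!("expected an integer after `=` in `{}`", statement))?;
    Ok(Binding {
        name: name.to_string(),
        value,
    })
}

fn main() {
    // The middle statement is broken, but the ones around it still parse.
    let (bindings, diagnostics) = parse_resilient("width = 8; depth = ; size = 32;");
    println!("{:?}", bindings);
    println!("{:?}", diagnostics);
}
```

The useful property is that the broken middle statement produces a diagnostic while `width` and `size` remain available for features like hover and completion.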

From what I’ve read by @matklad, compilers and language servers are similar only enough to get you into trouble. They look alike, but their tasks are very different, and that has major implications for how they need to be implemented.

He writes about his experience building rust-analyzer in “Why an IDE?”:

LSP did achieve a significant breakthrough — it made people care about implementing IDE backends. Experience shows that re-engineering an existing compiler to power an IDE is often impossible, or isomorphic to a rewrite. How a compiler talks to an editor is the smaller problem. The hard one is building a compiler that can do IDE stuff in the first place. Check out this post for some of the technical details. Starting with this use-case in mind saves a lot of effort down the road.

He writes further in “Why LSP?”:

Before LSP, there simply weren’t a lot of working language-server shaped things. The main reason for that is that building a language server is hard.

The essential complexity for a server is pretty high. It is known that compilers are complicated, and a language server is a compiler and then some.

and

And, when compiler authors start thinking about IDE support, the first thought is “well, IDE is kinda a compiler, and we have a compiler, so problem solved, right?”. This is quite wrong — internally an IDE is very different from a compiler but, until very recently, this wasn’t common knowledge.

Language servers are a counter example to the “never rewrite” rule. Majority of well regarded language servers are rewrites or alternative implementations of batch compilers.

and, most vindicating,

[LSP] moved us from a world where not having a language IDE was normal and no one was even thinking about language servers, to a world where a language without working completion and goto definition looks unprofessional.

It was a joy to find @matklad’s writing because it was consistent with everything I observed when using Bluespec, back when I was a student accustomed to fancy code editing from software languages and put off by Bluespec’s lack of similar support. And @matklad’s writing does so much to light a path forward.

It’s not even that Bluespec’s editor support was always bad. Back when Bluespec was shiny and new, editor support was bad for everything. The world just moved really fast while Bluespec stayed the same. It’s time to catch up.

¹ I know I take pains to distinguish between compilers and language servers. TypeScript’s compiler is new enough and the relationship between TypeScript and JavaScript is such that they made the language server a first-class concept in designing the language and compiler.