1. Introduction
  2. Writing
  3. Reading
  4. Hardware
    1. Multiplication/Division Functional Unit
    2. Floating Point Unit
    3. Out-of-Order Issue FIFO
    4. Sophisticated Search FIFO for Store Buffer
    5. Associative Caches
    6. (Maybe) Architectural Exploration + Branch Prediction
  5. Hardware-Adjacent
    1. “Remedial Study”
  6. Footnotes

Introduction

Now that I’ve got the website up and running and I’ve begun my job search, it’s time to figure out what to do with all my time while unemployed.

I’ve been in the US for a little under two weeks since I got back from my post-graduation trip to visit family in Hong Kong, and I’ve pretty much spent this time working on writing up my recent projects and setting up the website. A couple days ago, I sent out a couple of feelers for jobs (one of which I’m very excited about 👀), and I’m giving those feelers a little bit of time to see where they end up. If they don’t work out, then I’ll widen the search.

Tomorrow morning, I’ll be going on another two week trip, this time to visit friends back at MIT.

Six cheesesteaks wrapped in foil paper.

I’m packing my suitcase full of cheesesteaks to share with them, since I’m coming up from Philadelphia. I’m not sure what to bring back down.

I’m also planning on helping a bit with the EC build effort, as I’ve done the past two years (though that was as an undergraduate; see me in the video). I haven’t yet had time to make new friends since college, so there you go. I’ll be spending a lot of time away from my computer, hauling lumber, impact driving, and helping people move, so I doubt I’ll have much energy left over to do stuff for this website.

Before I leave, I figured I should commit a list of things I might do during my period of unemployment after I get back. Since I’m coming off a period of pretty high activity (working on my processor while in Hong Kong, working on this website and the project write-ups for the past two weeks), there are a lot of potential actionables bouncing around my head. Better put them all down while I’m pumped and full of energy.

Writing

I made this website mainly to showcase my projects, but I’m also glad to have it as a place to put my thoughts. One of my favorite non-technical1 classes was 21W.735: Writing and Reading the Essay. I took it because I wanted to get better at writing about myself and things I care about. Most of what we read was memoir or memoir-like.

With time to write, I can start committing some opinions or guides to my blog. For example, the recent COVID-19 pandemic was so significant and singular an event, and yet I feel like I and so many others I know haven’t really processed it. I figure I better write some thoughts about it now that I’ve had a little bit of distance from it, but before it all fades from memory. I did spend 1.5 years of my 4 year MIT education remotely, and a good chunk more on a very COVID-cautious campus. If it was such a big part of my experience, I may as well get some decent material out of it.

I might also blog about silly things. Maybe I’ll do something with my years-long dream journal. Maybe I’ll just write what comes to mind.

Reading

I’ll have some time to read again.

I’d like to finish reading a couple of short story collections that I’ve started reading. One is Labyrinths by Jorge Luis Borges. Another is Dear Life by Alice Munro. I might also continue reading Kafka’s Letters to Milena. I’d have to see how I feel in a few weeks.

Some time ago I started reading but put down The Art of Memoir by Mary Karr. If I begin writing about myself online more seriously, I may pick it back up.

I might also catch up on some old articles of the New Yorker and the Atlantic (I know, basic; whatever) that I’ve been missing since I started being busy. Some of the articles are no good (like the ones where the writers just summarize their week on Twitter) but some of them are very good.

(Not exactly reading, but I can also catch up on the movies and TV shows that I want to see but haven’t yet.)

Hardware

The fortune of having a processor project is that it serves as a scaffold onto which I can add other projects. I’ve been careful to organize the code in such a way that I’m able to easily extend it. That isn’t to say there isn’t some technical debt,2 but it’s a clean enough surface to continue on.

I’ve only included things that I would seriously do as immediate next steps for the processor. If I were only trying to make a long list of things I could possibly do, then it would be a very, very long list without much use to anyone. I’ve already mentioned most or all of these things in the original project write-up, but here they are in greater detail. Some things that I could’ve included, I didn’t, like adding a reorder buffer or a register renaming scheme. Those feel further down the line than these.

I’m not likely to be doing anything groundbreaking with these exercises. Everything here’s been done before. My ten-instructions-in-flight rv32i processor isn’t going to go toe-to-toe with Intel’s or AMD’s hundreds-of-instructions-in-flight-with-a-thousand-extensions processors. But I think doing them myself (and doing them in Bluespec) should be good practice for gaining familiarity with computer architecture. I’m in a region of tech where (I think) it’s still important to have a solid mental model of what’s going on.

Multiplication/Division Functional Unit

This corresponds to the M standard extension for RISC-V. A multiplier has a very straightforward interface that I can design for and test.

interface MultiplierDivider;
	method Action start(Word a, Word b);
	method ActionValue#(Tuple2#(Word, Word)) result;
endinterface

I would write some scaffolding to test it, then I can just implement it with a multicycle approach. The test might involve either handwriting test cases or doing some metaprogramming in C or Python to generate test cases.

A hardcoded test case might look like this:

struct MultiplicationTestCase {  // similar for division
	int32_t input_1;
	int32_t input_2;
	int32_t expected_lower;
	int32_t expected_upper;
};
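A generated-case version of the same idea, sketched in Python (field names mirror the struct above; the specific edge values are just ones I’d want covered, and none of this is tied to my actual test harness):

```python
# Sketch: generating test cases for a 32x32 -> 64-bit signed multiplier,
# matching RISC-V MUL (lower word) / MULH (upper word) semantics.

MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul_case(a, b):
    """Expected lower/upper words of the signed 64-bit product."""
    product = to_signed32(a) * to_signed32(b)
    return {
        "input_1": a & MASK32,
        "input_2": b & MASK32,
        "expected_lower": product & MASK32,
        "expected_upper": (product >> 32) & MASK32,
    }

# Edge cases worth generating deliberately (random cases would be added too).
edges = (0, 1, -1, 0x7FFFFFFF, -2147483648)
cases = [mul_case(a, b) for a in edges for b in edges]
```

The same generator shape extends to division cases, where the RISC-V corner cases (divide by zero, most-negative dividend with divisor −1) are exactly the ones worth hardcoding.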

Once I have it working, I can pipeline it (if applicable) and work on reducing the critical path enough to co-exist alongside my ALU functional unit. I’d probably lean pretty heavily on a textbook design to figure out how to organize my implementation.

Floating Point Unit

This one’s a little tougher, and corresponds to the F extension for RISC-V. One reason it’s tougher is that we need to add another register file just for floating point.

It would make the most sense for me to separately implement and test a floating-point adder, multiplier, and divider according to what RISC-V would like to see. I’d probably use a similar scheme of hardcoded test cases (either handwritten or generated) as the multiplication unit.
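For generating those vectors, Python’s struct module can produce single-precision bit patterns without any extra libraries; a sketch (with the caveat, my own assumption to verify, that Python arithmetic happens in double precision, so I’d restrict generated sums to exactly representable values or use a proper float32 library to get correctly rounded references):

```python
# Sketch: single-precision (binary32) test vectors via bit round-tripping.
# Caveat: Python computes in double precision, so a + b below is only a
# trustworthy reference when the result is exactly representable in binary32.
import struct

def f32_bits(x):
    """Bit pattern of x rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fadd_case(a, b):
    """Inputs and expected sum as raw binary32 bit patterns."""
    return {"a": f32_bits(a), "b": f32_bits(b), "expected": f32_bits(a + b)}

case = fadd_case(1.5, 2.25)  # 3.75 is exact in binary32
```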

This exercise would probably take a decent while since I’d need to follow the standard rather closely. There are a lot of components where I could mess up.

Out-of-Order Issue FIFO

I don’t have an appropriate data structure for my processor to actually do out-of-order issue in a way that lets it look more than two instructions deep at a time. That’s a serious limitation if we want a serious out-of-order processor.

The data structure might look something like this:

interface OOOFIFO#(type Token);
	method Action enq1(Token token);  // add to the FIFO (port 1)
	method Action enq2(Token token);  // port 2 for above
	method Vector#(8, Maybe#(Token)) packets;  // inspect the oldest 8 tokens
	method Action pick1(Bit#(3) index);  // take that index out of the FIFO (port 1)
	method Action pick2(Bit#(3) index);  // port 2 for above
endinterface

I would like pick to cause all the following tokens to move forward in line. Of course, I might need to adjust the interface or functionality if implementing this turns out to be impractical. I wonder if it’s a similar situation to how the LRU cache replacement policy can be too expensive to implement.
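As a sanity check on those intended semantics, a quick software reference model (illustrative Python, not cycle-accurate, and single-ported for simplicity), where picking a slot shifts everything behind it forward:

```python
# Reference model of the OOO issue FIFO: enqueue at the tail, inspect a
# window of the oldest `depth` tokens, pick from any slot in that window,
# and let later tokens compact forward after a pick.

class OOOFifoModel:
    def __init__(self, depth=8):
        self.depth = depth
        self.slots = []  # oldest first

    def enq(self, token):
        self.slots.append(token)

    def packets(self):
        """The oldest `depth` tokens, padded with None (i.e. Maybe#(Token))."""
        window = self.slots[:self.depth]
        return window + [None] * (self.depth - len(window))

    def pick(self, index):
        """Remove slot `index`; everything behind it moves up one place."""
        assert index < len(self.slots)
        return self.slots.pop(index)
```

A model like this is also handy later as the golden reference when testing the hardware version.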

To integrate this data structure into my processor, I’ll need to have additional logic that checks each Token in the packets (in a good enough order) to see whether that token can be picked. For my processor, it would mean not having an outstanding dependency in the scoreboard.

I’ll take a look at the literature to see if there’s a smart way to do it. Also, note that the interface I’ve presented can be used for either a centralized issue buffer or a per-functional-unit buffer. When it’s time to integrate into my processor, I’d need to think through which one to use.

Sophisticated Search FIFO for Store Buffer

I mentioned in my processor report that my store buffer is implemented using a data structure with really bad fanout (that is, one that doesn’t scale very well with the number of values held in the FIFO). I can take some time to design and implement a better search FIFO for my store buffer. Then I can increase the size of the store buffer from 8 to some higher value.

interface SearchFIFO#(type Token);
	method Action enq(Token token);  // add to the FIFO
	method Action deq;  // remove the oldest token
	method Token first;  // peek at the oldest token (without removing it)
	method Maybe#(Token) get(function Bool isMatch(Token token));  // pluck an applicable token, if any
endinterface

Notice the interface doesn’t require that we remove the token we pluck. We’re just trying to see whether the value is there. And also, because of the nature of our store buffer, we want the get method to prioritize more recent values (because those stores overwrite earlier stores).
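A reference model of that behavior might look like this (Python sketch; the predicate-style search and dict-shaped tokens are my assumptions about the eventual interface, not settled design):

```python
# Reference model of the store-buffer search FIFO: FIFO order for
# enq/deq, but searching returns the *most recent* matching entry,
# since later stores shadow earlier stores to the same address.

class SearchFifoModel:
    def __init__(self):
        self.entries = []  # oldest first

    def enq(self, token):
        self.entries.append(token)

    def deq(self):
        return self.entries.pop(0)  # oldest entry drains toward memory

    def search(self, is_match):
        """Most recent token satisfying is_match, or None (no removal)."""
        for token in reversed(self.entries):
            if is_match(token):
                return token
        return None
```

For example, after two stores to the same address, a load’s search should see the newer data while deq still drains the older store first.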

This might also include looking into the literature to see how searches over fully associative buffers are implemented.

Associative Caches

I think my cache implementation already lends itself to introducing some associativity, but I would just need to actually do it. It should reduce our cache eviction rate at the cost of some critical path. Part of this task is to also assess the improvement and cost.
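The address split itself is mechanical; for concreteness, here’s how a 32-bit address would divide up for one example geometry (the parameters are illustrative, not my actual cache configuration):

```python
# Address split for a set-associative cache with 64-byte lines and 64 sets
# (parameters illustrative). Associativity doesn't change this split; it
# changes how many tags get compared in parallel within the indexed set.

OFFSET_BITS = 6  # 64-byte lines
INDEX_BITS = 6   # 64 sets

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

The parallel tag comparison across ways is where the critical-path cost mentioned above comes from.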

(Maybe) Architectural Exploration + Branch Prediction

There are some possible improvements to my processor whose actual benefit I don’t know. I would first need to get ahold of some benchmarks, probably from the RISC-V benchmarks again, and compile them into files that my processor can run.

I remember hearing that design-space exploration often occurs at a higher level than RTL, maybe with something like C++ that isn’t necessarily synthesizable. I’ve seen this strategy used with branch prediction, for example. I’ll need to see if that’s worthwhile for me, even with Bluespec’s promise of being able to do it all.

If it is (and only if it is), I’m thinking of testing things like:

  • Adding a branch history table (BHT)
  • Adding a return address stack (RAS)
  • Increasing the associativity of my caches
  • Increasing the out-of-order instruction window
  • Increasing the size of the store buffer
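For the BHT in particular, the classic starting point is a table of 2-bit saturating counters indexed by low PC bits; a behavioral sketch of the kind of model I’d explore before writing RTL (illustrative and not synthesizable; table size and indexing are placeholder choices):

```python
# Behavioral model of a branch history table of 2-bit saturating counters,
# indexed by low PC bits (word-aligned, so the bottom two bits are dropped).
# Counters: 0-1 predict not-taken, 2-3 predict taken.

class BHT:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # init weakly not-taken

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # True = taken

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
```

Running a model like this over branch traces from the benchmarks would give the misprediction-rate numbers needed to decide whether the RTL work is worth it.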

These are improvements to existing functionality, so this sits apart from adding further functionality (like adding the multiplication or floating-point extensions).

I’d need to consider whether implementing the features on an even higher level than BSV (and forgetting about synthesis) saves me enough work that I should do the modeling at all. It would require me to learn to model the processor in something other than BSV. And upon a successful discovery, I would still need to implement the thing in BSV (but hopefully after narrowing down the possibilities).

Hardware-Adjacent

Aside from hardware itself, there’s quite a lot of stuff around hardware that I could also be working on.

  • The syntax highlighting in VS Code for Bluespec isn’t where it should be, which detracts from my development experience when I work on hardware. I’d need to figure out what the extension uses to determine syntax highlighting, then adjust it. I can submit a pull request to add it into the extension, if it’s still maintained.
  • The Rouge lexer doesn’t support Bluespec syntax highlighting for Jekyll blog parsing, which detracts from my ability to present more readable Bluespec code on this blog. I’d need to learn how to write such a lexer, but when I’m done, I can submit a pull request to add it in.
  • There’s either no existing way or no obvious way to generate a graphical representation of a Bluespec module and its submodules. I could write a program that either parses my *.bsv source files or some downstream set of files (*.bo, *.ba) and converts them into something like JSON that reflects the high-level organization of my modules. Then I can either find a program that turns that JSON into a graphical representation, or I can use some of my TypeScript and web stack knowledge to build one. I just want to be able to have nice schematics from Bluespec without needing to draw them myself.
    • Bluespec does have scheduling graphs that show relationships between rules. I’ll take a look at that too.
      • (And it’s a little annoying how the Bluespec website is almost always down. And even when it isn’t down, all the content is from May 2011 and before.)
  • I can write some resources that I wish I had when I was starting out learning computer architecture and Bluespec. Either the ecosystem around Bluespec sucks or I’m just not finding the resources (which still says something about the ecosystem).
    • If this whole job search thing doesn’t work out, I might consider going into the MIT EECS MEng program and seeking a TA or RA position. I do have some interest in pedagogy; it just doesn’t seem like a very good deal.
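On the module-hierarchy idea above, a crude regex pass over *.bsv sources could get a first approximation (a sketch only: regexes will misread plenty of real Bluespec, which is part of why parsing the downstream *.bo/*.ba files would be more robust):

```python
# Crude first pass at a Bluespec module hierarchy: find `<- mkFoo(...)`
# instantiations that appear after each `module mkBar` header. Regex-based,
# so nested/tricky code will confuse it; illustrative only.
import json
import re

MODULE_RE = re.compile(r"^\s*module\s+(mk\w+)", re.MULTILINE)
INST_RE = re.compile(r"<-\s*(mk\w+)")

def module_hierarchy(source):
    hierarchy = {}
    # Splitting on module headers pairs each module name with its body text.
    parts = MODULE_RE.split(source)
    for name, body in zip(parts[1::2], parts[2::2]):
        hierarchy[name] = INST_RE.findall(body)
    return hierarchy

example = """
module mkTop(Empty);
    CacheIfc cache <- mkCache;
    FifoIfc fifo <- mkSearchFIFO(8);
endmodule
"""
print(json.dumps(module_hierarchy(example)))
```

The JSON output is the handoff point: either an existing graph tool or a small TypeScript viewer could render it.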

“Remedial Study”

If it turns out that I’m not quite hirable in this state (which, I don’t know; I think I’m hirable, but I have about as good a view of the job market as a grasshopper has of its forest), then I can brush up on tools that I may be expected to know as someone who is trying to enter the computer architecture industry.

I’ve been using ASIC WORLD’s tutorials for SystemVerilog and Verilog, but I might hunt for other resources. I’ve been assured by my professor Thomas that other HDLs should come pretty easily to me as someone who already knows how to think in terms of computer architecture. Plus, (I’m told) good habits or patterns from Bluespec translate pretty cleanly to other HDLs.

If I find out other big things I’m missing for a job search in computer architecture, then I’ll tack those onto the list.

Footnotes

  1. Colloquially, HASS classes are called non-technical, but it’s not like we didn’t spend the class learning and practicing techniques for writing. “Non-technical” is sort of a STEM-centric view of things. 

  2. I know there are some dead cycles in some of the pipeline visualizations. These are most likely from little conflicts between rules (either real conflicts or scheduler-doesn’t-know-enough-to-know-better conflicts) that I can sort out given a little bit of time.