Reverse Fetch at Box
As a Software Engineer Intern on the Messaging subteam of the Data and Compute team, I added a feature called reverse fetch to asynchbase, the library used to run scans on the HBase messagebus. After adding the feature, I began integrating it into the Box backend, where it is used to fetch the most recent events in a particular user’s folders.
Tools: Java, Scala
Code: GitHub, Updated merge into source
Deck: Box intern fair presentation
Apache HBase (short for Hadoop database) is an open source, scalable, distributed key-value store modeled after Google’s Bigtable. The asynchbase library is an open source client that performs operations on the database asynchronously. All entries are ordered chronologically in HBase, so a forward scan, given a start time and an end time, returns every entry between those two times, oldest first. A reverse fetch does the opposite: it walks the same range from newest to oldest. While the native HBase client supported reverse scans, the asynchbase library used by Box did not, and my summer project was to add that functionality. As my first foray into distributed systems and open source, the project taught me a lot about the challenges of building a feature for a fault-tolerant distributed system and publishing clean, concise code for use by anyone.
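To make the difference between the two scan directions concrete, here is a minimal sketch in Java. It does not use HBase or the asynchbase API at all: a sorted in-memory `TreeMap` keyed by timestamp stands in for the table, and the class name `ReverseScanSketch`, the methods `forwardScan` and `reverseFetch`, and the `limit` parameter are all hypothetical names chosen for illustration. The inclusive-start, exclusive-end range is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReverseScanSketch {

    // Forward scan: every entry with start <= timestamp < stop, oldest first.
    static List<String> forwardScan(NavigableMap<Long, String> table, long start, long stop) {
        return new ArrayList<>(table.subMap(start, true, stop, false).values());
    }

    // Reverse fetch: the same range walked newest first, stopping after `limit` entries.
    // (`limit` is an illustrative parameter, not part of any real API.)
    static List<String> reverseFetch(NavigableMap<Long, String> table, long start, long stop, int limit) {
        List<String> out = new ArrayList<>();
        for (String event : table.subMap(start, true, stop, false).descendingMap().values()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(event);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a chronologically ordered table: timestamp -> event.
        NavigableMap<Long, String> table = new TreeMap<>();
        table.put(100L, "created");
        table.put(200L, "renamed");
        table.put(300L, "shared");
        table.put(400L, "deleted");

        System.out.println(forwardScan(table, 100L, 400L));     // [created, renamed, shared]
        System.out.println(reverseFetch(table, 100L, 500L, 2)); // [deleted, shared]
    }
}
```

With a reverse fetch, asking for the two most recent events only touches the newest end of the range, instead of reading the whole range forward and discarding everything but the tail.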
For more about my first days at Box, check out my blog post on the intern website.