Bob Jenkins, Software Developer

bob_jenkins@burtleburtle.net
https://burtleburtle.net/bob

I like algorithms, physics, math, and combinatorics. I like experimentation, but I also like seeing my work widely used. I want to design and write and maintain my code myself. C, Java, C++, C# are fine languages. I want to give users what they want.

I don't want to be a manager. I don't want to monitor other people. I like working with other developers and being helpful.

Work Experience

IBM (1988) I spent an undergraduate summer as an intern at IBM (Owego, New York) working for Dick Steflik. I debugged a large educational management system written in REXX. The existing code tended to shrink when I edited it. I wrote a total of minus 5000 lines of code that summer. I also developed a new tool to automatically lay out text hierarchy charts that looked like ones built by hand.

Oracle (1989-2006) I joined Oracle in October 1989 and worked there for 17 years, except for a nine-month sabbatical (August 1992 - June 1993) while I pursued, but did not complete, a doctorate at Berkeley. Oracle's code (C and PL/SQL) is very good. It usually didn't shrink when I edited it.

  • Arithmetic (1989-1990) (select 1+2 from dual) I made Oracle addition faster, multiplication 4x faster, division 6x faster, and square root 66x faster, and added sin, cos, tan, sinh, and cosh. Nobody after me made much headway until Edmund Waugh, around 1998, doubled the speed of most routines again. (He replaced my if-statements with table lookups.) I later wrote a floating-point big-number library at home, designed along the same lines.
  • Snapshots (1990-1995) (create snapshot s as select * from t@link): I did the first implementation of snapshots (now called materialized views), updatable snapshots, and the job queue. Replication took over snapshots and the job queue in 1995 when I returned to constraints full-time.
  • Constraints (1990-2006) (create table t (t1 primary key, t2 references t)) I inherited constraints after the basics had been finished. I implemented delete cascade, deferred constraints, unique constraints enforced by nonunique indexes, constraint/trigger interactions (mutating errors), and enable-novalidate constraints (which allow constraints to be enabled without any long-term locks). A vital component was a skip list with rollback, integrated with Oracle transactions.
  • Character semantics (2000-2006) (varchar2(10 char)) I was in charge of writing the spec for Unicode support for Oracle 9i (the meetings were many and vigorous) and implementing it in the RDBMS (three others helped me straighten out the system quirks that blocked it). I also fielded most issues with the CHAR datatype, which uses blank-padded semantics. For example, what do you pad to when there is both a byte limit and a character limit?
  • Function-based indexes (1997-2004) (create index ti on t (t1+1)) I did the functional index design and interface. Three of us implemented it, combining it with parallel projects for virtual columns and drop column. I also did descending indexes, including descend and undescend functions (sys_op_undescend(sys_op_descend(x)) = x), since the index layer could only support binary comparison.
  • Parser (1998-2006) (select p from from q) I owned Oracle's 25-year-old, 100,000-line recursive-descent SQL parser. This meant I reviewed, but did not write, one to three changes to it a day. I slowly straightened out the parser's quirks. For example, I added tokenizer support for UCS2, and I would have removed the requirement that SQL strings be mutable and null-terminated if I hadn't left in 2006 before that work was completed.
  • General cursor compilation (1989-2006) (select * from (select * from dual)) I've dealt with most of cursor compilation, as well as row sources and the index layer. I was a catch-all for problems that didn't fit cleanly in any area, or belonged to code that nobody knew anymore. For example I exposed a fixed table of all built-in SQL operators and fixed two dozen crashes that happened when you fed them simple (but inappropriate) arguments. I taught the RDBMS architecture course twice.
  • Hashing (1997-2006): I was Oracle's resident expert on hash functions. Hashing is not a one-size-fits-all problem, so I was regularly consulted on what hash to use where, how many collisions are expected with so many buckets, and how many bytes of hash are needed to make collisions "never" happen. I also sought out issues with random numbers, algorithms, combinatorics, and math.

Microsoft (2006-present) I am working in the Cosmos team at Microsoft. Large streams of data are split into 250MB compressed append-only extents. Extents are stored in triplicate, and can be read by user processes that run on the same machines. Mirrors copy streams from one cluster to another.

  • Cosmos Store (2006 - 2007) I worked under Andrew Kadatch on the Cosmos Store, written by Andrew in C++. I modeled the system, documented its rules, enhanced the logging, wrote pretty-printers for the binary files, and fixed bugs. I worked on extent length handling and disk space monitoring, and made CRC computation 6x faster.
  • Cosmos PN (2007) The PN (Process Node) is in charge of managing the user processes running on a given machine. I owned the PN for about a month. I revised password management so we could handle passwords aging out without any deployments or downtime.
  • Scope (2007 - 2010) The Scope language is roughly SQL clauses plus C# expressions, and it maps to execution graphs whose nodes run on the Cosmos Store. I implemented SELECT, aggregates, expressions, and parsing. I replaced grep-style parsing with a top-down parser, which I later unwillingly replaced with YACC. I designed and implemented constant folding, expression normalization, and compile-time variables.
  • Cosmos Store (2010 - present) Several things:
    • An in-memory B-tree to quickly determine which extent holds the nth byte of a stream
    • Fetching, in one call, the locations of extents covering many byte ranges in a large stream
    • Improved reporting of data loss; general ownership of store reliability
    • Manual and automated measures of data coldness, availability and reliability
    • Configuring mirrors through a source-controlled XML file that describes all mirrors (rolled back)
    • Refactoring the distributed concurrent mirrors tests to use that config (kept the refactoring)
    • Determining extent placement through a tree of hash tables, cached on all clients (rolled back)
    • Removed many bottlenecks from lazy replication, for around 20x more throughput
    • Parallelized garbage collection, allowing metadata rings to hold 4x more data.
    • All main data objects in smoothly growable hash tables based on virtual memory arrays
    • Faster and smaller storage of updatable lists of integers (a skip list of short arrays)
    • Unlimited cluster scaling (only implemented for temp data)
    • Fairer and lower latency job scheduling (not implemented)
    • 60x faster CRC concatenation
    • Erasure coding: 4/3 or 7/3 instances per extent. Uses less space and is more reliable than three full instances.
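The CRC concatenation above works because CRC32 is linear over GF(2): the CRC of A||B can be computed from crc(A), crc(B), and len(B) alone, by advancing crc(A) through len(B) zero bytes with matrix squaring instead of touching the data. This is not the Cosmos code; it is a sketch of the standard technique, the same one zlib's crc32_combine uses:

```c
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (the zlib/PNG reflected polynomial). */
static uint32_t crc32(uint32_t crc, const void *buf, size_t len) {
    const unsigned char *p = buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Multiply a GF(2) matrix (32 column vectors) by a vector. */
static uint32_t gf2_times(const uint32_t mat[32], uint32_t vec) {
    uint32_t sum = 0;
    for (int i = 0; vec; vec >>= 1, i++)
        if (vec & 1u) sum ^= mat[i];
    return sum;
}

static void gf2_square(uint32_t sq[32], const uint32_t mat[32]) {
    for (int i = 0; i < 32; i++) sq[i] = gf2_times(mat, mat[i]);
}

/* CRC of the concatenation A||B, given only crc(A), crc(B), and
   len(B): advance crc(A) through len(B) zero bytes by repeated
   matrix squaring, then xor in crc(B).  O(log len2), not O(len2). */
uint32_t crc32_concat(uint32_t crc1, uint32_t crc2, size_t len2) {
    uint32_t even[32], odd[32];
    if (len2 == 0) return crc1;
    odd[0] = 0xedb88320u;                        /* one zero bit  */
    for (int i = 1; i < 32; i++) odd[i] = 1u << (i - 1);
    gf2_square(even, odd);                       /* two zero bits */
    gf2_square(odd, even);                       /* four         */
    for (;;) {
        gf2_square(even, odd);    /* first pass: one zero byte */
        if (len2 & 1u) crc1 = gf2_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0) break;
        gf2_square(odd, even);
        if (len2 & 1u) crc1 = gf2_times(odd, crc1);
        len2 >>= 1;
        if (len2 == 0) break;
    }
    return crc1 ^ crc2;
}
```

Precomputing the squared matrices for a fixed length (as when all extents are the same size) is where the large constant-factor speedups come from.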

Education

Undergraduate (plus masters) (1985-1989) Carnegie Mellon University. I graduated with a BS in Computer Science and a masters in math, QPA 3.7. My master's thesis was a new dynamic algorithm for calculating the HOMFLY polynomial for oriented knots and links. I drew cartoons for both campus papers.

Graduate (1992-1993) I spent a year at Berkeley in the doctoral program for theoretical computer science, but returned to industry after deciding I didn't want to be a professor.

Other Projects

I develop software tools and primitives at home. I put it all on burtleburtle.net, in the public domain.

I'm interested in cryptographic primitives. My pseudorandom number generator, ISAAC, was presented at the 1996 Fast Software Encryption Conference. It hasn't been broken, but it hasn't seen much analysis or much use either. I developed test suites for ISAAC, and have also used them to find biases in RC4 and many other CSPRNGs proposed on sci.crypt. I can break 5-bit RC4. I submitted Maraca to the SHA-3 competition in 2008, but it was rejected and later thoroughly broken (arbitrary collisions can be calculated with 2 seconds of computing time). In 2012 I wrote SpookyHash, a fast 128-bit noncryptographic hash.

Hashes for hash table lookup are similar to cryptographic block ciphers. I developed some theory for what makes a good hash function for hash table lookup. I wrote a hash sieve to generate random hashes and see if they are good. It could screen about ten hashes a second. I have released several good hashes. They have been used far and wide.

The surprising efficiency of my hash sieve led me to develop jenny, a pairwise testing tool.

I wrote an orbital applet in Java, later rewritten in JavaScript, to simulate a set of orbits I'd designed that should form a Dyson Sphere, or maybe even a dense computing core. I have online simulations of the solar system, Cruithne, and Klemperer Rosettes. I had to derive a 9th-degree symmetric multistep method because the existing methods I could find (leapfrog, Runge-Kutta, Runge-Kutta-Fehlberg) weren't accurate enough for my purposes. I later found my multistep method was reinventing the wheel.

I have a collection of about 50 boy scout skits that I've seen, which is fairly popular. I've written short stories. Some years I write music.

I don't watch TV, or play sports, or participate in any organized activities. I follow science and the stock market, read science fiction, and practice musical instruments. Most of my time outside of work these days goes to raising kids.

References are available upon request. Contact me via email at bob_jenkins@burtleburtle.net.