Optimizing BigInt for small integers

#1

Currently, BigInt is a wrapper around java.math.BigInteger.

The Spire library has a SafeLong type with identical semantics, which specializes into two cases (sketched below):

  • an instance when the underlying number fits in a Long,
  • another instance that wraps BigInteger.
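
Schematically, the split looks like this (a minimal sketch with made-up names; Spire's actual encoding differs in its details):

```scala
import java.math.BigInteger

// Illustrative only: SafeIntSketch, SmallCase and BigCase are hypothetical names.
sealed abstract class SafeIntSketch

// Fast path: the value fits in a signed 64-bit Long.
final case class SmallCase(value: Long) extends SafeIntSketch

// Slow path: arbitrary precision, delegating to java.math.BigInteger.
final case class BigCase(value: BigInteger) extends SafeIntSketch

object SafeIntSketch {
  // Smart constructor: bitLength <= 63 is exactly the "fits in a Long"
  // test (it admits Long.MinValue as well).
  def apply(b: BigInteger): SafeIntSketch =
    if (b.bitLength <= 63) SmallCase(b.longValue) else BigCase(b)
}
```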

That would tremendously help BigInt's performance profile when the underlying value is of reasonable size.

Spire is considering dropping SafeLong in the future due to the induced triplication (BigInteger, BigInt, SafeLong); however, the optimizations SafeLong provides could be ported to the Scala standard library.

It seems that source and binary compatibility could be completely preserved.

Would that contribution be appreciated? What would be the criteria used to decide whether the PR is merged?

#2

Excuse my ignorance, but I’m wondering: once you’ve wrapped a Long and added logic to detect any overflow, aren’t you already more than halfway towards re-implementing BigInteger? In other words, are we sure that something that switches between Long and BigInteger will perform better than just using BigInteger directly?

It appears BigInteger is based on an Array[Byte], which makes me wonder whether it would perform better if it was based on an Array[Int] or Array[Long]?

#3

I’m not considering reimplementing BigInteger. As per the OpenJDK code, it is already based on an Array[Int], and some operations are candidates for intrinsics.

The layering is as follows: Scala’s BigInt wraps BigInteger, which wraps an Array[Int]. Because people may be pattern matching on the class type, I doubt BigInt will be made an opaque type in future releases.

It makes total sense to optimize the small-value path of BigInt, which can then be used as a general-purpose integer type with minimal overhead and sensible semantics.

However, I don’t have the original benchmark to back up my claim, and the speed-up would be workload-dependent.

(I wouldn’t reimplement BigInteger anyway; the machinery for huge integers is positively scary – Montgomery multiplication etc. The Long/BigInteger split, however, can be implemented and tested quite straightforwardly.)
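
For illustration, the overflow check on the fast path amounts to a single guarded addition. A sketch, with Either standing in for the two concrete representations:

```scala
import java.math.BigInteger

object FastAdd {
  // Math.addExact throws ArithmeticException on overflow and is
  // intrinsified by HotSpot, so the common case is one machine add
  // plus one branch.
  def addWithPromotion(x: Long, y: Long): Either[Long, BigInteger] =
    try Left(Math.addExact(x, y))
    catch {
      case _: ArithmeticException =>
        // Rare slow path: promote both operands and delegate.
        Right(BigInteger.valueOf(x).add(BigInteger.valueOf(y)))
    }
}
```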

#4

If this is really a win and if it doesn’t introduce pathological edge cases, wouldn’t it be better to implement it in the upstream OpenJDK BigInteger type so that everyone can benefit?

#5

The JVM BigInteger type is not exactly performant compared to other languages. I have no idea why it is not optimized further (see http://www.wilfred.me.uk/blog/2014/10/20/the-fastest-bigint-in-the-west/).

However, this is not my immediate concern. What I have in hand is well-performing, thoroughly tested Scala code that could replace the stdlib wrapper without breaking compatibility.

#6

Hey Denis,

Thanks for your message, yes I think such a contribution would be highly appreciated 🙂. If your implementation of BigInt is an order of magnitude faster than the current one for values that fit in a Long, I have no doubt that your PR will be merged! Do you need any help?

#7

Hi Julien,
thanks for your message! I haven’t hacked the stdlib before (apart maybe from bug reports and documentation), so I welcome help in setting up a good process.

First and foremost, we can’t introduce bugs during the change. The Long/BigInteger split will introduce additional logic, and the interesting cases are when the numbers are just at the boundary.
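
For example, the values right at the edge of the Long range are the ones to exercise (illustrative REPL-style assertions; the real suite would use the stdlib’s test framework):

```scala
val max = BigInt(Long.MaxValue)   // 2^63 - 1: largest value on the fast path
val min = BigInt(Long.MinValue)   // -2^63: still fits in a Long

assert(max + 1 == BigInt("9223372036854775808"))   // must promote
assert(min - 1 == BigInt("-9223372036854775809"))  // must promote
assert(-min == max + 1)            // negating Long.MinValue overflows a Long
assert((max + 1) - 1 == max)       // round-trips across the boundary
```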

I think the change should be introduced in a few steps:

  1. Change BigInt into an abstract class and a BigIntegerBigInt implementation class. Verify that binary compatibility is preserved, and that tests pass.

  2. Introduce a LongBigInt implementation class; still perform the computations using BigInteger, but allocate the result as a LongBigInt if it fits in a Long (see the sketch after this list). Be sure that existing tests cater for the boundary cases.

  3. Migrate the logic from Spire piece by piece, so the fast path avoids the detour through BigInteger.
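
A sketch of what steps 1 and 2 could look like (the class names are the ones proposed above; everything else is illustrative and ignores the real BigInt API surface):

```scala
import java.math.BigInteger

// Step 1: an abstract class with a BigInteger-backed implementation.
sealed abstract class BigIntSketch {
  def underlying: BigInteger
  // Step 2: still compute via BigInteger, but allocate the compact
  // representation whenever the result fits in a Long.
  def +(that: BigIntSketch): BigIntSketch =
    BigIntSketch(underlying.add(that.underlying))
}

final class BigIntegerBigInt(val underlying: BigInteger) extends BigIntSketch

final class LongBigInt(val value: Long) extends BigIntSketch {
  // Step 3 would remove this round-trip for Long/Long operations.
  def underlying: BigInteger = BigInteger.valueOf(value)
}

object BigIntSketch {
  def apply(b: BigInteger): BigIntSketch =
    if (b.bitLength <= 63) new LongBigInt(b.longValue)
    else new BigIntegerBigInt(b)
}
```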

I’m currently working with Ubuntu and Scala on the JVM. I need help setting up a minimal process that verifies that binary compatibility is preserved and that tests pass. The testing process should also include parts of the community build after each step above, before we merge the PR into the stdlib.

In particular, I’ve never run the stdlib tests on my own computer; I’m familiar with testing on the JVM, but not with Scala.js or Scala Native. I may want to use test procedures with different levels of exhaustiveness to get feedback in a shorter amount of time (hacking on Scala is sadly not my day-to-day job).

BTW, I’d like to make sure of the positive performance impact this would have. Is there a standard framework to benchmark improvements in the stdlib, and where is it documented?

#8

MiMa verifies binary compatibility for you. I don’t know how to run it manually in sbt (I’m sure someone here or on Gitter can help you with that, or Google), but Jenkins runs it as part of the build regardless.

JMH is used for benchmarking; you can find a README and code under test/benchmarks.
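
For a flavor, a JMH benchmark class looks roughly like this (a sketch; the harness under test/benchmarks has its own conventions for wiring benchmarks into the build):

```scala
import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
class BigIntAddBench {
  var small: BigInt = BigInt(12345)        // fits in a Long: fast path
  var large: BigInt = BigInt(10).pow(40)   // forces the BigInteger path

  // Returning the result prevents dead-code elimination.
  @Benchmark def addSmall: BigInt = small + small
  @Benchmark def addLarge: BigInt = large + large
}
```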
