Improving toInt and friends

martijnhoekstra · May 7, 2017, 9:28pm

Do have you any evidence for this?

Yes. If you run the overflow benchmarks in the repo, you see ~1000,000 ns vs ~15,000 ns

RichardBradley · May 7, 2017, 10:54pm

In that test, you do:

Try(value.toByte).toOption

This makes it very hard for the JVM to elide the exception for each result, which isn’t a realistic test. I would expect to see the 20% slow down (or perhaps parity or better, depending on the proportion of invalid input) with code more like:

try {
   value.toByte
} catch {
    case _: NumberFormatException => -1
}

You are talking about applications which parse a great many numbers in a tight loop, right? So what will they do on invalid input? Presumably most will either skip, insert some kind of sentinel or abort the whole thing. The first two of those cases should look like the above – with a try/catch in the same method, which allows the JIT to eliminate or heavily optimise the exception. (And in the third case the exception perf doesn’t matter.) Your benchmark forces the JIT to keep the full exception mechanisms, which isn’t a fair test.

martijnhoekstra · May 7, 2017, 11:04pm

Fortunately, there is a test harness there, so you could easily test that assertion. If you did, you’d see 1000% slower execution.

EDIT: That’s wrong, you’d see 10000% slower (100 times) longer runtime.

Ichoran · May 8, 2017, 12:20am

Thanks for trying to help us get this right, but

You’re ignoring or misinterpreting all the other answers on that SO post that document just how slow exceptions with stack traces are;
You are misinterpreting the result that you’re citing; and
The result that you’re citing is done using very poor methodology, making it one of the least trustworthy answers on that page.

As to point (2), the quote says that 4% of the cases are errors. That’s an extra ~100 ns for 1/25th of a thrown exception, suggesting that exceptions take about 2500 ns, roughly in line with other results. Not very good compared to a 10-20 ns parse time. Absolutely not “20% slowdown” if you have lots of failed parses.

It is a good point that Try should be elided in favor of try (though escape analysis probably takes care of most of that), but you found a good resource to learn from and you might want to read and think through it more deliberately to understand why Martijn and I are concerned about the overhead of exceptions (and/or run the test that Martijn wrote that demonstrates the overhead).

If you can find a case where the JIT compiler elides a stack-trace-forming exception, that would be great. Maybe we can use that pattern here! But I’ve never found a case where it works unless you have stackless exceptions, and even that is very finicky.

retronym · May 8, 2017, 1:07am

HotSpot has an optimization to use pre-allocated exceptions on for built-in exceptions (e.g., the NPE thrown if the receiver of an invokevirtual bytecode is null).

⚡ scala -J-XX:-OmitStackTraceInFastThrow
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala> val thrown = collection.mutable.Set[Throwable]()
thrown: scala.collection.mutable.Set[Throwable] = Set()

scala> def test(c: AnyRef) = try { c.hashCode } catch { case e: NullPointerException => thrown += e; 0 }
test: (c: AnyRef)Int

scala> val cs = Seq[AnyRef](Int.box(0), "", new Object(), null)
cs: Seq[AnyRef] = List(0, "", java.lang.Object@251c8145, null)

scala> (1 to 10000000).foreach(_ => cs.foreach(test))
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
^C^CException in thread "JLine Shutdown Hook" java.lang.OutOfMemoryError: GC overhead limit 

⚡ scala -J-XX:+OmitStackTraceInFastThrow
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala> val thrown = collection.mutable.Set[Throwable]()
thrown: scala.collection.mutable.Set[Throwable] = Set()

scala> def test(c: AnyRef) = try { c.hashCode } catch { case e: NullPointerException => thrown += e; 0 }
test: (c: AnyRef)Int

scala> val cs = Seq[AnyRef](Int.box(0), "", new Object(), null)
cs: Seq[AnyRef] = List(0, "", java.lang.Object@191c6e13, null)

scala> (1 to 10000000).foreach(_ => cs.foreach(test))

scala> (1 to 10000000).foreach(_ => cs.foreach(test))

scala> (1 to 10000000).foreach(_ => cs.foreach(test))

scala> (1 to 10000000).foreach(_ => cs.foreach(test))

scala> thrown.size
res4: Int = 14396

You can also turn off all stack traces (not recommended in general!) with -XX:-StackTraceInThrowable. That can be useful for performance analysis.

retronym · May 8, 2017, 1:23am

It’s also worth noting this benchmark of various flavours of exceptions vs returning an error code. (@Ichoran presented similar charts as Scala Days San Fran a few years back)

https://shipilev.net/blog/2014/exceptional-performance/

Turns out that stackless exceptions are faster than error codes if the error rate is below 1/100 because the control flow is more efficient.

I think there is value in providing more choice to the user here (as the library author, you don’t know the user’s error rate!), but I thought this was good background to help the discussion here.

szeiger · May 8, 2017, 3:58pm

See also the discussion leading up to https://twitter.com/Blaisorblade/status/856548987066830848 where we got some interesting performance from exceptions.

Ichoran · May 8, 2017, 4:47pm

This is true only in particular local cases. Otherwise the runtime stack-unwinding overhead gets you. (You were testing within-method throw/catch.)

NthPortal · August 25, 2017, 3:32am

I’ve been thinking a lot recently about how to deal with the ‘fail with an exception’/'return an Option ’ problem recently, and I have come up with a solution using a typeclass. The typeclass allows using either failure mode, while only writing the parsing code once. It does not deal with the performance cost of parsing BMP digits, though it reduces the cost of writing an extra parse method which does or doesn’t handle BMP digits.

The basics of the typeclass are as follows (names tentative):

trait Convert {
  type Result[T]

  def conversion[T](res: => T): Result[T]

  /* MUST be called within a `conversion` block */
  def fail(ex: => Exception): Nothing
  def unwrap[T](result: Result[T]): T
}

Result[T] defines the type returned by the conversion - either T or Option[T]
conversion is a block within which parsing takes place
fail terminates its parent conversion block with the appropriate failure - either an exception or None
unwrap gets the value of a previous conversion (from the same Convert), immediately terminating its parent conversion if the previous Result[T] was a failure

The two typeclass instances are Convert.Valid and Convert.Any:

object Convert {
  object Valid extends Convert {
    override type Result[T] = T
    override def conversion[T](res: => T): T = res
    override def fail(ex: => Exception): Nothing = throw ex
    override def unwrap[T](result: T): T = result
  }

  object Any extends Convert {
    override type Result[T] = Option[T]
    override def conversion[T](res: => T): Option[T] = {
      try {
        Some(res)
      } catch {
        case FailControl => None
      }
    }
    override def fail(ex: => Exception): Nothing = throw FailControl
    override def unwrap[T](result: Option[T]): T = result match {
      case Some(t) => t
      case None => throw FailControl
    }
  }

  private case object FailControl extends ControlThrowable
}

Convert.Valid is used when only valid inputs are allowed, and an exception should be throw if an input is invalid
Convert.Any is used when any input is allowed, and:
- if the input was valid, the result is returned wrapped in Some
- if the input was invalid, None is returned

The following simple example shows how easy it is to change code which always throws exceptions on failure, to code which can also return an Option:

/* always throws on failure */
def parseBooleanThrowing(s: String): Boolean = {
  s.toLowerCase match {
    case "true" => true
    case "false" => false
    case _ => throw new IllegalArgumentException(s"'$s' is not 'true' or 'false'")
  }
}

/* can throw on failure, or return results in `Option`s */
def parseBoolean(s: String)(implicit c: Convert = Convert.Valid): c.Result[Boolean] = {
  c.conversion {
    s.toLowerCase match {
      case "true" => true
      case "false" => false
      case _ => c.fail(new IllegalArgumentException(s"'$s' is not 'true' or 'false'"))
    }
  }
}

I believe this typeclass enables both types of parsing with relatively little overhead cost, and could be useful for toInt and related methods.

The full code for this prototype can be found here (at commit 498d826).

Any feedback about improving the typeclasses or about whether or not they help solve the issues with toInt is greatly appreciated

Side note: I called it Convert and not Parse because it is sufficiently general that it might be used (in the following contrived example) to convert a Seq to a 3Seq (a Seq of size 3), which is not a parsing operation per se. I’m 100% open to other names.

benhutchison · August 18, 2019, 2:12am

Where did this issue end up? IMO it’s a worthy complaint and a better solution that avoided exceptions would be welcome.

martijnhoekstra · August 18, 2019, 7:12am

It’s included in 2.13