Do have you any evidence for this?
Yes. If you run the overflow benchmarks in the repo, you see ~1000,000 ns vs ~15,000 ns
Do have you any evidence for this?
Yes. If you run the overflow benchmarks in the repo, you see ~1000,000 ns vs ~15,000 ns
In that test, you do:
Try(value.toByte).toOption
This makes it very hard for the JVM to elide the exception for each result, which isnât a realistic test. I would expect to see the 20% slow down (or perhaps parity or better, depending on the proportion of invalid input) with code more like:
try {
value.toByte
} catch {
case _: NumberFormatException => -1
}
You are talking about applications which parse a great many numbers in a tight loop, right? So what will they do on invalid input? Presumably most will either skip, insert some kind of sentinel or abort the whole thing. The first two of those cases should look like the above â with a try/catch in the same method, which allows the JIT to eliminate or heavily optimise the exception. (And in the third case the exception perf doesnât matter.) Your benchmark forces the JIT to keep the full exception mechanisms, which isnât a fair test.
Fortunately, there is a test harness there, so you could easily test that assertion. If you did, youâd see 1000% slower execution.
EDIT: Thatâs wrong, youâd see 10000% slower (100 times) longer runtime.
Thanks for trying to help us get this right, but
As to point (2), the quote says that 4% of the cases are errors. Thatâs an extra ~100 ns for 1/25th of a thrown exception, suggesting that exceptions take about 2500 ns, roughly in line with other results. Not very good compared to a 10-20 ns parse time. Absolutely not â20% slowdownâ if you have lots of failed parses.
It is a good point that Try
should be elided in favor of try
(though escape analysis probably takes care of most of that), but you found a good resource to learn from and you might want to read and think through it more deliberately to understand why Martijn and I are concerned about the overhead of exceptions (and/or run the test that Martijn wrote that demonstrates the overhead).
If you can find a case where the JIT compiler elides a stack-trace-forming exception, that would be great. Maybe we can use that pattern here! But Iâve never found a case where it works unless you have stackless exceptions, and even that is very finicky.
HotSpot has an optimization to use pre-allocated exceptions on for built-in exceptions (e.g., the NPE thrown if the receiver of an invokevirtual bytecode is null).
⥠scala -J-XX:-OmitStackTraceInFastThrow
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.
scala> val thrown = collection.mutable.Set[Throwable]()
thrown: scala.collection.mutable.Set[Throwable] = Set()
scala> def test(c: AnyRef) = try { c.hashCode } catch { case e: NullPointerException => thrown += e; 0 }
test: (c: AnyRef)Int
scala> val cs = Seq[AnyRef](Int.box(0), "", new Object(), null)
cs: Seq[AnyRef] = List(0, "", java.lang.Object@251c8145, null)
scala> (1 to 10000000).foreach(_ => cs.foreach(test))
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
^C^CException in thread "JLine Shutdown Hook" java.lang.OutOfMemoryError: GC overhead limit
⥠scala -J-XX:+OmitStackTraceInFastThrow
Welcome to Scala 2.12.1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.
scala> val thrown = collection.mutable.Set[Throwable]()
thrown: scala.collection.mutable.Set[Throwable] = Set()
scala> def test(c: AnyRef) = try { c.hashCode } catch { case e: NullPointerException => thrown += e; 0 }
test: (c: AnyRef)Int
scala> val cs = Seq[AnyRef](Int.box(0), "", new Object(), null)
cs: Seq[AnyRef] = List(0, "", java.lang.Object@191c6e13, null)
scala> (1 to 10000000).foreach(_ => cs.foreach(test))
scala> (1 to 10000000).foreach(_ => cs.foreach(test))
scala> (1 to 10000000).foreach(_ => cs.foreach(test))
scala> (1 to 10000000).foreach(_ => cs.foreach(test))
scala> thrown.size
res4: Int = 14396
You can also turn off all stack traces (not recommended in general!) with -XX:-StackTraceInThrowable
. That can be useful for performance analysis.
Itâs also worth noting this benchmark of various flavours of exceptions vs returning an error code. (@Ichoran presented similar charts as Scala Days San Fran a few years back)
https://shipilev.net/blog/2014/exceptional-performance/
Turns out that stackless exceptions are faster than error codes if the error rate is below 1/100 because the control flow is more efficient.
I think there is value in providing more choice to the user here (as the library author, you donât know the userâs error rate!), but I thought this was good background to help the discussion here.
See also the discussion leading up to https://twitter.com/Blaisorblade/status/856548987066830848 where we got some interesting performance from exceptions.
This is true only in particular local cases. Otherwise the runtime stack-unwinding overhead gets you. (You were testing within-method throw/catch.)
Iâve been thinking a lot recently about how to deal with the âfail with an exceptionâ/'return an Option
â problem recently, and I have come up with a solution using a typeclass. The typeclass allows using either failure mode, while only writing the parsing code once. It does not deal with the performance cost of parsing BMP digits, though it reduces the cost of writing an extra parse method which does or doesnât handle BMP digits.
The basics of the typeclass are as follows (names tentative):
trait Convert {
type Result[T]
def conversion[T](res: => T): Result[T]
/* MUST be called within a `conversion` block */
def fail(ex: => Exception): Nothing
def unwrap[T](result: Result[T]): T
}
Result[T]
defines the type returned by the conversion - either T
or Option[T]
conversion
is a block within which parsing takes placefail
terminates its parent conversion
block with the appropriate failure - either an exception or None
unwrap
gets the value of a previous conversion
(from the same Convert
), immediately terminating its parent conversion
if the previous Result[T]
was a failureThe two typeclass instances are Convert.Valid
and Convert.Any
:
object Convert {
object Valid extends Convert {
override type Result[T] = T
override def conversion[T](res: => T): T = res
override def fail(ex: => Exception): Nothing = throw ex
override def unwrap[T](result: T): T = result
}
object Any extends Convert {
override type Result[T] = Option[T]
override def conversion[T](res: => T): Option[T] = {
try {
Some(res)
} catch {
case FailControl => None
}
}
override def fail(ex: => Exception): Nothing = throw FailControl
override def unwrap[T](result: Option[T]): T = result match {
case Some(t) => t
case None => throw FailControl
}
}
private case object FailControl extends ControlThrowable
}
Convert.Valid
is used when only valid inputs are allowed, and an exception should be throw if an input is invalidConvert.Any
is used when any input is allowed, and:
Some
None
is returnedThe following simple example shows how easy it is to change code which always throws exceptions on failure, to code which can also return an Option
:
/* always throws on failure */
def parseBooleanThrowing(s: String): Boolean = {
s.toLowerCase match {
case "true" => true
case "false" => false
case _ => throw new IllegalArgumentException(s"'$s' is not 'true' or 'false'")
}
}
/* can throw on failure, or return results in `Option`s */
def parseBoolean(s: String)(implicit c: Convert = Convert.Valid): c.Result[Boolean] = {
c.conversion {
s.toLowerCase match {
case "true" => true
case "false" => false
case _ => c.fail(new IllegalArgumentException(s"'$s' is not 'true' or 'false'"))
}
}
}
I believe this typeclass enables both types of parsing with relatively little overhead cost, and could be useful for toInt
and related methods.
The full code for this prototype can be found here (at commit 498d826).
Any feedback about improving the typeclasses or about whether or not they help solve the issues with toInt
is greatly appreciated
Side note: I called it Convert
and not Parse
because it is sufficiently general that it might be used (in the following contrived example) to convert a Seq
to a 3Seq
(a Seq
of size 3
), which is not a parsing operation per se. Iâm 100% open to other names.
Where did this issue end up? IMO itâs a worthy complaint and a better solution that avoided exceptions would be welcome.
Itâs included in 2.13