Looking at this from a surface level (note that I don't really use Spark that much), I can think of some recommendations for a postmortem/retrospective:
- It sounds like there is an argument for putting the closure cleaning (or a part of it) into the Scala stdlib somehow (or maybe the compiler; I'm not sure which abstraction level works best). This would put more effort onto the Scala compiler team, but it turns out that implementing the closure cleaner properly required effort from the Scala compiler team anyway. At least if the closure cleaner is part of the official Scala release, it will always be available whenever Scala gets released, and it appears this isn't as Spark-specific as we think it is (e.g. look at Flink).
- Alternatively, it may be a good time to revisit Spores (https://docs.scala-lang.org/sips/spores.html), which were deliberately designed to solve this problem. I know there were some technical reasons why they couldn't be completely finished, but there might be an argument for getting Spores over the hurdle so they can actually be used (or maybe even made part of Scala itself). As far as I understand, if you use Spores instead of Scala's closures you don't really need a closure cleaner at all, since with Spores you have to explicitly declare which variables get captured in a closure.
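To make the first point concrete, here is a minimal sketch of the accidental-capture problem the closure cleaner exists to fix. The names (`Driver`, `Demo`) are illustrative, not Spark's actual classes: in Scala, a lambda that reads an instance field really reads `this.field`, so it captures the entire enclosing object, and if that object isn't serializable, shipping the closure fails.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// `Driver` does not extend Serializable, mimicking a user's enclosing
// class (e.g. the class holding a Spark driver program).
class Driver {
  val factor = 3

  // Referencing the field compiles to `this.factor`, so this lambda
  // captures the whole non-serializable `Driver` instance.
  def badClosure: Int => Int = x => x * factor

  // Copying the field into a local first means only the Int is captured.
  def goodClosure: Int => Int = {
    val f = factor
    x => x * f
  }
}

object Demo {
  // Tries to write an object through Java serialization, reporting success.
  def serializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    println(serializable((new Driver).badClosure))  // fails: captured `this`
    println(serializable((new Driver).goodClosure)) // succeeds: captured an Int
  }
}
```

This is exactly the kind of thing that's hard to fix from outside the compiler via bytecode surgery, which is the argument for solving it at the language level.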
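And on the second point, a rough sketch of the Spores idea (SIP-21). The `Spore` trait and `spore` factory below are illustrative stand-ins, not the real library API: the key property is that the captured environment is an explicit parameter, so nothing can be smuggled in via `this` and no after-the-fact cleaning is needed.

```scala
// Illustrative stand-in for SIP-21 Spores: a serializable function whose
// captured environment is declared explicitly rather than inferred.
trait Spore[-A, +B] extends (A => B) with Serializable {
  type Captured // the explicitly declared environment type
}

object SporeDemo {
  // The factory forces the caller to name what is captured: the body is a
  // function of (captured, argument) only, so there is no implicit `this`.
  def spore[E, A, B](captured: E)(body: (E, A) => B): Spore[A, B] =
    new Spore[A, B] {
      type Captured = E
      def apply(a: A): B = body(captured, a)
    }

  def main(args: Array[String]): Unit = {
    val factor = 3
    // Only `factor` is part of the closure's environment, by construction.
    val triple = spore(factor)((f, x: Int) => x * f)
    println(triple(14))
  }
}
```

The real SIP-21 design enforces this with macro-checked `spore { ... }` blocks rather than an explicit parameter list, but the capture discipline is the same.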
As you said, and it's important to reiterate, there were very real reasons for this delay. It was an incredibly technical problem, so much so that it required Scala compiler engineers to get it over the fence, so I don't think we should overdramatize what happened.