Proposed enhancements to scala.util.Using

joshlemer · November 27, 2021, 7:33pm

Scala 2.13 introduced the scala.util.Using utility which helps us deal with cleaning up resources before exiting a code block similar to Java’s try-with-resources functionality.

After using Using for some time, a couple pain points have become apparent with respect to the Using.Manager.apply factory method.

The Using.Manager.apply method is used in the following way:

val result: Try[Long] = Using.Manager { (use: Using.Manager) => 
  val file1 = use(new File("file1.txt"))
  val file2 = use(new File("file2.txt"))
  file1.length() + file2.length()
}

Every time the passed Manager instance is applied, essentially a callback is pushed on the stack to close the resource after the codeblock returns, no matter if it returns through return or through exception.

Problem 1: everything is wrapped in Try, always

The first issue is that there is no way to use the manager in this way, without going through scala.util.Try. So if you are in a section of code where you are dealing with errors by throwing them, you will have to remember to call .get on the result each and every time. Forgetting to call .get is very likely to happen, especially when in Unit-returning methods (such as unit tests!), and at the very least are cumbersome and a pain point.

For instance, use in a ScalaTest test suite runs a high risk of accidentally swallowing errors in code such as:

test("doing something with files...") {
  Using.Manager { use => 
    val file = use(new File("f"))
    file.length() mustBe 1024L
  } // <-here! we forgot to do .get !!
}

I can testify that this has happened multiple times on my team, to myself and to others, and slipped through code review!! So for some weeks, we had test suites that were essentially testing nothing since they always passed.

I would propose that a new higher order method be added to either object Using or object Manager, don’t much care what name you choose, something like newScope perhaps:


def newScope[A](f: Manager => A): A = (new Manager).manage(f)

test("doing something with files...") {
  Using.newScope { use => 
    val file = use(new File("f"))
    file.length() mustBe 1024L
  } 
}

Problem 2: Missing a more flexible go-style `defer`, to register general cleanup code

Sometimes you want to do some cleanup action(s) which are not exactly the “canonical” way to shut down a resource. For instance, maybe you temporarily have an ExecutorService in your block, which doesn’t implement AutoCloseable by the way. At the end, how you want to clean up the ExecutorService may vary by application and by use-case. Should I .shutdownNow() or should I .shutdown(); .awaitTermination()? The point is that often cleanup logic isn’t coupled to the compile-time type of a resource but deserves first class application code.

That use-case actually is supported by Using, because the Manager.apply takes in a SAM type Releasable[A], so any arbitrary callback can be passed in as the typeclass instance:

Using.Manager { use =>
  val ec: ExecutorService = use(Executors.newSingleThreadExecutor()) { ec => 
    ec.shutdown()
    ec.awaitTermination(10L, TimeUnit.SECONDS)
  }
}

But there’s just one sort of use-case which becomes a bit awkward and that is when you aren’t really opening and closing a single encapsulated resource object, but you have some other cleanup code to register anyways. Like, maybe you want to println("exiting!") or maybe you want to collect all of your files you opened and zip them up and put them in an archive folder. For such general purpose cleanup code, the only way to register such actions is to find some “dummy” receiver, like "" or () or whatever (null resources are not considered valid):

Using.Manager { use => 
  use(())(_ => println("done!"))
  val archiveDirectory: File = ???
  val filesOpened = ListBuffer[File]()
  use(1) { _ => 
    zipUpAndArchive(filesOpened.toSeq)
  }
  use("") { _ => 
    println("some other action...")
  }
}

This is a little awkward, we’re not opening a resource with use(), but that’s the only way to hook into the callback registration. I would propose that a method be exposed in the manager which lets users directly register callbacks. I’ll call this defer after go’s defer statements:

Using.Manager { use => 
  use.defer(println("done!"))
  val archiveDirectory: File = ???
  val filesOpened = ListBuffer[File]()
  use.defer {  
    zipUpAndArchive(filesOpened.toSeq)
  }
  use.defer { 
    println("some other action...")
  }
}

Problem 3: Thread Safety

I think for a utility that is to be used as a basic control structure, it’s worth considering if in this case it’s worth making the utility thread-safe. The Manager class has 2 pieces of mutable state:

private var closed = false
private[this] var resources: List[Resource[_]] = Nil

It’s not protected by any synhronization though, worth considering if it should be so that even if a user passes the manager to an other thread there’s no chance of corruption. For instance, this code will sometimes allow one thread to register a resource, but then never closes it:

  Using.Manager { use =>
    new Thread(() => {
      var i = 0
      while(true) {
        try {
          use(i)(x => println(s"closed ${x}"))
          println(s"registered ${i}")
          i += 1
        } catch { case NonFatal(_) => }
      }
    }).start()

    Thread.sleep(30)
  }

Verify in the stdout (272 is registered but never closed):

//...
registered 267
registered 268
registered 269
registered 270
registered 271
registered 272
closed 271
closed 270
closed 269
closed 268
closed 267
//...

This could be fixed with just some synchronized blocks.

Thanks folks!

martijnhoekstra · November 27, 2021, 8:31pm

I literally today found a bug in our tests with the exact problem of not calling get in a test. Would value discard warnings would have saved my bacon there? If so, that would be enough for me.

What I do really mis is an intermediate Using type that could be mapped and flatMapped and passed around. It was part of the scala-arm prototype and it is sorely missed here. My brain has fp rotted to the point that only with the greatest difficulty I can convince myself that val x = use(a); val y = use (b) and val y = use(b); val x = use(a) are not equivalent.

julienrf · November 29, 2021, 12:04pm

Hey, thank you Josh for the detailed post!

Regarding problem 1, I am surprised that you get no warning “discarding non-Unit value”. I wonder if this warning requires an opt-in flag, or if the compiler missed it in your situation? In case the warning was issued, would that be a satisfactory solution?

I agree with the two other problems mentioned, and their proposed solutions.

Jasper-M · November 29, 2021, 1:07pm

I think -Wvalue-discard is required to get those warnings.

joshlemer · November 29, 2021, 4:20pm

In my opinion, even if I were to have a compiler warning about this, it would still be better to not have to write .get .get .get .get all over the place. I think compile warnings are not really paid much attention in a lot of contexts also.

morgen-peschke · November 29, 2021, 5:24pm

Because of the way ScalaTest is designed, it’s really common to have a value discard at the end of a test case, so you have to either manually put () everywhere or disable that warning.

DWvanGeest · November 29, 2021, 6:42pm

I second @joshlemer’s concerns, particularly on the ability to register arbitrary cleanup code. A common use-case for me is service startup/shutdown, where logging progress is useful.

But, like @martijnhoekstra, I do wish for a more monad-like solution. I don’t know the context of why Using seems to deviate so far from the scala-arm prototype - maybe someone can fill me in on how that decision was made. Given that scala-arm, twitter-util’s Managed, and cats-effect Resource all take a monadic approach to this, why did the stdlib go in such a different direction? Diverse users had already converged on something that feels very appropriate in Scala.

FWIW, this has bothered me enough in the last year to make a variation of twitter-util’s Managed, with no dependencies and a better (IMO) API, somewhat inspired by cats-effect Resource: https://github.com/dvgica/managerial. It has methods specifically for arbitrary side effects (e.g. Managed.eval, Managed.evalTeardown). To me, this approach feels like pretty idiomatic Scala, whereas I struggle with the imperative magic of Using.

I don’t want to derail this thread or its concrete suggestions, but I do idly wonder if Using could be more of a high-level syntax layer on a low-level monadic implementation, where developers could use whichever API they prefer.

Ichoran · November 29, 2021, 7:18pm

I think it was just to provide the most bare-bones version. It also happens to be super-easy to understand how it works, and to perform as well as such things reasonably could, neither of which are as true of a version that is monadic. (Note also that it uses Try, which itself is not monadic over all values, since it prioritizes maximum safety over monad laws.)

Isn’t it really easy to wrap Using in some monadic structure if that’s what you want?

Ichoran · November 29, 2021, 7:25pm

Is there a reason to even use Using.Manager? Not that the problems shouldn’t be fixed, but given that there are the problems, why bother with Manager? Why not just use plain Using{ ... } or Using.resource...?

som-snytt · November 29, 2021, 8:08pm

On curtailing warnings in tests, -Wconf is helpful:

"-Wconf:cat=lint-nullary-unit&site=.*Test:s", // normal unit test style

lets you omit the parens in the test definition, and -Wconf:help reports that the category for value discard is:

w-flag-value-discard

lrytz · November 30, 2021, 8:28am

For reference, there are good discussions about the design on the PR that introduced Using: https://github.com/scala/scala/pull/6907

som-snytt · November 30, 2021, 5:48pm

There’s a link to further speculation on scala-dev, which ends with “we can improve it in 2.14.”

NthPortal · December 3, 2021, 3:34am

I am also bothered by this. I believe it was a concession to those who wanted an API that was safer or cleaner by using Try (@Ichoran and perhaps others). I would love to add an API that just throws. I don’t think newScope is a very good name, but I feel confident that we could bikeshed a good name. After all, we already have Using.apply that returns Try and Using.resources that throws.

If all of the threads are joined (or all tasks submitted to an EC are completed and awaited) before the Using.Manager block closes, then synchronized is sufficient to make it thread-safe. In all other cases, which is most of them, it is still broken.

I could maybe be convinced that we make the default API have a happens-before order of some sort, or that we add a new API for that, but I’m not currently convinced.

I’m not at all convinced that this is within the intended use of Using. If I open a JSON file, and the JSON parsing throws an exception, I use Using to make sure I don’t leave the file handle open (I don’t want to leak resources by accident). Zipping all the files you’ve opened, or printing/logging something, is not a concern of not cleaning up and not leaking resources. When an exception is thrown and not caught, including a VirtualMachineError, we don’t want to be performing arbitrary actions; we only want to be performing the most essential cleanup actions. If the JVM has run out of memory, and is now fundamentally unstable, we want to do as little as possible before shutting down. That does not include zipping up some files you’ve opened.

If you want to zip those files or print something, explicitly catch the relevant exceptions, and either handle them or rethrow them after creating your zip.

You could try to convince me with another use case, but I’m very much convinced it is a bad idea right now.

If my memory serves me (which it often doesn’t), it was decided that this was not in the style common to the standard library, and external libraries and platforms (e.g. typelevel) would be in a better place to provide this.

@joshlemer for reference, only problem 3 can be potentially addressed in 2.13. The others would have to wait for 3.T where we can break forward bincompat.

DWvanGeest · December 3, 2021, 4:40am

To clarify a bit, I’m using the term “monadic” pretty loosely. Mostly I just want to flatMap in a for comprehension, I’m not a purist by any means. I did like the original implementation quite a bit! It looks like subsequent iterations removed the monad-ish thing though, because it was deemed unnecessary.

Maybe there was other discussion, but given the precedent of Try, Future, and Option as monad-ish things in the stdlib, a Using monad seems perfectly idiomatic. But maybe that’s just me.

Maybe this is the crux of the matter - I (and perhaps @joshlemer too) want the scope of Using to be a bit broader. My desired use-case is pretty well illustrated here - starting and stopping arbitrary software components within an application. scala-arm, twitter-util Managed and cats-effect Resource do support this kind of usage, but perhaps it’s a step too far for the stdlib.

Thanks for the reply!

joshlemer · December 4, 2021, 7:48pm

Hey not sure I exactly see the problem. We can’t stop a reference to the Manager leaking and continuing to live on in an other thread, but we can still make the operations on Manager thread-safe. Once the manager is (thread-safely) closed, we could guarantee that no other thread can successfully register a callback with manager.apply(). Wouldn’t this work (below)?

    def acquire[R: Releasable](resource: R): Unit = {
      if (resource == null) throw new NullPointerException("null resource")
      this.synchronized {  // <---- here
        if (closed) throw new IllegalStateException("Manager has already been closed")
        resources = new Resource(resource) :: resources
      }
    }

    private def manage[A](op: Manager => A): A = {
      var toThrow: Throwable = null
      try {
        op(this)
      } catch {
        case t: Throwable =>
          toThrow = t
          null.asInstanceOf[A] 
      } finally {
        var rs: List[Resource[_] = null 
        this.synchronized {  // <---- here
          closed = true
          rs = resources
          resources = null 
        } 
        while (rs.nonEmpty) {
          val resource = rs.head
          rs = rs.tail
          try resource.release()
          catch {
            case t: Throwable =>
              if (toThrow == null) toThrow = t
              else toThrow = preferentiallySuppress(toThrow, t)
          }
        }
        if (toThrow != null) throw toThrow
      }
    }

Ichoran · December 4, 2021, 8:48pm

For what it’s worth, I didn’t weigh on on the Manager thing at all, as far as I recall. I did want an interface that prioritized consistency and safety over monad laws, as Try does, but Using.apply is exactly that, which satisfies the concern: the default thing should be the safest and most straightforward.

I’m with you 150% on this one. I can’t envision how anything like Using.Manager could be made sufficiently safe in a concurrent context to do anything but add a bunch of pitfalls. If we removed all the ones that we possibly could, it would still leave so much conceptual complexity that it would be an anti-pattern to use it.

Also with you on this one. To me, Using works best to perform one very specific task, and to perform it well. Generalized flow control for error conditions I think is well beyond what it should do.

Most notably, I think Using is a bad choice for this because there are a variety of choices that people tend to make for their error flow control–is it done monadically? Do you sequence by chaining calls or with ;? Do you ever allow exceptions, or always convert them to values as soon as possible? Etc…

Better to adopt a library that does your error condition flow. It could use Using, possibly, for some of its fundamentals.

I don’t remember either. My mockup had map and flatMap IIRC, but I also agree that monadic-style solutions are somewhat less common. I really don’t know what happened.

I also think that context functions are very interesting to explore as yet another way to work with resources. So by the time we actually revisit this to make significant changes (of the scale of making Using monadic), we might want a rather different-looking solution. I’m reworking my standard library for Scala, and will probably try out something along these lines.

Yes, and that’s pretty much a disaster, isn’t it? Because now you could possibly have resources open in multiple threads and in a way that would require coordination to even use successfully, let alone to gracefully handle shutting them all down.

Enabling this kind of use case in any way seems like a losing battle, so making it burn people slightly less badly in the specific case you’re outlining doesn’t seem like an effort that’s worth it.

NthPortal · December 5, 2021, 6:50am

That’s not the problem; you can actually leak the Manager in a single thread if you really want to. The problem is that if the other threads aren’t joined, the parent thread will close the resources while they’re still in use, not only breaking things but breaking things in confusing and inconsistent ways from a distance. “Why did my FileInpuStream break in the middle of the file?”

morgen-peschke · December 6, 2021, 5:31pm

Not strictly related to Using, but I ran into an equivalent issue with splitting Akka streams, and figuring out why only ~90% of the results were written to disk was an exasperating experience.

Proposed enhancements to scala.util.Using

Problem 1: everything is wrapped in Try, always

Problem 2: Missing a more flexible go-style defer, to register general cleanup code

Problem 3: Thread Safety

Problem 2: Missing a more flexible go-style `defer`, to register general cleanup code