Adding better-files to the Scala Platform

This is a proposal to add better-files to the Scala Platform.

Introduction

better-files is a dependency-free pragmatic thin Scala wrapper around Java NIO. It is an alternative to the IO interface present in the Scala standard library and defines idiomatic helpers to handle IO in a sane and elegant way. It is licensed under the MIT license.

Features

  • Java interoperability
  • Easy to use
  • Efficient NIO wrappers
  • Zero external dependencies (aside from JVM-bundled dependencies)

Merits

better-files has seen a lot of traffic lately:

  • I gave a couple of well-received talks on it at Scala Days and Scala by the Bay in 2016
  • 2000 Maven downloads; at conferences I have met a lot of people who use the library at work or for personal projects
  • 700+ stars on GitHub
  • Fairly small and self-contained: <500 LOC and no external dependencies
  • A search of sbt files on GitHub returns >200 hits

Example usage

better-files allows users to work with IO in different styles. Here, I only describe the most common ones. For more information on the supported use cases and explanations of the syntax, have a look at the README.

File instantiation

import better.files._
import java.io.{File => JFile}

val f = File("/User/johndoe/Documents")                      // using constructor
val f1: File = file"/User/johndoe/Documents"                 // using string interpolator
val f2: File = "/User/johndoe/Documents".toFile              // convert a string path to a file
val f3: File = new JFile("/User/johndoe/Documents").toScala  // convert a Java file to Scala
val f4: File = root/"User"/"johndoe"/"Documents"             // using root helper to start from root
val f5: File = `~` / "Documents"                             // also equivalent to `home / "Documents"`
val f6: File = "/User"/"johndoe"/"Documents"                 // using file separator DSL
val f7: File = home/"Documents"/"presentations"/`..`         // Use `..` to navigate up to parent

File read/write

val file = root/"tmp"/"test.txt"
file.overwrite("hello")
file.appendLine().append("world")
assert(file.contentAsString == "hello\nworld")

Streams and codecs

Various ways to slurp a file without loading the contents into memory:

val bytes  : Iterator[Byte]            = file.bytes
val chars  : Iterator[Char]            = file.chars
val lines  : Iterator[String]          = file.lines
val source : scala.io.BufferedSource   = file.newBufferedSource // needs to be closed, unlike the above APIs, which auto-close when the iterator ends
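For readers wondering how an iterator can close its own resource, here is a minimal sketch of the idea using only java.nio and the standard library. The names `AutoClosing`, `closingIterator` and `lines` are hypothetical and are not the better-files implementation; the sketch also assumes UTF-8 content.

```scala
import java.io.Closeable
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper, not the better-files API.
object AutoClosing {
  // Wraps an iterator so that `resource` is closed as soon as the iterator reports exhaustion.
  def closingIterator[A](underlying: Iterator[A], resource: Closeable): Iterator[A] =
    new Iterator[A] {
      private var closed = false

      def hasNext: Boolean = {
        val more = !closed && underlying.hasNext
        if (!more && !closed) {
          resource.close()
          closed = true
        }
        more
      }

      def next(): A =
        if (hasNext) underlying.next()
        else throw new NoSuchElementException("iterator exhausted")
    }

  // Lazily reads lines; the reader is closed after the last line has been consumed.
  def lines(p: Path): Iterator[String] = {
    val reader = Files.newBufferedReader(p, StandardCharsets.UTF_8)
    val raw = Iterator.continually(reader.readLine()).takeWhile(_ != null)
    closingIterator(raw, reader)
  }
}
```

One limitation of this approach: if the caller abandons the iterator before reaching the end, the reader stays open, so it is only safe for iterators that are fully consumed.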

Java interoperability

We can go from better-files wrappers to their Java counterparts and back at any time.

val file: File = tmp / "hello.txt"
val javaFile     : java.io.File                  = file.toJava
val uri          : java.net.URI                  = file.uri
val reader       : java.io.BufferedReader        = file.newBufferedReader
val outputstream : java.io.OutputStream          = file.newOutputStream
val writer       : java.io.BufferedWriter        = file.newBufferedWriter
val inputstream  : java.io.InputStream           = file.newInputStream
val path         : java.nio.file.Path            = file.path
val fs           : java.nio.file.FileSystem      = file.fileSystem
val channel      : java.nio.channels.FileChannel = file.newFileChannel
val ram          : java.io.RandomAccessFile      = file.newRandomAccess
val fr           : java.io.FileReader            = file.newFileReader
val fw           : java.io.FileWriter            = file.newFileWriter(append = true)
val printer      : java.io.PrintWriter           = file.newPrintWriter

Better pattern matching

better-files defines extractor objects that help with pattern matching on files and avoid if-else chains.

/**
 * @return true if file is a directory with no children or a file with no contents
 */
def isEmpty(file: File): Boolean = file match {
  case File.Type.SymbolicLink(to) => isEmpty(to)  // this must be first case statement if you want to handle symlinks specially; else will follow link
  case File.Type.Directory(files) => files.isEmpty
  case File.Type.RegularFile(content) => content.isEmpty
  case _ => file.notExists    // a file may not be one of the above e.g. UNIX pipes, sockets, devices etc
}
// or as extractors on LHS:
val File.Type.Directory(researchDocs) = home/"Downloads"/"research"
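As a rough illustration of how such extractors can be built, here is a sketch over plain java.nio.file.Path. The `PathType` object and its members are hypothetical names chosen for this example; they are not the better-files `File.Type` implementation, and the sketch assumes UTF-8 file contents and a path on the default file system.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical extractors for illustration only; not the better-files API.
object PathType {
  object Directory {
    // Matches only directories, yielding their immediate children.
    def unapply(p: Path): Option[List[Path]] =
      if (Files.isDirectory(p)) Option(p.toFile.listFiles).map(_.toList.map(_.toPath))
      else None
  }

  object RegularFile {
    // Matches only regular files, yielding their contents as a UTF-8 string.
    def unapply(p: Path): Option[String] =
      if (Files.isRegularFile(p)) Some(new String(Files.readAllBytes(p), StandardCharsets.UTF_8))
      else None
  }

  // True for a directory with no children, a file with no contents, or a missing path.
  def isEmpty(p: Path): Boolean = p match {
    case Directory(children)  => children.isEmpty
    case RegularFile(content) => content.isEmpty
    case _                    => Files.notExists(p)
  }
}
```

Each `unapply` returns `None` when the path is not of the expected kind, which is what lets the `match` fall through to the next case.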

File system operations

Utilities like ls, cp, rm, mv, ln, md5, diff, touch, cat are easy to use. See this.

Implementation

The implementation has no extra dependencies aside from the NIO module bundled with JDK 8.

Alternatives

  • There are other good libraries out there, e.g. Apache Commons IO and Guava, but they are written in Java
  • In the Scala world, only Li Haoyi’s ammonite-ops comes close.
  • This is the biggest caveat: better-files is by no means a side-effect-free, purely functional library. There are purer idioms we could use (IO monads) that better-files does not touch. better-files is simply a wrapper around Java NIO; it happily performs side effects and throws the exceptions that Java throws. This may not be palatable to many Scala programmers.

Bibliography

Previous discussion: Scala IO fix-up/overhaul · Issue #19 · scala/slip · GitHub

Hey @pathikrit, thanks a lot for this proposal!

I think that the success of better-files is pretty obvious and it’s an excellent candidate for the Platform, as IO is a major concern for the SPP Committee. I’m forwarding this to Committee members since we’re reviewing it on the 17th of January.

I have updated your thread to include a more complete proposal, similar to the one for Enumeratum. Please review and change the bits you don’t agree with. I’m not an expert in the library, so you may want to emphasize or de-emphasize different sections of the proposal, especially the implementation section. It would be great if you could also describe a little bit of the underlying design!

I’ve got some questions for you:

  1. Do you want to include all the functionality that better-files provides?
  2. With regard to:

Have you considered writing a wrapper around better-files that provides a monadic interface? Would you welcome this feature later on in the incubation period?

The only way we could enrich this proposal further is to get more Scala developers to join you in maintaining better-files!

I’d like to call on Scala developers who like or use this library to help you do it. better-files is a good Scala library and could be a good first step for people to improve their skills in Scala or get involved in something meaningful for the community. IO is a module of high importance; increasing the bus factor of the library and getting sponsors would be a win for all of us. If you or the company you work for want to contribute, please contact me or @pathikrit!

Note: There is also the sbt IO that does a fair amount of useful work. https://github.com/sbt/io

My immediate comments on the general proposal (a better IO library) and also this specific proposal (better-files).

Scala obviously needs a better situation for IO; deferring to Java IO is less than ideal for many reasons:

  • The API is overly verbose
  • It probably reflects the JVM platform too closely (this matters for stuff like Scala.js and possibly scala-native).

The issue is that building an IO library is not a trivial task. There are a lot of things to take into consideration, especially how large the scope is: should we only handle the basic stuff you would see in something like Python (i.e. reading items in one line), or should we also handle more complex stuff like streaming and async (i.e. providing an async implementation that returns Futures)?

Regarding this specific proposal, I have the following remarks:

  • Will the library possibly be renamed? If we are doing an IO library, something that is very central to languages in general, it should probably be called scala.io and be in a scala.io package or something along those lines
  • Reiterating @jvican’s point, the interface should not expose any Java-like stuff and it should be idiomatic Scala.
  • I am not a fan of just being a very light wrapper around Java NIO, mainly because I think we should be caring for multiple platforms and hence stuff like scala-native shouldn’t have any concepts of java-nio (so this library wouldn’t be a nice design for a platform agnostic IO)

In conclusion, if we are going to do the hard task of providing a Scala IO library, it’s something that should be done correctly and with some future-proofing in mind, and I don’t think that a light wrapper over Java’s NIO fulfills this task. The scala.io proposal should be very idiomatic Scala, and shouldn’t (for default usage) expose Java NIO details (obviously we can bring in this functionality similarly to how Java converters work for collections).

From a design perspective I like that the API is simple; I just think that it should abstract over more of the JVM-specific details.

Yes, unless there are specific objections to certain things (e.g. the UNIX utils here).

There are various reasons why I did not:

  1. It’s fairly trivial to wrap all better-files methods with Scala’s Future, Scalaz Tasks or Cats monads.
  2. There’s a fair amount of disagreement about which of the above to use (amongst others: see cat-Reader, fs2 and Monix streams and Tasks etc.)
  3. I did not want to depend on an external library in better-files, nor did I want to re-invent yet another IOMonad
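To illustrate the first point, here is a minimal sketch of what such a wrapper could look like with Scala’s standard `Future`. It wraps plain java.nio calls rather than better-files methods so that it stays dependency-free; `AsyncFiles` and its methods are hypothetical names, and UTF-8 is assumed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.concurrent.{ExecutionContext, Future, blocking}

// Hypothetical wrapper for illustration; not part of better-files.
object AsyncFiles {
  // Runs the blocking read on the supplied execution context.
  def readString(p: Path)(implicit ec: ExecutionContext): Future[String] =
    Future(blocking(new String(Files.readAllBytes(p), StandardCharsets.UTF_8)))

  // Runs the blocking write on the supplied execution context and yields the path written.
  def writeString(p: Path, text: String)(implicit ec: ExecutionContext): Future[Path] =
    Future(blocking(Files.write(p, text.getBytes(StandardCharsets.UTF_8))))
}
```

Note that this only moves the side effects onto another thread; exceptions thrown by Java surface as failed Futures, and the same shape would apply to a Task or IO type from another library.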

Yes! Currently the bus-factor of better-files is 1 :frowning:

Yes, better-files is a terribly unfortunate name. It started its life as:

implicit class BetterFile(javaFile: java.io.File) {
   // utils !
}
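For context, this is roughly how such an enrichment of java.io.File can be fleshed out. The helper methods below are hypothetical stand-ins for the elided utils, not the original code, and they assume UTF-8 content.

```scala
import java.io.{File => JFile}
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object BetterFileSketch {
  // Hypothetical helpers standing in for the elided "utils"; names are illustrative only.
  implicit class BetterFile(javaFile: JFile) {
    // Resolves a child path beneath this file.
    def /(child: String): JFile = new JFile(javaFile, child)

    // Replaces the file's contents and returns the file to allow chaining.
    def overwrite(text: String): JFile = {
      Files.write(javaFile.toPath, text.getBytes(StandardCharsets.UTF_8))
      javaFile
    }

    // Reads the whole file into memory as a UTF-8 string.
    def contentAsString: String =
      new String(Files.readAllBytes(javaFile.toPath), StandardCharsets.UTF_8)
  }
}
```

With `import BetterFileSketch._` in scope, any java.io.File gains these methods through implicit conversion, e.g. `(dir / "note.txt").overwrite("hello")`.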

I suggest this:

  1. A better designed, pure, monadic, side-effect free, purely-Scala IO library that is compatible with Scala native (and Scala JS?) belongs to scala.io._. No such thing currently exists.

  2. better-files belongs to something like scala.io.unsafe._ or scala.io.JavaConverters._ package. The understanding here is this library is a simple impure wrapper around Java. Interacting with Java’s IO stuff is an unfortunate truth that many developers have to deal with.

Alternatively, I am fine with keeping better-files out of the Scala Platform and we can wait for 1) to happen and people can use better-files as a non-platform library.

I am not a fan of just being a very light wrapper around Java NIO, mainly because I think we should be caring for multiple platforms and hence stuff like scala-native shouldn’t have any concepts of java-nio

There is no reason we can’t implement a facade for java.nio in scala-native, or for that matter in Scala.js for all the synchronous APIs that exist (in-memory FS, node’s sync FS, …). That’s what we’ve done with Java collections, for example, and the effort required is relatively trivial compared to designing a whole new IO library from scratch.

There are plenty of reasons to want something better than better files (betterer files?) but scala-native/scala-js compatibility is not one of them. I’ve reimplemented a large chunk of java.io.File in Scala.js in-memory to port some Scala-JVM code; took less than an hour to get quite good coverage, and I see no reason that java.nio would be different.

IMO this is a showstopper for the idea of writing a new IO library satisfying whatever monadic/async/portable/etc. criterion and putting it in the platform. The platform should be about stability and consensus, not experimentation and research. But I’m of the opinion that any candidate library should already exist and have a reasonably large following before being considered for inclusion in the “scala platform”, whichever interpretation of that term you want to take.

That’s not to say we can’t go and write a whole new Scala IO library and explore; it’s just that an add-to-platform proposal is not the place to brainstorm ideas for that. “No library, including this one, is satisfactory to include in the platform” would also be a fine answer here, but let’s not kid ourselves that there’s a whole new idiomatic, good, safe, monadic, async, cross-platform IO library just around the corner…

To the best of my knowledge, there is no IO library that meets this requirement. It’s good to have, but not a hard requirement that should be taken into consideration for the acceptance of this library – it’s hard to get right and requires lots of time and maintenance. We could figure it out later on :smiley:.

I think that async support is not necessary and can be easily done by third parties. If people do show interest in this feature, why don’t we work it out in the incubation period or later versions of this module alongside @pathikrit?

better-files already has a lot of value in what it provides to JVM users. Asking for more is good, especially if it’s followed by PRs and interesting discussions, but the mentioned features are not essential to solve the IO problem and can wait for later. What better-files buys us is a battle-tested and sane library for doing IO and interoperating with Java files (a bad evil for those in the JVM world). As data shows, it enjoys good usage among Scala developers looking for an alternative to the IO utilities in the standard library.

Have you thought about an alternative name?

I totally agree.

We need to turn our hopes/requirements into actions that help us improve what we have right now. There’s always room for improvement. I personally think that some better-files features could be improved and more type safety could be added. I’m happy to share these insights in the future and follow them with a PR. But these opinions are not conceptually incompatible with this library proposal. The incubation period exists to achieve consensus via patches, and it was added because it proved to work very well in Apache projects. I think it’s the right solution for this kind of disagreement and invites us all to collaborate.

I think the thin wrapper over NIO is the correct approach because it can be used as the basis for higher-level abstractions if others decide this is necessary. It also fills a clear need in a “mainstream Scala” style, which I think is realistic for the platform. As much as I would love to see pure monadic IO, that’s not where most of the community is right now, there is insufficient default support for this style of programming, and it’s not the direction Martin wants to go.

So this seems like a great contribution to me. :thumbsup:

Java NIO is a very low-level library; it doesn’t really make sense on some platforms (Scala.js) and could look very different on others (scala-native). If Scala.js were to add support for this library via platforms like node.js, I suspect it would be very hard to build a proper interface to Java NIO within Scala.js. For scala-native, I suspect that its use case (low-level access to C/C++ libraries) would mean that implementing Java NIO would be less than ideal.

The collections analogy isn’t apt; it’s an apples-to-oranges comparison. The collections API is designed to be high level. IO access isn’t, unless it’s a very high-level API (or unless we deliberately design the scope of this SPP to be very small, which gets to my earlier point).

java.io.File is deliberately a high-level API; java.nio is not (at least not in the entirety of its design, i.e. memory mapping of files and channels). It provides very low-level access to filesystem operations, and its idea of “async” (or non-blocking) is also quite different from what typical non-blocking actions look like in Scala.

I, for example, do not see how channels would work with Scala.js, and memory-mapped files aren’t officially supported by node.js either.

To be clear, if we want this SPP to just be very high level without async or any other features like that, then the better abstraction would actually be Java’s original java.io.File and not java.nio.

The thing is, if we hardcode our Scala IO interface to java.nio, then we have to deal with the ramifications in the future, and it would be really hard to revert. If we are using java.nio, then we are setting the expectation that it works similarly across all platforms; if it doesn’t, we get the kind of issues that Scala.js has right now (regex is a very good example).

I would much prefer that the types this IO library implements are Scala-defined types (which on the JVM can just wrap java.nio) and on other platforms do whatever they need to do. This way you can actually document that “this behavior is platform specific” and that other behavior is “platform independent”.

The point is that, in general, I think we should only be using Java types if we can say the behavior is similar on all platforms; otherwise we should create our own types.
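A minimal sketch of that idea, assuming a deliberately tiny API: the trait below mentions no Java types in its signatures, and only the JVM implementation touches java.nio. `FileHandle`, `JvmFileHandle` and `FileHandles` are hypothetical names invented for this example, and UTF-8 is assumed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical platform-neutral interface: no java.* type appears in any signature.
trait FileHandle {
  def path: String
  def exists: Boolean
  def readText(): String
  def writeText(text: String): Unit
}

// JVM-only implementation; java.nio stays a private detail.
final class JvmFileHandle(val path: String) extends FileHandle {
  private val underlying = Paths.get(path)

  def exists: Boolean = Files.exists(underlying)

  def readText(): String =
    new String(Files.readAllBytes(underlying), StandardCharsets.UTF_8)

  def writeText(text: String): Unit = {
    Files.write(underlying, text.getBytes(StandardCharsets.UTF_8))
    ()
  }
}

object FileHandles {
  // Another platform would return its own implementation from this factory.
  def open(path: String): FileHandle = new JvmFileHandle(path)
}
```

Callers that only depend on `FileHandle` would not need to change if a Scala.js or scala-native backend were swapped in behind `FileHandles.open`, which is where platform-specific behavior could be documented.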

I agree, this would be very nice, and it’s something that we could all work on during the incubation period. But to be honest, putting this as a requirement raises the barrier too much, not for this library but for others to come. Cross-platform is a feature that should be prioritized, but in cases where no other libraries have it there’s nothing to do.

Could you open an issue in better-files so we can discuss it there? I’m happy to help on this one, and perhaps @sjrd can also voice his opinion there. This effort could also help future modules to become cross-platform. It would be great to explore ways to bring together Java compatibility and cross-platform support at the same time! But my point is that this requirement should not be decisive for its acceptance, given that no other library provides this support now.

Since I was summoned …

IMO cross-platformness of libraries should be a significant concern of the Scala Platform, insofar as it is straightforward to design for any particular library.

For example, let’s say we have a math library. In all likelihood, the API provided by this library can be implemented regardless of the platform. Better: there’s a high probability that the entire source code of that library would cross-compile without effort. In that case, IMO the process should favor and encourage a cross-platform library. Even more, it should probably question why such a library would be proposed without cross-platform support.

On the other hand, I do not think we should demand cross-platformness from a library addressing a concern that is not “trivially” portable. I/O might just be the ultimate example of this problem. Yes, some things can be made sort-of cross-platform. But in general, any I/O library design will make trade-offs when supporting multiple platforms. In that case, I would avoid holding against the library the fact that it is not cross-platform. If it complies with the SP requirements for at least one platform, and is not provided for other platforms, it’s fine.

The thing is, it’s very easy for a library to (indirectly) create more issues than it solves if it sets a precedent that is very hard to change in the future. If everyone ends up coding against better-files (highly likely, since it is an IO library), then the library is forced to maintain compatibility for a very long time.

And honestly, at least for the initial release, there doesn’t need to be any support for the other platforms. The first release of the library can just support the JVM by wrapping the java.nio types without exposing them (or any other types that are needed, for that matter). For Scala.js, a library can provide opt-in support for file operations (since they’re not native to JavaScript, only to stuff like node.js), and scala-native can implement file support as it sees fit.

Although having something is usually better than having nothing, if it’s something that’s very central we shouldn’t just include it because nothing else exists yet; I don’t believe that is very prudent.

Furthermore, there are alternatives that exist, in fact quite a few. sbt-io is an example, and is probably (indirectly) one of the most used file implementations out there, considering how SBT is also used on many platforms and in many different situations.

Sure thing, the issue is here (Wrap over java.nio types instead of exposing them directly · Issue #97 · pathikrit/better-files · GitHub). I will also try to help out to get it moving.

Re java.nio, I’d distinguish two issues:

  • java.nio APIs like java.nio.file.*, which could be replicated as @lihaoyi says and as has been done before.
  • low-level APIs (like memory-mapped I/O), which might be platform-specific (currently, they work everywhere but on Scala.js); in the proposal, this seems to be newFileChannel. Maybe users should have a way to opt in to those features, possibly standardized by the platform?

Finally: do you want the Scala Platform to use scala.*, not something like scalax.*?

6 posts were split to a new topic: Should the Scala Platform modules use the Scala namespace?

This library has been officially incubated in the Scala Platform, congratulations! This means the following:

  • Library authors have complete access to Scala Platform’s infrastructure.
    • Automatic release process.
    • Drone integration with caching and several customization features.
    • Official group ids are at maintainers’ disposal. They can release under them if they so desire.
  • Library authors will take part in future decisions regarding the Scala Platform Process’s rules.
  • There will be a final vote to accept this proposal into the Scala Platform. This final vote will be done whenever library maintainers feel it’s the right moment to end the incubation.

More information in the official Scala Platform Process.

Incubation period

The incubation period is the perfect moment for gathering developers around your library, creating a community, cleaning up APIs (note that changes in public APIs cause binary incompatibility and are done every year and a half), accepting PRs, creating well-documented issues that people can work on, et cetera.

Next steps:

  • Library maintainers accept Scala Center’s Code of Conduct and use it in their projects.
  • Library maintainers decide the license they will use (they can stay with the same they have).
  • Library maintainers decide whether they endorse C4 or not.
  • Libraries have Gitter channels and pertinent CONTRIBUTION guidelines for people to submit patches/PRs!

Remember that taking decisions on these issues is extremely important for creating a community around the modules – our end goal! You can also participate in the current open debates to abstain from recommending C4 / MPLv2 or changing the major cycle to one year instead of 18 months; your opinion is highly valuable, so please comment.

At the Scala Center, we’re planning to run a series of hackathons at well-known Scala conferences to encourage people to hack on open issues of Scala Platform modules and join their community. Our goal is to boost the success of the Platform and help us get to a point where we can all benefit from a high-quality collection of modules. This is why having CONTRIBUTION guidelines, tips for developers and a getting started guide is important; consider writing them.

Better files

In the case of this proposal, there have been several recommendations by the Scala Platform Committee:

  1. Change the name (@pathikrit has already agreed this is a good idea). Does anyone have ideas for alternative names to Better files?
  2. Clean up the API, specifically the extreme use of symbols. /cc Eugene Yokota and @Ichoran can probably be more specific in an issue about what they’d like to be polished.

Also, in this conversation we’ve been discussing new ideas that would be worth implementing during the incubation period. @mdedetrich has already opened an issue to discuss ways to make Better files cross-compatible with Scala.js. Please, if you’re interested in this, get in touch with him!

@larsrh or @tpolecat: Do you know of someone who would be interested in creating a pure API based on better-files?

Regarding infrastructure, I’ll contact library maintainers to make the transition in the next days. Thank you very much for getting involved in this collaborative effort to improve the experience of Scala developers all around the world.

IMO, better-files belongs in scala.nio._ with the understanding that these are wrappers around Java’s NIO. A future pure library should go in scala.io._?

Please comment here: Move all special symbols (`~`, `>>`) and Unix DSL to File.dsl package · Issue #102 · pathikrit/better-files · GitHub
Basically, the plan is to move all the symbolic DSL to an optional import.
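As a rough sketch of what an opt-in symbolic DSL can look like in general (this is not the actual design from the issue; `PathOps`, `child` and `Dsl` are hypothetical names), the named methods live in the core object and the symbols only appear after an explicit import:

```scala
import java.nio.file.{Path, Paths}

// Hypothetical layout for illustration; not the better-files package structure.
object PathOps {
  // Core API: named methods only, always available.
  def home: Path = Paths.get(System.getProperty("user.home"))
  def child(parent: Path, name: String): Path = parent.resolve(name)

  // Symbolic syntax is opt-in via `import PathOps.Dsl._`.
  object Dsl {
    val `~`: Path = home

    implicit class SymbolicPath(val self: Path) {
      def /(name: String): Path = child(self, name)
    }
  }
}
```

Code that never imports `PathOps.Dsl._` sees no symbolic operators at all, while code that does can write `` `~` / "Documents" ``.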

Please file issues here: Issues · pathikrit/better-files · GitHub
I created a new tag related to SPP here.
Please tag any issues related to SPP with the above tag.

I don’t think platform modules should be in the scala namespace, which implies that they are part of the language core.

You are right. Well, I am open to a better name for better-files (I am terrible at naming).