Scala-json-ast SP Proposal

Note that the current Scala Platform Process document says of modules, “Platform modules should stress stability and usability alike, and enjoy widespread use in the Scala community. Modules should be of a nature that aids the goal of the Scala Platform and should have compatible licenses with the rest of the Platform modules.”

It doesn’t define the words “should” and “must” and so on, but normally the words “should” and “recommended” refer to things that are not absolutely necessary but whose absence counts against a proposal or implementation. I think it’s totally fair for the present lack of widespread use of scala-json-ast to count against it, but I don’t think it’s fair for that to knock it out of consideration without further examination, especially in light of expressed interest.

The issue is, these two points can conflict with each other, and in this case they do.

No, the need for a standard library in the Scala Platform isn’t lessened just because there’s a library that is already widely used. The batteries still aren’t included until it’s in.

I don’t think we should just dismiss that the “wide adoption” part is missing. Even if it is a chicken-and-egg situation with respect to this particular library, it nonetheless raises somewhat more concern when something isn’t battle-tested.

But when you really need this kind of batteries, it’s worth examining things that one can reasonably expect to be widely used. The question for the committee is whether it fits the goals of the Scala Platform so well that the lack of strict adherence to every single guideline (they are guidelines!) should be overlooked.

And I agree that in this particular case, adoption is a chicken-and-egg situation. It’s very clear that it is: the library authors who might adopt it have flat out said so, repeatedly.

I am not trying to completely “dismiss” the point, I am just saying its relevance is dependent on the context of the SPP.

The current design of the AST, for example, is a slightly altered version of one which many projects have used (it’s based on the original json4s/spray JSON design). So although you can claim that it doesn’t have widespread adoption, at the same time it’s not as if it’s never been used before. Another point is that its design is very trivial (as opposed to, let’s say, an HTTP library or a collections library).

Some of the battle testing has already been done, just under a different name. I also suspect that once there is an indication of its approval, library authors will use the library in the next milestone of the major release, where it will undergo a lot of testing.

My point was that the relevance of widespread adoption was dependent on context! I’m not sure whether we’re agreeing, or whether you’re going one level more meta.

Anyway, I don’t think in practice we particularly disagree. I just think that the discussion should be along the lines of “it is worth it because” rather than “it’s totally irrelevant” w.r.t. adoption.


Yeah I think we pretty much agree and we are just getting more meta, as you said.

This library has been officially incubated in the Scala Platform, congratulations! This means the following:

  • Library authors have complete access to Scala Platform’s infrastructure.
    • Automatic release process.
    • Drone integration with caching and several customization features.
    • Official group ids are at maintainers’ disposal. They can release under them if they so desire.
  • Library authors will take part in future decisions regarding the Scala Platform Process’s rules.
  • There will be a final vote to accept this proposal into the Scala Platform. This final vote will be done whenever library maintainers feel it’s the right moment to end the incubation.

More information in the official Scala Platform Process.

Incubation period

The incubation period is the perfect moment for gathering developers around your library, creating a community, cleaning up APIs (note that changes in public APIs cause binary incompatibility and are done every year and a half), accepting PRs, creating well-documented issues that people can work on, et cetera.

Next steps:

  • Library maintainers accept Scala Center’s Code of Conduct and use it in their projects.
  • Library maintainers decide the license they will use (they can keep the one they already have).
  • Library maintainers decide whether they endorse C4 or not.
  • Libraries have Gitter channels and pertinent CONTRIBUTION guidelines for people to submit patches/PRs!

Remember that making decisions on these issues is extremely important for creating a community around the modules – our end goal! You can also participate in the current open debates on whether to abstain from recommending C4 / MPLv2 and whether to change the major cycle to one year instead of 18 months; your opinion is highly valuable, so please comment.

At the Scala Center, we’re planning to run a series of hackathons at well-known Scala conferences to encourage people to hack on open issues of Scala Platform modules and join their communities. Our goal is to boost the success of the Platform and help us get to a point where we can all benefit from a high-quality collection of modules. This is why having CONTRIBUTION guidelines, tips for developers and a getting started guide is important – consider writing them.

Scala JSON AST

In the case of this proposal, there has been a lot of interest in having a published jar available under an official group id. In our last meeting, Matthew showed interest in using something along the lines of org.scala-lang.modules. I’m currently figuring out the official name for modules with the Lightbend team, but another suggestion would be org.scala-lang.platform. I’ll let you know the final group id that will be fetchable from Maven.

Regarding infrastructure, I’ll contact library maintainers to make the transition in the next few days. Thank you very much for getting involved in this collaborative effort to improve the experience of Scala developers all around the world.


@jvican Can you please provide a response to https://github.com/mdedetrich/scala-json-ast/issues/12

i.e. What GroupId should I be using? Also for the new package, I was thinking of putting it under json.ast instead of scala.json.ast

Thoughts?

Also I want to try and get this released before SBT 1.0.0 is released

Hey Matthew, thanks for the ping.

I’ve just answered your question in the issue.

I would like to announce that the first release of scalajson under the proper organization and package name has occurred. You can view the details in the GitHub repo here: https://github.com/mdedetrich/scalajson (tl;dr the dependency is now "org.scala-lang.platform" %% "scalajson" % "1.0.0-M1", and "org.scala-lang.platform" %%% "scalajson" % "1.0.0-M1" for Scala.js).

Thanks to everyone who has helped.
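For reference, the coordinates in the announcement above would go into a build roughly as follows. This is only a sketch of a `build.sbt` fragment: the coordinates are the ones quoted in the announcement, while the Scala.js line assumes the sbt-scalajs plugin is already enabled in the project, since that plugin is what provides the `%%%` operator.

```scala
// build.sbt (sketch): JVM dependency, coordinates as quoted in the announcement
libraryDependencies += "org.scala-lang.platform" %% "scalajson" % "1.0.0-M1"

// Scala.js variant. Assumption: the sbt-scalajs plugin is enabled,
// because %%% is defined by that plugin rather than by sbt itself.
// libraryDependencies += "org.scala-lang.platform" %%% "scalajson" % "1.0.0-M1"
```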


Artifacts are already in Maven Central: https://search.maven.org/#search|ga|1|org.scala-lang.platform.

I wrote down a few notes about how spray-json could integrate with scalajson here: https://github.com/spray/spray-json/issues/232. I think it would be good if the scalajson project itself gave a few suggestions on how to integrate.

I guess, so far, the simple “shallow” integration is the most likely outcome for spray-json.

I wonder if it is too late to make any substantial changes to the proposal? My suggestions would be removing the unsafe AST and not providing a concrete implementation of the AST at all but only interfaces of the AST nodes for json libraries to implement. What do other maintainers of other json libraries say? (See the discussion for circe here: https://github.com/circe/circe/issues/690#issuecomment-311956866) Is this a forum where json library maintainers could discuss these things?

I will respond in the issue on spray-json

As a maintainer of play-json I pretty much agree with @jrudolph. I think the goal here should be to provide common APIs. It’s great to have another set of models for helping with parsing, but I don’t see why that needs to be part of the Scala Platform, especially since I don’t think everyone will agree that those specific models are the right ones to use. For example, I don’t agree that Array/js.Array is necessarily the most efficient data structure to use for JSON objects.

The AST for the “safe” version is very similar to the play-json AST so we could probably easily do the conversion in play-json 3.0.

@gmethvin and @jrudolph - There are a number of choices for how to represent JSON which are not entirely arbitrary, and by leaving an API without implementation, the consumer of the API cannot have any confidence about e.g. performance. Thus, having a common structure with implementation is a considerable addition of value to consumers (not producers) of JSON ASTs. Having to use reflection to inspect whether maybe you actually got the slow or non-compliant or whatever version from Group X is not very much fun.

So I don’t think implementing API only is workable. One could leave out unsafe and just say: the safe API is the real one, and it’s your job to get it there. On the other hand, the unsafe API was built with careful attention to what you actually need during JSON parsing, without assuming things that may simply not be true. For example, JSON objects may (though it is not recommended) have multiple values for the same key. It’s very easy to decide that JSON object representation is going to be map only, and then the AST is permanently unable to represent entirely valid JSON.
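The duplicate-key point can be illustrated with a minimal sketch. The node types below are hypothetical and deliberately simplified; they are not the scalajson definitions, and only stand in for a map-backed object versus one that keeps fields as an ordered sequence.

```scala
// Hypothetical, simplified node types for illustration only.
// These are NOT the actual scalajson definitions.
object DuplicateKeysSketch {
  sealed trait Json
  final case class JsonString(value: String) extends Json

  // Map-backed object: each key can hold exactly one value.
  final case class MapObject(fields: Map[String, Json]) extends Json

  // Sequence-backed object: duplicates and field order both survive.
  final case class SeqObject(fields: Vector[(String, Json)]) extends Json

  // Fields as a parser would encounter them in {"a": "first", "a": "second"}
  val parsed: Vector[(String, Json)] =
    Vector("a" -> JsonString("first"), "a" -> JsonString("second"))

  // Keeps both fields exactly as they appeared in the input.
  val raw: SeqObject = SeqObject(parsed)

  // Converting to a Map silently keeps only the last value for each key.
  val collapsed: MapObject = MapObject(parsed.toMap)
}
```

Here `raw.fields` still holds both entries, while `collapsed.fields` holds a single entry for `"a"`, so the map-backed form cannot round-trip the original input.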

I think if you look through carefully, you’ll find that the normal AST covers the common use cases quite well, and unsafe covers the possible use cases quite well. As someone who might like to write stuff downstream from the API, that’s as much as I want to have to think about. I don’t want to have to worry that play-json and spray-json might secretly have different behavior once I’ve gotten a supposedly common API for them.

So I know as maintainers of libraries that produce JSON ASTs, it seems like a pain to have to use just one, but if you don’t, I don’t know why I as a consumer would care about a common AST at all.


Thanks for the explanation @Ichoran. I appreciate the work that has already gone into the current version. The problem I see is while I might want to use particular parts of the implementation (like the numericStringHashcode), I don’t want to be forced to use it. Performance and API requirements are different for different use cases and that’s one of the reasons that there are so many different JSON implementations.

So, it seems like two steps are being taken at once: the first step would be to see whether anyone at all can agree on an API (and maybe provide tools or building blocks of an implementation), before going a step further to see whether anyone can also agree on (parts of) an implementation.

For me, that’s not the most important part about such an integration library. As a user, the most important part is that code written against one library can interact with code that was written against the other library. As a library author, it’s important that integration with other libraries is possible without being constrained in implementation choices.

Reality seems to disagree as there are multiple JSON libraries with different kinds of APIs and users seem to care enough not to have predominantly chosen a single one.

Users weren’t really given the choice because these JSON libraries were part of a wider framework (i.e. play-json/spray-json/lift-json). In fact, the only framework-free JSON libraries were json4s (and we were actually approached by the spray guys years ago to make a common AST because spray wanted to use json4s-ast) and circe, which is quite recent.

I released sjson-new 0.8.0-M2 if anyone is interested in Jawn binding for parsing, Spray-origin pretty/compact printers, or backend-agnostic codecs for ScalaJSON 1.0.0-M2.

I don’t really understand the goals of the unsafe implementation. It seems to me the best argument for having it is that it supports some edge cases in the spec like duplicate keys, but it also supports some invalid things like NaN for JSON numbers that I don’t want. And I don’t see the value of having two steps for parsing, first creating unsafe.JValue then converting to JValue. That is just going to make things slower.
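As a small illustration of why NaN is not a valid JSON number: the number grammar in RFC 8259 is `[ minus ] int [ frac ] [ exp ]`, which has no spelling for NaN or infinity. The sketch below checks a string against that grammar; the object and method names are made up for this example and are not part of scalajson.

```scala
// Hypothetical helper for illustration; not part of scalajson.
object JsonNumberSketch {
  // JSON number grammar from RFC 8259: [ minus ] int [ frac ] [ exp ]
  // int is either a single 0 or a non-zero digit followed by more digits.
  private val JsonNumber = """-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?""".r

  // True only if the whole string is a number in the JSON grammar.
  def isValidJsonNumber(s: String): Boolean =
    JsonNumber.pattern.matcher(s).matches()
}
```

Under this check, strings such as `1.5e10` and `-0` pass, while `NaN`, `Infinity` and `01` are rejected.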

For all practical purposes the safe API accomplishes everything that is necessary for an interoperable JSON library. If someone wants to do something unsafe or non-interoperable they can use another API, but I doubt we are going to have a real consensus on what is the best API for doing that, and I think it’s distracting us from the more important goal.

I think it might be nice to have just an interface you could implement instead of a full implementation, but I acknowledge there are some potential issues with that, since it then allows implementors to change the way equality works or dramatically change performance characteristics. As long as enough people agree that this AST makes sense, I’m ok with either providing integration code or switching over completely to the scalajson implementation in the next major version of the various JSON libraries.

The JSON standard specifies multiple ways you can parse JSON, and some of those ways aren’t considered sane (i.e. duplicate keys in a map). The goal of unsafe is being able to parse JSON according to the specification. Also if you are parsing JSON that you always know will be valid JSON (i.e. from the jsonb column in a postgres database) then you can use unsafe.

The fact that unsafe always maintains the original format of the JSON means you can use it to do proper checksum comparisons too.

I wish this was the case, but unfortunately the JSON spec has a lot of corner cases. Ordering of keys makes sense (although it’s annoying in terms of data structures, because making an immutable map that also preserves insertion order is hard); however, things like duplicate keys in a JSON object and how number handling is done make things a bit more complicated (unfortunately).