Scala-json-ast SP Proposal

@gmethvin and @jrudolph - There are a number of choices for how to represent JSON which are not entirely arbitrary, and by leaving an API without implementation, the consumer of the API cannot have any confidence about e.g. performance. Thus, having a common structure with implementation is a considerable addition of value to consumers (not producers) of JSON ASTs. Having to use reflection to inspect whether maybe you actually got the slow or non-compliant or whatever version from Group X is not very much fun.

So I don’t think implementing API only is workable. One could leave out unsafe and just say: the safe API is the real one, and it’s your job to get it there. On the other hand, the unsafe API was built with careful attention to what you actually need during JSON parsing, without assuming things that may simply not be true. For example, JSON objects may (though it is not recommended) have multiple values for the same key. It’s very easy to decide that JSON object representation is going to be map only, and then the AST is permanently unable to represent entirely valid JSON.

I think if you look through carefully, you’ll find that the normal AST covers the common use cases quite well, and unsafe covers the possible use cases quite well. As someone who might like to write stuff downstream from the API, that’s as much as I want to have to think about. I don’t want to have to worry that play-json and spray-json might secretly have different behavior once I’ve gotten a supposedly common API for them.

So I know as maintainers of libraries that produce JSON ASTs, it seems like a pain to have to use just one, but if you don’t, I don’t know why I as a consumer would care about a common AST at all.

2 Likes

Thanks for the explanation @Ichoran. I appreciate the work that has already gone into the current version. The problem I see is while I might want to use particular parts of the implementation (like the numericStringHashcode), I don’t want to be forced to use it. Performance and API requirements are different for different use cases and that’s one of the reasons that there are so many different JSON implementations.

So, it seems like two steps are taken with one: first step would be if anyone at all can agree about an API (and maybe provide tools or building parts of an implementation) before going a step further to see whether anyone can also agree on (parts of) an implementation.

For me, that’s not the most important part about such an integration library. As a user the most important part that code written against one library can interact with code that was written against the other library. As a library author it’s important that integration with other libraries is possible without being restrained on implementation choices.

Reality seems to disagree as there are multiple JSON libraries with different kinds of APIs and users seem to care enough not to have predominantly chosen a single one.

Users weren’t really given the choice because these JSON libraries were part of a wider framework (i.e. play-json/spray-json/lift-json). In fact, the only framework free JSON libraries were json4s (and we were actually approached by the spray guys years ago to make a common AST because spray wanted to use json4s-ast) and circe, which is quite recent

I released sjson-new 0.8.0-M2 if anyone is interested in Jawn binding for parsing, Spray-origin pretty/compact printers, or backend-agnostic codecs for ScalaJSON 1.0.0-M2.

I don’t really understand the goals of the unsafe implementation. It seems to me the best argument for having it is that it supports some edge cases in the spec like duplicate keys, but it also supports some invalid things like NaN for JSON numbers that I don’t want. And I don’t see the value of having two steps for parsing, first creating unsafe.JValue then converting to JValue. That is just going to make things slower.

For all practical purposes the safe API accomplishes everything that is necessary for an interoperable JSON library. If someone wants to do something unsafe or non-interoperable they can use another API, but I doubt we are going to have a real consensus on what is the best API for doing that, and I think it’s distracting us from the more important goal.

I think it might be nice to have just an interface you could instead of a full implementation, but I acknowledge there are some potential issues with that, since it then allows implementors to change the way equality works or dramatically change performance characteristics. As long as enough people agree that this AST makes sense, I’m ok with either providing integration code or switching over completely to the scalajson implementation in the next major version of the various JSON libraries.

The JSON standard specifies multiple ways you can parse JSON, and some of those ways aren’t considered sane (i.e. duplicate keys in a map). The goal of unsafe is being able to parse JSON according to the specification. Also if you are parsing JSON that you always know will be valid JSON (i.e. from the jsonb column in a postgres database) then you can use unsafe.

The fact that unsafe always maintains the original format of the JSON means you can use it do proper checksum comparisons too

I wish this was the case, but unfortunately the JSON spec has a lot of corner cases. Ordering of keys makes sense (although its annoying in terms of datastructures because making an immutable map that also preserves insertion order is hard), however stuff like duplicate keys in a JSON object/how number handling is done makes things a bit more complicated (unfortunately)

The JSON spec allows implementations to allow duplicate keys, but I think it’s safe to assume you’re only working with “interoperable” JSON as defined by the spec, for example, from http://rfc7159.net/rfc7159#rfc.section.4

An object whose names are all unique is interoperable in the sense that all software implementations receiving that object will agree on the name-value mappings. When the names within an object are not unique, the behavior of software that receives such an object is unpredictable. Many implementations report the last name/value pair only. Other implementations report an error or fail to parse the object, and some implementations report all of the name/value pairs, including duplicates.

So my interpretation is that as an implementor of a JSON parser, you’re not required to support duplicate keys. Considering that many implementations do not support duplicate keys, including the original JavaScript implementation, I can’t think of a good practical reason to allow that.

But I recognize that other people have different opinions on this, and that’s exactly why I think the goal here should be to provide a common interface (and perhaps a simple default implementation) to provide the interoperability rather than trying to completely replace the AST.

Unfortunately its probably a bit too late for that now. This issue has been open for literally a year, and everyone (including Play) knew the intentions of what scala-json-ast. In fact even a PR was made against play-json for this reason. As @eed3si9n pointed out in Release 1.0.0 ¡ Issue #13 ¡ mdedetrich/scalajson ¡ GitHub, a lot of these discussions are going round in circles and these points have been discussed and decided on a very long time ago

I don’t recall the PR you made against Play. I saw the branch you linked in https://github.com/playframework/play-json/issues/76, but as far as I can tell that was never submitted to either play-json or playframework in the past.

There wasn’t a PR submitted, I just made a branch from a fork (you can see it here https://github.com/mdedetrich/playframework/tree/scalaJsonAstIntegration).

It wasn’t submitted as a PR because ScalaJSON library wasn’t properly released yet (it wasn’t accepted into the scala-platform at the time). The point of the branch was to demonstrate its actually quite easy to integrate the AST while maintaining mostly source compatibility (iirc that branch also passes the JSON tests)

YAML is an extension to JSON, among others aimed at human readability.

Could we also have a standard Scala YAML AST module that extends the standard Scala JSON AST?

Current document with the current scalajson issues https://docs.google.com/document/d/1Xcaj7Alvmhch8nUf31L0UzELd5Pf8tzf2x6XbwQCeUQ/edit?usp=sharing