Require C4 process for libraries?

sjrd · November 30, 2016, 8:55pm

I have read most of the draft documents, and one overall idea bothers me. I feel like, as a library author, including my library in the Platform would be like abandoning it in the hands of a community, losing control over it. After thinking quite a lot about this today and this evening, I think most of my feeling boils down to C4.

Honestly, I do not like C4. C4 basically forbids code review by design. Yes, you have to provide a “correct” patch, and that implies adhering to coding style, passing tests, etc. But it gives no room for deep reviews to discuss and improve the design of a patch. According to C4, if a patch satisfies its mathematical criteria, it SHALL be merged, and rapidly to boot. Quoting from Section 2.4:

.15. Maintainers SHALL NOT make value judgments on correct patches.
.16. Maintainers SHALL merge correct patches from other Contributors rapidly.

These two bullets, by definition, go against deep reviews.

C4 also overall removes any notion of a library author, which leads to the feeling that you are abandoning your library to the system. And C4 effectively removes all control from the hands of a library author. Instead, it puts the control in the hands of contributors (not even maintainers).

C4 might work for ZeroMQ. It might even work for lots of different projects. I do not think it will work for most Scala libraries. Why? Because API design in Scala is hard. We can achieve amazing library APIs in Scala thanks to its power, but this requires careful thought. Not uncontrolled patch “wars”. Design is hard precisely because we have so much power and so many ways to address a concern; finding the right one takes thought and discussion. Moreover, binary compatibility is hard; designing an API addition in a way that it will be able to evolve in source and binary compatible ways later is hard. If something is merged without thinking it through, there is a risk that we are stuck with designs we cannot evolve.

To be completely honest, if C4 is imposed on my library the minute it enters the Platform, I will be more than happy to keep my libraries out of it. Maybe I’m the only one who feels that way, and then it doesn’t matter. But if several library authors voice the same concern, there might be something to change.

How to fix?

I strongly suggest that the Process not impose C4 on all libraries of the platform. Library “founders” (authors) should have the choice regarding the kind of development process they want for their libraries. If they have made it so far, there’s a good chance they know what they’re doing and want to keep it that way. The Process could suggest C4 as a default choice for authors without a strong opinion.

None of what I have read in the draft documents suggests that C4 is necessary for high-level goals of the Platform, so dropping this restriction should be fine, AFAICT. I think this would make the Platform Process much more friendly to library authors.

Ichoran · November 30, 2016, 9:04pm

I share your concerns, especially with regard to the existing standard library which could rapidly become a cluttered and inconsistent mess.

Would it still work if #15 were dropped and/or if correctness were extended to include adhering to a design goal document? That is, if a design goal of a module is to provide the minimal functionality that gives expressive power, you might very well say that adding a collectNot method, while somewhat convenient, is not correct by virtue of missing the design goal (in addition to being something of a misuse of PartialFunction).

For brand new projects this wouldn’t work, but I think the idea is that things that go in should be reasonably mature. It ought not be too much to ask to have a short document describing the philosophy behind the module, against which technically correct but design-incorrect patches can be measured.

As an example, the design document for the existing Scala collections would state that methods should appear as high in the hierarchy as they reasonably make sense so that, for instance, a patch implementing sliceUntil[A](Int, p: A => Boolean) on Vector alone would be rejected.

sjrd · November 30, 2016, 9:11pm

Maybe. It might be tricky to write such a document in sufficiently rigorous terms that it encompass all the necessary “rules”. It would be instructive–although extremely time-consuming–to try one’s hand at this game: try and come up with a rigorous document for an existing library, then “apply” it retroactively to a large-ish number of (external) closed pull request, to see if the document would have taken the same decision as the reviewer, including the comments that led to improvements of the PR.

Blaisorblade · December 2, 2016, 11:58am

I was also concerned about that. What’s crucial, and not stated, is why is C4 proposed, and whether there are other approaches. I’ve never seen a process imposed by other platforms. I suspect the goal is that the community can improve the libraries that are included, but imposing C4 might be a bit extreme. Maybe the committee can make sure the libraries have a working approach for contributor engagement?

To be sure, I think C4 is cool, and I’ve seen a pathology that clause is probably meant to address: working and useful patches held up for years because of “we want to rewrite the entire codebase to do this properly”. That’s a standard (anti)pattern in Haskell land. I’d like to see again Hjentjens rationale for that though.

In fact, we can observe the same pattern on the Scala library — think of the design issue with mapValues. We have multiple plausible solutions and nothing happened. If C4 is ever appropriate, it should be applied there, and I’d bet we’d have something better by now. This way, scala-library would lead by example.
And I’m not sure the standard library has enough active maintainers that take ownership to make it work (or that have the time for it), so I’m not sure @sjrd’s concerns about authorship apply.

C4 already has provisions for API compatibility (which could be tuned maybe) so it’s not insane. I mean, ZeroMQ has had libraries and compatibility problems, and solved them through C4.

But I don’t think the platform can succeed if it imposes such a specific process over libraries. It’s also not proven that you can introduce this process in arbitrary existing communities without destroying them—if a library became worthy of inclusion it must have something right. I’m not sure the danger is as high as you’d fear (the pull request hack is more extreme, yet I haven’t seen it destroy any project), but at least the perceived danger is an obstacle.

Finally, as with all social processes, I’d expect that C4 has a lot of magic sauce in how you use it which is (hopefully) in Hjentens docs. And imposing it from outside the community sure violates that “magic sauce”.

jvican · December 2, 2016, 2:05pm

I see how you could have drawn these conclusions from these two C4 clauses, but I don’t think “C4 forbids code review by design”. In fact, if module maintainers have problems with a patch, they are encouraged to submit other patches to fix it or to “guide” the original contributor about a good design (being more of a mentor). C4 specifies that “correct” patches shall be merged rapidly, but everything hinges on the definition of correct. To consider a patch “correct”, the Patch Requirements section specifies that:

2.3.2 A patch SHOULD be a minimal and accurate answer to exactly one identified and agreed problem.

So, in this case, an identified and agreed problem is defined mainly by the module maintainers, that is, you. I think that along with the problem definition, you can also point out concrete API or application requirements, and some ideas to tackle the problem.

I agree that the clauses that you mention may sound too harsh, and we should at least lower its tone. I’m happy to remove them also and find an alternate solution, but I don’t see why C4 should not be the contract that specifies the contribution guidelines for all the Scala Platform. Having a unique contract for contributors to learn instead of several and different rules that change from module to module is an improvement over the status quo. And, also, I think it’s interesting to note that C4 has worked not because it was applied to concrete programming languages, but because the social rules were reasonable for both maintainers and contributors.

This is exactly how C4 was born, although I understand that you want this experiment to be performed in Scala land. For those wondering about the rationale behind C4, check: https://hintjens.gitbooks.io/social-architecture/content/chapter4.html.

jvican · December 2, 2016, 2:16pm

The benefit of defining the rules upfront is predictability. Everyone knows what to expect from other people. Remember that these rules are up for discussion. At this stage of the debate, I’m not sold on the argument that we should just remove C4 completely because two clauses are problematic, we can always change the rules as proposed by @Ichoran.

A cool thing that Pieter explains about his process is that it didn’t require him to “enforce” or “use” it, it automatically regulated people in the Community and kept bad actors out of the game without his intervention. I’m unsure if there’s such a thing as a “magic sauce” that makes the process effective.

SethTisue · December 2, 2016, 3:41pm

I agree — very strongly. Sébastien and Paolo have expressed the reasons well.

I think this should definitely not be part of the SPP at launch time — unless it’s included merely as a suggestion of a possible process that individual projects could consider adopting if they want.

I don’t think “require C4, but with individual clauses we don’t like removed or amended” is an improvement on “require C4”. The problem here is requiring something contentious and complicated. I think the platform should not do this. Fiddiling with the details of what’s required doesn’t fix that.

SethTisue · December 2, 2016, 3:50pm

P.S. To be clear, it isn’t that I don’t like C4 in particular. (I have never participated in an open source community that was governed by it.) I’m objecting to imposing a required process at all — except any bare minimum that is truly necessary (e.g. accepting the code of conduct).

Blaisorblade · December 2, 2016, 8:12pm

I think, in practice, I’d add to that being on GitHub and using pull requests to that. That’s a lower bar, but I think we want the platform to allow contributions.

I agree having clear processes is important. I also think that rules like “be on GitHub, review PRs” and so on are important and would get sufficient buy-in.
It’d be cool in a sense to have uniform processes if possible; but beyond some point, this requires “herding cats”.

That’s one reason why the standard library (and other official Scala repository) would have to set the example. I’m honestly convinced that C4 would solve some serious problems we do have. Introducing that process will still take some serious cat herding, but

If that fails, there’s no chance of having this in the Scala Platform
If that succeeds, we’ll gain experience on how C4 works for us, and other projects might follow the example.

You might be right—but since we have seen other processes been misapplied disastrously*, I’d expect it might be the case for C4. For instance, the concept of “correct patch” has lots of leeway (as seen in this thread) that allows for dysfunctional instances.
I’ve also skimmed a bit Chapter 4 - The ZeroMQ Process: C4 · Social Architecture, but I’m not convinced by the evidence that it “just works”. But I’m convinced by the arguments that it’s worth trying on the Scala Library.

@sjrd Code review still happens through further patches, which might even revert the existing ones.

Ichoran · December 2, 2016, 8:27pm

I would prefer if C4 were made recommended rather than optional. That way if someone doesn’t feel like being a guinea pig, they don’t have to adopt it.

If you want to demonstrate that C4 is clearly the way to go, I’d suggest adopting it (and highlighting it) for all the code that goes into the infrastructure to help SPP authors.

And it is possible that some modules will “fail”, in that they will go unmaintained/poorly maintained. We need to solve that problem anyway, C4 or no, and if they fail because of a lack of C4, a rescue mechanism could impose it (if its lack was really a major problem) among other steps, while an abandonment mechanism could allow space for an alternative library (whether fork or simply a different one) to replace it; if C4 is important in maintaining high-quality modules that can be improved over time, then it will naturally be selected for.

Blaisorblade · December 2, 2016, 8:56pm

Thinking back to one point:

C4 doesn’t allow adding features willy-nilly, you’d need to demonstrate a problem and seek consensus on its diagnosis first.
OTOH, I’m not sure the current standard library can claim that much consistency.

Blaisorblade · December 2, 2016, 8:57pm

That’s also an important point (which might deserve a separate thread)—how do libraries get added/replaced/removed? The SPP already mentions this, so kudos:

Those modules that fall unmaintained are removed in every major release.

but I suspect this might need to be refined to ensure smoother migration (similar to the deprecation policies). If, say, HTTP library need be replaced, how would the process look like? Will there be a drop-in replacement? If not, I’d love a way to migrate gradually, such as making the new library available for the old platform or something.
In fact, would the platform ever contain 2 HTTP libraries? If yes, don’t I have to pick again? If no, how would I migrate?

odersky · December 3, 2016, 12:24pm

I think, generally we need to give much more power to contributors of the platform, and limit the power of the review committee. Otherwise, why would anybody contribute to the platform? What’s the incentive? The way I see it, the oversee committee is a good first step. But it should largely step aside and let the platform be defined by the people of commit to it and maintain it, once there is a group of such people.

In that sense, platform and SIP are really fundamentally different. SIP is a standardization effort. We want to make sure that all Scala implementations move in the same direction, and that that direction is communicated clearly. The platform is “just” a convenience thing: A collection of modules that are kept reasonably coherent and that people might find useful. There is no assumption that it should be the only one of its kind.

jvican · December 3, 2016, 10:48pm

The SPP Committee only decides the library to be included in the Scala Platform and “incubates” it. That’s all its control over modules. If that process is successful, it’s immediately released with the next Platform version. Details here.

No, the idea is that one need, one module. For instance, let’s hypothesize and say that the SPP Committee decides that it needs test frameworks, one for property-checking and the other one for unit testing. In this case, both scalacheck and scalatest would be eligible because they solve different needs.

I think that once a module falls unmaintained, the process would look like this:

The SPP Committee analyzes the situation. Why has the module fell unmaintained? Was it used?
1.1. If it was not used, the SPP Committee can decide not to replace it.
1.2. If it is used, the SPP Committee calls for an alternative (and a member that submits a SPP proposal).
If there are several alternatives, the Committee starts the review process as usual.
If there is only one alternative, it automatically incubates it to be released as soon as possible.

This process would be executed before removing the unmaintained module from the Platform.

I’m starting to see this as a viable solution. We could just recommend it, and I like your idea of enforcing it as a “rescue mechanism” if a library falls unmaintained. However, I want to highlight that C4 would be more useful preventing these issues if it was adopted from the beginning.

These are the reasons why we decided that C4 was a good idea:

It prevents uncomfortable situations with bad actors. Our community has already had issues with this in the past. Prevention is better than cure.
It gives reasonable defaults to lead a project and engage contributors. In fact, it gives contributors some guarantees that make it more likely that they continue their contributions and even become maintainers. Library authors will not stay forever, they will come and go, so having a mechanism to elect them is important.
It makes contributions easy for outsiders. Imagine a company that wants a quick way of assigning a developer to work on a project (or maintain it) because they depend on it in production. If there are concrete rules to do this, following the process is straightforward. Maintainers will need a hand… sooner or after, to take care of their projects. We cannot completely rely on them because their absence and the lack of motivation in contributors will lead our projects to fall unmaintained.

Having said this, if people still feel that C4 is an inconvenience to library authors (to be honest, I just see it as an advantage that makes my life easier) we can indeed recommend it and not require it. But I’m dubious if this will interfere with the end goals of the SP: creating communities around the Platform and helping Scala developers by providing a stable collection of modules.

Ichoran · December 4, 2016, 7:48pm

Requiring C4 will interfere if it prevents a critical mass of modules being added to the Scala Platform. If I had something that I thought might warrant inclusion (I do not), I would seriously think twice if I had to adopt C4 instead of whatever was already working for me.

Concerns about bad actors and engaged contributors are irrelevant if there isn’t any acting or anything to contribute to. (Besides, the code of conduct covers most of the “bad actor” stuff. C4 on top doesn’t seem like it would do much more.)

If C4 is that big an advantage, it should be able to prove itself and be adopted by the initiative of the module maintainer(s). If it’s not that big an advantage, it’s not that important whether it’s used or not.

Ichoran · December 4, 2016, 7:59pm

Problem descriptions are rarely so intricately detailed as to prevent inconsistency when implemented piecemeal, and it is very easy to implement library functions piecemeal. That the current library isn’t great in this regard should be a cause for concern about C4’s approach, not a reason to shrug and say, “how much worse could it be?”

It could be much, much worse.

Suppose we decide that it makes sense to rename takeWhile to takeUntil and to add a takeTo also, mirroring the until/to distinction on ranges. One person submits a Vector patch, another submits one for List, and nobody bothers ArrayBuffer. Now what? Or we get separate implementations on half the collections, then someone who understands CBF better builds it into IteratableLike but with a different signature.

This sort of stuff is, in the long run, just awful, and it is very difficult to predict in advance everywhere it might arise. You can list some general principles: (“we favor maintainable code and code reuse over custom solutions whenever practical”), but it absolutely has, IMO, to be the review process where these things are applied and patches are refined or rejected.

It’s not 100% clear whether this is counter to the literal wording of C4 but it is counter to the “it fixed a problem, merge it” attitude.

Blaisorblade · December 5, 2016, 9:53am

Examples of C4 in practice seem to confirm your concern. For instance, in this example:

I don’t implement a matching zmq_getsockopt because “minimal” means what it says. There’s no obvious use case for getting the value of an option that you presumably just set, in code. Symmetry isn’t a valid reason to double the size of a patch. I did have to document the new option because the process says, “All Public Contracts SHOULD be documented.”

I’ve also looked at the libzmq pull requests and most have little discussion; though most PRs are about implementations, not APIs. In fact, discussions are in issues on problems (random example: More explicit docs for setsockopt() with strings regarding NUL-termination · Issue #2169 · zeromq/libzmq · GitHub).

I’m still not sure your scenario is so tragic for real humans — but that’s, again, an experiment to be done. If it attracts enough contributors, people would either revert a toxic patch or improve it. The API additions would have to be marked as @experimental.

Either way: we seem to be slowly converging on C4 for platform libraries, but I still think C4 for the standard library might be worth trying.

jvican · December 14, 2016, 3:26pm

7 posts were split to a new topic: Why is MPL recommended to module maintainers of the Scala Platform?

Blaisorblade · December 14, 2016, 3:27pm

A post was merged into an existing topic: Why is MPL recommended to module maintainers of the Scala Platform?

jvican · December 14, 2016, 3:45pm

A post was merged into an existing topic: Why is MPL recommended to module maintainers of the Scala Platform?