Mapping over Set - never (?) ending story

I know this subject was discussed hundreds times.

And this shows something is very wrong with core assumptions. Please take a look at
scala-collection-strict project - it contains sample code and description of how I believe this could be addressed.

Long story short: I believe Set builder should throw exception on duplicates rather dropping them silently. Same for Map. We all are used to current behavior (its been there since java 1.0) but maybe its high time to at least discuss if this was the best choice.

[added to clarify]

Of course old behavior has to stay available - it’s still needed to be able to easily include element in the set, regardless if it was already there or not. But I’d like to propose to separate those apis. Let Set.add() add elements, and Set.incl() ensure element is included.

  val a = Set("a", "A")
  a.incl("a") // Set("a","A")
  a.add("a") // duplicate element exception
  a union Set("a", "b") // Set("a","A","b")
  a concat Set("a", "b") // duplicate element exception
  
  a.map(_.toUpperCase) // duplicate element exception
  Set("a", "a") // duplicate element exception

m.

1 Like

I mean, that’s a reasonable desire – but there is tons of real-world code that relies on the fact that adding a duplicate entry to a Map replaces the existing entry with the same key. It’s a very useful tool.

Adding a StrictMap is plausible, but I don’t think changing the semantics of Map is.

6 Likes

Multiset is for retaining multiple elements. (You can check for duplicates in input.)

In this API, add tells you whether the element was added, so that you can fail if you like.

The question is whether Set.apply should be more helpful. The “obvious” example is duplicate literals, which could be linted. As a lint, it doesn’t change runtime behavior. It’s just an analysis of the source text.

An “obviously” useful extension method would be addStrictly which would take a collection and report false if any element was not added. Alternatively, return elements not added.

I’m not eager to see an API that throws.

There is also the gotcha about not assuming anything about set elements that doesn’t belong to its key. The reason it’s OK to discard a duplicate is that you don’t lose information (except that there were two of them).

It happens that I needed this facility recently. To construct a map, I wrote a one-liner named adderall (thanks for that reminder) that throws on put with previous element.

An alternative would be myMap.withDuplicate(kv => ()) which would be respected by addOne the way withDefault is respected by apply on retrieval.

Finally, the other question is whether it’s possible to exclude certain operations deemed unsafe. It’s too bad there aren’t disjoint interfaces for add and addOne. Of course, I’ve already forgotten which is which. The version that returns a boolean or previous is safe. With -Wnonunit-statement, you are required to cope with the result status.

A useful plugin would take a list of symbols to deprecate, and would tack on the deprecation as the symbols are created. Then you could get the compiler to report any API you wish to ban, such as unsafe add.

2 Likes

Is there any programming language’s build-in set with this behavior?

3 Likes

This exactly my point - I somehow feel everyone just repeats that over and over without questioning.
With solution proposed you get both - possibility to conveniently build set by dropping duplicates, and default strict behavior of failing when user-requested operation (of adding element) does not succeed due to unique constraint.

We just need two operations - add/concat separated from incl/union.

Maybe this is good direction - to try to think of Set in terms of constrained collection of unique elements rather than focusing on set algebra.

But set algebra is good !

Personally algebra is the thing I need in my programming languages, if I do something, it does not matter the context, it does the same thing.
If I add an element to a Set, then I know that set will contain that element, and I know it will always succeed !
(The reasoning is the same for maps)

If I care about the previous data, I should query it before I update the map.

I am all for questionning assumptions, and I did !
I did question it when I was learning Scala, but I came to the conclusion that this was the right way.

3 Likes

And if I reword it to “If I include element to a Set”?
It still feels right, isn’t it?

Its all about using name add for operation that sometimes does not add anything.
You are not adding element. You are referring to operation that ensures element is included in the set, adding it if needed. And proposed solution allows it - just tries to clean up this naming mess.

Perhaps this would be useful for mutable sets because then you assume you are changing the Set when adding one. The same behaviour would then be expected when removing elements: fail when the to-be-removed item does not exist.
For immutable Sets the failing would be a weird behaviour. If I do val newSet = someIntSet + 2 I will get back a set which includes the number 2, guaruanteed. If this set is existing or new does not matter.
The same goes for deleting: val newSet = someIntSet - 2 is giving me a set which guarantees it does not contain the number 2.

@ThijsBroersen

Yes, this is exactly how I implemented it. Adding throws and removal also throws.
And samples you wrote become:

 val newSet1 = someIntSet.incl(2)
 val newSet2 = someIntSet.excl(2)

Clear, isn’t it? We could have symbolic alias for incl/excl if one really needs it: val newSet1 = someIntSet | 2. But lets keep add/+ for adding, and incl/| for set algebra.

I think throwing exceptions is a bad practice. Maybe a two step insertion is better? If there is a conflict, let the programmer decides what to do by a return object. Let it be something like a transaction object with methods “cancel” and “acknowledge”.

I’m not so sure why throwing exception when someone tries to execute unallowed (for current conditions) operation would be considered bad practise.

List("a", "b").updated(3, "c") throws.
List.empty.head throws.

The exception means that something irrecoverable happens. Otherwise it is just another unpredictable type of a return value. Unpredictable becouse it is not in the function signature. If I am not sure that list is empty or not, I will use headOpt instead of trying to catch the error

Exactly.

If I’m not sure if element is already in set, I will use incl() instead of add() and trying to catch the error.

But if I’m sure element should not be there, I’ll use add() and expect to get notified if my assumption was violated - probably by some error in the code I should fix.

What if do it this way:

mapBuilder.keyRef(key) match
  case ref: Free => ref.put(newValue)
  case Occupied(oldValue) => throw exception

or

mapBuilder.keyRef(key) match
  case ref: Free => ref.put(newValue)
  case ref: Occupied(oldValue) => ref.put(newValue combine oldValue)

or (for the simplest case)

mapBuilder.keyRef(key).put(newValue)

I believe such way could cover all whims and wishes

Or let the KeyRef trait has 3 methods: insert, upsert, delete to make the code less verbose… It could also reduce the time for searching the existent value if you need to combine.

One thing I occasionally miss is an efficient way to known if an element was added to an immutable set. I don’t want to check first because that’s two lookups instead of one.

I don’t think this is valid (but I wish it were):

val newSet = set + elem
if newSet ne set then ... 

I suppose if newSet.size > set.size then ... is a possibility, assuming size is a cheap operation.

Acutally, your code works as your wish.

Is that guaranteed by the API? The documentation states:

Creates a new set with an additional element, unless the element is already present.

(which is ambiguous), and also:

Returns a new set that contains all elements of this set and that also contains elem.

(which would suggest a new set is always created).

For me to use my “wish code”, I’d need the API to explicitly state that this is returned if an element is already present, for any immutable.Set implementation.