New collection: SortedList

#1

It’s very easy to implement - just copy from SortedSet. The following is a mutable example for 2.11.8, the version I’m currently working on. I will post a complete example when I finish building it.

package scala.collection.mutable

import scala.collection.generic.MutableSortedSetFactory
import scala.collection.immutable.RedBlackTree.Tree
import scala.collection.immutable.{RedBlackTree => RB}
import scala.collection.{GenSeq, SortedSetLike}
import scala.runtime.ObjectRef

/**
  * A mutable SortedList heavily based on SortedSet which "is not designed to enable meaningful subclassing".
  *
  * Based on: https://github.com/scala/scala/blob/3cc99d7/src/library/scala/collection/mutable/TreeSet.scala
  *
  * @author Mygod
  */
//noinspection ReferenceMustBePrefixed
class SortedList[A] private(treeRef: ObjectRef[RB.Tree[A, Null]], from: Option[A], until: Option[A])
                           (implicit val ordering: Ordering[A])
  extends AbstractSeq[A] with LinearSeq[A] with IndexedSeq[A] with GenSeq[A] with BufferLike[A, SortedList[A]]
  with SortedSet[A] with SetLike[A, SortedList[A]] with SortedSetLike[A, SortedList[A]] with Set[A] with Serializable {
  def insert(elem: A): Boolean = {
    val old = size
    this += elem
    old < size
  }

  // AbstractSeq
  override def apply(i: Int): A = RB.nth(treeRef.elem, i).key // Should i out of range result in NullReferenceException?
  override def length: Int = size

  // IndexedSeqLike
  override def update(idx: Int, elem: A) {
    val x = apply(idx)
    if (!ordering.equiv(x, elem)) {
      remove(x)
      this.add(elem)
    }
  }

  // BufferLike
  override def +=:(elem: A): SortedList.this.type = +=(elem)  // prepending is the same as the list is sorted
  override def insertAll(n: Int, elems: collection.Traversable[A]): Unit = appendAll(elems)
  override def remove(n: Int): A = {
    val result = apply(n)
    remove(result)
    result
  }

  // Optimized methods
  override def clear(): Unit = treeRef.elem = null
  override def indexOf[B >: A](elem: B, from: Int): Int = lookupIndex(treeRef.elem, elem.asInstanceOf[A]) match {
    case result if result < from => -1
    case result => result
  }
  override def last: A = RB.greatest(treeRef.elem).key

  // Tree helper methods, TODO: expand to non-recursive form?
  private def lookupIndex(tree: Tree[A, Null], x: A): Int =
    if (tree eq null) -1 else ordering.compare(x, tree.key) match {
      case 0 => RB.count(tree.left)
      case cmp if cmp < 0 => lookupIndex(tree.left, x)
      case _ => lookupIndex(tree.right, x) match {
        case i if i < 0 => i
        case i => RB.count(tree.left) + 1 + i
      }
    }

  // Copy everything else from TreeSet
}
#2

Thanks for the suggestion. @julienrf | @szeiger are probably the good people to have a look at this. What do you think guys?

@Mygod By the way, have you benchmarked this implementation of sorted list?

#3

Hi, I see no reason to not include it, indeed!

#4

Oh finally somebody replied. I thought this thread is going to die.

Here’s the version I’m using in my app: https://github.com/shadowsocks/shadowsocks-android/blob/17bedaf/src/main/scala/scala/collection/mutable/SortedList.scala

Apparently one class can’t be a Set and Buffer at the same time (thanks to the implementation of SetLike[A, Container[A]], etc.) so I have to make it into a SortedSet that supports Buffer-like operation.

@jvican I haven’t benchmarked it but theoretically everything should run in O(log n). :slight_smile:

#5

Yes, I’m very sorry for the delay, I was on vacation!

Some benchmarks would be nice, but this is more of a long-term goal so it can wait for now. It would be good if you could synchronize with @julienrf to see potential ways we could integrate this into the scala collections. Though the ongoing efforts in the collections front is to redesign and simplify the hierarchy, adding this collection would be really useful for beginners that find it in other languages’ standard libraries.

Pinging @odersky in case he has some thoughts to share with us!

#6

I hope it will be available for Scala 2.11 since it will be years later when we can drop support for Android 4.4-6.0. :frowning:

#7

Given compatibility constraints, it would ship with 2.13 and the new collections library. @julienrf, @szeiger, and @odersky are collaborating on the development of this new library.

Since one of the authors of the new collections library seems open to the inclusion of a SortedList, I would guess that this is a green light for you to contribute an implementation to the collections-strawman repository, pending some benchmarks etc. Am I right @julienrf?

#8

Famous last words:

On a more serious note, I think an efficient SortedList would be a useful addition to the standard library. We already have sorted maps and sets. I"m not sure it should extend any Set base classes though because it’s not a Set (but I can’t think of a specific contract that it violates off the top of my head). I would expect it to extend Iterable.

#9

This is just my 2 cents, but I find a List that is also a Set a bit strange.

#10

It would indeed be useful to validate the compatibility of such a non-Set/Map/Seq collection with the new collections design. For this purpose it doesn’t have to be a complete or optimized implementation, we don’t even need to merge it at this point, but a proof-of-concept PR could demonstrate how it fits into the design.

#11

I think it’s more like a set with indexes. In that case all sorted sets/maps should be indexable IMHO.

#12

@Mygod Progress on the redesign is happening here: https://github.com/scala/collection-strawman. The best thing to get into the hook is to submit a proof-of-concept PR as @szeiger suggests. Either him or @julienrf will tell you what’s the best time to do that :smile:.

Sorting is not a feature inherent to sets. I think that such a collection would be really handy, I’ve required it some times in the past!

I think so too. I agree with the sentiment of benchmarking all the collections redesign to make progress on large strides and with strong evidence of performance improvements. It would be good if you guys can create an issue in the repo to keep track of it, this is a good candidate to outsource it to the community.

#13

I agree 100%. I meant that it is surprising (to me) to see that a *List is a subtype of Set.

#14

Well it wouldn’t be surprising if you’ve used C#/.NET: https://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.110).aspx

#15

I’m not sure this is the best way to approach the implementation, though. I agree with @Jasper-M that conceptually it’s not very intuitive and the less classes/traits that collections inherit the best for performance (AFAIK).

#16

I think the best way is to extend the current functionalities of SortedSet and SortedMap. They should be indexable.

#17

Oh, I didn’t pay attention that your implementation extends Set! So, what’s the result of e.g. foreach? Do you skip the duplicates or not?

A collection that can contain duplicates can not be a Set.

That being said, the idea of having a SortedList (as well as a SortedVector, that should have better insertion performance) makes sense to me. I would be happy to see how we can include them in the new design.

#18

foreach in my implementation works exactly the same as SortedSet.

If you are thinking sorted collections with duplicate elements, we can have stuff like multiset and multimap in C++ STL.

#19

Ah, OK, so we are just having a naming issue here :slight_smile: It seems that your SortedList is more a Set supporting index access. Maybe we should name it SortedIndexedSet.

I’m not sure we should support all the possible combinations of Sorted, Indexed and {List, Set, etc.} in the standard library but for sure the design of the collections should make it easy to implement a custom combination without having to copy/paste dozens of lines of code of an existing implementation.

2 Likes
#20

Conceptually, SortedList allows duplicate elements. The C# SortedList is only useful when you want the same capabilities than C# SortedSet but you also want to access elements by index. I don’t think this is something as popular to include it in the collection. Wouldn’t a list that allows duplicate elements make more sense?

Having a look at your implemention, I see that you’re not following the C# SortedList interface (that adds a (key, value) pair instead of just an element to the collection).