Hash collision vulnerability in default Map and Set implementations

Currently implementations of immutable HashMap/HashSet are vulnerable under hash collision based attacks. Also they are selected as default implementations for Map /Set and are used widely for some operations with Scala collections underhood: toMap , toSet , keySet, keys , distinct , groupBy, etc.

In 2003, Crosby and Wallach wrote an excellent paper demonstrating a new form of Denial of Service vulnerability against applications by abusing algorithmic complexity in the data structures that they depend on: http://static.usenix.org/event/sec03/tech/full_papers/crosby/crosby_html/

In 2011, Alexander Klink and Julian demonstrated this form of attack against web servers that take attacker-provided HTTP headers and represent them internally as hashtables:
https://events.ccc.de/congress/2011/Fahrplan/events/4680.en.html

In 2018, James Roper opened a security issue against HashMap but it was closed with won't fix resolution - only mutable version of CollisionProofHashMap was added for Scala 2.13.x (without backporting to Scala 2.12.x):

Do we have a possibility to fix the problem for Scala 2 or Scala 3?

1 Like

I think we could come up with an immutable version also for the next release. Note that it has to be a new class because the solution is use an ordering-based map for collided keys, which means that the collision-proof maps must require an ordering on the keys coming in. Otherwise it’s not type-safe.

3 Likes

What about default Map/Set and using them in Scala collections? Will them be revised/refactored?

The default Map and Set do not require an Ordering[A], so no, they cannot default to the collision-proof implementations.

Unless we made them require one, but that would break a lot of code, in particular any code that uses a default Map[A, _] where A is unconstrained in that code.

1 Like

…and are we going to fix it for default implementations and Scala collections or not?

You could add overloads that take an Ordering, or add extra methods like collisionProofGroupBy.
Or you could add some kind of Hash{Map|Set} building abstraction:

def groupBy[K](f: (A) => K)(implicit b: CanHazCollisionProofMap[K]): b.Map[K, List[A]]

Well, if Map and Set start to require Ordering then probably everyone would start using HashMap and HashSet directly. Only a tiny percentage of classes have defined ordering, but maps and sets are universally needed.

1 Like

Sorry if this is a terrible idea, but just thinking aloud – what about making up an ordering that just covers the 99% of common cases? Basically, we could impose an ordering on Any which handles all the most common types: (boxed)-primitives, Strings, products, collections, and anything that doesn’t fall into one of the handled cases goes at the end in a big bucket of “equivalent” elements. Something like…

unit, null, all of the Ints in order…, all of the Chars in order…, all of the Longs in order…, (and so on for primitives), all of the Strings, in order…, all of the Seqs in order, all of the Sets in order… all of the Maps in order…, all of the Product1’s in order…, all of the Product2’s, in order…, anything else is at the end in a big bucket.

The exact shmattering of which special cases to include aren’t important for the point. But we could certainly include Strings.

Java’s HashMap does exactly that (special-case things), and it’s not collision-proof. It’s not even type-safe; they broke type-safety by fixing the common cases. From the API docs:

Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable , this class may use comparison order among keys to help break ties.

Sounds reasonable, right?

class A
class B(val value: String) extends A with Comparable[B] { ... }
class C(val value: Int) extends A with Comparable[C] {... }

val h = new java.util.HashMap[A, String]

If you populate this hash map with colliding keys, you’ll get a runtime exception, as it tries to use Comparable to compare the keys, but without the Comparable actually being able to work (and you can’t tell at runtime because of erasure). (I haven’t checked whether they catch the exception, but if they do, it makes the DoS vastly more effective as you replace a quick key lookup with a very expensive exception.)

So, no, this is broken by design. It cannot work, and we should not do it this way.

We could try to special-case String alone, and that would probably succeed safely, but would leave everything else vulnerable, including (String, String) and all sorts of stuff like that.

So we need a principled solution: to avoid O(n) work on length-n collisions, demand an ordering on the keys. Then there are no sneaky hidden surprises: you get safety when you ask for it.

Note that the Rust maintainers, who are great sticklers for avoiding vulnerabilities, do not provide a collision-resistant HashMap; they instead require you to use a non-stupid Hash. If collision-DoS is a problem, you have to do your part.

In Scala, because the Java String has a disastrously easy-to-collide hash function, we can’t take that option. But we can still follow the principle of making it correct, not “in practice usually better but things might randomly explode too, good luck!”.

4 Likes

Can’t we just use a hash function with some random salt?

I think you misunderstand, I’m not talking about assuming comparables are all comparable with each other, I’m talking about inventing a special case ordering which would not blow up, no matter what keys you give it. For instance, if we ONLY handle Ints and Strings, it could be something like:

  val o = new Ordering[Any] {

    def group(any: Any): Int = any match {
      case _: Int => 0
      case _: String => 1
      case _ => 2
    }

    override def compare(x: Any, y: Any): Int = {
      val gx = group(x)
      val gy = group(y)
      val diff = gy - gx

      if (diff == 0) {
        gx match {
          case 0 => Ordering.Int.compare(x.asInstanceOf[Int], y.asInstanceOf[Int])
          case 1 => Ordering.String.compare(x.asInstanceOf[String], y.asInstanceOf[String])
          case 2 => 0
        }
      } else diff
    }
  }

and then anything can be compared:

scala> o.compare("a", "b")
res3: Int = -1

scala> o.compare("a", "a")
res4: Int = 0

scala> o.compare(0, 0)
res5: Int = 0

scala> o.compare(0, 1)
res6: Int = -1

scala> o.compare(0, "hi")
res7: Int = 1

scala> o.compare("hi", "hi")
res8: Int = 0

scala> o.compare("hi", Nil)
res9: Int = 1

scala> o.compare(Nil, Nil)
res10: Int = 0

scala> o.compare(Nil, List(1))
res11: Int = 0

And if the values are unrecognized types, then we punt and say “dunno, they’re equal in this ordering I guess”

As in comparing

scala> o.compare(Nil, List(1))
res11: Int = 0

So this would be scoped to the implementations of HashSet and HashMap?

If all it does is improve the poor behavior of something which is already misbehaving, I don’t see a problem with it.

I would be a bit leery of adding something like this if it’s accessible in a wider scope than just an implementation bandaid for these two classes.

all products can also be handled in a single special case (in pseudo-scala):

if (x.isInstanceOf[Product] && y.isInstanceOf[Product]) {
  val arX = x.productArity
  val arY = y.productArity
  if (arX != arY) arY - arX
  else (recursivelyCompareLexicographically(x.productIterator, y.productIterator, maxDepth = 10))
}

or something like that…

correct, I’m talking about a completely private[this] solution, and arguably it doesn’t even have to explicitly be an Ordering it could just be code.

This is sort of tangental, but why do this rather than just if (gx == gy) ?

Because I used diff twice

1 Like

@joshlemer - That’s better than what Java does, in one way, because it at least won’t explode; but it’s worse in another way, which is if you’re doing anything a little complicated, it won’t work (and you won’t have any indication that it won’t work).

I really think we should err on the side of making it easy to detect whether there is a vulnerability or not, rather than the side where we try to catch a subset of vulnerabilities without anyone being the wiser.

I also think we should consider adding a Dict class that is a hash map with string keys (or keys with a typeclass that stringifies them) that is collision resistant. This has the advantage of (1) likely being faster than anything else; (2) allowing us to use a higher-quality hash implementation like MurmurHash3; (3) handling most common use cases anyway; and (4) being easier to type so encouraging correct, safe usage by default.

4 Likes