Progress Observer Notification

Adligo · June 14, 2019, 5:40am

I want to know the progress of my functions in general, and there doesn’t seem to be a way to do this in the core library (although it’s my first week with Scala). I think I am simply looking for a trait like the following to be added to the core libraries, with classes like CombinationsIter then taking the ProgressObserver as a parameter.

This would allow functions to inform their callers about how long something is taking, something I do all the time in regular object-oriented / non-functional programming.

https://www.scala-lang.org/api/2.12.3/scala/collection/SeqLike.html

https://stackoverflow.com/questions/15596318/is-it-better-to-avoid-using-the-mod-operator-when-possible

…
class CombinationsIter(n: Int, observer: ProgressObserver) extends AbstractIterator[Repr] {

/**
  * Observes progress of functions which have been passed large amounts of data.
  */
trait ProgressObserver {
  /**
    * Informs the ProgressObserver how many things are to be done in total.
    * @param p the total number of things to be done
    */
  def setTotal(p: BigInt): Unit

  /**
    * @return The interval, out of the total, at which this observer would like
    * to be informed of progress. Note that this also allows avoiding the % modulo
    * operator on each loop increment; avoiding it is about 600% faster than using
    * it, according to the Stack Overflow link above.
    */
  def getIncrement: BigInt

  /**
    * Reports progress to the ProgressObserver.
    * @param i the number of things done so far
    */
  def reportProgress(i: BigInt): Unit
}
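As a rough illustration of how a function might drive such an observer without the modulo operator, here is a minimal sketch. It is only an assumption about the intended usage: `ProgressDemo`, `RecordingObserver`, and `processAll` are hypothetical names, the trait is repeated so the block stands alone, and the increment is assumed to be positive.

```scala
object ProgressDemo {
  // Mirrors the trait proposed in the post above; repeated here so the sketch is self-contained.
  trait ProgressObserver {
    def setTotal(p: BigInt): Unit
    def getIncrement: BigInt
    def reportProgress(i: BigInt): Unit
  }

  /** Hypothetical observer that records every report so the behaviour can be inspected. */
  final class RecordingObserver(increment: BigInt) extends ProgressObserver {
    var total: BigInt = BigInt(0)
    val reports = scala.collection.mutable.ArrayBuffer.empty[BigInt]
    def setTotal(p: BigInt): Unit = { total = p }
    def getIncrement: BigInt = increment
    def reportProgress(i: BigInt): Unit = { reports += i }
  }

  /** Applies `f` to each item and reports once per increment by comparing the
    * running count against a moving threshold instead of computing `done % increment`.
    */
  def processAll[A](items: Seq[A], observer: ProgressObserver)(f: A => Unit): Unit = {
    observer.setTotal(BigInt(items.size))
    val step = observer.getIncrement // assumed to be > 0
    var done = BigInt(0)
    var nextReport = step
    items.foreach { item =>
      f(item)
      done += 1
      if (done >= nextReport) {
        observer.reportProgress(done)
        nextReport += step
      }
    }
  }
}
```

With an increment of 3 and ten items, this observer would be told about items 3, 6, and 9.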

sjrd · June 14, 2019, 10:18am

It looks like something you could implement yourself in a separate library. Have you tried doing that?

Adligo · June 14, 2019, 2:30pm

Sure, I could go that way, but it involves copying and pasting the *.scala sources and making a minor change to a large percentage of the functions. If I do go that way, I will probably do it in Java instead of Scala. I think the maintainers of Scala should include it with the language functions.

sjrd · June 14, 2019, 3:19pm

Hum, I’m not sure I understand. Do you want every method such as foreach and map on all collections to have that feature? Because if yes, modifying the standard library to include that functionality would dramatically harm performance for all the existing use cases, which is not acceptable.

If you just want one or two additional methods with this behavior, you can add them with implicit classes.

Or perhaps you’d want to consider writing a conversion to an ObservableIterator that takes an observer as parameter, and implements next by calling the observer and delegating to an underlying iterator. That way you can implement your functionality once for all collections and you’ll receive all the standard methods for free.
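A minimal sketch of what such a wrapper might look like, combined with an implicit class so it can be attached to any iterator. The names `ObservableIterator`, `ObservedOps`, and `observed`, and the choice of a plain `Long => Unit` callback instead of a full observer trait, are assumptions made for illustration and are not part of the standard library.

```scala
object ObservableIteratorDemo {
  /** Hypothetical wrapper: delegates to `underlying` and calls `onNext`
    * with the running element count each time an element is produced.
    */
  final class ObservableIterator[A](underlying: Iterator[A], onNext: Long => Unit)
      extends Iterator[A] {
    private var produced = 0L
    def hasNext: Boolean = underlying.hasNext
    def next(): A = {
      val a = underlying.next()
      produced += 1
      onNext(produced)
      a
    }
  }

  /** Hypothetical extension method `.observed(...)` for any Iterator. */
  implicit class ObservedOps[A](private val self: Iterator[A]) {
    def observed(onNext: Long => Unit): Iterator[A] =
      new ObservableIterator(self, onNext)
  }

  /** Usage: the wrapped iterator still has map, filter, toList, and so on. */
  def demo(seen: scala.collection.mutable.ArrayBuffer[Long]): List[(String, String)] =
    List("a", "b", "c")
      .combinations(2)
      .observed(n => { seen += n; () })
      .map(c => (c(0), c(1)))
      .toList
}
```

Because the wrapper is itself an `Iterator`, the observation logic is written once and every standard iterator method works on the result.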

Ichoran · June 14, 2019, 3:33pm

That’s really not necessary in most cases. Something like this should be perfectly adequate (put your own logic in as necessary if you want observers):

class CounterFn {
  val count = new java.util.concurrent.atomic.AtomicLong(0)
  def apply[A, B](f: A => B): A => B =
    a => { count.getAndIncrement; f(a) }
}

Then you can

val c = new CounterFn()
xs.map{ c(x => x+1) }

which is useless, but you can send c to something else before you call the map to let it watch the progress.

There is a small bit of extra boilerplate, but I think it’s probably worth it to inform people that this kind of progress logging is going on (which could have important consequences when refactoring, since progress logging isn’t referentially transparent).

Anyway, the performance hit you get would make it completely untenable to build it in to the standard library, but luckily you don’t need to (as shown by the example above).

(You could also create a custom collection, but the approach above lets you get progress of everything, and you can elaborate the pattern to achieve all kinds of things–make it a loan pattern if you want to know when the operation is done, etc…)
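One way the loan-pattern elaboration mentioned above could look is sketched below. `LoanDemo`, `withCounter`, and `onDone` are hypothetical names; the `CounterFn` class is the one from this post, repeated so the block stands alone.

```scala
object LoanDemo {
  import java.util.concurrent.atomic.AtomicLong

  // Same idea as the CounterFn above: counts every call made through the wrapped function.
  class CounterFn {
    val count = new AtomicLong(0)
    def apply[A, B](f: A => B): A => B =
      a => { count.getAndIncrement(); f(a) }
  }

  /** Lends a fresh counter to `body`, then calls `onDone` with the final count
    * once `body` has finished, even if it throws.
    */
  def withCounter[R](onDone: Long => Unit)(body: CounterFn => R): R = {
    val c = new CounterFn
    try body(c)
    finally onDone(c.count.get())
  }
}

// Usage (the parameter type is written out to keep type inference simple):
//   LoanDemo.withCounter(n => println("done after " + n + " calls")) { c =>
//     List(1, 2, 3).map(c((x: Int) => x + 1))
//   }
```

The counter never escapes the block, so the caller always learns when the operation has ended.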

Adligo · June 14, 2019, 3:34pm

Yep, I do want the functionality in most functions / all over the place (foreach, map, etc). I also want it done in a standard way and included with the common Scala libraries.
If I did write an ObservableIterator, where should it go, and how would you suggest incorporating it with code like:

var pairs = topic.toList.combinations(2).map{ case Seq(x, y) => (x, y) }.toList;

I think it would still require adding function overloading, i.e. in this case:

def combinations(n: Int, o: ProgressObserver): Iterator[Repr]

Then in the new combinations method, you could create the ObservableIterator and pass it to other methods, although combinations uses its own iterator, the CombinationsIter class. Also, I don’t think it would even affect the Big O performance of any of the functions, but there would be a small performance hit for sure, possibly enough to justify writing the code twice.

Adligo · June 14, 2019, 3:52pm

Ichoran,

Your solution seems to track progress either before or after the thing I am trying to track progress on. I.E.

private class CounterFn(n: Int) {
  logMe("created CounterFn " + n)
  val logIncrement: Int = 100
  val count = new java.util.concurrent.atomic.AtomicLong(0)
  def apply[A, B](f: A => B): A => B =
    a => {
      // incrementAndGet avoids a race between a separate increment and a later read
      val done = count.incrementAndGet()
      if (done % logIncrement == 0) {
        logMe("processed " + done + " out of " + n)
      }
      f(a)
    }
}

val cfn: CounterFn = new CounterFn(S.length)
var pairs = S.toList.map(cfn(x => x + 1)).combinations(2).map { case Seq(x, y) => (x, y) }.toList

The above tracks progress before the combinations method starts executing. This is NOT what I’m looking for!

Ichoran · June 14, 2019, 4:14pm

Most time-consuming methods on collections take closures.

Should you encounter one that doesn’t, yes, you would have to rewrite it if logging of that part of the process is critical.

Also, I don’t know why you decided to instrument the first map rather than the second one; the second does O(n^2) work instead of just O(n). You can also instrument both maps.

Adligo · June 14, 2019, 4:24pm

Ichoran, It was just an example of how I tried to use your suggestion.

dcsobral · June 14, 2019, 4:22pm

What is the thing you are trying to track progress of?

Adligo · June 14, 2019, 4:32pm

Anything a particular function is doing at a high level of abstraction. In the combinations example I am trying to track how many combinations have been made out of the total number of potential combinations. However, this could be for any logic that goes in a function which processes a collection (i.e. a large amount of data), and is not specific to the combinations function in any way.

Adligo · June 14, 2019, 4:40pm

I guess I should also point out that I will want to notify an actual end user with a GUI progress bar, although my initial use case is as a programmer processing large amounts of data. This makes two use cases for adding the feature.

dcsobral · June 14, 2019, 4:52pm

You can use tap() for that.

Adligo · June 14, 2019, 5:03pm

Thanks, I am looking into tap. For the specific case of CombinationsIter I have my doubts that it will work (since it is its own iterator), but it may work in other cases. I will post what I find here;

scottcarey · June 14, 2019, 5:01pm

It is a trivial change to move the function up before the check.

Adligo · June 14, 2019, 5:11pm

scottcarey, please elaborate with examples after viewing lines 206-250:
https://github.com/scala/scala/blob/v2.12.3/src/library/scala/collection/SeqLike.scala#L1

Ichoran · June 14, 2019, 5:16pm

combinations returns an iterator, anyway, which means that it won’t be doing any work until the next map runs. So in this case you can just put the logic in the second map and get what you want. In other cases that might not be true.
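The laziness described here can be checked with a small experiment. This is only a sketch; `LazyCombinationsDemo` and its fields are hypothetical names introduced for the illustration.

```scala
object LazyCombinationsDemo {
  import java.util.concurrent.atomic.AtomicLong

  val calls = new AtomicLong(0)

  // Building the pipeline: combinations returns an Iterator, and so does map on it.
  val it: Iterator[(String, String)] =
    List("a", "b", "c", "d").combinations(2).map { c =>
      calls.incrementAndGet() // runs only when an element is actually pulled
      (c(0), c(1))
    }

  // Nothing has been pulled through the map yet.
  val callsBeforeConsuming: Long = calls.get()

  // toList forces the iterator, so the counting happens during the real work.
  val pairs: List[(String, String)] = it.toList
  val callsAfterConsuming: Long = calls.get()
}
```

The counter stays at zero until `toList` runs, and ends at the number of pairs produced.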

Adligo · June 14, 2019, 5:41pm

I’m on 2.12, so tap isn’t available:
https://github.com/scala/scala/blob/v2.13.0-M5/src/library/scala/util/ChainingOps.scala#L1

More specifically, I added ChainingOps as an inner class to my class and tried:

val cfn: CounterFn = new CounterFn(S.length);
var pairs = new ChainingOps(S.toList.combinations(2)).tap( cfn(x => x+1) ).map{  case Seq(x, y) => (x, y) }.toList;

This gave me a fairly confusing message;
Error:(22, 9) value class may not be a member of another class
class ChainingOps[A](private val self: A) extends AnyVal {

With ChainingOps in another file and this;

var pairs =  new ChainingOps(S.toList.combinations(2)).tap(
      cfn(x => x+1) ).map{  case Seq(x, y) => (x, y) }.toList;

I get;
Error:(95, 18) type mismatch;
found : Int(1)
required: String
cfn(x => x+1) ).map{ case Seq(x, y) => (x, y) }.toList;

Ichoran · June 14, 2019, 6:21pm

tap isn’t needed here. Just map{ cfn{ case Seq(x, y) => (x, y) } }. It’s still an iterator at this point, so it’ll be called as the data is processed.

If you don’t want to do anything, just reuse map with an identity function. It’s slightly wasteful but you probably don’t care that much. .map{ cfn(x => x) }. No need to mess with implicit classes.
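Putting the suggestion together, a version that reports progress against the total number of pairs might look like the sketch below. The names `PairProgressDemo`, `pairsOf`, and the `report` callback are hypothetical, the callback stands in for `logMe`, and the total of n(n-1)/2 assumes the input elements are distinct. The function passed to the counter is given an explicit parameter type so that type inference does not get in the way.

```scala
object PairProgressDemo {
  import java.util.concurrent.atomic.AtomicLong

  /** Variant of CounterFn that calls `report(done, total)` every `every` calls,
    * and once more when the expected total is reached.
    */
  class CounterFn(total: Long, every: Long, report: (Long, Long) => Unit) {
    private val count = new AtomicLong(0)
    def apply[A, B](f: A => B): A => B = a => {
      val result = f(a)
      val done = count.incrementAndGet()
      if (done % every == 0 || done == total) report(done, total)
      result
    }
  }

  /** Builds all pairs of `items`, reporting progress while the iterator is consumed. */
  def pairsOf[T](items: List[T], every: Long)(report: (Long, Long) => Unit): List[(T, T)] = {
    val n = items.length.toLong
    val total = n * (n - 1) / 2 // number of 2-combinations, assuming distinct elements
    val cfn = new CounterFn(total, every, report)
    items.combinations(2).map(cfn((c: List[T]) => (c(0), c(1)))).toList
  }
}

// Usage:
//   PairProgressDemo.pairsOf(List("a", "b", "c", "d"), 4L) { (done, total) =>
//     println("processed " + done + " out of " + total)
//   }
```

Since the instrumented function runs inside the map on the iterator, the reports arrive while the combinations are being generated rather than before or after.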

Adligo · June 14, 2019, 6:23pm

Ichoran, Hey thanks so much, as I said I’m new to the language!