I want to track the progress of my functions in general, and there doesn’t seem to be a way to do this in the core library (although this is my first week with Scala). I think I am simply looking for a trait like the following to be added to the core libraries, with classes like CombinationsIter taking the ProgressObserver as a parameter.
This would allow functions to inform their callers about how long something is taking, which is something I do all the time in regular object-oriented / non-functional programming.
… class CombinationsIter(n: Int, observer: ProgressObserver) extends AbstractIterator[Repr] {
/**
  * Observes the progress of functions that have been passed large amounts of data.
  */
trait ProgressObserver {
  /**
    * Informs the ProgressObserver how many things are to be done in total.
    * @param p the total number of items to process
    */
  def setTotal(p: BigInt): Unit

  /**
    * @return The interval, out of the total, at which this observer would like
    * to be informed of progress. Also note that this allows avoiding the % (modulo)
    * operator on each loop increment; avoiding it is about 600% faster than
    * using it. See Stack Overflow (link above).
    */
  def getIncrement: BigInt

  /**
    * Reports progress to the ProgressObserver.
    * @param i the number of items processed so far
    */
  def reportProgress(i: BigInt): Unit
}
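To make the doc comment on getIncrement concrete, here is a minimal sketch of how a worker loop might use the trait without a modulo on every step: it compares the running count against a "next report" threshold instead. RecordingObserver and sumWithProgress are hypothetical names invented for this illustration, and the sketch assumes the increment is positive and fits in a Long.

```scala
object ProgressLoopSketch {
  // Mirrors the trait proposed above, with explicit Unit return types.
  trait ProgressObserver {
    def setTotal(p: BigInt): Unit
    def getIncrement: BigInt
    def reportProgress(i: BigInt): Unit
  }

  // Hypothetical observer that simply records every report it receives.
  final class RecordingObserver(increment: BigInt) extends ProgressObserver {
    var total: BigInt = BigInt(0)
    val reports = scala.collection.mutable.ArrayBuffer.empty[BigInt]
    def setTotal(p: BigInt): Unit = { total = p }
    def getIncrement: BigInt = increment
    def reportProgress(i: BigInt): Unit = { reports += i; () }
  }

  // Hypothetical worker: sums the items and reports once every `getIncrement` items.
  def sumWithProgress(items: Seq[Int], observer: ProgressObserver): Long = {
    observer.setTotal(BigInt(items.length))
    val step = observer.getIncrement.toLong // assumption: positive and fits in a Long
    var done = 0L
    var nextReport = step
    var sum = 0L
    for (x <- items) {
      sum += x
      done += 1
      // A plain comparison replaces `done % step == 0` on every iteration.
      if (done == nextReport) {
        observer.reportProgress(BigInt(done))
        nextReport += step
      }
    }
    sum
  }
}
```

With an increment of 3 and ten items, the observer above would be told about items 3, 6 and 9.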
Sure, I could go that way, but it involves copying and pasting *.scala files with a minor change to a large percentage of the functions. If I do go that way, I will probably do it in Java instead of Scala. I think the maintainers of Scala should include it with the language's standard functions.
Hmm, I’m not sure I understand. Do you want every method such as foreach and map on all collections to have that feature? Because if so, modifying the standard library to include that functionality would dramatically harm performance for all the existing use cases, which is not acceptable.
If you just want one or two additional methods with this behavior, you can add them with implicit classes.
Or perhaps you’d want to consider writing a conversion to an ObservableIterator that takes an observer as parameter, and implements next by calling the observer and delegating to an underlying iterator. That way you can implement your functionality once for all collections and you’ll receive all the standard methods for free.
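A rough sketch of that ObservableIterator idea might look like the following. The class name comes from the suggestion above; the `observed` extension method and the `Long => Unit` callback shape are assumptions made for this illustration, not anything in the standard library.

```scala
object ObservableIteratorSketch {
  // Hypothetical wrapper: calls `onNext` with the running count after each element is produced.
  final class ObservableIterator[A](underlying: Iterator[A], onNext: Long => Unit)
      extends Iterator[A] {
    private var produced = 0L
    def hasNext: Boolean = underlying.hasNext
    def next(): A = {
      val a = underlying.next()
      produced += 1
      onNext(produced)
      a
    }
  }

  // Hypothetical extension method so that any iterator can be observed.
  implicit class ObserveOps[A](private val it: Iterator[A]) extends AnyVal {
    def observed(onNext: Long => Unit): Iterator[A] = new ObservableIterator(it, onNext)
  }

  def demo(): (List[(Int, Int)], Long) = {
    var seen = 0L
    val pairs = List(1, 2, 3, 4)
      .combinations(2)            // Iterator[List[Int]]
      .observed { n => seen = n } // called once per combination as it is pulled
      .map { case Seq(x, y) => (x, y) }
      .toList
    (pairs, seen)
  }
}
```

Because the wrapper is itself an Iterator, map, filter, toList and the rest keep working unchanged; the observer only fires as elements are actually pulled through.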
That’s really not necessary in most cases. Something like this should be perfectly adequate (put your own logic in as necessary if you want observers):
class CounterFn {
  val count = new java.util.concurrent.atomic.AtomicLong(0)
  def apply[A, B](f: A => B): A => B =
    a => { count.getAndIncrement; f(a) }
}
Then you can
val c = new CounterFn()
xs.map{ c(x => x+1) }
which is useless, but you can send c to something else before you call the map to let it watch the progress.
There is a small bit of extra boilerplate, but I think it’s probably worth it to inform people that this kind of progress logging is going on (which could have important consequences when refactoring, since progress logging isn’t referentially transparent).
Anyway, the performance hit you would get makes it completely untenable to build this into the standard library, but luckily you don’t need to (as shown by the example above).
(You could also create a custom collection, but the approach above lets you get progress of everything, and you can elaborate the pattern to achieve all kinds of things–make it a loan pattern if you want to know when the operation is done, etc…)
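As one possible reading of the loan-pattern remark, the sketch below lends a fresh counter to a block of code and reports the final count when the block returns. `withCounter` and its `onDone` callback are hypothetical names chosen for this illustration.

```scala
object CounterLoanSketch {
  import java.util.concurrent.atomic.AtomicLong

  // Same idea as the CounterFn above.
  class CounterFn {
    val count = new AtomicLong(0)
    def apply[A, B](f: A => B): A => B =
      a => { count.getAndIncrement(); f(a) }
  }

  // Hypothetical loan helper: lends a counter to `body`, then hands the final count to `onDone`,
  // even if `body` throws.
  def withCounter[R](onDone: Long => Unit)(body: CounterFn => R): R = {
    val c = new CounterFn
    try body(c)
    finally onDone(c.count.get())
  }

  def demo(): (List[Int], Long) = {
    var finalCount = -1L
    val ys = withCounter(n => { finalCount = n }) { c =>
      // List.map is strict, so all the work happens inside the loan.
      List(1, 2, 3).map(c((x: Int) => x + 1))
    }
    (ys, finalCount)
  }
}
```

Note that this only signals completion correctly when the body forces its result. If the body returned a lazy Iterator, `onDone` would fire before any element had been processed.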
Yep, I do want the functionality in most functions / all over the place (foreach, map, etc). I also want it done in a standard way and included with the common Scala libraries.
If I did write an ObservableIterator, where should it go, and how would you suggest incorporating it with code like:
var pairs = topic.toList.combinations(2).map{ case Seq(x, y) => (x, y) }.toList;
I think it would still require adding function overloading, i.e. in this case;
Then in the new combinations method, you could create the ObservableIterator and pass it to other methods, although combinations uses its own iterator, the CombinationsIter class. Also, I don’t think it would even affect the Big O performance of any of the functions, but there would be a small performance hit for sure. Possibly enough to justify writing the code twice.
Your solution seems to track progress either before or after the thing I am trying to track progress on, i.e.:
private class CounterFn(n: Int) {
  logMe("created CounterFn " + n)
  val logIncrement: Int = 100
  val count = new java.util.concurrent.atomic.AtomicLong(0)
  def apply[A, B](f: A => B): A => B =
    a => {
      // incrementAndGet returns the new value atomically, so concurrent callers cannot skip a log line
      val done = count.incrementAndGet()
      if (done % logIncrement == 0) {
        logMe("processed " + done + " out of " + n)
      }
      f(a)
    }
}
val cfn: CounterFn = new CounterFn(S.length);
var pairs = S.toList.map( cfn(x => x+1) ).combinations(2).map{ case Seq(x, y) => (x, y) }.toList;
The above will track progress before the combinations method starts execution. This is NOT what I’m looking for!
Most time-consuming methods on collections take closures.
Should you encounter one that doesn’t, yes, you would have to rewrite it if logging of that part of the process is critical.
Also, I don’t know why you decided to instrument the first map rather than the second one; the second does O(n^2) work instead of just O(n). You can also instrument both maps.
Anything a particular function is doing at a high level of abstraction. In the combinations example I am trying to track how many combinations have been made out of the total number of potential combinations. However, this could be for any logic that goes in a function which processes a collection (i.e. a large amount of data), and is not specific to the combinations function in any way.
I guess I should also point out that I will want to notify an actual end user with a GUI progress bar, although my initial use case is as a programmer processing large amounts of data. This makes two use cases for adding the feature.
Thanks, I am looking into tap. For the specific case of CombinationsIter I have my doubts that it will work (since it is its own iterator), but it may work in other cases. I will post what I find here;
combinations returns an iterator, anyway, which means that it won’t be doing any work until the next map runs. So in this case you can just put the logic in the second map and get what you want. In other cases that might not be true.
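A small sketch to illustrate the laziness point: the function passed to map on the combinations iterator is not called until something forces the iterator. The counter variable and the sample list are made up for this illustration.

```scala
object LazyCombinationsSketch {
  def demo(): (Int, Int, Int) = {
    var calls = 0
    val it = List("a", "b", "c", "d")
      .combinations(2)
      .map { case Seq(x, y) => calls += 1; (x, y) }
    val beforeConsuming = calls // map on an Iterator has not run the function yet
    val pairs = it.toList       // forcing the iterator runs it once per combination
    (beforeConsuming, calls, pairs.length)
  }
}
```

So any counting or logging placed in that map runs interleaved with the generation of the combinations, not before or after it.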
More specifically, I added ChainingOps as an inner class to my class and tried:
val cfn: CounterFn = new CounterFn(S.length);
var pairs = new ChainingOps(S.toList.combinations(2)).tap( cfn(x => x+1) ).map{ case Seq(x, y) => (x, y) }.toList;
This gave me a fairly confusing message;
Error:(22, 9) value class may not be a member of another class
class ChainingOps[A](private val self: A) extends AnyVal {
With ChainingOps in another file and this;
var pairs = new ChainingOps(S.toList.combinations(2)).tap(
cfn(x => x+1) ).map{ case Seq(x, y) => (x, y) }.toList;
I get;
Error:(95, 18) type mismatch;
found : Int(1)
required: String
cfn(x => x+1) ).map{ case Seq(x, y) => (x, y) }.toList;
tap isn’t needed here. Just map{ cfn{ case Seq(x, y) => (x, y) } }. It’s still an iterator at this point, so it’ll be called as the data is processed.
If you don’t want to do anything, just reuse map with an identity function. It’s slightly wasteful but you probably don’t care that much. .map{ cfn(x => x) }. No need to mess with implicit classes.
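Putting the pieces together, a hedged sketch of the whole pipeline could look like this. It is a variant of the CounterFn from earlier in the thread, with a few assumptions added for illustration: the log callback parameter, the explicit parameter type on the lambda (to sidestep any type-inference trouble), and a total of n * (n - 1) / 2, which only holds when the input elements are distinct.

```scala
object PairProgressSketch {
  import java.util.concurrent.atomic.AtomicLong

  // Variant of CounterFn that hands a message to `log` every `logIncrement` calls.
  class CounterFn(total: Long, logIncrement: Long, log: String => Unit) {
    val count = new AtomicLong(0)
    def apply[A, B](f: A => B): A => B =
      a => {
        val result = f(a)
        val done = count.incrementAndGet()
        if (done % logIncrement == 0) log(s"processed $done out of $total")
        result
      }
  }

  def demo(): (List[(Char, Char)], List[String]) = {
    val s = "abcde" // assumption: distinct elements, so there are n * (n - 1) / 2 pairs
    val n = s.length.toLong
    val messages = scala.collection.mutable.ListBuffer.empty[String]
    val cfn = new CounterFn(n * (n - 1) / 2, 5L, m => { messages += m; () })
    val pairs = s.toList
      .combinations(2)                            // still an Iterator here
      .map(cfn((p: List[Char]) => (p(0), p(1)))) // counted as each pair is produced
      .toList
    (pairs, messages.toList)
  }
}
```

For a GUI progress bar, the log callback could be swapped for something that updates the widget; the counting itself stays the same.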