With Eve, it appears that the authors want to achieve similar goals as what I want with SPREAD. The difference is that I'm not too worried about actual users (

*yet).*I'm more concerned about implementation.

At its core, Eve is essentially a datalog engine that is based on a novel constraint propagation algorithm, enabling the efficient joining of datalog relations.

However, the authors seem to struggle to come up with a solution to store and re-use provenance data (constraints) that is captured during joins. The central idea is that via the clever re-use of provenance data - you can

*incrementally*update a materialised join whenever the dependent relations are updated.

Think spreadsheets!

According to the Eve dairy it appears that the authors have tried, but abandoned Self Adjusting Computation (SAC) as a possible solution. I don't know the reason why they have abandoned SAC, because I believe SAC

*can*effectively be used to do incremental joins.

With SPREAD, it is rather easy to translate any algorithm to become Self Adjusting. To be precise, I think SPREAD is more related to 'Self-explaining Computation' as it is to SAC.

The trouble with traditional SAC that it is rather cumbersome. Standard algorithms have to be especially tailored to become Self Adjusting. Not so with SPREAD.

SPREAD is based on the pervasive use of (trace) provenance and forgetful memoization, combined with a single clever data structure, called the Treap.

At this point, I have to disappoint the reader, because the remainder of this post is not going to demonstrate how SPREAD can be used to do efficient joins. But I will demonstrate a SPREAD solution for incremental sums (or any folding operation). Hopefully, that will convince the reader that SPREAD can be a viable solution for the problems the Eve project is facing.

The first thing to know about SPREAD is that any (computed) value holds a (back) reference to its origin. Here is the canonical example for a computed value 21:

((1 + 2) * (3 + 4)) ~> [([(1 + 2) ~> 3] * [(3 + 4) ~> 7]) ~> [(3 * 7) ~>

**21**]]

sum(0 1 2 3 4 5 6 7 8 9) => 45

In SPREAD such list is encoded as an uniquely represented, confluently persistent treap. To sum a treap, the following (scala) algorithm can be evaluated:

def sum(t: TreapSet[Int]): Int = {

if (t.left.isEmpty) {

if (t.right.isEmpty) t.value

else t.value + sum(t.right)

}

else {

if (t.right.isEmpty) sum(t.left) + t.value

else sum(t.left) + t.value + sum(t.right)

}

}

The SPREAD trace (... ~> 45) of the above algorithm would look like this:

sum(0 1 2 3 4 5 6 7 8 9) ~> [((sum(0 1 2 3 4 5) + 6) + sum(7 8 9)) ~> [([(sum(0 1 2 3 4 5) + 6) ~> [([sum(0 1 2 3 4 5) ~> [(sum(0 1 2 3 4) + 5) ~> [([sum(0 1 2 3 4) ~> [(sum(0 1 2 3) + 4) ~> [([sum(0 1 2 3) ~> [(sum(0 1 2) + 3) ~> [([sum(0 1 2) ~> [(sum(0 1) + 2) ~> [([sum(0 1) ~> [(sum(0) + 1) ~> [([sum(0) ~> 0] + 1) ~> [(0 + 1) ~> 1]]]] + 2) ~> [(1 + 2) ~> 3]]]] + 3) ~> [(3 + 3) ~> 6]]]] + 4) ~> [(6 + 4) ~> 10]]]] + 5) ~> [(10 + 5) ~> 15]]]] + 6) ~> [(15 + 6) ~> 21]]] + [sum(7 8 9) ~> [(sum(7 8) + 9) ~> [([sum(7 8) ~> [(sum(7) + 8) ~> [([sum(7) ~> 7] + 8) ~> [(7 + 8) ~> 15]]]] + 9) ~> [(15 + 9) ~> 24]]]]) ~> [(21 + 24) ~> 45]]]

Adding the number 10 to the treap and summing it will result in a total of (... ~> 55). Here is the resulting trace:

sum(0 1 2 3 4 5 6 7 8 9 10) ~> [((

**sum(0 1 2 3 4 5) + 6)**+ sum(7 8 9 10)) ~>**[([(sum(0 1 2 3 4 5) + 6) ~> [([sum(0 1 2 3 4 5) ~**>**[(sum(0 1 2 3 4) + 5) ~> [([sum(0 1 2 3 4) ~> [(sum(0 1 2 3) + 4) ~> [([sum(0 1 2 3) ~> [(sum(0 1 2) + 3) ~> [([sum(0 1 2) ~> [(sum(0 1) + 2) ~> [([sum(0 1) ~> [(sum(0) + 1) ~> [([sum(0) ~> 0] + 1) ~> [(0 + 1) ~> 1]]]] + 2) ~> [(1 + 2) ~> 3]]]] + 3) ~> [(3 + 3) ~> 6]]]] + 4) ~> [(6 + 4) ~> 10]]]] + 5) ~> [(10 + 5) ~> 15]]]] + 6) ~> [(15 + 6) ~> 21]]]**+ [sum(7 8 9 10) ~> [(sum(7 8 9) + 10)**~> [([sum(7 8 9) ~> [(sum(7 8) + 9) ~> [([sum(7 8) ~> [(sum(7) + 8) ~> [([sum(7) ~> 7] + 8) ~> [(7 + 8) ~> 15]]]] + 9) ~> [(15 + 9) ~> 24]]]]**+ 10) ~> [(24 + 10) ~> 34]]]]) ~> [(21 + 34) ~> 55]]]**bold**. That's a lot of similarity!

Notice also, that it is solely due to the nice properties of the treap data structure that we are able to share so much structure. Next to that, this similarity can be exploited via memoization, requiring only log(n) incremental rework after an insertion, deletion, etc.

Memoization in SPREAD is implemented via a (very) weak hash table. As every value holds a (strong) reference to its origin (a trace), there is no need for the memoization table to hold strong references to traces.

Voila, I've just demonstrated the incremental sum of a list that is optimally efficient!