Twelve rules are cited below as part of a test to determine whether a product that is claimed to be fully relational is actually so. Use of the term “fully relational” in this report is slightly more stringent than in my Turing paper (written in 1981). This is partly because vendors in their ads and manuals have translated the term “minimally relational” to “fully relational” and partly because in this report, we are dealing with relational DBMS and not relational systems in general, which would include mere query-reporting systems.
However, the 12 rules tend to explain why full support of the relational model is in the users’ interest. No new requirements are added to the relational model. A grading scheme is later defined and used to measure the degree of fidelity to the relational model.
This particular article seems less important thirty years later, but it was vital in the early days of relational systems to understanding what, precisely, a relational database management system ought to do and—just as importantly—what it ought not do. It wasn’t enough to slap SQL on top of a hierarchical database platform and call it relational.
Spark is built around the concept of Resilient Distributed Datasets. If you have not read Matei Zaharia, et al’s paper on the topic, I highly recommend it:
Spark exposes RDDs through a language-integrated API similar to DryadLINQ  and FlumeJava , where each dataset is represented as an object and transformations are invoked using methods on these objects.
Programmers start by defining one or more RDDs through transformations on data in stable storage (e.g., map and filter). They can then use these RDDs in actions, which are operations that return a value to the application or export data to a storage system. Examples of actions include count (which returns the number of elements in the dataset), collect (which returns the elements themselves), and save (which outputs the dataset to a storage system). Like DryadLINQ, Spark computes RDDs lazily the first time they are used in an action, so that it can pipeline transformations.
In addition, programmers can call a persist method to indicate which RDDs they want to reuse in future operations. Spark keeps persistent RDDs in memory by default, but it can spill them to disk if there is not enough RAM. Users can also request other persistence strategies, such as storing the RDD only on disk or replicating it across machines, through flags to persist. Finally, users can set a persistence priority on each RDD to specify which in-memory data should spill to disk first.
The link also has a video of their initial presentation at NSDI. Check it out.
Before I go on to the main body of this text, I would like to make a short digression about security in general.
Security is often in conflict with other interests in the programming trade. You have users screaming for a solution, and they want it now. At this point, they don’t really care about security, they just want to get their business done. But if you give them a solution that has a hole, and that hole is later exploited, you are the one that will be hung. So as a programmer you always need to have security in mind, and make sure that you play your part right
One common mistake in security is to think “we have this firewall/encryption/whatever, so we are safe”. I like to think of security of something that consists of a number of defence lines. Anyone who has worked with computer systems knows that there are a lot of changes in them, both in their configuration and in the program code. Your initial design may be sound and safe, but as the system evolves, there might suddenly be a security hole and a serious vulnerability in your system.
By having multiple lines of defence you can reduce the risk for this to happen. If a hole is opened, you can reduce the impact of what is possible to do through that hole. An integral part of this strategy is to never grant more permissions than is absolutely necessary. Exactly what this means in this context is something I shall return to.
This is a must-read for anyone interested in rights management in SQL Server.