SQL Server 2017 and Azure SQL Database introduced native graph database capabilities used to model many-to-many relationships. The first implementation of SQL Graph introduced support for nodes to represent entities, edges to represent relationships and a new MATCH predicate to support graph pattern matching and traversal.
We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of these features that is now available for public preview in SQL Server 2019, Edge Constraints on Graph Edge Tables.
In the first release of SQL Graph, an edge could connect any node to any other node in the database. With Edge Constraints users can enforce specific semantics on the edge tables. The constraints also help in maintaining data integrity. This post describes how you can create and use edge constraints in a graph database. We will use the following graph schema created in the WideWorldImporters database for the samples discussed here.
I know that SQL Server 2017 was a bit underwhelming for graph database work, so I will be interested in seeing how much of the gap they cover in this release.
The authors study the computational complexity of GraphQL looking at three central questions:
- The evaluation problem: what can we say about the complexity of GraphQL query evaluation?
- The enumeration problem: how efficiently can we enumerate the results of a query in practice?
- The response size problem: how large can responses get, and what can we do to avoid obtaining overly large response objects?
In order to do this, they need to find some solid ground to use for reasoning. So the starting point is a formalisation of the semantics of GraphQL.
This is a review of a published academic paper rather than a how-to guide, so it’s math-heavy. I am enjoying seeing the development of normal forms for graph processing languages—it’s the beginning of a new generation of normalization purists.
What it does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.
How it’s used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.
PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to biggest chain reaction of species death.
If you are interested in getting into graph databases, it’s useful to know these algorithms.
Closure tables are plain ordinary relational tables that are designed to work easily with relational operations. It is true that useful extensions are provided for SQL Server to deal with hierarchies. The HIERARCHYID data type and the common language runtime (CLR) SqlHierarchyId class are provided to support the Path Enumeration method of representing hierarchies and are intended to make tree structures represented by self-referencing tables more efficient, but they are likely to be appropriate for some but not all the practical real-life hierarchies or directories. As well as path enumerations, there are also the well-known design patterns of Nested Sets and Adjacency Lists. In this article, we’ll concentrate on closure tables.
A directed acyclic graph (DAG) is a more general version of a closure table. You can use a closure table for a tree structure where there is only one trunk, because a branch or leaf can only have one trunk. We just have a table that has the nodes (e.g. staff member or directory ‘folder’) and edges (the relationships). We are representing an acyclic (no loops allowed) connected graph where the edges must all be unique, and where there is reflexive closure. (each node has an edge pointing to itself)
Take the time to read this one carefully, as I think this model is applicable much more often than it’d appear at first blush.
Gremlin is the graph traversal language of Apache TinkerPop, an open source Graph Computing Framework. Gremlin allows the users to write complex queries to traverse their graphs by using a composed sequence of steps, with each step performing an operation on the data stream (further details here). There are 4 fundamental steps:
· transform: transform the objects in the stream
· filter: remove objects from the stream
· sideEffect: pass the object, but yield some side effect
· branch: decide which step to take
Click through for a quick example showing how to create and populate a graph.
When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment the Web Interface of the data explorer is oriented for the SQL API and it sounds impossible to create a graph query on the emulator. This is false, the UI is not aware of these features but the engine supporting the Graph API.
You can access the emulator engine by the means of the .Net librairies. Create a new console project in Visual Studio and add the package
Microsoft.Azure.Graphs. This package is in preview, you’ll need to specify to the package manager that you’re accepting the previews.
Read on to learn more.
However, with recent focus on big data for many of my clients, we have experienced an increase in different business requests that requires for many-to-many data modelling. Consequently, as a Microsoft shop we’ve had to turn to other non-Microsoft products to ensure that we optimally respond to such business requests. Not surprisingly, ever since word got around that graph database will be part of SQL Server 2017, we’ve been looking forward to this latest release of SQL Server. Having played around with the graph database feature in SQL Server 2017, we have noticed that unlike other graph database vendors, plotting and visualising the data out of the graph database is not readily available in SQL Server 2017. Luckily, thanks to SQL Server R, you can easily plot and visualise SQL Server 2017 graph database data without turning to 3rd party plugins. In this article, I demonstrate how SQL Server Machine Learning Services (previously known as SQL Server 2016 R Services) can be used to plot a diagram according to the data defined in a SQL Server 2017 graph database.
The igraph library is a good one; there’s a lot of power in it that this post just introduces.
Positive (Forward) Direction
I’d also like to see the tables use a forward direction naming rather than reverse (like “Written By”). So perhaps:
($from_id) the member Wrote the post ($to_id)
($from_id) who Likes who/what ($to_id)
($from_id) the reply to the main post RepliesTo the main post ($to_id)
Avoid passive voice. That’s good advice in general.
The possibility to use both technologies together is very interesting. Using graph objects we can store relationships between elements, for example, relationships between forum members. Using R scripts we can build a cluster graph from the stored graph information, illustrating the relationships in the graph.
The script below creates a database for our example with a subset of the objects used in my article and a few more relationship records between the forum members.
Click through for the script.
One of the simplest concepts when computing graph based values is that of
centrality, i.e. how central is a node or edge in the graph. As this
definition is inherently vague, a lot of different centrality scores exists that
all treat the concept of central a bit different. One of the famous ones is
the pagerank algorithm that was powering Google Search in the beginning.
tidygraphcurrently has 11 different centrality measures and all of these are
centrality_*for easy discoverability. All of them returns a
numeric vector matching the nodes (or edges in the case of
This is a big project and is definitely interesting if you’re looking at analyzing graph data.