When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment the Web Interface of the data explorer is oriented for the SQL API and it sounds impossible to create a graph query on the emulator. This is false, the UI is not aware of these features but the engine supporting the Graph API.
You can access the emulator engine by the means of the .Net librairies. Create a new console project in Visual Studio and add the package
Microsoft.Azure.Graphs. This package is in preview, you’ll need to specify to the package manager that you’re accepting the previews.
Read on to learn more.
However, with recent focus on big data for many of my clients, we have experienced an increase in different business requests that requires for many-to-many data modelling. Consequently, as a Microsoft shop we’ve had to turn to other non-Microsoft products to ensure that we optimally respond to such business requests. Not surprisingly, ever since word got around that graph database will be part of SQL Server 2017, we’ve been looking forward to this latest release of SQL Server. Having played around with the graph database feature in SQL Server 2017, we have noticed that unlike other graph database vendors, plotting and visualising the data out of the graph database is not readily available in SQL Server 2017. Luckily, thanks to SQL Server R, you can easily plot and visualise SQL Server 2017 graph database data without turning to 3rd party plugins. In this article, I demonstrate how SQL Server Machine Learning Services (previously known as SQL Server 2016 R Services) can be used to plot a diagram according to the data defined in a SQL Server 2017 graph database.
The igraph library is a good one; there’s a lot of power in it that this post just introduces.
Positive (Forward) Direction
I’d also like to see the tables use a forward direction naming rather than reverse (like “Written By”). So perhaps:
($from_id) the member Wrote the post ($to_id)
($from_id) who Likes who/what ($to_id)
($from_id) the reply to the main post RepliesTo the main post ($to_id)
Avoid passive voice. That’s good advice in general.
The possibility to use both technologies together is very interesting. Using graph objects we can store relationships between elements, for example, relationships between forum members. Using R scripts we can build a cluster graph from the stored graph information, illustrating the relationships in the graph.
The script below creates a database for our example with a subset of the objects used in my article and a few more relationship records between the forum members.
Click through for the script.
One of the simplest concepts when computing graph based values is that of
centrality, i.e. how central is a node or edge in the graph. As this
definition is inherently vague, a lot of different centrality scores exists that
all treat the concept of central a bit different. One of the famous ones is
the pagerank algorithm that was powering Google Search in the beginning.
tidygraphcurrently has 11 different centrality measures and all of these are
centrality_*for easy discoverability. All of them returns a
numeric vector matching the nodes (or edges in the case of
This is a big project and is definitely interesting if you’re looking at analyzing graph data.
To put the graph database to the test, I took bunch of emails from a particular MVP SQL Server distribution list (content will not be shown and all the names will be anonymized). On my gmail account, I have downloaded some 90MiB of emails in mbox file format. With some python scripting, only FROM and SUBJECTS were extracted:writer.writerow(['from','subject']) for index, message in enumerate(mailbox.mbox(infile)): content = get_content(message) row = [ message['from'].strip('>').split('<')[-1], decode_header(message['subject']),"|" ] writer.writerow(row)
This post walks you through loading data, mostly. But at the end, you can see how easy it is to find who replied to whose e-mails.
Graph databases are based on graph theory, and employ nodes, edges, and properties. The graph theory is the study of the graphs that are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of nodes, edges which are connected by edges, arcs, or lines.
A graph can be directed or undirected (uni or bi-directional) that might point the direction of the relationship between the edges.
Graph databases can be compared to the Network Model Databases, that were focusing on solving the same problem the interconnected world.
The most popular graph database in the world currently is NEO4J, which is implemented in Java and is using CQL (Cypher Query Language), a language that has definitely inspired the SQL Graph T-SQL language extension.
Niko notes that this is not a fully mature product yet, but it’s an interesting start.