Press "Enter" to skip to content

Category: Graph

Cross-Database Graph Query Problems

Louis Davidson receives a nastygram from SQL Server’s graph functionality:

Just understand that if you need any of the graphDb underlying data structures, you will need to find their actual physical name and use it. I would definitely suggest never accessing these columns via any method other than the pseudocolmns for production code (because you have no way to predict the column names from dev to prod (you cannot specify the names when creating a table), but this following code does work:

Click through to see the issue and Louis’s workarounds.

Comments closed

Graph Analysis with NetworkX

Tori Tompkins introduces us to a Python package:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex graphs. It’s a really cool package that contains heaps of graph algorithms for all different uses. In this tutorial, I will cover how to create a graph from an edge list and different ways we can query it.

Unsure what a graph is exactly? Check out my Data Science Moments video which introduces graphs and their uses in 5 minutes:

Click through for that video, as well as a way to load, process, and display graph data.

Comments closed

Working with GraphX in Spark

Tomaz Kastrun continues a series on Spark with a look at GraphX. Part 20 gives an overview of GraphX:

GraphX is Spark’s API component for graph and graph-parallel computations. GraphX uses Spark RDD and builds a graph abstraction on top of RDD. Graph abstraction is a directed multigraph with properties of edges and vertices.

Part 21 shows off the operators available:

Property graphs have collection of operators, that can take user-defined function and produce new graphs with transformed properties and structure. Core operators are defined in Graph and compositions of core operators are defined as GraphOps, and are automatically available as members of Graph. Each graph representation must provide implementations of the core operations and reuse many of the useful operations that are defined in GraphOps.

Click through for more information on graphs in the Spark ecosystem.

Comments closed

Actions with Edge and Node Tables in SQL Server

Louis Davidson is a man of action:

One of the interesting things about working with many-to-many relationships in SQL Server with graph tables instead of a relational table is that unlike a relational many-to-many table, by default an edge may can implement relationships from lots of different tables (nodes). You can also limit what nodes can be related using which edges.

For example, say you have 4 nodes and 2 edges, both of the edges, by default, each edge would allow relationships from each node to itself, or each node to each other node. It can all get a bit complicated to figure out if you have a lot of objects (and to be fair, you probably also want to be able to check to make sure your objects are configured as you expect.

In this blog, I will demonstrate how to determine, given a given edge or node, what operations are possible. 

Click through to learn more.

Comments closed

Why Have Multiple Edge Constraints in SQL Graph?

Louis Davidson has an explanation for us:

Edge constraints were added in SQL Server 2019 to make the node to edge relationship stricter/enforced, and more like typical foreign key constraints. When used, they define what node types can be used in the from and to position of the edge. What makes edges different than a many-to-many relationship in a relational table is that an edge can implement more than one many-to-many relationship in a single table. To constrain the types of data that can be put into the edge, you can use an edge constraint.

Edge constraints are very similar to implementing foreign key constraints, but there are a few key differences. Foreign keys are between two tables. Edges are between one edge table, and multiple pairs of node tables. In both cases, you can have multiple constraints, even from the same table to the same related table on the same column. However, with edge constraints, because you can have multiple pairs of expressions, and even multiple constraints, it bears discussion. If you have more than one constraint, it has one big negative, but it is allowed to implement one big positive!

Click through for the explanation, as well as an example.

Comments closed

The Basics of Graph Theory

Ernest Martinez gives us a primer on graph theory, as well as a few interesting use cases:

We used a new option in the Oracle database called Spatial Data Option, which allowed us to do multi-dimensional queries based on geographic location and perform shortest path queries in SQL. To access the latitude and longitude of every zip code in the country, we purchased a list of zip code centroids from the US Post Office. We then joined the centroid zip with a store’s zip which gave us an approximate cartesian coordinate for the store.

The first customer to purchase this product was a national muffler company. We POC’d (proof of concept) it initially in the NYC area. The first problem we encountered was that the shortest distance between point A and point B wasn’t necessarily the right answer. For example, to a person living on the north shore of Long Island the nearest shop, as the crow flies, was in Connecticut, across the Long Island Sound. Unless they had a boat, this was definitely not their closest shop. Obviously we needed to introduce cost functions into our algorithms. A high cost across the sound resolved the issue.

Click through for more info and a few stories.

Comments closed

Clarifying Key Errors on Graph Tables in SQL Server

Louis Davidson clears up some noise:

This causes the following error message:

Msg 2627, Level 14, State 1, Line 14Violation of UNIQUE KEY constraint 'AKEdge'. Cannot insert duplicate key in object 'dbo.Edge'. The duplicate key value is (455672671, 0, 455672671, 1).

So what is this: (455672671, 0, 455672671, 1)?

Click through to understand what this all means. Louis also has a quick procedure which looks up those details.

Comments closed

Visualizing SQL Server Graph Tables via TGF

Louis Davidson shows how you can visualize data stored in SQL Server graph tables:

Each node object has its own surrogate key values that start at 0, so if you are going to use the code for more than one node at a time, you have to make the surrogate values unique for the TGF file (see the last blog on importing for more details on that). In the code I make a temp table to stage the objects, so if you have > 1 node, the second set of keys need to start off where the previous ones left off. So the code uses an identity column, and joins to that identity column by schema, table, and edgeId, outputting the unique key:

Read on to see how Louis translates the data into the right format for visualization.

Comments closed

Visualizing Infrastructure with Terraform Graph

Jonathan D’Aloia shows how we can visualize Terraform-based infrastructure with diagrams:

As can be seen from the image above we have every resource that is defined in the Terraform code that is to be deployed. At a first glance, it does appear that not all the information here is of such relevance, for example, the metadata referring to registry or root provider. However, if we look away from this we can begin to see how the Infrastructure model is going to look once it has been deployed.

We can see that we have one resource group called “example” which has an Azure SQL Server and also an Azure Storage account also both called “example” and that all of these resources directly link to the resource group. I would also point out that you can also see that Azure SQL database directly links to the SQL server giving a clear indication of which databases belong to which server.

Click through for an example as well as the process.

Comments closed

Importing Graph Data into SQL Server

Louis Davidson takes us through an interesting problem:

The problem was, if I wanted to recreate this graph in data, I had to type in a bunch of SQL statements (something I generally enjoy to a certain point, but one of my sample files cover the geography of Disney World, and it would take a very long time to manually type that into a database as it took quite a while just to do one section of the park). 

So I went hunting for a tool to do this for me, but ended right back with yEd. The default file type when you save in yEd is GraphML, which is basically some pretty complex XML that was well beyond my capabilities using XML in SQL or Powershell. Realistically I don’t care that much about anything other than just the nodes and edges, and what I found was that you can save graphs in the tool a format named Trivial Graph Format (TGF).

Click through to see it in action.

Comments closed