Press "Enter" to skip to content

Category: Graph

Execution Plans of Graph Tables in SQL Server

Hugo Kornelis looks at the execution plan:

Welcome to part twenty-one of the plansplaining series, where we will continue our look at execution plans for graph queries. In the previous post, we looked at the internal structure of node and edge tables, and discovered that they have a few hidden columns. Now let’s look how those columns are used in graph queries.

Read on for the example and a deeper dive into how graph tables actually work.

Comments closed

Plan Analysis of Graph Tables

Hugo Kornelis is back and looking at graphs:

SQL Graph is the name for a set of features, introduced in SQL Server 2017 and extended in SQL Server 2019, that bring graph database functionality into SQL Server. See here for the full documentation as provided by Microsoft. In this first post about SQL Graph, I’ll look at what execution plans reveal about the internal structure of graph tables. I’ll then use that knowledge in later parts, where I’ll discuss more advanced queries on graph tables.

Click through for the primer.

Comments closed

Cross-Database Graph Query Problems

Louis Davidson receives a nastygram from SQL Server’s graph functionality:

Just understand that if you need any of the graphDb underlying data structures, you will need to find their actual physical name and use it. I would definitely suggest never accessing these columns via any method other than the pseudocolmns for production code (because you have no way to predict the column names from dev to prod (you cannot specify the names when creating a table), but this following code does work:

Click through to see the issue and Louis’s workarounds.

Comments closed

Graph Analysis with NetworkX

Tori Tompkins introduces us to a Python package:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex graphs. It’s a really cool package that contains heaps of graph algorithms for all different uses. In this tutorial, I will cover how to create a graph from an edge list and different ways we can query it.

Unsure what a graph is exactly? Check out my Data Science Moments video which introduces graphs and their uses in 5 minutes:

Click through for that video, as well as a way to load, process, and display graph data.

Comments closed

Working with GraphX in Spark

Tomaz Kastrun continues a series on Spark with a look at GraphX. Part 20 gives an overview of GraphX:

GraphX is Spark’s API component for graph and graph-parallel computations. GraphX uses Spark RDD and builds a graph abstraction on top of RDD. Graph abstraction is a directed multigraph with properties of edges and vertices.

Part 21 shows off the operators available:

Property graphs have collection of operators, that can take user-defined function and produce new graphs with transformed properties and structure. Core operators are defined in Graph and compositions of core operators are defined as GraphOps, and are automatically available as members of Graph. Each graph representation must provide implementations of the core operations and reuse many of the useful operations that are defined in GraphOps.

Click through for more information on graphs in the Spark ecosystem.

Comments closed

Actions with Edge and Node Tables in SQL Server

Louis Davidson is a man of action:

One of the interesting things about working with many-to-many relationships in SQL Server with graph tables instead of a relational table is that unlike a relational many-to-many table, by default an edge may can implement relationships from lots of different tables (nodes). You can also limit what nodes can be related using which edges.

For example, say you have 4 nodes and 2 edges, both of the edges, by default, each edge would allow relationships from each node to itself, or each node to each other node. It can all get a bit complicated to figure out if you have a lot of objects (and to be fair, you probably also want to be able to check to make sure your objects are configured as you expect.

In this blog, I will demonstrate how to determine, given a given edge or node, what operations are possible. 

Click through to learn more.

Comments closed

Why Have Multiple Edge Constraints in SQL Graph?

Louis Davidson has an explanation for us:

Edge constraints were added in SQL Server 2019 to make the node to edge relationship stricter/enforced, and more like typical foreign key constraints. When used, they define what node types can be used in the from and to position of the edge. What makes edges different than a many-to-many relationship in a relational table is that an edge can implement more than one many-to-many relationship in a single table. To constrain the types of data that can be put into the edge, you can use an edge constraint.

Edge constraints are very similar to implementing foreign key constraints, but there are a few key differences. Foreign keys are between two tables. Edges are between one edge table, and multiple pairs of node tables. In both cases, you can have multiple constraints, even from the same table to the same related table on the same column. However, with edge constraints, because you can have multiple pairs of expressions, and even multiple constraints, it bears discussion. If you have more than one constraint, it has one big negative, but it is allowed to implement one big positive!

Click through for the explanation, as well as an example.

Comments closed

The Basics of Graph Theory

Ernest Martinez gives us a primer on graph theory, as well as a few interesting use cases:

We used a new option in the Oracle database called Spatial Data Option, which allowed us to do multi-dimensional queries based on geographic location and perform shortest path queries in SQL. To access the latitude and longitude of every zip code in the country, we purchased a list of zip code centroids from the US Post Office. We then joined the centroid zip with a store’s zip which gave us an approximate cartesian coordinate for the store.

The first customer to purchase this product was a national muffler company. We POC’d (proof of concept) it initially in the NYC area. The first problem we encountered was that the shortest distance between point A and point B wasn’t necessarily the right answer. For example, to a person living on the north shore of Long Island the nearest shop, as the crow flies, was in Connecticut, across the Long Island Sound. Unless they had a boat, this was definitely not their closest shop. Obviously we needed to introduce cost functions into our algorithms. A high cost across the sound resolved the issue.

Click through for more info and a few stories.

Comments closed

Clarifying Key Errors on Graph Tables in SQL Server

Louis Davidson clears up some noise:

This causes the following error message:

Msg 2627, Level 14, State 1, Line 14Violation of UNIQUE KEY constraint 'AKEdge'. Cannot insert duplicate key in object 'dbo.Edge'. The duplicate key value is (455672671, 0, 455672671, 1).

So what is this: (455672671, 0, 455672671, 1)?

Click through to understand what this all means. Louis also has a quick procedure which looks up those details.

Comments closed

Visualizing SQL Server Graph Tables via TGF

Louis Davidson shows how you can visualize data stored in SQL Server graph tables:

Each node object has its own surrogate key values that start at 0, so if you are going to use the code for more than one node at a time, you have to make the surrogate values unique for the TGF file (see the last blog on importing for more details on that). In the code I make a temp table to stage the objects, so if you have > 1 node, the second set of keys need to start off where the previous ones left off. So the code uses an identity column, and joins to that identity column by schema, table, and edgeId, outputting the unique key:

Read on to see how Louis translates the data into the right format for visualization.

Comments closed