Kafka Connect Neo4j Sink

Michael Hunger and Andrea Santurbano announce a new sink for Kafka Connect which handles writing to Neo4j:

We’ve been using the work we did for the Kafka sink – Neo4j extension and have made it available via remote connections over our binary bolt protocol. So you can stream your events from Apache Kafka® directly into Neo4j to create, update and enrich your graph structures. Then it is really up to you what you want to do with the event data.

The events can come from frontend systems, API notifications, other databases or streaming systems like Apache Spark™ and Apache Flink®.

For details on the format and internals, please see our previous article or the documentation for the Neo4j sink.

Read on for details and demos.

Traversing Nearest Neighbors With Dijkstra’s Algorithm And SQL Server Graph

James Livingston gives us a PowerShell-based implementation of Dijkstra’s shortest path algorithm:

In my previous post I shared a SQL Server 2017 graph database of US capitals. Graphs are a computer science core competency and present some interesting challenges for programmers. Most notable of these challenges is finding the shortest path between nodes. Dijkstra’s algorithm is a commonly taught algorithm for finding shortest path. Dijkstra’s is often asked about during entry level developer interviews and it is a great algorithm to implement when learning a new language since it requires utilizing loops, logic, and data structures.
Here’s my implementation of Dijkstra’s algorithm using PowerShell, traversing a graph of US capitals. Rather than manage our own graph nodes and edges, we’ll utilize graph tables and queries in SQL Server. There are a lot of different ways to implement this in PowerShell; my first cut of this ended up using a hash table so I could perform random access. There’s a give-and-take with custom PowerShell objects, which sacrifice random access for some other benefits.

Click through for the code.
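
For readers who want the shape of the algorithm before clicking through, here is a minimal Python sketch of Dijkstra's algorithm. It is not James's PowerShell code and it does not touch SQL Server: the graph is a plain dictionary (the hash-table idea from the excerpt), and the node names and weights in `toy_graph` are made up for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]  # min-heap of (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist

# Hypothetical toy graph; the weights are invented, not real distances.
toy_graph = {
    "A": {"B": 4, "C": 1},
    "B": {"A": 4, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2, "D": 8},
    "D": {"B": 5, "C": 8},
}
distances = dijkstra(toy_graph, "A")
```

The dictionary gives constant-time lookups by node name, which is the random access the excerpt mentions.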

A Graph Database Of US Capitals

Kevin Feasel

2018-12-11

Graph

James Livingston has a graph database to share:

While there are countless relational databases out there for practice, there’s not much in the way of graph databases. It is my intent to share my graph databases with the world in hopes that it removes the friction associated with your learning.
US Capitals is a popular data set for working with graphs. Nodes identify a state capital. An edge connects a capital in one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find TSQL scripts to create the graph using SQL Server 2017 graph database. I created those scripts and have made them readily available on GitHub.

I’m interested in the forthcoming post on Dijkstra’s algorithm; I think the last time I saw that was in my undergrad days.

What’s New In SQL Server 2019 CTP 2.1 Graph Support

Kevin Feasel

2018-12-03

Graph

Niko Neugebauer looks at a few additions to SQL Server graph support:

Now, in the next step we shall create a derived view, which shall contain the list with all Persons and Businesses, joining them together:

CREATE OR ALTER VIEW dbo.Followers AS
    SELECT PersonId AS Id, FullName
    FROM dbo.Person
    UNION ALL
    SELECT BusinessId, BusinessName
    FROM dbo.Business;

Now, the real new thing is that we can use such derived tables in SQL Server 2019 CTP 2.1 and Azure SQL Database together with the MATCH clause, in the statements such as the one below where we list all the followers of the “Real Stuff” company:

SELECT Followers.ID, Followers.FullName
FROM Followers, Follows, Company
WHERE MATCH(Followers-(Follows)->Company)
    AND CompanyName = 'Real Stuff'

This query works fine, delivering us the expected results while generating a pretty complex execution plan in the background.

Niko focuses on heterogeneous nodes and edges, as well as derived views.
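
To make the idea concrete without a SQL Server 2019 instance, here is a rough relational emulation using Python's built-in `sqlite3` module. It is only a sketch of the concept: SQLite has no node or edge tables and no MATCH clause, so the `Follows` table, its `FollowerKind` column, and all sample rows are my own assumptions, and ordinary joins stand in for the graph pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person   (PersonId INTEGER PRIMARY KEY, FullName TEXT);
CREATE TABLE Business (BusinessId INTEGER PRIMARY KEY, BusinessName TEXT);
CREATE TABLE Company  (CompanyId INTEGER PRIMARY KEY, CompanyName TEXT);
-- Assumed plain edge table: which kind of follower, which id, which company.
CREATE TABLE Follows  (FollowerKind TEXT, FollowerId INTEGER, CompanyId INTEGER);

-- One view over two different kinds of follower (heterogeneous nodes).
CREATE VIEW Followers AS
    SELECT 'person' AS Kind, PersonId AS Id, FullName FROM Person
    UNION ALL
    SELECT 'business', BusinessId, BusinessName FROM Business;

INSERT INTO Person   VALUES (1, 'Ann Example'), (2, 'Bob Example');
INSERT INTO Business VALUES (1, 'Sample Traders');
INSERT INTO Company  VALUES (10, 'Real Stuff'), (11, 'Other Co');
INSERT INTO Follows  VALUES ('person', 1, 10), ('business', 1, 10), ('person', 2, 11);
""")

# Joins play the role of MATCH(Followers-(Follows)->Company).
rows = conn.execute("""
    SELECT f.Id, f.FullName
    FROM Followers AS f
    JOIN Follows AS e ON e.FollowerKind = f.Kind AND e.FollowerId = f.Id
    JOIN Company AS c ON c.CompanyId = e.CompanyId
    WHERE c.CompanyName = 'Real Stuff'
    ORDER BY f.FullName
""").fetchall()
```

Both a person and a business come back from one query, which is the point of putting the UNION ALL view on the left side of the pattern.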

Views And Derived Tables In SQL Server 2019 Graph

Kevin Feasel

2018-11-12

Graph

Shreya Verma shows examples of using views and derived tables in SQL Server 2019’s graph database functionality:

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of those features that is now available for public preview in Azure SQL Database and SQL Server 2019 CTP2.1: use of derived tables and views on graph tables in MATCH queries.

Graph queries on Azure SQL Database now support using view and derived table aliases in the MATCH syntax. To use these aliases in MATCH, the views and derived tables must be created either on a single node or edge table, with or without filters on it, or on a set of node or edge tables combined together using the UNION ALL operator. The ability to use derived table and view aliases in MATCH queries could be very useful in scenarios where you are looking to query heterogeneous entities or heterogeneous connections between two or more entities in your graph.

It’s good to see the product team expand on what they released in 2017, getting the graph product closer to production-quality.

Graph Additions In SQL Server 2019

Kevin Feasel

2018-10-01

Graph

Shreya Verma announces one of the new additions to graph database support in SQL Server 2019:

SQL Server 2017 and Azure SQL Database introduced native graph database capabilities used to model many-to-many relationships. The first implementation of SQL Graph introduced support for nodes to represent entities, edges to represent relationships and a new MATCH predicate to support graph pattern matching and traversal.

We will be further expanding the graph database capabilities with several new features. In this blog we will discuss one of these features that is now available for public preview in SQL Server 2019: Edge Constraints on Graph Edge Tables.

In the first release of SQL Graph, an edge could connect any node to any other node in the database. With Edge Constraints, users can enforce specific semantics on the edge tables. The constraints also help in maintaining data integrity. This post describes how you can create and use edge constraints in a graph database. We will use the following graph schema created in the WideWorldImporters database for the samples discussed here.

I know that SQL Server 2017 was a bit underwhelming for graph database work, so I will be interested in seeing how much of the gap they cover in this release.
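
As a way of picturing what an edge constraint buys you, here is a small Python sketch of the rule itself: an edge table that only accepts the node-type pairs it was declared with. This is not T-SQL and not how SQL Server implements the feature; the `EdgeTable` class and the `Customer`/`Product` names are hypothetical.

```python
class EdgeTable:
    """Hypothetical in-memory edge table that enforces allowed node-type pairs."""

    def __init__(self, name, allowed_connections):
        self.name = name
        self.allowed = set(allowed_connections)  # {(from_type, to_type), ...}
        self.rows = []

    def insert(self, from_node, to_node):
        # Nodes are (type, id) tuples; only the types matter for the check.
        pair = (from_node[0], to_node[0])
        if pair not in self.allowed:
            raise ValueError(
                f"{self.name}: edge {pair[0]} -> {pair[1]} violates the constraint"
            )
        self.rows.append((from_node, to_node))

# Assumed example: a 'bought' edge may only run from a Customer to a Product.
bought = EdgeTable("bought", {("Customer", "Product")})
bought.insert(("Customer", 1), ("Product", 7))

try:
    bought.insert(("Product", 7), ("Customer", 1))  # wrong direction
    rejected = False
except ValueError:
    rejected = True
```

Without the check, the second insert would silently store a product "buying" a customer, which is the kind of integrity problem the excerpt describes.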

The Semantics Of GraphQL

Kevin Feasel

2018-05-23

Graph

Adrian Colyer reviews a paper on the mathematical properties behind GraphQL:

The authors study the computational complexity of GraphQL looking at three central questions:

  1. The evaluation problem: what can we say about the complexity of GraphQL query evaluation?
  2. The enumeration problem: how efficiently can we enumerate the results of a query in practice?
  3. The response size problem: how large can responses get, and what can we do to avoid obtaining overly large response objects?

In order to do this, they need to find some solid ground to use for reasoning. So the starting point is a formalisation of the semantics of GraphQL.

This is a review of a published academic paper rather than a how-to guide, so it’s math-heavy. I am enjoying seeing the development of normal forms for graph processing languages; it’s the beginning of a new generation of normalization purists.
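
To get a feel for the response size problem, here is a toy Python sketch of my own construction; it is not taken from the paper or from Adrian's review. The assumed graph `FRIENDS` gives every node two friends, so each extra level of nesting in a query roughly doubles the response. The second function counts the fields without building the response, using memoisation; that is one simple way to do it, not necessarily the authors' algorithm.

```python
from functools import lru_cache

# Hypothetical graph: every node has exactly two friends.
FRIENDS = {"a": ("b", "c"), "b": ("a", "c"), "c": ("a", "b")}

def evaluate(node, depth):
    """Build the nested response for `depth` levels of friends under node."""
    if depth == 0:
        return {"name": node}
    return {
        "name": node,
        "friends": [evaluate(f, depth - 1) for f in FRIENDS[node]],
    }

def count_fields(response):
    """Count the name fields in a fully built response."""
    return 1 + sum(count_fields(child) for child in response.get("friends", []))

@lru_cache(maxsize=None)
def response_size(node, depth):
    """Count the same fields without materialising the response."""
    if depth == 0:
        return 1
    return 1 + sum(response_size(f, depth - 1) for f in FRIENDS[node])

built = count_fields(evaluate("a", 3))
counted = response_size("a", 3)
```

With the memoised version, checking the size of a 40-level query is instant, whereas building that response would mean trillions of fields.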

Graph Algorithms Supported In Neo4j

Kevin Feasel

2018-04-30

Graph

Amy Hodler gives us a quick summary of fifteen separate algorithms for traversing a graph in Neo4j:

6. PageRank

What it does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.

How it’s used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.

PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to biggest chain reaction of species death.

If you are interested in getting into graph databases, it’s useful to know these algorithms.
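
Here is a minimal Python sketch of PageRank by power iteration, to show how a node's rank flows in from its neighbors. It is not the Neo4j implementation: the damping factor of 0.85, the fixed iteration count, the even redistribution for nodes without outgoing links, and the `links` sample graph are all assumptions on my part.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank per node; links maps a node to the nodes it points at."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the 'random jump' share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: three nodes point at "c", and "c" points back at "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
most_influential = max(ranks, key=ranks.get)
```

Node "a" ends up ranked well above "b" and "d" even though nothing but "c" links to it, which is the "quality of links" effect from the description.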

Closure Tables: Graph Data In Relational Form

Phil Factor shows how to use the concept of closure tables to represent graph-style data in a relational database:

Closure tables are plain ordinary relational tables that are designed to work easily with relational operations. It is true that useful extensions are provided for SQL Server to deal with hierarchies. The HIERARCHYID data type and the common language runtime (CLR) SqlHierarchyId class are provided to support the Path Enumeration method of representing hierarchies and are intended to make tree structures represented by self-referencing tables more efficient, but they are likely to be appropriate for some but not all the practical real-life hierarchies or directories. As well as path enumerations, there are also the well-known design patterns of Nested Sets and Adjacency Lists. In this article, we’ll concentrate on closure tables.

A directed acyclic graph (DAG) is a more general version of a closure table. You can use a closure table for a tree structure where there is only one trunk, because a branch or leaf can only have one trunk. We just have a table that has the nodes (e.g. staff member or directory ‘folder’) and edges (the relationships). We are representing an acyclic (no loops allowed) connected graph where the edges must all be unique, and where there is reflexive closure (each node has an edge pointing to itself).

Take the time to read this one carefully, as I think this model is applicable much more often than it’d appear at first blush.
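
As a quick illustration of what ends up in a closure table, here is a Python sketch that expands a simple parent mapping into (ancestor, descendant, depth) rows, including the depth-0 row each node has for itself. It covers only the single-parent tree case, and the `parent_of` staff hierarchy is a made-up example rather than anything from Phil's article.

```python
def build_closure(parent_of):
    """Return (ancestor, descendant, depth) rows, including depth-0 self rows."""
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        # Walk up to the root, emitting one row per ancestor on the way.
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return rows

# Hypothetical staff hierarchy: each key reports to its value.
parent_of = {"ceo": None, "cto": "ceo", "dev": "cto", "cfo": "ceo"}
closure = build_closure(parent_of)

# With the closure rows in hand, "everyone under X" is a simple filter.
descendants_of_cto = sorted(d for a, d, depth in closure if a == "cto" and depth > 0)
```

In a database, that final filter becomes a plain WHERE clause on the closure table, with no recursion needed at query time.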

Using Cosmos DB For Graph Data

Jose Mendes has an introduction to the Cosmos DB graph engine:

Gremlin is the graph traversal language of Apache TinkerPop, an open source Graph Computing Framework. Gremlin allows the users to write complex queries to traverse their graphs by using a composed sequence of steps, with each step performing an operation on the data stream (further details here). There are 4 fundamental steps:

· transform: transform the objects in the stream

· filter: remove objects from the stream

· sideEffect: pass the object, but yield some side effect

· branch: decide which step to take

Click through for a quick example showing how to create and populate a graph.
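
The four kinds of step listed above can be mimicked with plain Python generators. This is only an analogy to show how steps chain over a stream of objects; it is not Gremlin syntax or the TinkerPop API, and the function names and the `people` data are hypothetical.

```python
def transform(stream, fn):
    """transform: map each object in the stream to something else."""
    for item in stream:
        yield fn(item)

def keep(stream, predicate):
    """filter: drop objects that fail the predicate."""
    for item in stream:
        if predicate(item):
            yield item

def side_effect(stream, action):
    """sideEffect: pass each object through unchanged, but run an action on it."""
    for item in stream:
        action(item)
        yield item

def branch(stream, chooser, routes):
    """branch: pick which downstream step handles each object."""
    for item in stream:
        yield from routes[chooser(item)]([item])

people = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 17},
    {"name": "cy", "age": 52},
]
seen = []

adults = keep(people, lambda p: p["age"] >= 18)
logged = side_effect(adults, lambda p: seen.append(p["name"]))
routes = {
    "senior": lambda s: transform(s, lambda p: p["name"].upper()),
    "adult": lambda s: transform(s, lambda p: p["name"]),
}
result = list(branch(logged, lambda p: "senior" if p["age"] >= 50 else "adult", routes))
```

Because generators are lazy, nothing runs until `list(...)` pulls objects through the chain, one at a time.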
