Press "Enter" to skip to content

Day: November 29, 2018

An Overview Of Today’s Data And Event Ecosystem

John Hammink walks us through a laundry list of tools in the modern data platform:

InfluxDB

The rapid instrumentation of the physical world due to IoT and data-collecting applications has led to an explosion of time-stamped data. Time series databases serve this evolving niche, and among them, InfluxDB is emerging as a major player. InfluxDB, like others, can handle complex logic or business rules atop massive — and fast-growing — data sets, and InfluxDB adds the advantage of a range of ingestion methods, as well as the ability to append tags to different data points. Aiven also provides a managed version, Aiven InfluxDB.

Even with this list, there are still plenty more, including some of my favorites. H/T DZone

Comments closed

Apache Samza At 1.0

Jagadish Venkatraman announces Apache Samza 1.0:

We are pleased to announce today the release of Samza 1.0, a significant milestone in the history of the project. Apache Samza is a distributed stream processing framework that we developed at LinkedIn in 2013. Samza became a top-level Apache project in 2014. Fast-forward to 2018, and we currently have over 3,000 applications in production leveraging Samza at LinkedIn. The use-cases include detecting anomalies, combating fraud, monitoring performance, notifications, real-time analytics, and many more. Today, Samza integrates not only with Apache Kafka, but also with many other systems, including Azure EventHubsAmazon Kinesis, HDFS, ElasticSearch, and Brooklin. Multiple companies like Slack, TripAdvisor, eBay, and Optimizely have adopted Samza.

We view Samza 1.0 as a step towards our vision of making stream processing universally accessible. In this post, we describe our journey in building and scaling a distributed stream processing system. We also present the key features in Samza 1.0: a rich high-level API, event-time-based processing, integration with Apache Beam, Samza SQL, a standalone mode to run Samza without YARN, and a new test framework for Samza applications.

This runs in the same space as Spark Streaming, Flink, and Kafka Streams, so there are plenty of competitors and a lot of innovation.

Comments closed

Azure SQL Database Supports R Integration

David Smith notes that Azure SQL Database now has (in preview) support for R:

Azure SQL Database, the database-as-a-service based on Microsoft SQL Server, now offers R integration. (The service is currently in preview; details on how to sign up for the preview are provided in that link.) While you’ve been able to run R in SQL Server in the cloud since the release of SQL Server 2016 by running a virtual machine, Azure SQL Database is a fully-managed instance that doesn’t require you to set up and maintain the underlying infrastructure. You just choose the size and scale of the database you want to manage, and then connect to it like any other SQL Server instance. (If you want to learn how to set up an Azure SQL database, this Microsoft Learn module is a good place to start.)

Python and Java are not yet supported, but I’d imagine that they’ll be on the way too.

Comments closed

Kafka And Handling Missing Topics

The folks at Redglue show what happens when you send a message to a Kafka broker on a non-existent topic:

Now let’s produce messages to a non-existent topic called redglue_nonexistent:

root@kafka1:~# kafka-console-producer --broker-list 127.0.0.1:9092 --topic redglue_nonexistent
I maybe don't exists
[2018-11-28 14:22:12,454] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {redglue_nonexistent=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

Obvious there a WARNING saying that the topic doesn’t exists, but it allows you to “send” messages to that specific topic

Read on to see what happens.

Comments closed

Connection Failed With Error 772

Jack Vamvas investigates an error when trying to connect to SQL Server 2016 on Windows Server 2016:

Question: I’ve upgraded an application with a built – in Database API . When attempting to establish a SQL Server database connection this error appears – Connection failed – SQL Server Error 772 – TCPIP Socket

Upon investigation the application was using  the native drivers attempting to connect to a SQL Server 2016 \ Windows 2016

As part of the testing I downloaded the ODBC 13.1 SQL Server drivers – independent of the application and tested a DSN connection to the same SQL Server – and it connected OK. I then created a DSN with native drivers and the error reappeared.

What is going on ? How can I fix this issue?

Read on for the solution and keep those drivers up to date.

Comments closed

Implementing A Change Tracking Solution In SQL Server

Jon Shaulis shows us how we can use Change Tracking to detect when rows get modified:

This allows you to detect changes in a lightweight manner via the Transaction Log in SQL Server in combination with T-SQL. Change Data Capture is more about auditing or creating a historical view and Temporal Tables are the next step up from there which became available in 2016 versions of SQL Server. Change Tracking is primarily used for finding only things that have changed. Not necessarily why, how, or who changed it, but what has changed and what it is now.

So why would you want this technology implemented? I find this technology is best suited for tasks where I want as light of a footprint as possible and I want to bring over incremental changes.

Click through for a long and complete walkthrough.  If you’re thinking to implement change tracking, this is a good link to check out.

Comments closed

Adding Constraints In The CREATE TABLE Statement

Steve Jones shows how you can add constraints in your CREATE TABLE statement:

A good habit to get into is to explicitly name your constraints. I try to do this when I create tables to be sure that a) I have a PK and b) it’s named the same for all environments.

I can create a PK inline, with a simple table like this:

CREATE TABLE Batting
   (
        BattingKey INT NOT NULL CONSTRAINT BattingPK PRIMARY KEY
        , PlayerID INT
        , BattingDate DATETIME
        , AB TINYINT
        , H TINYINT
        , HR tinyint
   )
;

This gives a primary key, named “BattingPK, that I can easily see inline with the column.

Steve also gives an alternative formulation which works well for composite keys.  You can additionally add constraints after the create statement, but if you are creating temp tables and want to take advantage of temp table reuse, constraints have to be created as part of the table (and cannot have names).  For additional fun, since SQL Server 2014, you can create indexes as part of the CREATE TABLE statement as well—that was needed to create memory-optimized tables as back in that edition, you couldn’t add new indexes after the fact.

Comments closed

Resumable Online Index Creation In SQL Server 2019

Monica Rathbun tries out resumable online index creation in SQL Server 2019:

SQL Server 2019 brings a very exciting new feature that, is long overdue. Resumable online index create is one of my favorite new things. This paired with the Resumable Index Rebuilds introduced with SQL Server 2017 really gives database administrators much more control over index processes.

Have you ever started to build a new index on very large table only to have users call and complain their process is hung, not completing, or system is slow? That’s when you realize you’re the cause because you tried to sneak in a new index. I have many times, because creating a new index can impact performance and can be a problematic process for users when you have no or little downtime windows available. When you kill the create process it rolls back requiring you to start from the beginning the next time. With resumable online index creation you now have the ability to pause and restart the build at the point it was paused.  You can see where this can be very handy.

Click through for a demo and discussion on what options are available.

Comments closed

A Case When CASE Isn’t The Right Case

Adrian Buckman notes differences in the two ways of using CASE statements:

It looks so clean compared to the first example! but it wasn’t until I tested the second method out that I realised that the behaviour of the two CASE expressions are different as outlined on books online

The CASE expression has two formats:
The simple CASE expression compares an expression to a set of simple expressions to determine the result.
The searched CASE expression evaluates a set of Boolean expressions to determine the result.
Both formats support an optional ELSE argument.

I put together some examples to illustrate the difference when evaluating Null using the two Case expressions, the query returns the column ‘Databasename’ from the derived list values clause, example 1 has a Null value and example 2 has a value of ‘SQLUndercover’ which you will see below:

Adrian looks into a scenario in which the two CASE expressions return different results, and digs into execution plans to find out why.

Comments closed

Quick Search Within Visual Studio Code

Chrissy LeMaire has an extension which allows for quick searches of highlighted text on a few sites:

We recently released a VS Code extension that lets you highlight terms and search dbatools.io, Microsoft Docs, Google, StackOverflow, DuckDuckGo or Technet or Thwack right from your code! It’s called dbatools simple search and you can find it in the Extension Marketplace.

I’ve also confirmed that it does work with Azure Data Studio; you just need to download the vsix extension from the Extension Marketplace and install and get context menu search support.

Comments closed