Press "Enter" to skip to content

Curated SQL Posts

Optimizing a Poisson Survival Model

Joshua Entrop shows off optimx() in R to perform a survival analysis:

In this blog post, we will fit a Poisson regression model by maximising its likelihood function using optimx() in R. As an example we will use the lung cancer data set included in the {survival} package. The data set includes information on 228 lung cancer patients from the North Central Cancer Treatment Group (NCCTG). Specifically, we will estimate the survival of lung cancer patients by sex and age using a simple Poisson regression model. You can download the code that I will use throughout post here

Read the whole thing. H/T R-bloggers

Comments closed

Apache Kafka in the Gaming Industry

Kai Wähner walks us through a few use cases for Apache Kafka in online gaming:

This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to make gamers happy and Gaming companies successful. Various use cases and architectures in the gaming industry are discussed, including online and mobile games, betting, gambling, and video streaming.

Learn about:

– Real-time analytics and data correlation of game telemetry
– Monetization network for real-time advertising and in-app purchases
– Payment engine for betting
– Detection of financial fraud and cheating
– Chat function in games and cross-games
– Monitor the results of live operations like weekend events or limited time offers
– Real-time analytics on metadata and chat data for marketing campaigns

It’s an interesting overview of where this platform fits in the industry.

Comments closed

Why IS NOT NULL is a Range Predicate

Erik Darling drops knowledge on us:

Why is IS NULL (not to be confused with ISNULL, the function) considered in equality predicate, and IS NOT NULL considered an inequality (or range) predicate?

It seems like they should be fairly equivalent, though opposite. One tests for a lack of values, and one tests for the presence of values, with no further examination of what those values are.

The trickier thing is that we can seek to either condition, but what happens next WILL SHOCK YOU.

This is my shocked face.

Comments closed

Understanding Digital Twins in IoT Hub

Paul Hernandez explains the concept of digital twins in the IoT space:

Azure Digital Twins Service offers a way to build next generation IoT solutions. There are other approaches on the market to describe IoT devices and build digital twins. Without making a formal comparison I can say with the Azure Digital Twins is possible to build a powerful semantic layer on top of your connected devices using domain specific models.

To show you how this work let’s create a kind of “hello world” example. An end-to-end solution is out-of-scope of this post. Instead I will create some hands-on tutorial to demonstrate some of the functionalities.

Click through to see an example.

Comments closed

Discovering Provider-Specific Commands in Powershell

Jeffrey Hicks performs some Powershell research:

I’ve been diving into PowerShell help lately while preparing my next Pluralsight course. One of the sad things I have discovered is the loss of provider-aware help. As you may know, some commands have parameters that only exist when using a specific PSDrive.  For example, the -File parameter for Get-ChildItem only works in the file system. If you change to the CERT: PSDrive that parameter isn’t available. But you do get a new parameter like -CodeSigningCert. It used to be that you could use help to discover these differences. Get-Help has a -Path parameter to assist you.

Sadly, at least for me as a PowerShell teacher, this no longer works, especially if you have updated help recently in Windows PowerShell. Microsoft is working on some updates to how Help is delivered but for now we might need alternatives.

Read on for alternatives.

Comments closed

Analyzing Windows Documents with SQL Server Semantic Search

Haroon Ashraf begins a series on semantic search over files:

The article examines storing unstructured data using native windows directory structure and managing it through the SQL Server database. Subsequently, we demonstrate how to perform comparative analysis with the help of Semantic Search in order to get valuable information out of the data. Additionally, you are going to get some hands-on experience of storing and linking documents on Windows File System and using Semantic Search.

It is of great importance for both database beginners and professionals to know how to apply file storage strategies such as File Table to searching technologies like Semantic Search for unstructured data analysis.

The first part of the series is all about setting up FileTable.

Comments closed

Marking a Table in DAX as a Date Table

Alberto Ferrari explains why you have to mark a date table as such in Power BI:

Even though this probably does not surprise you, in fact it should. The measure – as it is written – should not work. The reason why it works is because the relationship between Sales and Date is using the Sales[Order Date] column. Sales[Order Date] has a Date data type. If we use an integer to relate the two tables instead of using a Date column, the formula stops working. We now change the relationship using the integer column Sales[OrderDateKey] instead of the Sales[Order Date].

Read the whole thing.

Comments closed

Python and Power BI Desktop

David Eldersveld continues a series on Python as an external tool in Power BI. Part 2 is about defining a proper JSON formatted file:

To see a new tool in the Power BI Desktop ribbon, you need to define a JSON file and place it in a specific folder on your workstation. The featured external tools Tabular Editor, DAX Studio, and ALM Toolkit have installers that take care of this step. Since you do not have a dedicated installer to place this file in the required directory, you need to manually define your own.

As long as the “enhanced metadata format” for the data model is enabled, and the JSON in your file is accurate, you should see your new tool in your ribbon after you re-open Power BI Desktop.

Part 3 shows off virtual environments in Python as well as how to connect to the Tabular Object Model:

The Tabular Object Model (TOM) library for .NET opens Power BI’s data model to external tools. Two pieces are required to allow Python to interface with .NET:
1) Pythonnet package for .NET CLR (pip install pythonnet)
2) Python-SSAS module (ssas_api.py placed in the same folder as the main script you’d like to run)

The python-ssas (ssas_api.py) Python module that facilitates the TOM connection is all the work of Josh Dimarsky–originally for querying and processing Analysis Services. I simply repurposed it for use with Power BI Desktop or the XMLA endpoint in Power BI Premium and extended it with some relevant examples. Everything relies on Josh’s Python module, which has functions to connect to TOM, run DAX queries, etc.

This does look pretty handy.

Comments closed

Assuming a Role with AWS Powershell Tools

Sheldon Hull solves a problem:

I’ve had some issues in the past working with AWS.Tools PowerShell SDK and correctly assuming credentials.

By default, most of the time it was easier to use a dedicated IAM credential setup for the purpose.

However, as I’ve wanted to run some scripts across multiple accounts, the need to simplify by assuming a role has been more important.

It’s also a better practice than having to manage multiple key rotations in all accounts.

Read on to see how far Sheldon has been able to take this, but also how much more work is left to do.

Comments closed

Data Privacy in Confluent Platform

David Millman shows off the Privitar Kafka Connector:

The initial message structure, in the left column above, is a simple JSON document with five fields. The middle column contains the list of rules that must be applied, defining the policy. On the right is a sample output message generated as a result of the policy being applied to the initial message.

In the Privitar Policy Manager, a user maps the individual fields to the appropriate rule, as shown in the screenshot below. A rule is applied to each of the fields and the schema is read as a single table structure, named testfile. These rules can be applied for every instance of the schema.

Read on for more.

Comments closed