Kevin Feasel – Page 272

Combining Flink SQL, Streamlit, and Kafka

Published 2024-06-13 by Kevin Feasel

Lucia Cerchie has a pair of posts. First up, Lucia sets the stage:

In part 1 of this series, we’ll make an app, hosted on Streamlit, that allows a user to select a stock, in this case SPY, or the SPDR S&P 500 ETF Trust. Upon selection, a live chart of the stock’s bid prices, calculated every five seconds, will appear.

What are the pieces that go into making this work? The source of the data is the Alpaca Market Data API. We’ll hook up a Kafka producer to the websocket stream and send data to a Kafka topic in Confluent Cloud. Then we’ll use Flink SQL within Confluent Cloud’s Flink SQL workspace to tumble an average bid price every five seconds. Finally, we’ll use a Kafka consumer to receive that data and populate it to a Streamlit component in real time. This frontend component will be deployed on Streamlit as well.
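
As a rough illustration of what "tumbling" an average means, here is a minimal Python sketch. It is not Lucia's code and not Flink SQL: the `(epoch_seconds, bid_price)` event shape, the function name, and the sample values are all assumptions made for the example. Each event lands in exactly one fixed, non-overlapping five-second window, and each window reports one average.

```python
from collections import defaultdict

WINDOW_SECONDS = 5  # window size taken from the description above


def tumbling_average(events, window_seconds=WINDOW_SECONDS):
    """Average bid price per fixed, non-overlapping window.

    `events` is an iterable of (epoch_seconds, bid_price) pairs. That shape
    is an assumption for illustration, not the Alpaca message format.
    """
    buckets = defaultdict(list)
    for ts, price in events:
        # Each timestamp belongs to exactly one window: [start, start + size).
        window_start = int(ts // window_seconds) * window_seconds
        buckets[window_start].append(price)
    return {start: sum(p) / len(p) for start, p in sorted(buckets.items())}


# Made-up sample data: two bids in the first window, one in each of the next two.
sample = [(0.5, 100.0), (4.9, 102.0), (5.0, 110.0), (12.3, 120.0)]
print(tumbling_average(sample))
```

In the real application the windowing happens inside Flink SQL on the Kafka topic, so a sketch like this is only useful for reasoning about which bids end up in which window.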

Part 2 then closes the trap:

In part one of this series, we walked through how to use Streamlit, Apache Kafka®, and Apache Flink® to create a live data-driven user interface for a market data application to select a stock (e.g., SPY) and discussed the structure of the app at a high level. First, data with information on stock bid prices is moved via an Alpaca websocket, then, it’s produced to a Kafka topic in Confluent Cloud where it is also processed with Flink SQL.

Now comes the tricky part: running the Kafka consumer and producer in the same application.

Click through for a good demonstration of a practical solution. Lucia also has a GitHub repo with all of the code, a demo of the site in action, and some links to additional resources.

An Auditing Oddity with SQL Audit

Published 2024-06-13 by Kevin Feasel

Rod Edwards runs into legal troubles:

This is a finger pointing situation that I’ve witnessed in the past regarding native SQL Auditing, and the potential for edge case false positives. Something really not helpful when it comes to any security related topic.

This post is just to highlight a potential gotcha with the native SQL Auditing functionality, dependent on its configuration. It’s certainly not a best practice on setting up Auditing, or access controls, or even the intent someone may have in falling foul of any Audit. There are many awesome guides out there on how to do exactly that.

Despite this post not being any of those things, it is still quite useful in pointing out an edge case in auditing, one to which I don’t have a good answer.

Role Checks: Access Admin, Security Admin, DDL Admin

Published 2024-06-13 by Kevin Feasel

David Seis looks at three roles:

Understanding SQL Server roles is crucial for managing permissions and ensuring SQL Server security. In this post, we will delve into three specific roles: db_accessadmin, db_securityadmin, and db_ddladmin, discussing when each should be used and considerations for least privilege and security. We’ll also include a script you can use to audit your database roles.

Read on to see what each of those three does. I’m not sure I’ve ever worked in an environment that required use of any of these three roles. Typically, the person or set of people responsible for doing the activities associated with one of those three roles needed to do all three (and more).

Fun with Query Timeouts

Published 2024-06-13 by Kevin Feasel

Forrest McDaniel gets my most coveted category:

I love how there are lots of little things to SQL Server – mostly ignored details that only sometimes matter but make sense when you stare at them. Actually, I’m not sure I love them. I forget about them, and then stub my toe on them. Occasionally though, there’s a quirky combination that leads to bloggable shenanigans.

Let’s start with Detail Number One, which has most definitely tripped me up: queries that are returning rows won’t time out. You might be familiar with connections using a default 30s timeout, but as long as the query is returning a row every 29s, the client won’t time it out. You can even test this in SSMS.

Read on to see how Forrest takes advantage of this, uh, capability.
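
To make the quoted behavior concrete, here is a toy Python model of a timeout that resets whenever a row arrives. It is only a sketch of the behavior as Forrest describes it, not how any particular client library implements its timeout; the function name and the sample arrival times are assumptions for illustration.

```python
def first_timeout(row_arrival_times, timeout_seconds=30.0):
    """Return when an idle-based timeout would fire, or None if it never does.

    `row_arrival_times` are seconds since the query started, in ascending
    order. The model assumes the clock resets every time a row arrives,
    which is the behavior described in the quote above.
    """
    last_activity = 0.0
    for arrival in row_arrival_times:
        if arrival - last_activity > timeout_seconds:
            # The gap before this row was too long, so the client gave up first.
            return last_activity + timeout_seconds
        last_activity = arrival
    return None


# A row every 29 seconds keeps the query alive well past the 30-second mark.
print(first_timeout([29, 58, 87]))
# A 37-second gap after the second row trips the timeout.
print(first_timeout([29, 58, 95]))
```

The model only looks at the gaps before rows arrive, so it says nothing about what happens between the last row and the end of the query.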

Show Top N and Bottom N Records in One Power BI Visual

Published 2024-06-13 by Kevin Feasel

Kenneth Omorodion burns the candle from both ends:

Recently, I wrote an article, Rank and Sort Data Based on Multiple Columns in Power BI Using DAX. However, it is very common for business users to request the ability to dynamically view the Top N and Bottom N values of a measure, like Total Sales, on the same visual. This requirement is simple to implement on either the Top or Bottom N options. But, the challenge is when we need to represent the two options on the same chart simultaneously.

Read on for an example of how to do this.
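
Outside of Power BI, the underlying selection is easy to sketch. The Python below is an illustration only, not Kenneth's DAX: the function name and the sales figures are made up, and it assumes `n` is a non-negative integer.

```python
def top_and_bottom(totals, n):
    """Return the top n and bottom n (name, value) pairs, highest value first.

    Items are never listed twice, so asking for more rows than exist simply
    returns everything once.
    """
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:n]
    # Slice from an explicit start index so that n == 0 yields an empty list.
    bottom = [kv for kv in ranked[max(len(ranked) - n, 0):] if kv not in top]
    return top + bottom


# Hypothetical total sales by product.
sales = {"A": 50, "B": 10, "C": 40, "D": 20, "E": 30}
print(top_and_bottom(sales, 2))
```

The harder part in a report is making N user-selectable and keeping both groups on one visual, which is what the linked article covers.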

Build a Custom Semantic Model for Microsoft Fabric

Published 2024-06-13 by Kevin Feasel

Reza Rad offers up some advice:

The Lakehouse or Warehouse comes with a default Power BI Semantic model, which can be used for reporting and analytics. However, you can also build and use a customized semantic model. There are significant differences when using the semantic model in real-world analytics projects. In this article, I’ll explain the difference between these two, which one is recommended, and why.

Click through for the video, as well as the article.

Three Partitioning Options in Postgres

Published 2024-06-13 by Kevin Feasel

Semab Tariq shows how to perform three types of partitioning in PostgreSQL:

PostgreSQL is renowned for its exceptional performance in managing data. One of its standout features is partitioning, a technique that divides large datasets into smaller, more manageable segments. Partitioning provides several benefits, including improved query performance, streamlined data management, and enhanced scalability. By organizing data into partitions, PostgreSQL can execute searches more efficiently and handle tasks with greater ease.

In this blog, we will delve into the details of partitioning in PostgreSQL, exploring its various types, advantages, and drawbacks. We’ll uncover how partitioning can revolutionize data management and decision-making processes in database environments.

Click through for demonstrations of range, list, and hash partitioning.
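
For a feel of how the three schemes differ, here is a small Python sketch of the routing decision each one makes. It is a conceptual illustration rather than anything from Semab's post: the function names, bounds, and region mapping are made up, and the CRC32 hash is a stand-in, since PostgreSQL uses its own hash functions internally.

```python
import bisect
import zlib


def range_partition(value, upper_bounds):
    """Index of the range partition holding `value`.

    `upper_bounds` is a sorted list of exclusive upper bounds, so partition 0
    covers values below upper_bounds[0], partition 1 covers
    [upper_bounds[0], upper_bounds[1]), and so on.
    """
    idx = bisect.bisect_right(upper_bounds, value)
    if idx == len(upper_bounds):
        raise ValueError(f"no partition accepts {value!r}")
    return idx


def list_partition(value, mapping):
    """Partition name for `value`, given an explicit value-to-partition map."""
    if value not in mapping:
        raise ValueError(f"no partition accepts {value!r}")
    return mapping[value]


def hash_partition(value, modulus):
    """Remainder-based partition number; CRC32 is only a stand-in hash."""
    return zlib.crc32(str(value).encode("utf-8")) % modulus


print(range_partition(15, [10, 20]))                       # middle range
print(list_partition("EU", {"EU": "p_eu", "US": "p_us"}))  # explicit list
print(hash_partition("order-42", 4))                       # one of 0..3
```

Range and list partitioning let you reason about which partition a row lands in, while hash partitioning trades that away for an even spread.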

Orphaned Users in SQL Server

Published 2024-06-12 by Kevin Feasel

David Seis puts the orphans to work:

In SQL Server, a user becomes ‘orphaned’ when it exists within a database but lacks an associated login at the server level. This typically occurs when a database is either moved or restored to a different SQL Server instance. To understand why, it’s important to note that while logins are created at the server level, users are created at the database level. Each login is linked to a unique Security Identifier (SID). Therefore, during the process of moving or restoring a database, the SIDs may not align correctly, resulting in orphaned users.

Read on for a script to find and fix orphaned users.
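
The SID mismatch described above can be sketched in a few lines of Python. This is a toy model and not David's script: the dictionaries stand in for the database's users and the server's logins, and the names and SID values are made up.

```python
def find_orphaned_users(database_users, server_logins):
    """Return database user names whose SID matches no server login.

    Both arguments map a principal name to its SID. Matching is by SID,
    not by name, which is why a restored database can have a user named
    the same as a login and still be orphaned.
    """
    login_sids = set(server_logins.values())
    return sorted(
        name for name, sid in database_users.items() if sid not in login_sids
    )


# Hypothetical state after restoring a database onto a different instance.
users_in_database = {"app_user": "0x01A", "report_user": "0x02B"}
logins_on_server = {"app_user": "0x9FF", "report_user": "0x02B"}

print(find_orphaned_users(users_in_database, logins_on_server))
```

Here `app_user` exists on both sides by name, but its SID differs, so it is reported as orphaned.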

New Video: The Naive Bayes Set of Algorithms

Published 2024-06-12 by Kevin Feasel

I have a new video:

In this video, I cover a class of algorithm that is neither particularly naive nor particularly Bayesian: Naive Bayes.

I am being a bit tongue-in-cheek with that description, as I’ll grant that the class of algorithms technically is “naive.” But I do still have some fun with the name and then show how we can use Naive Bayes to build a quick-and-dirty model that’s at least somewhat effective.
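
For readers who want the idea in code, here is a minimal from-scratch sketch in Python. It is not necessarily what the video uses: the token-list input format, the add-one smoothing, and the tiny training set are all assumptions for illustration. The "naive" part is treating every token as independent of the others given the label, which lets the per-token log probabilities simply be summed.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Count labels and per-label tokens from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log prior plus summed log likelihoods."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing keeps unseen tokens from zeroing out a label.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data.
training = [
    (["win", "cash", "now"], "spam"),
    (["cheap", "cash", "offer"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]
model = train(training)
print(predict(model, ["cash", "offer"]))
print(predict(model, ["lunch", "noon"]))
```

Four training rows is far too little for a real model, but it is enough to see the counting and scoring steps end to end.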

Microsoft Fabric: Lakehouse or Warehouse?

Published 2024-06-12 by Kevin Feasel

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience rather than warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.
