Press "Enter" to skip to content

Author: Kevin Feasel

Power BI Aggregations

Reza Rad dives into aggregations in Power BI.  Part one introduces the topic:

Imagine a Fact table with 250 millions of rows. Such a fact table is big enough to be considered as a good candidate for DirectQuery connection. You don’t want to load such a big table into the memory, and most probably, the Power BI file size exceeds the 1GB limitation. Now, think about your reporting solution for a second. Do you always query this fact table at the finest or minimum granular level? I mean do you always look at every single transaction in this table when you do report on it?

The answer is No. In most of the times, you are querying the data by other fields or columns. As an example; you query the Sales value in the fact table, by Year. Some other times, you query the fact table’s values by Customer’s education category. Some other times, you query the values in the fact table, by each product. When you look at real-world scenarios, most of the time, you are querying the fact table by aggregations of dimension tables.

Then, Reza starts building an aggregation table:

Aggregation tables are the fast performing solution for huge DirectQuery tables in Power BI. In the previous blog post, I explained what is an aggregation, and why it is an important part of a Power BI implementation. Aggregations are part of the Composite model in the Power BI. For the aggregation set up, your first step is to create an aggregated table. In this blog post, I’ll explain how that step can be done. If you want to learn more about Power BI, read the Power BI book, from Rookie to Rock Star.

There’s a lot of detail packed into the first posts in this series, so it looks like a good one to watch.

Comments closed

Preventing Issues With Columnstore Indexes

Kevin Chant has a post covering issues in his experience with the creation of columnstore indexes:

Error due to online statement

It’s a very common error caused usually by somebody copying a rowstore “Create Index” command. The “online=on” option you use with rowstore indexes does not work with creating columnstore indexes yet. For the record the online option will work in SQL Server 2019.

This one I’ve run into, as I like clustered columnstore indexes a lot but occasionally need single-row results from them.  If the table is empty, creating an index offline is no problem.  But once you get a billion or so rows in it, that’s a non-starter in a 24×7 uptime shop.  Read the whole thing.

Comments closed

Azure Price Increases In CAD

Randolph West notes that Azure services priced in Canadian dollars will increase by five percent:

Starting December 1, 2018, prices for Azure services in the Canadian dollar will increase by 5 percent to more closely align to Azure pricing in US dollars. Even after this adjustment, customers buying in the Canadian dollar will continue to find Azure offerings highly competitive.

Microsoft periodically assesses its pricing of products and services across the globe to ensure reasonable alignment across regions. This change to Azure prices is an outcome of this assessment.

The first thing that came to mind was a particular joke from the Simpsons.  For those who don’t remember, the Simpsons was a hilarious cartoon for about ten years before it was quietly killed and replaced with something almost but not quite the same, lacking most of the humor.

Comments closed

Do You Have Trace Flag 4199 Enabled?

Andy Galbraith recommends that you enable trace flag 4199 in SQL Server:

The session was titled “Modernize Your SQL Server with Bob Ward, the Tiger Team, and CSS Escalation Engineers” and it…was…awesome!
One of the presenters was Pedro Lopes (blog/@SQLPedro), a Senior PM for the Relational Engine.  In his part of the day he talked about several features of the engine and the optimizer, but the “What…what did he say?” moment for me was when he talked about trace flag 4199…and how we should have it turned on pretty much everywhere.
Wait…what?

If you aren’t aware of trace flag 4199 and are running an edition of SQL Server prior to 2016, this is big.  One of our user group members called it out specifically at our last meeting.  As far as 2016+ instances go, Andy covers how that behavior is a little different, so check it out.

Comments closed

An Overview Of Today’s Data And Event Ecosystem

John Hammink walks us through a laundry list of tools in the modern data platform:

InfluxDB

The rapid instrumentation of the physical world due to IoT and data-collecting applications has led to an explosion of time-stamped data. Time series databases serve this evolving niche, and among them, InfluxDB is emerging as a major player. InfluxDB, like others, can handle complex logic or business rules atop massive — and fast-growing — data sets, and InfluxDB adds the advantage of a range of ingestion methods, as well as the ability to append tags to different data points. Aiven also provides a managed version, Aiven InfluxDB.

Even with this list, there are still plenty more, including some of my favorites. H/T DZone

Comments closed

Apache Samza At 1.0

Jagadish Venkatraman announces Apache Samza 1.0:

We are pleased to announce today the release of Samza 1.0, a significant milestone in the history of the project. Apache Samza is a distributed stream processing framework that we developed at LinkedIn in 2013. Samza became a top-level Apache project in 2014. Fast-forward to 2018, and we currently have over 3,000 applications in production leveraging Samza at LinkedIn. The use-cases include detecting anomalies, combating fraud, monitoring performance, notifications, real-time analytics, and many more. Today, Samza integrates not only with Apache Kafka, but also with many other systems, including Azure EventHubsAmazon Kinesis, HDFS, ElasticSearch, and Brooklin. Multiple companies like Slack, TripAdvisor, eBay, and Optimizely have adopted Samza.

We view Samza 1.0 as a step towards our vision of making stream processing universally accessible. In this post, we describe our journey in building and scaling a distributed stream processing system. We also present the key features in Samza 1.0: a rich high-level API, event-time-based processing, integration with Apache Beam, Samza SQL, a standalone mode to run Samza without YARN, and a new test framework for Samza applications.

This runs in the same space as Spark Streaming, Flink, and Kafka Streams, so there are plenty of competitors and a lot of innovation.

Comments closed

Azure SQL Database Supports R Integration

David Smith notes that Azure SQL Database now has (in preview) support for R:

Azure SQL Database, the database-as-a-service based on Microsoft SQL Server, now offers R integration. (The service is currently in preview; details on how to sign up for the preview are provided in that link.) While you’ve been able to run R in SQL Server in the cloud since the release of SQL Server 2016 by running a virtual machine, Azure SQL Database is a fully-managed instance that doesn’t require you to set up and maintain the underlying infrastructure. You just choose the size and scale of the database you want to manage, and then connect to it like any other SQL Server instance. (If you want to learn how to set up an Azure SQL database, this Microsoft Learn module is a good place to start.)

Python and Java are not yet supported, but I’d imagine that they’ll be on the way too.

Comments closed

Kafka And Handling Missing Topics

The folks at Redglue show what happens when you send a message to a Kafka broker on a non-existent topic:

Now let’s produce messages to a non-existent topic called redglue_nonexistent:

root@kafka1:~# kafka-console-producer --broker-list 127.0.0.1:9092 --topic redglue_nonexistent
I maybe don't exists
[2018-11-28 14:22:12,454] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {redglue_nonexistent=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

Obvious there a WARNING saying that the topic doesn’t exists, but it allows you to “send” messages to that specific topic

Read on to see what happens.

Comments closed

Connection Failed With Error 772

Jack Vamvas investigates an error when trying to connect to SQL Server 2016 on Windows Server 2016:

Question: I’ve upgraded an application with a built – in Database API . When attempting to establish a SQL Server database connection this error appears – Connection failed – SQL Server Error 772 – TCPIP Socket

Upon investigation the application was using  the native drivers attempting to connect to a SQL Server 2016 \ Windows 2016

As part of the testing I downloaded the ODBC 13.1 SQL Server drivers – independent of the application and tested a DSN connection to the same SQL Server – and it connected OK. I then created a DSN with native drivers and the error reappeared.

What is going on ? How can I fix this issue?

Read on for the solution and keep those drivers up to date.

Comments closed

Implementing A Change Tracking Solution In SQL Server

Jon Shaulis shows us how we can use Change Tracking to detect when rows get modified:

This allows you to detect changes in a lightweight manner via the Transaction Log in SQL Server in combination with T-SQL. Change Data Capture is more about auditing or creating a historical view and Temporal Tables are the next step up from there which became available in 2016 versions of SQL Server. Change Tracking is primarily used for finding only things that have changed. Not necessarily why, how, or who changed it, but what has changed and what it is now.

So why would you want this technology implemented? I find this technology is best suited for tasks where I want as light of a footprint as possible and I want to bring over incremental changes.

Click through for a long and complete walkthrough.  If you’re thinking to implement change tracking, this is a good link to check out.

Comments closed