Press "Enter" to skip to content

Month: July 2021

Two Ways to Access Kafka Topics from R

Patrick Neff shows us a couple of ways to build a Kafka-to-R pipeline:

In Data Science projects, we distinguish between descriptive analytics and statistical models running in production. Overall, these can be seen as one process. You start with analyzing historical data to gain insights, find correlations, and finally develop and optimize your model. Then you transfer it and use it in your running system. A key point for every data scientist is not just the mathematical skills themselves, but also how to get the data into your analytics program.

In this blog post, we focus exactly on this crucial step: retrieving the data. In a second article, we’ll talk about running your model on real-time data.

Click through for the techniques.

Comments closed

String Formatting in Powershell

Robert Cain continues a series on fun with Powershell and strings:

Specifically, this can control the output when we embed a numeric value inside a string. Passing in special formatting instructions will make it easy to display values with commas, as currency, or even as hexidecimal.

For all of the examples, we’ll display the code, then under it the result of our code. In this article I’ll be using PowerShell Core, 7.1.3, and VSCode. The examples should work in PowerShell 5.1 in the PowerShell IDE, although they’ve not been tested there.

Robert has quite a few examples, so check them out.

Comments closed

The Benefits of Kubernetes for App Hosting

Joy George Kunjikkur enumerates reasons why you might want to use Kubernetes to host applications:

I started writing this post 2-3 years back. Mainly when Apache Spark 2.3 started supporting Kubernetes (K8s) in 2018. It was obvious that Kubernetes is taking over app hosting space the same way virtual machines took over physical machines. All are expected to understand where the industry is moving and adopt. Hence I paused this post as there is nothing I need to endorse. But it’s time to resume this post and publish it.

Click through for a slew of thoughts on the topic.

Comments closed

A Primer on Azure Kubernetes Service

Arun Sirpal gives us a brief introduction of Azure Kubernetes Service:

You have the ability to run these on-premises (complex) or in a cloud service, like AWS or Azure. Hence AKS – Azure Kubernetes Service which helps reduce the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Microsoft. You may be wondering how does containers relate to this? It was something on my mind when I first entered into this technology. Remember that containers is the next step beyond traditional virtualisation, you can run SQL Server Linux in containers, as an example. I then look at AKS as the “management” layer of the container solution, carrying out tasks such as scheduling, scaling, health, load balancing and host management.

Click through for more information.

Comments closed

How Dynamic Data Masking Interacts with Bulk Copy (BCP)

Kenneth Fisher puts on a lab coat:

Hypothesis: If I have Dynamic Data Masking enabled on a column then when I use something like BCP to pull the data out it should still be masked.

I’m almost completely certain this will be the case but I had someone tell me they thought it would go differently, and since neither of us had actually tried this out it seemed like time for a simple experiment.

Click through for the experiment and its results.

Comments closed

Contrasting Scala and Python wrt Spark

Sanjay Rathore contrasts two of the three key Apache Spark languages:

Imagine the first day of a new Apache Spark project. The project manager looks at the team and says: which one to choose, scala or python. So let’s start with “scala vs python for spark”. 

You may wonder if this is a tricky question. What does the enterprise demand say? Is this like asking iOS or Android? Is there a right or wrong answer?

So we are here to inform and provide clarity. Today we’re looking at two popular programming languages, Scala and Python, and comparing them in the context of Apache Spark and Big Data in general.

Read on for the comparison. I’m at a point where I think it’s wise to know both languages and roll with whichever is there. If you’re in a greenfield Spark implementation, pick the one you (or your team) is more comfortable with. If you’re equally comfortable with the two, pick Scala because it’s a functional programming language and those are neat.

Comments closed

Conditional Visibility in Power BI Paginated Reports

Sabrina Jordan has a clever solution to a common customer request:

How many times have you struggled to find the happy medium between a report that looks phenomenal when printed, but has everything a user might need in an Excel export? I recently built a beautiful paginated report with groupings separated by white space for easy readability – but the user wanted to export the results to Excel, and the format prevented them from sorting or filtering the report contents. Power BI Report Builder has a couple features that can allow you the best of both worlds, with a few simple tricks. By the end of this tutorial, you will have created two Tablix, set up conditional visibility based on report render format, and set conditional sizing on the Excel Tablix (using hidden charts!) to prevent blank pages.

Click through for the solution.

Comments closed

Customized Per-User Default Values in Power BI

Marco Russo and Alberto Ferrari lay out a challenge:

Tabular offers the built-in feature of hiding rows of data from specific users. For example, you can create a set of security rules to let a store manager see only the sales of their store. This works fine if your goal is to secure data, which means preventing access to data that a user is not expected to see.

Another common requirement is to be able to select by default, for a store manager, their sales. With that said, store managers can see the data of other stores, but they need to explicitly request it. In other words: by default, the store manager sees the sales of their store only. By using a slicer, they can choose a different combination of stores.

Read on for the solution, but be sure to read the warnings Marco and Alberto include near the end of the article.

Comments closed

Power BI Anti-Patterns

Paul Turley begins a new series:

It occurred to me that we have put so much effort into promoting best practices and proper design that there is far less information about how to create bad reports and data models. In that light, the purpose of this article is to talk about what to do if you want things to go poorly and make sure your projects fail – if not immediately, then sometime in the future – and if not for you then for whoever inherits the work that you have done.

I welcome your comments and thoughts about the worst practices you have seen in Power BI. Post your ideas in the comments below. What examples have you encountered of sure-fire ways to make projects go badly?

Paul has six starter anti-patterns; click through for the set.

Comments closed

Log Shipping Configuration with Powershell

Lee Markum needs to do log shipping in bulk:

I recently needed to configure log shipping for multiple databases at once as part of a migration project. I turned to PowerShell to do this.

But before we get to that part, this post assumes that you’ve done the upfront work to create shares for the backups to write to and for the backups to be copied to. This will involve providing the right permissions for the SQL Server service accounts involved in Log Shipping. If you are not familiar with this, that’s perfectly fine. Check out this article in MS Docs first.

Click through for the next steps in the process, including a well-timed Get-Help call.

Comments closed