Press "Enter" to skip to content

Month: September 2022

Running Postman Tests in GitLab

Rahul Kumar automates Postman tests:

Hi folks, In this brief blog post, we’ll learn more about Gitlab CI and Postman, the API testing tool we use the most frequently. This article’s goal is to provide a quick process for automatically testing the service API response. The solution makes use of the capabilities provided by the Gitlab-integrated Continuous Integration tool.

Click through for the tutorial.

Comments closed

Creating a Parameterized Table in Tableau

Jordan Witcombe does some slicing and dicing:

Let’s say you’re planning to present a large table of information and you want the user to be able to filter based on the top or bottom X number of records. Due to the size of the dataset we would like the customer to have the ability to filter based on more than one column for their comparison. Making a large dataset much easier to digest.

Read on to see how you can accept user input in Tableau and simplify the viewable data.

Comments closed

PolyBase and Windows Authentication in SQL Server 2022

I have some bad news:

Back in the SQL Server 2019 days, I laid out a post which covered a really annoying bug around Windows authentication and PolyBase. The upshot is that SQL Server 2019 CU2 broke Windows authentication, meaning that you could not create external objects on a SQL Server instance while using a Windows authenticated session. In CU2, if you tried, you would get a Level 20 error which would kill your session.

Over the next several CUs, the SQL Server team improved things incrementally: first, the error that you got was a Level 16 error, so your statement would fail but the session would remain active. Then, in CU6, they made it so that if your SQL Server instance is on a server running on a domain, then you would be okay—you could create external objects while connected with a domain-authenticated Windows account.

I decided to give this a try in SQL Server 2022 RC0 and ran into some interesting results. 

Click through for those results.

Comments closed

Deploying SQL Server via AKS

Rajendra Gupta needs to deploy a SQL Server container:

This article uses Azure Kubernetes Service (AKS) to deploy and manage the Kubernetes cluster. It is a fully managed service that offers serverless Kubernetes with integrated CI/CD solutions, enterprise-grade security, and governance.

You can navigate to https://azure.microsoft.com/en-in/services/kubernetes-service/#overview and try Azure Kubernetes Service (AKS).

Read on for an overview of Azure Kubernetes Service and how you can get a SQL Server on Linux container running atop it.

Comments closed

Plotting an ASCII Bar Chart using SQL

Lukas Eder is speaking my language:

No need for expensive Tableau subscriptions. Ditch Microsoft Excel. Just use native PostgreSQL to quickly visualise your data!

Here’s an idea I had for a while. As you may know, jOOQ can produce fancy charts from your jOOQ results. But that requires you use jOOQ, and you may not be using jOOQ, because you’re not coding in Java/Kotlin/Scala (otherwise, you’d be using jOOQ). That’s OK. I thought, why not do it with SQL (PostgreSQL, to be specific) directly, then? 

This one’s getting my most coveted category: Wacky Ideas.

Comments closed

SSRS 2022 Updates

Cookie McCray shares some updates:

Back in 2020, we announced the deprecation of Report Server features Pin to Power BI, Mobile Reports, and Mobile Report Publisher. These features will be removed from versions of SQL Server starting with SQL Server 2022 and will no longer be supported. Only the last three releases, SQL Server 2016, SQL Server 2017, and SQL Server 2019, will be supported in maintenance mode until EOL (End of Life) for existing customers.

There are some interesting things in here but nothing that makes me say Reporting Services has a real future.

Comments closed

Working with List Data in R using rrapply

Joris Chau pivots, unpivots, and unnests data in R:

The previous post showcases the rrapply() function in the minimal rrapply-package as a revised and extended version of base rapply() in the context of nested list recursion in R. For quick data exploration of a nested list it can make sense to keep the list in its original nested format to reduce the number of processing steps and minimize code complexity. As part of a more elaborate data analysis, if there is no specific reason to keep the nested data structure, it is often more practical to transform the nested list into a more convenient rectangular format and work with the unnested object (e.g. a data.frame) instead. In this follow-up post, we review the available (how) options in rrapply() to unnest or melt nested lists into a rectangular format in more detail and highlight the similarities and differences with respect to several common alternatives in R.

There’s a lot going on in this demo-packed post. H/T R-Bloggers.

Comments closed

Creating Multiple Output Files per Spark Task

Dmitry Tolpeko has a quick but helpful post:

It is highly recommended that you try to evenly distribute the work among multiple tasks so every task produces a single output file and job is completed in parallel.

But sometimes it still may be useful when a task generates multiple output files with the limited number of records in each file […]

I had to cut it off right there to keep from spilling the beans here. Click through for Dmitry’s post to see what setting controls records per file, allowing you to keep opening those Spark output files in Excel.

Comments closed

The Importance of Data Shaping

Paul Turley shapes the youth of data:

Power BI is a new tool and dimensional modeling is an old idea. One of the challenges is that, like other modern self-service analytics products on the market, Power BI doesn’t force self-service data jockeys to transform their data before reporting with it. If you want to import a big, wide spreadsheet full of numbers and create charts in a Power BI report, knock yourself out. But, the solution won’t scale and you will inevitably run into walls when you try to make future enhancements. Similar problems arise from importing many tables from different sources and transactional systems. Several tables all chained together with creative mashups and relationships present their own set of problems. The first iteration of such an effort is usually a valuable discovery method and learning experience. Great… treat it as such; take notes, make note of the good parts and then throw it away and start over! In Fredrick Brooks’ “The Mythical Man Month“, he cites that for most engineering projects, the first six attempts should be abandoned before the team will be prepared to start over and complete the work successfully. He was a chemical engineer before working for IBM; and hopefully, our methods in the data engineering business are more effective then his 6-to-1 rule. But, this makes the case the prototypes and proof-of-concept projects are a critical part of the learning path.

The tools don’t make the rules.

Unless you’re talking about the lambda architecture, in which case that’s kind of accurate. But we’re not talking about that here.

Comments closed

PolyBase and Cosmos DB’s Core API

I have some fun integrating the Cosmos DB Core API with PolyBase:

PolyBase comes with a few built-in drivers, including Oracle, Teradata, MongoDB, and SQL Server. For everything else in the 2019 “style” of things, there is a generic ODBC route. In this route, you need to obtain a valid ODBC driver, configure it, and let PolyBase know how to access data from that remote source.

Cosmos DB’s Core API just happens to have a working ODBC driver, so the first step is to grab the relevant version of that driver and install it on the machine running SQL Server.

Read on to see how it works and how you can get around some initial pain points. As a quick note, this only works with SQL Server on Windows, as SQL Server on Linux does not support generic ODBC drivers with PolyBase.

Comments closed