Press "Enter" to skip to content

Author: Kevin Feasel

Configuring Memory Limits for SQL Server in Kubernetes

Anthony Nocentino doesn’t have all the RAM in the world:

With that Pod deployed, I loaded up a HammerDB TPC-C test with about 10GB of data and drove a workload against our SQL Server. Then while monitoring the workload…boom HammerDB throws connection errors and crashes. Let’s look at why.

First things first, let’s check the Pods’ status with kubectl get pods. Well, that’s interesting: I have 13 Pods. 1 has a Status of Running and the remainder are Evicted.

Anthony does a great job of explaining the problem and showing you the solution.
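
If you want to script the same check Anthony does with kubectl get pods, here is a minimal sketch using the official kubernetes Python client. The namespace is an assumption for illustration; it is not taken from Anthony's post.

```python
# Minimal sketch: list Pods and flag the ones Kubernetes has evicted,
# roughly what "kubectl get pods" shows in Anthony's demo.
# Assumes a local kubeconfig and the "default" namespace.
from kubernetes import client, config

config.load_kube_config()          # use the current kubectl context
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="default").items:
    phase = pod.status.phase                 # Running, Failed, ...
    reason = pod.status.reason or ""         # "Evicted" for memory-pressure kills
    print(f"{pod.metadata.name:40} {phase:10} {reason}")
    if reason == "Evicted":
        # The status message typically names the resource that triggered eviction.
        print(f"  -> {pod.status.message}")
```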

Comments closed

Documenting SSIS Catalogs

Dave Mason continues a series on documenting Integration Services:

In the last post, we looked at query options for documenting SSIS packages that are deployed via the legacy package deployment model. What about SSIS packages deployed to a catalog via the project deployment model? Is the package XML accessible so that we can query and shred it? I don’t know with certainty, but I think it is not.

However, there are .NET Framework interfaces we can use to programmatically obtain SSIS catalog metadata. So for this post, I’ll soldier on without my dear friend T-SQL and proceed with PowerShell. Here’s a look at the projects and packages I was working with as I developed and debugged the script code. It’s a short list with just two projects, having one package each. The “DailyETLMain.dtsx” package is a sample from Microsoft’s GitHub repository.

Click through for Dave’s explanation and a link to the script.
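
Dave's script goes the .NET/PowerShell route. As a point of comparison, the folder, project, and package names (though not the package XML) are also exposed through the SSISDB catalog views, so a quick inventory can be pulled with a query. Here is a rough sketch with pyodbc; the server name and ODBC driver are placeholders, and this is not Dave's script.

```python
# Rough sketch (not Dave's approach): list SSIS catalog folders, projects, and
# packages from the SSISDB catalog views. Server name and driver are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=SSISDB;Trusted_Connection=yes;"
)

query = """
SELECT f.name AS folder_name,
       pr.name AS project_name,
       pk.name AS package_name
FROM   catalog.folders  AS f
JOIN   catalog.projects AS pr ON pr.folder_id  = f.folder_id
JOIN   catalog.packages AS pk ON pk.project_id = pr.project_id
ORDER BY folder_name, project_name, package_name;
"""

for folder, project, package in conn.execute(query):
    print(f"{folder} / {project} / {package}")
```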

Comments closed

Variable Header Counts and Azure Data Factory

Mark Kromer shows how you can convince Azure Data Factory to skip a variable number of lines before beginning processing:

A common requirement that I see from customers who are processing text files in data lakes with Azure Data Factory is to read and process files where there are variable numbers of lines that precede both the data headers and the data that needs to be processed. ADF already has facilities to switch headers off or on as well as the ability to specify parameterized skip line counts. However, in many cases, files that are received for processing have variable numbers of superfluous lines that need to be skipped.

In ADF, between pipeline activities and data flows, there are a number of ways to handle this scenario. In this post, I am going to demonstrate one such technique. 

Read on to see which technique Mark chose and how to get it working.
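
Outside of ADF, the general shape of the problem looks like this: scan for the real header row, then skip everything above it. The following is a small pandas sketch, not Mark's ADF technique; it assumes the header can be recognized by a known first column name, which is purely illustrative.

```python
# Generic illustration of "variable number of lines before the header".
# Assumes the real header row starts with a known column name ("OrderID" here)
# and everything above it is noise to be skipped.
import io
import pandas as pd

raw = """generated by export tool v2
run date: 2019-08-01
OrderID,Customer,Amount
1,Contoso,100.00
2,Fabrikam,250.50
"""

lines = raw.splitlines()
header_index = next(i for i, line in enumerate(lines) if line.startswith("OrderID,"))

df = pd.read_csv(io.StringIO(raw), skiprows=header_index)
print(df)
```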

Comments closed

Storytelling with Power BI

Marc Lelijveld wraps up a series on storytelling with Power BI:

Let them ask questions
As a report author, you start building your reports based on the information needs and business requirements you collected before your project. However, every answer to a question triggers a new question. In the end you end up with more questions to answer than you thought about up front. Maybe even with scope creep in agile projects.

However, it is very unlikely that you answer all the business information needs in your dashboard or report within one iteration. So why not give them the ability to explore and interact with the report and ask questions of their dataset in natural language? Power BI has the ability to ask questions of your data in natural language in just a few clicks.

This is probably one of the most underutilized aspects of Power BI.

Comments closed

HBase and S3

Krishna Maheshwari, et al, explain how we can allow Apache HBase to use S3 for storage:

Cloudera Data Platform (CDP) provides an out-of-the-box solution that allows Apache HBase deployments to use Amazon Simple Storage Service (S3) as its main persistence layer for saving table data. Amazon S3 is an object store which offers a high degree of durability with a pay-per-use cost structure. There is no server-side component to run or manage for S3; all that is needed is the S3 client library and AWS credentials. However, HBase requires a consistent and atomic filesystem, which means that it cannot directly use S3 because it is an eventually consistent object store. Both CDH and HDP have only provided HBase using HDFS because there have been long-standing impediments that prevented HBase from natively using S3. To address these issues, we’ve built an out-of-the-box solution which we are delivering for the first time via CDP. When you launch an Operational Database (HBase) cluster on CDP, HBase StoreFiles (the backing files for HBase tables) are stored in S3 and HBase write-ahead logs (WAL) are stored in an HDFS instance run alongside HBase, as usual.

I hadn’t thought of using S3, but it’s an interesting post.

Comments closed

Data Warehouse Concepts

Katrine Spirina takes us through several basic data warehousing concepts:

There are three basic types of modeling.

The Conceptual Data Model describes all entities a business needs information about. It provides facts about real-world things, customers, and other business-related objects and relations. The goal of creating this data model is to synthesize and store all the data needed to gain an understanding of the whole business. The model is designed for the business audience.

The Logical Data Model covers more in-depth data. It describes the structure of data elements, their attributes, and the ways these elements interrelate. For instance, the model can be used to identify relationships between customers and the products of interest to them. This model is characterized by a high level of clarity and accuracy.

The Physical Data Model describes the specific data and relationships needed for a particular case, as well as the way the data model is used in database implementation. It provides a wealth of metadata and facilitates visualizing the structure of a database. Metadata can involve accesses, limitations, indexes, and other features.

Click through for the whole story.

Comments closed

TensorFlow Changes

Ajit Jaokar summarizes key changes from TensorFlow from 1.x to 2.0:

The data pipeline, simplified: TensorFlow 2.0 has a separate module, TensorFlow Datasets, that can be used to operate with the model in a more elegant way. Not only does it have a large range of existing datasets, making your job of experimenting with a new architecture easier – it also has a well-defined way to add your own data to it.

In TensorFlow 1.x, to build a model we would first need to declare placeholders. These were the dummy variables which would later (in the session) be used to feed data to the model. There were many built-in APIs for building the layers, like tf.contrib, tf.layers, and tf.keras; one could also build layers by defining the actual mathematical operations.

In TensorFlow 2.0, you can build your model by defining your own mathematical operations; as before, you can use the math module (tf.math) and the linear algebra module (tf.linalg). However, you can also take advantage of the high-level Keras API and the tf.layers module. The important part is that we do not need to define placeholders any more.

These look like some nice improvements.
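
To make the placeholder point concrete, here is a minimal sketch of the difference, not taken from Ajit's article. The layer sizes and the random training data are arbitrary.

```python
# Minimal sketch of the change described above: no placeholders or sessions in
# TensorFlow 2.0, just tf.keras layers fed with data directly.
import numpy as np
import tensorflow as tf

# TensorFlow 1.x style (for contrast) required something like:
#   x = tf.placeholder(tf.float32, shape=[None, 4])
#   ... build the graph ...
#   with tf.Session() as sess:
#       sess.run(output, feed_dict={x: data})

# TensorFlow 2.0 style: define the model and call it eagerly.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(X, y, epochs=2, verbose=0)
print(model.predict(X[:3]))
```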

Comments closed

Get Lock Details Against a Database

David Fowler has a new procedure:

Have you ever wanted a quick and easy way to see who was holding (and waiting on) locks on a particular database? Perhaps you’ve got some blocking issues going on and you want to see exactly which rows the row level locks were taken out on?

sp_LockDetails will return some handy information about the locks held on a specific database, including SPID, login, database, lock type and resource.

David includes the script in the post as well.
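
If you just want a quick look before grabbing David's procedure, the raw ingredients are sys.dm_tran_locks joined to sys.dm_exec_sessions. Here is a bare-bones sketch via pyodbc, not a substitute for sp_LockDetails; the connection string and database name are placeholders.

```python
# Bare-bones sketch (not David's sp_LockDetails): show who holds or waits on
# locks in a given database. Connection string and database name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=master;Trusted_Connection=yes;"
)

query = """
SELECT dl.request_session_id AS spid,
       es.login_name,
       DB_NAME(dl.resource_database_id) AS database_name,
       dl.resource_type,
       dl.resource_description,
       dl.request_mode,
       dl.request_status
FROM   sys.dm_tran_locks    AS dl
JOIN   sys.dm_exec_sessions AS es ON es.session_id = dl.request_session_id
WHERE  DB_NAME(dl.resource_database_id) = ?
ORDER BY dl.request_session_id;
"""

for row in conn.execute(query, "StackOverflow"):
    print(row)
```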

Comments closed

Azure Data Factory Pipeline Hierarchies

Paul Andrew explains the idea of pipeline hierarchies with respect to Azure Data Factory:

Next, even if the concept isn’t new, I’d like to call out two big differences in my approach to orchestration with ADF that come from working within Microsoft Azure. The highly scalable cloud platform presents some new challenges that SSIS simply didn’t. For me these are:

– Needing to consider our wider solution and what things now cost. I’m fairly sure I’ve said it before: when working with ‘pay-as-you-go’ services we need to think about designing for cost/consumption as well as all our other data transformation and output requirements. In Azure it is so easy to just leave resources running night and day when only a short window of compute is needed.
– We need to consider the scale-out capabilities of the other services that ADF is going to invoke. Or, to put it another way, how much parallel activity execution do we want ADF to achieve? As you may know, the ADF ForEach activity by default allows us to execute inner activities in parallel, but is that enough?

It’s a very interesting idea; read the whole thing.

Comments closed

Listing Windows Users with PowerShell

Jack Vamvas shows us how we can use PowerShell to list Windows users in an Active Directory group:

Question: How can I use PowerShell to list out Windows users? Are there PowerShell cmdlets which can report on Windows users?
Answer: There are “out of the box” PowerShell cmdlets which will support the requirement. How you apply the PowerShell cmdlets will depend on how much detail is required.

Jack has a few examples here as well.
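
Jack's examples are pure PowerShell. If you happen to be orchestrating from Python, one low-tech option is to shell out to those same cmdlets; the sketch below uses Get-LocalGroupMember against the local Administrators group purely as an example, and for a domain group you would swap in Get-ADGroupMember.

```python
# Sketch: call "out of the box" PowerShell cmdlets from Python by shelling out.
# Uses Get-LocalGroupMember on the local Administrators group as an example;
# for an AD group you would use Get-ADGroupMember instead.
import json
import subprocess

ps_command = (
    "Get-LocalGroupMember -Group 'Administrators' | "
    "Select-Object Name, ObjectClass, PrincipalSource | ConvertTo-Json"
)

result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command", ps_command],
    capture_output=True, text=True, check=True,
)

members = json.loads(result.stdout)
# ConvertTo-Json returns a single object (not a list) when there is one member.
if isinstance(members, dict):
    members = [members]

for member in members:
    print(member["Name"], member["ObjectClass"])
```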

Comments closed