Press "Enter" to skip to content

Month: July 2018

Connecting GitHub To Azure Container Registry

Andrew Pruski automates the generation of SQL Server Docker images in Azure Container Registry, generating a new image with each GitHub repo check-in:

Fantastic, one build task created! How easy was that??

Let’s test by running:

az acr build-task run --registry TestContainerRegistry01 --name buildsqlimage

And the progress of the build task can be monitored:

az acr build-task logs --registry TestContainerRegistry01
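
For reference, the build task itself is created with az acr build-task create. Andrew's post has the exact command he uses; the line below is only a sketch, with placeholder values for the image name, GitHub repository, and personal access token:

az acr build-task create --registry TestContainerRegistry01 --name buildsqlimage --image <image-name>:latest --context https://github.com/<your-account>/<your-repo> --git-access-token <personal-access-token>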

Andrew gives us the step-by-step details, so check it out.

Parsing T-SQL Scripts With Pester

Rob Sewell shows us how to use Pester to ensure that a set of SQL scripts are valid T-SQL:

This is a quick Pester test I wrote to ensure that some SQL scripts in a directory would parse, so there was some guarantee that they were valid T-SQL. It uses SQLParser.dll, and because it was running on a build server without SQL Server installed, I had to load the required DLLs from the dbatools module (thank you, dbatools 🙂 ).

It simply runs through all of the .sql files, runs the parser against each one, and checks for errors. In the case of a failure, it will also show in the error message of the failed Pester result where the script failed to parse.

This particular example doesn’t ensure that the scripts do what you want them to do, but hey, Pester was built for that as well.

Data Engineering Remains As Important As Ever

Prashanth Southekal has good news for ETL developers:

While many companies have embarked on data analytics initiatives, only a few have been successful. Studies have shown that over 70% of data analytics programs fail to realize their full potential and over 80% of digital transformation initiatives fail. While there are many reasons that affect successful deployment of data analytics, one fundamental reason is a lack of good-quality data. However, many business enterprises realize this and invest considerable time and effort in data cleansing and remediation, technically known as data engineering. It is estimated that about 60 to 70% of the effort in data analytics goes to data engineering. Given that data quality is an essential requirement for analytics, here are five key reasons why data analytics is heavy on data engineering.

1. Different systems and technology mechanisms to integrate data.

Business systems are designed and implemented for a purpose, mainly recording business transactions. Business systems such as ERP capture batch/discrete data, while SCADA/IoT field systems capture continuous/time-series data. This means these systems store diverse data types, driven by the velocity, volume, and variety dimensions of the data. Hence the technology (including the database itself) used to capture data is varied and complex. And when you try to integrate data from these diverse systems from different vendors, the metadata models vary, resulting in data integration challenges.

That 60-70% on data engineering is probably a moderate underestimate.

Analyzing Clickstream Data With Spark

Tony Cruz and Denny Lee analyze advertising data in Spark, predicting whether an ad impression will result in a click given certain input features:

Let’s look at a concrete example with the Click-Through Rate Prediction dataset of ad impressions and clicks from the data science website Kaggle.  The goal of this workflow is to create a machine learning model that, given a new ad impression, predicts whether or not there will be a click.

To build our advanced analytics workflow, let’s focus on the three main steps:

  • ETL

  • Data Exploration, for example, using SQL

  • Advanced Analytics / Machine Learning

The Databricks blog has a couple other examples, but this was the most interesting one for me.
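
As a rough illustration (and not the code from the Databricks post), here is a minimal PySpark sketch of those three steps against the Kaggle file. The train.csv file name and columns such as click, banner_pos, site_category, and device_type come from that dataset; everything else is assumed:

from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("ctr-sketch").getOrCreate()

# 1. ETL: load the raw ad impression data.
impressions = spark.read.csv("train.csv", header=True, inferSchema=True)

# 2. Data exploration using SQL.
impressions.createOrReplaceTempView("impressions")
spark.sql("SELECT banner_pos, AVG(click) AS ctr FROM impressions GROUP BY banner_pos ORDER BY ctr DESC").show()

# 3. Machine learning: index and one-hot encode a few categorical features,
#    then fit a logistic regression on the click label.
categorical = ["banner_pos", "site_category", "device_type"]
stages = []
for col in categorical:
    stages.append(StringIndexer(inputCol=col, outputCol=col + "_idx", handleInvalid="keep"))
    stages.append(OneHotEncoder(inputCol=col + "_idx", outputCol=col + "_vec"))
stages.append(VectorAssembler(inputCols=[c + "_vec" for c in categorical], outputCol="features"))
stages.append(LogisticRegression(labelCol="click", featuresCol="features"))

train, test = impressions.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=stages).fit(train)
model.transform(test).select("click", "prediction", "probability").show(5)

The real dataset also has very high-cardinality columns like device_ip, which is where feature engineering gets expensive, so the sketch sticks to a few low-cardinality features.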

Thoughts On GraphQL In Existing Systems

Anshulee Asthana shares some thoughts after researching GraphQL implementation details:

The Existing System

  • SQL Server DB with SPs, functions, views, and queries. The SPs are non-modular, in the sense that they join various tables and return values specific to the calls being made.
  • Connection to the DB uses basic ADO.NET, not EF/LINQ, etc.
  • ASP.NET Web API 2.0 (not .NET Core) and Java-based APIs.
  • Native Android, iOS, and web clients. Some of the web clients bypass the API and talk to the DB directly.
  • Web clients: ASP.NET and React.
  • A React Native-based app.

System Maturity

  • The system has been in production for a few years now and is stable.
  • As the system is in a competitive space, new features are always being added on top of the usual maintenance.

Customer Ask

  • Whether it makes sense to wrap our existing APIs in a GraphQL endpoint.

  • For a new feature in the React app, evaluate building the new .NET-based APIs in GraphQL.

It’s nice that Anshulee shared this case study, especially because there’s relatively little involving GraphQL + .NET.
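
To make the “wrap the existing APIs” idea concrete: the system in the case study is .NET, but the pattern is the same in any stack. Here is a deliberately hypothetical Python sketch using the graphene library, where the GraphQL resolver simply delegates to the existing (stubbed-out) API instead of hitting the database; every name in it is made up:

import graphene

def get_order_from_existing_api(order_id):
    # Stand-in for a call to the existing REST API / stored procedure layer.
    return {"id": order_id, "status": "Shipped"}

class Order(graphene.ObjectType):
    id = graphene.ID()
    status = graphene.String()

class Query(graphene.ObjectType):
    order = graphene.Field(Order, id=graphene.ID(required=True))

    def resolve_order(self, info, id):
        # The GraphQL layer only reshapes what the existing API already returns.
        data = get_order_from_existing_api(id)
        return Order(id=data["id"], status=data["status"])

schema = graphene.Schema(query=Query)
result = schema.execute('{ order(id: "1") { id status } }')
print(result.data)

Whether that extra hop is worth it is exactly the question the customer is asking.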

So You’ve Got Database Corruption—Now What?

Jeff Mlakar walks us through troubleshooting an instance of database corruption:

Before we dive into fixing database corruption it is important to know what not to do. There are things we can do which will make a bad situation even worse.

  1. Don’t Panic – most sysadmins are used to having their cages rattled and can keep cool under duress. Jumping to action without a plan is not wise. Now is not the time to start trying things and performing thoughtless actions.

  2. Do NOT Detach the Database – we may never get it back again as it may be in a recovery pending state.

  3. Don’t Restart SQL Services – databases may never start up again, just like the above.

  4. Don’t Reboot the Machine – same as the two points above.

  5. Don’t Start by Trying to Repair the Corruption – root cause analysis is critical to preventative measures.

Jeff then walks us through things that he does to discern the root cause and correct the issue (if possible).

Building OData Sources In Biml

Cathrine Wilhelmsen gives us a workaround for the lack of native OData support in Biml:

As of July 2018, there is no built-in Biml support for OData. To work with OData in Biml, you have to create a custom source and connection manager. This requires more Biml code than built-in functions like OleDbSource and may look a little overwhelming at first. But don’t worry! You don’t have to start from scratch.

In this blog post, we will first look at the properties of the OData Connection Manager and how to script it in Biml. Then, we will do the same for the OData Source. Finally, we will tie it all together and create a complete SSIS package that you can use as a starting point for your own projects.

There’s a quick and easy solution but definitely read the whole thing to catch any oddities which might arise from reverse engineering your SSIS packages.

Power BI And Many-To-Many Joins

Teo Lachev notes a big improvement to Power BI’s data modeling story:

The July 2018 preview of Power BI Desktop delivers two killer preview features that solidify Power BI’s position as the best data modeling tool on the market. First, Microsoft relaxes the Power BI relationship limitations by letting you create M:M joins between two tables. Second, you can now create a composite (hybrid) data model with different storage modes, such as from a SQL Server database configured for DirectQuery and from an imported text file. Without reiterating the documentation, here are some important notes to keep in mind based on my pre-release testing.

But read the whole thing, which includes some limitations around hybrid data models.

Parsing Power BI Logs With Power BI

Kellyn Pot’vin-Gorman uses Power BI to analyze Power BI logs for issues:

Do I really want to go through all this data by hand? BI is a reporting tool, so what if I bring these logs into Power BI? Let’s start with the first MS Mashup Container log:

In Power BI, click on Get Data –> Text and change the file type to “All Files” in the explorer and go to the directory that contains the trace files:

C:\Users\<user>\AppData\Local\Microsoft\Power BI Desktop\Traces\Performance

Remember that you will need to have “hidden items” set to display in order to browse down to this folder. Choose the files you wish to load from the directory, and in Power BI choose a Custom delimiter of a quote (") to split the file. This will load a file with a few columns you’ll need to remove, as they contain data like colons, nulls, and other syntax from the file.

I like it.
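
If you would rather peek at what that quote-delimited split produces before pulling everything into Power BI, the same split can be prototyped in a few lines of Python. The folder path and file pattern below are placeholders; point them at the Traces\Performance directory from the post:

import glob

for trace_file in glob.glob(r"C:\temp\PowerBITraces\*.log"):
    with open(trace_file, encoding="utf-8", errors="replace") as f:
        for line in f:
            # Split on double quotes, the same custom delimiter used in the Power BI import.
            print(line.rstrip("\n").split('"'))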

CXCONSUMER Not Entirely Harmless

Erik Darling points out a case where CXCONSUMER waits are important:

In this sample there are absolutely no waits whatsoever on CXPACKET.

They are nonexistent.

If you were hoping to find out that they were way crazy out of control, call a priest and ditch your bar tab, we’re driving on the train tracks, you’ll be awfully disappointed.

There just aren’t any.

There’s only one core in use for nearly the entire duration, aside from some blips.

That’s disappointing.  I was hoping to be able to ignore this wait altogether.
