
Curated SQL Posts

The User-Assigned Managed Identity in ADF

Asanka Padmakumara takes a look at the user-assigned managed identity:

If you are familiar with Managed Identity concepts in ADF, you know that each ADF instance comes with its own System Assigned Managed Identity (MI). We can use that MI to control ADF’s access to any data sources which support Azure AD-based authentication. This is considered the most secure and recommended way of authenticating ADF with cloud systems. If not, you can use Azure Key Vault to store credentials. Let’s take an example to discuss how a User Assigned Managed Identity helps manage access across multiple ADF environments.

Click through to see how the user assigned managed identity makes life better.
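As a rough illustration from the client side (my sketch, not part of Asanka’s post), here is how you would pick a specific user-assigned identity when requesting a token with the azure-identity Python library. The client ID and scope are placeholders, and the code only works when run from an Azure resource that has the identity attached.

```python
# Sketch: system-assigned vs. user-assigned managed identities with azure-identity.
# Must run on an Azure resource that has the identity attached; the client ID
# below is a placeholder.
from azure.identity import ManagedIdentityCredential

# System-assigned: the identity belongs to the single hosting resource.
system_mi = ManagedIdentityCredential()

# User-assigned: the same identity (and therefore the same grants) can be
# attached to multiple resources, e.g. dev, test, and prod ADF instances.
user_mi = ManagedIdentityCredential(client_id="00000000-0000-0000-0000-000000000000")

# Request a token for an Azure AD-protected resource such as Azure SQL Database.
token = user_mi.get_token("https://database.windows.net/.default")
print(token.expires_on)
```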


SQL Server Backend for Django

Warren Chu announces a new version of the SQL Server 3rd Party Backend for Django:

We have released version 1.1 of the SQL Server 3rd Party Backend for Django. This release contains support for the upcoming release of Django 4.0, as well as a number of issue fixes.

Our plan is to time releases to coincide with major releases of Django and SQL Server, to ensure users of this project can keep up to date with Django while continuing to use SQL Server as a backend.

Read on to see what this entails.
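If you haven’t wired the backend up before, it is a normal DATABASES entry in settings.py. Here is a minimal sketch; the server, database, and credentials are placeholders, and the driver line assumes the Microsoft ODBC driver is installed.

```python
# settings.py (sketch): pointing Django at SQL Server via the mssql-django backend.
# Install the backend first: pip install mssql-django
DATABASES = {
    "default": {
        "ENGINE": "mssql",
        "NAME": "my_django_db",          # placeholder database name
        "HOST": "sqlserver.example.com", # placeholder server
        "USER": "django_app",            # placeholder login
        "PASSWORD": "********",
        "OPTIONS": {
            "driver": "ODBC Driver 18 for SQL Server",
        },
    }
}
```

From there, the usual Django workflow (migrate, ORM queries, and so on) applies as normal.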


Choosing a Statistical Test

Antoine Soetewey has a handy chart for us:

Being a teaching assistant in statistics for students with diverse backgrounds, I have the chance to see what is globally not well understood by students.

I have realized that it is usually not a problem for students to do a specific statistical test when they are told which one to use (as long as they have good resources and they have been attentive during classes, of course). However, it appears that the task is much more difficult for them when they need to choose what test to do.

Click through for the chart, as well as a PDF version. H/T R-Bloggers.
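As a tiny, simplified illustration of the sort of branch that chart encodes (my sketch, not from Antoine’s post): comparing two independent groups, with a normality check steering you toward a t-test or its non-parametric counterpart.

```python
# Simplified sketch of one branch of "which test do I run?":
# compare two independent samples, checking normality first.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=11.0, scale=2.0, size=40)

# Shapiro-Wilk as a rough normality check on each group.
looks_normal = all(stats.shapiro(g).pvalue > 0.05 for g in (group_a, group_b))

if looks_normal:
    # Parametric route: independent two-sample t-test.
    result = stats.ttest_ind(group_a, group_b)
else:
    # Non-parametric fallback: Mann-Whitney U test.
    result = stats.mannwhitneyu(group_a, group_b)

print(result)
```

The real chart covers far more situations than this one branch, which is why it’s worth keeping around.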


Installing Apache Spark

Tomaz Kastrun continues a series on Apache Spark:

Installing Apache Spark on a Windows computer requires a preinstalled Java JDK (Java Development Kit), version 8 or later; the current version is 17. Download Java from the Oracle website and install it on your system. The easiest way is to download the x64 MSI installer. Run the file and follow the instructions. The installer will create a folder like “C:\Program Files\Java\jdk-17.0.1”.

Read on for instructions for both Windows and macOS. You can also create a container running Spark, which is another helpful method.
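Once Java and Spark are in place (and JAVA_HOME and SPARK_HOME are set), a quick PySpark smoke test will tell you whether the installation actually works. A minimal sketch:

```python
# Minimal smoke test for a local Spark installation using PySpark.
# Assumes JAVA_HOME and SPARK_HOME are already configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")            # run locally on all available cores
    .appName("install-smoke-test")
    .getOrCreate()
)

# If this prints 100, Spark and the Java runtime are wired up correctly.
print(spark.range(100).count())

spark.stop()
```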


Locking Issue with Columnstore Indexes

Joe Obbish troubleshoots an issue on tables with columnstore indexes:

I recently ran into a production issue where a SELECT query that referenced a NOLOCK-hinted table was hitting a 30 second query timeout. Query store wait stats suggested that the issue was blocking on a table with a nonclustered columnstore index (NCCI). This was quite unexpected to me and I was eventually able to produce a reproduction of the issue. I believe this to be a bug in SQL Server that’s present in both RTM and the current CU as of this blog post (CU14). The issue impacts CCIs as well, but I did significantly less testing with that index type.

Read on for the issue, how you can replicate it, and a couple ways to work around it.


Why the Optimizer Doesn’t Look at Buffer Pool Data

Paul Randal has an explanation for us:

SQL Server has a cost-based optimizer that uses knowledge about the various tables involved in a query to produce what it decides is the most optimal plan in the time available to it during compilation. This knowledge includes whatever indexes exist and their sizes and whatever column statistics exist. Part of what goes into finding the optimal query plan is trying to minimize the number of physical reads needed during plan execution.

One thing I’ve been asked a few times is why the optimizer doesn’t consider what’s in the SQL Server buffer pool when compiling a query plan, as surely that could make a query execute faster. In this post, I’ll explain why.

This is an interesting post because it explains why the developers of the database engine would purposefully ignore something that could make things faster, but at a potentially devastating cost.


AD Authentication with SQL Server on Linux

Anthony Nocentino will have none of your SQL authentication:

In this post, we’re going to walk through configuring Active Directory authentication for SQL Server on Linux. We will start by joining the Linux server to the domain, configuring SQL Server on Linux to communicate to the domain, and then use adutil to create our AD users and set up Kerberos for SQL Server login authentication.

This does take a bit more effort than using Windows authentication, but if you want to use SQL Server on Linux, I’d consider it a worthwhile investment of time.
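Once Kerberos is configured, client connections stop needing SQL logins at all. As a hedged sketch of what that looks like from a Linux client (assuming a valid ticket from kinit and the Microsoft ODBC driver installed; the server name is a placeholder), here is an integrated-auth connection via pyodbc.

```python
# Sketch: Kerberos/AD-authenticated connection to SQL Server from a Linux client.
# Assumes `kinit user@CONTOSO.COM` has already been run and the Microsoft
# ODBC Driver 18 for SQL Server is installed; the server name is a placeholder.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=sql01.contoso.com;"
    "DATABASE=master;"
    "Trusted_Connection=yes;"      # use the Kerberos ticket instead of a SQL login
    "TrustServerCertificate=yes;"  # lab setting; use a trusted certificate in production
)

# Should print the AD principal the connection authenticated as.
print(conn.execute("SELECT SUSER_SNAME();").fetchone()[0])
conn.close()
```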


ElasticMapReduce Serverless

Damon Cortesi, et al, announce serverless EMR is now in preview:

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark, Hive, and Presto, without having to configure, manage, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by your applications, and you only pay for the resources that your applications use.

In this post, we discuss the benefits of EMR Serverless, walk you through the core concepts of EMR Serverless and how you can use it, and show you a quick demo.

If you’re already using EMR for ephemeral work—that is, using a Spark cluster to perform data transformations and then shutting it down—this makes a lot of sense as long as there’s not a major difference in cost.
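As a rough sketch of the workflow (mine, not from the AWS post), submitting a Spark job with boto3 looks something like the following. The release label, role ARN, and S3 path are placeholders, and a real pipeline would wait for the application to be ready before starting a job run.

```python
# Sketch: creating an EMR Serverless application and submitting a Spark job
# with boto3. Release label, role ARN, and S3 paths are placeholders.
import boto3

emr = boto3.client("emr-serverless", region_name="us-east-1")

app = emr.create_application(
    name="demo-spark-app",
    releaseLabel="emr-6.6.0",   # placeholder EMR release
    type="SPARK",
)

job = emr.start_job_run(
    applicationId=app["applicationId"],
    executionRoleArn="arn:aws:iam::123456789012:role/EMRServerlessJobRole",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/jobs/etl.py",   # placeholder script
        }
    },
)

print(job["jobRunId"])
```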


A Primer on Apache Spark

Tomaz Kastrun has started a new series:

Apache Spark is an open-source unified analytics engine for large-scale data processing. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. Originally it was developed at UC Berkeley’s AMPLab and later donated to the Apache Software Foundation, which has maintained it since.

Click through to learn more about the product.
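To make “implicit data parallelism” a bit more concrete, here is a toy PySpark word count: you declare the transformations once and Spark spreads the work across whatever cores or executors are available, retrying failed tasks for you.

```python
# Toy word count in PySpark: transformations are declared once and Spark
# parallelizes them across the available executors (or local cores).
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("spark-primer").getOrCreate()

lines = spark.sparkContext.parallelize([
    "apache spark is a unified analytics engine",
    "spark programs get implicit data parallelism and fault tolerance",
])

counts = (
    lines.flatMap(lambda line: line.split())
         .map(lambda word: (word, 1))
         .reduceByKey(lambda a, b: a + b)
)

print(sorted(counts.collect()))
spark.stop()
```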
