Press "Enter" to skip to content

Curated SQL Posts

Migrating Oracle Exadata Workloads to Azure

Kellyn Pot’vin-Gorman shows the process of moving from an Exadata system to Oracle on Azure:

An Exadata is an engineered system-  database nodes, secondary cell nodes, (also referred to as storage nodes/cell disks), InfiniBand for fast network connectivity between the nodes, specialized cache, along with software features such as Real Application Clusters, (RAC), hybrid columnar compression, (HCC), storage indexes, (indexes in memory) offloading technology that has logic built into it to move object scans and other intensive workloads to cell nodes from the primary database nodes.  There are considerable other features, but understanding that Exadata is an ENGINEERED system, not a hardware solution is important and its both a blessing and a curse for those databases supported by one.  The database engineer must understand both Exadata architecture and software along with database administration.  There is an added tier of performance knowledge, monitoring and patching that is involved, including knowledge of the Cell CLI, the command line interface for the cell nodes.  I could go on for hours on more details, but let’s get down to what is required when I am working on a project to migrate an Exadata to Azure.

Click through for the process.

Leave a Comment

Power BI: Visual has Exceeded the Available Resources

Chris Webb explains why you might see an error in Power BI:

This visual has exceeded the available resources. Try filtering to decrease the amount of data displayed.Please try again later or contact support. If you contact support, please provide these details.More details Resource Governing: The query exceeded the maximum memory allowed for queries executed in the current workload group (Requested 1048580KB, Limit 1048576KB).

The official Power BI documentation has similar advice to what’s shown in this dialog about what to do here, but what’s really going on?

The information in the “More details” section of the section dialog gives you a clue: in this case it’s resource governance. When you run a DAX query in Power BI it will always use a certain amount of memory; inefficient DAX calculations can cause a query to try to grab a lot of memory. In Power BI Desktop these queries may run successfully but be slow, but the Power BI Service can’t just let a query use as many resources as it wants (if it did, it may affect the performance of other queries being run by other users) so there is a resource governor that will kill queries that are too resource hungry. In the case of the visual above the query behind it tried to use more than 1GB of memory and was killed by the resource governor.

Read on to understand where these limits are and how you can modify them.

Leave a Comment

Indexes for Memory-Optimized Tables

Monica Rathbun takes us through the options available when creating indexes on memory-optimized tables:

Before we dive into this subject it is VERY important to note the biggest differences.

First, ALL memory optimized indexes MUST be created when the table is created or migrated. You cannot add indexes in an existing table without dropping and recreating the table.

Secondly, currently you can only have 8 indexes per table including your primary key. Remember that every table must have a primary key to enforce a secondary copy for a minimum of schema durability This  means you can only really add 7 additional indexes so be sure to understand your workloads and plan indexing accordingly.

There are a few other differences as well, which Monica covers before detailing the specific index options.

Leave a Comment

Solving the Gaps and Islands Set of Problems

Ed Pollack continues a series on gap and island analysis:

Gaps and islands analysis supplies a mechanism to group data organically in ways that a standard GROUP BY cannot provide. Once we know how to perform an analysis and group data into islands, we can extend this into the realm of real data.

For all code examples in this article, we will use a set of baseball data that I’ve created and maintained over the years. This data is ideal for analytics as it is large and contains data quality that varies between very accurate and very sloppy. As a result, we are forced to consider data quality in our work, as well as scrutinize boundary conditions for correctness. This data will be used without much introduction as we will only reference two tables, and each is relatively straightforward.

The code in this article gets a bit complex, but Ed shows off some powerful techniques.

Leave a Comment

Copying Measure Definitions in Power BI

Erik Svensen takes us through an oddity in Power BI’s user interface:

Here is an idea you can vote for if you would find it useful as well – https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/13219620-duplicate-measure-and-format-copy

So we end up copying the formula from text in the formula bar

And click new measure and Paste it into the formula bar

But 8 of 10 times nothing is pasted (at least when I select) – WHY ???

This is a strange user experience. But regardless, I find it odd that you can’t copy a measure definition. If this is odd to you as well, upvote the Power BI suggestion.

Leave a Comment

Fraud Detection with Flink

Alexander Fedulov gives us a case study of using Apache Flink for fraud detection:

In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case – a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a dynamic data partitioning pattern as the first underlying building block to enable flexible runtime configurations.

To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.

It was an interesting discussion and you can grab the code as well.

Leave a Comment

Customizing Your Rprofile

Colin Gillespie shows how you can customize R via the .Rprofile file:

Every time R starts, it runs through a couple of R scripts. One of these scripts is the .Rprofile. This allows users to customise their particular set-up. However, some care has to be taken, as if this script is broken, this can cause R to break. If this happens, just delete the script!

Full details of how the .Rprofile works can be found in my book with Robin on Efficient R programming. However, roughly R will look for a file called .Rprofile first in your current working directory, then in your home area. Crucially, it will only load the first file found. This means you can have per project Rprofile.

Click through for a sample R profile which has a lot going on.

Leave a Comment

Simulating Feller’s Coin-Tossing Puzzle in R

David Robinson has another fun puzzle:

Mathematician William Feller posed the following problem:

If you flip a coin times, what is the probability there are no streaks of heads in a row?

Note that while the number of heads in a sequence is governed by the binomial distribution, the presence of consecutive heads is a bit more complicated, because the presence of a streak at various points in the sequence isn’t independent

Click through for a solution in R.

Leave a Comment

Against Citizen Data Scientists

Bill Schmarzo doesn’t like the idea of “citizen data scientists” very much:

“Hello,” he says. “My name is Dr. Payne and I am your Citizen Dentist for today.”

Citizen Dentist?! You repeat the question out loud for him to hear, want an answer to this looney statement. “What is a Citizen Dentist?”

Get this. He replies, “I’m a person who performs dental work, but my proficiency and expertise is outside of the field of dentistry.”

Bill’s alternative is “Citizens of Data Science.” Click through to see what that means and how it differs.

Leave a Comment