Press "Enter" to skip to content

Day: December 17, 2021

Trying out Spark Streaming

Tomaz Kastrun continues a series on Spark. Part 15 provides an introduction to Spark Streaming:

Spark Streaming or Structured Streaming is a scalable and fault-tolerant, end-to-end stream processing engine. it is built on the Spark SQL engine. Spark SQL engine will is responsible for running results sets for streaming data, regardless of static or continuously in coming stream data.

Spark stream can use Dataframe (or Datasets) API in Scala, Python, R or Java to work on handling data ingest, creating streaming analytics and do all the computations. All these requests and workloads are done against Spark SQL engine.

I don’t think I’ve ever seen an example of using Spark Streaming in R, so that one’s new to me.

Part 16 looks at DataFrame operations in Spark Streaming:

When working with Spark Streaming from file based ingestion, user must predefine the schema. This will require not only better performance but consistent data ingest for streaming data. There is always possibility to set the spark.sql.streaming.schemaInference to true to enable Spark to infer schema on read or automatically.

Check out both of those posts.

Comments closed

Log Replay for Azure SQL Managed Instance

Joey D’Antoni has some quick notes on the Log Replay Service:

Recently, I’ve started on a project where we are migrating a customer to Azure SQL Managed Instance, which now supports a few different migration paths. You can simply backup and restore from a database backup, but you can’t apply a log or differential backup to that database. You can also use the Database Migration Service, but that requires a lot of infrastructure and Azure configuration. The log replay service, or LRS, is the functional equivalent of log shipping to your new managed instance database. While log shipping is a well known methodology for both database migrations or disaster recovery. However, the implementation is a little different–let’s talk about how it works.

Click through to see how it differs.

Comments closed

Reviewing Azure Options for PostgreSQL and MySQL

Maria Zakourdaev has a pair of info sheets. First up is Azure Database for MySQL:

MySQL is an open-source relational database that is widely used for web applications, it’s easy to use, reliable, secure, and fast.

Recently Microsoft have announced a new deployment option, Flexible Server, that is now generally available.

If we have a quick look at the available options, we now have Single Server and Flexible server deployment options.

Then we have Azure Database for PostgreSQL:

PostgreSQL is an open-sourced, feature rich and extendable relational database that handles high concurrency workloads easily. It supports complex structures, many advanced data types, Search Tree indexes and also got highly sophisticated query optimizer.

Azure Database for PostgreSQL is an Azure managed services running PostgreSQL community edition. With Flexible Server announced recently, you now have 3 deployment options: Single Server, Flexible Server and Hyperscale/Citus.

Click through for a quick comparison of each available option.

Comments closed

Matching Supply with Demand using Batch Mode

Joe Obbish has a solution:

Itzik Ben-Gan posted an interesting T-SQL challenge on SQL performance dot com. I’m writing up my solution in my own blog post because I have a lot to say and getting code formatting right can be tricky in blog post comments. For reference, my test machine is using SQL Server 2019 CU14 with an i7-9700K CPU @ 3.60 GHz processor. The baseline cursor solution completes in 8465 ms on my machine.

Click through for a bit of formal logic and a lot of tuning.

Comments closed

Sparklines and Filter Context

Ed Hansberry riffs on the new sparkline functionality in Power BI:

This isn’t a tutorial on how to create sparklines as Microsoft has you covered there. What I want to discuss though is how Sparklines are impacted by the filter context in a visual. First though you need to enable this feature. It is in preview mode at launch, so go to File, Options, Options and Settings, and look for the Preview section. Click Sparklines, then close and restart Power BI Desktop.

In the visual above, I added the “Month” field to the Sparkline settings and that created an additional filter on the visual. Before I added the Sparkline, Year was the only filter affecting the numbers. By adding the month to the sparkline, shown below, it breaks out the Total Net Sales figure by month.

But what if I don’t want the sparkline to be the full range of data. For example, I just want the sparkline to show the last 3 months of the year, so 3 data points, not 12.

Click through to see how it’d look as a measure and what you need to do to make sparklines look right.

Comments closed

Copy Logins between SQL Server Instances

David Alcock wants to move a login:

Migrating SQL databases is fun, depending on your definition of fun that is. The process can involve having to move things such as login details that have been around for that long that nobody has a clue what they are anymore.

With domain accounts that’s pretty straightforward, the passwords are managed in Active Directory and not held in SQL Server and it’s just a case of recreating the account on the new instance.

SQL authentication is different and migrating an account as is means you also have to recreate the password as is which could be difficult if you didn’t know what the password should be. It’s worth saying at this point that the preference should always be to use domain accounts, they’re more secure and much more manageable and migrations are ideal opportunities to refactor things to be better but for the sake of this article let’s proceed with the scenario of recreating a SQL authentication login with an unknown password, and we’ll need to get creative.

And as you’d expect, dbatools makes a dramatic appearance.

Comments closed

Troubleshooting External Command Timeouts in Power BI

Chris Webb continues a series n troubleshooting timeouts:

In the first post in this series I showed how any Power BI dataset refresh started via the Power BI portal or API is limited to 2 hours in Shared capacity and 5 hours in Premium capacity, and how you could work around that by running a refresh via Premium’s XMLA endpoint feature. In the second post in this series I showed how some M functions allow you to set timeouts. However, even if you initiate a refresh via the XMLA endpoint you may still get a timeout error and in this post I’ll discuss another reason why: the External Command Timeout.

Read on to see what the external command timeout is and when it might strike.

Comments closed