Month: October 2022

Appending Rows to a Pandas DataFrame

Matt Eland acquires some rows that fell off a truck:

Recently I was working on comparing the performance of different machine learning models and I wanted to add entries to a Pandas DataFrame as I evaluated each model. What I found was that adding new rows to a Pandas DataFrame was a little harder than I suspected and required some mild searching, so I wanted to preserve the two solutions I found here in case it helps someone else.

Read on for those two solutions, though as Matt points out, only one of them is a good solution.
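
As a rough illustration of the problem described in the quote (not necessarily either of the two solutions in Matt's post), here is a minimal sketch of one common way to add rows with `pd.concat`. The column names, model names, and scores are made up for the example, and `collect_results` and `append_row` are hypothetical helpers, not pandas functions.

```python
import pandas as pd


def collect_results(results):
    """Build a DataFrame in one step from (model_name, accuracy) pairs."""
    rows = [{"model": name, "accuracy": acc} for name, acc in results]
    return pd.DataFrame(rows, columns=["model", "accuracy"])


def append_row(df, row):
    """Return a new DataFrame with one dict-shaped row added at the end."""
    # pd.concat does not modify df; ignore_index renumbers the rows 0..n-1.
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)


# Illustrative values only; these are not real benchmark results.
scores = collect_results([("logistic_regression", 0.81), ("random_forest", 0.87)])
scores = append_row(scores, {"model": "gradient_boosting", "accuracy": 0.89})
print(scores)
```

If all of the results are known up front, gathering them in a plain list and building the DataFrame once, as `collect_results` does, avoids copying the frame on every addition.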

Backups and Restores when a NAS Requires a Password

Jana Sattainathan needs to give the daily password:

Sometimes, you have a share (like Azure Data Box via SMB as was the case for me) that you can access only with a UserName and Password. This is fine as long as you are accessing it interactively by typing it in, but how about accessing it from SQL Server for the purposes of backing up and restoring?

This is where the “NET USE” command becomes necessary.

Read on to see how that can help you out.

Automatic Partition Maintenance in Power BI Incremental Refresh

Shabnam Watson goes investigating:

In this post, I am going to look at automatic partition maintenance by Power BI service for datasets with Incremental Refresh and focus on what happens to the partitions as time goes by. To do this, I am going to set up a couple of sample datasets with different Incremental Refresh (IR) policies with and without the Hybrid option, schedule automatic refreshes from the Power BI Service, and record how their partitions change over time. As a result, this post is going to get updated as time goes on as it documents how the partitions evolve.

Read on to learn more about what Incremental Refresh does and how things have changed over time. This looks like a post to come back to a few times.

Testing Azure SQL DB Hyperscale Performance

Reitse Eskens continues a series on performance testing Azure SQL DB tiers:

So far, my blogs have been on the different Azure SQL DB offerings where there are differences between DTU and CPU based. But in general, the design is recognizable. With the hyperscale tier, many things change. There are still cores and memory of course, but the rest of the design is totally different. I won’t go into all the details, you’re better off reading them here and here, but the main differences are the support of up to 100 TB of data in one database (all the other tiers max out at 4 TB), fast database restores based on file snapshots, rapid scale out and rapid scale up.

There are differences in testing this one versus the others, so buyer beware.

Row-Level Security against Power BI Shared Datasets

Teo Lachev combines two capabilities in Power BI:

In a typical engagement, I create an organizational semantic model(s) and “report packs”, such as Sales Report Pack, Inventory Report Pack, etc. These report packs are typically implemented as Power BI reports connected to the semantic model as a shared dataset using the Power BI Datasets connector. Reports sanctioned by IT are published to a dedicated workspace, such as Corporate BI. Departmental reports are deployed to their respective workspace, such as Sales, to enforce content-level security. Usually, the semantic model has row-level security (RLS) roles defined to enforce restricted access to data depending on the identity of the interactive user.

Read on to see how you can test out the results once you get it working.

Registered Servers in SQL Server

Kevin Hill has a video for us:

Query multiple SQL Server instances at one time!

I have used registered servers before, though my strong preference is for a Central Management Server. There are limits to how many servers CMS can handle before the initial load gets slow—I know this because we’re over the limit and it causes SSMS to freeze for 3-5 seconds the first time I open CMS. Still, having that server list in a central location means not having to share files around with the rest of the team and trying to figure out whether you have the most recent version of the file.

Motion Detecting and Alerting with Kafka and ksqlDB

Wei Rui and Yinsidi Jiao take us through a scenario:

Managing IoT (Internet of Things) devices and their produced data or events can be a challenge. On one hand, IoT devices usually generate massive amounts of data. On the other hand, IoT hardware has many limitations to process the data generated, such as cost, physical size, efficiency, and availability. You need a back-end system with high scalability and availability to process the growing volume of data. Things become more challenging when dealing with numerous devices and events in real time, and considering the required availability, latency, scalability, and agility for different usage and scenarios.

For Confluent Hackathon 2022, we built an end-to-end motion detection and alerting system, which currently acts as a home surveillance system, on top of Apache Kafka® and ksqlDB to demonstrate how easy it is to build IoT solutions by leveraging Confluent Cloud.

Read on to see how it works.

Software-as-a-Service: Single DB or Per-Client DB

Greg Low makes a choice:

On-premises applications are mostly single-tenant. They support a single organization. We do occasionally see multi-tenant databases. They hold the same types of information for many organizations.

But what about SaaS-based applications? By default, you’ll want to store data for many client organizations. Should you create a large single database that holds data for everyone? Should you create a separate database for each client? Or should you create something in-between?

As with most things in computing, there is no one simple answer to this. Here are the main decision points that I look at:

Click through for Greg’s thoughts on the matter. Most of these factors are also relevant for on-premises SQL Server installations, not just Azure SQL DB/Managed Instance.

Apache Flink 1.16 Released

Godfrey He makes an announcement:

To reduce the cost of migrating Hive to Flink, we introduce HiveServer2 Endpoint and Hive Syntax Improvements in this version:

The HiveServer2 Endpoint allows users to interact with SQL Gateway with Hive JDBC/Beeline and migrate with Flink into the Hive ecosystem (DBeaver, Apache Superset, Apache DolphinScheduler, and Apache Zeppelin). When users connect to the HiveServer2 endpoint, the SQL Gateway registers the Hive Catalog, switches to Hive Dialect, and uses batch execution mode to execute jobs. With these steps, users can have the same experience as HiveServer2.

Read on for a pretty large hit list.
