Kevin Feasel – Page 525

Additional Backup Tools for MySQL

Published 2022-12-12 by Kevin Feasel

As already mentioned, Zmanda Recovery Manager (or ZRM for short) is a member of the Zmanda family of products – Zmanda is famous for offering backup tools for MySQL and MariaDB. The company allows its customers to scale up without any issues (they offer a pay-as-you-go subscription model), and its tools are capable of backing up terabytes of data in MySQL.

Read on to learn more about it, as well as a couple more tools you can use to back up a MySQL database.

Refreshing One Power BI Dataset Table via SSMS

Published 2022-12-12 by Kevin Feasel

Nicky van Vroenhoven performs a small update:

A few weeks back I was working on a dataset at a client where I needed to import Excel files from a folder into said dataset. I filtered the files on a prefix and loaded around 30 files of the same structure to a table in my dataset. The Excel files are exports from a budgeting system (I know, right?) that have to be updated multiple times in the coming weeks on an ad-hoc basis.

Rather than refreshing the entire dataset, there’s a better way, though there is a caveat.
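
As a rough illustration of what a single-table refresh can look like: one common way to process one table over an XMLA endpoint from SSMS is a TMSL refresh command. Whether the linked post uses exactly this approach is an assumption. The sketch below only builds the JSON for such a command; the dataset name `BudgetDataset` and table name `BudgetFiles` are hypothetical placeholders.

```python
import json


def build_table_refresh(database: str, table: str, refresh_type: str = "full") -> str:
    """Build a TMSL refresh command that targets a single table.

    The database and table names are supplied by the caller; the ones
    used in the example below are hypothetical.
    """
    command = {
        "refresh": {
            "type": refresh_type,
            # Listing only one table limits the refresh to that table
            # instead of processing the whole dataset.
            "objects": [{"database": database, "table": table}],
        }
    }
    return json.dumps(command, indent=2)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_table_refresh("BudgetDataset", "BudgetFiles"))
```

The printed JSON could then be pasted into an XMLA query window in SSMS, assuming the workspace exposes an XMLA endpoint that allows writes.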

An Overview of Quarto for R Users

Published 2022-12-09 by Kevin Feasel

Nicola Rennie and Colin Gillespie provide an overview of Quarto:

Earlier this year, Posit (formerly RStudio) released Quarto. Quarto is an open-source scientific and technical publishing system that allows you to weave together narrative text and code to produce high-quality outputs including reports, presentations, websites, and more.

One of the main features of Quarto is that it isn’t just built for R. It’s language-agnostic. It can render documents that contain code written in R, Python, Julia, or Observable. That makes it incredibly useful if you work in multilingual teams, or collaborate with people who write in a different programming language from you. But what if you don’t use any other programming languages? What benefits does Quarto bring to people who only use R?

Read on to learn why you might want to use it over R Markdown.

Contrasting Data Lake, Delta Lake, and Data Lakehouse

Published 2022-12-09 by Kevin Feasel

Giuliano Rapoz and Arshad Ali disambiguate a few terms:

As data engineers, we often hear terms like Data Lake, Delta Lake, and Data Lakehouse, which can be confusing at times. In this blog we’ll demystify these terms and talk about the differences between these technologies and concepts, along with usage scenarios for each.

Read on to learn more about each topic.

AML Environments and SDKs

Published 2022-12-09 by Kevin Feasel

Tomaz Kastrun continues an advent of Azure ML. First up is environments:

We have explored how to create a compute instance and compute target and learned that ML frameworks and scripting packages always come preinstalled.

Choosing the right set of components (CPU, GPU, RAM, Core) and corresponding software (OS, ML Framework, packages) can be time-consuming.

Under Curated environments, you will find predefined environments, with settings for running particular frameworks, like PyTorch or TensorFlow.

Then an overview of the Azure CLI and Python SDK for AML:

What is Azure CLI? It is the Azure command-line interface, a great tool for running commands from CMD. It is multi-platform and can be run from Azure or from the client’s machine. It is great for scripting and automating repetitive tasks or reducing a complex task to a few lines of code, especially when it comes to managing, provisioning, and monitoring infrastructure. It can also be run from Azure Cloud Shell. It is native to Azure and can be used across all the services and offerings. Usually, the Azure CLI commands start with “az ..”. On top of that, you can also install the Azure Machine Learning CLI as an extension to Azure CLI. The AML CLI will give you additional commands to manage resources for machine learning.

The same functionality (to some extent) in Azure Machine Learning can be achieved with the Python SDK. In addition, it also offers great ways to create and manage the resources you use for training and deploying models.

And, so that we can catch up a bit to Tomaz, one more post covering the Python SDK:

Having looked briefly at the Azure CLI and Python SDK, let’s explore the power of the SDK and its most important namespaces.

Partial Database Projects

Published 2022-12-09 by Kevin Feasel

Olivier Van Steenlandt doesn’t get the whole cookie:

In this blog post, I will describe how you can get a database in source control partially. You might be wondering why you would do that. Well, let’s start by explaining the use case.

A couple of years ago, I was working for a company where a third-party vendor owned the OLTP system. At that point in time, we were not allowed to change any existing objects or create any new objects in the existing schemas. However, we were required to be able to transfer the data from the OLTP system to the staging environment of our Data Warehouse. To do so, the third-party vendor created a schema in the database where we were allowed to create views and stored procedures to get the data we needed.

Read on for an example of how this might work, as well as important database project settings you’ll want to change in that case.

Ignoring Warnings in PowerShell

Published 2022-12-09 by Kevin Feasel

Kenneth Fisher puts a sticky note over the blinking red light so it won’t bother him anymore:

Ok, great! Good information. Not something that affects me right now but still helpful to know. And I really appreciate the fact that going forward I can expect this type of information.

I found this information less useful and less appreciated after I put it into a loop and it ran ~40 times in a row, hiding any real information beneath a pile of warnings. Fortunately, right there in the warning there is a helpful note on how to suppress it.

Read on for the story of why this message popped up, as well as how to prevent it from popping up and Kenneth’s medium-term plan for dealing with it.

A Crash Course on Synapse Studio

Published 2022-12-09 by Kevin Feasel

Kevin Chant wants six minutes of your time:

In this post I want to do a six-minute crash course about Synapse Studio. I wanted to do this follow-up post for a couple of reasons.

The first reason is that a while ago somebody who was fairly new to Azure Data Engineering Services mentioned that they thought a lot of my posts were for advanced users. So, I showed them a previous post which was a five-minute crash course about Synapse Studio.

Whilst showing them that post I realized that some of the screenshots were out of date. With this in mind I thought I would do an updated version of the crash course for Synapse Studio, which also allows me to highlight where to find some features.

Start your timers and get reading.

Encryption and Regulatory Requirements

Published 2022-12-09 by Kevin Feasel

Matthew McGiffen talks regulation:

One reason you may be considering encryption is the relevant data protection regulation: either because the regulation specifies that data should be encrypted or because of the large potential penalties where there is a data breach. Some US companies have been hit by fines in the hundreds of millions of dollars following data breaches, so we are talking large sums of money. In Europe the largest fines so far (under the GDPR) have been related to misuse of personal data or consent (750 million euros is the highest I am aware of), but there have been fines of up to 30 million euros for data breaches. In the case of a breach, you could also be sued by individuals whose data has been accessed or through a class action.

Read on for more thoughts on the topic.

The Benefits of Stacking Pull Requests

Published 2022-12-08 by Kevin Feasel

Vivian Qu explains why stacking pull requests can make sense:

Becoming proficient in version control and change management is a necessary part of any software engineer’s job. However, I think that basic proficiency alone is not sufficient to be truly effective when working on complex production-ready software with a team of engineers. Stacking pull requests (PRs) is a key skill that should be learned early in a junior engineer’s career.

Stacking PRs is an advanced git technique that allows an engineer to break down one large change into a series of dependent changes that can be turned into smaller pull requests and reviewed separately.
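
To make the idea of "dependent changes" concrete, here is a minimal sketch of how a stack is wired together: each pull request targets the branch of the change before it instead of the trunk. This is only a model of the dependency chain, not a git workflow in itself, and the branch names and the `PullRequest` type are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PullRequest:
    branch: str  # branch holding this slice of the change
    base: str    # branch this pull request is opened against


def stack_pull_requests(branches: List[str], trunk: str = "main") -> List[PullRequest]:
    """Chain branches so each pull request builds on the previous one."""
    stack = []
    base = trunk
    for branch in branches:
        stack.append(PullRequest(branch=branch, base=base))
        # The next pull request in the stack depends on this branch.
        base = branch
    return stack


if __name__ == "__main__":
    # Hypothetical branch names for one large change split into three parts.
    for pr in stack_pull_requests(["add-schema", "add-api", "add-ui"]):
        print(f"{pr.branch} -> {pr.base}")
```

Only the first pull request targets the trunk; reviewers can look at each later one as a small diff against its predecessor.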

Read on to learn more. It’s a skill I definitely don’t have, so time to add that to my to-learn list.

Author: Kevin Feasel