
Day: February 19, 2021

Choosing an Image File Type

The folks at Jumping Rivers continue a series on image optimization:

As the JPEG compression algorithm significantly reduces file size, JPEG files are ubiquitous across the web. If you take a photo on your camera, it’s almost certainly using a JPEG storage format. Historically the file extension was .jpg as Microsoft Windows only handled three character file extensions (also .htm vs .html). But today both extensions are used (personally I prefer .jpeg, but I’m not very consistent if I’m totally honest).

If you did a little Googling on which file format to use for images, then the answer you would come across is that JPEGs are the default choice. But remember, figures are different from standard images!

Click through for a review of three viable image formats.


Research with R and Production with Python

Matt Dancho and Jarrell Chalmers lay out an argument:

The decision can be challenging because both Python and R have clear strengths.

R is exceptional for Research – Making visualizations, telling the story, producing reports, and making MVP apps with Shiny. From concept (idea) to execution (code), R users tend to be able to accomplish these tasks 3X to 5X faster than Python users, making them very productive for research.

Python is exceptional for Production ML – Integrating machine learning models into production systems where your IT infrastructure relies on automation tools like Airflow or Luigi.

They make a pretty solid argument. I’ve launched successful R-based projects using SQL Server Machine Learning Services, but outside of ML Services, my team’s much more likely to deploy APIs in Python, and we’re split between Dash and Shiny for visualization. H/T R-Bloggers


Refreshing a Single Table in Power BI

Marc Lelijveld doesn’t want to wait for everything to reload:

If you want to refresh a Power BI dataset, we all know where to find the refresh button in Power BI Desktop as well as in the Power BI Service. By clicking it, you will trigger the entire dataset to refresh. But sometimes it is more convenient to trigger a single table to refresh. If you want to do this, you can do a simple right-click on a table in Power BI Desktop, but how does this work in the Power BI Service? In this blogpost I will describe how you can trigger a single table refresh in the Power BI Service over XMLA endpoints. Please know, this does require Power BI Premium (either Premium per User or Premium Capacity is fine).

Click through to see how.
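
As a rough illustration of what a single-table refresh request can look like, here is a minimal sketch that builds a TMSL refresh command as JSON. It assumes the command is then sent through the XMLA endpoint with a tool such as SSMS; the dataset and table names are hypothetical, and the linked post's exact steps may differ.

```python
import json


def build_table_refresh(database: str, table: str, refresh_type: str = "full") -> str:
    """Build a TMSL refresh command that targets one table only."""
    command = {
        "refresh": {
            "type": refresh_type,
            # Listing a table under "objects" scopes the refresh to that table
            # instead of the whole dataset.
            "objects": [{"database": database, "table": table}],
        }
    }
    return json.dumps(command, indent=2)


# Hypothetical dataset and table names, for illustration only.
script = build_table_refresh("SalesDataset", "Sales")
print(script)
```

The helper only produces the script text; connecting to the workspace and executing it is left to whichever XMLA client you use.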


Protecting Excel with PowerShell

Mikey Bronowski shows us a few techniques for protecting data in Excel files using PowerShell:

Last month we were hiding things in Excel, so this week we are going to make sure they are protected as well. Excel offers multiple levels of password protection:

– locking the file with a password, i.e. the file cannot be opened without the passphrase
– protecting workbook’s structure
– lastly, protecting individual worksheets from a handful of operations

Read on to see each of those in action.


So You Want to Index

Erik Darling has an indexing strategy for querulous normies:

Most queries will have a where clause. I’ve seen plenty that don’t. Some of’em have surprised the people who developed them far more than they surprised me.

But let’s start there, because it’s a pretty important factor in how you design your indexes. There are all sorts of things that indexes can help, but the first thing we want indexes to do in general is help us locate data.

None of this is groundbreaking, but Erik does a really good job of laying out the order in which you want to consider specific factors.


Multi-Pathed Queries

Guy Glantser needs a multi-tool procedure:

This stored procedure, which I created in the AdventureWorks2017 database, has two parameters: @CustomerID and @SortOrder. The first parameter, @CustomerID, affects the rows to be returned. If a specific customer ID is passed to the stored procedure, then it returns all the orders (top 10) for this customer. Otherwise, if it’s NULL, then the stored procedure returns all orders (top 10), regardless of the customer. The second parameter, @SortOrder, determines how the data will be sorted—by OrderDate or by SalesOrderID. Notice that only the first 10 rows will be returned according to the sort order.

So, users can affect the behavior of the query in two ways—which rows to return and how to sort them. To be more precise, there are 4 different behaviors for this query:

1. Return the top 10 rows for all customers sorted by OrderDate (the default behavior)
2. Return the top 10 rows for a specific customer sorted by OrderDate
3. Return the top 10 rows for all customers sorted by SalesOrderID
4. Return the top 10 rows for a specific customer sorted by SalesOrderID

Let’s test the stored procedure with all 4 options and examine the execution plan and the statistics IO.

This is quite common for reporting procedures and Guy shares several patterns, some of which work better than others.
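
To make the four paths concrete, here is a minimal sketch of one common approach: building the query text per parameter combination so that each path gets its own statement. It is not necessarily one of the patterns in the linked post. It uses SQLite through Python rather than SQL Server, and the table layout and sample rows are made up for illustration.

```python
import sqlite3
from typing import List, Optional, Tuple

# Whitelist of sortable columns; raw user input is never interpolated into SQL.
SORT_COLUMNS = {"OrderDate": "OrderDate", "SalesOrderID": "SalesOrderID"}


def build_query(customer_id: Optional[int],
                sort_order: str = "OrderDate") -> Tuple[str, List[int]]:
    """Return (sql, params) for one of the four query paths."""
    column = SORT_COLUMNS[sort_order]  # raises KeyError for an unknown sort order
    sql = "SELECT SalesOrderID, CustomerID, OrderDate FROM SalesOrderHeader"
    params: List[int] = []
    if customer_id is not None:
        # Only the "specific customer" paths carry a filter.
        sql += " WHERE CustomerID = ?"
        params.append(customer_id)
    sql += f" ORDER BY {column} LIMIT 10"
    return sql, params


# Hypothetical sample data: 24 orders spread across customers 0, 1, and 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader ("
    "SalesOrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT)"
)
conn.executemany(
    "INSERT INTO SalesOrderHeader VALUES (?, ?, ?)",
    [(i, i % 3, f"2021-02-{28 - i:02d}") for i in range(1, 25)],
)


def top_orders(customer_id: Optional[int] = None, sort_order: str = "OrderDate"):
    """Run the path selected by the two parameters and return its rows."""
    sql, params = build_query(customer_id, sort_order)
    return conn.execute(sql, params).fetchall()


for cid in (None, 1):
    for order in ("OrderDate", "SalesOrderID"):
        print(cid, order, len(top_orders(cid, order)))
```

Because each combination produces different query text, the engine can plan each path separately instead of settling on one plan for all four.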


DAX and Case Sensitivity

Marco Russo and Alberto Ferrari talk about case sensitivity:

Every new language defines its own rules of case-sensitivity. R and Python are case-sensitive, DAX is not. It is not that one is right and the others are not; it is really a matter of personal taste of the author of the language. We would say that there is an equal number of pros and cons in both choices. Therefore, there is no definitive choice. That said, a choice needs to be made on two aspects: the language itself and the way it considers strings. Pascal, for example, is case-insensitive as a language, but string comparison is case-sensitive. The M language, in Power Query, is case-sensitive despite living in the same environment as DAX. DAX is case-insensitive as a formula language. 

Maybe it’s because I like living in the SQL world so much, but I highly prefer case-insensitivity as the default and case-sensitivity only when necessary.
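
The two aspects in the excerpt, the language itself and how it compares strings, are easy to see in Python, which is case-sensitive on both counts. The sketch below is only an illustration of that distinction; the helper name is my own, and it says nothing about how DAX implements its comparisons.

```python
def equals_case_insensitive(left: str, right: str) -> bool:
    """Compare two strings while ignoring case differences."""
    return left.casefold() == right.casefold()


# Aspect 1: the language itself. `total` and `Total` are different names.
total = 1
try:
    Total  # never defined, so this lookup fails
    identifier_is_case_sensitive = False
except NameError:
    identifier_is_case_sensitive = True

# Aspect 2: string comparison. The default is case-sensitive,
# and case-insensitivity is something you opt into.
default_comparison = "Power BI" == "power bi"
folded_comparison = equals_case_insensitive("Power BI", "power bi")

print(identifier_is_case_sensitive, default_comparison, folded_comparison)
```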


Polling Loops in PowerShell

Aaron Nelson has one method for creating a polling loop in PowerShell:

Originally I had used the Start-Sleep command to wait 3 seconds (Start-Sleep 3). That worked fine on my machine, but when I deployed it to the server, I found I needed to bump it up to 6 seconds. At first that worked, but then a week later I needed to bump it up to 9 seconds. The problem here is obvious: if we force it to wait 9 seconds every time, even if the task was updated after 4 seconds, we’re wasting extra time. And those seconds are going to add up.

Read on for a smarter approach. Ideally we’d be able to use asynchronous event handling with awaits for all of this, but the real world is not always so nice.
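
The general shape of a polling loop, sketched here in Python rather than PowerShell, is to check a condition at short intervals and stop as soon as it is met, with a timeout as the upper bound. The function name, the default values, and the simulated task are assumptions for illustration, not Aaron's implementation.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 9.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True  # done early; no need to sit out the full timeout
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # gave up after the timeout
        time.sleep(min(interval, remaining))


# Hypothetical task that finishes roughly 0.2 seconds after it starts.
start = time.monotonic()
finished = wait_until(lambda: time.monotonic() - start >= 0.2,
                      timeout=2.0, interval=0.05)
elapsed = time.monotonic() - start
print(finished, round(elapsed, 2))
```

The timeout plays the role of the old fixed sleep as a worst case, while the typical wait shrinks to roughly the task's real duration plus one polling interval.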
