Press "Enter" to skip to content

Author: Kevin Feasel

Strings And Identifiers

Kenneth Fisher explains the difference between a string and an identifier:

A common mistake, and one I make frequently myself, is to use a string in place of an identifier, or vice versa. So to start, let’s have some definitions, shall we?

String

a linear sequence of characters, words, or other data.

Identifier

a sequence of characters used to identify or refer to a program or an element, such as a variable or a set of data, within it.

And because I always find examples fairly useful.

Click through for the example as well as additional explanation.
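Until then, a minimal T-SQL sketch of the distinction (the dbo.Person table and its FirstName column here are hypothetical):

-- Hypothetical table for illustration.
CREATE TABLE dbo.Person (FirstName varchar(50));
INSERT INTO dbo.Person VALUES ('Anna'), ('Zed');

-- 'FirstName' in single quotes is a string: a literal value.
SELECT 'FirstName' AS Result FROM dbo.Person;  -- the text FirstName, once per row

-- FirstName without quotes is an identifier: it refers to the column.
SELECT FirstName AS Result FROM dbo.Person;    -- Anna, Zed

-- Square brackets (or double quotes, with QUOTED_IDENTIFIER ON) delimit
-- identifiers; they do not create strings.
SELECT [FirstName] FROM dbo.Person ORDER BY "FirstName";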


Power BI Helper Version 2.0 Released

Reza Rad has a new version of Power BI Helper:

I started Power BI Helper with the intention of helping people find issues in Power BI reports faster and more easily. Over time, this tool became better and better. I’m excited to let you know that version 2.0 of this product is now available for everyone to use and enjoy. This version comes with these features:

  • Connecting to more than one Power BI model, with the option to select which model.

  • Showing the connection mode of the Power BI file.

  • Showing the list of tables that are NOT used in any visualization and so can be hidden from the report.

  • Modeling advice:

    • List of bidirectional relationships
    • List of inactive relationships

  • Some minor bug fixes

It looks like quite the useful tool.


Replicating Extra-Long Strings

Monica Rathbun walks us through a replication error:

Ever seen the below error? Until this week I hadn’t. So, I figured I’d take a little time and introduce it to those that had not.

Error Description: Length of LOB data (65754) to be replicated exceeds configured maximum 65536. Use the stored procedure sp_configure to increase the configured maximum value for max text repl size option, which defaults to 65536. A configured value of -1 indicates no limit

We ran into an issue with a customer this week: this error was flooding the error log. After a little digging, I found it had to do with the transactional replication (this also applies to Change Data Capture) they had set up, which included LOB data.

Read on to see what you can do to resolve this error.  Also, check out the comments and be glad you’re not in that boat…unless you are, in which case…
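The error message itself points to the fix; here’s a quick sketch of the sp_configure route it describes:

-- Check the current setting (the default is 65536 bytes).
EXEC sys.sp_configure N'max text repl size';

-- Raise the cap, or use -1 for no limit, then apply the change.
EXEC sys.sp_configure N'max text repl size', -1;
RECONFIGURE;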


DATENAME In SQL Server

Randolph West continues his dates and times series with a new function, DATENAME():

There are many similarities between DATEPART and DATENAME. Where DATEPART returns the date or time part as an integer, DATENAME returns the part as a character string.

The DATENAME function also takes two parameters: the date or time part we want back, and the input date. Just as we saw with DATEPART, the documentation indicates the input date parameter must be an “expression that can resolve to one of the following data types: date, smalldatetime, datetime, datetime2, datetimeoffset, or time.”

Similarly, the date and time parts that can be returned look much like those in DATEPART, which gives us another opportunity for the reminder that we should avoid using the available abbreviations in order to help with writing clearly understandable code.

DATENAME is a useful function for displaying parts of dates & times, but Randolph does lay out the caveats.
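For a quick feel for the contrast, a small sketch using full part names (per that advice about abbreviations):

DECLARE @d datetime2 = '2018-06-22 14:30:00';

SELECT
    DATEPART(MONTH, @d)   AS MonthNumber,   -- 6, an integer
    DATENAME(MONTH, @d)   AS MonthName,     -- June, a character string
    DATEPART(WEEKDAY, @d) AS WeekdayNumber, -- an integer; depends on SET DATEFIRST
    DATENAME(WEEKDAY, @d) AS WeekdayName;   -- Friday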


Azure Without ARM

Ed Elliott gives us a few ways of deploying Azure resources without using ARM templates:

So, what are our options?

  • Create/edit/delete resources ourselves using PowerShell/.NET/Python/Go/Java/some other SDK

  • Process something else (YAML?) into JSON

  • Generate the ARM template using C#/PowerShell/something else

  • Third-party tools (Terraform is the big daddy); others include SparkleFormation

To be honest, I’d probably just stick with ARM templates.


SQL Operations Studio, June Release

Alan Yu announces the June release of SQL Operations Studio:

The June public preview release is focused on improving our Extensibility experience with the release of new extensions as well as addressing top GitHub issues.

Highlights for this build include the following:

  • SQL Server Profiler for SQL Operations Studio Preview extension initial release

  • Azure SQL Data Warehouse extension

  • Edit Data Filtering and Sorting

  • SQL Server Agent for SQL Operations Studio Preview extension enhancements for Jobs and Job History views

  • Build your own SQL Ops Studio extension

  • Visual Studio Code Refresh

  • Fix GitHub Issues

I saw “SQL Server Profiler” and started wondering what was going on, until Alan explained that it’s actually the lightweight Extended Events profiler and not the heavyweight beast we know and love and/or hate.


Using DALEX To Explain Black-Box Models

Przemyslaw Biecek explains that there’s more than LIME for explaining black-box models:

I’ve heard about a number of consulting companies that decided to use a simple linear model instead of a black-box model with higher performance, because “the client wants to understand factors that drive the prediction.”
And usually the discussion goes as follows: “We have tried LIME for our black-box model; it is great, but it is not working in our case.” “Have you tried other explainers?” “What other explainers?”

So here you have a map of different visual explanations for black-box models.

Check out DALEX, which includes a Jupyter notebook example.  H/T R-Bloggers


Comparing Keras In Python Versus R

Dmitry Kisler performs image classification using Keras in both Python and R:

From the plots above, one can see that:

  • the accuracy of your model doesn’t depend on the language you use to build and train it (the plot shows only train accuracy, but the model doesn’t have high variance and the bias accuracy is around 99% as well).

  • even though 10 measurements may not be convincing, Python would reduce (by up to 15%) the time required to train your CNN model. This is somewhat expected, because R uses Python under the hood when executing Keras functions.

This is just one example, but the results are about what I’d expect.


Auto-Encoders And KernelML

Rohan Kotwani gives us an example where KernelML might be better than TensorFlow or PyTorch:

So what’s the point of using KernelML?

1. The parameters in each layer can be non-linear
2. Each parameter can be sampled from a different random distribution
3. The parameters can be transformed to meet certain constraints
4. Network combinations are defined in terms of numpy operations
5. Parameters are probabilistically updated
6. Each parameter update samples the loss function around a local or global minimum

KernelML Specs

KernelML is a brute-force optimizer that can be used to train machine learning algorithms. The package uses a combination of machine learning and Monte Carlo simulations to optimize a parameter vector with a user-defined loss function. Using kernelml creates a high computational cost for large, complex networks because it samples the loss function using a subspace for each parameter in the parameter vector, which requires many random simulations. The computational cost was reduced by enabling parallel computations with ipyparallel. The decision to use this package was made because it effectively utilizes the cores on a machine.

It’s an interesting use case, though I would have liked to have seen a direct comparison to other frameworks.


Probabilistic Debugging

Adrian Colyer summarizes a fascinating academic paper:

This program has a bug. When given an already encoded input, it encodes it again (replacing % with ‘%25’). For example, the input https://example.com/t%20c?x=1 results in the output https://example.com/t%2520c?x=1, whereas in fact the output should be the same as the input in this case.

Let’s put our probabilistic thinking caps on and try to debug the program. We ‘know’ that the url printed on line 19 is wrong, so we can assign low probability (0.05) to this value being correct. Likewise we ‘know’ that the input url on line 1 is correct, so we can assign high probability (0.95). (In probabilistic inference, it is standard not to use 0.0 or 1.0, but values close to them instead). Initially we’ll set the probability of every other program variable being correct to 0.5, since we don’t know any better yet. If we can find a point in the program where the inputs are correct with relatively high probability, and the outputs are incorrect with relatively high probability, then that’s an interesting place!

Since url on line 19 has a low probability of being correct, this suggests that url on line 18 and purl_str at line 12 are also likely to be faulty. PI Debugger actually assigns these probabilities of being correct as 0.0441 and 0.0832, respectively. Line 18 is a simple assignment statement, so the chances of a bug here are fairly low. Now we trace the data flow. If purl_str at line 12 is likely to be faulty, then s at line 16 is also likely to be faulty (probability 0.1176).

I’m interested to see someone create a practical implementation someday.
