Press "Enter" to skip to content

Month: February 2022

Making a Scatter Plot in Excel

Mike Cisneros shows how to create a nice-looking scatter plot in Excel:

Scatter plots are excellent charts for showing a relationship between two numerical variables across a number of unique observations. We see them in business communications from time to time, although they’re much more commonly used in the “exploration” part of the process—when we’re still trying to understand our data and find the important insights. 

If you’re unfamiliar with scatter plots, their common use cases, or their benefits and drawbacks in a range of scenarios, check out the what is a scatter plot? article in our SWD Chart Guide. There, we explore some of the basics of scatter plots via an example, share tips for designing them more effectively, and discuss common variations (bubble charts, connected scatter plots, and more).

Read on for the process, which can be a lot more difficult than you may first expect.

Comments closed

Avoiding SQL Injection in Entity Framework Raw Queries

Josh Darnell has a public service announcement (and I would have had that lead-in even if he didn’t say it himself!):

These days, most .NET developers seem to write SQL using Entity Framework, and specifically using LINQ. This is very convenient for us developers, and I’m sure most DBAs love it too.

One of the nice things about LINQ is that it’s not really vulnerable to SQL injection. C# code gets turned into SQL queries, and variables get translated into parameters automatically.

Thus folks who have only worked with ORMs and LINQ may not have ever learned about the bad old days. Which is normally fine, until…

Click through for an example of what not to do.

Comments closed

Working with Views in PostgreSQL

Gauri Mahajan tries out different types of views in Azure Database for PostgreSQL:

A view can be considered as a dataset that has a pre-determined schema, the data is derived based on a specific criterion and is the source from one or more underlying tables. When a view is queried for data, it, in turn, queries the underlying tables and presents the intended dataset. At times, when the volume of data is very large, a view may start impacting query performance. In those cases, data from the underlying tables with the intended logic that makes up the view is physically stored in another table. This specific construct is called a materialized view. As the data from the source table gets updated, this materialized view needs to be refreshed from time to time to keep the data update in the materialized view, unlike a regular view. Azure’s offering of PostgreSQL database is Azure Database for PostgreSQL and it offers the same features as well.

Click through for more information on creating and working with views, materialized or not. Note that SQL Server’s indexed views are not the same as materialized views here.

Comments closed

Paginated Reports vs Interactive Power BI Reports

Paul Turley explains when to use each type of report:

Technology evolves, the industry changes and the way businesses use technology changes. The road that has brought us to the current state of reporting capabilities in the Microsoft data platform has been long and winding. Ten years ago, we were using SQL Server Reporting Services (SSRS) to create interactive, dashboard-like reports scorecard reports. It wasn’t easy and the report interactivity wasn’t exactly “snappy” as the report server had to re-query data and re-render the report every time a link or chart series was clicked. But, because SSRS was the best tool that we had back in the day, the techniques we used for advanced report development were arcane by today’s standards. Over the years, I’ve written a few articles and posts about how to create reports with interactive KPI gauges, charts and scorecards using SSRS. Today we have a better tool for that style of reporting and analytics. Reporting Services remains a powerful and flexible tool for a few different styles of reports. Now that SSRS has been integrated into the premium Power BI service as “Paginated Reports”, this raises questions about how and when to use each tool – and how to use them together.

Read on for more information, including a handy set of bullet points to help you make a decision.

Comments closed

Azure Data Explorer Dynamic Columns in Power Query

Chris Webb blows out the margins:

Now you can easily extract the individual property values from this column in KQL, and indeed I did so in the KQL queries in this post, but the interesting thing is you can also extract these values in Power Query M very easily and – crucially – maintain query folding using the Record.FieldOrDefault M function in a custom column without needing to write any KQL yourself, in both Import mode and DirectQuery mode.

Read on to see how.

Comments closed

SQL Server Scheduler Affinity

Bob Dorr re-posts an oldie but goodie:

SQL Server uses 3 types of affinity to control where the SQL Server worker threads execute.  Before explaining the different scheduler affinity types let me clarify some terminology.

Click through to see what these three types of affinity are, as well as some of the terms you’ll need to know in order to understand affinity types.

Comments closed

Sentiment Analysis with Python

Sanil Mhatre performs a bit of sentiment analysis:

Previous articles in this series have focused on platforms like Azure Cognitive Services and Oracle Text features to perform the core tasks of Natural Language Processing (NLP) and Sentiment Analysis. These easy-to-use platforms allow users to quickly analyze their text data with easy-to-use pre-built models. Potential drawbacks of this approach include lack of flexibility to customize models, data locality & security concerns, subscription fees, and service availability of the cloud platform. Programming languages like Python and R are convenient when you need to build your models for Natural Language Processing and keep your code as well as data contained within your data centers. This article explains how to do sentiment analysis using Python.

Python is a versatile and modern general-purpose programming language that is powerful, fast, and easy to learn. Python runs on interpreters, making it compatible with multiple platforms, and is widely used in applications for web platforms, graphical interfaces, data science, and machine learning. Python is increasingly gaining popularity in data analysis and is one of the most widely used languages for data science. You can learn more about Python from the official Python Software Foundation website.

Click through to see what’s available in the NLP world for Python. The short version is “a lot.”

Comments closed

Generating Python Documentation with Sphinx

Evan Seabrook generates some docs:

As you can see above, we have docstrings defined for properties, methods and the classes themselves. Ultimately, these docstrings will be used by Sphinx to generate the documentation. If you’re using a different docstring format, you can use a Sphinx extension called Napoleon to use your existing docstrings. Once your project has a level of docstring usage that you’re happy with, we can move on to the next step of configuring Sphinx.

And that’s the downside to this: you get auto-generated documentation, which means it’s only as good as your developers’ ability to explain the code.

Comments closed

Calculus and Python

Muhammad Asad Iqbal Khan performs derivatives in Python:

Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for user-defined functions.

Read on to see how you can use PyTorch to do this.

Comments closed

Run Spark within Azure ML Compute

James Nguyen makes an announcement:

Following the blog post on Turning AML compute into Ray and Dask , we’ve added a new exciting capability to run Spark within AML compute where Spark shares the same context with your ML code. The Spark version is 3.2.1 with support for Delta Lake and Synapse SQL read/write. This enables users of AML to perform powerful data transformation and even Spark ML within AML interactive notebook or in a job run. 

Traditionally, Azure ML integrates with Spark Synapse or external compute services via a pipeline step or better via magic command like %synapse, but the computing context is separate from your AML logic so you still need to run Spark in a separate step and persist the output to some storage and load it in your AML script.

With this approach, Spark is available right within your AML code whether it’s AML notebook, python script or pipeline step. It shares the common computing context and most of the cases you can just directly convert the Spark Dataframe to Pandas and Dask Dataframe without persisting first to an intermediary storage.

I’ll have to try this out to see if it makes up for their getting rid of the Spark-based curated environments last year.

Comments closed