Press "Enter" to skip to content

Month: October 2024

An Overview of k Nearest Neighbors

Harris Amjad explains a common algorithm for classification:

It so happens that given the hype of Machine Learning (ML) and especially Large Language Models these days, there is a considerable proportion of those who wish to understand how these systems work from scratch. Unfortunately, more often than not, the interest fades away quickly as learners jump to complicated algorithms like neural networks and transformers first, without giving heed to traditional ML algorithms that paved the foundation for these advanced algorithms in the first place. In this tip, we will introduce and implement the K-Nearest Neighbors model in Python. Although it is quite old, it remains very popular due to its simplicity and intuitiveness.

Click through to learn more about this algorithm, including an implementation from scratch in Python.

Leave a Comment

Reading Data from Azure Blob Storage in Snowflake

Arun Sirpal explains a common architectural pattern:

Let’s go back to data platforms today and I want to talk about a very common integration I see nowadays, Azure Blob Storage linked to Snowflake via a storage integration which then we can access semi structured files via external tables, it is a good combination of technology I have to say.

Click through for an architecture diagram and example of the code you’d need.

Leave a Comment

Constrained Kerberos Delegation with SSRS and Power BI Gateways

Rod Edwards doesn’t want just anyone to double-hop:

Ok, many of you will already be aware that in order to use Integrated Authentication successfully with SSRS particularly, that you have to configure Kerberos Authentication. At a very basic level, this allows the credentials of the user running the report, to be passed to the report server (hop 1) and then along to the target of the SSRS datasource (hop 2), also known as “Double hop” authentication. The delegation part of this signifies where the service (PBIG or SSRS) is allowed to pass these credentials along to.

  • anywhere, ie…Unconstrained delegation, or
  • to a restricted set of targets…Constrained delegation.

Read on to see how you can set up constrained delegation.

Leave a Comment

Continuous Integration vs Continuous Delivery

Bravin Wasike disambiguates two terms:

Continuous Integration (CI) and Continuous Delivery (CD) are fundamental to DevOps and agile methodologies. They ensure that software is developed, tested, and delivered quickly and efficiently. CI and CD are more than just trending buzzwords. They are crucial processes that help teams deliver quality software at a high velocity. Understanding CI and CD is essential for anyone involved in software development or operations.

In this two-part series, we will demystify continuous integration vs. continuous delivery. In this part 1 article, we will discuss the basic concepts involved in Continuous Integration and Continuous Delivery. The article will highlight primary objectives of CI/CD, their benefits, CI/CD tools examples, key differences between CI and CD and real-world examples of CI/CD.

Read the whole thing. The way I quickly summarize continuous integration, continuous delivery, and continuous deployment is as follows:

  • Continuous integration: you check in code and the build process automatically runs tests to make sure all is well.
  • Continuous delivery: you check in code and, after the continuous integration process automatically runs tests, the continuous delivery process automatically builds the project and makes it available for push-button deployment.
  • Continuous deployment: you check in code and, after the continuous integration process automatically runs tests, the continuous deployment process automatically builds the project and deploys it. No human needs to be involved after check-in if all goes well.
Leave a Comment

Using the Log Analytics Ingestion API with Sentinel

Abhishek Sharan explains how the Log Ingestion API works when feeding data into a Log Analytics workspace:

In this blog, I’m going to delve into ingesting and transforming application logs to log analytics workspace using Log Ingestion API technique. Before we jump into the details, let’s explore the different types of ingestion techniques based on the application log types.

  • If applications and services logs information to text files instead of standard logging services such as Windows Event log or Syslog, Custom Text Logs can be leveraged to ingest such text file logs in log analytics workspace.
  • If applications and services logs information to JSON files instead of standard logging services such as Windows Event log or Syslog, Custom JSON Logs can be leveraged to ingest such text file logs in log analytics workspace.
  • If an application understands how to send data to an API then you can leverage Log Ingestion API to send the data to Log Analytics Workspace.

Custom Text/JSON Logs are out of scope for this blog, I might write a separate blog dedicated to these techniques later. In this blog, my focus will be on streaming data to log analytics workspace using Log Ingestion API and transforming the data for optimal usage.

Note: This blog aims to demonstrate how to ingest logs using the log ingestion API. To keep things straightforward, I’ll refer to our public documentation.

Click through for details on the process.

Leave a Comment

Tablespaces in Oracle and PostgreSQL

Umair Shahid explains how tablespaces work in Oracle and PostgreSQL:

Tablespaces play an important role in database management systems, as they determine where and how database objects like tables and indexes are stored. Both Oracle and PostgreSQL have the concept of tablespaces, but they implement them differently based on the overall architecture of each database.

Oracle’s tablespaces are an integral part of the database that provide various functionalities, including separating data types, managing storage, and optimizing performance. PostgreSQL, on the other hand, takes a more simplified approach, using tablespaces primarily to control where physical files are stored.

This blog aims to provide a comprehensive comparison between Oracle and PostgreSQL tablespaces, covering their architecture, creation, and practical use cases, with the goal of helping DBAs better understand their capabilities and limitations

Read on to learn more about how tablespaces work in each platform and how they differ.

Leave a Comment

An Overview of Differential Privacy

Zachary Amos covers a topic of note:

Data analytics tools allow users to quickly and thoroughly analyze large quantities of material, accelerating important processes. However, individuals must ensure to maintain privacy while doing so, especially when working with personally identifiable information (PII). 

One possibility is to perform de-identification methods that remove pertinent details. However, evidence has suggested such options are not as effective as once believed. People may still be able to extract enough information from what remains to identify particular parties. 

Read on to learn a bit more about the impetus behind differential privacy and a few of the techniques you can use to get there. The real trick with differential privacy is adding the right kind of noise not to distort the distribution of the data, while still not allowing an end user to unearth enough information to identify a specific individual.

Leave a Comment

Splitting Data into Equally-Sized Groups in R

Steven Sanderson splits out some data:

As a beginner R programmer, you’ll often encounter situations where you need to divide your data into equal-sized groups. This process is crucial for various data analysis tasks, including cross-validation, creating balanced datasets, and performing group-wise operations. In this comprehensive guide, we’ll explore multiple methods to split data into equal-sized groups using different R packages and approaches.

Click through for methods and examples.

Leave a Comment

Spaces in Microsoft Fabric Delta Table Names

Sandeep Pawar is looking for a bit more space:

One of the annoying limitations of Direct Lake (rather of the SQL endpoint) was that you could not have spaces in table and column names in the delta table. It was supported in the delta table but the table was not query-able in the SQL endpoint which meant you had to rename all the tables and columns in the semantic model with business friendly names (e.g. rename customer_name to Customer Name). Tabular Editor and Semantic Link/Labs was helpful for that.

But at #FabConEurope, support for spaces in table names was announced and is supported in all Fabric engines. You have to use the backtick to include spaces, as show below.

Read on to learn more about how you can create these, what the limitations are, and then you can decide whether it’s worth it to have spaces in table names.

Leave a Comment