What it does: Estimates a current node’s importance from its linked neighbors and then again from their neighbors. A node’s rank is derived from the number and quality of its transitive links to estimate influence. Although popularized by Google, it’s widely recognized as a way of detecting influential nodes in any network.
How it’s used: PageRank is used in quite a few ways to estimate importance and influence. It’s used to suggest Twitter accounts to follow and for general sentiment analysis.
PageRank is also used in machine learning to identify the most influential features for extraction. In biology, it’s been used to identify which species extinctions within a food web would lead to the biggest chain reaction of species death.
If you are interested in getting into graph databases, it’s useful to know these algorithms.
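If you want to see it in action before diving into a graph database, here’s a minimal Python sketch using networkx on a toy graph (the node names and edges are made up purely for illustration):

```python
import networkx as nx

# A tiny directed graph: an edge u -> v means "u links to v".
G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"), ("A", "C"),
    ("B", "C"),
    ("C", "A"),
    ("D", "C"),
])

# alpha is the damping factor; 0.85 is the value from the original paper.
ranks = nx.pagerank(G, alpha=0.85)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

Note that C ranks highest here: it has the most incoming links, and one of them comes from a node that is itself well linked.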
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
1. Connect to a data source, and then create a new dataset or choose an existing dataset.
2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
3. Create a new analysis.
4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
6. (Optional) Add more visuals to the analysis.
7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
8. (Optional) Publish the analysis as a dashboard to share insights with other users.
It’s interesting to see how Amazon is trying to move this functionality from third-party tools (Power BI, Tableau, etc.) and notebooks right into the set of AWS offerings. Contrast this with the way that Microsoft is building Jupyter into its platform with Azure Notebooks.
Here’s the link to the official documentation:
…and here are the definitions of ‘deprecated’ and ‘discontinued’:
A deprecated feature will be discontinued from the product in a future release, but is still supported and included in the current release to maintain backward compatibility. It’s recommended you discontinue using deprecated features in new and existing projects to maintain compatibility with future releases.
A discontinued feature was deprecated in an earlier release. It may continue to be included in the current release, but is no longer supported. Discontinued features may be removed entirely in a future release or update.
Read on for the lists of deprecated and discontinued features.
On Tuesday the 3rd of this month, I invited people in the SQL Server community to share which tools are essential to their daily work. I was really overwhelmed by the number of stories the topic triggered. Twenty-two people in total took the time to write down and share which tools they use for their work chores.
Going through 22 posts and aggregating them has taken more time than I had hoped for, since my trusted laptop broke down – blink codes are alive and well, I tell you!
Click through for the 22 submissions as well as Jens’s set of links to the tools people mentioned.
A common way to review categorical variable relationships is to create a cross tab, also known as a matrix, to evaluate the counts for each resulting combination.
For example, in my current data set, I can create a matrix to compare the number of players in two teams, say the Knights and the Sharks, by position and by handedness.
In descriptive analytics, I’m not trying to prove anything by looking at these values. I’m just reporting them. (Although I do find it interesting that there is a preponderance of lefties in these two teams.)
In the business world, I might do something similar by placing product categories on rows and customer geography (country or state) on columns.
Stacia also gives her explanation of descriptive analytics, so check that out too.
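To make the cross tab idea concrete, here’s a small pandas sketch with an invented roster standing in for the data set described above (teams, positions, and handedness values are all made up):

```python
import pandas as pd

# A toy roster; every value here is invented for illustration.
roster = pd.DataFrame({
    "team":       ["Knights", "Knights", "Knights", "Sharks", "Sharks", "Sharks"],
    "position":   ["Forward", "Defense", "Goalie",  "Forward", "Defense", "Defense"],
    "handedness": ["L", "L", "R", "L", "R", "L"],
})

# Counts for each combination: one row per team/position pair,
# one column per handedness.
matrix = pd.crosstab([roster["team"], roster["position"]], roster["handedness"])
print(matrix)
```

The same call works for the business example: put the product category column on the index and the customer geography column on the columns.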
No installation, no maintenance
As with any PaaS solution, Azure Notebooks makes it far quicker and easier to get up and running, as there’s no download or installation required. Microsoft handles all the maintenance for you too!
I’m working on a fairly big project using Azure Notebooks. Getting 1 GB of space is very helpful, as I can include all of my data, images, etc. across a fairly large number of notebooks. The big downside is that the server running these notebooks is pretty slow – even for a fairly simple ARIMA model, I had it sitting there for 10 minutes at 100% CPU. So don’t expect to run a heavy workload against it.
Say you’ve got a table with millions or billions of rows, and you need to delete some rows. Deleting ALL of them is fast and easy – just do TRUNCATE TABLE – but things get much harder when you need to delete a small percentage of them, say 5%.
It’s especially painful if you need to do regular archiving jobs, like deleting the oldest 30 days of data from a table with 10 years of data in it.
The trick is making a view that contains the top, say, 1,000 rows that you want to delete:
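Something along these lines – sketched here in Python with pyodbc, using hypothetical table, column, and view names (the original demo is straight T-SQL, so treat this as illustrative):

```python
import pyodbc

# Hypothetical connection string and object names, for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=.;DATABASE=Sandbox;Trusted_Connection=yes;"
)
conn.autocommit = True  # commit each batch so the transaction log can clear
cursor = conn.cursor()

# One-time setup: a view over only the 1,000 oldest rows.
cursor.execute("""
CREATE OR ALTER VIEW dbo.Events_ToDelete AS
SELECT TOP (1000) *
FROM dbo.Events
ORDER BY EventDate;
""")

# Delete through the view in small batches until the 30-day
# archive window is empty; each pass touches at most 1,000 rows.
while True:
    cursor.execute(
        "DELETE FROM dbo.Events_ToDelete "
        "WHERE EventDate < DATEADD(DAY, -30, GETDATE());"
    )
    if cursor.rowcount == 0:
        break
```

Because each DELETE hits at most 1,000 rows, the locks stay small and the transaction log doesn’t balloon the way a single giant DELETE would.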
Read on for a demo.
SQL Search is doing exactly what it’s designed to do here: it’s finding every object that matches that string, HumanResources. It’s unfortunate that it also happens to be the name of my linked server and a schema in my database, but such is life, right? It’s returning every object it hits a match on, which includes a bunch of views, plus one of those views actually contains my linked server reference.
I love SQL Search
Let me be clear: I’m not here to gang up on the fine folks at Redgate. This tool is beautiful and I love it. How else could we quickly search for objects in our databases? The alternative would be either querying system views for object definitions or using cursors to call sp_helptext over and over, and then trying to do pattern matching – which is essentially what SQL Search does anyway.
There’s no easy way to sort this wheat from chaff, is there? This might be a starting point: let’s narrow down the search to the objects we might need to look at. Then, we’ll manually script each one out, one at a time. That sounds an awful lot like a manual process. “If only there were a way to automate this checking,” he asked, sarcastically.
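For a rough feel of that manual starting point, here’s a Python sketch that queries sys.sql_modules for the string (the connection string, database name, and search term are placeholders):

```python
import pyodbc

search_string = "HumanResources"  # placeholder search term

# Placeholder connection string for illustration.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=.;DATABASE=MyDatabase;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# sys.sql_modules stores the full definition of every view, procedure,
# function, and trigger, so one query replaces repeated sp_helptext calls.
cursor.execute(
    """
    SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
           OBJECT_NAME(m.object_id)        AS object_name
    FROM sys.sql_modules AS m
    WHERE m.definition LIKE '%' + ? + '%'
    ORDER BY schema_name, object_name;
    """,
    search_string,
)
for schema_name, object_name in cursor.fetchall():
    print(f"{schema_name}.{object_name}")
```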
Click through for a very interesting cmdlet.
Quick and easy backup for VisualSVN. Wraps svnadmin.exe and performs a hotcopy of all repositories in the $source_path, dumping them to the $backup_path.
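As a rough approximation of the same idea in Python (the actual script is PowerShell, and the paths here are made up):

```python
import subprocess
from pathlib import Path

# Made-up stand-ins for the script's $source_path and $backup_path.
source_path = Path(r"C:\Repositories")
backup_path = Path(r"D:\Backups\SVN")

for repo in source_path.iterdir():
    # A Subversion repository root always contains a "format" file.
    if (repo / "format").exists():
        dest = backup_path / repo.name
        # hotcopy safely copies a live repository, hooks and config included;
        # the destination must be empty or nonexistent.
        subprocess.run(["svnadmin", "hotcopy", str(repo), str(dest)], check=True)
```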
Read on for the script.
The behavior of request-reply is consistent even if you were to create, say, three partitions of the request topic and set a concurrency of three on the consumer factory. The replies from all three consumers still go to the single reply topic. The container at the listening end is able to do the heavy lifting of matching the correlation IDs.
Kafka’s real advantage still comes from distributed, asynchronous processing, but if you have a use case where you absolutely need synchronous processing, you can do that in Kafka as well.
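To make the correlation-ID matching concrete, here’s a toy Python sketch using kafka-python (the topic names are invented, and the manual matching loop at the bottom is the work that Spring’s listener container does for you):

```python
import uuid
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
consumer = KafkaConsumer(
    "replies",                       # the single reply topic
    bootstrap_servers="localhost:9092",
    group_id="requester",
    auto_offset_reset="earliest",
)

# Send the request with a unique correlation ID in the headers.
correlation_id = uuid.uuid4().bytes
producer.send(
    "requests",
    value=b"do-the-thing",
    headers=[("correlationId", correlation_id)],
)
producer.flush()

# Block until a reply carrying our correlation ID shows up, no matter
# which of the partitioned consumers produced it.
for message in consumer:
    headers = dict(message.headers or [])
    if headers.get("correlationId") == correlation_id:
        print("reply:", message.value)
        break
```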