RF is based on decision trees. In machine learning decision trees are a technique for creating predictive models. They are called decision trees because the prediction follows several branches of “if… then…” decision splits – similar to the branches of a tree. If we imagine that we start with a sample, which we want to predict a class for, we would start at the bottom of a tree and travel up the trunk until we come to the first split-off branch. This split can be thought of as a feature in machine learning, let’s say it would be “age”; we would now make a decision about which branch to follow: “if our sample has an age bigger than 30, continue along the left branch, else continue along the right branch”. This we would do until we come to the next branch and repeat the same decision process until there are no more branches before us. This endpoint is called a leaf and in decision trees would represent the final result: a predicted class or value.
At each branch, the feature thresholds that best split the (remaining) samples locally is found. The most common metrics for defining the “best split” are gini impurity and information gain for classification tasks and variance reduction for regression.
Click through for more info and if you understand German, the video is good as well.