SigmaWay Blog

SigmaWay Blog aggregates original and third-party content for site users. It covers Process Improvement, Lean Six Sigma, Analytics, Market Intelligence, Training, IT Services, and the industries that SigmaWay serves.

Random forests: a collection of Decision trees!

In the literal sense, a forest is an area full of trees. Likewise, in the technical sense, a Random Forest is essentially a collection of Decision Trees. Both are supervised learning algorithms used for classification, so which one is better to use?

A Decision Tree is built on the entire data set, using all the features/variables, while a Random Forest randomly (as the name suggests) selects observations/rows and specific features/variables to build several decision trees and then aggregates their results. For classification, each tree “votes” for a class, and the class receiving the most votes is the “winner”, or the predicted class. For regression, the trees' outputs are averaged instead.
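The sampling and voting steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full random forest: the helper names (`bootstrap_rows`, `random_features`, `majority_vote`) and the example votes are made up for this sketch, and no trees are actually trained. It also assumes one feature subset per tree, as the description above does; many implementations draw a fresh feature subset at every split instead.

```python
import random
from collections import Counter

def bootstrap_rows(n_rows, rng):
    """Draw n_rows row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_features(n_features, k, rng):
    """Pick k distinct feature indices for one tree."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(votes):
    """Return the class label that received the most votes."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Each tree would be trained on its own rows and feature subset.
rows = bootstrap_rows(n_rows=10, rng=rng)
features = random_features(n_features=5, k=2, rng=rng)

# Hypothetical predictions from five trees for one observation.
tree_votes = ["spam", "ham", "spam", "spam", "ham"]
winner = majority_vote(tree_votes)
print(winner)
```

Here three of the five hypothetical trees vote "spam", so that is the predicted class.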

A Decision Tree is comparatively easy to interpret and visualize, works well on large datasets, and can handle categorical as well as numerical data. However, it is typically grown by a greedy algorithm that makes the locally best choice at each node, which does not guarantee the best overall tree, and decision trees are also vulnerable to overfitting.

Random Forests come to our rescue in such situations. Because each tree is built on a random sample and the trees' results are aggregated, Random Forests are more robust than a single Decision Tree and are generally the stronger modelling technique.
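A small simulation can show why aggregating makes the result more stable. In this sketch each "tree" is only a stand-in: a prediction that is right on average but noisy, as an overfit tree might be. The true value, noise level and tree count are arbitrary assumptions, and the errors are assumed to be independent. Real trees in a forest are correlated, so the actual reduction in variance is smaller than this idealised case suggests.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable
TRUE_VALUE = 10.0        # the quantity every "tree" tries to predict

def noisy_tree_prediction():
    """Stand-in for one overfit tree: unbiased but noisy."""
    return TRUE_VALUE + rng.gauss(0, 2.0)

def forest_prediction(n_trees):
    """Average the predictions of n_trees independent stand-in trees."""
    return statistics.mean([noisy_tree_prediction() for _ in range(n_trees)])

# Repeat the experiment many times to estimate the spread of each approach.
single = [noisy_tree_prediction() for _ in range(2000)]
forest = [forest_prediction(25) for _ in range(2000)]

single_var = statistics.variance(single)
forest_var = statistics.variance(forest)
print(f"single tree variance: {single_var:.2f}")
print(f"25-tree average variance: {forest_var:.2f}")
```

Both approaches are centred on the same value, but the averaged prediction varies far less from run to run.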

Read more at: https://www.analyticsvidhya.com/blog/2020/05/decision-tree-vs-random-forest-algorithm/


Random Forest: An Alternative to Linear Regression

Random forest is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes predicted by the individual trees. It is called random because there are two levels of randomness: at the row level and at the column level. Although it is a convenient way to deal with large datasets, it has a few disadvantages. On smaller datasets, linear regression is often the better method. The relationship between the response and the independent variables is also hard to interpret from the fitted model. In addition, the process can be cumbersome, and the model cannot extrapolate to values outside the range of the training data. Even so, random forest is advantageous because it can decrease the variance of the predictions while keeping the bias roughly constant, and it lets us drop most assumptions, such as linearity, about the data.

Read more at: http://www.datasciencecentral.com/profiles/blogs/random-forests-explained-intuitively
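The point about values outside the dataset can be illustrated with a toy comparison. The sketch below is an assumption-laden stand-in, not a real tree learner: `piecewise_constant_predict` mimics a tree by averaging the training targets in fixed-width bins, and `fit_line` is an ordinary least-squares line. The data (y = 2x for x from 0 to 9) and the bin width are made up for illustration.

```python
xs = list(range(10))           # training inputs 0..9
ys = [2.0 * x for x in xs]     # training targets follow y = 2x

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def piecewise_constant_predict(x, xs, ys, bin_width=2):
    """Tree-like stand-in: predict the mean target of the bin x falls in.

    Queries outside the training range land in the outermost bin,
    just as they would land in a tree's outermost leaf.
    """
    lo, hi = min(xs), max(xs)
    clamped = min(max(x, lo), hi)
    leaf = int((clamped - lo) // bin_width)
    members = [y for xi, y in zip(xs, ys)
               if int((xi - lo) // bin_width) == leaf]
    return sum(members) / len(members)

slope, intercept = fit_line(xs, ys)
line_pred = slope * 20 + intercept                   # follows the trend
tree_pred = piecewise_constant_predict(20, xs, ys)   # stuck in the last bin
print(line_pred, tree_pred)
```

For the query x = 20, the line continues the trend, while the tree-like predictor can never return more than the largest target average it saw in training.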

 
