Incube-8 – The Business Incubator
http://incube-8.org
The Business Incubator - Where business goes to grow

NAS
http://incube-8.org/nas/

NAS 8TB
https://81.101.46.13:37187/

NAS 5TB
http://81.101.46.13:46062/

Pi External Address

Optics
http://incube-8.org/optics/

Smart Home
http://incube-8.org/smart-home/

Smart Home Landlord
Temperature in each room?
When was the door last opened?
Movement sensor corridor?

Thermostat

Is there an electricity or gas API?

Tutorials
http://incube-8.org/tutorials/

List of the Best Free Online Tutorials

We have focused on collecting the best of the free tutorial content available online.

Sharing knowledge is the greatest gift; start giving by sharing our link.

Khan Academy, created in 2005 by Salman Khan, offers a great set of online tools that help educate students. The courses are based around short lessons in the form of YouTube videos. Its website also includes supplementary practice exercises and materials for educators.



Boosting the accuracy of your Machine Learning models
http://incube-8.org/boosting-the-accuracy-of-your-machine-learning-models/

Tired of getting low accuracy from your machine learning models? Boosting is here to help. Boosting is a popular machine learning technique that increases the accuracy of your model, much like racers use a nitrous boost to increase the speed of their car.

Boosting uses a base machine learning algorithm to fit the data. This can be any algorithm, but Decision Trees are the most widely used; for an answer to why, just keep reading. The boosting algorithm is also easily explained using Decision Trees, and they will be the focus of this article. Boosting builds upon other approaches that improve the accuracy of Decision Trees. For an introduction to tree-based methods, read my other article here.

Bootstrapping

I would like to start by explaining an important foundational technique called Bootstrapping. Assume that we need to learn a decision tree to predict the price of a house based on 100 inputs. The prediction accuracy of such a decision tree would be low, given the problem of variance it suffers from: if we split the training data into two parts at random and fit a decision tree to both halves, the results could be quite different. What we really want is a result that has low variance when applied repeatedly to distinct data sets.

We can improve the prediction accuracy of Decision Trees using Bootstrapping

Create many (e.g. 100) random sub-samples of our dataset with replacement (meaning we can select the same value multiple times).

Learn (train) a decision tree on each sample.

Given a new data point, calculate the prediction from each tree.

Calculate the average of all the collected predictions (also called bootstrap estimates) and use that as our estimated prediction for the data.

The procedure can be used in a similar way for classification trees. For example, if we had 5 decision trees that made the following class predictions for an input sample: blue, blue, red, blue and red, we would take the most frequent class and predict blue.

In this approach, trees are grown deep and are not pruned. Thus each individual tree has high variance, but low bias. Averaging these trees reduces the variance dramatically.

Bootstrapping is a powerful statistical method for estimating a quantity from a data sample. The quantity can be a descriptive statistic such as a mean or a standard deviation. The application of the bootstrapping procedure to a high-variance machine learning algorithm, typically decision trees as shown in the above example, is known as Bagging (or bootstrap aggregating).
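The steps above can be sketched in a few lines of Python; the `bootstrap_sample` and `bagged_predict` helpers are names made up for illustration, not from any library:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    # Draw a sample of the same size as the dataset, with replacement.
    return [rng.choice(data) for _ in data]

def bagged_predict(votes):
    # For classification, combine the trees' predictions by majority vote.
    return Counter(votes).most_common(1)[0][0]

# The article's example: five trees predict blue, blue, red, blue, red.
print(bagged_predict(["blue", "blue", "red", "blue", "red"]))  # blue
```

For regression, the majority vote is simply replaced with the mean of the trees' predictions.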


Error Estimation

An easy way of estimating the test error of a bagged model, without the need for cross-validation, is Out-of-Bag Error Estimation. The observations not used to fit a given bagged tree are referred to as the out-of-bag (OOB) observations. We can simply predict the response for the ith observation using each of the trees in which that observation was OOB. We average those predicted responses, or take a majority vote, depending on whether the response is quantitative or qualitative. An overall OOB MSE (mean squared error) or classification error rate can then be computed. This is an acceptable estimate of the test error rate because the predictions are based only on trees that were not fit using that observation.
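As a rough sketch of which observations are OOB for one tree (the helper name `oob_indices` is made up for illustration), they are simply the indices never drawn in that tree's bootstrap sample:

```python
import random

def oob_indices(n, rng):
    # Indices of observations NOT drawn in one bootstrap sample of size n.
    drawn = {rng.randrange(n) for _ in range(n)}
    return [i for i in range(n) if i not in drawn]

oob = oob_indices(100, random.Random(0))
# On average about 1/e, i.e. roughly 37%, of the observations are OOB
# for any one tree, so each observation gets plenty of OOB predictions.
print(len(oob))
```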

Random Forests

Decision trees aspire to minimize the cost, which means they make use of the strongest predictors/classifiers for splitting the branches. So most of the trees grown from bootstrapped samples would use the same strong predictor in their splits. This correlates the trees and keeps the variance high.

We can improve the prediction accuracy of Bagged Trees using Random Forests

While splitting the branches of any tree, a random sample of m predictors is chosen as split candidates from the full set of p predictors. The split is then only allowed to use one of those m predictors. A fresh sample of m predictors is taken at each split. You can try different values of m and tune it using cross-validation.

For classification a good default is: m = sqrt(p)

For regression a good default is: m = p/3

Thus, on average, (p − m) / p of the splits will not even consider the strong predictor. This is known as decorrelating the trees, as it fixes the issue of every tree using the same strong predictor.

If m = p, then random forests are equivalent to bagging.
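A minimal sketch of this per-split feature sampling, using the default values of m given above (the function name is made up for illustration):

```python
import math
import random

def split_candidates(p, task, rng):
    # Randomly choose m of the p predictors to consider at this split.
    m = round(math.sqrt(p)) if task == "classification" else max(1, p // 3)
    return rng.sample(range(p), m)

# With p = 9 features, classification considers m = sqrt(9) = 3 per split.
print(split_candidates(9, "classification", random.Random(42)))
```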

Feature Importance

One problem with fully grown trees is that we cannot easily interpret the results, and it is no longer clear which variables are important to the relationship. Calculating the drop in the error function for a variable at each split point gives us an idea of feature importance: we record the total amount by which the error is decreased due to splits over a given predictor, averaged over all bagged trees. A large value then indicates an important predictor. In regression problems this may be the drop in the residual sum of squares, and in classification it might be the Gini score.
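The bookkeeping behind this is just an accumulation per feature; a sketch with made-up split records (the feature names and error decreases are hypothetical):

```python
from collections import defaultdict

def feature_importance(splits):
    # splits: (feature, error_decrease) pairs recorded across all trees.
    totals = defaultdict(float)
    for feature, decrease in splits:
        totals[feature] += decrease
    return dict(totals)

# Hypothetical splits from two small trees in a bagged ensemble:
# "sex" accumulates the largest total drop, so it ranks as most important.
print(feature_importance([("sex", 0.30), ("age", 0.10), ("sex", 0.25)]))
```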

Boosting

The prediction accuracy of decision trees can be further improved by using Boosting algorithms.

The basic idea behind boosting is converting many weak learners to form a single strong learner. What do we mean by weak learners?

A weak learner is a learner that will always do better than chance when it tries to label the data, no matter what the distribution over the training data is. Doing better than chance means the error rate is always less than 1/2. This means the learning algorithm will always learn something, but will not be highly accurate, i.e., it is weak when it comes to learning the relationships between inputs and target. It also means a rule formed using a single predictor/classifier is not powerful individually.

We start finding weak learners in the dataset by making some distributions and forming small decision trees from them. The size of each tree is tuned using the number of splits it has. Often one split works well, where each tree consists of a single split. Such trees are known as Decision Stumps.

Another parameter boosting takes is the number of iterations, or the number of trees in this case. Additionally, it assigns weights to the inputs based on whether they were correctly predicted/classified or not. Let's look at the algorithm.

1. First, the inputs are initialized with equal weights. The first base learner, generally a decision stump, fits a subsample of the data and makes predictions for all of it.

2. Then we do the following until the maximum number of trees is reached:

Update the weights of the inputs based on the previous run, giving higher weights to the wrongly predicted/classified inputs.

Make another rule (a decision stump in this case) and fit it to a subsample of the data. Note that this time the rule is formed keeping the wrongly classified inputs (the ones with higher weight) in mind.

Predict/classify all inputs using this rule.

3. After the iterations have been completed, we combine the weak rules to form a single strong rule, which is then used as our model.

The above algorithm is better explained with the help of a diagram. Let's assume we have 10 input observations that we want to classify as “+” or “-”.

The boosting algorithm starts with Box 1 as shown above. It assigns equal weights (denoted by the size of the signs) to all inputs and predicts “+” for inputs in the blue region and “-” for inputs in the reddish region, using decision stump D1.

In the next iteration, Box 2, you can see that the weights of the wrongly classified plus signs are greater than those of the other inputs. So a decision stump D2 is chosen such that these observations are now classified correctly.

In the final iteration, Box 3, it has 3 misclassified negatives from the previous run. So a decision stump D3 is chosen to correct that.

Finally, the output strong learner has a strong rule made by combining the individual weak decision stumps. You can see how we boosted the classification power of our model.

In a regression setting, the prediction error (usually calculated using least squares) is used to adjust the weights of the inputs, and subsequent learners focus more on inputs with large error.

This type of boosting approach is known as Adaptive Boosting, or AdaBoost. As with trees, the boosting approach also minimizes a loss function; in the case of AdaBoost, it is the exponential loss function.
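A minimal sketch of one AdaBoost weight update (the function name and toy data are made up for illustration; real implementations such as scikit-learn's `AdaBoostClassifier` also handle multi-class problems and numerical edge cases like zero error):

```python
import math

def adaboost_round(weights, correct):
    # One round: compute the stump's weighted error, its vote weight alpha,
    # then up-weight the misclassified inputs and renormalize.
    err = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(alpha if not c else -alpha)
           for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha

# Four inputs with equal weights; the stump misclassifies only the last one.
weights, alpha = adaboost_round([0.25] * 4, [True, True, True, False])
print(weights)  # the misclassified input now carries half the total weight
```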

Another popular version of boosting is the Gradient Boosting algorithm. The basic concept remains the same, except that here we don't play with the weights; instead, the model is fit on the residuals (the difference between prediction and original outcome) rather than the original outcomes. AdaBoost is implemented using iteratively refined sample weights, while Gradient Boosting uses an internal regression model trained iteratively on the residuals. This means that new weak learners are formed keeping in mind the inputs that have high residuals.

In both algorithms, a tuning parameter, lambda or shrinkage, slows the process down even further by allowing more and differently shaped trees to attack the residuals. It is also known as the learning rate, as it controls the magnitude by which each tree contributes to the model. Note that boosting does not involve bootstrapping; instead, each tree is fit on a modified version of the original data. Rather than fitting a single large decision tree, which amounts to fitting the data hard and potentially overfitting, the boosting approach learns slowly.
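The residual-fitting loop with shrinkage can be sketched as follows. For simplicity, the "weak learner" here is just the mean of the residuals (a zero-split stump), and all names and numbers are made up for illustration:

```python
def boost_on_residuals(y, rounds=100, lr=0.1):
    # Gradient boosting for squared error: each round fits a weak learner
    # to the current residuals and adds a shrunken (lr-scaled) copy of its
    # prediction to the running model.
    pred = [0.0] * len(y)
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        step = sum(residuals) / len(residuals)  # weak learner: the mean
        pred = [pi + lr * step for pi in pred]
    return pred

# The predictions creep slowly toward the targets' mean of 2.0.
print(boost_on_residuals([1.0, 3.0]))
```

With a smaller learning rate, more rounds are needed; that slow learning is exactly the behavior described above.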

The algorithm has been explained here using Decision Trees, but there are other reasons boosting is mostly used with trees.

Decision trees are non-linear. Boosting with linear models simply doesn’t work well.

The weak learner needs to be consistently better than random guessing. You don’t normally need to do any parameter tuning to a decision tree to get that behavior. Training an SVM, for instance, really does need a parameter search. Since the data is re-weighted on each iteration, you likely need to do another parameter search on each iteration. So you are increasing the amount of work you have to do by a large margin.

Decision trees are reasonably fast to train. Since we are going to be building hundreds or thousands of them, that's a good property. They are also fast to classify, which again matters when hundreds or thousands of them need to run before you can output your decision.

By changing the depth you have a simple and easy control over the bias/variance trade off, knowing that boosting can reduce bias but also significantly reduces variance.

This is an extremely simplified (probably naive) explanation of boosting, but it will help you understand the very basics. A popular library for implementing this algorithm is Scikit-Learn. It has a wonderful API that can get your model up and running with just a few lines of code in Python.

Decision Tree Analysis & Machine Learning
http://incube-8.org/decision-tree-analysis-machine-learning/

Please see the full article at https://towardsdatascience.com/decision-trees-in-machine-learning-641b9c4e8052

Decision Trees in Machine Learning

Decision Tree Analysis covers a wide area of machine learning, with different variants using different methods such as classification and regression. In decision making, decision tree analysis can be used to visually and explicitly represent decision forks and decision paths; used in the correct way, it is an invaluable tool. In IS/IT it is a commonly used tool for data mining that sets out different paths for achieving a goal or forks out a strategy to reach a particular one. It is also widely used in machine learning, and in this article we will look at how it can be adapted to our projects.

How can an algorithm be represented as a tree?

For this, let's consider a very basic example that uses the Titanic data set to predict whether a passenger will survive or not. The model below uses 3 features/attributes/columns from the data set, namely sex, age and sibsp (the number of siblings or spouses aboard).

A decision tree is drawn upside down with its root at the top. In the image on the left, the bold text in black represents a condition/internal node, based on which the tree splits into branches/ edges. The end of the branch that doesn’t split anymore is the decision/leaf, in this case, whether the passenger died or survived, represented as red and green text respectively.

A real dataset will have many more features, and this would just be one branch in a much bigger tree, but you can't ignore the simplicity of this algorithm. The feature importance is clear, and the relations can be viewed easily. This methodology is known as learning a decision tree from data, and the tree above is called a Classification tree, as the target is to classify a passenger as survived or died. Regression trees are represented in the same manner; they just predict continuous values, like the price of a house. In general, Decision Tree algorithms are referred to as CART, or Classification and Regression Trees.
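Such a tree is literally just nested if/else conditions. A hand-written sketch mirroring the Titanic example (the split values here are illustrative, not learned from the real data):

```python
def predict(sex, age, sibsp):
    # Root split: sex, the strongest predictor in this example.
    if sex == "male":
        # Internal nodes: young boys with few siblings/spouses tend to survive.
        if age > 9.5:
            return "died"
        return "died" if sibsp > 2 else "survived"
    return "survived"

print(predict("female", 30, 0))  # survived
print(predict("male", 40, 0))    # died
```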

So, what is actually going on in the background? Growing a tree involves deciding which features to choose and what conditions to use for splitting, along with knowing when to stop. As a tree generally grows arbitrarily, you will need to trim it down for it to look beautiful. Let's start with a common technique used for splitting.

Recursive Binary Splitting

In this procedure all the features are considered and different split points are tried and tested using a cost function. The split with the best cost (or lowest cost) is selected.

Consider the earlier example of the tree learned from the Titanic dataset. In the first split, at the root, all attributes/features are considered and the training data is divided into groups based on the split. We have 3 features, so we will have 3 candidate splits. Now we calculate how much accuracy each split will cost us, using a cost function. The split that costs least is chosen, which in our example is the sex of the passenger. This algorithm is recursive in nature, as the groups formed can be sub-divided using the same strategy. Because of this procedure, it is also known as a greedy algorithm: we have an excessive desire to lower the cost. This makes the root node the best predictor/classifier.

Cost of a split

Let's take a closer look at the cost functions used for classification and regression. In both cases the cost functions try to find the most homogeneous branches, i.e. branches whose groups have similar responses. This makes sense: we can then be more sure that a test input will follow a certain path.

Regression: sum(y − prediction)²

Let's say we are predicting the price of houses. The decision tree starts splitting by considering each feature in the training data. The mean of the responses of the training inputs in a particular group is taken as the prediction for that group. The above function is applied to all data points, and the cost is calculated for all candidate splits. Again, the split with the lowest cost is chosen. Another cost function involves reduction of standard deviation; more about it can be found here.
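A sketch of this cost for one candidate split (the function name and house prices are made up for illustration):

```python
def regression_cost(groups):
    # Sum of squared errors when each group predicts its own mean response.
    cost = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        cost += sum((y - mean) ** 2 for y in group)
    return cost

# One candidate split of six house prices into two groups: each group is
# tight around its own mean, so this split scores a low cost.
print(regression_cost([[100, 110, 120], [300, 310, 290]]))  # 400.0
```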

Classification: G = sum(pk * (1 − pk))

A Gini score gives an idea of how good a split is by how mixed the response classes are in the groups it creates. Here, pk is the proportion of inputs of class k present in a particular group. Perfect class purity occurs when a group contains only inputs from the same class, in which case each pk is either 1 or 0 and G = 0, whereas a group with a 50-50 split of classes has the worst purity: for binary classification, pk = 0.5 and G = 0.5.
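The score is straightforward to compute directly from the formula above (the function name is made up for illustration):

```python
def gini(group, classes):
    # G = sum over classes of pk * (1 - pk), where pk is the proportion
    # of the group's members that belong to class k.
    n = len(group)
    return sum((group.count(c) / n) * (1 - group.count(c) / n)
               for c in classes)

print(gini(["blue"] * 4, ["blue", "red"]))                    # 0.0, pure
print(gini(["blue", "blue", "red", "red"], ["blue", "red"]))  # 0.5, worst
```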

When to stop splitting?

You might ask: when do we stop growing the tree? A problem usually has a large set of features, which results in a large number of splits, which in turn gives a huge tree. Such trees are complex and can lead to overfitting. So we need to know when to stop. One way is to set a minimum number of training inputs per leaf. For example, we could require a minimum of 10 passengers to reach a decision (died or survived), and ignore any leaf with fewer than 10 passengers. Another way is to set the maximum depth of your model; maximum depth refers to the length of the longest path from the root to a leaf.
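Both rules reduce to a simple check inside the recursive growing procedure; a sketch with hypothetical names and default values:

```python
def should_stop(depth, n_samples, max_depth=3, min_samples=10):
    # Stop splitting when the tree is deep enough or the node is too small.
    return depth >= max_depth or n_samples < min_samples

print(should_stop(depth=3, n_samples=50))  # True: maximum depth reached
print(should_stop(depth=1, n_samples=6))   # True: fewer than 10 samples
print(should_stop(depth=1, n_samples=50))  # False: keep splitting
```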

Pruning

The performance of a tree can be further increased by pruning: removing the branches that make use of features with low importance. This reduces the complexity of the tree, and thus increases its predictive power by reducing overfitting.

Pruning can start at either the root or the leaves. The simplest method starts at the leaves and replaces each node with its most popular class, keeping the change if it doesn't deteriorate accuracy; this is called reduced error pruning. More sophisticated methods exist, such as cost complexity pruning, where a learning parameter (alpha) is used to weigh whether nodes can be removed based on the size of the sub-tree. This is also known as weakest link pruning.
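The cost complexity idea can be sketched as a trade-off between training error and tree size; the numbers and the alpha value here are made up for illustration:

```python
def cost_complexity(error, n_leaves, alpha):
    # Penalized cost: training error plus alpha per leaf in the sub-tree.
    return error + alpha * n_leaves

keep = cost_complexity(error=0.10, n_leaves=5, alpha=0.02)
prune = cost_complexity(error=0.14, n_leaves=1, alpha=0.02)
# Pruning raises the raw error but lowers the penalized cost, so at this
# alpha the five-leaf sub-tree is collapsed into a single leaf.
print(prune < keep)  # True
```

Larger alpha values penalize leaves more heavily and therefore prune more aggressively.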

Advantages of CART

Simple to understand, interpret, visualize.

Decision trees implicitly perform variable screening or feature selection.

Can handle both numerical and categorical data. Can also handle multi-output problems.

Decision trees require relatively little effort from users for data preparation.

Nonlinear relationships between parameters do not affect tree performance.

Disadvantages of CART

Decision-tree learners can create over-complex trees that do not generalize the data well. This is called overfitting.

Decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This is called variance, which needs to be lowered by methods like bagging and boosting.

Greedy algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees, where the features and samples are randomly sampled with replacement.

Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the data set prior to fitting the decision tree.

These are the basics to get you up to speed with decision tree learning. An improvement over plain decision tree learning is made using the technique of boosting. A popular library for implementing these algorithms is Scikit-Learn. It has a wonderful API that can get your model up and running with just a few lines of code in Python.