Monday, July 15, 2019

Recommender Systems

What is a Recommender System?

On the Internet, the number of choices is overwhelming; there is a need to filter, prioritize and efficiently deliver relevant information in order to mitigate the problem of information overload, which has become a real problem for many Internet users. Recommender systems address this by searching through large volumes of dynamically generated information to provide users with personalized content and services.

A recommender system is a technology deployed in an environment where items (products, movies, events, articles) are to be recommended to users (customers, visitors, app users, readers), or vice versa. In real-world scenarios there are many items and many users in the environment, which makes the problem hard and expensive to solve. Consider, for example, a shop: a good merchant knows the personal preferences of her/his customers, and her/his high-quality recommendations keep customers satisfied and increase profits. In online marketing and shopping, personal recommendations can be generated by an artificial merchant: the recommender system.
[Figure: Taxonomy of knowledge sources in recommendation]

Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice. 
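To make the idea concrete, here is a minimal user-based collaborative-filtering sketch in Python. Everything in it (the users, items and ratings) is made up purely for illustration; it is a toy, not a production algorithm.

import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = not rated.
# Users, items and ratings below are invented for illustration only.
ratings = np.array([
    [5, 4, 0, 1],   # Alice
    [4, 5, 1, 0],   # Bob
    [1, 0, 5, 4],   # Carol
], dtype=float)
items = ["Item A", "Item B", "Item C", "Item D"]

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

def recommend(user_idx, ratings, top_n=1):
    """Score unrated items by the similarity-weighted ratings of other users."""
    sims = np.array([cosine(ratings[user_idx], r) for r in ratings])
    sims[user_idx] = 0.0                       # ignore self-similarity
    scores = sims @ ratings                    # weighted sum of everyone's ratings
    scores[ratings[user_idx] > 0] = -np.inf    # don't re-recommend rated items
    return [items[i] for i in np.argsort(scores)[::-1][:top_n]]

print(recommend(0, ratings))  # a recommendation for Alice

Real systems replace this tiny matrix with millions of users and items, which is exactly what makes the problem hard and expensive, as noted above.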



We will look at the various types of recommender systems in a future blog post.

Bias-Variance trade-off in Machine Learning

The bias-variance trade-off is an important aspect of data science projects based on machine learning. Learning algorithms use mathematical or statistical models whose error can be split into two main components: reducible and irreducible error. Irreducible error is associated with the natural variability in a system. Reducible error, on the other hand, as the name suggests, can and should be minimized further to maximize accuracy.
Reducible error itself comes from two sources: squared bias and variance. Our goal is to reduce both bias and variance as much as possible in order to obtain an accurate model.
Let’s look at what bias and variance are.
Bias:
In simple terms, a supervised machine learning model has high bias when it predicts terribly (far from reality, even on the training data itself) and its predictions don’t change much from dataset to dataset. High bias is related to underfitting.
Example: a linear regression model has high bias when trying to model a non-linear relationship, because a straight line simply cannot fit a non-linear relationship well.
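We can see this on synthetic data (a sketch using NumPy and scikit-learn; the quadratic target is made up for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 100)   # non-linear (quadratic) target

model = LinearRegression().fit(X, y)
print("Training R^2:", model.score(X, y))  # poor even on training data: high bias

The line fits badly no matter how much training data we add, which is the signature of high bias.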
Variance:
Usually, we divide our dataset into three parts (a common way to do this is sketched below):
  1. Training set: used to fit the model’s parameters
  2. Test set: held out for the final, unbiased evaluation
  3. Validation set: used to tune the model and catch overfitting during development
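As an illustration, one way to make this split with scikit-learn (the 60/20/20 ratios below are just a common convention, not a rule, and the data is a placeholder):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # placeholder features
y = np.arange(100)                  # placeholder targets

# First carve out the test set, then split the remainder into train/validation.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 60 20 20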
If we build complex models that fit the training data well but cannot generalise the pattern, the result is overfitting: such models don’t fit well on data outside training (i.e. the validation/test sets). In simple terms, they might predict close to reality on average, but their predictions change much more with small changes in the input.
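To see high variance concretely, here is a sketch (again with made-up synthetic data) that fits a high-degree polynomial to two training sets that differ only in their noise, and compares the predictions at a single point:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.linspace(0, 1, 15).reshape(-1, 1)   # fixed input grid

def fit_and_predict(seed, degree=12):
    """Fit a high-degree polynomial to one noisy sample of sin(2*pi*x)
    and predict at x = 0.5. Only the noise differs between seeds."""
    rng = np.random.default_rng(seed)
    y = np.sin(2 * np.pi * X.ravel()) + rng.normal(0, 0.2, len(X))
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    return model.predict([[0.5]])[0]

# Two training sets drawn from the same curve, differing only in noise:
print(fit_and_predict(seed=1), fit_and_predict(seed=2))  # can be far apart

A wildly different prediction from a small perturbation of the training data is exactly what high variance means.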
[Figure: Bias and variance; under-fitting and over-fitting models]

What exactly is the Bias-Variance trade-off, then?

So, if we choose a very complicated algorithm, we run the risk of a high-variance problem, while if we use a simple one, we face a high-bias problem. It’s a double-edged sword. The total error of any supervised machine learning prediction is the sum of the (squared) bias term of your model, the variance, and the irreducible error. The irreducible error is also known as the Bayes error; it’s mostly noise, which can’t be reduced by a better algorithm but only by better data cleaning.
Total error = Bias² + Variance + Irreducible error
If we plot these components against model complexity, we see that there is an optimum model complexity at which the total error is minimal.
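This decomposition can also be estimated empirically. The sketch below (with a made-up quadratic ground truth) retrains a linear model on many independent training sets and measures, at one test point, the squared gap between the average prediction and the truth (bias²) and the spread of the predictions (variance):

import numpy as np
from sklearn.linear_model import LinearRegression

def true_f(x):
    return x ** 2                         # made-up ground truth

rng = np.random.default_rng(0)
x_test = 1.5
preds = []
for _ in range(500):                      # many independent training sets
    X = rng.uniform(-3, 3, (50, 1))
    y = true_f(X.ravel()) + rng.normal(0, 0.5, 50)  # noise with sd 0.5
    model = LinearRegression().fit(X, y)
    preds.append(model.predict([[x_test]])[0])

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x_test)) ** 2
variance = preds.var()
print(f"bias^2 ~ {bias_sq:.3f}, variance ~ {variance:.3f}, irreducible ~ {0.5**2}")

Because a straight line cannot capture the quadratic truth, the bias² term dominates here; a more complex model would shrink it, at the cost of higher variance.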
[Figure: Bias-variance trade-off]

Avoiding overfitting/underfitting:

We should always monitor validation accuracy along with training accuracy. If training accuracy is high but validation accuracy is low, it indicates overfitting. In this case, we should:
  • Get more training data
  • Increase regularization
If training accuracy itself is low, the model is underfitting, and we should instead:
  • Decrease regularization
  • Train for longer
  • Train a more powerful model
A minimal sketch of this kind of train/validation monitoring follows below.
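In the sketch, the dataset, model and accuracy thresholds are placeholders chosen purely for illustration:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# An unpruned tree will typically memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
val_acc = model.score(X_val, y_val)
print(f"train={train_acc:.2f} val={val_acc:.2f}")

# The 0.1 gap and 0.8 floor are illustrative thresholds, not rules.
if train_acc - val_acc > 0.1:
    print("Large gap: likely overfitting -> more data / more regularization")
elif train_acc < 0.8:
    print("Low training accuracy: likely underfitting -> less regularization / stronger model")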

Conclusion

I’ve given a brief overview of the bias-variance trade-off and have avoided the mathematical calculations involved. A link to an article with the detailed mathematical treatment of bias and variance is provided below.
