Friday, July 26, 2019
Sunday, July 21, 2019
Wednesday, July 17, 2019
Monday, July 15, 2019
Recommender Systems
What is a Recommender System?
On the Internet, the number of choices is overwhelming. There is a need to filter, prioritize and efficiently deliver relevant information in order to mitigate information overload, which has become a problem for many Internet users. Recommender systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services.
A recommender system is a technology deployed in environments where items (products, movies, events, articles) are to be recommended to users (customers, visitors, app users, readers), or the opposite. In real-world scenarios there are many items and many users, which makes the problem hard and expensive to solve. Take a shop as an example: a good merchant knows the personal preferences of her or his customers. Her or his high-quality recommendations keep customers satisfied and increase profits. In online marketing and shopping, personal recommendations can be generated by an artificial merchant: "the recommender system".
Figure: Taxonomy of knowledge sources in recommendation
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice.
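To make the idea of an "artificial merchant" concrete, here is a minimal sketch of one possible approach: scoring unseen items by the ratings of similar users (user-based collaborative filtering with cosine similarity). This technique is only one of many and is chosen here purely for illustration; the users, items, ratings and function names are all made up.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}); names and scores are invented.
ratings = {
    "alice": {"book_a": 5, "book_b": 4, "book_c": 1},
    "bob":   {"book_a": 4, "book_b": 5, "book_d": 5},
    "carol": {"book_c": 5, "book_e": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, top_n=2):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice", ratings))  # ['book_d', 'book_e']
```

Alice's tastes overlap most with Bob's, so the item Bob rated highly that Alice has not seen comes first.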
We will look at the various types of recommender systems in a future post.
Bias-Variance trade-off in Machine Learning
The bias-variance trade-off is an important aspect of data science projects based on machine learning. Learning algorithms use mathematical or statistical models whose “error” can be split into two main components: reducible and irreducible error. Irreducible error is associated with the natural variability in a system. Reducible error, on the other hand, as the name suggests, can and should be minimized to maximize accuracy.
Reducible error has two sources: squared bias and variance. Our goal is to reduce bias and variance simultaneously, as much as possible, in order to obtain an accurate model.
Let’s look at what bias and variance are.
Bias:
In simple terms, a supervised machine learning model has high bias when its predictions are far from reality even on the training data, and those predictions don’t change much from dataset to dataset. High bias is related to underfitting.
Example: A linear regression model has high bias when it tries to model a non-linear relationship, because a straight line cannot fit a non-linear relationship well.
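As a small illustration of this example, the sketch below fits a straight line to noise-free data generated from y = x². The data points and the helper names `fit_line` and `mse` are chosen for illustration only.

```python
# Data from a purely quadratic relationship y = x^2 (no noise), chosen for illustration.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(preds, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

slope, intercept = fit_line(xs, ys)
line_mse = mse([slope * x + intercept for x in xs], ys)
print(slope, intercept, line_mse)  # 0.0 4.0 12.0
```

The best possible line is flat (slope 0) and still has a training error of 12, even though the data contain no noise at all. The error comes entirely from the model being too simple, which is what high bias means.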
Variance:
Usually, we divide our dataset into three parts:
- Training set: used to fit the model
- Test set: used for the final evaluation of the model
- Validation set: used to tune the model and compare candidate models
If we build complex models, they may fit the training data well but fail to generalise the pattern, which results in overfitting. This means they don’t fit well on data outside the training set (i.e. the validation and test datasets). In simple terms, such models might predict close to reality on average, but their predictions change much more with small changes in the training data.
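The gap between training and validation error can be seen in a small sketch. It uses k-nearest-neighbour regression as a stand-in for a "complex" model (k = 1 memorises the training set). The toy dataset (y = sin(x) plus Gaussian noise), the noise level and all function names are assumptions made for illustration.

```python
import math
import random

random.seed(0)

def sample(n):
    """Draw n noisy points from y = sin(x) + noise (an assumed toy dataset)."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0, 0.3)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN model fitted on `train`, evaluated on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

train, validation = sample(60), sample(60)
for k in (1, 10):
    print(f"k={k:>2}  train MSE={mse(train, train, k):.3f}  "
          f"validation MSE={mse(train, validation, k):.3f}")
```

With k = 1 the training error is exactly zero, because every training point is its own nearest neighbour, yet the validation error is clearly larger. That gap is the signature of overfitting.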
Figure: Bias and variance
Figure: Underfitting and overfitting models
What exactly is the bias-variance trade-off, then?
So, if we choose a very complicated algorithm, we run the risk of a high-variance problem, while if we use a simple one, we face a high-bias problem. It’s a double-edged sword. The total error of any supervised machine learning prediction is the sum of the squared bias of the model, its variance and the irreducible error. The irreducible error is also known as the Bayes error; it is mostly noise, which no algorithm can reduce, although better data cleaning may help.
Total error = Bias² + Variance + Irreducible error
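For readers who want the formula in symbols, one standard way to write it is shown below. It assumes the data follow y = f(x) + ε with zero-mean noise of variance σ², and that the expectation is taken over training sets and noise; the symbols f, f̂, ε and σ² are notation introduced here, not taken from the post.

```latex
\mathbb{E}\!\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```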
If we plot these values against model complexity, we shall see that at a certain optimum model complexity, we will have the minimum total error.
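The following sketch estimates these terms by simulation for a few levels of model complexity. It is only an illustration under stated assumptions: the true function (sin), the noise level, the use of k-nearest-neighbour regression (small k means a more complex model) and every name in the code are chosen here, not taken from the post.

```python
import math
import random

random.seed(1)
NOISE_SD = 0.3  # assumed noise level; its square is the irreducible error

def true_f(x):
    return math.sin(x)

def sample(n):
    """Draw a fresh training set of n noisy points."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, true_f(x) + random.gauss(0, NOISE_SD)) for x in xs]

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance(k, n_train=50, n_trials=200):
    """Estimate squared bias and variance, averaged over a grid of test points."""
    test_xs = [0.5 + i * 0.25 for i in range(22)]
    preds = [[] for _ in test_xs]
    for _ in range(n_trials):
        train = sample(n_train)
        for j, x in enumerate(test_xs):
            preds[j].append(knn_predict(train, x, k))
    bias_sq = var = 0.0
    for x, p in zip(test_xs, preds):
        mean_p = sum(p) / len(p)
        bias_sq += (mean_p - true_f(x)) ** 2
        var += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias_sq / len(test_xs), var / len(test_xs)

# k=1 is the most complex model; k=50 averages the whole training set.
results = {k: bias_variance(k) for k in (1, 5, 50)}
for k, (b, v) in results.items():
    total = b + v + NOISE_SD ** 2
    print(f"k={k:>2}  bias^2={b:.3f}  variance={v:.3f}  total={total:.3f}")
```

The most complex model (k = 1) should show the largest variance and the simplest one (k = 50) the largest squared bias, with the total error depending on how the two balance out.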
Figure: Bias-variance trade-off
Avoiding overfitting/underfitting:
We should always monitor validation accuracy along with training accuracy. If training accuracy is high but validation accuracy is low, it indicates overfitting. In this case, we should:
- Get more training data
- Increase regularization
If training accuracy itself is low, the model is underfitting, and we should consider:
- Decreasing regularization
- Training for longer, if training has not yet been sufficient
- Training a more powerful network
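The two checks above can be sketched as a small helper. The function name `diagnose` and both thresholds are hypothetical; sensible values depend on the task and should be chosen per project.

```python
def diagnose(train_acc, val_acc, min_train_acc=0.8, max_gap=0.1):
    """Classify a training run from its training and validation accuracy.

    The thresholds are illustrative assumptions, not recommended defaults.
    """
    if train_acc < min_train_acc:
        # The model cannot even fit the training data: high bias.
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # Good on training data, much worse on unseen data: high variance.
        return "overfitting"
    return "ok"

print(diagnose(0.99, 0.70))  # overfitting
print(diagnose(0.60, 0.58))  # underfitting
print(diagnose(0.92, 0.90))  # ok
```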
Conclusion
I have given a brief overview of the bias-variance trade-off and have avoided the mathematical calculations involved. A link to an article with the detailed mathematical calculations related to bias and variance is provided below.