Friday, July 26, 2019
Recommender Systems
What is a Recommender System?
On the Internet, the number of choices is overwhelming; we need to filter, prioritize, and efficiently deliver relevant information in order to mitigate information overload, which has become a real problem for many Internet users. Recommender systems address this by searching through large volumes of dynamically generated information to provide users with personalized content and services.
A recommender system is a technology deployed in environments where items (products, movies, events, articles) are to be recommended to users (customers, visitors, app users, readers), or vice versa. In real-world scenarios there are many items and many users, which makes the problem hard and expensive to solve. Take a shop as an example: a good merchant knows the personal preferences of his or her customers, and those high-quality recommendations keep customers satisfied and increase profits. In online marketing and shopping, personal recommendations can be generated by an artificial merchant: the recommender system.
Figure: Taxonomy of knowledge sources in recommendation
Recommender systems are increasingly used for personalised navigation through large amounts of information, especially in the e-commerce domain for product purchase advice.
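To make the idea concrete, here is a minimal sketch of user-based collaborative filtering on a toy rating matrix. The users, items, and ratings below are invented purely for illustration; real systems work with millions of sparse entries.

```python
from math import sqrt

# Toy user-item rating matrix (made-up data for illustration).
ratings = {
    "alice": {"item_a": 5, "item_b": 3, "item_c": 4},
    "bob":   {"item_a": 4, "item_b": 1, "item_c": 5},
    "carol": {"item_a": 1, "item_b": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings, k=1):
    """Score items the user has not rated, weighted by user similarity."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("carol", ratings))  # → ['item_c']
```

Carol has not rated item_c, so the system suggests it based on how similar her tastes are to Alice's and Bob's.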
We will look at the various types of recommender systems in a future blog post.
Bias-Variance trade-off in Machine Learning
The bias-variance tradeoff is an important aspect of data science projects based on machine learning. Learning algorithms use mathematical or statistical models whose error can be split into two main components: reducible and irreducible error. Irreducible error is associated with natural variability in the system. Reducible error, on the other hand, can and should be minimized further to maximize accuracy.
Reducible error arises from two sources: squared bias and variance. Our goal is to reduce both bias and variance as much as possible in order to obtain an accurate model.
Let's look at what bias and variance are.
Bias:
In simple terms, a high-bias model is a supervised machine learning model whose predictions are terribly far from reality (even on the training data itself), and those predictions don't change much from dataset to dataset. High bias is related to underfitting.
Example: a linear regression model has high bias when trying to model a non-linear relationship, because a straight line cannot fit a non-linear relationship well.
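A small numeric sketch of this example: we fit a straight line to data generated from a quadratic function (the target and point counts are made up for the demo) and check that the error is large even on the training data, the signature of high bias.

```python
import numpy as np

# Noiseless quadratic target: a straight line cannot capture it.
x = np.linspace(-3, 3, 50)
y = x ** 2

# Fit y = a*x + b by ordinary least squares.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

pred = a * x + b
train_mse = float(np.mean((y - pred) ** 2))
print(round(train_mse, 2))  # large even on the data the line was fit to
```

No amount of extra quadratic training data fixes this; the model family itself is too simple.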
Variance:
Usually, we divide our dataset into three parts:
- Training set: used to fit the model
- Validation set: used to tune the model and monitor generalization
- Test set: used for the final evaluation
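A minimal sketch of such a split, assuming a simple shuffle-and-cut strategy (the 60/20/20 fractions and the helper name `split_dataset` are illustrative choices, not a standard API):

```python
import random

def split_dataset(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle, then cut into train / validation / test portions."""
    data = list(data)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # → 60 20 20
```

Shuffling before cutting matters: if the data is ordered (by time, class, etc.), a plain slice would give the three sets different distributions.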
If we build complex models that fit the training data well but cannot generalize the underlying pattern, we get overfitting: the models don't fit well on data outside the training set (i.e. the validation and test sets). In simple terms, such models might predict close to reality on average, but their predictions change considerably with small changes in the input.
Figure: Bias and variance
Figure: Under-fitting and over-fitting models
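The overfitting behaviour described above can be reproduced in a few lines: a degree-9 polynomial memorizes a small noisy training set while a degree-2 polynomial does not. The target function, noise level, and sample sizes below are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_split(n):
    """Draw n noisy samples of an illustrative non-linear target."""
    x = rng.uniform(-1, 1, n)
    return x, np.sin(3 * x) + rng.normal(0, 0.2, n)

x_tr, y_tr = make_split(12)   # small training set
x_te, y_te = make_split(200)  # held-out data

def fit_and_score(degree):
    """Fit a polynomial of given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return err(x_tr, y_tr), err(x_te, y_te)

train2, test2 = fit_and_score(2)   # simple model
train9, test9 = fit_and_score(9)   # flexible model
print(round(train9, 4), round(test9, 4))  # train error collapses, test error does not
```

The flexible model's training error is near zero, yet its held-out error stays well above the noise floor: predictions swing with every small change in the training data, which is exactly high variance.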
What exactly is the bias-variance trade-off then:
So, if we choose a very complicated algorithm, we run the risk of a high-variance problem, while if we use a simple one, we face a high-bias problem. It's a double-edged sword. The total error of any supervised machine learning prediction is the sum of the model's squared bias, its variance, and the irreducible error. The irreducible error is also known as the Bayes error; it is mostly noise that cannot be reduced by a better algorithm, though better data collection and cleaning can sometimes lower it.
Total error = Bias² + Variance + Irreducible error
If we plot these values against model complexity, we shall see that at a certain optimum model complexity, we will have the minimum total error.
Figure: Bias-variance trade-off
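The decomposition can also be estimated numerically: train the same (deliberately too-simple) model on many independent training sets and measure, at one query point, how far the average prediction sits from the truth (squared bias) and how much the predictions scatter (variance). The true function, noise level, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)      # stand-in for "reality"
noise_sd = 0.3               # source of the irreducible error
x0 = 2.0                     # point at which we evaluate the model

preds = []
for _ in range(2000):                       # many independent training sets
    x = rng.uniform(0, np.pi, 30)
    y = f(x) + rng.normal(0, noise_sd, 30)
    slope, intercept = np.polyfit(x, y, 1)  # straight-line fit: too simple
    preds.append(slope * x0 + intercept)

preds = np.array(preds)
bias_sq = float((preds.mean() - f(x0)) ** 2)  # (average prediction - truth)²
variance = float(preds.var())                 # spread across training sets
total = bias_sq + variance + noise_sd ** 2    # expected squared error at x0
print(round(bias_sq, 3), round(variance, 3), round(noise_sd ** 2, 3))
```

Swapping the straight line for a more flexible model would shrink the bias term while inflating the variance term, tracing out the trade-off curve in the figure above.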
Avoiding overfitting/underfitting:
We should always monitor validation accuracy along with training accuracy. If training accuracy is high but validation accuracy is low, it indicates overfitting. In this case, we should:
- Get more training data
- Increase regularization
If training accuracy itself is low, the model is underfitting, and we should ask:
- Should we decrease regularization?
- Have we done sufficient training?
- Should we train a more powerful model?
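As a sketch of the "increase regularization" remedy, here is closed-form ridge (L2-regularized) regression on high-degree polynomial features; the data, degree, and penalty strength are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 15)
y = np.sin(3 * x) + rng.normal(0, 0.2, 15)

X = np.vander(x, 10)  # degree-9 polynomial features: prone to overfitting

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_plain = ridge_fit(X, y, 0.0)   # no regularization
w_ridge = ridge_fit(X, y, 1.0)   # L2 penalty shrinks the weights
print(float(np.linalg.norm(w_plain)) > float(np.linalg.norm(w_ridge)))  # → True
```

The penalty pulls the weights toward zero, trading a little extra bias for a substantial drop in variance, which is why it combats overfitting.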
Conclusion
I have given a brief overview of the bias-variance trade-off and have avoided the mathematical calculations involved. A link to an article with the detailed mathematical treatment of bias and variance is provided below.