Unsupervised Learning

Segmentation and Clustering: Segmentation is the business problem we want to solve with the data in hand. For example, given employee data, a company's HR team may want to segment employees into different buckets so that it can take business actions based on some rules; or, as a market researcher, you may want to segment customers for …


Kafka Interview Questions

How do we know whether a consumer has consumed a Kafka message? The Kafka broker does not track which messages each consumer has processed. The consumer has to keep track of its own position (its offset) in the topic. If a consumer fails, it must reprocess the affected messages itself. You can control from where in the Kafka topic …
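The offset idea above can be sketched without a real Kafka client. The following is only a toy model: `InMemoryLog` and `OffsetTrackingConsumer` are hypothetical names invented for illustration, not part of any Kafka library, and a plain list stands in for a topic partition.

```python
class InMemoryLog:
    """Hypothetical stand-in for one topic partition: an append-only list."""

    def __init__(self, messages):
        self.messages = list(messages)

    def read(self, offset):
        # Return the message at this offset, or None when the log is exhausted.
        return self.messages[offset] if offset < len(self.messages) else None


class OffsetTrackingConsumer:
    """Hypothetical consumer that tracks its own position in the log."""

    def __init__(self, log, committed_offset=0):
        self.log = log
        self.committed_offset = committed_offset  # survives a restart
        self.position = committed_offset          # lost on a crash

    def poll(self):
        message = self.log.read(self.position)
        if message is not None:
            self.position += 1
        return message

    def commit(self):
        # Record that everything before `position` has been processed.
        self.committed_offset = self.position


log = InMemoryLog(["a", "b", "c"])
consumer = OffsetTrackingConsumer(log)

consumer.poll()      # reads "a"
consumer.poll()      # reads "b"
consumer.commit()    # committed offset is now 2
consumer.poll()      # reads "c", then the consumer "crashes" before committing

# A restarted consumer resumes from the last committed offset,
# so the uncommitted message is read again.
restarted = OffsetTrackingConsumer(log, committed_offset=consumer.committed_offset)
replayed = restarted.poll()
```

In this sketch the message read after the last commit is delivered a second time after the restart, which is why processing has to tolerate replays.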

Data Compression formats in Hadoop

Whether we should compress our data when storing it on HDFS is an important question. The choice has a significant impact on performance while processing data. Compressing data saves storage space and reduces network transfer across the cluster. Codec is the term used for a compressor / decompressor. It is an implementation …
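As a rough illustration of the space saving a codec provides, here is a small sketch using Python's standard-library `gzip` and `bz2` modules rather than Hadoop itself. The sample payload and the `roundtrip` helper are assumptions made up for this example; real ratios depend heavily on the data.

```python
import bz2
import gzip

# Hypothetical, highly repetitive sample payload (log-like CSV rows).
data = b"2024-01-01,sensor-7,21.5\n" * 10_000


def roundtrip(name, compress, decompress, payload):
    """Compress, verify that decompression restores the payload, report size."""
    packed = compress(payload)
    assert decompress(packed) == payload  # a codec must be lossless
    return name, len(packed)


results = dict([
    roundtrip("gzip", gzip.compress, gzip.decompress, data),
    roundtrip("bzip2", bz2.compress, bz2.decompress, data),
])

for name, size in results.items():
    print(f"{name}: {len(data)} bytes -> {size} bytes")
```

Each codec pairs a compress function with a matching decompress function, which is the compressor / decompressor pairing the term refers to.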

Machine Learning Evaluation

The following are the model evaluation metrics. Regression metrics: mean absolute error (problem: not differentiable at zero, and differentiability is required for gradient descent); mean squared error; R² score (1: perfect model, 0: no learning). Classification metrics: confusion matrix (the first letter indicates whether the prediction was correct, True or False, and the second letter indicates the predicted class, Positive or Negative; e.g. TP means the actual class is positive and we predicted positive). Accuracy: Total …
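The metrics listed above can be sketched in plain Python. This is a minimal illustration using the standard textbook definitions, not the post's own code; the function names and the sample values are assumptions chosen for the example.

```python
def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares around the mean)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }


# Hypothetical regression targets.
y_true = [3.0, 5.0, 7.0, 9.0]
perfect = list(y_true)             # predicts every value exactly
always_mean = [6.0, 6.0, 6.0, 6.0]  # ignores the input, predicts the mean

r2_perfect = r2_score(y_true, perfect)
r2_mean = r2_score(y_true, always_mean)
mae_mean = mean_absolute_error(y_true, always_mean)
mse_mean = mean_squared_error(y_true, always_mean)

# Hypothetical classification labels.
counts = confusion_counts(actual=[1, 1, 0, 0, 1], predicted=[1, 0, 0, 1, 1])
```

With these sample values, the perfect predictor gets an R² of 1 and the predictor that always outputs the mean gets an R² of 0, matching the "perfect model" and "no learning" readings.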