2020 Full-time Interview CCJ’s Preparation (4): Machine Learning and Computer Vision Questions

see: 2020 Full-time Interview CCJ’s Preparation (1): Commonly Asked C++ Interview Questions, at this link.

see: 2020 Full-time Interview CCJ’s Preparation (2): Commonly Asked C++ Interview Questions in a Table, at this link.


1. Explain the difference between supervised and unsupervised machine learning?

In supervised machine learning, we must provide labelled data, for example when predicting stock prices or classifying emails into spam and non-spam. In unsupervised learning, no labels are needed, for example when clustering customers into groups by purchasing behaviour.

2. Explain the difference between KNN and k-means clustering?

KNN (k-nearest neighbours) is a supervised machine learning algorithm: we must provide labelled training data, and it classifies a new point by a majority vote among its k nearest labelled neighbours.
K-Means clustering, on the other hand, is an unsupervised machine learning algorithm: we provide the model with unlabelled data, and it partitions the points into k clusters by repeatedly assigning each point to the nearest cluster centroid and recomputing each centroid as the mean of its assigned points.
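A minimal scikit-learn sketch contrasting the two (toy data; assumes scikit-learn is installed):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [8, 8], [9, 10]])
y = np.array([0, 0, 1, 1])                 # KNN requires labels

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # supervised
print(knn.predict([[2, 3]]))               # -> [0]

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)       # unsupervised: no y
print(kmeans.labels_)                      # cluster assignments; no labels used
```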

3. What is the difference between classification and regression?

Classification produces discrete outputs: it assigns data to specific categories, for example classifying e-mails into spam and non-spam.
Regression, on the other hand, is used when the target variable is continuous, for example predicting a stock price at a certain point in time.

4. How to ensure that your model is not overfitting?

Keep the model as simple as the task allows, estimate out-of-sample error with cross-validation, and apply regularisation (see the L1/L2 section below). A growing gap between training and validation error is the usual warning sign.

5. List the main advantage of Naive Bayes?

A Naive Bayes classifier converges very quickly compared to discriminative models like logistic regression. As a result, we need less training data for a Naive Bayes classifier.

6. Explain Ensemble learning.

In ensemble learning, many base models (classifiers or regressors) are generated and combined so that together they give better results. Ensembles work best when the component models are accurate and independent of one another. There are sequential as well as parallel ensemble methods.

Ensemble learning improves machine learning results by combining several models, which allows better predictive performance than any single model. The basic idea is to learn a set of classifiers (experts) and to let them vote; a minimal sketch of this voting idea follows below.
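To illustrate the voting idea, here is a minimal scikit-learn sketch that combines three diverse base classifiers by majority (hard) vote on synthetic data (the particular base learners are arbitrary choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# three diverse "experts" whose class votes are aggregated
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('nb', GaussianNB()),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='hard').fit(X, y)
print(vote.score(X, y))
```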


7. Types of Ensemble Methods:

Ensemble Methods: Predict class label for unseen data by aggregating a set of predictions (classifiers learned from the training data)

Types of Ensemble Methods: bagging, random forests and boosting.

1) Bagging

Bagging (bootstrap aggregating): train each base learner on a bootstrap sample of the training set (sampling with replacement), then aggregate the predictions by majority vote (classification) or by averaging (regression). Bagging mainly reduces variance.

2) Random Forests

How to achieve randomness?

Two sources of randomness are combined: each tree is trained on a bootstrap sample of the training data (bagging), and at each node split only a random subset of the features is considered as split candidates.

Training and Information Gain

At each node, the tree picks the split (feature and threshold) that maximizes the information gain IG = H(parent) − Σ_c (n_c / n) · H(child_c), where H is the entropy of the class distribution at a node and n_c is the number of samples sent to child c.

Ensemble model:

At test time, each tree outputs a class posterior for the input, and the forest averages these posteriors (or takes a majority vote over the trees) to produce the final prediction.

Advantages of Random Forests:

They handle high-dimensional data well, resist overfitting thanks to averaging over many decorrelated trees, can be trained and evaluated in parallel, and provide useful estimates of feature importance.

3) Boosting: train a strong classifier by combining weak classifiers

For example, AdaBoost trains weak learners sequentially: after each round, the training examples misclassified by the current ensemble receive larger weights, so the next weak learner concentrates on the hard cases. The final strong classifier is a weighted vote of all the weak learners, with more accurate learners receiving larger weights.

8. Stacking in Machine Learning

Stacking is a way to ensemble multiple classification or regression models. There are many ways to ensemble models; the widely known ones are bagging and boosting. Bagging averages many similar high-variance models to decrease variance. Boosting builds multiple incremental models to decrease the bias, while keeping the variance small.

Stacking (sometimes called stacked generalization) is a different paradigm. The point of stacking is to explore a space of different models for the same problem. The idea is that you can attack a learning problem with different types of models, each capable of learning some part of the problem but not the whole space of it. So you build multiple different learners and use them to produce an intermediate prediction, one prediction per model. Then you add a new model which learns the same target from those intermediate predictions.

This final model is said to be stacked on top of the others, hence the name. Thus you might improve your overall performance, and often you end up with a model which is better than any individual intermediate model. Note, however, that stacking gives you no guarantee, as is often the case with any machine learning technique.
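A minimal scikit-learn sketch of stacking (the particular base learners and meta-model here are arbitrary choices): the level-0 models produce intermediate predictions via internal cross-validation, and a logistic regression is stacked on top of them:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
                ('svc', SVC())],                  # level-0 learners
    final_estimator=LogisticRegression())         # level-1 meta-model
stack.fit(X, y)
print(stack.score(X, y))
```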


9. Explain dimension reduction in machine learning.

What is the “Curse of Dimensionality?”

The difficulty of searching through a solution space becomes much harder as you have more features (dimensions).

Consider the analogy of looking for a penny in a line vs. a field vs. a building. The more dimensions you have, the higher volume of data you’ll need.

Dimension Reduction

Dimension Reduction is the process of reducing the size of the feature matrix (M x D, where each row is a data sample x_i with D features). We try to reduce the number of columns (i.e., D) so that we get a better feature set, either by combining columns or by removing extra variables.

PCA is a method for transforming features in a dataset by combining them into uncorrelated linear combinations. These new features, or principal components, sequentially maximize the variance represented (i.e. the first principal component has the most variance, the second principal component has the second most, and so on). As a result, PCA is useful for dimensionality reduction because you can set an arbitrary variance cutoff.
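A minimal scikit-learn sketch of the variance-cutoff idea (random toy data): keep however many principal components are needed to explain 95% of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).randn(100, 10)   # M=100 samples, D=10 features

pca = PCA(n_components=0.95)                  # variance cutoff at 95%
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                        # fewer columns than D
print(pca.explained_variance_ratio_)          # per-component variance, decreasing
```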

10. What should you do when your model is suffering from low bias and high variance?

When the model's predicted values are very close to the actual training values, the condition is known as low bias; combined with high variance, it indicates overfitting. In this condition, we can use bagging algorithms such as a random forest regressor, which average many high-variance models to reduce the overall variance.

11. Explain differences between random forest and gradient boosting algorithm.

12. Comparison of Bagging and Boosting

Explain the Bias-Variance Tradeoff.

Predictive models have a tradeoff between bias (how well the model fits the data) and variance (how much the model changes based on changes in the inputs).

Simpler models are stable (low variance) but they don’t get close to the truth (high bias).

More complex models are more prone to being overfit (high variance) but they are expressive enough to get close to the truth (low bias).

The best model for a given problem usually lies somewhere in the middle.

Bagging VS Boosting

Bagging and Boosting are two types of ensemble learning. Both decrease the variance of a single estimate by combining several estimates from different models, so the result may be a model with higher stability.

If the problem with a single model is overfitting, Bagging is the best option. If the problem is that a single model achieves very low performance, Boosting could generate a combined model with lower errors, as it optimizes the advantages and reduces the pitfalls of the single model.

Similarities Between Bagging and Boosting: both are ensemble methods that obtain N learners from a single base learner, both make the final decision by combining the N learners (averaging or majority vote), and both yield higher stability than the single base learner alone.

How can you choose a classifier based on training set size?

If training set is small, high bias / low variance models (e.g. Naive Bayes) tend to perform better because they are less likely to be overfit.

If training set is large, low bias / high variance models (e.g. Logistic Regression) tend to perform better because they can reflect more complex relationships.

13. GBDT: Gradient Boosted Decision Tree

14. Regularization: L1 and L2

see: this video at https://www.coursera.org/lecture/deep-neural-network/why-regularization-reduces-overfitting-T6OJj

see: https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261


1) Model

Let’s define a model to see how L1 and L2 work. For simplicity, we define a simple linear regression model ŷ with one independent variable.

ŷ = wx + b

Here I have used the deep learning conventions w (‘weight’) and b (‘bias’).

2) Loss Functions

To demonstrate the effect of L1 and L2 regularisation, let’s fit our linear regression model using 3 different loss functions/objectives:

Our objective is to minimize these different losses.

2.1) Loss function with no regularisation

We define the loss function L as the squared error, where error is the difference between y (the true value) and ŷ (the predicted value).

L = (y − ŷ)²

Let’s assume our model will be overfitted using this loss function.

2.2) Loss function with L1 regularisation

Based on the above loss function, adding an L1 regularisation term to it looks like this:

L1 = (y − ŷ)² + λ|w|

where the regularization parameter λ > 0 is manually tuned. Let's call this loss function L1. Note that |w| is differentiable everywhere except when w = 0, as noted below. We will need this later.

(The graph of |w| is V-shaped with a corner at w = 0: its derivative is +1 for w > 0 and −1 for w < 0, i.e., d|w|/dw = sign(w), and it is undefined at w = 0.)

2.3) Loss function with L2 regularisation

Similarly, adding an L2 regularisation term to L looks like this:

L2 = (y − ŷ)² + λw²

where again, λ > 0.

3) Gradient Descent

Now, let's solve the linear regression model using gradient descent optimization based on the 3 loss functions defined above. Recall that the gradient-descent update of the parameter w is (Equation 0):

w_new = w − η · dL/dw

where η is the learning rate.

Let’s substitute the last term in the above equation with the gradient of L, L1 and L2 w.r.t. w.

No regularization: w_new = w + 2ηx(y − ŷ)

L1: w_new = w + 2ηx(y − ŷ) − ηλ (for w > 0) and w_new = w + 2ηx(y − ŷ) + ηλ (for w < 0), since d|w|/dw = sign(w)

L2: w_new = w + 2ηx(y − ŷ) − 2ηλw
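To make these update rules concrete, here is a minimal NumPy sketch that runs all three on the same toy 1-variable data (η and λ are arbitrary illustrative values, and the gradient is averaged over the samples rather than taken per-sample):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(50)
y = 2.0 * x + 0.1 * rng.randn(50)          # ground-truth weight: w = 2

eta, lam = 0.01, 0.5                       # learning rate, reg. strength
w = {'none': 0.0, 'l1': 0.0, 'l2': 0.0}
for _ in range(500):
    for key in w:
        y_hat = w[key] * x
        grad = np.mean(-2 * x * (y - y_hat))   # gradient of squared error
        if key == 'l1':
            grad += lam * np.sign(w[key])      # + lambda * sign(w)
        elif key == 'l2':
            grad += 2 * lam * w[key]           # + 2 * lambda * w
        w[key] -= eta * grad
print(w)   # the L1 and L2 weights end up closer to 0 than 'none'
```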

4) How is overfitting prevented?

From here onwards, let's perform the following substitution on the equations above (for better readability): let H = 2ηx(y − ŷ). The three weight updates then read:

No regularization (Equation 1): w_new = w + H

L1 (Equations 1.1 and 1.2): w_new = w + H − ηλ (for w > 0) and w_new = w + H + ηλ (for w < 0)

L2 (Equation 2): w_new = w + H − 2ηλw

4.1) With vs. Without Regularization

Observe the differences between the weight updates with the regularization parameter λ and without it. Here are some intuitions.

Intuition A:

Intuition B:

Intuition C:

Intuition D:
Edden Gerber (thanks!) has provided an intuition about the direction toward which our solution is being shifted. Have a look in the comments: https://medium.com/@edden.gerber/thanks-for-the-article-1003ad7478b2

4.2) L1 vs. L2

We shall now focus our attention on L1 and L2, and rewrite Equations {1.1, 1.2 and 2} by rearranging their w and H terms as follows:

L1 (Equation 3.1, for w > 0): w_new = (w − ηλ) + H
L1 (Equation 3.2, for w < 0): w_new = (w + ηλ) + H

L2 (Equation 4): w_new = (1 − 2ηλ)w + H

Compare the terms containing w in each of the equations above. Apart from H, the change in w depends on the ∓ηλ term or the (1 − 2ηλ) factor, which highlights the influence of the following:

1. the sign of the current w,
2. the magnitude of the current w,
3. the doubling of the regularization parameter.

While weight updates using L1 are influenced by the first point, weight updates from L2 are influenced by all three points. While I have made this comparison just based on the iterative equation update, please note that this does not mean that one is 'better' than the other.

For now, let’s see below how a regularization effect from L1 can be attained just by the sign of the current w.

4.3) L1's effect on pushing w towards 0 (sparsity)

Take a look at L1 in Equation 3.1. If w is positive, the regularization term ηλ will push w to be less positive, by subtracting ηλ from w. Conversely, in Equation 3.2, if w is negative, ηλ will be added to w, pushing it to be less negative. Hence, this has the effect of pushing w towards 0.

This is of course pointless in a 1-variable linear regression model, but it proves its prowess at 'removing' useless variables in multivariate regression models. You can also think of L1 as reducing the number of features in the model altogether. Imagine, as an arbitrary example, a multivariate linear regression model in which L1 drives the coefficients of several variables close to 0.

(The original figure here showed such a fitted multivariate model with several near-zero coefficients.)

So how does pushing w towards 0 help with overfitting in L1 regularization? As a weight w goes to 0, we reduce the number of effective features by shrinking the variable's importance. A variable whose coefficient is almost 0 is effectively 'useless' and can be removed from the equation. This in turn reduces model complexity, making our model simpler. A simpler model can reduce the chances of overfitting.
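This sparsity effect is easy to see in practice. A sketch assuming scikit-learn (the data and alpha values are arbitrary), where only two of ten features are actually useful:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(200, 10)
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.randn(200)  # 2 useful features

lasso = Lasso(alpha=0.1).fit(X, y)    # L1 regularisation
ridge = Ridge(alpha=0.1).fit(X, y)    # L2 regularisation
print(lasso.coef_.round(2))   # most coefficients are exactly 0 (sparse)
print(ridge.coef_.round(2))   # coefficients are shrunk but non-zero
```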

4.4) More words

see: https://medium.com/@edden.gerber/thanks-for-the-article-1003ad7478b2

I’d like to suggest that the last part — how regularization reduces overfitting — does not give a satisfying enough answer. The intuitions presented are based on the idea of taking an overfitted solution and moving “away from it” (getting a “less than perfect” solution, one that is affected by factors independent of the dataset, etc.). But that would also apply to moving away from the overfitted solution by naively adding a constant to all weights, which would of course not be helpful. In other words, regularization does make our solution less “perfect” but this in itself is not why it helps.

Instead, I suggest that we need to think about the direction toward which our solution is being shifted. Specifically, not just away from the overfitted solution but also toward the axis origin, i.e., toward smaller weights.

15. Linear Regression

16. Traditional ML

Mean and Variance, Etc.

For a sample x_1, …, x_n:

mean: μ = (1/n) · Σ_i x_i

variance: σ² = (1/n) · Σ_i (x_i − μ)², and the standard deviation is σ

covariance: Cov(X, Y) = E[(X − E[X]) · (Y − E[Y])]

Bayes Rule

P(A|B) = P(B|A) · P(A) / P(B)

What Is ‘naive’ in the Naive Bayes Classifier?

It is 'naive' because it assumes that the features are conditionally independent given the class label: P(x_1, …, x_d | y) = Π_i P(x_i | y). This strong independence assumption rarely holds exactly, but it keeps the model simple and fast to train.

Classification VS Regression

Classification is used when your target is categorical, while regression is used when your target variable is continuous. Both classification and regression belong to the category of supervised machine learning algorithms.

Briefly Explain Logistic Regression.

Logistic regression is a classification algorithm used to predict a binary outcome for a given set of independent variables.

The output of logistic regression is a probability between 0 and 1, thresholded (generally at 0.5) to produce the class: any output above 0.5 is mapped to class 1, and any output below 0.5 to class 0.
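A minimal NumPy sketch of this output stage (the scores z are hypothetical values of wᵀx + b):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.3, 1.5])       # hypothetical linear scores w^T x + b
p = sigmoid(z)                       # probabilities in (0, 1)
labels = (p >= 0.5).astype(int)      # threshold at 0.5
print(p.round(3), labels)            # [0.119 0.574 0.818] [0 1 1]
```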

Logistic Regression VS Linear Regression:

SVM

How do you understand Dropout?

Batch normalization?

im2col convolution


18. Model Evaluation

What is the ROC Curve and what is AUC (a.k.a. AUROC)?

The ROC (receiver operating characteristic) curve is the performance plot for binary classifiers: True Positive Rate (y-axis) vs. False Positive Rate (x-axis), traced out as the decision threshold varies.

AUC is area under the ROC curve, and it’s a common performance metric for evaluating binary classification models.

It’s equivalent to the expected probability that a uniformly drawn random positive is ranked before a uniformly drawn random negative.
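A minimal scikit-learn sketch (toy labels and scores; here the AUC is 8/9 ≈ 0.89, the fraction of positive-negative pairs ranked correctly):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]          # predicted probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
print(roc_auc_score(y_true, y_score))              # ~0.889
```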


Why is Area Under ROC Curve (AUROC) better than raw accuracy as an out-of-sample evaluation metric?

AUROC is threshold-independent and robust to class imbalance: raw accuracy can look high on a skewed dataset simply by always predicting the majority class, whereas AUROC measures how well the model ranks positives above negatives across all possible thresholds.

Explain the Confusion Matrix

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |

From these counts one derives, e.g., precision = TP / (TP + FP), recall (TPR) = TP / (TP + FN), and FPR = FP / (FP + TN).

How would you handle an imbalanced dataset?

An imbalanced dataset is one where, for example, 90% of the data in a classification task belongs to one class. That causes problems: an accuracy of 90% can be meaningless if the model has no predictive power on the minority class! Here are a few tactics to get over the hump: collect more data for the rare class; resample the dataset (oversample the minority class, e.g., with SMOTE, or undersample the majority class); use class weights in the loss function; and evaluate with metrics that respect the imbalance (precision/recall, F1, AUROC) rather than raw accuracy.

What’s important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
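As one concrete tactic, many scikit-learn estimators accept class weights; a minimal sketch on a synthetic 90/10 split (the model and data choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# synthetic dataset with a 90% / 10% class split
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 'balanced' reweights the loss so errors on the rare class count more
clf = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X, y)
print(f1_score(y, clf.predict(X)))   # judge with F1/AUROC, not raw accuracy
```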

How Do You Handle Missing or Corrupted Data in a Dataset?
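A standard answer: first locate the missing or corrupted values (e.g., with pandas isnull()), then either drop the affected rows/columns or impute them (mean, median, mode, or a sentinel value). A minimal pandas sketch with toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})
print(df.isnull().sum())           # count missing values per column
df_drop = df.dropna()              # option 1: drop incomplete rows
df_fill = df.fillna(df.mean())     # option 2: impute with column means
```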

20. Generative vs Discriminative models

A generative model learns the joint distribution P(x, y) (equivalently, P(x | y) together with the prior P(y)) and classifies via Bayes rule; examples include Naive Bayes, GMMs and HMMs. A discriminative model learns the conditional P(y | x), or the decision boundary, directly; examples include logistic regression, SVMs and CRFs. Discriminative models usually achieve better classification accuracy given enough data, while generative models can handle missing data and can generate new samples.

21. MRF VS CRF

An MRF (Markov Random Field) is a generative model: it models the joint distribution P(X, Y) over observations and labels. A CRF (Conditional Random Field) is its discriminative counterpart: it models the conditional distribution P(Y | X) directly, so it never needs to model the observations X themselves and can therefore use rich, overlapping features of X.

22. Possible Questions and Answers:

Write the equation of NCC:

Given a pixel (x, y) in the left image I_L and its candidate match (x − d, y) in the right image I_R, with a window W centered on each:

NCC(x, y, d) = Σ_{(i,j)∈W} (I_L(x+i, y+j) − μ_L) · (I_R(x−d+i, y+j) − μ_R) / sqrt( Σ_{(i,j)∈W} (I_L(x+i, y+j) − μ_L)² · Σ_{(i,j)∈W} (I_R(x−d+i, y+j) − μ_R)² )

where μ_L and μ_R are the mean intensities of the two windows. NCC lies in [−1, 1]; a value near 1 indicates a good match.
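A minimal NumPy sketch of the window-based NCC above (the helper name and fixed window size are illustrative; no border checks):

```python
import numpy as np

def ncc(left, right, x, y, d, n=3):
    # (2n+1) x (2n+1) windows at (x, y) in left and (x - d, y) in right
    wl = left[y-n:y+n+1, x-n:x+n+1].astype(np.float64)
    wr = right[y-n:y+n+1, x-d-n:x-d+n+1].astype(np.float64)
    wl -= wl.mean()                            # subtract window means
    wr -= wr.mean()
    denom = np.sqrt((wl**2).sum() * (wr**2).sum())
    return (wl * wr).sum() / (denom + 1e-12)   # avoid division by zero
```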

Do 3D convolution or filtering:

```python
import numpy as np

def image_filter(video, kernel):
    """Filter a (D, H, W) volume with an (N, N, N) kernel (N odd),
    normalizing by the kernel volume as in the original sketch."""
    D, H, W = video.shape
    N, _, _ = kernel.shape
    n_h = (N - 1) // 2                     # half kernel size (integer)
    size = N * N * N                       # normalization factor
    y = np.zeros((D, H, W))                # output has the same shape
    # only the interior is filtered; border voxels stay zero
    for d in range(n_h, D - n_h):
        for h in range(n_h, H - n_h):
            for w in range(n_h, W - n_h):
                x = video[d-n_h:d+n_h+1, h-n_h:h+n_h+1, w-n_h:w+n_h+1]
                y[d, h, w] = np.sum(x * kernel) / size
    return y
```
The same 3-D filter in C++ (using OpenCV):

```cpp
#include <opencv2/core.hpp>
using namespace cv;

// Filter a (D, H, W) volume stored as a 3-D float cv::Mat with an
// (N, N, N) kernel (N odd), normalizing by the kernel volume.
Mat image_filter(const Mat& video, const Mat& kernel) {
    int D = video.size[0], H = video.size[1], W = video.size[2];
    int N = kernel.size[0];
    int n_h = (N - 1) / 2;                 // half kernel size
    float size_inv = 1.0f / (N * N * N);   // float: an int would truncate to 0
    int sz[3] = {D, H, W};
    Mat y = Mat::zeros(3, sz, CV_32F);
    // only the interior is filtered; border voxels stay zero
    for (int d = n_h; d < D - n_h; ++d)
        for (int h = n_h; h < H - n_h; ++h)
            for (int w = n_h; w < W - n_h; ++w) {
                float r = 0.f;
                for (int i = -n_h; i <= n_h; ++i)      // one loop per
                    for (int j = -n_h; j <= n_h; ++j)  // kernel dimension
                        for (int k = -n_h; k <= n_h; ++k)
                            r += video.at<float>(d + i, h + j, w + k) *
                                 kernel.at<float>(i + n_h, j + n_h, k + n_h);
                y.at<float>(d, h, w) = r * size_inv;
            }
    return y;
}
```

Actually we can use im2col for efficient convolution via matrix multiplication.
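A minimal 2-D NumPy sketch of the idea (the helper im2col_2d is written just for this illustration): unroll every k x k patch into a column, so the whole convolution collapses into a single matrix multiplication:

```python
import numpy as np

def im2col_2d(image, k):
    # unroll every k x k patch (stride 1, no padding) into a column
    H, W = image.shape
    cols = [image[i:i+k, j:j+k].ravel()
            for i in range(H - k + 1)
            for j in range(W - k + 1)]
    return np.stack(cols, axis=1)            # shape: (k*k, num_patches)

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0               # 3x3 mean filter
out = kernel.ravel() @ im2col_2d(image, 3)   # convolution as one mat-mul
print(out.reshape(2, 2))                     # matches the sliding-window result
```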

23. Top 10 Behavioral Interview Questions and Answers:

see https://www.thebalancecareers.com/top-behavioral-interview-questions-2059618

1) Tell me about how you worked effectively under pressure.

What They Want to Know: If you’re being considered for a high-stress job, the interviewer will want to know how well you can work under pressure. Give a real example of how you’ve dealt with pressure when you respond.

I had been working on a key project that was scheduled for delivery to the client in 60 days. My supervisor came to me and said that we needed to speed it up and be ready in 45 days, while keeping our other projects on time. I made it into a challenge for my staff, and we effectively added just a few hours to each of our schedules and got the job done in 42 days by sharing the workload. Of course, I had a great group of people to work with, but I think that my effective allocation of tasks was a major component that contributed to the success of the project.

2) How do you handle a challenge? Give an example.

What They Want to Know: Regardless of your job, things may go wrong and it won’t always be business as usual. With this type of question, the hiring manager wants to know how you will react in a difficult situation. Focus on how you resolved a challenging situation when you respond. Consider sharing a step-by-step outline of what you did and why it worked.

One time, my supervisor needed to leave town unexpectedly, and we were in the middle of complicated negotiations with a new sponsor. I was tasked with putting together a PowerPoint presentation just from the notes he had left, and some briefing from his manager. My presentation was successful. We got the sponsorship, and the management team even recommended me for an award.

3) Have you ever made a mistake? How did you handle it?

What They Want to Know: Nobody is perfect, and we all make mistakes. The interviewer is more interested in how you handled it when you made an error, rather than in the fact that it happened.

I once misquoted the fees for a particular type of membership to the club where I worked. I explained my mistake to my supervisor, who appreciated my coming to him, and my honesty. He told me to offer to waive the application fee for the new member. The member joined the club despite my mistake, my supervisor was understanding, and although I felt bad that I had made a mistake, I learned to pay close attention to the details so as to be sure to give accurate information in the future.

4) Give an example of how you set goals.

What They Want to Know: With this question, the interviewer wants to know how well you plan and set goals for what you want to accomplish. The easiest way to respond is to share examples of successful goal setting.

Within a few weeks of beginning my first job as a sales associate in a department store, I knew that I wanted to be in the fashion industry. I decided that I would work my way up to department manager, and at that point I would have enough money saved to be able to attend design school full-time. I did just that, and I even landed my first job through an internship I completed the summer before graduation.

5) Give an example of a goal you reached and tell me how you achieved it.

What They Want to Know: The hiring manager is interested in learning what you do to achieve your goals, and the steps you take to accomplish them.

When I started working for XYZ Company, I wanted to achieve the Employee of the Month title. It was a motivational challenge, and not all the employees took it that seriously, but I really wanted that parking spot, and my picture on the wall. I went out of my way to be helpful to my colleagues, supervisors, and customers - which I would have done anyway. I liked the job and the people I worked with. The third month I was there, I got the honor. It was good to achieve my goal, and I actually ended up moving into a managerial position there pretty quickly, I think because of my positive attitude and perseverance.

6) Describe a decision you made that wasn't popular, and explain how you implemented it.

What They Want to Know: Sometimes, management has to make difficult decisions, and not all employees are happy when a new policy is put in place. If you’re interviewing for a decision-making role, the interviewer will want to know your process for implementing change.

Once, I inherited a group of employees when their supervisor relocated to another city. They had been allowed to cover each other’s shifts without management approval. I didn’t like the inconsistencies, where certain people were being given more opportunities than others. I introduced a policy where I had my assistant approve all staffing changes, to make sure that everyone who wanted extra hours and was available at certain times could be utilized.

7) Give an example of how you worked on a team.

What They Want to Know: Many jobs require working as part of a team. In interviews for those roles, the hiring manager will want to know how well you work with others and cooperate with other team members.

During my last semester in college, I worked as part of a research team in the History department. The professor leading the project was writing a book on the development of language in Europe in the Middle Ages. We were each assigned different sectors to focus on, and I suggested that we meet independently before our weekly meeting with the professor to discuss our progress, and help each other out if we were having any difficulties. The professor really appreciated the way we worked together, and it helped to streamline his research as well. He was ready to start on his final copy months ahead of schedule because of the work we helped him with.

8) What do you do if you disagree with someone at work?

What They Want to Know: With this question, the interviewer is seeking insight into how you handle issues at work. Focus on how you’ve solved a problem or compromised when there was a workplace disagreement.

A few years ago, I had a supervisor who wanted me to find ways to outsource most of the work we were doing in my department. I felt that my department was one where having the staff on the premises had a huge impact on our effectiveness and ability to relate to our clients. I presented a strong case to her, and she came up with a compromise plan.

Tips for Responding: How to answer interview questions about problems at work.

9) Share an example of how you were able to motivate employees or co-workers.

What They Want to Know: Do you have strong motivational skills? What strategies do you use to motivate your team? The hiring manager is looking for a concrete example of your ability to motivate others.

I was in a situation once where the management of our department was taken over by employees with experience in a totally different industry, in an effort to maximize profits over service. Many of my co-workers were resistant to the sweeping changes that were being made, but I immediately recognized some of the benefits, and was able to motivate my colleagues to give the new process a chance to succeed.

More Answers: What strategies would you use to motivate your team?

10) Have you handled a difficult situation? How?

What They Want to Know: Can you handle difficult situations at work, or do you not deal with them well? The employer will want to know what you do when there’s a problem.

When I worked at ABC Global, it came to my attention that one of my employees had become addicted to painkillers prescribed after she had surgery. Her performance was being negatively impacted, and she needed to get some help. I spoke with her privately, and I helped her to arrange a weekend treatment program that was covered by her insurance. Fortunately, she was able to get her life back on track, and she received a promotion about six months later.

11) Others

24. SGM

SGM (Semi-Global Matching) approximates a 2-D MRF stereo energy by aggregating the pixel-wise matching cost along several 1-D scanline directions, with a small penalty P1 for disparity changes of 1 pixel and a larger penalty P2 for bigger jumps; the per-direction costs are summed, and the disparity with the minimal aggregated cost is selected (winner-takes-all).

25. What are the advantages of ReLU over sigmoid function in deep neural networks?


Answer:

Two additional major benefits of ReLUs are sparsity and a reduced likelihood of vanishing gradient.

But first, recall that a ReLU computes h = max(0, a), where a = Wx + b is the unit's pre-activation.

One major benefit is the reduced likelihood of the gradient vanishing: for a > 0 the gradient is a constant (1), whereas the gradient of a sigmoid shrinks toward 0 as |a| grows, so ReLU networks tend to learn faster. The sparsity benefit arises for a ≤ 0, where the unit outputs exactly 0; sigmoids instead always emit some small non-zero value, producing dense representations.

Some other advantages: ReLU is computationally cheap (just a threshold at zero), and in practice networks with ReLUs usually converge faster than those with sigmoid or tanh units.

Disadvantage: ReLU units can 'die' during training; a large gradient step can push a unit into a regime where it outputs 0 for every input, after which its gradient is 0 and it may never recover (the 'dying ReLU' problem).
