AIP-210 CertNexus Certified Artificial Intelligence Practitioner (CAIP) Questions and Answers

Question # 4

Which of the following methods can be used to rebalance a dataset using the rebalance design pattern?

A.

Bagging

B.

Boosting

C.

Stacking

D.

Weighted class
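
As an illustration of the class-weighting idea named in option D, the sketch below computes inverse-frequency class weights. It is a minimal sketch and not part of the exam material: the function name `balanced_class_weights` and the sample labels are hypothetical, and the formula `n_samples / (n_classes * class_count)` is one common heuristic, assumed here for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return one weight per class, inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Rare classes receive larger weights, frequent classes smaller ones.
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced labels: 10 positives, 90 negatives.
labels = ["fraud"] * 10 + ["ok"] * 90
weights = balanced_class_weights(labels)
```

Weights like these are typically passed to a loss function so that errors on the minority class count more during training.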

Question # 5

Which two techniques are used to build personas in the ML development lifecycle? (Select two.)

A.

Population estimates

B.

Population regression

C.

Population resampling

D.

Population triage

E.

Population variance

Question # 6

What is Word2vec?

A.

A bag of words.

B.

A matrix of how frequently words appear in a group of documents.

C.

A word embedding method that builds a one-hot encoded matrix from samples and the terms that appear in them.

D.

A word embedding method that finds characteristics of words in a very large number of documents.

Question # 7

Which of the following text vectorization methods is appropriate and correctly defined for an English-to-Spanish translation machine?

A.

Using TF-IDF because in translation machines, we do not care about the order of the words.

B.

Using TF-IDF because in translation machines, we need to consider the order of the words.

C.

Using Word2vec because in translation machines, we do not care about the order of the words.

D.

Using Word2vec because in translation machines, we need to consider the order of the words.

Question # 8

An organization sells house security cameras and has asked its data scientists to implement a model to detect human faces, as distinguished from animals, so they can alert the customers only when a human gets close to their house.

Which of the following algorithms is an appropriate option with a correct reason?

A.

A decision tree algorithm, because the problem is a classification problem with a small number of features.

B.

k-means, because this is a clustering problem with a small number of features.

C.

Logistic regression, because this is a classification problem and our data is linearly separable.

D.

Neural network model, because this is a classification problem with a large number of features.

Question # 9

A company is developing a merchandise sales application. The product team uses training data to teach the AI model to predict sales, and discovers emergent bias. What caused the biased results?

A.

The AI model was trained in winter and applied in summer.

B.

The application was migrated from on-premise to a public cloud.

C.

The team set flawed expectations when training the model.

D.

The training data used was inaccurate.

Question # 10

You have a dataset with thousands of features, all of which are categorical. Using these features as predictors, you are tasked with creating a prediction model to accurately predict the value of a continuous dependent variable. Which of the following would be appropriate algorithms to use? (Select two.)

A.

K-means

B.

K-nearest neighbors

C.

Lasso regression

D.

Logistic regression

E.

Ridge regression

Question # 11

Given a feature set with rows that contain missing continuous values, and assuming the data is normally distributed, what is the best way to fill in these missing features?

A.

Delete entire rows that contain any missing features.

B.

Fill in missing features with random values for that feature in the training set.

C.

Fill in missing features with the average of observed values for that feature in the entire dataset.

D.

Delete entire columns that contain any missing features.
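
The mean-imputation approach described in option C can be sketched as follows. This is a minimal sketch for illustration only: the function name `impute_with_mean` and the sample values are hypothetical, and missing entries are assumed to be encoded as `None` or NaN.

```python
import math
from statistics import mean

def _is_missing(value):
    """Treat None and NaN as missing (an assumed encoding for this sketch)."""
    return value is None or math.isnan(value)

def impute_with_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if not _is_missing(v)]
    fill = mean(observed)
    return [fill if _is_missing(v) else v for v in values]

# Hypothetical continuous feature with two missing entries.
heights = [170.0, None, 165.0, 175.0, float("nan")]
filled = impute_with_mean(heights)
```

For roughly normally distributed data the mean is a reasonable central value, which is why this fill keeps the feature's average unchanged.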

Question # 12

In which of the following scenarios is lasso regression preferable over ridge regression?

A.

The number of features is much larger than the sample size.

B.

There are many features with no association with the dependent variable.

C.

There is high collinearity among some of the features associated with the dependent variable.

D.

The sample size is much larger than the number of features.

Question # 13

Which two of the following decrease technical debt in ML systems? (Select two.)

A.

Boundary erosion

B.

Design anti-patterns

C.

Documentation readability

D.

Model complexity

E.

Refactoring

Question # 14

Which of the following principles supports building an ML system with a Privacy by Design methodology?

A.

Avoiding mechanisms to explain and justify automated decisions.

B.

Collecting and processing the largest amount of data possible.

C.

Understanding, documenting, and displaying data lineage.

D.

Utilizing quasi-identifiers and non-unique identifiers, alone or in combination.

Question # 15

Which of the following scenarios is an example of entanglement in ML pipelines?

A.

Add a new method for drift detection in the model evaluation step.

B.

Add a new pipeline for retraining the model in the model training step.

C.

Change in normalization function in the feature engineering step.

D.

Change the way output is visualized in the monitoring step.

Question # 16

A big data architect needs to be cautious about personally identifiable information (PII) that may be captured with their new IoT system. What is the final stage of the Data Management Life Cycle, which the architect must complete in order to implement data privacy and security appropriately?

A.

De-Duplicate

B.

Destroy

C.

Detain

D.

Duplicate

Question # 17

Which of the following is NOT an activation function?

A.

Additive

B.

Hyperbolic tangent

C.

ReLU

D.

Sigmoid
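
For reference, the three standard activation functions named in the options (hyperbolic tangent, ReLU, and sigmoid) can be written in a few lines. This is a minimal scalar sketch using only the standard library; the function names are chosen here for illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes any real input into (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: passes positives through, clips negatives to 0."""
    return max(0.0, x)
```

Each function applies a nonlinearity to a neuron's weighted input, which is what allows stacked layers to model nonlinear relationships.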

Question # 18

Which of the following is TRUE about SVM models?

A.

They can be used only for classification.

B.

They can be used only for regression.

C.

They can take the feature space into higher dimensions to solve the problem.

D.

They use the sigmoid function to classify the data points.

Question # 19

When should the model be retrained in the ML pipeline?

A.

A new monitoring component is added.

B.

Concept drift is detected in the pipeline.

C.

More data become available for the training phase.

D.

Some outliers are detected in live data.

Question # 20

In a self-driving car company, ML engineers want to develop a model for dynamic pathing. Which of the following approaches would be optimal for this task?

A.

Dijkstra's algorithm

B.

Reinforcement learning

C.

Supervised learning

D.

Unsupervised learning

Question # 21

A dataset can contain a range of values that depict a certain characteristic, such as grades on tests in a class during the semester. A specific student has so far received the following grades: 76, 81, 78, 87, 75, and 72. There is one final test in the semester. What minimum grade would the student need to achieve on the last test to get an 80% average?

A.

82

B.

89

C.

91

D.

94
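
The arithmetic behind this question can be checked directly: the required grade is the target average times the total number of tests, minus the sum of the grades so far. The sketch below assumes all seven tests are weighted equally; the variable names are chosen here for illustration.

```python
grades = [76, 81, 78, 87, 75, 72]    # grades received so far
target_average = 80                  # desired average across all tests
total_tests = len(grades) + 1        # six completed tests plus the final one

# Points needed in total, minus points already earned.
required = target_average * total_tests - sum(grades)
print(required)  # 560 - 469 = 91
```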

Question # 22

When should you use semi-supervised learning? (Select two.)

A.

A small set of labeled data is available but not representative of the entire distribution.

B.

A small set of labeled data is biased toward one class.

C.

Labeling data is challenging and expensive.

D.

There is a large amount of labeled data to be used for predictions.

E.

There is a large amount of unlabeled data to be used for predictions.

Question # 23

Word Embedding describes a task in natural language processing (NLP) where:

A.

Words are converted into numerical vectors.

B.

Words are featurized by taking a histogram of letter counts.

C.

Words are featurized by taking a matrix of bigram counts.

D.

Words are grouped together into clusters and then represented by word cluster membership.

Question # 24

In general, models that perform their tasks:

A.

Less accurately are less robust against adversarial attacks.

B.

Less accurately are neither more nor less robust against adversarial attacks.

C.

More accurately are less robust against adversarial attacks.

D.

More accurately are neither more nor less robust against adversarial attacks.

Question # 25

A data scientist is tasked with extracting business intelligence from primary data captured from the public. Which of the following is the most important aspect that the scientist must not forget to include?

A.

Cyberprotection

B.

Cybersecurity

C.

Data privacy

D.

Data security

Question # 26

Which of the following is a type 1 error in statistical hypothesis testing?

A.

The null hypothesis is false, but fails to be rejected.

B.

The null hypothesis is false and is rejected.

C.

The null hypothesis is true and fails to be rejected.

D.

The null hypothesis is true, but is rejected.
