Databricks Certified Machine Learning Professional (Databricks-Machine-Learning-Professional) Questions and Answers

Question # 4

A data scientist has created a Python function compute_features that returns a Spark DataFrame with the following schema:

The resulting DataFrame is assigned to the features_df variable. The data scientist wants to create a Feature Store table using features_df.

Which of the following code blocks can they use to create and populate the Feature Store table using the Feature Store Client fs?

A.

B.

C.

features_df.write.mode("fs").path("new_table")

D.

E.

features_df.write.mode("feature").path("new_table")

Question # 5

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

A.

Z-Ordering

B.

Bin-packing

C.

Write as a Parquet file

D.

Data skipping

E.

Tuning the file size
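
To make the idea of colocating records across several columns concrete, here is a minimal sketch of a Z-order (Morton) key in plain Python. It is an illustration of the general space-filling-curve idea only, not Delta Lake's implementation; the helper names interleave_bits and z_order_sort and the toy columns a and b are assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


def z_order_sort(rows, col_a, col_b):
    """Return rows sorted by the Morton key of two non-negative integer columns."""
    return sorted(rows, key=lambda r: interleave_bits(r[col_a], r[col_b]))


# Toy data: every combination of a and b in 0..3 (hypothetical columns).
rows = [{"a": a, "b": b} for a in range(4) for b in range(4)]
ordered = z_order_sort(rows, "a", "b")

# Rows that are small in BOTH columns end up next to each other,
# so a filter on a and b together touches a short contiguous run.
first_block = ordered[:4]
```

Sorting on a single column would group rows by that column alone; the interleaved key keeps rows close when they are close in both columns, which is the property the question describes.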

Question # 6

In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?

A.

The launch of a new cost-efficient SQL endpoint

B.

CI/CD pipelines are not needed for machine learning pipelines

C.

The arrival of a new feature table in the Feature Store

D.

The launch of a new cost-efficient job cluster

E.

The arrival of a new model version in the MLflow Model Registry

Question # 7

A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.

They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

A.

"MODEL_VERSION_CREATED"

B.

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

C.

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

D.

"MODEL_VERSION_TRANSITIONED_STAGE"

E.

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

Question # 8

Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?

A.

Streaming

B.

Batch

C.

Edge/on-device

D.

None of these strategies will accomplish the task.

E.

Real-time

Question # 9

After a data scientist noticed that a column was missing from a production feature set stored as a Delta table, the machine learning engineering team has been tasked with determining when the column was dropped from the feature set.

Which of the following SQL commands can be used to accomplish this task?

A.

VERSION

B.

DESCRIBE

C.

HISTORY

D.

DESCRIBE HISTORY

E.

TIMESTAMP

Question # 10

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

A.

Jensen-Shannon test

B.

Summary statistics trends

C.

Chi-squared test

D.

None of these can be used to monitor feature drift

E.

Kolmogorov-Smirnov (KS) test
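
As an illustration of one of the listed approaches, tracking summary statistics over time can be sketched with the Python standard library alone. The function names, the sample values, and the tolerance of one baseline standard deviation are all assumptions chosen for this example, not a recommended threshold.

```python
from statistics import mean, stdev


def summarize(values):
    """Basic summary statistics for one numeric feature."""
    return {
        "mean": mean(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def mean_shift_flags(baseline, batches, tolerance=1.0):
    """Flag each batch whose mean moved more than `tolerance`
    baseline standard deviations away from the baseline mean."""
    base = summarize(baseline)
    flags = []
    for batch in batches:
        shift = abs(mean(batch) - base["mean"]) / base["stdev"]
        flags.append(shift > tolerance)
    return flags


# Hypothetical feature values: a training-time baseline and two later batches.
baseline = [10, 11, 9, 10, 12, 8, 10, 11]
batches = [[10, 11, 9, 10], [14, 15, 13, 16]]
flags = mean_shift_flags(baseline, batches)
```

Here the first batch stays close to the baseline mean and the second does not, so only the second is flagged. The same pattern extends to the other statistics returned by summarize.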

Question # 11

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.

Which of the following code blocks accomplishes this task?

A.

spark.read.format("delta").load(path).drop("star_rating")

B.

spark.read.format("delta").table(path).drop("star_rating")

C.

Delta tables cannot be modified

D.

spark.read.table(path).drop("star_rating")

E.

spark.sql("SELECT * EXCEPT star_rating FROM path")

Question # 12

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

A.

B.

C.

mlflow.log_data(importance_path, "feature-importance.csv")

D.

mlflow.log_artifact(importance_path, "feature-importance.csv")

E.

None of these code blocks can accomplish the task.

Question # 13

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

A.

Indent the child run blocks within the parent run block

B.

Add the nested=True argument to the parent run

C.

Remove the nested=True argument from the child runs

D.

Provide the same name to the run_name parameter for all three run blocks

E.

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Question # 14

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?

A)

B)

C)

D)

E)

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Question # 15

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:

1. Deploy a model to production and compute predicted values

2. Obtain the observed (actual) label values

3. _____

4. Run a statistical test to determine if there are changes over time

Which of the following should be completed as Step #3?

A.

Obtain the observed (actual) feature values

B.

Measure the latency of the prediction time

C.

Retrain the model

D.

None of these should be completed as Step #3

E.

Compute the evaluation metric using the observed and predicted values
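
The monitoring loop described in this question can be sketched in plain Python. This is a minimal illustration under stated assumptions: the quantity tracked between collecting labels and running the test is taken to be an evaluation metric (accuracy here), the statistical test is a two-sided two-proportion z-test, and the two weekly windows of labels and predictions are made-up data.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the observed labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two proportions.
    Returns (z, p_value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical windows: observed labels and the model's predictions.
week1_true, week1_pred = [1] * 100, [1] * 90 + [0] * 10
week2_true, week2_pred = [1] * 100, [1] * 70 + [0] * 30

acc1 = accuracy(week1_true, week1_pred)
acc2 = accuracy(week2_true, week2_pred)
z, p_value = two_proportion_z_test(acc1, len(week1_true), acc2, len(week2_true))
drift_suspected = p_value < 0.01  # assumed significance level
```

In practice the choice of metric and test depends on the problem type; the sketch only shows how the steps chain together.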

Question # 16

A machine learning engineer has deployed a model recommender using MLflow Model Serving. They now want to query the version of that model that is in the Production stage of the MLflow Model Registry.

Which of the following model URIs can be used to query the described model version?

A.

https:// /model-serving/recommender/Production/invocations

B.

The version number of the model version in Production is necessary to complete this task.

C.

https:// /model/recommender/stage-production/invocations

D.

https:// /model-serving/recommender/stage-production/invocations

E.

https:// /model/recommender/Production/invocations

Question # 17

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

A.

Cloud-based compute

B.

None of these tools

C.

REST APIs

D.

Containers

E.

Autoscaling clusters

Question # 18

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

A.

MLflow APIs

B.

AutoML APIs

C.

MLflow Client

D.

Jobs cannot be created programmatically

E.

Databricks REST APIs
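
For context, creating a job through an HTTP API with a schedule chosen from a test result might look roughly like the sketch below. It only builds the request and does not send it. The endpoint path and payload fields follow the Databricks Jobs API 2.1 as commonly documented, but treat them as assumptions to verify against the current reference; the job name, notebook path, cluster ID placeholder, host, token, and the nightly-versus-weekly rule are all hypothetical.

```python
import json
import urllib.request


def build_create_job_request(host, token, tests_passed):
    """Build (but do not send) a POST request that would create a scheduled job."""
    # Hypothetical rule: run nightly if the tests passed, weekly on Mondays otherwise.
    cron = "0 0 2 * * ?" if tests_passed else "0 0 2 ? * MON"
    payload = {
        "name": "ml-pipeline-retrain",  # hypothetical job name
        "tasks": [
            {
                "task_key": "retrain",
                "notebook_task": {"notebook_path": "/Repos/ml/retrain"},  # hypothetical path
                "existing_cluster_id": "<cluster-id>",  # placeholder, not a real ID
            }
        ],
        "schedule": {"quartz_cron_expression": cron, "timezone_id": "UTC"},
    }
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL and token; the request is never sent here.
req = build_create_job_request("https://example.cloud.databricks.com", "TOKEN", True)
body = json.loads(req.data)
```

Sending the request would be a matter of passing req to urllib.request.urlopen with real credentials.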
