Databricks-Machine-Learning-Professional Databricks Certified Machine Learning Professional sample Question + Exam 2025 Practice Exam Dumps

Question # 4

A data scientist has created a Python function compute_features that returns a Spark DataFrame with the following schema:

The resulting DataFrame is assigned to the features_df variable. The data scientist wants to create a Feature Store table using features_df.

Which of the following code blocks can they use to create and populate the Feature Store table using the Feature Store Client fs?

features_df.write.mode("fs").path("new_table")

features_df.write.mode("feature").path("new_table")

Answer:

Explanation:

The Feature Store Client fs provides a method called create_table that can be used to create and populate a Feature Store table from a Spark DataFrame. The create_table method takes the following parameters:

name: The name of the feature table to create. It must include the database name, such as 'recommender_system.new_table'.
primary_keys: The name or list of names of the columns that uniquely identify each row in the feature table. For example, 'customer_id' or ['customer_id', 'date'].
df: The Spark DataFrame that contains the data to populate the feature table. Its schema is used as the schema of the feature table, and it must include the primary key columns.
description: An optional string that describes the feature table.
partition_columns: An optional list of column names to partition the feature table by. For example, ['date', 'region'].

The create_table method creates a Delta table with the specified name, primary keys, partition columns, and description, and writes the data from the DataFrame to the table. It does not publish the table to an online store; that is a separate step performed with the client's publish_table method. The create_table method returns a FeatureTable object that represents the feature table [1][2].
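As a minimal sketch of the create_table call described above, assume fs is a FeatureStoreClient (from databricks.feature_store) and features_df is the DataFrame returned by compute_features. The table name and the customer_id primary key are hypothetical placeholders, because the question's schema is not shown. The helper only forwards keyword arguments to the client, so its wiring can be checked without a Databricks workspace.

```python
# Minimal sketch (assumption): `fs` is a databricks.feature_store.FeatureStoreClient
# and `features_df` is the Spark DataFrame returned by compute_features.
# The table name and primary key below are hypothetical placeholders.


def create_feature_table(fs, features_df,
                         name="recommender_system.new_table",
                         primary_keys=("customer_id",)):
    """Create a Feature Store table and populate it from features_df in one call."""
    return fs.create_table(
        name=name,                        # "<database>.<table>"
        primary_keys=list(primary_keys),  # column(s) that uniquely identify each row
        df=features_df,                   # supplies both the schema and the rows to write
        description="Features computed by compute_features",
    )


# Usage on Databricks (not executed here):
# from databricks.feature_store import FeatureStoreClient
# fs = FeatureStoreClient()
# feature_table = create_feature_table(fs, features_df)
```

Passing df is what makes this a single create-and-populate step; without it the client would only create an empty table.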

The other code blocks are incorrect because:

B. The create_table method needs the df parameter to populate the feature table; without it, only an empty table is created.
C. The DataFrame's write property (a DataFrameWriter) does not accept a mode called 'fs', and it has no path method. To write a DataFrame to a Delta table, the mode should be one of 'append', 'overwrite', 'ignore', or 'error', and the format should be 'delta'. To write a DataFrame to a Feature Store table, use the create_table or write_table method of the Feature Store Client.
D. The create_table method does not have a function parameter. To use a Python function to compute the features, call the function first and pass the resulting DataFrame to the df parameter.
E. The DataFrame's write property does not accept a mode called 'feature', and it has no path method. To write a DataFrame to a Feature Store table, use the create_table or write_table method of the Feature Store Client.
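The explanation also points to write_table for writing into a Feature Store table that already exists. A minimal sketch, assuming fs is a FeatureStoreClient; the table name and the replace_all flag are hypothetical. In write_table, mode="merge" upserts rows by primary key, while mode="overwrite" replaces the table's existing data. A plain Delta write, which uses a standard Spark save mode and the delta format, is shown only as a comment for contrast.

```python
# Sketch (assumption): `fs` is a databricks.feature_store.FeatureStoreClient and the
# feature table named below already exists. The name and flag are hypothetical.


def write_features(fs, features_df, name="recommender_system.new_table",
                   replace_all=False):
    """Write features_df into an existing Feature Store table and return the mode used.

    mode="merge" upserts rows by primary key; mode="overwrite" replaces existing data.
    """
    mode = "overwrite" if replace_all else "merge"
    fs.write_table(name=name, df=features_df, mode=mode)
    return mode


# For contrast, a plain Delta write (not a Feature Store table) uses a standard
# Spark save mode and the "delta" format, for example:
# features_df.write.format("delta").mode("overwrite").saveAsTable("recommender_system.plain_table")
```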

References:

1. Feature Store Python API reference - Databricks
2. Work with features in Workspace Feature Store - Databricks

Question # 5

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

Z-Ordering

Bin-packing

Write as a Parquet file

Data skipping

Tuning the file size

Answer:

Explanation:

Z-Ordering is an optimization technique that can speed up the query by colocating similar records while considering values in multiple columns. It organizes data in storage based on the values of one or more columns, mapping multidimensional data to one dimension while preserving the locality of the data points. As a result, rows with similar values in the specified columns are stored close together in the same set of files. Queries that filter on those columns can then skip files whose value ranges cannot match the predicate. In other words, Z-Ordering makes data skipping more effective, because it narrows the range of values (the min/max statistics) stored per file for the chosen columns [1].

The other options are incorrect because:

Option B: Bin-packing is an optimization technique that compacts small files into larger ones, but it does not colocate similar records based on multiple columns. Bin-packing can improve query performance by reducing the number of files that need to be read, but it does not reorder records by their column values [2].
Option C: Writing as a Parquet file is not an optimization technique but a file format choice. Parquet is a columnar storage format that supports efficient compression and encoding schemes, and Delta tables already store their data as Parquet files. Parquet can improve query performance by reducing the storage footprint and the amount of data transferred, but it does not colocate similar records based on multiple columns [3].
Option D: Data skipping is an optimization technique that skips over files or data blocks that do not match the query predicates, but it does not colocate similar records based on multiple columns. Data skipping can improve query performance by avoiding unnecessary data scans, but its effectiveness depends on the data layout and the metadata collected for each file [4].
Option E: Tuning the file size is an optimization technique that adjusts the size of the data files to a target value, but it does not colocate similar records based on multiple columns. Tuning the file size can improve query performance by balancing the trade-off between parallelism and overhead, but it does not affect the data layout within the files [5].

References:

1. Z-Ordering (multi-dimensional clustering)
2. Compaction (bin-packing)
3. Parquet
4. Data skipping
5. Tuning file sizes
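The following toy sketch, in plain Python with made-up data, illustrates why Z-Ordering helps when filters involve more than one column. It is not Delta Lake's actual implementation (on Databricks the layout is produced with OPTIMIZE <table> ZORDER BY (col1, col2)); it only mimics the idea with Morton codes (bit interleaving) and the per-file min/max statistics that data skipping relies on. The two columns x and y, the 16 x 16 grid, and the 16-row "files" are hypothetical.

```python
"""Toy illustration of Z-Ordering vs. a single-column sort (hypothetical data)."""
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int]  # (x, y): two hypothetical filter columns


def z_value(row: Row, bits: int = 8) -> int:
    """Morton (Z-order) code: interleave the bits of x (even positions) and y (odd)."""
    x, y = row
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def layout(rows: Sequence[Row], rows_per_file: int, key=None) -> List[List[Row]]:
    """Sort rows by key and split them into fixed-size 'files'."""
    ordered = sorted(rows, key=key)
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


def files_scanned(files: List[List[Row]], filters: Dict[int, Tuple[int, int]]) -> int:
    """Data skipping: a file is read only if, for every filtered column,
    its [min, max] range overlaps the predicate range [lo, hi]."""
    scanned = 0
    for f in files:
        if all(min(r[c] for r in f) <= hi and max(r[c] for r in f) >= lo
               for c, (lo, hi) in filters.items()):
            scanned += 1
    return scanned


rows = [(x, y) for x in range(16) for y in range(16)]  # 256 rows
by_x = layout(rows, 16)               # linear sort on (x, y): each file holds one x value
by_z = layout(rows, 16, key=z_value)  # Z-order: each file holds a 4x4 block of (x, y)

for name, files in [("linear", by_x), ("z-order", by_z)]:
    print(name,
          "x<=3:", files_scanned(files, {0: (0, 3)}),
          "y<=3:", files_scanned(files, {1: (0, 3)}),
          "both:", files_scanned(files, {0: (0, 3), 1: (0, 3)}))
```

Running it prints `linear x<=3: 4 y<=3: 16 both: 4` and `z-order x<=3: 4 y<=3: 4 both: 1`. Sorting on a single column only helps filters on that column, while the Z-order layout keeps every file selective for either column, which is the multi-column colocation the question asks about.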

Question # 6

In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?

The launch of a new cost-efficient SQL endpoint

CI/CD pipelines are not needed for machine learning pipelines

The arrival of a new feature table in the Feature Store

The launch of a new cost-efficient job cluster

The arrival of a new model version in the MLflow Model Registry

Question # 7

A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.

They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

"MODEL_VERSION_CREATED"

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

"MODEL_VERSION_TRANSITIONED_STAGE"

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

Question # 8

Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?

Streaming

Batch

Edge/on-device

None of these strategies will accomplish the task.

Real-time

Question # 9

After a data scientist noticed that a column was missing from a production feature set stored as a Delta table, the machine learning engineering team has been tasked with determining when the column was dropped from the feature set.

Which of the following SQL commands can be used to accomplish this task?

VERSION

DESCRIBE

HISTORY

DESCRIBE HISTORY

TIMESTAMP

Question # 10

Which of the following is a simple, low-cost method of monitoring numeric feature drift?

Jensen-Shannon test

Summary statistics trends

Chi-squared test

None of these can be used to monitor feature drift

Kolmogorov-Smirnov (KS) test

Question # 11

A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.

Which of the following code blocks accomplishes this task?

spark.read.format("delta").load(path).drop("star_rating")

spark.read.format("delta").table(path).drop("star_rating")

Delta tables cannot be modified

spark.read.table(path).drop("star_rating")

spark.sql("SELECT * EXCEPT star_rating FROM path")

Question # 12

A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.

Which of the following code blocks will accomplish this task inside of an existing MLflow run block?

mlflow.log_data(importance_path, "feature-importance.csv")

mlflow.log_artifact(importance_path, "feature-importance.csv")

None of these code blocks can accomplish the task.

Question # 13

A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.

They are using the following code block:

The code block is not nesting the runs in MLflow as they expected.

Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?

Indent the child run blocks within the parent run block

Add the nested=True argument to the parent run

Remove the nested=True argument from the child runs

Provide the same name to the run name parameter for all three run blocks

Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs

Question # 14

A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.

Which of the following code blocks can they use to perform this task using the Feature Store Client fs?

Option A

Option B

Option C

Option D

Option E

Question # 15

A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:

1. Deploy a model to production and compute predicted values

2. Obtain the observed (actual) label values

3. _____

4. Run a statistical test to determine if there are changes over time

Which of the following should be completed as Step #3?

Obtain the observed (actual) feature values

Measure the latency of the prediction time

Retrain the model

None of these should be completed as Step #3

Compute the evaluation metric using the observed and predicted values

Question # 16

A machine learning engineer has deployed a model recommender using MLflow Model Serving. They now want to query the version of that model that is in the Production stage of the MLflow Model Registry.

Which of the following model URIs can be used to query the described model version?

https:// /model-serving/recommender/Production/invocations

The version number of the model version in Production is necessary to complete this task.

https:// /model/recommender/stage-production/invocations

https:// /model-serving/recommender/stage-production/invocations

https:// /model/recommender/Production/invocations

Question # 17

Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?

Cloud-based compute

None of these tools

REST APIs

Containers

Autoscaling clusters

Question # 18

A machine learning engineer wants to programmatically create a new Databricks Job whose schedule depends on the result of some automated tests in a machine learning pipeline.

Which of the following Databricks tools can be used to programmatically create the Job?

MLflow APIs

AutoML APIs

MLflow Client

Jobs cannot be created programmatically

Databricks REST APIs
