Databricks-Generative-AI-Engineer-Associate Databricks Certified Generative AI Engineer Associate sample Question + Exam 2025 Practice Exam Dumps

Question # 4

A Generative Al Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs.

Which action would be most effective in mitigating the problem of offensive text outputs?

Increase the frequency of upstream data updates

Inform the user of the expected RAG behavior

Restrict access to the data sources to a limited number of users

Curate upstream data properly that includes manual review before it is fed into the RAG system

Full Access

Question # 5

What is the most suitable library for building a multi-step LLM-based workflow?

Pandas

TensorFlow

PySpark

LangChain

Full Access

Question # 6

A Generative Al Engineer is building a RAG application that answers questions about internal documents for the company SnoPen AI.

The source documents may contain a significant amount of irrelevant content, such as advertisements, sports news, or entertainment news, or content about other companies.

Which approach is advisable when building a RAG application to achieve this goal of filtering irrelevant information?

Keep all articles because the RAG application needs to understand non-company content to avoid answering questions about them.

Include in the system prompt that any information it sees will be about SnoPenAI, even if no data filtering is performed.

Include in the system prompt that the application is not supposed to answer any questions unrelated to SnoPen Al.

Consolidate all SnoPen AI related documents into a single chunk in the vector database.

Full Access

Question # 7

A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles.

Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?

DatabrickslQ

Foundation Model APIs

Feature Serving

AutoML

Full Access

Question # 8

Which TWO chain components are required for building a basic LLM-enabled chat application that includes conversational capabilities, knowledge retrieval, and contextual memory?

(Q)

Vector Stores

Conversation Buffer Memory

External tools

Chat loaders

React Components

Full Access

Answer:

B, C

Explanation:

Building a basic LLM-enabled chat application with conversational capabilities, knowledge retrieval, and contextual memory requires specific components that work together to process queries, maintain context, and retrieve relevant information. Databricksâ€™ Generative AI Engineer documentation outlines key components for such systems, particularly in the context of frameworks like LangChain or Databricksâ€™ MosaicML integrations. Letâ€™s evaluate the required components:

Understanding the Requirements:

Conversational capabilities: The app must generate natural, coherent responses.

Knowledge retrieval: It must access external or domain-specific knowledge.

Contextual memory: It must remember prior interactions in the conversation.

Databricks Reference:"A typical LLM chat application includes a memory component to track conversation history and a retrieval mechanism to incorporate external knowledge"("Databricks Generative AI Cookbook," 2023).

Evaluating the Options:

A. (Q): This appears incomplete or unclear (possibly a typo). Without further context, itâ€™s not a valid component.

B. Vector Stores: These store embeddings of documents or knowledge bases, enabling semantic search and retrieval of relevant information for the LLM. This is critical for knowledge retrieval in a chat application.

Databricks Reference:"Vector stores, such as those integrated with Databricksâ€™ Lakehouse, enable efficient retrieval of contextual data for LLMs"("Building LLM Applications with Databricks").

C. Conversation Buffer Memory: This component stores the conversation history, allowing the LLM to maintain context across multiple turns. Itâ€™s essential for contextual memory.

Databricks Reference:"Conversation Buffer Memory tracks prior user inputs and LLM outputs, ensuring context-aware responses"("Generative AI Engineer Guide").

D. External tools: These (e.g., APIs or calculators) enhance functionality but arenâ€™t required for abasicchat app with the specified capabilities.

E. Chat loaders: These might refer to data loaders for chat logs, but theyâ€™re not a core chain component for conversational functionality or memory.

F. React Components: These relate to front-end UI development, not the LLM chainâ€™s backend functionality.

Selecting the Two Required Components:

Forknowledge retrieval, Vector Stores (B) are necessary to fetch relevant external data, a cornerstone of Databricksâ€™ RAG-based chat systems.

Forcontextual memory, Conversation Buffer Memory (C) is required to maintain conversation history, ensuring coherent and context-aware responses.

While an LLM itself is implied as the core generator, the question asks for chain components beyond the model, making B and C the minimal yet sufficient pair for a basic application.

Conclusion: The two required chain components areB. Vector StoresandC. Conversation Buffer Memory, as they directly address knowledge retrieval and contextual memory, respectively, aligning with Databricksâ€™ documented best practices for LLM-enabled chat applications.

Question # 9

A Generative Al Engineer at an automotive company would like to build a question-answering chatbot for customers to inquire about their vehicles. They have a database containing various documents of different vehicle makes, their hardware parts, and common maintenance information.

Which of the following components will NOT be useful in building such a chatbot?

Response-generating LLM

Invite users to submit long, rather than concise, questions

Vector database

Embedding model

Full Access

Answer:

Explanation:

The task involves building a question-answering chatbot for an automotive company using a database of vehicle-related documents. The chatbot must efficiently process customer inquiries and provide accurate responses. Letâ€™s evaluate each component to determine which isnotuseful, per Databricks Generative AI Engineer principles.

Option A: Response-generating LLM

An LLM is essential for generating natural language responses to customer queries based on retrieved information. This is a core component of any chatbot.

Databricks Reference:"The response-generating LLM processes retrieved context to produce coherent answers"("Building LLM Applications with Databricks," 2023).

Option B: Invite users to submit long, rather than concise, questions

Encouraging long questions is a user interaction design choice, not a technical component of the chatbotâ€™s architecture. Moreover, long, verbose questions can complicate intent detection and retrieval, reducing efficiency and accuracyâ€”counter to best practices for chatbot design. Concise questions are typically preferred for clarity and performance.

Databricks Reference: While not explicitly stated, Databricksâ€™ "Generative AI Cookbook" emphasizes efficient query processing, implying that simpler, focused inputs improve LLM performance. Inviting long questions doesnâ€™t align with this.

Option C: Vector database

A vector database stores embeddings of the vehicle documents, enabling fast retrieval of relevant information via semantic search. This is critical for a question-answering system with a large document corpus.

Databricks Reference:"Vector databases enable scalable retrieval of context from large datasets"("Databricks Generative AI Engineer Guide").

Option D: Embedding model

An embedding model converts text (documents and queries) into vector representations for similarity search. Itâ€™s a foundational component for retrieval-augmented generation (RAG) in chatbots.

Databricks Reference:"Embedding models transform text into vectors, facilitating efficient matching of queries to documents"("Building LLM-Powered Applications").

Conclusion: Option B is not a usefulcomponentin building the chatbot. Itâ€™s a user-facing suggestion rather than a technical building block, and it could even degrade performance by introducing unnecessary complexity. Options A, C, and D are all integral to a Databricks-aligned chatbot architecture.

Question # 10

A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries.

Which metric should they monitor for their customer service LLM application in production?

Number of customer inquiries processed per unit of time

Energy usage per query

Final perplexity scores for the training of the model

HuggingFace Leaderboard values for the base LLM

Full Access

Question # 11

A Generative Al Engineer is building a system that will answer questions on currently unfolding news topics. As such, it pulls information from a variety of sources including articles and social media posts. They are concerned about toxic posts on social media causing toxic outputs from their system.

Which guardrail will limit toxic outputs?

Use only approved social media and news accounts to prevent unexpected toxic data from getting to the LLM.

Implement rate limiting

Reduce the amount of context Items the system will Include in consideration for its response.

Log all LLM system responses and perform a batch toxicity analysis monthly.

Full Access

Answer:

Explanation:

The system answers questions on unfolding news topics using articles and social media, with a concern about toxic outputs from toxic inputs. A guardrail must limit toxicity in the LLMâ€™s responses. Letâ€™s evaluate the options.

Option A: Use only approved social media and news accounts to prevent unexpected toxic data from getting to the LLM

Curating input sources (e.g., verified accounts) reduces exposure to toxic content at the data ingestion stage, directly limiting toxic outputs. This is a proactive guardrail aligned with data quality control.

Databricks Reference:"Control input data quality to mitigate unwanted LLM behavior, such as toxicity"("Building LLM Applications with Databricks," 2023).

Option B: Implement rate limiting

Rate limiting controls request frequency, not content quality. It prevents overload but doesnâ€™t address toxicity in social media inputs or outputs.

Databricks Reference: Rate limiting is for performance, not safety:"Use rate limits to manage compute load"("Generative AI Cookbook").

Option C: Reduce the amount of context items the system will include in consideration for its response

Reducing context might limit exposure to some toxic items but risks losing relevant information, and it doesnâ€™t specifically target toxicity. Itâ€™s an indirect, imprecise fix.

Databricks Reference: Context reduction is for efficiency, not safety:"Adjust context size based on performance needs"("Databricks Generative AI Engineer Guide").

Option D: Log all LLM system responses and perform a batch toxicity analysis monthly

Logging and analyzing responses is reactive, identifying toxicity after it occurs rather than preventing it. Monthly analysis doesnâ€™t limit real-time toxic outputs.

Databricks Reference: Monitoring is for auditing, not prevention:"Log outputs for post-hoc analysis, but use input filters for safety"("Building LLM-Powered Applications").

Conclusion: Option A is the most effective guardrail, proactively filtering toxic inputs from unverified sources, which aligns with Databricksâ€™ emphasis on data quality as a primary safety mechanism for LLM systems.

Question # 12

A Generative Al Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the authorâ€™s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the userâ€™s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.

Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

Change embedding models and compare performance.

Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.

Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters.

Choose the strategy that gives the best performance metric.

Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.

Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.

Full Access

Answer:

C, E

Explanation:

To optimize a chunking strategy for a Retrieval-Augmented Generation (RAG) application, the Generative AI Engineer needs a structured approach to evaluating the chunking strategy, ensuring that the chosen configuration retrieves the most relevant information and leads to accurate and coherent LLM responses. Here's whyCandEare the correct strategies:

Strategy C: Evaluation Metrics (Recall, NDCG)

Define an evaluation metric: Common evaluation metrics such as recall, precision, or NDCG (Normalized Discounted Cumulative Gain) measure how well the retrieved chunks match the user's query and the expected response.

Recallmeasures the proportion of relevant information retrieved.

NDCGis often used when you want to account for both the relevance of retrieved chunks and the ranking or order in which they are retrieved.

Experiment with chunking strategies: Adjusting chunking strategies based on text structure (e.g., splitting by paragraph, chapter, or a fixed number of tokens) allows the engineer to experiment with various ways of slicing the text. Some chunks may better align with the user's query than others.

Evaluate performance: By using recall or NDCG, the engineer can methodically test various chunking strategies to identify which one yields the highest performance. This ensures that the chunking method provides the most relevant information when embedding and retrieving data from the vector store.

Strategy E: LLM-as-a-Judge Metric

Use the LLM as an evaluator: After retrieving chunks, the LLM can be used to evaluate the quality of answers based on the chunks provided. This could be framed as a "judge" function, where the LLM compares how well a given chunk answers previous user queries.

Optimize based on the LLM's judgment: By having the LLM assess previous answers and rate their relevance and accuracy, the engineer can collect feedback on how well different chunking configurations perform in real-world scenarios.

This metric could be a qualitative judgment on how closely the retrieved information matches the user's intent.

Tune chunking parameters: Based on the LLM's judgment, the engineer can adjust the chunk size or structure to better align with the LLM's responses, optimizing retrieval for future queries.

By combining these two approaches, the engineer ensures that the chunking strategy is systematically evaluated using both quantitative (recall/NDCG) and qualitative (LLM judgment) methods. This balanced optimization process results in improved retrieval relevance and, consequently, better response generation by the LLM.

Question # 13

A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs.

Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?

Limit the number of relevant documents available for the RAG application to retrieve from

Pick a smaller LLM that is domain-specific

Limit the number of queries a customer can send per day

Use the largest LLM possible because that gives the best performance for any general queries

Full Access

Question # 14

A Generative Al Engineer is setting up a Databricks Vector Search that will lookup news articles by topic within 10 days of the date specified An example query might be "Tell me about monster truck news around January 5th 1992". They want to do this with the least amount of effort.

How can they set up their Vector Search index to support this use case?

Split articles by 10 day blocks and return the block closest to the query.

Include metadata columns for article date and topic to support metadata filtering.

pass the query directly to the vector search index and return the best articles.

Create separate indexes by topic and add a classifier model to appropriately pick the best index.

Full Access

Answer:

Explanation:

The task is to set up a Databricks Vector Search index for news articles, supporting queries like â€œmonster truck news around January 5th, 1992,â€ with minimal effort. The index must filter by topic and a 10-day date range. Letâ€™s evaluate the options.

Option A: Split articles by 10-day blocks and return the block closest to the query

Pre-splitting articles into 10-day blocks requires significant preprocessing and index management (e.g., one index per block). Itâ€™s effort-intensive and inflexible for dynamic date ranges.

Databricks Reference:"Static partitioning increases setup complexity; metadata filtering is preferred"("Databricks Vector Search Documentation").

Option B: Include metadata columns for article date and topic to support metadata filtering

Adding date and topic as metadata in the Vector Search index allows dynamic filtering (e.g., date Â± 5 days, topic = â€œmonster truckâ€) at query time. This leverages Databricksâ€™ built-in metadata filtering, minimizing setup effort.

Databricks Reference:"Vector Search supports metadata filtering on columns like date or category for precise retrieval with minimal preprocessing"("Vector Search Guide," 2023).

Option C: Pass the query directly to the vector search index and return the best articles

Passing the full query (e.g., â€œTell me about monster truck news around January 5th, 1992â€) to Vector Search relies solely on embeddings, ignoring structured filtering for date and topic. This risks inaccurate results without explicit range logic.

Databricks Reference:"Pure vector similarity may not handle temporal or categorical constraints effectively"("Building LLM Applications with Databricks").

Option D: Create separate indexes by topic and add a classifier model to appropriately pick the best index

Separate indexes per topic plus a classifier model adds significant complexity (index creation, model training, maintenance), far exceeding â€œleast effort.â€ Itâ€™s overkill for this use case.

Databricks Reference:"Multiple indexes increase overhead; single-index with metadata is simpler"("Databricks Vector Search Documentation").

Conclusion: Option B is the simplest and most effective solution, using metadata filtering in a single Vector Search index to handle date ranges and topics, aligning with Databricksâ€™ emphasis on efficient, low-effort setups.

Question # 15

A Generative Al Engineer is working with a retail company that wants to enhance its customer experience by automatically handling common customer inquiries. They are working on an LLM-powered Al solution that should improve response times while maintaining a personalized interaction. They want to define the appropriate input and LLM task to do this.

Which input/output pair will do this?

Input: Customer reviews; Output Group the reviews by users and aggregate per-user average rating, then respond

Input: Customer service chat logs; Output Group the chat logs by users, followed by summarizing each user's interactions, then respond

Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary

Input: Customer reviews: Output Classify review sentiment

Full Access

Answer:

Explanation:

The task described in the question involves enhancing customer experience by automatically handling common customer inquiries using an LLM-powered AI solution. This requires the system to process input data (customer inquiries) and generate personalized, relevant responses efficiently. Letâ€™s evaluate the options step-by-step in the context of Databricks Generative AI Engineer principles, which emphasize leveraging LLMs for tasks like question answering, summarization, and retrieval-augmented generation (RAG).

Option A: Input: Customer reviews; Output: Group the reviews by users and aggregate per-user average rating, then respond

This option focuses on analyzing customer reviews to compute average ratings per user. While this might be useful for sentiment analysis or user profiling, it does not directly address the goal of handling common customer inquiries or improving response times for personalized interactions. Customer reviews are typically feedback data, not real-time inquiries requiring immediate responses.

Databricks Reference: Databricks documentation on LLMs (e.g., "Building LLM Applications with Databricks") emphasizes that LLMs excel at tasks like question answering and conversational responses, not just aggregation or statistical analysis of reviews.

Option B: Input: Customer service chat logs; Output: Group the chat logs by users, followed by summarizing each user's interactions, then respond

This option uses chat logs as input, which aligns with customer service scenarios. However, the outputâ€”grouping by users and summarizing interactionsâ€”focuses on user-specific summaries rather than directly addressing inquiries. While summarization is an LLM capability, this approach lacks the specificity of finding answers to common questions, which is central to the problem.

Databricks Reference: Per Databricksâ€™ "Generative AI Cookbook," LLMs can summarize text, but for customer service, the emphasis is on retrieval and response generation (e.g., RAG workflows) rather than user interaction summaries alone.

Option C: Input: Customer service chat logs; Output: Find the answers to similar questions and respond with a summary

This option uses chat logs (real customer inquiries) as input and tasks the LLM with identifying answers to similar questions, then providing a summarized response. This directly aligns with the goal of handling common inquiries efficiently while maintaining personalization (by referencing past interactions or similar cases). It leverages LLM capabilities like semantic search, retrieval, and response generation, which are core to Databricksâ€™ LLM workflows.

Databricks Reference: From Databricks documentation ("Building LLM-Powered Applications," 2023), an exact extract states:"For customer support use cases, LLMs can be used to retrieve relevant answers from historical data like chat logs and generate concise, contextually appropriate responses."This matches Option Câ€™s approach of finding answers and summarizing them.

Option D: Input: Customer reviews; Output: Classify review sentiment

This option focuses on sentiment classification of reviews, which is a valid LLM task but unrelated to handling customer inquiries or improving response times in a conversational context. Itâ€™s more suited for feedback analysis than real-time customer service.

Databricks Reference: Databricksâ€™ "Generative AI Engineer Guide" notes that sentiment analysis is a common LLM task, but itâ€™s not highlighted for real-time conversational applications like customer support.

Conclusion: Option C is the best fit because it uses relevant input (chat logs) and defines an LLM task (finding answers and summarizing) that meets the requirements of improving response times and maintaining personalized interaction. This aligns with Databricksâ€™ recommended practices for LLM-powered customer service solutions, such as retrieval-augmented generation (RAG) workflows.

Question # 16

A Generative AI Engineer is testing a simple prompt template in LangChain using the code below, but is getting an error.

Assuming the API key was properly defined, what change does the Generative AI Engineer need to make to fix their chain?

Option A

Option B

Option C

Option D

Full Access

Question # 17

A Generative Al Engineer has developed an LLM application to answer questions about internal company policies. The Generative AI Engineer must ensure that the application doesnâ€™t hallucinate or leak confidential data.

Which approach should NOT be used to mitigate hallucination or confidential data leakage?

Add guardrails to filter outputs from the LLM before it is shown to the user

Fine-tune the model on your data, hoping it will learn what is appropriate and not

Limit the data available based on the userâ€™s access level

Use a strong system prompt to ensure the model aligns with your needs.

Full Access

Question # 18

A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpointâ€™s incoming requests and outgoing responses. The current approach is to include a micro-service in between the endpoint and the user interface to write logs to a remote server.

Which Databricks feature should they use instead which will perform the same task?

Vector Search

Lakeview

DBSQL

Inference Tables

Full Access

Weekend Sale - Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: mxmas70

MyExamCollection

Databricks-Generative-AI-Engineer-Associate Databricks Certified Generative AI Engineer Associate Question and Answers

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Answer:

Explanation:

Quick Links

Why Us

Unlimited Packages

Site Secure

We Accept