Curious about Actual Databricks Certified Data Engineer Professional Exam Questions?

Here are sample Databricks Certified Data Engineer Professional (Databricks-Certified-Professional-Data-Engineer) exam questions from the real exam. You can get more premium practice questions for the Databricks Data Engineer Professional (Databricks-Certified-Professional-Data-Engineer) exam at TestInsights.

Page: 1 / 24
Total 120 questions
Question 1

A Delta Lake table in the Lakehouse named customer_parsams is used by the machine learning team for churn prediction. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting it with the current valid values derived from those upstream sources.

Immediately after each update succeeds, the data engineering team would like to determine the difference between the new version of the table and the previous one.

Given the current implementation, which method can be used?


Correct: C

Delta Lake provides built-in versioning and time travel capabilities, allowing users to query previous snapshots of a table. This feature is particularly useful for understanding changes between different versions of the table. In this scenario, where the table is overwritten nightly, you can use Delta Lake's time travel feature to execute a query comparing the latest version of the table (the current state) with its previous version. This approach effectively identifies the differences (such as new, updated, or deleted records) between the two versions. The other options do not provide a straightforward or efficient way to directly compare different versions of a Delta Lake table.


Delta Lake Documentation on Time Travel: Delta Time Travel

Delta Lake Versioning: Delta Lake Versioning Guide
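The comparison described above can be sketched in PySpark using time travel reads. This is illustrative only: it assumes an active Databricks cluster with a `spark` session, and the version numbers are placeholders (the real ones come from DESCRIBE HISTORY on the table).

```python
# Sketch only -- requires a Databricks/Delta Lake cluster; versions 1 and 2
# are placeholders for the values reported by DESCRIBE HISTORY.
latest   = spark.read.option("versionAsOf", 2).table("customer_parsams")
previous = spark.read.option("versionAsOf", 1).table("customer_parsams")

new_or_changed = latest.exceptAll(previous)   # rows added or updated by the overwrite
removed        = previous.exceptAll(latest)   # rows no longer present after the overwrite
```

Because the table is fully overwritten each night, the pair of `exceptAll` calls captures both directions of the diff between the two snapshots.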

Question 2

A data engineer is performing a join operation to combine values from a static userlookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?


Correct: E

In Spark Structured Streaming, certain joins between a static DataFrame and a streaming DataFrame are not supported: specifically, any join that must preserve unmatched rows from the static side. A left outer join with the static DataFrame on the left, a right outer join with the static DataFrame on the right, and a full outer join are all invalid, because Spark would have to wait indefinitely for future streaming rows that might match the static rows before it could emit the unmatched ones. Inner joins are always supported, as are outer joins that preserve the streaming side (for example, a left outer join with the streaming DataFrame on the left).


Structured Streaming Programming Guide: Join Operations

Databricks Documentation on Stream-Static Joins: Databricks Stream-Static Joins
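The supported and unsupported cases can be sketched as follows. This is illustrative only: it assumes a running Spark session, the DataFrames from the question, and a hypothetical join key `user_id`.

```python
# Sketch only -- assumes `streamingDF` (streaming) and `userlookup` (static)
# already exist in a Spark session; `user_id` is a hypothetical join column.

# Supported: inner join, and left outer join preserving the streaming side.
enriched = streamingDF.join(userlookup, "user_id", "left")

# Invalid: a right outer join here would preserve the static side, which
# Spark Structured Streaming rejects with an AnalysisException at start.
broken = streamingDF.join(userlookup, "user_id", "right_outer")
```

The invalid form fails when the query starts rather than at runtime, because Spark can determine from the plan alone that unmatched static rows could never be safely emitted.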

Question 3

Which statement describes the default execution mode for Databricks Auto Loader?


Correct: A

Databricks Auto Loader simplifies and automates the process of loading data into Delta Lake. The default execution mode of the Auto Loader identifies new files by listing the input directory. It incrementally and idempotently loads these new files into the target Delta Lake table. This approach ensures that files are not missed and are processed exactly once, avoiding data duplication. The other options describe different mechanisms or integrations that are not part of the default behavior of the Auto Loader.


Databricks Auto Loader Documentation: Auto Loader Guide

Delta Lake and Auto Loader: Delta Lake Integration
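A minimal Auto Loader pipeline in its default directory-listing mode might look like the following sketch. It requires a Databricks cluster; the input path, schema location, checkpoint location, and target table name are all hypothetical placeholders.

```python
# Sketch only -- runs on Databricks; all paths and the table name below
# are hypothetical placeholders.
(spark.readStream
    .format("cloudFiles")                              # Auto Loader source
    .option("cloudFiles.format", "json")               # format of the incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schema")
    .load("/data/input")                               # default mode: list this directory for new files
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoint")   # checkpoint guarantees exactly-once processing
    .toTable("target_table"))
```

The checkpoint is what makes the load incremental and idempotent: files already recorded there are skipped on subsequent runs.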

Question 4

Spill occurs as a result of executing various wide transformations. Diagnosing spill, however, requires proactively looking for key indicators.

Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?


Correct: B

In Apache Spark's UI, indicators of data spilling to disk during the execution of wide transformations can be found in the Stage's detail screen and the Query's detail screen. These screens provide detailed metrics about each stage of a Spark job, including information about memory usage and spill data. If a task is spilling data to disk, it indicates that the data being processed exceeds the available memory, causing Spark to spill data to disk to free up memory. This is an important performance metric as excessive spill can significantly slow down the processing.


Apache Spark Monitoring and Instrumentation: Spark Monitoring Guide

Spark UI Explained: Spark UI Documentation

Question 5

What is the first line of a Databricks Python notebook when viewed in a text editor?


Correct: B

When a Databricks Python notebook is viewed in a text editor, its first line is # Databricks notebook source. This is a special comment marker, not a magic command, that identifies the file as a Databricks notebook source file; Databricks uses it when importing the file to reconstruct the notebook and its cells.
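As a small illustration, the marker can be detected with plain Python. The helper function below is hypothetical, not part of any Databricks API.

```python
# Hypothetical helper: check whether exported text begins with the
# Databricks notebook-source marker on its first line.
DATABRICKS_HEADER = "# Databricks notebook source"

def is_databricks_notebook_source(text: str) -> bool:
    """Return True if the text starts with the Databricks notebook marker."""
    first_line = text.splitlines()[0] if text else ""
    return first_line.strip() == DATABRICKS_HEADER

print(is_databricks_notebook_source("# Databricks notebook source\nprint('hi')"))  # prints True
```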

