Home
Databricks
Databricks-Certified-Data-Engineer-Associate Exam Info
Databricks-Certified-Data-Engineer-Associate Exam Questions

Curious about Actual Databricks Certified Data Engineer Associate Exam Questions?

Here are sample Databricks Certified Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) Exam questions from real exam. You can get more Databricks Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) Exam premium practice questions at TestInsights.

Page: 1 /
Total 100 questions

Want more questions? Get Premium Access.

Question 1

Which of the following commands will return the number of null values in the member_id column?

ASELECT count(member_id) FROM my_table;

BSELECT count(member_id) - count_null(member_id) FROM my_table;

CSELECT count_if(member_id IS NULL) FROM my_table;

DSELECT null(member_id) FROM my_table;

ESELECT count_null(member_id) FROM my_table;

Correct : C

To return the number of null values in the member_id column, the best option is to use the count_if function, which counts the number of rows that satisfy a given condition. In this case, the condition is that the member_id column is null. The other options are either incorrect or not supported by Spark SQL. Option A will return the number of non-null values in the member_id column. Option B will not work because there is no count_null function in Spark SQL. Option D will not work because there is no null function in Spark SQL. Option E will not work because there is no count_null function in Spark SQL.Reference:

Built-in Functions - Spark SQL, Built-in Functions

count_if - Spark SQL, Built-in Functions

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ASELECT count(member_id) FROM my_table;

BSELECT count(member_id) - count_null(member_id) FROM my_table;

CSELECT count_if(member_id IS NULL) FROM my_table;

DSELECT null(member_id) FROM my_table;

ESELECT count_null(member_id) FROM my_table;

0 / 1500

Question 2

Which of the following must be specified when creating a new Delta Live Tables pipeline?

AA key-value pair configuration

BThe preferred DBU/hour cost

CA path to cloud storage location for the written data

DA location of a target database for the written data

EAt least one notebook library to be executed

Correct : E

Option E is the correct answer because it is the only mandatory requirement when creating a new Delta Live Tables pipeline. A pipeline is a data processing workflow that contains materialized views and streaming tables declared in Python or SQL source files. Delta Live Tables infers the dependencies between these tables and ensures updates occur in the correct order. To create a pipeline, you need to specify at least one notebook library to be executed, which contains the Delta Live Tables syntax. You can also specify multiple libraries of different languages within your pipeline. The other options are optional or not applicable for creating a pipeline. Option A is not required, but you can optionally provide a key-value pair configuration to customize the pipeline settings, such as the storage location, the target schema, the notifications, and the pipeline mode. Option B is not applicable, as the DBU/hour cost is determined by the cluster configuration, not the pipeline creation. Option C is not required, but you can optionally specify a storage location for the output data from the pipeline. If you leave it empty, the system uses a default location. Option D is not required, but you can optionally specify a location of a target database for the written data, either in the Hive metastore or the Unity Catalog.

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AA key-value pair configuration

BThe preferred DBU/hour cost

CA path to cloud storage location for the written data

DA location of a target database for the written data

EAt least one notebook library to be executed

0 / 1500

Question 3

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

AIt is not possible to use SQL in a Python notebook

BThey can attach the cell to a SQL endpoint rather than a Databricks cluster

CThey can simply write SQL syntax in the cell

DThey can add %sql to the first line of the cell

EThey can change the default language of the notebook to SQL

Correct : D

In Databricks, you can use different languages within the same notebook by using magic commands. Magic commands are special commands that start with a percentage sign (%) and allow you to change the behavior of the cell. To use SQL within a cell of a Python notebook, you can add %sql to the first line of the cell. This will tell Databricks to interpret the rest of the cell as SQL code and execute it against the default database. You can also specify a different database by using the USE statement. The result of the SQL query will be displayed as a table or a chart, depending on the output mode. You can also assign the result to a Python variable by using the -o option. For example, %sql -o df SELECT * FROM my_table will run the SQL query and store the result as a pandas DataFrame in the Python variable df. Option A is incorrect, as it is possible to use SQL in a Python notebook using magic commands. Option B is incorrect, as attaching the cell to a SQL endpoint is not necessary and will not change the language of the cell. Option C is incorrect, as simply writing SQL syntax in the cell will result in a syntax error, as the cell will still be interpreted as Python code. Option E is incorrect, as changing the default language of the notebook to SQL will affect all the cells, not just one.Reference:Use SQL in Notebooks - Knowledge Base - Noteable, [SQL magic commands - Databricks], [Databricks SQL Guide - Databricks]

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AIt is not possible to use SQL in a Python notebook

BThey can attach the cell to a SQL endpoint rather than a Databricks cluster

CThey can simply write SQL syntax in the cell

DThey can add %sql to the first line of the cell

EThey can change the default language of the notebook to SQL

0 / 1500

Question 4

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?

ACloud-specific integrations

BSimplified governance

CAbility to scale storage

DAbility to scale workloads

EAvoiding vendor lock-in

Correct : E

One of the benefits of the Databricks Lakehouse Platform embracing open source technologies is that it avoids vendor lock-in. This means that customers can use the same open source tools and frameworks across different cloud providers, and migrate their data and workloads without being tied to a specific vendor. The Databricks Lakehouse Platform is built on open source projects such as Apache Spark, Delta Lake, MLflow, and Redash, which are widely used and trusted by millions of developers. By supporting these open source technologies, the Databricks Lakehouse Platform enables customers to leverage the innovation and community of the open source ecosystem, and avoid the risk of being locked into proprietary or closed solutions. The other options are either not related to open source technologies (A, B, C, D), or not benefits of the Databricks Lakehouse Platform (A, B).Reference:Databricks Documentation - Built on open source,Databricks Documentation - What is the Lakehouse Platform?,Databricks Blog - Introducing the Databricks Lakehouse Platform.

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ACloud-specific integrations

BSimplified governance

CAbility to scale storage

DAbility to scale workloads

EAvoiding vendor lock-in

0 / 1500

Question 5

Which tool is used by Auto Loader to process data incrementally?

ASpark Structured Streaming

BUnity Catalog

CCheckpointing

DDatabricks SQL

Correct : A

Auto Loader in Databricks utilizes Spark Structured Streaming for processing data incrementally. This allows Auto Loader to efficiently ingest streaming or batch data at scale and to recognize new data as it arrives in cloud storage. Spark Structured Streaming provides the underlying engine that supports various incremental data loading capabilities like schema inference and file notification mode, which are crucial for the dynamic nature of data lakes.

Reference: Databricks documentation on Auto Loader: Auto Loader Overview

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

ASpark Structured Streaming

BUnity Catalog

CCheckpointing

DDatabricks SQL

0 / 1500

Page: 1 / 20
Total 100 questions

Unlock Full
Databricks-Certified-Data-Engineer-Associate Exam Features

In Just $49 You can Access

All Official Question Types
Interactive Web-Based Practice Test Software
No Installation or 3rd Party Software Required
Customize your practice sessions (Free Demo)
24/7 Customer Support

Get Full Access Now

Marked Questions
Databricks-Certified-Data-Engineer-Associate Exam

Databricks-Certified-Data-Engineer-Associate Exam Question 1
Databricks-Certified-Data-Engineer-Associate Exam Question 2
Databricks-Certified-Data-Engineer-Associate Exam Question 3
Databricks-Certified-Data-Engineer-Associate Exam Question 4
Databricks-Certified-Data-Engineer-Associate Exam Question 5

Download PDF File Demo

Try Web-Based Exam Practice Software Demo

Commenting

In order to participate in the comments you need to be logged-in.
You can sign-up or login