1. Home
  2. Databricks
  3. Databricks-Certified-Data-Engineer-Associate Exam Info
  4. Databricks-Certified-Data-Engineer-Associate Exam Questions

Curious about Actual Databricks Certified Data Engineer Associate Exam Questions?

Here are sample Databricks Certified Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) Exam questions from real exam. You can get more Databricks Data Engineer Associate (Databricks-Certified-Data-Engineer-Associate) Exam premium practice questions at TestInsights.

Page: 1 /
Total 100 questions
Question 1

Which of the following commands will return the number of null values in the member_id column?


Correct : C

To return the number of null values in the member_id column, the best option is to use the count_if function, which counts the number of rows that satisfy a given condition. In this case, the condition is that the member_id column is null. The other options are either incorrect or not supported by Spark SQL. Option A will return the number of non-null values in the member_id column. Option B will not work because there is no count_null function in Spark SQL. Option D will not work because there is no null function in Spark SQL. Option E will not work because there is no count_null function in Spark SQL.Reference:

Built-in Functions - Spark SQL, Built-in Functions

count_if - Spark SQL, Built-in Functions


Options Selected by Other Users:
Mark Question:

Start a Discussions

Submit Your Answer:
0 / 1500
Question 2

Which of the following must be specified when creating a new Delta Live Tables pipeline?


Correct : E

Option E is the correct answer because it is the only mandatory requirement when creating a new Delta Live Tables pipeline. A pipeline is a data processing workflow that contains materialized views and streaming tables declared in Python or SQL source files. Delta Live Tables infers the dependencies between these tables and ensures updates occur in the correct order. To create a pipeline, you need to specify at least one notebook library to be executed, which contains the Delta Live Tables syntax. You can also specify multiple libraries of different languages within your pipeline. The other options are optional or not applicable for creating a pipeline. Option A is not required, but you can optionally provide a key-value pair configuration to customize the pipeline settings, such as the storage location, the target schema, the notifications, and the pipeline mode. Option B is not applicable, as the DBU/hour cost is determined by the cluster configuration, not the pipeline creation. Option C is not required, but you can optionally specify a storage location for the output data from the pipeline. If you leave it empty, the system uses a default location. Option D is not required, but you can optionally specify a location of a target database for the written data, either in the Hive metastore or the Unity Catalog.


Options Selected by Other Users:
Mark Question:

Start a Discussions

Submit Your Answer:
0 / 1500
Question 3

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?


Correct : D

In Databricks, you can use different languages within the same notebook by using magic commands. Magic commands are special commands that start with a percentage sign (%) and allow you to change the behavior of the cell. To use SQL within a cell of a Python notebook, you can add %sql to the first line of the cell. This will tell Databricks to interpret the rest of the cell as SQL code and execute it against the default database. You can also specify a different database by using the USE statement. The result of the SQL query will be displayed as a table or a chart, depending on the output mode. You can also assign the result to a Python variable by using the -o option. For example, %sql -o df SELECT * FROM my_table will run the SQL query and store the result as a pandas DataFrame in the Python variable df. Option A is incorrect, as it is possible to use SQL in a Python notebook using magic commands. Option B is incorrect, as attaching the cell to a SQL endpoint is not necessary and will not change the language of the cell. Option C is incorrect, as simply writing SQL syntax in the cell will result in a syntax error, as the cell will still be interpreted as Python code. Option E is incorrect, as changing the default language of the notebook to SQL will affect all the cells, not just one.Reference:Use SQL in Notebooks - Knowledge Base - Noteable, [SQL magic commands - Databricks], [Databricks SQL Guide - Databricks]


Options Selected by Other Users:
Mark Question:

Start a Discussions

Submit Your Answer:
0 / 1500
Question 4

Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?


Correct : E

One of the benefits of the Databricks Lakehouse Platform embracing open source technologies is that it avoids vendor lock-in. This means that customers can use the same open source tools and frameworks across different cloud providers, and migrate their data and workloads without being tied to a specific vendor. The Databricks Lakehouse Platform is built on open source projects such as Apache Spark, Delta Lake, MLflow, and Redash, which are widely used and trusted by millions of developers. By supporting these open source technologies, the Databricks Lakehouse Platform enables customers to leverage the innovation and community of the open source ecosystem, and avoid the risk of being locked into proprietary or closed solutions. The other options are either not related to open source technologies (A, B, C, D), or not benefits of the Databricks Lakehouse Platform (A, B).Reference:Databricks Documentation - Built on open source,Databricks Documentation - What is the Lakehouse Platform?,Databricks Blog - Introducing the Databricks Lakehouse Platform.


Options Selected by Other Users:
Mark Question:

Start a Discussions

Submit Your Answer:
0 / 1500
Question 5

Which tool is used by Auto Loader to process data incrementally?


Correct : A

Auto Loader in Databricks utilizes Spark Structured Streaming for processing data incrementally. This allows Auto Loader to efficiently ingest streaming or batch data at scale and to recognize new data as it arrives in cloud storage. Spark Structured Streaming provides the underlying engine that supports various incremental data loading capabilities like schema inference and file notification mode, which are crucial for the dynamic nature of data lakes.

Reference: Databricks documentation on Auto Loader: Auto Loader Overview


Options Selected by Other Users:
Mark Question:

Start a Discussions

Submit Your Answer:
0 / 1500
Page:    1 / 20   
Total 100 questions