Home
Amazon
MLS-C01 Exam Info
MLS-C01 Exam Questions

Curious about Actual Amazon Specialty (MLS-C01) Exam Questions?

Here are sample Amazon AWS Certified Machine Learning - Specialty (MLS-C01) Exam questions from real exam. You can get more Amazon Specialty (MLS-C01) Exam premium practice questions at TestInsights.

Page: 1 /
Total 324 questions

Want more questions? Get Premium Access.

Question 1

A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model did not generalize well during validation of the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

APerform stratified sampling on the original dataset.

BAcquire additional data about the majority classes in the original dataset.

CUse a smaller, randomly sampled version of the training dataset.

DPerform systematic sampling on the original dataset.

Correct : A

Stratified sampling is a technique that preserves the class distribution of the original dataset when creating a smaller or split dataset. This means that the proportion of examples from each class in the original dataset is maintained in the smaller or split dataset. Stratified sampling can help improve the validation accuracy of the model by ensuring that the validation dataset is representative of the original dataset and not biased towards any class. This can reduce the variance and overfitting of the model and increase its generalization ability. Stratified sampling can be applied to both oversampling and undersampling methods, depending on whether the goal is to increase or decrease the size of the dataset.

The other options are not effective ways to improve the validation accuracy of the model. Acquiring additional data about the majority classes in the original dataset will only increase the imbalance and make the model more biased towards the majority classes. Using a smaller, randomly sampled version of the training dataset will not guarantee that the class distribution is preserved and may result in losing important information from the minority classes. Performing systematic sampling on the original dataset will also not ensure that the class distribution is preserved and may introduce sampling bias if the original dataset is ordered or grouped by class.

References:

* Stratified Sampling for Imbalanced Datasets

* Imbalanced Data

* Tour of Data Sampling Methods for Imbalanced Classification

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

APerform stratified sampling on the original dataset.

BAcquire additional data about the majority classes in the original dataset.

CUse a smaller, randomly sampled version of the training dataset.

DPerform systematic sampling on the original dataset.

0 / 1500

Question 2

A data scientist is trying to improve the accuracy of a neural network classification model. The data scientist wants to run a large hyperparameter tuning job in Amazon SageMaker.

However, previous smaller tuning jobs on the same model often ran for several weeks. The ML specialist wants to reduce the computation time required to run the tuning job.

Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Select TWO.)

AUse the Hyperband tuning strategy.

BIncrease the number of hyperparameters.

CSet a lower value for the MaxNumberOfTrainingJobs parameter.

DUse the grid search tuning strategy

ESet a lower value for the MaxParallelTrainingJobs parameter.

Correct : A, C

The Hyperband tuning strategy is a multi-fidelity based tuning strategy that dynamically reallocates resources to the most promising hyperparameter configurations. Hyperband uses both intermediate and final results of training jobs to stop under-performing jobs and reallocate epochs to well-utilized hyperparameter configurations. Hyperband can provide up to three times faster hyperparameter tuning compared to other strategies1. Setting a lower value for the MaxNumberOfTrainingJobs parameter can also reduce the computation time for the hyperparameter tuning job by limiting the number of training jobs that the tuning job can launch. This can help avoid unnecessary or redundant training jobs that do not improve the objective metric.

The other options are not effective ways to reduce the computation time for the hyperparameter tuning job. Increasing the number of hyperparameters will increase the complexity and dimensionality of the search space, which can result in longer computation time and lower performance. Using the grid search tuning strategy will also increase the computation time, as grid search methodically searches through every combination of hyperparameter values, which can be very expensive and inefficient for large search spaces. Setting a lower value for the MaxParallelTrainingJobs parameter will reduce the number of training jobs that can run in parallel, which can slow down the tuning process and increase the waiting time.

References:

* How Hyperparameter Tuning Works

* Best Practices for Hyperparameter Tuning

* HyperparameterTuner

* Amazon SageMaker Automatic Model Tuning now provides up to three times faster hyperparameter tuning with Hyperband

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AUse the Hyperband tuning strategy.

BIncrease the number of hyperparameters.

CSet a lower value for the MaxNumberOfTrainingJobs parameter.

DUse the grid search tuning strategy

ESet a lower value for the MaxParallelTrainingJobs parameter.

0 / 1500

Question 3

A company is setting up a mechanism for data scientists and engineers from different departments to access an Amazon SageMaker Studio domain. Each department has a unique SageMaker Studio domain.

The company wants to build a central proxy application that data scientists and engineers can log in to by using their corporate credentials. The proxy application will authenticate users by using the company's existing Identity provider (IdP). The application will then route users to the appropriate SageMaker Studio domain.

The company plans to maintain a table in Amazon DynamoDB that contains SageMaker domains for each department.

How should the company meet these requirements?

AUse the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table. Pass the presigned URL to the proxy application.

BUse the SageMaker CreateHuman TaskUi API to generate a UI URL. Pass the URL to the proxy application.

CUse the Amazon SageMaker ListHumanTaskUis API to list all UI URLs. Pass the appropriate URL to the DynamoDB table so that the proxy application can use the URL.

DUse the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL. Pass the presigned URL to the proxy application.

Correct : A

The SageMaker CreatePresignedDomainUrl API is the best option to meet the requirements of the company. This API creates a URL for a specified UserProfile in a Domain. When accessed in a web browser, the user will be automatically signed in to the domain, and granted access to all of the Apps and files associated with the Domain's Amazon Elastic File System (EFS) volume. This API can only be called when the authentication mode equals IAM, which means the company can use its existing IdP to authenticate users. The company can use the DynamoDB table to store the domain IDs and user profile names for each department, and use the proxy application to query the table and generate the presigned URL for the appropriate domain according to the user's credentials. The presigned URL is valid only for a specified duration, which can be set by the SessionExpirationDurationInSeconds parameter. This can help enhance the security and prevent unauthorized access to the domains.

The other options are not suitable for the company's requirements. The SageMaker CreateHumanTaskUi API is used to define the settings for the human review workflow user interface, which is not related to accessing the SageMaker Studio domains. The SageMaker ListHumanTaskUis API is used to return information about the human task user interfaces in the account, which is also not relevant to the company's use case. The SageMaker CreatePresignedNotebookInstanceUrl API is used to create a URL to connect to the Jupyter server from a notebook instance, which is different from accessing the SageMaker Studio domain.

References:

* CreatePresignedDomainUrl

* CreatePresignedNotebookInstanceUrl

* CreateHumanTaskUi

* ListHumanTaskUis

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AUse the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table. Pass the presigned URL to the proxy application.

BUse the SageMaker CreateHuman TaskUi API to generate a UI URL. Pass the URL to the proxy application.

CUse the Amazon SageMaker ListHumanTaskUis API to list all UI URLs. Pass the appropriate URL to the DynamoDB table so that the proxy application can use the URL.

DUse the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL. Pass the presigned URL to the proxy application.

0 / 1500

Question 4

A global bank requires a solution to predict whether customers will leave the bank and choose another bank. The bank is using a dataset to train a model to predict customer loss. The training dataset has 1,000 rows. The training dataset includes 100 instances of customers who left the bank.

A machine learning (ML) specialist is using Amazon SageMaker Data Wrangler to train a churn prediction model by using a SageMaker training job. After training, the ML specialist notices that the model returns only false results. The ML specialist must correct the model so that it returns more accurate predictions.

Which solution will meet these requirements?

AApply anomaly detection to remove outliers from the training dataset before training.

BApply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.

CApply normalization to the features of the training dataset before training.

DApply undersampling to the training dataset before training.

Correct : B

The best solution to meet the requirements is to apply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training. SMOTE is a technique that generates synthetic samples for the minority class by interpolating between existing samples. This can help balance the class distribution and provide more information to the model. SMOTE can improve the performance of the model on the minority class, which is the class of interest in churn prediction. SMOTE can be applied using the SageMaker Data Wrangler, which provides a built-in analysis for oversampling the minority class1.

The other options are not effective solutions for the problem. Applying anomaly detection to remove outliers from the training dataset before training may not improve the model's accuracy, as outliers may not be the main cause of the false results. Moreover, removing outliers may reduce the diversity of the data and make the model less robust. Applying normalization to the features of the training dataset before training may improve the model's convergence and stability, but it does not address the class imbalance issue. Normalization can also be applied using the SageMaker Data Wrangler, which provides a built-in transformation for scaling the features2. Applying undersampling to the training dataset before training may reduce the class imbalance, but it also discards potentially useful information from the majority class. Undersampling can also result in underfitting and high bias for the model.

References:

* Analyze and Visualize

* Transform and Export

* SMOTE for Imbalanced Classification with Python

* Churn prediction using Amazon SageMaker built-in tabular algorithms LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AApply anomaly detection to remove outliers from the training dataset before training.

BApply Synthetic Minority Oversampling Technique (SMOTE) to the training dataset before training.

CApply normalization to the features of the training dataset before training.

DApply undersampling to the training dataset before training.

0 / 1500

Question 5

A developer at a retail company is creating a daily demand forecasting model. The company stores the historical hourly demand data in an Amazon S3 bucket. However, the historical data does not include demand data for some hours.

The developer wants to verify that an autoregressive integrated moving average (ARIMA) approach will be a suitable model for the use case.

How should the developer verify the suitability of an ARIMA approach?

AUse Amazon SageMaker Data Wrangler. Import the data from Amazon S3. Impute hourly missing data. Perform a Seasonal Trend decomposition.

BUse Amazon SageMaker Autopilot. Create a new experiment that specifies the S3 data location. Choose ARIMA as the machine learning (ML) problem. Check the model performance.

CUse Amazon SageMaker Data Wrangler. Import the data from Amazon S3. Resample data by using the aggregate daily total. Perform a Seasonal Trend decomposition.

DUse Amazon SageMaker Autopilot. Create a new experiment that specifies the S3 data location. Impute missing hourly values. Choose ARIMA as the machine learning (ML) problem. Check the model performance.

Correct : A

The best solution to verify the suitability of an ARIMA approach is to use Amazon SageMaker Data Wrangler. Data Wrangler is a feature of SageMaker Studio that provides an end-to-end solution for importing, preparing, transforming, featurizing, and analyzing data. Data Wrangler includes built-in analyses that help generate visualizations and data insights in a few clicks. One of the built-in analyses is the Seasonal-Trend decomposition, which can be used to decompose a time series into its trend, seasonal, and residual components. This analysis can help the developer understand the patterns and characteristics of the time series, such as stationarity, seasonality, and autocorrelation, which are important for choosing an appropriate ARIMA model. Data Wrangler also provides built-in transformations that can help the developer handle missing data, such as imputing with mean, median, mode, or constant values, or dropping rows with missing values. Imputing missing data can help avoid gaps and irregularities in the time series, which can affect the ARIMA model performance. Data Wrangler also allows the developer to export the prepared data and the analysis code to various destinations, such as SageMaker Processing, SageMaker Pipelines, or SageMaker Feature Store, for further processing and modeling.

The other options are not suitable for verifying the suitability of an ARIMA approach. Amazon SageMaker Autopilot is a feature-set that automates key tasks of an automatic machine learning (AutoML) process. It explores the data, selects the algorithms relevant to the problem type, and prepares the data to facilitate model training and tuning. However, Autopilot does not support ARIMA as a machine learning problem type, and it does not provide any visualization or analysis of the time series data. Resampling data by using the aggregate daily total can reduce the granularity and resolution of the time series, which can affect the ARIMA model accuracy and applicability.

References:

* Analyze and Visualize

* Transform and Export

* Amazon SageMaker Autopilot

* ARIMA Model -- Complete Guide to Time Series Forecasting in Python

Options Selected by Other Users:

Mark Question:

Start a Discussions

Submit Your Answer:

AUse Amazon SageMaker Data Wrangler. Import the data from Amazon S3. Impute hourly missing data. Perform a Seasonal Trend decomposition.

BUse Amazon SageMaker Autopilot. Create a new experiment that specifies the S3 data location. Choose ARIMA as the machine learning (ML) problem. Check the model performance.

CUse Amazon SageMaker Data Wrangler. Import the data from Amazon S3. Resample data by using the aggregate daily total. Perform a Seasonal Trend decomposition.

0 / 1500

Page: 1 / 65
Total 324 questions

Unlock Full
MLS-C01 Exam Features

In Just $49 You can Access

All Official Question Types
Interactive Web-Based Practice Test Software
No Installation or 3rd Party Software Required
Customize your practice sessions (Free Demo)
24/7 Customer Support

Get Full Access Now

Marked Questions
MLS-C01 Exam

MLS-C01 Exam Question 1
MLS-C01 Exam Question 2
MLS-C01 Exam Question 3
MLS-C01 Exam Question 4
MLS-C01 Exam Question 5

Download PDF File Demo

Try Web-Based Exam Practice Software Demo

Commenting

In order to participate in the comments you need to be logged-in.
You can sign-up or login