Exam 18: Professional Data Engineer on Google Cloud Platform

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30-90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?

(Multiple Choice)

4.9/5

(31)

Question 101

What are two of the benefits of using denormalized data structures in BigQuery?

(Multiple Choice)

5.0/5

(40)

Question 102

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

(Multiple Choice)

4.8/5

(44)

Question 103

Cloud Dataproc charges you only for what you really use with _____ billing.

(Multiple Choice)

4.8/5

(42)

Question 104

You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?

(Multiple Choice)

4.7/5

(36)

Question 105

You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL. What should you do?

(Multiple Choice)

4.8/5

(25)

Question 106

The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types of cluster nodes?

(Multiple Choice)

4.8/5

(34)

Question 107

As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? (Choose two.)

(Multiple Choice)

4.9/5

(30)

Question 108

To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what would your command start with?

(Multiple Choice)

4.8/5

(33)

Question 109

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for a new key features in the logs. BigQueryIO.Read .named("ReadLogData") .from("clouddataflow-readonly:samples.log_data") You want to improve the performance of this data read. What should you do?

(Multiple Choice)

4.9/5

(36)

Question 110

You're using Bigtable for a real-time application, and you have a heavy load that is a mix of read and writes. You've recently identified an additional use case and need to perform hourly an analytical job to calculate certain statistics across the whole database. You need to ensure both the reliability of your production application as well as the analytical workload. What should you do?

(Multiple Choice)

4.8/5

(36)

Question 111

You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregation to a Cloud Storage bucket. The source has consistent throughput. You want to monitor an alert on behavior of the pipeline with Cloud Stackdriver to ensure that it is processing data. Which Stackdriver alerts should you create?

(Multiple Choice)

4.8/5

(34)

Question 112

You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to generate approximately 150 GB of JSON data per day by the end of the year. Your requirements are: Decoupling producer from consumer Space and cost-efficient storage of the raw ingested data, which is to be stored indefinitely Near real-time SQL query Maintain at least 2 years of historical data, which will be queried with SQL Which pipeline should you use to meet these requirements?

(Multiple Choice)

4.8/5

(28)

Question 113

You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps. You have the following requirements: You will batch-load the posts once per day and run them through the Cloud Natural Language API. You will extract topics and sentiment from the posts. You must store the raw posts for archiving and reprocessing. You will create dashboards to be shared with people both inside and outside your organization. You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

(Multiple Choice)

4.8/5

(33)

Question 114

Which of the following is NOT one of the three main types of triggers that Dataflow supports?

(Multiple Choice)

4.8/5

(33)

Question 115

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

(Multiple Choice)

4.8/5

(36)

Question 116

The Dataflow SDKs have been recently transitioned into which Apache service?

(Multiple Choice)

4.9/5

(30)

Question 117

You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie' the property 'actors' and the property 'tags' have multiple values but the property 'date released' does not. A typical query would ask for all movies with actor=<actorname> ordered by date _ released or all movies with tag=Comedy date_released. How should you avoid a combinatorial explosion in the number of indexes?

(Multiple Choice)

4.8/5

(36)

Question 118

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour. How should you design the solution?

(Multiple Choice)

4.8/5

(36)

Question 119

You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?

(Multiple Choice)

4.9/5

(33)

Question 120

Showing 101 - 120 of 256

What are two of the benefits of using denormalized data structures in BigQuery?

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

Cloud Dataproc charges you only for what you really use with _____ billing.

You need to move 2 PB of historical data from an on-premises storage appliance to Cloud Storage within six months, and your outbound network capacity is constrained to 20 Mb/sec. How should you migrate this data to Cloud Storage?

You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL. What should you do?

The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types of cluster nodes?

To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what would your command start with?

Which of the following is NOT one of the three main types of triggers that Dataflow supports?

You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

The Dataflow SDKs have been recently transitioned into which Apache service?

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour. How should you design the solution?

Google AdWords: Display Advertising

Google AdWords Fundamentals

Associate Android Developer

Associate Cloud Engineer

Cloud Digital Leader

Google Analytics Individual Qualification (IQ)

Google Analytics Individual Qualification

GSuite

Looker Business Analyst

LookML Developer

Mobile Web Specialist

Professional Cloud Architect on Google Cloud Platform

Professional Cloud Developer

Professional Cloud DevOps Engineer

Professional Cloud Network Engineer

Professional Cloud Security Engineer

Professional Collaboration Engineer

Professional Machine Learning Engineer

Filters