Question 1

Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

Accepted Answer

A)  Weights 
B)  Biases 
C)  Continuous features 
D)  Input values 
A)  Weights 
B)  Biases 
C)  Continuous features 
D)  Input values

Question 2

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

Accepted Answer

A)  Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query. 
B)  Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query. 
C)  Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query. 
D)  Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query. 
A)  Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query. 
B)  Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query. 
C)  Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query. 
D)  Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.

Question 3

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

Accepted Answer

A)  Continuously retrain the model on just the new data. 
B)  Continuously retrain the model on a combination of existing data and the new data. 
C)  Train on the existing data while using the new data as your test set. 
D)  Train on the new data while using the existing data as your test set. 
A)  Continuously retrain the model on just the new data. 
B)  Continuously retrain the model on a combination of existing data and the new data. 
C)  Train on the existing data while using the new data as your test set. 
D)  Train on the new data while using the existing data as your test set.

Question 4

You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage, processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery. How should you securely run this workload?

Accepted Answer

A)  Restrict the Google Cloud Storage bucket so only you can see the files 
B)  Grant the Project Owner role to a service account, and run the job with it 
C)  Use a service account with the ability to read the batch files and to write to BigQuery 
D)  Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery 
A)  Restrict the Google Cloud Storage bucket so only you can see the files 
B)  Grant the Project Owner role to a service account, and run the job with it 
C)  Use a service account with the ability to read the batch files and to write to BigQuery 
D)  Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery

Question 5

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

Accepted Answer

A)  Use bq load to load a batch of sensor data every 60 seconds. Use bq load to load a batch of sensor data every 60 seconds. 
B)  Use a Cloud Dataflow pipeline to stream data into the BigQuery table. 
C)  Use the INSERT statement to insert a batch of data every 60 seconds. 
D)  Use the MERGE statement to apply updates in batch every 60 seconds. 
A)  Use bq load to load a batch of sensor data every 60 seconds. Use bq load to load a batch of sensor data every 60 seconds. 
B)  Use a Cloud Dataflow pipeline to stream data into the BigQuery table. 
C)  Use the INSERT statement to insert a batch of data every 60 seconds. 
D)  Use the MERGE statement to apply updates in batch every 60 seconds.

Question 6

You have a data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy of this data that minimizes cost. How should you configure the BigQuery table?

Accepted Answer

A)  Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data. 
B)  Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table. 
C)  Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data. 
D)  Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table. 
A)  Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data. 
B)  Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table. 
C)  Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data. 
D)  Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

Question 7

Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to periodically identify the inputs and their timings during their campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for the all streaming insert. What is the most likely cause of this problem?

Accepted Answer

A)  They have not assigned the timestamp, which causes the job to fail 
B)  They have not set the triggers to accommodate the data coming in late, which causes the job to fail 
C)  They have not applied a global windowing function, which causes the job to fail when the pipeline is created 
D)  They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created 
A)  They have not assigned the timestamp, which causes the job to fail 
B)  They have not set the triggers to accommodate the data coming in late, which causes the job to fail 
C)  They have not applied a global windowing function, which causes the job to fail when the pipeline is created 
D)  They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created

Question 8

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

Accepted Answer

A)  An hourly watermark 
B)  An event time trigger 
C)  The with Allowed Lateness method 
D)  A processing time trigger 
A)  An hourly watermark 
B)  An event time trigger 
C)  The with Allowed Lateness method 
D)  A processing time trigger

Question 9

You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query - -dry_run you learn that the query triggers a full scan of the table, even though the filter on timestamp and ID select a tiny fraction of the overall data. You want to reduce the amount of data scanned by BigQuery with minimal changes to existing SQL queries. What should you do?

Accepted Answer

A)  Create a separate table for each ID. 
B)  Use the LIMIT keyword to reduce the number of rows returned. 
C)  Recreate the table with a partitioning column and clustering column. 
D)  Use the bq query - -maximum_bytes_billed flag to restrict the number of bytes billed. Use the bq query - -maximum_bytes_billed flag to restrict the number of bytes billed. 
A)  Create a separate table for each ID. 
B)  Use the LIMIT keyword to reduce the number of rows returned. 
C)  Recreate the table with a partitioning column and clustering column. 
D)  Use the bq query - -maximum_bytes_billed flag to restrict the number of bytes billed. Use the bq query - -maximum_bytes_billed flag to restrict the number of bytes billed.

Question 10

Which is not a valid reason for poor Cloud Bigtable performance?

Accepted Answer

A)  The workload isn't appropriate for Cloud Bigtable. 
B)  The table's schema is not designed correctly. 
C)  The Cloud Bigtable cluster has too many nodes. 
D)  There are issues with the network connection. 
A)  The workload isn't appropriate for Cloud Bigtable. 
B)  The table's schema is not designed correctly. 
C)  The Cloud Bigtable cluster has too many nodes. 
D)  There are issues with the network connection.

Question 11

Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?

Accepted Answer

A)  Migrate the workload to Google Cloud Dataflow 
B)  Use pre-emptible virtual machines (VMs) for the cluster 
C)  Use a higher-memory node so that the job runs faster 
D)  Use SSDs on the worker nodes so that the job can run faster 
A)  Migrate the workload to Google Cloud Dataflow 
B)  Use pre-emptible virtual machines (VMs) for the cluster 
C)  Use a higher-memory node so that the job runs faster 
D)  Use SSDs on the worker nodes so that the job can run faster

Question 12

You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a single worker. The pipeline should write enriched results to BigQuery for analysis. Which job type and transforms should this pipeline use?

Accepted Answer

A)  Batch job, PubSubIO, side-inputs 
B)  Streaming job, PubSubIO, JdbcIO, side-outputs 
C)  Streaming job, PubSubIO, BigQueryIO, side-inputs 
D)  Streaming job, PubSubIO, BigQueryIO, side-outputs 
A)  Batch job, PubSubIO, side-inputs 
B)  Streaming job, PubSubIO, JdbcIO, side-outputs 
C)  Streaming job, PubSubIO, BigQueryIO, side-inputs 
D)  Streaming job, PubSubIO, BigQueryIO, side-outputs

Question 13

When running a pipeline that has a BigQuery source, on your local machine, you continue to get permission denied errors. What could be the reason for that?

Accepted Answer

A)  Your gcloud does not have access to the BigQuery resources 
B)  BigQuery cannot be accessed from local machines 
C)  You are missing gcloud on your machine 
D)  Pipelines cannot be run locally 
A)  Your gcloud does not have access to the BigQuery resources 
B)  BigQuery cannot be accessed from local machines 
C)  You are missing gcloud on your machine 
D)  Pipelines cannot be run locally

Question 14

Which of the following statements about Legacy SQL and Standard SQL is not true?

Accepted Answer

A)  Standard SQL is the preferred query language for BigQuery. 
B)  If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL. 
C)  One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name). 
D)  You need to set a query language for each dataset and the default is Standard SQL. 
A)  Standard SQL is the preferred query language for BigQuery. 
B)  If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL. 
C)  One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name). 
D)  You need to set a query language for each dataset and the default is Standard SQL.

Question 15

You are choosing a NoSQL database to handle telemetry data submitted from millions of Internet-of-Things (IoT) devices. The volume of data is growing at 100 TB per year, and each data entry has about 100 attributes. The data processing pipeline does not require atomicity, consistency, isolation, and durability (ACID). However, high availability and low latency are required. You need to analyze the data by querying against individual fields. Which three databases meet your requirements? (Choose three.)

Accepted Answer

A)  Redis 
B)  HBase 
C)  MySQL 
D)  MongoDB 
E)  Cassandra
F) HDFS with Hive 
A)  Redis 
B)  HBase 
C)  MySQL 
D)  MongoDB 
E)  Cassandra
F) HDFS with Hive

Question 16

What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

Accepted Answer

A)  Include multiple time series values within the row key 
B)  Keep the row keep as an 8 bit integer 
C)  Keep your row key reasonably short 
D)  Keep your row key as long as the field permits 
A)  Include multiple time series values within the row key 
B)  Keep the row keep as an 8 bit integer 
C)  Keep your row key reasonably short 
D)  Keep your row key as long as the field permits

Question 17

Flowlogistic Case Study Company Overview Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping. Company Background The company started as a regional trucking company, and then expanded into other logistics market. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources. Solution Concept Flowlogistic wants to implement two concepts using the cloud: Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources, which markets to expand info. They also want to use predictive analytics to learn earlier when a shipment will be delayed. Existing Technical Environment Flowlogistic architecture resides in a single data center: Databases 8 physical servers in 2 clusters - SQL Server - user data, inventory, static data 3 physical servers - Cassandra - metadata, tracking messages 10 Kafka servers - tracking message aggregation and batch insert Application servers - customer front end, middleware for order/customs 60 virtual machines across 20 physical servers - Tomcat - Java services - Nginx - static content - Batch servers Storage appliances - iSCSI for virtual machine (VM) hosts - Fibre Channel storage area network (FC SAN) - SQL server storage - Network-attached storage (NAS) image storage, logs, backups 10 Apache Hadoop /Spark servers - Core Data Lake - Data analysis workloads 20 miscellaneous servers - Jenkins, monitoring, bastion hosts, Business Requirements Build a reliable and reproducible environment with scaled panty of production. Aggregate data in a centralized Data Lake for analysis Use historical data to perform predictive analytics on future shipments Accurately track every shipment worldwide using proprietary technology Improve business agility and speed of innovation through rapid provisioning of new resources Analyze and optimize architecture for performance in the cloud Migrate fully to the cloud if all other requirements are met Technical Requirements Handle both streaming and batch data Migrate existing Hadoop workloads Ensure architecture is scalable and elastic to meet the changing demands of the company. Use managed services whenever possible Encrypt data flight and at rest Connect a VPN between the production data center and cloud environment SEO Statement We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around. We need to organize our information so we can more easily understand where our customers are and what they are shipping. CTO Statement IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO' s tracking technology. CFO Statement Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where out shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment. Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-tracking messages, which will now go to a single Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the messages for real-time reporting and store them in Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time. Which approach should you take?

Accepted Answer

A)  Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received. 
B)  Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Clod Pub/Sub. 
C)  Use the NOW () function in BigQuery to record the event's time. 
D)  Use the automatically generated timestamp from Cloud Pub/Sub to order the data. 
A)  Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are received. 
B)  Attach the timestamp and Package ID on the outbound message from each publisher device as they are sent to Clod Pub/Sub. 
C)  Use the NOW () function in BigQuery to record the event's time. 
D)  Use the automatically generated timestamp from Cloud Pub/Sub to order the data.

Question 18

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

Accepted Answer

A)  Dataproc Worker 
B)  Dataproc Viewer 
C)  Dataproc Runner 
D)  Dataproc Editor 
A)  Dataproc Worker 
B)  Dataproc Viewer 
C)  Dataproc Runner 
D)  Dataproc Editor

Question 19

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

Accepted Answer

A)  Store and process the entire dataset in BigQuery. 
B)  Store and process the entire dataset in Cloud Bigtable. 
C)  Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket. 
D)  Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active. 
A)  Store and process the entire dataset in BigQuery. 
B)  Store and process the entire dataset in Cloud Bigtable. 
C)  Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket. 
D)  Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.

Question 20

You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into BigQuery. For security reasons, you decide to redact your customers' Government issued Identification Number while allowing customer service representatives to view the original values when necessary. What should you do?

Accepted Answer

A)  Use BigQuery's built-in AEAD encryption to encrypt the SSN column. Save the keys to a new table that is only viewable by permissioned users. 
B)  Use BigQuery column-level security. Set the table permissions so that only members of the Customer Service user group can see the SSN column. 
C)  Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic hash. 
D)  Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token. 
A)  Use BigQuery's built-in AEAD encryption to encrypt the SSN column. Save the keys to a new table that is only viewable by permissioned users. 
B)  Use BigQuery column-level security. Set the table permissions so that only members of the Customer Service user group can see the SSN column. 
C)  Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic hash. 
D)  Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token.

Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

You have a data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy of this data that minimizes cost. How should you configure the BigQuery table?

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

Which is not a valid reason for poor Cloud Bigtable performance?

When running a pipeline that has a BigQuery source, on your local machine, you continue to get permission denied errors. What could be the reason for that?

Which of the following statements about Legacy SQL and Standard SQL is not true?

What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

Google AdWords: Display Advertising

Google AdWords Fundamentals

Associate Android Developer

Associate Cloud Engineer

Cloud Digital Leader

Google Analytics Individual Qualification (IQ)

Google Analytics Individual Qualification

GSuite

Looker Business Analyst

LookML Developer

Mobile Web Specialist

Professional Cloud Architect on Google Cloud Platform

Professional Cloud Developer

Professional Cloud DevOps Engineer

Professional Cloud Network Engineer

Professional Cloud Security Engineer

Professional Collaboration Engineer

Professional Machine Learning Engineer

Filters

Exam 18: Professional Data Engineer on Google Cloud Platform

Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available within 1 minute of ingestion for real-time analysis of aggregated trends. What should you do?

You have a data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy of this data that minimizes cost. How should you configure the BigQuery table?

Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source every hour based on the time when the data entered the pipeline?

Which is not a valid reason for poor Cloud Bigtable performance?

When running a pipeline that has a BigQuery source, on your local machine, you continue to get permission denied errors. What could be the reason for that?

Which of the following statements about Legacy SQL and Standard SQL is not true?

What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

Google AdWords: Display Advertising

Google AdWords Fundamentals

Associate Android Developer

Associate Cloud Engineer

Cloud Digital Leader

Google Analytics Individual Qualification (IQ)

Google Analytics Individual Qualification

GSuite

Looker Business Analyst

LookML Developer

Mobile Web Specialist

Professional Cloud Architect on Google Cloud Platform

Professional Cloud Developer

Professional Cloud DevOps Engineer

Professional Cloud Network Engineer

Professional Cloud Security Engineer

Professional Collaboration Engineer

Professional Machine Learning Engineer

Filters