Question 1

Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?

Accepted Answer

A)  Field promotion 
B)  Randomization 
C)  Salting 
D)  Hashing 

Question 2

You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?

Accepted Answer

A)  Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console. 
B)  Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console. 
C)  Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console. 
D)  Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console. 

Question 3

Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?

Accepted Answer

A)  Preemptible workers cannot use persistent disk. 
B)  Preemptible workers cannot store data. 
C)  If a preemptible worker is reclaimed, then a replacement worker must be added manually. 
D)  A Dataproc cluster cannot have only preemptible workers. 

Question 4

You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as the moving average over 1 hour drops below 4000 messages per second. What should you do?

Accepted Answer

A)  Consume the stream of data in Cloud Dataflow using Kafka IO. Set a sliding time window of 1 hour every 5 minutes. Compute the average when the window closes, and send an alert if the average is less than 4000 messages. 
B)  Consume the stream of data in Cloud Dataflow using Kafka IO. Set a fixed time window of 1 hour. Compute the average when the window closes, and send an alert if the average is less than 4000 messages. 
C)  Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to Cloud Bigtable. Use Cloud Scheduler to run a script every hour that counts the number of rows created in Cloud Bigtable in the last hour. If that number falls below 4000, send an alert. 
D)  Use Kafka Connect to link your Kafka message queue to Cloud Pub/Sub. Use a Cloud Dataflow template to write your messages from Cloud Pub/Sub to BigQuery. Use Cloud Scheduler to run a script every five minutes that counts the number of rows created in BigQuery in the last hour. If that number falls below 4000, send an alert. 
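
The windowing idea named in the options can be illustrated without any streaming framework. The following is a minimal sketch in plain Python, assuming integer message timestamps in seconds; the function names and the in-memory list of timestamps are hypothetical and only stand in for what a streaming runner would provide. It assigns each message to every 1-hour window that starts on a 5-minute boundary and contains it, then flags windows whose average rate is below the threshold.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # window length: 1 hour
PERIOD_SECONDS = 300   # a new window starts every 5 minutes


def window_starts(ts):
    """Return the start times of every sliding window that contains ts."""
    last_start = ts - ts % PERIOD_SECONDS
    first_start = last_start - WINDOW_SECONDS + PERIOD_SECONDS
    return range(first_start, last_start + 1, PERIOD_SECONDS)


def average_rates(message_timestamps):
    """Map each window start to its average messages per second."""
    counts = defaultdict(int)
    for ts in message_timestamps:
        for start in window_starts(ts):
            counts[start] += 1
    return {start: n / WINDOW_SECONDS for start, n in counts.items()}


def windows_to_alert(message_timestamps, threshold=4000):
    """Return the starts of windows whose average rate is below threshold.

    Limitation of this sketch: a window that received no messages at all
    never appears in the counts, so it is not reported here.
    """
    rates = average_rates(message_timestamps)
    return sorted(start for start, rate in rates.items() if rate < threshold)
```

Because the windows overlap, each message contributes to twelve of them, so a drop in traffic shows up in the next window that closes rather than only at the end of a fixed hour.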

Question 5

Which of these are examples of a value in a sparse vector? (Select 2 answers.)

Accepted Answer

A)  [0, 5, 0, 0, 0, 0] 
B)  [0, 0, 0, 1, 0, 0, 1] 
C)  [0, 1] 
D)  [1, 0, 0, 0, 0, 0, 0] 
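
As background for the term, a sparse representation stores only the non-zero entries of a vector together with its length. The sketch below is a plain-Python illustration under that assumption; the helper names `to_sparse` and `to_dense` are hypothetical and not taken from any library.

```python
def to_sparse(dense):
    """Return (size, {index: value}) keeping only the non-zero entries."""
    return len(dense), {i: v for i, v in enumerate(dense) if v != 0}


def to_dense(size, entries):
    """Rebuild the full list from the size and the non-zero entries."""
    dense = [0] * size
    for i, v in entries.items():
        dense[i] = v
    return dense


size, entries = to_sparse([0, 5, 0, 0, 0, 0])
# size is 6 and entries holds the single non-zero value at index 1
```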

Question 6

Flowlogistic Case Study

Company Overview
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world manage their resources and transport them to their final destination. The company has grown rapidly, expanding their offerings to include rail, truck, aircraft, and oceanic shipping.

Company Background
The company started as a regional trucking company, and then expanded into other logistics markets. Because they have not updated their infrastructure, managing and tracking orders and shipments has become a bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and shipments to determine how best to deploy their resources.

Solution Concept
Flowlogistic wants to implement two concepts using the cloud:
- Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their loads
- Perform analytics on all their orders and shipment logs, which contain both structured and unstructured data, to determine how best to deploy resources and which markets to expand into. They also want to use predictive analytics to learn earlier when a shipment will be delayed.

Existing Technical Environment
Flowlogistic architecture resides in a single data center:
- Databases
  - 8 physical servers in 2 clusters - SQL Server - user data, inventory, static data
  - 3 physical servers - Cassandra - metadata, tracking messages
  - 10 Kafka servers - tracking message aggregation and batch insert
- Application servers - customer front end, middleware for order/customs
  - 60 virtual machines across 20 physical servers - Tomcat - Java services - Nginx - static content - Batch servers
- Storage appliances
  - iSCSI for virtual machine (VM) hosts
  - Fibre Channel storage area network (FC SAN) - SQL server storage
  - Network-attached storage (NAS) image storage, logs, backups
- 10 Apache Hadoop/Spark servers - Core Data Lake - Data analysis workloads
- 20 miscellaneous servers - Jenkins, monitoring, bastion hosts

Business Requirements
- Build a reliable and reproducible environment with scaled parity of production.
- Aggregate data in a centralized Data Lake for analysis
- Use historical data to perform predictive analytics on future shipments
- Accurately track every shipment worldwide using proprietary technology
- Improve business agility and speed of innovation through rapid provisioning of new resources
- Analyze and optimize architecture for performance in the cloud
- Migrate fully to the cloud if all other requirements are met

Technical Requirements
- Handle both streaming and batch data
- Migrate existing Hadoop workloads
- Ensure architecture is scalable and elastic to meet the changing demands of the company.
- Use managed services whenever possible
- Encrypt data in flight and at rest
- Connect a VPN between the production data center and cloud environment

CEO Statement
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around. We need to organize our information so we can more easily understand where our customers are and what they are shipping.

CTO Statement
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do the things that really matter, such as organizing our data, building the analytics, and figuring out how to implement the CFO's tracking technology.

CFO Statement
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I don't want to commit capital to building out a server environment.

Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data volume for their real-time inventory tracking system. You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking software. The system must be able to ingest data from a variety of global sources, process and query in real-time, and store the data reliably. Which combination of GCP products should you choose?

Accepted Answer

A)  Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage 
B)  Cloud Pub/Sub, Cloud Dataflow, and Local SSD 
C)  Cloud Pub/Sub, Cloud SQL, and Cloud Storage 
D)  Cloud Load Balancing, Cloud Dataflow, and Cloud Storage 
E)  Cloud Dataflow, Cloud SQL, and Cloud Storage 

Question 7

You need to migrate a 2TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database and cost to operate is of primary concern. Which service do you select for storing and serving your data?

Accepted Answer

A)  Cloud Spanner 
B)  Cloud Bigtable 
C)  Cloud Firestore 
D)  Cloud SQL 

Question 8

You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

Accepted Answer

A)  Make a call to the Stackdriver API to list all logs, and apply an advanced filter. 
B)  In the Stackdriver logging admin interface, enable a log sink export to BigQuery. 
C)  In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool. 
D)  Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool. 

Question 9

Which Java SDK class can you use to run your Dataflow programs locally?

Accepted Answer

A)  LocalRunner 
B)  DirectPipelineRunner 
C)  MachineRunner 
D)  LocalPipelineRunner 

Question 10

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

Accepted Answer

A)  Use federated data sources, and check data in the SQL query. 
B)  Enable BigQuery monitoring in Google Stackdriver and create an alert. 
C)  Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0. 
D)  Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis. 
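
The dead-letter pattern mentioned in option D can be sketched in plain Python, independent of any pipeline framework. The three-column schema (`id`, `name`, `amount`) and the function name `split_rows` are assumptions made for this illustration only; in a real pipeline the two output lists would correspond to the main table and the dead-letter table.

```python
import csv
import io

EXPECTED_COLUMNS = 3  # hypothetical schema: id, name, amount


def split_rows(csv_text):
    """Parse CSV text into valid records and a dead-letter list of bad rows."""
    good, dead_letter = [], []
    reader = csv.reader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=1):
        try:
            if len(row) != EXPECTED_COLUMNS:
                raise ValueError(
                    f"expected {EXPECTED_COLUMNS} columns, got {len(row)}"
                )
            good.append(
                {"id": int(row[0]), "name": row[1], "amount": float(row[2])}
            )
        except ValueError as err:
            # Keep the raw row and the reason so it can be analyzed later.
            dead_letter.append(
                {"line": line_number, "raw": row, "error": str(err)}
            )
    return good, dead_letter


sample = "1,alice,9.5\n2,bob\nx,carol,3\n4,dave,7\n"
good_rows, bad_rows = split_rows(sample)
```

In this sample, the rows on lines 2 and 3 are routed to the dead-letter list while the other two are parsed normally, so one malformed row does not fail the whole import.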

Question 11

Which of the following are feature engineering techniques? (Select 2 answers)

Accepted Answer

A)  Hidden feature layers 
B)  Feature prioritization 
C)  Crossed feature columns 
D)  Bucketization of a continuous feature 
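
Two of the terms in the options, bucketization and feature crossing, can be shown with a few lines of plain Python. This is a minimal sketch; the helper names, the age boundaries, and the `_x_` separator are hypothetical choices for illustration and do not come from any particular ML library.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Map a continuous value to the index of the bucket it falls in.

    boundaries must be sorted; a value equal to a boundary goes to the
    bucket on the right of that boundary.
    """
    return bisect_right(boundaries, value)


def cross(*features):
    """Combine several categorical values into one crossed feature."""
    return "_x_".join(str(f) for f in features)


age_bucket = bucketize(25, [18, 30, 50])   # bucket index 1: 18 <= age < 30
crossed = cross(age_bucket, "US")          # "1_x_US"
```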

Question 12

What is the HBase Shell for Cloud Bigtable?

Accepted Answer

A)  The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables. 
B)  The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables. 
C)  The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances. 
D)  The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances. 

Question 13

MJTelco Case Study

Company Overview
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The company has patents for innovative optical communications hardware. Based on these patents, they can create many reliable, high-speed backbone links with inexpensive hardware.

Company Background
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome communications challenges in space. Fundamental to their operation, they need to create a distributed data infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their topologies. Because their hardware is inexpensive, they plan to overdeploy the network, allowing them to account for the impact of dynamic regional politics on location availability and cost. Their management and operations teams are situated all around the globe, creating a many-to-many relationship between data consumers and providers in their system. After careful consideration, they decided public cloud is the perfect environment to support their needs.

Solution Concept
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
- Scale and harden their PoC to support significantly more data flows generated when they ramp to more than 50,000 installations.
- Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to meet the needs of running experiments, deploying new features, and serving production customers.

Business Requirements
- Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
- Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
- Provide reliable and timely access to data for analysis from distributed research workers
- Maintain isolated environments that support rapid iteration of their machine-learning models without affecting their customers.

Technical Requirements
- Ensure secure and efficient transport and storage of telemetry data
- Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
- Allow analysis and presentation against data tables tracking up to 2 years of data storing approximately 100m records/day
- Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in telemetry flows and in production learning cycles.

CEO Statement
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large distributed data pipelines to meet our reliability and capacity commitments.

CTO Statement
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure. We also need environments in which our data scientists can carefully study and quickly adapt our models. Because we rely on automation to process our data, we also need our development and test environments to work as we iterate.

CFO Statement
The project is too large for us to maintain the hardware and software required for the data and analysis. Also, we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-value problems instead of problems with our data pipelines.

MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the device and a data record. The most common query is for all the data for a given device for a given day. Which schema should you use?

Accepted Answer

A)  Rowkey: date#device_id Column data: data_point 
B)  Rowkey: date Column data: device_id, data_point 
C)  Rowkey: device_id Column data: date, data_point 
D)  Rowkey: data_point Column data: device_id, date 
E)  Rowkey: date#data_point Column data: device_id 

Question 14

Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?

Accepted Answer

A)  A sequential numeric ID 
B)  A timestamp followed by a stock symbol 
C)  A non-sequential numeric ID 
D)  A stock symbol followed by a timestamp 
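
Bigtable stores rows sorted lexicographically by row key, so the order of the key components decides where new writes land. The sketch below imitates that ordering with a sorted Python list; the symbols, timestamps, `#` separator, and helper names are all made up for illustration, and no Bigtable client is involved.

```python
def row_key(*parts):
    """Join key components with '#', as in the row key examples above."""
    return "#".join(parts)


def insert_positions(existing, new_keys):
    """Positions the new keys take once all keys are sorted."""
    ordered = sorted(existing + new_keys)
    return [ordered.index(k) for k in new_keys]


symbols = ["AAPL", "GOOG", "MSFT"]
old_ts = ["20240101T0900", "20240101T0901"]
new_ts = "20240101T0902"

# Timestamp first: every new write sorts after all existing rows.
ts_first = insert_positions(
    [row_key(t, s) for t in old_ts for s in symbols],
    [row_key(new_ts, s) for s in symbols],
)

# Symbol first: new writes are spread across the key space.
sym_first = insert_positions(
    [row_key(s, t) for s in symbols for t in old_ts],
    [row_key(s, new_ts) for s in symbols],
)
```

With the timestamp first, the three new rows take the last three positions in the sorted order, which is the pattern that concentrates load on one part of the key space. With the symbol first, each new row lands next to the earlier rows for the same symbol.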

Question 15

What are two of the characteristics of using online prediction rather than batch prediction?

Accepted Answer

A)  It is optimized to handle a high volume of data instances in a job and to run more complex models. 
B)  Predictions are returned in the response message. 
C)  Predictions are written to output files in a Cloud Storage location that you specify. 
D)  It is optimized to minimize the latency of serving predictions. 

Question 16

Which Google Cloud Platform service is an alternative to Hadoop with Hive?

Accepted Answer

A)  Cloud Dataflow 
B)  Cloud Bigtable 
C)  BigQuery 
D)  Cloud Datastore 

Question 17

Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do?

Accepted Answer

A)  Perform hyperparameter tuning 
B)  Train a classifier with deep neural networks, because neural networks would always beat SVMs 
C)  Deploy the model and measure the real-world AUC; it's always higher because of generalization 
D)  Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC 

Question 18

You work for a large fast food restaurant chain with over 400,000 employees. You store employee information in Google BigQuery in a Users table consisting of a FirstName field and a LastName field. A member of IT is building an application and asks you to modify the schema and data in BigQuery so the application can query a FullName field consisting of the value of the FirstName field concatenated with a space, followed by the value of the LastName field for each employee. How can you make that data available while minimizing cost?

Accepted Answer

A)  Create a view in BigQuery that concatenates the FirstName and LastName field values to produce the FullName. 
B)  Add a new column called FullName to the Users table. Run an UPDATE statement that updates the FullName column for each user with the concatenation of the FirstName and LastName values. 
C)  Create a Google Cloud Dataflow job that queries BigQuery for the entire Users table, concatenates the FirstName value and LastName value for each user, and loads the proper values for FirstName, LastName, and FullName into a new table in BigQuery. 
D)  Use BigQuery to export the data for the table to a CSV file. Create a Google Cloud Dataproc job to process the CSV file and output a new CSV file containing the proper values for FirstName, LastName and FullName. Run a BigQuery load job to load the new CSV file into BigQuery. 

Question 19

Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?

Accepted Answer

A)  dataflow.worker 
B)  dataflow.compute 
C)  dataflow.developer 
D)  dataflow.viewer 

Question 20

Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

Accepted Answer

A)  Use a row key of the form . 
B)  Use a row key of the form . 
C)  Use a row key of the form #. 
D)  Use a row key of the form >##.

Exam 18: Professional Data Engineer on Google Cloud Platform
