Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?
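For reference, one way to produce such a result in PySpark (a hedged sketch, not necessarily one of the listed options) is to group by productId and count:
transactionsDf.groupBy("productId").count()  # yields the two columns productId and count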
Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?
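A minimal sketch of one way to achieve this uses withColumnRenamed, which returns a new DataFrame and leaves itemsDf unchanged:
itemsDf.withColumnRenamed("supplier", "manufacturer")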
Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame
todayTransactionsDf, ignoring that both DataFrames have different column names?
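As a sketch, DataFrame.union appends rows by column position and ignores column names, so the following would produce the described 6-column result:
todayTransactionsDf.union(yesterdayTransactionsDf)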
The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in
the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__(__4__)
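One combination that would satisfy the description (a sketch, not necessarily the exact blank-for-blank answer): filter on storeId and call take, which returns a Python list of at most 5 Row objects.
from pyspark.sql.functions import col
transactionsDf.filter(col("storeId") == 25).take(5)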
In which order should the code blocks shown below be run to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?
1. itemsDf.createOrReplaceView("itemsDf")
2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")
3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")
4. itemsDf.createOrReplaceTempView("itemsDf")
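As a hedged note, one ordering that would produce the described table is to register the temporary view first and then run the SQL statement that references the columns without quotes, i.e. block 4 followed by block 3:
itemsDf.createOrReplaceTempView("itemsDf")
spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")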
The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.
Code block:
transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))
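For comparison, a sketch of a working way to exclude specific columns is DataFrame.drop; the ~ operator negates column values rather than excluding columns from a selection:
transactionsDf.drop("predError", "productId", "value")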
Which of the following code blocks sorts DataFrame transactionsDf both by column storeId in ascending and by column productId in descending order, in this priority?
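One sketch that matches this priority relies on the default ascending sort for storeId and an explicit descending sort for productId:
from pyspark.sql.functions import desc
transactionsDf.sort("storeId", desc("productId"))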
Which of the following code blocks reads in the parquet file stored at location filePath, given that all columns in the parquet file contain only whole numbers and are stored in the most appropriate
format for this kind of data?
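As a sketch, a parquet file already stores its schema (including integer column types), so a plain parquet read is sufficient:
spark.read.parquet(filePath)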
The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.
Code block:
itemsDf.filter(Column('supplier').isin('et'))
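For comparison, a sketch of a filter that matches the requirement uses Column.contains instead of isin:
from pyspark.sql.functions import col
itemsDf.filter(col("supplier").contains("et"))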
The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code
block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame
transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(itemsDf, __2__).__3__(__4__)
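A hedged sketch of code that fits this description (not necessarily matching the numbered blanks exactly):
transactionsDf.join(
    itemsDf,
    (transactionsDf.productId == itemsDf.itemId) & (transactionsDf.storeId != itemsDf.itemId)
).select("transactionId", "supplier")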
The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to
accomplish this.
Code block:
transactionsDf.__1__(__2__.__3__(__4__))
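One sketch that performs the described conversion (not necessarily the blank-for-blank answer):
from pyspark.sql.functions import col
transactionsDf.select(col("storeId").cast("string"))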
Which of the following statements about Spark's configuration properties is incorrect?
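For context, a minimal sketch of reading and setting a runtime configuration property through the SparkSession; the property name here is chosen purely as an illustration:
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.get("spark.sql.shuffle.partitions")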
The code block displayed below contains an error. The code block should sort the rows of DataFrame transactionsDf by two columns: first by column value in ascending order (smaller numbers at the top, greater numbers at the bottom), and then by column predError in the opposite order, i.e. descending. Find the error.
Code block:
transactionsDf.orderBy('value', asc_nulls_first(col('predError')))
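For comparison, a sketch that sorts value in ascending and predError in descending order would look like:
from pyspark.sql.functions import asc, desc
transactionsDf.orderBy(asc("value"), desc("predError"))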
The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as the column header and casting the columns to the most appropriate type. Find the error.
First 3 rows of transactions.csv:
transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass
Code block:
transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)
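For comparison, a sketch that additionally infers the column types (and so casts them to appropriate types) adds the inferSchema option:
transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True, inferSchema=True)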
Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from
DataFrame transactionsDf and column attributes from DataFrame itemsDf?
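A hedged sketch of such a join, dropping the unwanted columns before joining (inner is the default join type):
transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"), transactionsDf.productId == itemsDf.itemId)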
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the
answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
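One sketch that yields the described copy (not necessarily matching the blanks one-to-one):
from pyspark.sql.functions import lit
transactionsDf.withColumn("associateId", lit(5)).drop("productId", "value")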
Which of the following code blocks immediately removes the previously cached DataFrame transactionsDf from memory and disk?
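As a sketch, an eager (blocking) unpersist removes the cached data right away rather than lazily:
transactionsDf.unpersist(blocking=True)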
Which of the following describes a difference between Spark's cluster and client execution modes?