Deck 10: Distributions and Associations
1
What does the term "descriptive statistics" refer to?
A) trends in population data
B) description of metadata
C) patterns in the sample data
D) regression analysis
C
2
A report shows the mean, median, and mode of a quantitative variable. It also gives the range, standard deviation, skewness, and kurtosis values for the variable. What is this type of report called?
A) univariate summary statistics
B) bivariate statistics
C) quantitative description
D) statistical summary
A
3
What is the median value of the following set of numbers: 2, 5, 17, 1, 11, 18?
A) 9.5
B) 8
C) 17.5
D) 5
B
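The median in question 3 can be checked by sorting the six values and averaging the two middle ones. A minimal Python sketch using the standard library (variable names are illustrative):

```python
import statistics

values = [2, 5, 17, 1, 11, 18]

# Sort first: with an even number of values, the median is the
# average of the two values in the middle of the ranked list.
ordered = sorted(values)
middle_pair = (ordered[len(ordered) // 2 - 1], ordered[len(ordered) // 2])

median = statistics.median(values)
print(ordered, middle_pair, median)
```

The two middle values of the ranked list are 5 and 11, so the median is their average.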
4
What type of distribution would you call the following set of numbers, in terms of measures of central tendency: 23, 25, 25, 41, 25, 50, 47, 50, 21, 50?
A) multi-modal
B) uniform
C) normal
D) bimodal
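One way to approach question 4 is to count how often each value occurs and see how many values share the highest count. A short sketch, assuming Python 3.8 or later for `statistics.multimode`:

```python
from collections import Counter
import statistics

values = [23, 25, 25, 41, 25, 50, 47, 50, 21, 50]

# Frequency of each distinct value in the set.
counts = Counter(values)

# multimode returns every value that ties for the highest frequency,
# in the order each first appears in the data.
modes = statistics.multimode(values)
print(counts.most_common(3), modes)
```

The number of entries in `modes` tells you how many modes the set has.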
5
When presenting a frequency distribution of a variable in a sample, why is it important to show the percentage of cases for each value of the variable?
A) for ease of understanding
B) to correct for sample size
C) to show a common scale
D) in order to show a histogram
6
What does the term "composition" refer to? Choose the best answer.
A) how the categories of a variable are spread across all categories
B) the measurement level of a variable in a given sample
C) how distinct values or categories of a variable make up the sample
D) values for central tendency of a set of numbers
7
How is the range of a continuous variable calculated?
A) largest minus smallest value plus one
B) largest minus smallest value
C) largest minus smallest value minus one
D) largest value of the variable minus one
8
Which of the following statements is true regarding the term "interquartile range"?
A) It shows the range of each quantile of a distribution.
B) It captures the middle 50% of a ranked distribution.
C) It can be used as an alternative to the mode.
D) It gives the upper and lower limits of the median.
9
The sum of the squared deviations from the mean of 20 values is 1750. What is the standard deviation of these values?
A) 41.59
B) 87.50
C) 9.00
D) 9.35
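Question 9 does not say whether the population formula (divide by n) or the sample formula (divide by n - 1) is intended, so the sketch below computes both from the given sum of squared deviations. Variable names are illustrative.

```python
import math

sum_sq_dev = 1750  # sum of squared deviations from the mean
n = 20             # number of values

# Population standard deviation: divide by n.
population_sd = math.sqrt(sum_sq_dev / n)

# Sample standard deviation: divide by n - 1.
sample_sd = math.sqrt(sum_sq_dev / (n - 1))

print(round(population_sd, 2), round(sample_sd, 2))
```

Only one of the two results appears among the listed options, which suggests which divisor the question assumes.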
10
How would you define the term "outlier"? Choose the best answer.
A) value that is greater than a defined range of values of a variable
B) value significantly higher or lower than the next adjacent value of a variable
C) value that is smaller than the minimum value of a variable
D) value that is higher than the 50th percentile of a quantitative variable
11
The shape of a distribution has a skewness value of -12.5. What can you say about the magnitude of the mean relative to the median of this distribution?
A) Mean is lower than the median.
B) Mean and median are equal.
C) Mean is greater than the median.
D) Mean is 0 and median is negative.
12
Which measure of central tendency is preferred to describe a distribution that is significantly skewed?
A) mean
B) variance
C) median
D) mode
13
The mean and standard deviation of a distribution are 32.8 and 7.5, respectively. What is the z-score of the number 70.6 in this distribution?
A) 4.37
B) 37.8
C) 9.41
D) 5.04
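The z-score in question 13 follows from z = (x - mean) / standard deviation. A minimal Python sketch of that arithmetic, with illustrative variable names:

```python
mean = 32.8
std_dev = 7.5
x = 70.6

# Distance of x from the mean, expressed in standard-deviation units.
deviation = x - mean
z = deviation / std_dev

print(round(deviation, 1), round(z, 2))
```

Note that one of the distractors is the raw deviation from the mean, not the deviation divided by the standard deviation.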
14
What does the z-score represent for a number in a roughly normal distribution?
A) how many standard deviations above or below the mean
B) number of variance units away from the mean
C) an absolute value without a unit of measurement
D) the absolute distance from the mean of a distribution
15
What are the plausible values for the standard deviation of any distribution?
A) positive or negative values
B) negative values
C) positive values
D) positive or zero
16
Which of the following correctly characterizes the plausibility of z-scores of a distribution calculated using the sample mean and standard deviation?
A) They must all be positive.
B) They must be positive and negative.
C) They must fall within -3 and +3.
D) 67% of scores must be positive.
17
An ordinal variable called AGE GROUP contains five possible classes starting from "18-25" years, and a maximum class of "60-70" years. In addition to a frequency distribution, which measure of central tendency could be shown in presenting exploratory analysis results of the variable?
A) median
B) mean
C) range
D) mode
18
Which of the following is considered the best way to present data (in charts) of a nominal variable?
A) pie chart
B) line chart
C) histogram
D) vertical bars
19
In a box-and-whiskers plot of an interval variable, what do the lower and upper lines of the box represent?
A) the mean and the median values
B) the first and third quartile values
C) the minimum and maximum values
D) the second and third quartile values
20
The correlation coefficient between a dependent and an independent variable is -0.4. What can we say about the association between the two variables?
A) the association is positive
B) the association is not significant
C) the association is weak and positive
D) the association is inverse
21
Which of the following could be used to present association between two categorical variables?
A) cross-tab
B) histogram
C) stacked bar chart
D) frequency table
22
In a study, the number of cigarettes smoked by high school students was analyzed in relation to several social factors. The primary factor was found to be peer pressure, which was categorized as low, medium, and high for the study. What kind of test would you do to find the association between peer pressure and smoking?
A) correlation coefficient
B) difference in means
C) analysis of variance
D) R² test
23
A 2-way ANOVA measure is used to identify an association between which types of variables?
A) categorical dependent and two continuous variables taken together
B) continuous dependent and two continuous independent variables
C) categorical dependent, one continuous and one nominal variable
D) continuous dependent and two categorical variables taken together
24
What are the plausible limits to values of the coefficient of determination?
A) -1 to +1
B) 0 to +1
C) -1 to 0
D) greater than -1 to less than +1
25
What are the plausible limits to the results from an ANOVA test?
A) lowest and highest value of independent variable
B) minimum and maximum value of the outcome variable
C) minimum of dependent and maximum of independent variable
D) lowest value of independent and highest value of outcome
26
Statistical methods refer to techniques used to describe, organize, and interpret data.
27
Shape refers to how much values of a variable differ from one another.
28
Median is the 50th percentile value of a variable whose values have been sorted from lowest to highest value.
29
The values of a numeric variable were as follows: 148, 120, 91, 178, 202, 115, and one missing value. The mean value of the variable is 142.33.
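The statement in question 29 depends on how the missing value is handled. The sketch below assumes the usual convention of excluding missing values from the calculation, and contrasts it with (incorrectly) counting the missing case as zero. `None` stands in for the missing value here; that representation is an assumption for illustration.

```python
import statistics

raw = [148, 120, 91, 178, 202, 115, None]  # None marks the missing value

# Keep only the observed values; the missing case does not count toward n.
observed = [v for v in raw if v is not None]
mean_observed = statistics.mean(observed)

# For contrast: treating the missing case as 0 inflates n and lowers the mean.
mean_if_zero = sum(observed) / len(raw)

print(len(observed), round(mean_observed, 2), round(mean_if_zero, 2))
```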
30
A distribution with two modal values is called a multimodal distribution.
31
The frequency distribution of a continuous variable with many unique values should be presented in a chart, rather than in a table.
32
The variability in a nominal variable could be shown by a frequency distribution or by the range measure.
33
The variance measure of a variable, with values clustered more closely around the mean value, is larger than when the values are more widely dispersed from the mean.
34
99.7% of values fall within ±3 standard deviations of the mean of a normally distributed variable.
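The share of a normal distribution lying within k standard deviations of the mean can be computed from the cumulative distribution function. A brief sketch, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between -k and +k standard deviations, for k = 1, 2, 3.
share_within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

for k, share in share_within.items():
    print(k, round(share * 100, 1))
```

The three shares are roughly 68%, 95%, and 99.7%, which is the familiar empirical rule.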
35
The variance measure of a variable is in the same units as the variable values.
36
What is the plausible value range of a measure of position such as percentiles? Explain your answer.
37
Which is the best way to portray the distribution of a categorical variable? Explain with an example.
38
With the help of an example, illustrate the best way to present data distribution of a nominal variable. Explain your reasons for such a presentation.
39
The mean and median values of a continuous variable are 10.4 and 9.5, respectively. The maximum value of the variable is 41.7. How would these values be shown in a box-and-whiskers plot?
40
What is a bivariate association? Explain, with an example, the measure(s) used to show this type of association.
41
Why is a numeric association between two variables not necessarily indicative of a cause-effect relationship between the variables? Explain with an example.
42
The correlation coefficient between two variables was found to be 0.37. What is the coefficient of determination between the two variables? What does this measure represent?
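For a simple two-variable linear relationship, the coefficient of determination is the square of the correlation coefficient. A minimal sketch of that arithmetic for question 42, with illustrative variable names:

```python
r = 0.37  # correlation coefficient given in the question

# Squaring r gives the coefficient of determination (R squared).
r_squared = r ** 2

# Expressed as a percentage of the variance in one variable
# that is accounted for by its linear relationship with the other.
percent_explained = r_squared * 100

print(round(r_squared, 4), round(percent_explained, 2))
```

Because the value is a square, it cannot be negative, regardless of the sign of r.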
43
When would you use ANOVA, rather than a difference in means, to measure association between a continuous and a categorical variable? Explain with an example.
44
What is the difference between ANOVA and a 2-way ANOVA? Illustrate with an example.
45
What are the plausible limits on values of the coefficient of determination measure? How are these different from limits of the correlation coefficient measure?
46
Design a research study on a topic of your choice. Make sure the outcome is a continuous variable, and that there are at least three independent variables, one of them categorical. Construct a table with the variables in columns and each observation or case in rows. Include between seven and 10 rows in the table. Insert artificial values in each cell corresponding to the variable value for each row. This will be the data used in your study. If you have or find any public data on the topic of your choice, you could use it as your data source. You could also use survey data collected by the Centers for Disease Control and Prevention (CDC) on smoking habits among youth, available at https://www.cdc.gov/tobacco/data_statistics/surveys/nyts/index.htm. Now, create a report showing the central tendency, variability, and shape of any two of the variables in your study. Explain any calculations you make to arrive at your report results.
47
For the research study in question 46, create a report giving measures and charts (where applicable) for a bivariate and a three-way association between the variables in the study.
48
Discuss the plausibility of values, and their limits, for the numbers you present in your reports in questions 46 and 47. Explain your reasoning.