 # BUS105 Business Statistics Questions Assessment Answer Overview

A pair of datasets for country 1 and country 2. Looking at the datasets, you will notice that each student has been given a sample; students must use their own sample.
An automatic dataset summarizer.
Instructions for checking that you have correctly found your sample.

“Title: semester 1, 2020 BUS105 computing assignment”

Overview

You need to submit a Word file with the answers to 9 questions. The first 8 are about the dataset; the last question is a paraphrasing task (refer to pages 3 to 6).

You will use your dataset and the automatic dataset summarizer to get the descriptive statistics used in questions 1 to 5 and the inferential statistics used in questions 6 to 8.
To check that you have correctly obtained your dataset, check that both p-values are correct when you investigate the two categorical variables (question 6).

The word count can be less than 1500 words, provided your answers demonstrate that you have understood the material.

Summary of the dataset (questions 1 to 8, given on pages 3 to 6, are about the dataset)

Suppose market research company XYZ did a survey in two different countries. The survey was designed to gather basic information about some customers and their opinion of TV model XYZ.

The survey questions were:

“What is your Income?”
“What is your gender?”
“How much are you willing to spend on a TV?”
“Would you buy TV model XYZ?”

So there are two datasets, one for each country:

One dataset is the survey answers for country 1
One dataset is the survey answers for country 2

Students MUST use the datasets they are given. They CANNOT use datasets they make themselves or take from other sources.

Each of the datasets consists of the following variables:

Income?: a quantitative variable
Gender?: a categorical variable
Amount you would spend?: a quantitative variable, the amount they would spend on a TV
Would you buy?: a categorical variable, whether they would buy TV model XYZ


Question 1
a) Just using the information for Country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Gender?” and “Would you buy?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Gender?” and “Would you buy?”

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 2

a) Just using the information for country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Would you buy?” and “amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “would you buy?” and “amount you would spend?”

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 3

a) Just using the information for Country 1

i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “Income?” and “Amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “Income?” and “Amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means (x̄1 − x̄2)
Difference between sample proportions (p̂1 − p̂2)
Correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 4

a) Considering all people in the country 1 dataset
i) What is the sample size n?
ii) What is the sample proportion of people that would buy the model XYZ TV?
iii) Use the answers to parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5
b) Considering all people in the country 2 dataset
i) What is the sample size n?
ii) What is the sample proportion of people that would buy the model XYZ TV?
iii) Use the answers to parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5

Question 5

Just using the country 1 dataset, more specifically the variables “Income?” and “Would you buy?” of the country 1 dataset

a) Just considering the people that would buy the TV

i) What is the sample size, sample mean and sample standard deviation of income?

Hint: this is easy; just use the dataset summarizer.

ii) Find a 95% confidence interval for income

b) Just considering the people that would not buy the TV

i) What is the sample size, sample mean and sample standard deviation of income?

Hint: this is easy; just use the dataset summarizer.

ii) Find a 95% confidence interval for income

Question 6

a) Just using the information for country 1

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

b) Just using the information for country 2

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)

Question 7

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Hint: inferential statistics measure evidence for a claim.

ii) Make suitable comments about the output in part (i)

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)

Question 8

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population
Hint: inferential statistics measure evidence for a claim.

ii) Make suitable comments about the output in part (i)

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

ii) Make suitable comments about the output in part (i)

c) Compare the results in parts (a) and (b)

Question 9

You need to pick ONE of the following options for question 9. Your answer should be about 300 words long.

OPTION 1
Give a brief summary of at least one of the major ideas in the following PowerPoint, available from
OPTION 2

Briefly discuss the main message in the following sample report available from
You need to cut and paste just your dataset into a new Excel file and follow the 4 instructions below. DO NOT use a cover page for the Excel file, and you must check that you have the correct sample.

Note that you can still do this at home even if you do not have Excel; just use Google Sheets.

1. For country 1, use Excel PivotTable commands (or Google Sheets pivot table commands) to find appropriate sample statistics that let you investigate the relationship between the fields (variables) “Gender?” and “Would you buy?”
2. For country 1, use Excel PivotTable commands (or Google Sheets pivot table commands) to find appropriate sample statistics that let you investigate the relationship between the fields (variables) “Would you buy?” and “Amount you would spend?”
3. For country 1, use Excel commands to make a graph that lets you investigate the relationship between the fields (variables) “Income?” and “Amount you would spend?”

Question 1

a) Just using the information for Country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Gender?” and “Would you buy?” using the sample

| Descriptive sample statistics | N | Y | Total |
|---|---|---|---|
| Female count | 13 | 43 | 56 |
| Female % | 23.21% | 76.79% | 100.00% |
| Male count | 17 | 27 | 44 |
| Male % | 38.64% | 61.36% | 100.00% |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics.

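The table above gives the sample counts and proportions. The sample size is 100, comprising 56 females (56%) and 44 males (44%). While 13 females and 17 males are not willing to buy the new model, 43 females and 27 males are willing to buy it.

The difference in sample proportions method is used, comparing the proportion of females who would buy with the proportion of males who would buy:

p̂1 (females who would buy) = 43/56 = 0.7679

p̂2 (males who would buy) = 27/44 = 0.6136

p̂1 − p̂2 = 0.7679 − 0.6136 ≈ 0.154

Hence, in the country 1 sample, the proportion of females willing to buy the new model is about 15.4 percentage points higher than the proportion of males, suggesting that willingness to buy is related to gender in this sample.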

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Gender?” and “Would you buy?”

| Descriptive sample statistics | N | Y | Total |
|---|---|---|---|
| Female count | 16 | 40 | 56 |
| Female % | 28.57% | 71.43% | 100.00% |
| Male count | 28 | 16 | 44 |
| Male % | 63.64% | 36.36% | 100.00% |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

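The table above gives the sample counts and proportions. The sample size is 100, comprising 56 females (56%) and 44 males (44%). While 16 females and 28 males are not willing to buy the new model, 40 females and 16 males are willing to buy it.

The difference in sample proportions method is used, comparing the proportion of females who would buy with the proportion of males who would buy:

p̂1 (females who would buy) = 40/56 = 0.7143

p̂2 (males who would buy) = 16/44 = 0.3636

p̂1 − p̂2 = 0.7143 − 0.3636 ≈ 0.351

Hence, in the country 2 sample, the proportion of females willing to buy the new model is about 35.1 percentage points higher than the proportion of males, suggesting a strong relationship between gender and willingness to buy in this sample.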

c) Compare the results in parts (a) and (b)

[Figure: stacked bar charts of “Would you buy?” by gender, Country 1 and Country 2]

The stacked bar charts above compare the two countries. In both countries, a large proportion of females are willing to buy the new model: 76.79% in country 1 and 71.43% in country 2. However, while the majority of males in Country 1 are willing to buy the new model (61.36%), the same is not true for Country 2, where only 36.36% of males are willing to buy it. As a result, the gap between females and males is much wider in Country 2 (71.43% − 36.36% = 35.07 percentage points) than in Country 1 (76.79% − 61.36% = 15.43 percentage points).
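The row percentages and the difference in sample proportions used in question 1 can be reproduced from the counts in the two tables. The Python sketch below is a minimal check, not part of the assignment tooling; the dictionary layout and the function name `buy_proportions` are illustrative assumptions, and the counts are copied from the country 1 and country 2 tables.

```python
# Counts copied from the question 1 tables: (would not buy "N", would buy "Y").
COUNTS = {
    "Country 1": {"female": (13, 43), "male": (17, 27)},
    "Country 2": {"female": (16, 40), "male": (28, 16)},
}


def buy_proportions(table):
    """Return the sample proportion of each gender that would buy (answered Y)."""
    return {gender: y / (n + y) for gender, (n, y) in table.items()}


for country, table in COUNTS.items():
    props = buy_proportions(table)
    diff = props["female"] - props["male"]  # p-hat1 (female) minus p-hat2 (male)
    print(f"{country}: female {props['female']:.2%}, male {props['male']:.2%}, "
          f"difference {diff:.4f}")
```

The differences it prints (about 0.1542 and 0.3506) are the same p̂1 − p̂2 estimates that appear in the question 6 output.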

Question 2

a) Just using the information for country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Would you buy?” and “amount you would spend?” using the sample

Descriptive sample statistics

| x̄1 | x̄2 | s1 | s2 | n1 | n2 |
|---|---|---|---|---|---|
| 513.567 | 722.557 | 179.37 | 176 | 30 | 70 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

The table above gives the mean, standard deviation and sample size of ‘amount you would spend’ for two groups: group 1, those not willing to buy (n1 = 30), and group 2, those willing to buy (n2 = 70).

The difference in sample means method is used:

x̄1 − x̄2 = 513.567 − 722.557 = −208.99

Hence, in the country 1 sample, those willing to buy the new model would spend on average 208.99 more on a TV than those not willing to buy.

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “would you buy?” and “amount you would spend?”

Descriptive sample statistics

| x̄1 | x̄2 | s1 | s2 | n1 | n2 |
|---|---|---|---|---|---|
| 700.045 | 602.625 | 178.288 | 168 | 44 | 56 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

The table above gives the mean, standard deviation and sample size of ‘amount you would spend’ for two groups: group 1, those not willing to buy (n1 = 44), and group 2, those willing to buy (n2 = 56).

The difference in sample means method is used:

x̄1 − x̄2 = 700.045 − 602.625 = 97.42

Hence, in the country 2 sample, those willing to buy the new model would spend on average 97.42 less on a TV than those not willing to buy.

c) Compare the results in parts (a) and (b)

For Country 1, those who are willing to buy would spend an average of 208.99 more than those who are not willing to buy. The majority of this sample (70%) was willing to buy the new model.

For Country 2, those who are willing to buy would spend an average of 97.42 less than those who are not willing to buy. The majority of males in this sample (63.64%) were not willing to buy the new model.

Question 3

a) Just using the information for Country 1

i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “Income?” and “Amount you would spend?” using the sample

Descriptive sample statistics

| Statistic | Value |
|---|---|
| Sample size | 100 |
| Sample slope | 9.7149734 |
| Sample intercept | 13.328519 |
| Sample correlation r | 0.9556816 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Both variables, “Income?” and “Amount you would spend?”, are quantitative, so the correlation coefficient r is used.

The table above shows a strong positive correlation, r = 0.956. It also gives the slope and intercept of the fitted line, amount you would spend = 13.33 + 9.71 × income, which can be used to predict the y variable (amount you would spend). The scatter graph shows the same information: the regression equation and observations concentrated around an upward-sloping trend line. In this sample, people with higher incomes tend to be willing to spend more on a TV, and people with lower incomes tend to be willing to spend less.
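The slope and intercept in the table define the fitted line, amount you would spend = intercept + slope × income. The Python sketch below shows how that line could be used for prediction; the function name `predict_amount` is hypothetical. As a consistency check it uses the mean income of 66.55 from the question 5 pivot table: a least-squares line passes through the point of means, so the prediction there should be close to the overall mean amount, which can be rebuilt from the question 2 group means as (30 × 513.567 + 70 × 722.557)/100.

```python
# Sample slope and intercept for country 1, copied from the question 3 output.
SLOPE = 9.7149734
INTERCEPT = 13.328519


def predict_amount(income, slope=SLOPE, intercept=INTERCEPT):
    """Predicted amount a person would spend on a TV, given their income."""
    return intercept + slope * income


# Overall mean amount, rebuilt as a weighted average of the two group means
# from question 2 (n1 = 30 would not buy, n2 = 70 would buy).
mean_amount = (30 * 513.567 + 70 * 722.557) / 100

# Mean income for all 100 people, from the question 5 pivot table (Grand Total).
mean_income = 66.55

print(f"Predicted amount at mean income: {predict_amount(mean_income):.2f}")
print(f"Overall mean amount:             {mean_amount:.2f}")
```

Both lines print 659.86. The agreement reflects a property of least squares; it does not by itself show that the model predicts well for individual people.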

b) Just using the information for Country 2

i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “Income?” and “Amount you would spend?” using the sample

Descriptive sample statistics

| Statistic | Value |
|---|---|
| Sample size | 100 |
| Sample slope | 8.236304 |
| Sample intercept | -16.214666 |
| Sample correlation r | 0.9855916 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Both variables, “Income?” and “Amount you would spend?”, are quantitative, so the correlation coefficient r is used.

The table above shows a strong positive correlation, r = 0.986. It also gives the slope and intercept of the fitted line, amount you would spend = −16.21 + 8.24 × income, which can be used to predict the y variable (amount you would spend). The scatter graph shows the same information: the regression equation and observations concentrated around an upward-sloping trend line. In this sample, people with higher incomes tend to be willing to spend more on a TV, and people with lower incomes tend to be willing to spend less.

c) Compare the results in parts (a) and (b)

Both countries show a strong positive relationship between income and the amount people would spend on a TV: as income increases, the amount people would spend also tends to increase. The relationship is slightly stronger for Country 2 (r = 0.986) than for Country 1 (r = 0.956). Since r ranges from −1 to 1, both values, being close to 1, indicate very strong positive linear relationships.

Question 4

a) Considering all people in the country 1 dataset

i) What is the sample size n?

The sample size is 100.

ii) What is the sample proportion of people that would buy the model XYZ TV?

0.70, or 70%, of people in the country 1 sample are willing to buy the model XYZ TV.

iii) Use the answers to parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5

Let X be the number of people in the sample who would buy the TV. If the population proportion is p = 0.5, X is a binomial variable with:

Mean = µ = np = 100 × 0.5 = 50

SD = σ = √(np(1 − p)) = √(100 × 0.5 × 0.5) = √25 = 5

The sample proportion p̂ = X/n therefore has mean p = 0.5 and standard deviation σ/n = √(p(1 − p)/n) = 5/100 = 0.05.

Sample proportion (p̂) = 0.70

z-score = (p̂ − p)/√(p(1 − p)/n) = (0.70 − 0.50)/0.05 = 4.0

Equivalently, in terms of counts, z = (70 − 50)/5 = 4.0.

b) Considering all people in the country 2 dataset

i) What is the sample size n?

The sample size is 100.

ii) What is the sample proportion of people that would buy the model XYZ TV?

0.56, or 56%, of people in the country 2 sample are willing to buy the model XYZ TV.

iii) Use the answers to parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5

Let X be the number of people in the sample who would buy the TV. If the population proportion is p = 0.5, X is a binomial variable with:

Mean = µ = np = 100 × 0.5 = 50

SD = σ = √(np(1 − p)) = √(100 × 0.5 × 0.5) = √25 = 5

The sample proportion p̂ = X/n therefore has mean p = 0.5 and standard deviation σ/n = √(p(1 − p)/n) = 5/100 = 0.05.

Sample proportion (p̂) = 0.56

z-score = (p̂ − p)/√(p(1 − p)/n) = (0.56 − 0.50)/0.05 = 1.2

Equivalently, in terms of counts, z = (56 − 50)/5 = 1.2.
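The z-score of a sample proportion can be checked with a short Python sketch. It divides by the standard error of a sample proportion, √(p(1 − p)/n), which for p = 0.5 and n = 100 is 0.05; the function name `proportion_z_score` is illustrative.

```python
from math import sqrt


def proportion_z_score(p_hat, p0, n):
    """z-score of a sample proportion p_hat under an assumed population proportion p0."""
    standard_error = sqrt(p0 * (1 - p0) / n)  # SD of the sampling distribution of p_hat
    return (p_hat - p0) / standard_error


print(proportion_z_score(0.70, 0.5, 100))  # country 1
print(proportion_z_score(0.56, 0.5, 100))  # country 2
```

Apart from floating-point rounding, this gives 4.0 for country 1 and 1.2 for country 2.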

Question 5

Just using the country 1 dataset, more specifically the variables “Income?” and “Would you buy?” of the country 1 dataset

a) Just considering the people that would buy the TV

i) What is the sample size, sample mean and sample standard deviation of income?

| Row Labels | Count of income | Average of income | StdDev of income |
|---|---|---|---|
| N | 30 | 51.10 | 16.68 |
| Y | 70 | 73.17 | 17.16 |
| Grand Total | 100 | 66.55 | 19.75 |

The pivot table above shows that for people willing to buy (indicated by ‘Y’), the sample size is 70, the sample mean income is 73.17 and the sample standard deviation of income is 17.16.

ii) Find a 95% confidence interval for income

Mean = 73.17, n = 70, SD = 17.16

SE = SD/√n = 17.16/√70 = 2.0510, df = n − 1 = 69, α = 0.05 (two-tailed), so t = 1.9949

95% CI = 73.17 ± (1.9949)(2.0510)

= 73.17 ± 4.09

95% CI = (69.08, 77.26)

b) Just considering the people that would not buy the TV

i) What is the sample size, sample mean and sample standard deviation of income?


The pivot table above shows that for people not willing to buy (indicated by ‘N’), the sample size is 30, the sample mean income is 51.10 and the sample standard deviation of income is 16.68.

ii) Find a 95% confidence interval for income

Mean = 51.10, n = 30, SD = 16.68

SE = SD/√n = 16.68/√30 = 3.0453, df = n − 1 = 29, α = 0.05 (two-tailed), so t = 2.0452

95% CI = 51.10 ± (2.0452)(3.0453)

= 51.10 ± 6.2283

95% CI = (44.87, 57.33)
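Both intervals in question 5 follow the same recipe, x̄ ± t × s/√n. The Python sketch below reproduces them from the summary statistics in the pivot table. The Python standard library has no t distribution, so the t critical values are passed in, copied from the working above; if SciPy is available, they could instead be obtained with `scipy.stats.t.ppf(0.975, df)`. The function name is illustrative.

```python
from math import sqrt


def t_confidence_interval(mean, sd, n, t_crit):
    """Confidence interval mean ± t_crit * sd / sqrt(n) for a population mean."""
    standard_error = sd / sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


# t critical values for a 95% two-tailed interval, taken from the working above
# (df = 69 and df = 29 respectively).
would_buy = t_confidence_interval(73.17, 17.16, 70, t_crit=1.9949)
would_not_buy = t_confidence_interval(51.10, 16.68, 30, t_crit=2.0452)
print("Would buy:     (%.2f, %.2f)" % would_buy)
print("Would not buy: (%.2f, %.2f)" % would_not_buy)
```

Rounded to two decimals, this gives (69.08, 77.26) and (44.87, 57.33).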

Question 6

a) Just using the information for country 1

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

Inferential statistics

| n1 | n2 | p̂1 | p̂2 |
|---|---|---|---|
| 56 | 44 | 0.76786 | 0.613636364 |

Estimate of the difference between population proportions, p̂1 − p̂2: 0.154220779

| Standard error of estimate | Test stat | Two-sided p-value |
|---|---|---|
| 0.092318618 | 1.670527378 | 0.094815067 |

To calculate the p-value, H0: p1 = p2 is assumed to be true. Since the test is two-sided, H1 is H1: p1 ≠ p2.

ii) Make suitable comments about the output in part (i)

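The two-sided p-value of 0.095 is greater than the 0.05 significance level, so we do not have sufficient evidence to reject the null hypothesis. We therefore cannot conclude that the population proportions of females (p1) and males (p2) who would buy the TV differ; that is, there is no significant evidence of a relationship between gender and willingness to buy in country 1.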

b) Just using the information for country 2

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

Inferential statistics

| n1 | n2 | p̂1 | p̂2 |
|---|---|---|---|
| 56 | 44 | 0.71429 | 0.363636364 |

Estimate of the difference between population proportions, p̂1 − p̂2: 0.350649351

| Standard error of estimate | Test stat | Two-sided p-value |
|---|---|---|
| 0.1 | 3.506493506 | 0.000454053 |

To calculate the p-value, H0: p1 = p2 is assumed to be true. Since the test is two-sided, H1 is H1: p1 ≠ p2.

ii) Make suitable comments about the output in part (i)

The two-sided p-value of 0.00045 is less than the 0.05 significance level, so we have evidence to reject the null hypothesis. We conclude that the population proportions of females (p1) and males (p2) who would buy the TV differ significantly; that is, there is evidence of a relationship between gender and willingness to buy in country 2.

c) Compare the results in parts (a) and (b)

In Country 1, the null hypothesis could not be rejected (p-value = 0.095), so we could not conclude that there is a significant difference between the proportions tested.

In Country 2, the null hypothesis was rejected (p-value = 0.00045), so we concluded that there is a significant difference between the proportions tested.
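The question 6 output can be reproduced with a two-proportion z-test that uses the pooled standard error; the reported standard errors (0.0923 and 0.1) match the pooled formula. The sketch below uses only the Python standard library, with `statistics.NormalDist` for the normal tail probability; the function name and the group coding (group 1 = females, group 2 = males, x = number who would buy) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1_hat - p2_hat, se, z, p_value


# x = number who would buy, n = group size; group 1 = females, group 2 = males.
for country, (x1, n1, x2, n2) in {"Country 1": (43, 56, 27, 44),
                                  "Country 2": (40, 56, 16, 44)}.items():
    diff, se, z, p = two_proportion_z_test(x1, n1, x2, n2)
    print(f"{country}: diff={diff:.4f} se={se:.4f} z={z:.4f} p={p:.6f}")
```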

Question 7

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Inferential statistics

Estimate of the difference between population means, x̄1 − x̄2: −208.99

Standard error of estimate x̄1 − x̄2: 38.9305

| t test stat | df | Two-sided p-value |
|---|---|---|
| −5.3683 | 54 | 1.7E-06 |

To calculate the p-value, H0: μ1 = μ2 is assumed to be true. Since the test is two-sided, H1 is H1: μ1 ≠ μ2.

ii) Make suitable comments about the output in part (i)

The p-value (1.7 × 10⁻⁶) is lower than the 0.05 significance level, so the null hypothesis can be rejected. It can be concluded that the mean amount people would spend on a TV differs significantly between those who are willing to buy TV model XYZ and those who are not.

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Inferential statistics

Estimate of the difference between population means, x̄1 − x̄2: 97.4205

Standard error of estimate x̄1 − x̄2: 34.9831

| t test stat | df | Two-sided p-value |
|---|---|---|
| 2.78479 | 89 | 0.00654 |

To calculate the p-value, H0: μ1 = μ2 is assumed to be true. Since the test is two-sided, H1 is H1: μ1 ≠ μ2.

ii) Make suitable comments about the output in part (i)

The p-value (0.00654) is lower than the 0.05 significance level, so the null hypothesis can be rejected. It can be concluded that the mean amount people would spend on a TV differs significantly between those who are willing to buy TV model XYZ and those who are not.

c) Compare the results in parts (a) and (b)

In both countries, the p-value is lower than the 0.05 significance level, so it can be concluded that the mean amount people would spend on a TV differs significantly between those who are willing to buy TV model XYZ and those who are not. The direction differs, however: in Country 1 those willing to buy would spend more on average, while in Country 2 they would spend less.
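The standard error, t statistic and degrees of freedom in the question 7 output are consistent with a Welch (unequal-variance) two-sample t-test, with the degrees of freedom rounded down (54 and 89). This is an inference from the numbers, since the summarizer's method is not stated. The Python sketch below recomputes these quantities from the question 2 summary statistics; the function name is illustrative. Because s2 = 176 and s2 = 168 appear to be rounded in the descriptive output, the results match the reported values only approximately (for example, a standard error of about 38.92 rather than 38.9305). The p-values need the t distribution, which the standard library lacks; with SciPy, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` is one option, although it uses the unrounded degrees of freedom.

```python
from math import floor, sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch (unequal-variance) t statistic and degrees of freedom from summary stats."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = sqrt(v1 + v2)  # standard error of xbar1 - xbar2
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df


# Group 1 = would not buy, group 2 = would buy (summary statistics from question 2).
country1 = welch_t(513.567, 179.37, 30, 722.557, 176, 70)
country2 = welch_t(700.045, 178.288, 44, 602.625, 168, 56)
for name, (se, t, df) in [("Country 1", country1), ("Country 2", country2)]:
    print(f"{name}: se={se:.2f} t={t:.2f} df={df:.2f} (rounded down: {floor(df)})")
```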

Question 8

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
|---|---|
| Correlation r | 0.9556816 |
| R square | 0.9133272 |
| Standard error of slope | 0.3023129 |
| Test stat of slope | 32.13549 |
| Two-sided p-value for slope | 0.00000 |

To calculate the p-value, H0: population slope = 0 is assumed to be true. Since the test is two-sided, H1 is H1: population slope ≠ 0.

ii) Make suitable comments about the output in part (i)

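As discussed in question 3, r is very high and positive (0.956), indicating a strong positive relationship between income and the amount people would spend on a TV; this is also evident in the scatter graph. The R square value of 0.913 means that about 91.3% of the variation in the amount people would spend is explained by its linear relationship with income, and the regression equation can be used to predict the amount they would spend.

Additionally, the two-sided p-value for the slope (reported as 0.00000) is lower than the 0.05 significance level, so we reject H0: population slope = 0 and conclude that there is significant evidence of a relationship between income and the amount people would spend.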

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
|---|---|
| Correlation r | 0.9855916 |
| R square | 0.9713908 |
| Standard error of slope | 0.1427827 |
| Test stat of slope | 57.684197 |
| Two-sided p-value for slope | 0.00000 |

To calculate the p-value, H0: population slope = 0 is assumed to be true. Since the test is two-sided, H1 is H1: population slope ≠ 0.

ii) Make suitable comments about the output in part (i)

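As discussed in question 3, r is very high and positive (0.986), indicating a strong positive relationship between income and the amount people would spend on a TV; this is also evident in the scatter graph. The R square value of 0.971 means that about 97.1% of the variation in the amount people would spend is explained by its linear relationship with income, and the regression equation can be used to predict the amount they would spend.

Additionally, the two-sided p-value for the slope (reported as 0.00000) is lower than the 0.05 significance level, so we reject H0: population slope = 0 and conclude that there is significant evidence of a relationship between income and the amount people would spend.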

c) Compare the results in parts (a) and (b)

In both countries, r is very high and positive, indicating a strong positive relationship between income and the amount people would spend, which is also evident in the scatter graphs, and the regression equation can be used to predict the amount people would spend. In both cases the p-value for the slope is lower than the 0.05 significance level, so the relationship is statistically significant. The relationship is slightly stronger in Country 2 (r = 0.986, R square = 0.971) than in Country 1 (r = 0.956, R square = 0.913).
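Two entries in the question 8 output can be derived from the question 3 statistics: R square is the square of r, and the test statistic of the slope is the sample slope divided by its standard error. The short Python check below recomputes both from the reported values; the `OUTPUT` dictionary is just a container for the copied numbers.

```python
# Regression output copied from questions 3 and 8: (r, slope, standard error of slope).
OUTPUT = {
    "Country 1": (0.9556816, 9.7149734, 0.3023129),
    "Country 2": (0.9855916, 8.236304, 0.1427827),
}

for country, (r, slope, se_slope) in OUTPUT.items():
    r_square = r ** 2           # share of variation in spending explained by income
    t_slope = slope / se_slope  # test statistic for H0: population slope = 0
    print(f"{country}: R square = {r_square:.4f}, slope test stat = {t_slope:.2f}")
```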

Question 9

OPTION 1:

The first option was selected. It is a PowerPoint presentation discussing various statistical concepts: the mean, the standard deviation, types of variables, a real-world dataset with one categorical variable and one quantitative variable, the ogive, the ogive of p-values, and how to interpret them.

For the mean, a numerical illustration is provided to explain the calculations involved in finding the mean and the symbol used to represent the sample mean.

For the standard deviation, the same illustration is carried forward to explain the calculations involved in finding the standard deviation.

For types of variables, the slides first show how two quantitative variables are presented through a graph (a scatterplot) and explain how this graph was used to derive the relationship between the two variables in question.

Then the slides pick a numerical illustration based on data from a government website that provides postcode, dwelling type and weekly rent. The example appears to carry forward from the earlier slides on the mean and standard deviation, but now has more data and variables. The next slides categorize the data on the basis of whether the dwelling is in Sydney and provide the average rent and the count of each dwelling type, both in absolute numbers and as a percentage of all dwellings. The slides explain how such categorization and pivot tables can be used to check whether samples are wrong. The illustration is then carried forward to examine the difference in rent in and out of Sydney. This is an analysis of a categorical variable and a quantitative variable.

The slides then explain the ogive and how multiple samples can be generated so that the p-value of each sample can be plotted in an ogive. If the ogive is a straight line, it indicates a weak relationship; if most of the p-values in the ogive are close to zero, it indicates a strong relationship. Ogives also help to visually understand percentiles and quartiles. The slides then work through an illustration in which multiple samples are generated by computer and the p-value of each sample is found. The slides also explain what p-values mean, what they indicate and how they can be interpreted.
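The ogive-of-p-values idea summarized above can be illustrated by simulation. The Python sketch below is an illustration under stated assumptions, not the method used in the slides: it draws repeated samples from hypothetical populations (group sizes 56 and 44 borrowed from the country datasets, and buying proportions chosen purely for illustration), runs a pooled two-proportion z-test on each sample, and tabulates the ogive of the p-values at a few cutoffs. All function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of the pooled two-proportion z-test of H0: p1 = p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # every respondent gave the same answer: no evidence either way
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_p_values(p1, p2, n1=56, n2=44, reps=1000, seed=1):
    """Draw many samples from hypothetical populations and return sorted p-values."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # "would buy" count, group 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # "would buy" count, group 2
        p_values.append(two_proportion_p_value(x1, n1, x2, n2))
    return sorted(p_values)


def ogive(p_values, cutoffs=(0.05, 0.25, 0.5, 0.75, 1.0)):
    """Cumulative proportion of p-values at or below each cutoff."""
    return {c: sum(p <= c for p in p_values) / len(p_values) for c in cutoffs}


no_relationship = ogive(simulate_p_values(0.7, 0.7))
strong_relationship = ogive(simulate_p_values(0.71, 0.36))
print("No relationship:    ", no_relationship)
print("Strong relationship:", strong_relationship)
```

When the two population proportions are equal, the cumulative proportions stay close to the cutoffs themselves, so the ogive is roughly a straight line; when the proportions differ substantially, most p-values fall near zero and the ogive rises steeply at the start.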