 # BUS105 Use of Descriptive Sample Statistics to Investigate Variable: Computing Assignment Answer Overview

Materials that must be used in the assignment (provided on Moodle):

*A pair of datasets, one for country 1 and one for country 2. Looking at the datasets you will notice that each student has been given a sample. Students must use their own sample.
*An automatic dataset summarizer.
*Instructions for checking that you have properly found your own sample.

Use the following as the cover page for the word file

“Title: semester 1, 2020 BUS105 computing assignment”
“Name:”
“Student number:”
“Sample:   ”

Overview

You need to submit a Word file with the answers to 9 questions. The first 8 are about the dataset; the last question is a paraphrasing task (refer to pages 3 to 6).

You will use your dataset and the automatic dataset summarizer to get the descriptive statistics that are used in questions 1 to 5 and the inferential statistics that are used in questions 6 to 8.
To check that you have correctly obtained your dataset, check that both p-values are correct when you investigate both categorical variables (question 6).

The word count can be less than 1500 words if you are giving answers that demonstrate you have understood the material.

Summary of the dataset (question 1 to 8 given on pages 3 to 6  are about the dataset)

Suppose market research company XYZ did a survey in two different countries. The survey was designed to gather basic information about some customers and their opinion about TV model XYZ

The survey questions were

“How much are you willing to spend on a TV?”
“Would you buy TV model XYZ?”

So there are two datasets, one for each country

One dataset is the survey answers for country 1
One dataset is the survey answers for country 2

Students MUST use the datasets they are given. They CANNOT use datasets they make themselves or take from other sources.

Each of the datasets consists of the following variables,

Income?: a quantitative variable
Gender?: a categorical variable
Amount you would spend?: a quantitative variable, the amount they would spend on a TV
Would you buy?: a categorical variable, whether they would buy TV model XYZ

paste the following cover page and the answers to questions 1 to 9 below into a word document

“Title: semester 1, 2020 BUS105 computing assignment”
“Name:”
“Student number:”
“Sample:   ”

“(students must use the dataset provided each student has been allocated their own sample)”

Question 1
a) Just using the information for Country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Gender?” and “Would you buy?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Gender?” and “Would you buy?”

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 2

a) Just using the information for country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Would you buy?” and “amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “would you buy?” and “amount you would spend?”

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 3

a) Just using the information for Country 1

i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “Income?” and “Amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

b) Just using the information for Country 2

i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “Income?” and “Amount you would spend?” using the sample

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

Difference between sample means $\bar{x}_1 - \bar{x}_2$
Difference between sample proportions $\hat{p}_1 - \hat{p}_2$
correlation coefficient r

c) Compare the results in parts (a) and (b)

Question 4

Question 5

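Just using the country 1 dataset, more specifically the variables “income” and “would they buy”

a) Just considering the people that would buy the TV

i) What is the sample size, sample mean and sample standard deviation of income

Hint: this is easy, just use the dataset summarizer

ii) Find a 95% confidence interval for income

b) Just considering the people that would not buy the TV

i) What is the sample size, sample mean and sample standard deviation of income

Hint: this is easy, just use the dataset summarizer

ii) Find a 95% confidence interval for income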

Question 6

a) Just using the information for country 1

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

b) Just using the information for country 2

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

c) Compare the results in parts (a) and (b)

Question 7

a) Just using the information for country 1

i) Paste in computer output that measure evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Hint: inferential statistics measure evidence for a claim.

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

c) Compare the results in parts (a) and (b)

Question 8

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population
Hint: inferential statistics measure evidence for a claim.

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

c) Compare the results in parts (a) and (b)

Question 9

You need to pick ONE of the following options for question 9. Your answer should be about 300 words long.

OPTION 1
Give a brief summary of at least one of the major ideas in the following PowerPoint, available from

app.box.com/s/0eddpa7wut2kdae4nmkynrz0fgfdcrbg

OPTION 2

Briefly discuss the main message in the following sample report available from

app.box.com/s/ael5pciel84wnveu3z4bd74ro0v2djgn

Instructions for the Excel file

You have to use the Excel commands discussed below and not the dataset summarizer.
However, you should check that your summaries are the same as the output from the dataset summarizer you used in the Word file.
If you have different information you will get at most 1 out of 2.

You need to cut and paste just your dataset into a new Excel file and follow the 4 instructions below. DO NOT use a cover page for the Excel file. You must check that you have the correct sample.

Note that you can still do this at home even if you do not have Excel; just use Google Sheets.

1. For country 1, use Excel PivotTable commands (or Google Sheets pivot table commands) to find appropriate sample statistics that let you investigate the relationship between the fields (variables) “Gender?” and “Would you buy?”
2. For country 1, use Excel PivotTable commands (or Google Sheets pivot table commands) to find appropriate sample statistics that let you investigate the relationship between the fields (variables) “Would you buy?” and “Amount you would spend?”
3. For country 1, use Excel commands to make a graph that lets you investigate the relationship between the fields (variables) “Income?” and “Amount you would spend?”

Upload the excel file with the pivot tables and scatterplot to the assignment dropbox

Title: BUS105 Computing Assignment

Semester 1, 2020

Question 1

a) Just using the information for Country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Gender?” and “Would you buy?” using the sample

Descriptive sample statistics

| | N | Y | total |
| --- | --- | --- | --- |
| female count | 11 | 35 | 46 |
| female % | 23.91% | 76.09% | 100.00% |
| male count | 16 | 38 | 54 |
| male % | 29.63% | 70.37% | 100.00% |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics.

The two variables are ‘Gender’ and ‘Would you buy’, and both are categorical. The table above gives the counts and row percentages for each combination of the two variables. Of the female respondents, 11 (23.91%) are not willing to buy while 35 (76.09%) are willing to buy. Similarly, of the male respondents, 16 (29.63%) are not willing to buy while 38 (70.37%) are willing to buy.

Hence, the majority of respondents, whether male or female, are willing to buy. Overall, 27 respondents are not willing to buy while 73 respondents are willing to buy. These totals are used to find the difference between proportions:

Difference between sample proportions: $\hat{p}_1 - \hat{p}_2 = 0.73 - 0.27 = 0.46$

The two proportions used are the proportion willing to buy ($\hat{p}_1$) and the proportion not willing to buy ($\hat{p}_2$). The difference between the two proportions is 0.46, so those willing to buy lead by 46 percentage points.
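The summarizer output for Question 6 compares the two genders directly (phat1-phat2). As a complementary check, the following minimal Python sketch recomputes the proportion willing to buy within each gender from the counts in the table above. The variable names are illustrative and are not part of the assignment materials.

```python
# Counts from the Country 1 table: (not willing to buy, willing to buy)
female_n, female_y = 11, 35
male_n, male_y = 16, 38

# Proportion willing to buy within each gender
p_female = female_y / (female_n + female_y)
p_male = male_y / (male_n + male_y)

# Difference between the two sample proportions (female minus male)
diff = p_female - p_male

print(f"female: {p_female:.4f}, male: {p_male:.4f}, difference: {diff:.4f}")
# prints: female: 0.7609, male: 0.7037, difference: 0.0572
```

A difference this close to zero suggests that, in this sample, willingness to buy differs only slightly between female and male respondents.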

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “Gender?” and “Would you buy?”

Descriptive sample statistics

| | N | Y | total |
| --- | --- | --- | --- |
| female count | 18 | 28 | 46 |
| female % | 39.13% | 60.87% | 100.00% |
| male count | 33 | 21 | 54 |
| male % | 61.11% | 38.89% | 100.00% |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

The two variables are ‘Gender’ and ‘Would you buy’, and both are categorical. The table above gives the counts and row percentages for each combination of the two variables. Of the female respondents, 18 (39.13%) are not willing to buy while 28 (60.87%) are willing to buy. Similarly, of the male respondents, 33 (61.11%) are not willing to buy while 21 (38.89%) are willing to buy.

Hence, the majority of female respondents are willing to buy, but the majority of male respondents are not. Overall, 51 respondents are not willing to buy while 49 respondents are willing to buy. These totals are used to find the difference between proportions:

Difference between sample proportions: $\hat{p}_1 - \hat{p}_2 = 0.49 - 0.51 = -0.02$

The two proportions used are the proportion willing to buy ($\hat{p}_1$) and the proportion not willing to buy ($\hat{p}_2$). The difference between the two proportions is -0.02, so those willing to buy trail those not willing to buy by 2 percentage points.

c) Compare the results in parts (a) and (b)

Those willing to buy outnumber those not willing to buy by 46 percentage points in Country 1, whereas in Country 2 those not willing to buy outnumber those willing to buy by 2 percentage points. This indicates a much stronger willingness to buy TV model XYZ in Country 1. This is also visible in the following stacked bar graphs:

Question 2

a) Just using the information for country 1

i) Paste in descriptive sample statistics that let you investigate the relationship between the variables “Would you buy?” and “amount you would spend?” using the sample

Descriptive sample statistics

| xbar1 | xbar2 | s1 | s2 | n1 | n2 |
| --- | --- | --- | --- | --- | --- |
| 544.704 | 709.356 | 175.324 | 170 | 27 | 73 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics (choose one)

The two variables are ‘Would you buy’ and ‘Amount you would spend’ where one variable is categorical in nature and the other is quantitative in nature. The above descriptive sample table presents mean, standard deviation and count for the two categories (willing to buy and not willing to buy).

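Difference between sample means: $\bar{x}_1 - \bar{x}_2 = 544.704 - 709.356 = -164.65$

Here group 1 is those not willing to buy ($n_1 = 27$) and group 2 is those willing to buy ($n_2 = 73$). Hence, those willing to buy would spend 164.65 more on average than those not willing to buy.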

b) Just using the information for Country 2

i) Paste in descriptive sample statistics that let you investigate the claim there is a relationship between the variables “would you buy?” and “amount you would spend?”

Descriptive sample statistics

| xbar1 | xbar2 | s1 | s2 | n1 | n2 |
| --- | --- | --- | --- | --- | --- |
| 707.569 | 607.898 | 154.455 | 188 | 51 | 49 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

The two variables are ‘Would you buy’ and ‘Amount you would spend’ where one variable is categorical in nature and the other is quantitative in nature. The above descriptive sample table presents mean, standard deviation and count for the two categories (willing to buy and not willing to buy).

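Difference between sample means: $\bar{x}_1 - \bar{x}_2 = 707.569 - 607.898 = 99.67$

Here group 1 is those not willing to buy ($n_1 = 51$) and group 2 is those willing to buy ($n_2 = 49$). Hence, those not willing to buy would spend 99.67 more on average than those willing to buy.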

c) Compare the results in parts (a) and (b)

In Country 1, those willing to buy would spend 164.65 more on average than those not willing to buy. In Country 2 the direction is reversed: those willing to buy would spend 99.67 less on average than those not willing to buy.

Question 3

a) Just using the information for Country 1

i) Paste in descriptive sample statistics and a graph that let you investigate the relationship between the variables “Income?” and “Amount you would spend?” using the sample

Descriptive sample statistics

| Statistic | Value |
| --- | --- |
| sample size | 100 |
| sample slope | 9.7647849 |
| sample intercept | 25.794827 |
| sample correlation r | 0.9467401 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

The two variables are ‘Income’ and ‘Amount you would spend’, and both are quantitative. The table above gives the sample correlation and the coefficients of the regression equation, where the sample slope is the coefficient of the x variable (income) and the sample intercept is the constant term. The sample correlation r is 0.95, which indicates a strong positive linear relationship between the two variables: as income increases, the amount a respondent would spend also tends to increase.
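To make the role of the slope and intercept concrete, here is a minimal Python sketch of the fitted line for Country 1 using the coefficients from the table. The function name and the example income of 60 are hypothetical, chosen only for illustration; the units of income are not stated in the assignment.

```python
# Sample regression coefficients reported for Country 1
slope = 9.7647849
intercept = 25.794827

def predicted_spend(income):
    """Fitted amount a respondent would spend at a given income."""
    return intercept + slope * income

# Hypothetical income value, chosen only for illustration
example_income = 60
print(round(predicted_spend(example_income), 2))
```

Under this fitted line, each additional unit of income is associated with roughly 9.76 more in the amount a respondent would spend.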

b) Just using the information for Country 2

i) Paste in descriptive sample statistics and a graph that let you investigate the claim there is a relationship between the variables “Income?” and “Amount you would spend?” using the sample

Descriptive sample statistics

| Statistic | Value |
| --- | --- |
| sample size | 100 |
| sample slope | 7.9331779 |
| sample intercept | 4.7981417 |
| sample correlation r | 0.9846244 |

ii) Use the output in part (i) to describe the relationship between the two variables, your discussion must use one of the following sample statistics

The two variables are ‘Income’ and ‘Amount you would spend’, and both are quantitative. The table above gives the sample correlation and the coefficients of the regression equation, where the sample slope is the coefficient of the x variable (income) and the sample intercept is the constant term. The sample correlation r is 0.985, which indicates a very strong positive linear relationship between the two variables: as income increases, the amount a respondent would spend also tends to increase.

c) Compare the results in parts (a) and (b)

The value of r is positive for both countries, indicating that the two variables, ‘income’ and ‘amount they would spend’, are positively correlated: as income increases, the amount a respondent would spend also tends to increase. This is also seen in the following scatterplots, where the observations cluster closely around an upward-sloping line.

Question 4

a) Considering all people in the country 1 dataset

i) What is the sample size n: the sample size is n = 100

ii) What is the sample proportion of people that would buy the model XYZ TV: the sample proportion of people willing to buy the TV is 0.73

iii) Use the answers in parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5:

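X, the number of people who would buy the TV, is a binomial variable (each person will either buy or not buy the TV) with:

Mean: $\mu = np = 100 \times 0.5 = 50$

SD: $\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.5 \times (1 - 0.5)} = \sqrt{25} = 5$

Sample proportion: $\hat{p} = 0.73$, so the observed count is $X = n\hat{p} = 73$

z-score: $z = (X - \mu)/\sigma = (73 - 50)/5 = 4.6$

Equivalently, on the proportion scale, $z = (\hat{p} - p)/\sqrt{p(1-p)/n} = (0.73 - 0.50)/0.05 = 4.6$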
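The z-score of a sample proportion divides by the standard error of the proportion, $\sqrt{p(1-p)/n}$, which is equivalent to comparing the observed count with $np$ using the standard deviation of the count. The following minimal Python sketch, with illustrative variable names, computes the Country 1 z-score both ways.

```python
import math

n = 100        # sample size
p0 = 0.5       # assumed population proportion
p_hat = 0.73   # sample proportion willing to buy (Country 1)

# Standard error of the sample proportion under p0
se_prop = math.sqrt(p0 * (1 - p0) / n)
z_from_proportion = (p_hat - p0) / se_prop

# Same calculation on the count scale: mean n*p0, sd sqrt(n*p0*(1-p0))
sd_count = math.sqrt(n * p0 * (1 - p0))
z_from_count = (n * p_hat - n * p0) / sd_count

print(round(z_from_proportion, 3), round(z_from_count, 3))
```

Both routes give the same value because the count scale is just the proportion scale multiplied by n. Replacing `p_hat` with 0.49 gives the corresponding Country 2 z-score.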

b) Considering all people in the country 2 dataset

i) What is the sample size n: the sample size is n = 100

ii) What is the sample proportion of people that would buy the model XYZ TV: the sample proportion of people willing to buy the TV is 0.49

iii) Use the answers in parts (i) and (ii) to find the z-score of the sample proportion if you assume the population proportion p = 0.5:

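X, the number of people who would buy the TV, is a binomial variable (each person will either buy or not buy the TV) with:

Mean: $\mu = np = 100 \times 0.5 = 50$

SD: $\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.5 \times (1 - 0.5)} = \sqrt{25} = 5$

Sample proportion: $\hat{p} = 0.49$, so the observed count is $X = n\hat{p} = 49$

z-score: $z = (X - \mu)/\sigma = (49 - 50)/5 = -0.2$

Equivalently, on the proportion scale, $z = (\hat{p} - p)/\sqrt{p(1-p)/n} = (0.49 - 0.50)/0.05 = -0.2$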

Question 5

Just using the country 1 dataset, more specifically the variables “income” and “would they buy”

a) Just considering the people that would buy the TV

i) What is the sample size, sample mean and sample standard deviation of income

Using a pivot table, the information is in row Y below:

| Row Labels | Count of income | Average of income2 | StdDev of income3 |
| --- | --- | --- | --- |
| N | 27 | 53.33 | 17.23 |
| Y | 73 | 69.93 | 16.18 |
| Grand Total | 100 | 65.45 | 17.98 |

ii) Find a 95% confidence interval for income

Confidence interval for income:

Mean = 69.93, n = 73, SD = 16.18

$SD/\sqrt{n} = 16.18/\sqrt{73} = 1.8937$, df = n - 1 = 72, α = 0.05 (two-tailed), so the critical value is t = 1.9935

95% CI = $69.93 \pm (1.9935)(1.8937)$

= $69.93 \pm 3.78$

95% CI = (66.15, 73.71)

b) Just considering the people that would not buy the TV

i) What is the sample size, sample mean and sample standard deviation of income

Using a pivot table, the information is in row N below:

| Row Labels | Count of income | Average of income2 | StdDev of income3 |
| --- | --- | --- | --- |
| N | 27 | 53.33 | 17.23 |
| Y | 73 | 69.93 | 16.18 |
| Grand Total | 100 | 65.45 | 17.98 |

ii) Find a 95% confidence interval for income

Confidence interval for income:

Mean = 53.33, n = 27, SD = 17.23

$SD/\sqrt{n} = 17.23/\sqrt{27} = 3.3159$, df = n - 1 = 26, α = 0.05 (two-tailed), so the critical value is t = 2.0555

95% CI = $53.33 \pm (2.0555)(3.3159)$

= $53.33 \pm 6.82$

95% CI = (46.51, 60.15)
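The two interval calculations above follow the same pattern, mean ± t × SD/√n. As a minimal sketch, the Python helper below (the name `t_interval` is hypothetical) reproduces both intervals, taking the t critical values 1.9935 and 2.0555 from the working above rather than computing them.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Confidence interval: mean ± t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Would buy: n = 73, t critical value for 72 degrees of freedom
low_y, high_y = t_interval(69.93, 16.18, 73, 1.9935)
# Would not buy: n = 27, t critical value for 26 degrees of freedom
low_n, high_n = t_interval(53.33, 17.23, 27, 2.0555)

print(round(low_y, 2), round(high_y, 2))
print(round(low_n, 2), round(high_n, 2))
```

The two intervals do not overlap, which is consistent with the buyers in this sample having a higher mean income than the non-buyers.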

Question 6

a) Just using the information for country 1

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| n1 | 46 |
| n2 | 54 |
| phat 1 | 0.76087 |
| phat 2 | 0.703703704 |
| Estimate of the difference between population proportions, phat1-phat2 | 0.057165862 |
| standard error of estimate | 0.089077397 |
| test stat | 0.641754964 |
| two sided pvalue | 0.521032295 |

To calculate the p-value, H0: p1 = p2 is assumed to be true. Since the test is two sided, H1 is H1: p1 ≠ p2.

The p-value is 0.52 which is higher than significance level of 0.05. Hence, the p-value is not significant and we do not have enough statistical evidence to reject the null hypothesis.

At given level of sig of 0.05, it is concluded that there is no significant difference between the two proportions.

b) Just using the information for country 2

i) Paste in inferential statistics that measure evidence for the claim there is a relationship between the variables “Gender?” and “Would you buy?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| n1 | 46 |
| n2 | 54 |
| phat 1 | 0.6087 |
| phat 2 | 0.388888889 |
| Estimate of the difference between population proportions, phat1-phat2 | 0.219806763 |
| standard error of estimate | 0.100301478 |
| test stat | 2.191460862 |
| two sided pvalue | 0.028418459 |

To calculate the p-value, H0: p1 = p2 is assumed to be true. Since the test is two sided, H1 is H1: p1 ≠ p2.

The p-value is 0.03 which is lower than significance level of 0.05. Hence, the p-value is significant and we have enough statistical evidence to reject the null hypothesis.

At given level of sig of 0.05, it is concluded that there is significant difference between the two proportions.

c) Compare the results in parts (a) and (b)

In case of country 1, no significant difference between proportions was found while in case of Country 2, significant difference between proportions was found.
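The reported standard errors and p-values are numerically consistent with a two-sided z-test that uses the pooled proportion under H0. The Python sketch below is an assumption about how such output could be reproduced, not a description of the summarizer's internals; the function name is hypothetical and the counts come from the Question 1 tables.

```python
import math

def two_proportion_z_test(successes1, n1, successes2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled standard error."""
    p1 = successes1 / n1
    p2 = successes2 / n2
    pooled = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, se, z, p_value

# Counts willing to buy: female (out of 46), then male (out of 54)
country1 = two_proportion_z_test(35, 46, 38, 54)
country2 = two_proportion_z_test(28, 46, 21, 54)

print([round(v, 4) for v in country1])
print([round(v, 4) for v in country2])
```

Each result holds the estimated difference, its standard error, the test statistic and the two-sided p-value, in that order.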

Question 7

a) Just using the information for country 1

i) Paste in computer output that measure evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| Estimate of the difference between population means, xbar1-xbar2 | -164.652 |
| standard error of estimate xbar1-xbar2 | 39.1474 |
| t test stat | -4.20596 |
| df | 45 |
| two sided pvalue | 0.00012 |

To calculate the p-value, H0: μ1 = μ2 is assumed to be true. Since the test is two sided, H1 is H1: μ1 ≠ μ2.

The p-value is less than 0.05 so there is strong evidence that there is a difference between population means. We reject the null hypothesis.

The difference in the mean amount between those not willing to buy and those willing to buy is -164.65 (those willing to buy would spend 164.65 more on average), and it is statistically significant as concluded above.

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “would you buy?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| Estimate of the difference between population means, xbar1-xbar2 | 99.6707 |
| standard error of estimate xbar1-xbar2 | 34.4636 |
| t test stat | 2.89205 |
| df | 92 |
| two sided pvalue | 0.00478 |

To calculate the p-value, H0: μ1 = μ2 is assumed to be true. Since the test is two sided, H1 is H1: μ1 ≠ μ2.

The p-value is less than 0.05 so there is strong evidence that there is a difference between population means. We reject the null hypothesis.

The difference in the mean amount between those not willing to buy and those willing to buy is 99.67 (those not willing to buy would spend 99.67 more on average), and it is statistically significant as concluded above.

c) Compare the results in parts (a) and (b)

Both countries show statistically significant evidence of a difference in the mean amount people would spend between those willing to buy and those not willing to buy. In Country 1, the mean is higher by 164.65 for those willing to buy compared with those not willing to buy. In Country 2, the mean is lower by 99.67 for those willing to buy compared with those not willing to buy.
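The reported standard errors and degrees of freedom look consistent with an unpooled (Welch) two-sample t-test, with the degrees of freedom rounded down. This is an inference from the numbers, not something the assignment states. The Python sketch below, with a hypothetical function name, recomputes these quantities from the Question 2 summary statistics; small differences from the reported values are expected because the summary standard deviations are rounded.

```python
import math

def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Standard error, t statistic and Welch degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t_stat = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t_stat, df

# Group 1 = not willing to buy, group 2 = willing to buy
se1, t1, df1 = welch_summary(544.704, 175.324, 27, 709.356, 170, 73)   # Country 1
se2, t2, df2 = welch_summary(707.569, 154.455, 51, 607.898, 188, 49)   # Country 2

print(round(se1, 2), round(t1, 2), int(df1))
print(round(se2, 2), round(t2, 2), int(df2))
```

The opposite signs of the two t statistics reflect the reversed direction of the difference between the countries.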

Question 8

a) Just using the information for country 1

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| correlation r | 0.9467401 |
| R square | 0.8963169 |
| standard error of slope | 0.3354848 |
| test stat of slope | 29.106492 |
| two sided p-value for slope | 4.991E-50 |

To calculate the p-value, H0: population slope = 0 is assumed to be true. Since the test is two sided, H1 is H1: population slope ≠ 0.

The p-value is less than 0.05, so there is strong evidence of a relationship between the two variables, as also indicated by the high $R^2$ of 0.896.

b) Just using the information for country 2

i) Paste in computer output that measures evidence for the claim there is a relationship between the variables “Income?” and “amount you would spend?” if you consider the whole population

Inferential statistics

| Statistic | Value |
| --- | --- |
| correlation r | 0.9846244 |
| R square | 0.9694852 |
| standard error of slope | 0.1421735 |
| test stat of slope | 55.799267 |
| two sided p-value for slope | 4.496E-76 |

To calculate the p-value, H0: population slope = 0 is assumed to be true. Since the test is two sided, H1 is H1: population slope ≠ 0.

The p-value is less than 0.05, so there is strong evidence of a relationship between the two variables, as also indicated by the high $R^2$ of 0.969.

c) Compare the results in parts (a) and (b)

Both countries show a very strong positive relationship between income and the amount respondents would spend, as indicated by the high values of r and $R^2$.
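In simple linear regression the test statistic of the slope can be obtained in two equivalent ways: as slope divided by its standard error, or from the correlation as $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The minimal Python sketch below, with hypothetical function names, checks that both routes agree with the reported test statistics, using the slopes from Question 3 and the values in the tables above.

```python
import math

def slope_t_from_se(slope, se_slope):
    """Test statistic for H0: population slope = 0."""
    return slope / se_slope

def slope_t_from_r(r, n):
    """Equivalent test statistic computed from the sample correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 100
# Country 1
t1_se = slope_t_from_se(9.7647849, 0.3354848)
t1_r = slope_t_from_r(0.9467401, n)
# Country 2
t2_se = slope_t_from_se(7.9331779, 0.1421735)
t2_r = slope_t_from_r(0.9846244, n)

print(round(t1_se, 2), round(t1_r, 2))
print(round(t2_se, 2), round(t2_r, 2))
```

The agreement between the two routes also shows why a correlation this close to 1 with n = 100 produces such a small p-value.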

Question 9

OPTION 2

The given report is similar in nature to the one that we are currently doing. The report discusses case of two universities where statistics is being taught using an old method and a new method.

The variables analysed include attendance of the student, marks of the student and whether they passed or failed the course. Hence,

1. Attendance, which refers to the number of classes attended, is a quantitative variable
2. Mark, which refers to the student's mark, is a quantitative variable
3. Did they pass refers to the result and is a categorical variable with output being either pass or fail.
4. Which method refers to the method of teaching and is a categorical variable with output being either old or new.

The report analyses descriptive sample statistics and inferential statistics between various variables.

1. The analysis for two categorical variables, “which method” and “did they pass” uses difference in sample proportions method. P-value method is used under inferential technique.
2. The analysis for two quantitative variables, “attendance” and “marks” uses scatterplot, correlation and regression techniques. P-value method is used under inferential technique.
3. The analysis for one categorical variable, “which method” and one quantitative variable, “attendance” uses difference in sample means method.

The report also compares results for University 1 and University 2 to conclude the inferences based on use of various statistical methods.

Hence, the report is similar in using various techniques and explains how various types of variables can be analysed. The same technique cannot be used for all types of variables and hence, the method varies.