Hands On Exercise 4b
FirGhaz
February 2, 2024
The learning outcomes for this exercise are to use the:
- ggstatsplot package to create visual graphics with rich statistical information,
- performance package to visualise model diagnostics, and
- parameters package to visualise model parameters.
ggstatsplot is an extension of the ggplot2 package for creating graphics with details from statistical tests included in the information-rich plots themselves. It aims to:
- provide alternative statistical inference methods by default, and
- follow best practices for statistical reporting.
For all statistical tests reported in the plots, the default template abides by the APA gold standard for statistical reporting. For example, here are results from a robust t-test:
In this exercise, ggstatsplot and tidyverse will be used. They are loaded with `pacman::p_load(ggstatsplot, tidyverse)`.
For this exercise, the Exam data will be imported:
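A minimal sketch of the import step. The file path and the object name `exam` are assumptions; only the column names and the 322 × 7 dimensions are taken from the console output that follows.

```r
library(tidyverse)

# Assumed location of the exam data; adjust the path to your project layout.
exam <- read_csv("data/Exam_data.csv")
```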
Rows: 322 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, CLASS, GENDER, RACE
dbl (3): ENGLISH, MATHS, SCIENCE
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gghistostats() method
In the code chunk below, gghistostats() is used to build a visual of a one-sample test on English scores.
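A sketch of what such a chunk might look like. The data path, the object name `exam`, the test value of 60 and the axis label are assumptions made for illustration. The `type` argument is left at its default; it can be set to "np", "robust" or "bayes" to switch the inference method.

```r
library(ggstatsplot)
library(readr)

# Assumed path and object name for the exam data.
exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

set.seed(1234)  # keeps any sampling-based estimates reproducible

# One-sample test of English scores against an assumed reference value of 60.
p_hist <- gghistostats(
  data = exam,
  x = ENGLISH,
  test.value = 60,          # hypothetical comparison value
  xlab = "English scores"
)
p_hist
```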
Default information:
- statistical details
- Bayes Factor
- sample sizes
- distribution summary
A Bayes factor is the ratio of the likelihood of one particular hypothesis to the likelihood of another. It can be interpreted as a measure of the strength of evidence in favor of one theory among two competing theories.
The Bayes factor gives us a way to evaluate the data in favor of a null hypothesis, and to use external information to do so. It tells us what the weight of the evidence is in favor of a given hypothesis.
When we are comparing two hypotheses, H1 (the alternate hypothesis) and H0 (the null hypothesis), the Bayes Factor is often written as B10. It can be defined mathematically as
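Written out, with D denoting the observed data (a symbol assumed here, since the text does not name one), the standard definition of this ratio is:

$$
B_{10} = \frac{P(D \mid H_1)}{P(D \mid H_0)}
$$

Values above 1 favour H1, values below 1 favour H0, and the evidence for the null relative to the alternative is the reciprocal, $B_{01} = 1 / B_{10}$.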
The Schwarz criterion is one of the easiest ways to calculate a rough approximation of the Bayes Factor.
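The Schwarz criterion is also known as the Bayesian information criterion (BIC). A commonly used form of the approximation is B10 ≈ exp((BIC0 − BIC1) / 2), where BIC0 and BIC1 are the BIC values of the models under H0 and H1. The sketch below illustrates it with base R only; the built-in mtcars data and the two models are assumptions chosen purely for illustration and are not part of this exercise.

```r
# H0: mpg does not depend on weight (intercept-only model).
m0 <- lm(mpg ~ 1, data = mtcars)

# H1: mpg depends linearly on weight.
m1 <- lm(mpg ~ wt, data = mtcars)

# BIC-based approximation of the Bayes factor in favour of H1 over H0.
bf10 <- exp((BIC(m0) - BIC(m1)) / 2)

# The evidence for H0 over H1 is the reciprocal.
bf01 <- 1 / bf10

bf10
```

Because it ignores the prior on the model parameters, this is only a rough guide and will generally differ from the Bayes factor that ggstatsplot reports.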
A Bayes Factor can be any positive number. One of the most common interpretations is this one, first proposed by Harold Jeffreys (1961) and slightly modified by Lee and Wagenmakers in 2013:
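As a sketch, the categories of that scheme as it is commonly tabulated (cut points at 1, 3, 10, 30 and 100, mirrored for values below 1) can be turned into a small helper. The function name `interpret_bf10` is hypothetical, and assigning a value that falls exactly on a cut point to the upper category is an assumption of this sketch.

```r
# Hypothetical helper: map B10 values to verbal evidence categories.
interpret_bf10 <- function(bf10) {
  stopifnot(is.numeric(bf10), all(bf10 > 0))
  breaks <- c(0, 1 / 100, 1 / 30, 1 / 10, 1 / 3, 1, 3, 10, 30, 100, Inf)
  labels <- c(
    "extreme evidence for H0", "very strong evidence for H0",
    "strong evidence for H0", "moderate evidence for H0",
    "anecdotal evidence for H0", "anecdotal evidence for H1",
    "moderate evidence for H1", "strong evidence for H1",
    "very strong evidence for H1", "extreme evidence for H1"
  )
  # Intervals are closed on the left, e.g. [3, 10) is "moderate evidence for H1".
  out <- as.character(
    cut(bf10, breaks = breaks, labels = labels,
        right = FALSE, include.lowest = TRUE)
  )
  # A Bayes factor of exactly 1 favours neither hypothesis.
  out[bf10 == 1] <- "no evidence"
  out
}

interpret_bf10(c(0.2, 1, 5, 150))
```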
ggbetweenstats()
In the code chunk below, ggbetweenstats() is used to build a visual for a two-sample mean test of Maths scores by gender.
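A sketch of such a call, assuming the exam data sit at the path below and are stored in an object named `exam`. The default `type` is used here; setting `type = "np"` would request a non-parametric test instead.

```r
library(ggstatsplot)
library(readr)

# Assumed path and object name for the exam data.
exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

# Compare Maths scores between the two gender groups.
p_two_sample <- ggbetweenstats(
  data = exam,
  x = GENDER,   # grouping variable
  y = MATHS     # outcome variable
)
p_two_sample
```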
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

Default information:
- statistical details
- Bayes Factor
- sample sizes
- distribution summary
ggbetweenstats()
In the code chunk below, ggbetweenstats() is used to build a visual for a one-way ANOVA test on English scores by race.
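A sketch of a one-way ANOVA plot with pairwise comparisons. The data path, the object name, and the specific choices of `pairwise.display = "s"` and `p.adjust.method = "fdr"` are assumptions made for illustration.

```r
library(ggstatsplot)
library(readr)

# Assumed path and object name for the exam data.
exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

# One-way ANOVA of English scores across race groups.
p_anova <- ggbetweenstats(
  data = exam,
  x = RACE,
  y = ENGLISH,
  type = "p",                 # parametric test
  pairwise.display = "s",     # show only significant pairwise comparisons
  p.adjust.method = "fdr"     # assumed multiple-comparison adjustment
)
p_anova
```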
Warning in min(x): no non-missing arguments to min; returning Inf
Warning in max(x): no non-missing arguments to max; returning -Inf

- "ns" → only non-significant
- "s" → only significant
- "all" → everything
In the code chunk below, ggscatterstats() is used to build a visual for a significance test of correlation between Maths scores and English scores.
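A sketch of the correlation plot, assuming the same data path and object name as before. Setting `marginal = FALSE` is an assumption that drops the marginal distribution plots and keeps only the scatterplot.

```r
library(ggstatsplot)
library(readr)

# Assumed path and object name for the exam data.
exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

# Scatterplot of Maths against English with the correlation test in the subtitle.
p_corr <- ggscatterstats(
  data = exam,
  x = MATHS,
  y = ENGLISH,
  marginal = FALSE   # omit marginal histograms
)
p_corr
```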
In the code chunk below, the Maths scores are binned into a 4-class variable by using cut().
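To show how cut() forms the classes, here is a small base R sketch. The scores are hypothetical stand-ins for the MATHS column, and the break points 0, 60, 75, 85 and 100 are assumptions; five break points give four classes.

```r
# Hypothetical scores standing in for exam$MATHS.
maths <- c(35, 60, 61, 75, 88, 100)

# Assumed break points; each interval is open on the left and closed on the right.
maths_bins <- cut(maths, breaks = c(0, 60, 75, 85, 100))

# Count the scores that fall in each class.
table(maths_bins)
```

With these intervals a score of exactly 0 would become NA; adding `include.lowest = TRUE` keeps it in the first class.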
In the code chunk below, ggbarstats() is used to build a visual for a significance test of association.
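A sketch of the association test. The text does not say which pair of variables is tested, so pairing the binned Maths scores with GENDER is an assumption, as are the data path, the object names `exam` and `exam1`, and the break points.

```r
library(ggstatsplot)
library(dplyr)
library(readr)

# Assumed path and object name for the exam data.
exam <- read_csv("data/Exam_data.csv", show_col_types = FALSE)

# Bin Maths scores into four classes (assumed break points).
exam1 <- exam %>%
  mutate(MATHS_bins = cut(MATHS, breaks = c(0, 60, 75, 85, 100)))

# Bar chart with a test of association between the Maths class and gender.
p_assoc <- ggbarstats(
  data = exam1,
  x = MATHS_bins,
  y = GENDER
)
p_assoc
```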
In this section, you will learn how to visualise model diagnostics and model parameters by using the performance and parameters packages.
The Toyota Corolla case study will be used. The purpose of the study is to build a model to discover factors affecting the prices of used cars by taking into consideration a set of explanatory variables.
Installing and importing
In the code chunk below, read_xls() of the readxl package is used to import the data worksheet of the ToyotaCorolla.xls workbook into R.
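A sketch of the install-and-import step. The package list is inferred from the functions used later in this section, and the file path is an assumption; the worksheet name "data" and the object name car_resale come from the text.

```r
# Install (if needed) and load the packages used in this section.
pacman::p_load(readxl, performance, parameters, see)

# Assumed location of the workbook; the worksheet is named "data".
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")
car_resale
```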
# A tibble: 1,436 × 38
Id Model Price Age_08_04 Mfg_Month Mfg_Year KM Quarterly_Tax Weight
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 81 TOYOTA … 18950 25 8 2002 20019 100 1180
2 1 TOYOTA … 13500 23 10 2002 46986 210 1165
3 2 TOYOTA … 13750 23 10 2002 72937 210 1165
4 3 TOYOTA … 13950 24 9 2002 41711 210 1165
5 4 TOYOTA … 14950 26 7 2002 48000 210 1165
6 5 TOYOTA … 13750 30 3 2002 38500 210 1170
7 6 TOYOTA … 12950 32 1 2002 61000 210 1170
8 7 TOYOTA … 16900 27 6 2002 94612 210 1245
9 8 TOYOTA … 18600 30 3 2002 75889 210 1245
10 44 TOYOTA … 16950 27 6 2002 110404 234 1255
# ℹ 1,426 more rows
# ℹ 29 more variables: Guarantee_Period <dbl>, HP_Bin <chr>, CC_bin <chr>,
# Doors <dbl>, Gears <dbl>, Cylinders <dbl>, Fuel_Type <chr>, Color <chr>,
# Met_Color <dbl>, Automatic <dbl>, Mfr_Guarantee <dbl>,
# BOVAG_Guarantee <dbl>, ABS <dbl>, Airbag_1 <dbl>, Airbag_2 <dbl>,
# Airco <dbl>, Automatic_airco <dbl>, Boardcomputer <dbl>, CD_Player <dbl>,
# Central_Lock <dbl>, Powered_Windows <dbl>, Power_Steering <dbl>, …
Notice that the output object car_resale is a tibble data frame.
The code chunk below is used to calibrate a multiple linear regression model by using lm() of the base R stats package.
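A sketch of the model fit. The formula is copied from the Call shown in the output; the file path and the object name `model` are assumptions.

```r
library(readxl)

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

# Multiple linear regression of price on five explanatory variables.
model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)
model
```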
Call:
lm(formula = Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
data = car_resale)
Coefficients:
(Intercept) Age_08_04 Mfg_Year KM
-2.637e+06 -1.409e+01 1.315e+03 -2.323e-02
Weight Guarantee_Period
1.903e+01 2.770e+01
In the code chunk below, check_collinearity() of the [performance](https://easystats.github.io/performance/) package is used to check the model for multicollinearity.
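A sketch of the multicollinearity check, refitting the model so the chunk stands on its own. The file path and the object names `model` and `vif_check` are assumptions; plotting the result relies on the see package.

```r
library(readxl)
library(performance)
library(see)  # supplies the plot() method for performance objects

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)

# Variance inflation factors for each model term.
vif_check <- check_collinearity(model)
vif_check

# Visual summary of the VIF values.
plot(vif_check)
```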
# Check for Multicollinearity
Low Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
KM 1.46 [ 1.37, 1.57] 1.21 0.68 [0.64, 0.73]
Weight 1.41 [ 1.32, 1.51] 1.19 0.71 [0.66, 0.76]
Guarantee_Period 1.04 [ 1.01, 1.17] 1.02 0.97 [0.86, 0.99]
High Correlation
Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
Age_08_04 31.07 [28.08, 34.38] 5.57 0.03 [0.03, 0.04]
Mfg_Year 31.16 [28.16, 34.48] 5.58 0.03 [0.03, 0.04]
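In the code chunk below, check_normality() of the performance package is used to check whether the model residuals are normally distributed.
In the code chunk below, check_heteroscedasticity() of the performance package is used to check whether the residuals have constant variance.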
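A sketch covering both checks. It reuses the five-predictor formula from the model output above; whether the original exercise ran these checks on that model or on a reduced one is not stated, so treat the model choice, the file path and the object names as assumptions.

```r
library(readxl)
library(performance)
library(see)  # supplies the plot() methods

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)

# Test the residuals for normality and plot the result.
normality_check <- check_normality(model)
normality_check
plot(normality_check)

# Test the residuals for non-constant variance and plot the result.
hetero_check <- check_heteroscedasticity(model)
hetero_check
plot(hetero_check)
```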
We can also perform the complete set of checks in one step by using check_model().
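A sketch of the combined check, under the same assumptions about the file path and model as above. Printing the returned object draws a panel of diagnostic plots, which needs the see package.

```r
library(readxl)
library(performance)
library(see)

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)

# Run the full set of visual model checks.
diagnostics <- check_model(model)
diagnostics
```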
In the code below, plot() of the see package and parameters() of the parameters package are used to visualise the parameters of a regression model.
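A sketch of the parameter plot, again assuming the file path and the model from the output above; the object names are hypothetical.

```r
library(readxl)
library(parameters)
library(see)  # supplies the plot() method for parameters objects

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)

# Extract the coefficient table, then plot estimates with their intervals.
params <- parameters(model)
p_params <- plot(params)
p_params
```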
ggcoefstats() method
In the code below, ggcoefstats() of the ggstatsplot package is used to visualise the parameters of a regression model.
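A sketch of the coefficient plot with default settings, assuming the same file path and model as in the output above.

```r
library(readxl)
library(ggstatsplot)

# Assumed location of the workbook.
car_resale <- read_xls("data/ToyotaCorolla.xls", sheet = "data")

model <- lm(
  Price ~ Age_08_04 + Mfg_Year + KM + Weight + Guarantee_Period,
  data = car_resale
)

# Dot-and-whisker plot of the coefficients, labelled with test statistics.
p_coef <- ggcoefstats(model)
p_coef
```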