Reading

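library(dplyr)      # provides %>% and mutate() used below
library(ggformula)  # provides gf_histogram(), gf_boxplot(), gf_labs(), ...
streams <- read.csv("../../data/Art, Design, and Vocation are all diff-different(Sheet1).csv")
print(head(streams))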

  SN Degree Course Year Letter.Grade Score Gender
1  1  B.Des    CAC    2            A   8.0      F
2  2  B.Des    CAC    2            O   9.6      F
3  3  B.Des   IADP    2           A+   9.2      F
4  4  B.Des     CE    2            O   9.8      F
5  5  B.Des   BSSD    2            P   3.0      M
6  6  B.Des    CAC    2            O   9.5      F

## Research Question: Are Art, Design, and Vocation all different?

Munging

# Convert the categorical columns to factors in a single mutate() call
streams_modified <- streams %>%
  mutate(
    Gender = as.factor(Gender),
    Degree = as.factor(Degree),
    Letter.Grade = as.factor(Letter.Grade),
    Course = as.factor(Course)
  )

EDA

gf_histogram(~Score,
  fill = ~Degree, 
  data = streams_modified, 
  alpha = 0.5,  
  bins = 25  
) %>%
  gf_vline(xintercept = ~ mean(Score, na.rm = TRUE),  
            linetype = "dashed", color = "red") %>%
  gf_labs(
    title = "Histogram of scores across degrees",
    x = "Scores", 
    y = "Count"
  ) %>%
  gf_text(
    label = "Overall Mean", 
    x = mean(streams_modified$Score, na.rm = TRUE),  
    y = 2, 
    color = "red"
  ) %>%
  gf_refine(guides(fill = guide_legend(title = "")))

library(ggprism)  # provides the "prism_bracket" axis guide used below

gf_boxplot(
  data = streams_modified,
  Score ~ Degree,
  fill = ~Degree,
  alpha = 0.5
) %>%
  # Score is on the y-axis here, so the overall mean is a horizontal line
  gf_hline(yintercept = ~ mean(Score, na.rm = TRUE)) %>%
  gf_labs(
    title = "Boxplots of scores",
    x = "Degree",
    y = "Scores"
  ) %>%
  gf_refine(
    scale_x_discrete(guide = "prism_bracket"),
    guides(fill = guide_legend(title = ""))
  )

Warning: The S3 guide system was deprecated in ggplot2 3.5.0.
ℹ It has been replaced by a ggproto system that can be extended.

- The median score is highest for B.Des and lowest for B.Voc; B.FA lies between the two.
- B.Voc has outliers.
- The B.Voc distribution is skewed left. B.Des has a high concentration of scores at the upper end, which means its tail also runs toward the low scores (left skew, not right skew).
- The B.FA distribution seems to be evenly spread out.
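
The visual reading of medians and skew can be backed up with numbers. The following is a minimal base R sketch, assuming a data frame with `Degree` and `Score` columns like `streams_modified`. The `toy` data and the `skewness()` helper are hypothetical stand-ins made up for illustration; they are not the course scores and not part of any package used above.

```r
# Hypothetical stand-in data: NOT the scores from the CSV.
set.seed(42)
toy <- data.frame(
  Degree = factor(rep(c("B.Des", "B.FA", "B.Voc"), each = 30)),
  Score = round(pmin(10, pmax(0, c(
    rnorm(30, mean = 8.5, sd = 1.0),
    rnorm(30, mean = 7.7, sd = 1.0),
    rnorm(30, mean = 8.3, sd = 1.2)
  ))), 1)
)

# Sample skewness (illustrative helper): a negative value means a longer
# tail toward low scores (left skew), a positive value means right skew.
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / (mean((x - mean(x))^2))^1.5
}

# One row per degree with n, median, mean, and skewness of Score.
group_summary <- do.call(rbind, lapply(split(toy$Score, toy$Degree), function(s) {
  data.frame(n = length(s), median = median(s), mean = mean(s), skew = skewness(s))
}))
print(group_summary)
```

Replacing `toy` with `streams_modified` would give the same table for the real scores. Comparing the `median` and `mean` columns also shows whether outliers pull a group's mean away from its median.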

Anova

streams_anova <- aov(Score ~ Degree, data = streams_modified)

supernova::pairwise(streams_anova,
  correction = "Bonferroni", # Try "Tukey"
  alpha = 0.05, # 95% CI calculation
  var_equal = TRUE, # We'll see
  plot = TRUE
)

── Pairwise t-tests with Bonferroni correction ─────────────────────────────────

Model: Score ~ Degree

Degree

Levels: 3

Family-wise error-rate: 0.049


  group_1 group_2   diff pooled_se      t    df  lower  upper  p_adj
  <chr>   <chr>    <dbl>     <dbl>  <dbl> <int>  <dbl>  <dbl>  <dbl>
1 B.FA    B.Des   -0.757     0.201 -3.770    87 -1.191 -0.323  .0009
2 B.Voc   B.Des   -0.173     0.201 -0.864    87 -0.607  0.261 1.0000
3 B.Voc   B.FA     0.583     0.201  2.906    87  0.149  1.017  .0139
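
To see where the `p_adj` column comes from, the Bonferroni adjustment can be redone by hand from the `t` and `df` columns. This is a sketch under the assumption that the adjusted p-values are two-sided t-test p-values multiplied by the number of comparisons and capped at 1, and that the reported family-wise error rate is 1 - (1 - alpha/3)^3. It uses base R only and does not call `supernova`, so it illustrates the idea rather than the package's internals.

```r
# Values copied from the pairwise table above.
comparison <- c("B.FA - B.Des", "B.Voc - B.Des", "B.Voc - B.FA")
t_stat <- c(-3.770, -0.864, 2.906)
df_resid <- 87

# Unadjusted two-sided p-values from the t distribution.
p_raw <- 2 * pt(-abs(t_stat), df = df_resid)

# Bonferroni: multiply by the number of comparisons (3) and cap at 1.
p_adj <- p.adjust(p_raw, method = "bonferroni")

# Assumed formula for the family-wise error rate with per-test alpha = 0.05 / 3.
fwer <- 1 - (1 - 0.05 / 3)^3

print(data.frame(comparison, t = t_stat,
                 p_raw = signif(p_raw, 3), p_adj = signif(p_adj, 3)))
print(round(fwer, 3))
```

The recomputed values should land close to the `p_adj` column, up to rounding of the printed `t` values. Only comparisons with an adjusted p-value below 0.05 count as significant.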

Inferences

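There is a significant difference between the mean scores of B.FA and B.Des (p_adj = .0009), and between B.Voc and B.FA (p_adj = .0139).
The difference between B.Voc and B.Des is not significant (p_adj = 1.0000; the confidence interval from -0.607 to 0.261 includes 0).

Mean scores: B.Des > B.Voc > B.FA, where B.Des and B.Voc cannot be distinguished. So the three degrees are not all different: B.FA differs from the other two.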
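
The comments in the ANOVA call suggest trying Tukey and revisiting the equal-variance setting. A minimal base R sketch of both checks follows, run on simulated data. The `toy` data frame is a hypothetical stand-in, not the course scores, and `TukeyHSD()` and `bartlett.test()` are used here as base R alternatives to the `supernova` call, not as the method used above.

```r
# Hypothetical stand-in data: NOT the scores from the CSV.
set.seed(1)
toy <- data.frame(
  Degree = factor(rep(c("B.Des", "B.FA", "B.Voc"), each = 30)),
  Score = c(rnorm(30, mean = 8.5, sd = 1),
            rnorm(30, mean = 7.7, sd = 1),
            rnorm(30, mean = 8.3, sd = 1))
)

# Omnibus F-test: is there any difference among the group means?
toy_anova <- aov(Score ~ Degree, data = toy)
print(summary(toy_anova))

# Tukey's HSD: all pairwise differences with family-wise 95% intervals.
tukey <- TukeyHSD(toy_anova, conf.level = 0.95)
print(tukey)

# Bartlett's test of equal variances across groups; a small p-value
# would cast doubt on the equal-variance (pooled SE) assumption.
var_check <- bartlett.test(Score ~ Degree, data = toy)
print(var_check)
```

Swapping `toy` for `streams_modified` would run the same checks on the real data. Bartlett's test is sensitive to non-normality, so with skewed scores its result is only a rough guide.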
