R Markdown

Text Data Analysis - Michael Scott Version

    library(tidyverse)
    library(tidytext)
    library(textdata)
    library(pdftools)
    library(ggwordcloud)
    library(here)
    library(pander)
    library(ggthemes)
    here::i_am("dinnerparty.txt")

The Office - Season 4 x Episode 9: The Dinner Party

Read in the text transcript

Source: https://www.officequotes.net/no4-09.php

    dinner_text <- read_delim(here("dinnerparty.txt"), "\t",
                              escape_double = FALSE, col_names = FALSE,
                              trim_ws = TRUE) %>%
      rename(lines = X1)

Each row is one line of the transcript, spoken by a single character.

Example: The first line of the episode is from Stanley Hudson:

    dinner_line1 <- dinner_text[1,]
    dinner_line1 %>% pander()

Output:

    lines
    Stanley: This is ridiculous.

Data Wrangling in R

A series of R functions that could be useful when preparing your dataset for analysis.

Binary Logistic Regression

This R Markdown document follows the logistic regression lecture presented in a linear models class and provides two examples to demonstrate the models and their interpretation.

Machine Learning Course Project

A final machine learning project covering visualization, exploratory analysis, and classification techniques in R.

Lab 5: Binary Logistic Regression

    library(pander)       # pander()
    library(psych)        # describe()
    library(gtsummary)    # tbl_summary()
    library(equatiomatic) # extract_eq()
    library(sjPlot)       # tab_xtab(), tab_model()
    library(tidyverse)

This lab follows the logistic regression lecture presented in class and provides two examples to demonstrate the models and their interpretation.

What is Binary Logistic Regression?

It is a regression with an outcome variable (or dependent variable) that is dichotomous/binary (i.e., only two categories, such as Yes or No, 0 or 1, Disorder or No Disorder, Win or Lose).
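As a minimal sketch of what fitting such a model can look like, the block below uses base R's `glm()` with a binomial family. The built-in `mtcars` data and the choice of `am` (a 0/1 variable) as the outcome with `wt` as the predictor are assumptions made for illustration; they are not the lab's own examples.

```r
# Illustration only: mtcars ships with R; `am` is coded 0/1
fit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Coefficients are on the log-odds scale; exponentiate them to get odds ratios
odds_ratios <- exp(coef(fit))

# type = "response" returns predicted probabilities rather than log-odds
pred_prob <- predict(fit, type = "response")

summary(fit)
```

Because the outcome is binary, the model describes the probability of one category, so every value in `pred_prob` lies between 0 and 1.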

Lab 4: Mediation

    library(haven)      # read_sav()
    library(mediation)  # mediate() (Tingley, Yamamoto, Hirose, Keele, & Imai, 2014)
    library(gvlma)      # gvlma()
    library(kableExtra) # kable()
    library(corrr)      # correlate()
    library(psych)      # mediate()
    library(tidyverse)

Mediation in R

This example comes from a tutorial paper on mediation found here. It is an open-access paper that shares the data set used in this example. The following code and interpretation are based on this paper. Note that the authors use a different package to estimate the model, so our estimates are very close to theirs, though not exact.

Lab 3: Moderation

    library(knitr)        # include_graphics()
    library(equatiomatic) # extract_eq()
    library(psych)        # describe()
    library(gtsummary)    # tbl_summary()
    library(summarytools) # descr()
    library(stargazer)    # stargazer()
    library(sjPlot)       # tab_model()
    library(interactions) # interact_plot(), sim_slopes()
    library(jtools)       # summ()
    library(tidyverse)

Moderation

When the research hypotheses state that different categories, or levels of another variable, may have differing responses to other independent variables, we need to use interaction terms.

  • Also called moderation

Example: The relationship between discrimination and grades depends on prog.

Graph drawn using draw.io.

Moderation Example

Suppose you are doing a simple study on weight loss and notice that people who spend more time exercising lose more weight.
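A minimal sketch of how an interaction term enters a model in R, assuming the built-in `mtcars` data as a stand-in (the variables `mpg`, `wt`, and `hp` are illustrative and are not the lab's weight-loss or discrimination data):

```r
# Illustration only: mtcars ships with R
# wt * hp expands to wt + hp + wt:hp, so the interaction term is included
fit_mod <- lm(mpg ~ wt * hp, data = mtcars)

# The wt:hp coefficient describes how the slope of wt changes with hp
coef(fit_mod)
```

If the `wt:hp` coefficient differs from zero, the relationship between `wt` and `mpg` depends on the level of `hp`, which is what moderation means.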

Lab 2: Simple Linear Regression

    library(psych)                # describe()
    library(PerformanceAnalytics) # chart.Correlation()
    library(lm.beta)              # lm.beta()
    library(sjPlot)               # tab_model()
    library(gridExtra)            # grid.arrange()
    library(tidyverse)

This lab is an overview of simple linear regression and was created alongside Karen’s lectures and her code.

Simple Linear Regression

The Equation

The equation of simple linear regression is this:

$$Y_i = \alpha + \beta x_i + \epsilon_i$$

Expressed as Betas:

$$Y_i = \beta_0 + \beta x_i + \epsilon_i$$

When we are predicting Y, we express the prediction equation as this:
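One common way to write it, assuming hats mark estimated quantities and the slope keeps the unsubscripted $\beta$ used above (the lab's own notation is not shown in this excerpt):

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta} x_i$$

The error term $\epsilon_i$ drops out of the prediction because its expected value is zero.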

Lab 1: Data Screening and Cleaning

    # Load packages here
    library(haven)     # for read_sav() function
    library(psych)     # describe()
    library(ggpubr)    # ggdensity() and ggqqplot()
    library(apaTables) # apa.cor.table()
    library(tidyverse)

Data Screening & Cleaning

The data sets in this lab come from Chapter 4 of Tabachnick & Fidell (2012; I believe there is a new version from 2019 now). The chapter is provided on GauchoSpace and is a useful resource if you need more information on how to clean and screen your data and write up the results.