Title: | A Toolkit for Behavioral Scientists |
---|---|
Description: | A collection of functions for analyzing data typically collected or used by behavioral scientists. Examples of the functions include a function that compares groups in a factorial experimental design, a function that conducts two-way analysis of variance (ANOVA), and a function that cleans a data set generated by Qualtrics surveys. Some of the functions will require installing additional package(s). Such packages and other references are cited within the section describing the relevant functions. Many functions in this package rely heavily on these two popular R packages: Dowle et al. (2021) <https://CRAN.R-project.org/package=data.table>. Wickham et al. (2021) <https://CRAN.R-project.org/package=ggplot2>. |
Authors: | Jin Kim [aut, cre] |
Maintainer: | Jin Kim <[email protected]> |
License: | GPL-3 |
Version: | 0.5.432 |
Built: | 2024-11-08 11:14:15 UTC |
Source: | https://github.com/jinkim3/kim |
Compare adequacy of different models by calculating their Akaike weights and the associated evidence ratio.
akaike_weights(aic_values = NULL, print_output_explanation = TRUE)
aic_values |
a vector of AIC values |
print_output_explanation |
logical. Should an explanation about how to read the output be printed? (default = TRUE). |
Please refer to Wagenmakers & Farrell (2004), doi:10.3758/BF03206482
the output will be a data.table showing AIC weights, their evidence ratio(s), etc.
# default reference AIC value is the minimum AIC value, e.g., 202 below
akaike_weights(c(204, 202, 206, 206, 214))
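The arithmetic behind the output can be reproduced in base R. Below is a minimal sketch of the Akaike weight computation (following Wagenmakers & Farrell, 2004), assuming the minimum AIC as the reference value; the actual akaike_weights() output is a data.table with more detail.

```r
aic <- c(204, 202, 206, 206, 214)
delta <- aic - min(aic)             # difference from the best (minimum) AIC
rel_likelihood <- exp(-delta / 2)   # relative likelihood of each model
akaike_weight <- rel_likelihood / sum(rel_likelihood)
# evidence ratio: how much more likely the best model is than each model
evidence_ratio <- max(akaike_weight) / akaike_weight
```

The second model (AIC = 202) receives the largest weight (about 0.61), and the evidence ratio for the best model against the first model is exp(1), roughly 2.72.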
Take a function and assign all the parameters defined within it as values in the specified environment (e.g., global environment)
assign_fn_parameters_as_vars(fun = NULL, envir = NULL)
fun |
a function |
envir |
an environment in which to assign the parameters as values (default = the global environment) |
This function can be useful when you are testing a function and you need to set all the function's parameters in a single operation.
## Not run:
assign_fn_parameters_as_vars(pm)
assign_fn_parameters_as_vars(mean)
assign_fn_parameters_as_vars(sum)
assign_fn_parameters_as_vars(lm)
assign_fn_parameters_as_vars(floodlight_2_by_continuous)
## End(Not run)
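The mechanism presumably relies on formals(). A minimal base-R sketch of the idea (an assumption about the approach, not kim's actual implementation; the helper name assign_fn_defaults is hypothetical):

```r
# read a function's formal arguments and assign each default into an environment
assign_fn_defaults <- function(fun, envir = globalenv()) {
  params <- formals(fun)
  for (p in names(params)) {
    default <- params[[p]]
    # skip `...` and arguments that have no default (the empty symbol)
    if (p != "..." && !identical(default, quote(expr = ))) {
      # evaluate in `envir` so defaults that reference earlier
      # parameters (e.g., c = b + 1) resolve correctly
      assign(p, eval(default, envir = envir), envir = envir)
    }
  }
  invisible(names(params))
}
```

This makes it easy to step through a function's body interactively with all defaulted parameters already in scope.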
Barplot for counts
barplot_for_counts(data = NULL, x, y)
data |
a data object (a data frame or a data.table) |
x |
name of the variable that will be on the x axis of the barplot |
y |
name of the variable that will be on the y axis of the barplot |
barplot_for_counts(x = 1:3, y = 7:9)
barplot_for_counts(data = data.frame(
  cyl = names(table(mtcars$cyl)),
  count = as.vector(table(mtcars$cyl))),
  x = "cyl", y = "count")
Conduct a binomial test. In other words, test whether an observed proportion of "successes" (e.g., proportion of heads in a series of coin tosses) is greater than the expected proportion (e.g., 0.5). This function uses the 'binom.test' function from the 'stats' package.
binomial_test(
  x = NULL,
  success = NULL,
  failure = NULL,
  p = 0.5,
  alternative = "two.sided",
  ci = 0.95,
  round_percentages = 0
)
x |
a vector of values, each of which represents an instance of either a "success" or "failure" (e.g., c("s", "f", "s", "s", "f", "s")) |
success |
which value(s) indicate "successes"? |
failure |
(optional) which value(s) indicate "failures"? If no input is provided for this argument, then all the non-NA values that are not declared to be "successes" will be treated as "failures". |
p |
hypothesized probability of success (default = 0.5) |
alternative |
indicates the alternative hypothesis and must be one of "two.sided", "greater", or "less". You can specify just the initial letter (default = "two.sided"). |
ci |
width of the confidence interval (default = 0.95) |
round_percentages |
number of decimal places to which to round the percentages in the summary table (default = 0) |
# sample vector
sample_vector <- c(0, 1, 1, 0, 1, 98, 98, 99, NA)
binomial_test(x = sample_vector, success = 1, failure = 0)
binomial_test(x = sample_vector, success = 1, failure = 0,
  p = 0.1, alternative = "greater")
binomial_test(x = sample_vector, success = c(1, 99), failure = c(0, 98),
  p = 0.6, alternative = "less")
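Under the hood, the observed values are tallied into success/failure counts and passed to stats::binom.test. A base-R sketch of that reduction for the first example (the exact internals are an assumption, but the counts and test follow directly from the documented arguments):

```r
sample_vector <- c(0, 1, 1, 0, 1, 98, 98, 99, NA)
successes <- sum(sample_vector %in% 1)  # values declared "successes"
failures <- sum(sample_vector %in% 0)   # values declared "failures"
result <- stats::binom.test(
  x = successes, n = successes + failures,
  p = 0.5, alternative = "two.sided", conf.level = 0.95)
```

Here there are 3 successes out of 5 usable observations; the values 98, 99, and NA are ignored because both success and failure values were declared explicitly.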
Draw a square bracket with a label on a ggplot
bracket(
  xmin = NULL,
  xmax = NULL,
  ymin = NULL,
  ymax = NULL,
  vertical = NULL,
  horizontal = NULL,
  open = NULL,
  bracket_shape = NULL,
  thickness = 2,
  bracket_color = "black",
  label = NULL,
  label_hjust = NULL,
  label_vjust = NULL,
  label_font_size = 5,
  label_font_face = "bold",
  label_color = "black",
  label_parse = FALSE
)
xmin |
xmin |
xmax |
xmax |
ymin |
ymin |
ymax |
ymax |
vertical |
vertical |
horizontal |
horizontal |
open |
open |
bracket_shape |
bracket_shape |
thickness |
thickness |
bracket_color |
bracket_color |
label |
label |
label_hjust |
label_hjust |
label_vjust |
label_vjust |
label_font_size |
label_font_size |
label_font_face |
label_font_face |
label_color |
label_color |
label_parse |
label_parse |
a ggplot object; the output has no meaningful use on its own. Instead, this function should be added to another ggplot object
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  bracket(6.1, 6.2, 17, 22, bracket_shape = "]", label = "abc")
Capitalizes the first letter (by default) or a specified substring of a character string, or of each element of a character vector
capitalize(x, start = 1, end = 1)
x |
a character string or a character vector |
start |
starting position of the substring (default = 1) |
end |
ending position of the substring (default = 1) |
a character string or a character vector
capitalize("abc")
capitalize(c("abc", "xyx"), start = 2, end = 3)
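The behavior can be reproduced with base R's vectorized substr replacement. A minimal sketch, assuming simple in-place uppercasing (capitalize_sketch is a hypothetical name, not the package function):

```r
capitalize_sketch <- function(x, start = 1, end = 1) {
  # uppercase the substring from `start` to `end` in each element of x
  substr(x, start, end) <- toupper(substr(x, start, end))
  x
}
capitalize_sketch("abc")                                # "Abc"
capitalize_sketch(c("abc", "xyx"), start = 2, end = 3)  # "aBC" "xYX"
```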
Change variable names in a data set
change_var_names(
  data = NULL,
  old_var_names = NULL,
  new_var_names = NULL,
  skip_absent = FALSE,
  print_summary = TRUE,
  output_type = "dt"
)
data |
a data object (a data frame or a data.table) |
old_var_names |
a vector of old variable names (i.e., variable names to change) |
new_var_names |
a vector of new variable names |
skip_absent |
If |
print_summary |
If |
output_type |
type of the output. If |
a data.table object with changed variable names
change_var_names(
  mtcars, old = c("mpg", "cyl"), new = c("mpg_new", "cyl_new"))
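In base R, the same renaming can be done by matching the old names against names(data); the package likely delegates to data.table::setnames, which also provides skip_absent behavior. A hypothetical base-R sketch (rename_cols is not a kim function):

```r
rename_cols <- function(data, old_var_names, new_var_names,
                        skip_absent = FALSE) {
  idx <- match(old_var_names, names(data))
  if (!skip_absent && anyNA(idx)) {
    stop("Some old variable names are absent: ",
         paste(old_var_names[is.na(idx)], collapse = ", "))
  }
  # rename only the columns that were found
  names(data)[idx[!is.na(idx)]] <- new_var_names[!is.na(idx)]
  data
}
d <- rename_cols(mtcars, c("mpg", "cyl"), c("mpg_new", "cyl_new"))
```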
Check modes of objects
check_modes(..., mode_to_confirm = NULL)
... |
R objects. |
mode_to_confirm |
The function will test whether each input is
of this mode. For example, |
check_modes(1L, mode_to_confirm = "numeric")
check_modes(
  TRUE, FALSE, 1L, 1:3, 1.1, c(1.2, 1.3), "abc", 1 + 2i, intToBits(1L),
  mode_to_confirm = "numeric")
Check whether required packages are installed.
check_req_pkg(pkg = NULL)
pkg |
a character vector containing names of packages to check |
there will be no output from this function. Rather, the function will check whether the packages given as inputs are installed.
check_req_pkg("data.table")
check_req_pkg(c("base", "utils", "ggplot2", "data.table"))
Conduct a chi-squared test and produce a contingency table
chi_squared_test(
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  round_chi_sq_test_stat = 2,
  round_p = 3,
  sigfigs_proportion = 2,
  correct = TRUE,
  odds_ratio_ci = 0.95,
  round_odds_ratio_ci_limits = 2,
  invert = FALSE,
  notify_na_count = NULL
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable (must be a binary variable) |
round_chi_sq_test_stat |
number of decimal places to which to round the chi-squared test statistic (default = 2) |
round_p |
number of decimal places to which to round the p-value from the chi-squared test (default = 3) |
sigfigs_proportion |
number of significant digits to round to (for the table of proportions) (default = 2) |
correct |
logical. Should continuity correction be applied? (default = TRUE) |
odds_ratio_ci |
width of the confidence interval for the odds ratio. Input can be any value less than 1 and greater than or equal to 0 (default = 0.95). |
round_odds_ratio_ci_limits |
number of decimal places to which to round the limits of the odds ratio's confidence interval (default = 2) |
invert |
logical. Whether the inverse of the odds ratio (i.e., 1 / odds ratio) should be returned. |
notify_na_count |
if |
chi_squared_test(data = mtcars, iv_name = "cyl", dv_name = "am")
# if the iv has only two levels, odds ratio will also be calculated
chi_squared_test(data = mtcars, iv_name = "vs", dv_name = "am")
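The core computation maps onto base R's stats::chisq.test applied to a contingency table; for a 2 x 2 table, the odds ratio is the standard cross-product ratio. A sketch under those assumptions (not necessarily the exact code path kim uses):

```r
# chi-squared test for cyl (3 levels) x am (binary);
# small expected counts trigger an approximation warning here
tbl <- table(mtcars$cyl, mtcars$am)
res <- suppressWarnings(stats::chisq.test(tbl))
# note: the continuity correction applies only to 2 x 2 tables

# odds ratio for the 2 x 2 case (vs x am): cross-product ratio
tbl2 <- table(mtcars$vs, mtcars$am)
odds_ratio <- (tbl2[1, 1] * tbl2[2, 2]) / (tbl2[1, 2] * tbl2[2, 1])
```

For mtcars, the vs x am odds ratio works out to exactly 2.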
Conducts a chi-squared test for every possible pairwise comparison with Bonferroni correction
chi_squared_test_pairwise(
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  focal_dv_value = NULL,
  contingency_table = TRUE,
  contingency_table_sigfigs = 2,
  percent_and_total = FALSE,
  percentages_only = NULL,
  counts_only = NULL,
  sigfigs = 3,
  chi_sq_test_stats = FALSE,
  correct = TRUE
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (must be a categorical variable) |
dv_name |
name of the dependent variable (must be a binary variable) |
focal_dv_value |
focal value of the dependent variable whose frequencies will be calculated (i.e., the value of the dependent variable that will be considered a "success" or a result of interest) |
contingency_table |
If |
contingency_table_sigfigs |
number of significant digits that the contingency table's percentage values should be rounded to (default = 2) |
percent_and_total |
logical. If |
percentages_only |
tabulate percentages of the focal DV value only |
counts_only |
tabulate counts of the focal DV value only |
sigfigs |
number of significant digits to round to |
chi_sq_test_stats |
if |
correct |
logical. Should continuity correction be applied? (default = TRUE) |
chi_squared_test_pairwise(data = mtcars, iv_name = "vs", dv_name = "am")
chi_squared_test_pairwise(data = mtcars, iv_name = "vs", dv_name = "am",
  percentages_only = TRUE)
# using 3 mtcars data sets combined
chi_squared_test_pairwise(
  data = rbind(mtcars, rbind(mtcars, mtcars)),
  iv_name = "cyl", dv_name = "am")
# include the total counts
chi_squared_test_pairwise(
  data = rbind(mtcars, rbind(mtcars, mtcars)),
  iv_name = "cyl", dv_name = "am", percent_and_total = TRUE)
# display counts
chi_squared_test_pairwise(
  data = rbind(mtcars, rbind(mtcars, mtcars)),
  iv_name = "cyl", dv_name = "am", contingency_table = "counts")
Returns the confidence interval of the mean of a numeric vector.
ci_of_mean(x = NULL, confidence_level = 0.95, notify_na_count = NULL)
x |
a numeric vector |
confidence_level |
What is the desired confidence level expressed as a decimal? (default = 0.95) |
notify_na_count |
if |
the output will be a named numeric vector with the lower and upper limit of the confidence interval.
ci_of_mean(x = 1:100, confidence_level = 0.95)
ci_of_mean(mtcars$mpg)
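The interval is the usual t-based confidence interval: mean plus or minus t(1 - alpha/2, n - 1) times the standard error. A base-R sketch that matches the interval stats::t.test reports (ci_of_mean_sketch is a hypothetical name):

```r
ci_of_mean_sketch <- function(x, confidence_level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  se <- sd(x) / sqrt(n)  # standard error of the mean
  margin <- qt(1 - (1 - confidence_level) / 2, df = n - 1) * se
  c(ci_ll = mean(x) - margin, ci_ul = mean(x) + margin)
}
ci_of_mean_sketch(1:100)   # roughly (44.74, 56.26)
```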
Clean a data set downloaded from Qualtrics
clean_data_from_qualtrics(
  data = NULL,
  remove_survey_preview_data = TRUE,
  remove_test_response_data = TRUE,
  default_cols_by_qualtrics = NULL,
  default_cols_by_qualtrics_new = NULL,
  warn_accuracy_loss = FALSE,
  click_data_cols = "rm",
  page_submit_cols = "move_to_right"
)
data |
a data object (a data frame or a data.table) |
remove_survey_preview_data |
logical. Whether to remove data from survey preview (default = TRUE) |
remove_test_response_data |
logical. Whether to remove data from test response (default = TRUE) |
default_cols_by_qualtrics |
names of columns that Qualtrics
includes in the data set by default (e.g., "StartDate", "Finished").
Accepting the default value |
default_cols_by_qualtrics_new |
new names for columns that
Qualtrics includes in the data set by default
(e.g., "StartDate", "Finished").
Accepting the default value |
warn_accuracy_loss |
logical. whether to warn the user if converting character to numeric leads to loss of accuracy. (default = FALSE) |
click_data_cols |
if |
page_submit_cols |
if |
a data.table object
clean_data_from_qualtrics(mtcars)
clean_data_from_qualtrics(mtcars,
  default_cols_by_qualtrics = "mpg",
  default_cols_by_qualtrics_new = "mpg2")
Calculates the (population or sample) coefficient of variation of a given numeric vector
coefficent_of_variation(vector, pop_or_sample = "pop")
vector |
a numeric vector |
pop_or_sample |
should coefficient of variation be calculated for a "population" or a "sample"? |
a numeric value
coefficent_of_variation(1:4, pop_or_sample = "sample")
coefficent_of_variation(1:4, pop_or_sample = "pop")
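The coefficient of variation is simply the standard deviation divided by the mean; the population version divides the sum of squares by n rather than n - 1. A base-R sketch (cv_sketch is a hypothetical name):

```r
cv_sketch <- function(x, pop_or_sample = "pop") {
  s <- if (pop_or_sample == "pop") {
    sqrt(sum((x - mean(x))^2) / length(x))  # population SD (divide by n)
  } else {
    sd(x)                                   # sample SD (divide by n - 1)
  }
  s / mean(x)
}
cv_sketch(1:4, "pop")      # about 0.447
cv_sketch(1:4, "sample")   # about 0.516
```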
To run this function, the following package(s) must be installed: Package 'psych' v2.1.9 (or possibly a higher version) by William Revelle (2021), https://cran.r-project.org/package=psych
cohen_d(
  sample_1 = NULL,
  sample_2 = NULL,
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  ci_range = 0.95,
  output_type = "all"
)
sample_1 |
a vector of values in the first of two samples |
sample_2 |
a vector of values in the second of two samples |
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
ci_range |
range of the confidence interval for Cohen's d (default = 0.95) |
output_type |
If |
## Not run:
cohen_d(sample_1 = 1:10, sample_2 = 3:12)
cohen_d(data = mtcars, iv_name = "vs", dv_name = "mpg", ci_range = 0.99)
sample_dt <- data.table::data.table(iris)[Species != "setosa"]
cohen_d(data = sample_dt, iv_name = "Species", dv_name = "Petal.Width")
## End(Not run)
Calculates Cohen's d, its standard error, and confidence interval, as illustrated in Borenstein et al. (2009, ISBN: 978-0-470-05724-7).
cohen_d_borenstein(
  sample_1 = NULL,
  sample_2 = NULL,
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  direction = "2_minus_1",
  ci_range = 0.95,
  output_type = "all",
  initial_value = 0
)
sample_1 |
a vector of values in the first of two samples |
sample_2 |
a vector of values in the second of two samples |
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
direction |
If |
ci_range |
range of the confidence interval for Cohen's d (default = 0.95) |
output_type |
If |
initial_value |
initial value of the noncentrality parameter for optimization (default = 0). Adjust this value if confidence interval results look strange. |
cohen_d_borenstein(sample_1 = 1:10, sample_2 = 3:12)
cohen_d_borenstein(
  data = mtcars, iv_name = "vs", dv_name = "mpg", ci_range = 0.99)
sample_dt <- data.table::data.table(iris)[Species != "setosa"]
cohen_d_borenstein(
  data = sample_dt, iv_name = "Species", dv_name = "Petal.Width",
  initial_value = 10)
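The point estimate and its variance follow standard formulas from Borenstein et al. (2009): d is the mean difference divided by the pooled SD, and V_d = (n1 + n2)/(n1 n2) + d^2 / (2(n1 + n2)). A base-R sketch using the default "2_minus_1" direction (the confidence interval itself involves a noncentrality-parameter optimization in the package and is not reproduced here; cohen_d_sketch is a hypothetical name):

```r
cohen_d_sketch <- function(sample_1, sample_2) {
  n1 <- length(sample_1); n2 <- length(sample_2)
  # pooled SD across the two samples
  pooled_sd <- sqrt(((n1 - 1) * var(sample_1) + (n2 - 1) * var(sample_2)) /
                      (n1 + n2 - 2))
  d <- (mean(sample_2) - mean(sample_1)) / pooled_sd  # direction "2_minus_1"
  d_var <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))
  c(d = d, se = sqrt(d_var))
}
cohen_d_sketch(1:10, 3:12)   # d is about 0.66
```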
To run this function, the following package(s) must be installed: Package 'psych' v2.1.9 (or possibly a higher version) by William Revelle (2021), https://cran.r-project.org/package=psych
cohen_d_for_one_sample(x = NULL, mu = NULL)
x |
a numeric vector containing values whose mean will be calculated |
mu |
the true mean |
cohen_d_for_one_sample(x = 1:10, mu = 3)
cohen_d_for_one_sample(x = c(1:10, NA, NA), mu = 3)
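For the one-sample case, Cohen's d is the distance between the sample mean and the true mean in standard-deviation units, with NA values dropped first. A base-R sketch (the function name is hypothetical):

```r
cohen_d_one_sample_sketch <- function(x, mu) {
  x <- x[!is.na(x)]                # drop missing values before computing
  (mean(x) - mu) / sd(x)
}
cohen_d_one_sample_sketch(x = 1:10, mu = 3)             # about 0.826
cohen_d_one_sample_sketch(x = c(1:10, NA, NA), mu = 3)  # same: NAs dropped
```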
Calculates Cohen's d as described in Jacob Cohen's (1988) textbook, Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. doi:10.4324/9780203771587
cohen_d_from_cohen_textbook(
  sample_1 = NULL,
  sample_2 = NULL,
  data = NULL,
  iv_name = NULL,
  dv_name = NULL
)
sample_1 |
a vector of values in the first of two samples |
sample_2 |
a vector of values in the second of two samples |
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
the output will be a Cohen's d value (a numeric vector of length one)
cohen_d_from_cohen_textbook(1:10, 3:12)
cohen_d_from_cohen_textbook(
  data = mtcars, iv_name = "vs", dv_name = "mpg")
Plot Cohen's d as sample size increases.
cohen_d_over_n(
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  save_as_png = FALSE,
  png_name = NULL,
  xlab = NULL,
  ylab = NULL,
  width = 16,
  height = 9
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (measure variable of interest) |
save_as_png |
if |
png_name |
name of the PNG file to be saved. By default, the name will be "cohen_d_over_n_" followed by a timestamp of the current time. The timestamp will be in the format, jan_01_2021_1300_10_000001, where "jan_01_2021" would indicate January 01, 2021; 1300 would indicate 13:00 (i.e., 1 PM); and 10_000001 would indicate 10.000001 seconds after the hour. |
xlab |
title of the x-axis for the histogram by group.
If |
ylab |
title of the y-axis for the histogram by group.
If |
width |
width of the plot to be saved. This argument will be
directly entered as the |
height |
height of the plot to be saved. This argument will be
directly entered as the |
the output will be a list of (1) ggplot object (histogram by group) and (2) a data.table with Cohen's d by sample size
## Not run:
cohen_d_over_n(data = mtcars, iv_name = "am", dv_name = "mpg")
## End(Not run)
Convert d (standardized mean difference or Cohen's d) to r (correlation), as illustrated in Borenstein et al. (2009, p. 48, ISBN: 978-0-470-05724-7)
cohen_d_to_r(d = NULL, n1 = NULL, n2 = NULL, d_var = NULL)
cohen_d_to_r(d = NULL, n1 = NULL, n2 = NULL, d_var = NULL)
d |
Cohen's d (the input can be a vector of values) |
n1 |
sample size in the first of the two groups (the input can be a vector of values) |
n2 |
sample size in the second of the two groups (the input can be a vector of values) |
d_var |
(optional argument) variance of d (the input can be a vector of values). If this argument receives an input, variance of r will be returned as well. |
the output will be a vector of correlation values (and variances of r if the argument d_var received an input)
## Not run:
cohen_d_to_r(1)
cohen_d_to_r(d = 1:3)
cohen_d_to_r(d = 1:3, n1 = c(100, 200, 300), n2 = c(50, 250, 900))
cohen_d_to_r(1.1547)
cohen_d_to_r(d = 1.1547, d_var = .0550)
cohen_d_to_r(d = 1:2, d_var = 1:2)
## End(Not run)
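The conversion in Borenstein et al. (2009, p. 48) is r = d / sqrt(d^2 + a), where a = (n1 + n2)^2 / (n1 n2); when the group sizes are unknown (or equal), a = 4. A base-R sketch of the point-estimate conversion (d_to_r_sketch is a hypothetical name; the variance conversion is omitted here):

```r
d_to_r_sketch <- function(d, n1 = NULL, n2 = NULL) {
  # correction factor a; defaults to 4 when group sizes are not given
  a <- if (is.null(n1) || is.null(n2)) 4 else (n1 + n2)^2 / (n1 * n2)
  d / sqrt(d^2 + a)
}
d_to_r_sketch(1.1547)   # 0.5 (with a = 4)
d_to_r_sketch(1)        # about 0.447
```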
To run this function, the following package(s) must be installed: Package 'effsize' v0.8.1 (or possibly a higher version) by Marco Torchiano (2020), https://cran.r-project.org/package=effsize
cohen_d_torchiano(
  sample_1 = NULL,
  sample_2 = NULL,
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  ci_range = 0.95
)
sample_1 |
a vector of values in the first of two samples |
sample_2 |
a vector of values in the second of two samples |
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
ci_range |
range of the confidence interval for Cohen's d (default = 0.95) |
cohen_d_torchiano(1:10, 3:12)
cohen_d_torchiano(
  data = mtcars, iv_name = "vs", dv_name = "mpg", ci_range = 0.99)
Combine data across columns. If NA is the only value across all focal columns for given row(s), NA will be returned for those row(s).
combine_data_across_cols(data = NULL, cols = NULL)
data |
a data object (a data frame or a data.table) |
cols |
a character vector containing names of columns, across which to combine data |
the output will be a numeric or character vector.
dt <- data.frame(v1 = c(1, NA), v2 = c(NA, 2))
dt
combine_data_across_cols(data = dt, cols = c("v1", "v2"))
dt <- data.frame(v1 = c(1, 2, NA), v2 = c(NA, 4, 3))
dt
combine_data_across_cols(data = dt, cols = c("v1", "v2"))
dt <- data.frame(v1 = c(1, NA, NA), v2 = c(NA, 2, NA))
dt
combine_data_across_cols(data = dt, cols = c("v1", "v2"))
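The row-wise combination behaves like a coalesce: take the first non-NA value across the focal columns. A base-R sketch (coalesce_cols is a hypothetical name; this sketch silently keeps the first non-NA value when several columns are non-NA for the same row, and how kim resolves such conflicts is not shown here):

```r
coalesce_cols <- function(data, cols) {
  # fold the columns left to right, filling NAs from the next column
  Reduce(function(a, b) ifelse(is.na(a), b, a), data[cols])
}
dt <- data.frame(v1 = c(1, NA, NA), v2 = c(NA, 2, NA))
coalesce_cols(dt, c("v1", "v2"))   # 1 2 NA
```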
Convert a comma-separated string of numbers into a numeric vector
comma_sep_string_to_numbers(string)
string |
a character string consisting of numbers separated by commas |
a numeric vector
comma_sep_string_to_numbers("1, 2, 3,4, 5 6")
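A base-R sketch of the likely parsing: split on runs of commas and/or whitespace, then coerce to numeric (the delimiter handling is an assumption, inferred from the example above in which "5 6" is also split; the function name is hypothetical):

```r
string_to_numbers_sketch <- function(string) {
  # split on any run of commas and/or whitespace, then coerce to numeric
  as.numeric(strsplit(string, "[,[:space:]]+")[[1]])
}
string_to_numbers_sketch("1, 2, 3,4, 5 6")   # 1 2 3 4 5 6
```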
Compares whether or not data sets are identical
compare_datasets(dataset_1 = NULL, dataset_2 = NULL, dataset_list = NULL)
dataset_1 |
a data object (a data frame or a data.table) |
dataset_2 |
another data object (a data frame or a data.table) |
dataset_list |
list of data objects (data.frame or data.table) |
the output will be a data.table showing differences in data sets
# catch differences in class attributes of the data sets
compare_datasets(
  dataset_1 = data.frame(a = 1:2, b = 3:4),
  dataset_2 = data.table::data.table(a = 1:2, b = 3:4))
# catch differences in number of columns
compare_datasets(
  dataset_1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  dataset_2 = data.frame(a = 1:2, b = 3:4))
# catch differences in number of rows
compare_datasets(
  dataset_1 = data.frame(a = 1:2, b = 3:4),
  dataset_2 = data.frame(a = 1:10, b = 11:20))
# catch differences in column names
compare_datasets(
  dataset_1 = data.frame(A = 1:2, B = 3:4),
  dataset_2 = data.frame(a = 1:2, b = 3:4))
# catch differences in values within corresponding columns
compare_datasets(
  dataset_1 = data.frame(a = 1:2, b = c(3, 400)),
  dataset_2 = data.frame(a = 1:2, b = 3:4))
compare_datasets(
  dataset_1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
  dataset_2 = data.frame(a = 1:2, b = c(3, 4), c = c(5, 6)))
# check if data sets in a list are identical
compare_datasets(
  dataset_list = list(
    dt1 = data.frame(a = 1:2, b = 3:4, c = 5:6),
    dt2 = data.frame(a = 1:2, b = 3:4),
    dt3 = data.frame(a = 1:2, b = 3:4, c = 5:6)))
Compares whether two dependent correlations from the same sample are significantly different from each other.
compare_dependent_rs(
  data = NULL,
  var_1_name = NULL,
  var_2_name = NULL,
  var_3_name = NULL,
  one_tailed = FALSE,
  round_r = 3,
  round_p = 3,
  round_t = 2,
  print_summary = TRUE,
  return_dt = FALSE
)
data |
a data object (a data frame or a data.table) |
var_1_name |
name of the variable whose correlations with two other variables will be compared. |
var_2_name |
name of the first of the two variables whose correlations with the variable given by var_1_name will be compared |
var_3_name |
name of the second of the two variables whose correlations with the variable given by var_1_name will be compared |
one_tailed |
logical. Should the p-value be based on a one-tailed t-test? (default = FALSE) |
round_r |
number of decimal places to which to round correlation coefficients (default = 3) |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_t |
number of decimal places to which to round the t-statistic (default = 2) |
print_summary |
logical. Should the summary be printed? (default = TRUE) |
return_dt |
logical. Should the function return a summary table as an output, as opposed to returning the output through the "invisible" function? (default = FALSE) |
Suppose that Variables A, B, and C are measured from a group of subjects. This function tests whether A is related to B differently than to C. Put differently, this function tests H0: r(A, B) = r(A, C)
For more information on formulas used in this function, please refer to Steiger (1980) doi:10.1037/0033-2909.87.2.245 and Chen & Popovich (2002) doi:10.4135/9781412983808
the output will be a summary of the test comparing two dependent correlations
compare_dependent_rs(
  data = mtcars, var_1_name = "mpg", var_2_name = "hp", var_3_name = "wt")
Compares effect sizes. See p. 156 of Borenstein et al. (2009, ISBN: 978-0-470-05724-7).
compare_effect_sizes(
  effect_sizes = NULL,
  effect_size_variances = NULL,
  round_stats = TRUE,
  round_p = 3,
  round_se = 2,
  round_z = 2,
  pretty_round_p_value = TRUE
)
effect_sizes |
a vector of estimated effect sizes |
effect_size_variances |
a vector of variances of the effect sizes |
round_stats |
logical. Should the statistics be rounded? (default = TRUE) |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_se |
number of decimal places to which to round the standard errors of the difference (default = 2) |
round_z |
number of decimal places to which to round the z-statistic (default = 2) |
pretty_round_p_value |
logical. Should the p-values be rounded in a pretty format (i.e., lower threshold: "<.001")? (default = TRUE) |
compare_effect_sizes( effect_sizes = c(0.6111, 0.3241, 0.5), effect_size_variances = c(.0029, 0.0033, 0.01))
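The underlying test (Borenstein et al., 2009) can be sketched in base R; the function name compare_two_es below is illustrative, not part of the package. The difference between two independent effect sizes is tested with z = (es1 - es2) / sqrt(v1 + v2).

```r
# A minimal base-R sketch of the test behind compare_effect_sizes
# (the function name compare_two_es is illustrative, not from the package).
compare_two_es <- function(es_1, es_2, var_1, var_2) {
  diff <- es_1 - es_2
  se_diff <- sqrt(var_1 + var_2)     # SE of the difference
  z <- diff / se_diff
  p <- 2 * stats::pnorm(-abs(z))     # two-tailed p-value
  c(diff = diff, se = se_diff, z = z, p = p)
}
compare_two_es(0.6111, 0.3241, 0.0029, 0.0033)
```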
Compares groups by (1) creating a histogram by group; (2) summarizing descriptive statistics by group; and (3) conducting pairwise comparisons (t-tests and Mann-Whitney tests).
compare_groups( data = NULL, iv_name = NULL, dv_name = NULL, sigfigs = 3, stats = "basic", welch = TRUE, cohen_d = TRUE, cohen_d_w_ci = TRUE, adjust_p = "holm", bonferroni = NULL, mann_whitney = TRUE, t_test_stats = TRUE, round_p = 3, anova = FALSE, round_f = 2, round_t = 2, round_t_test_df = 2, save_as_png = FALSE, png_name = NULL, xlab = NULL, ylab = NULL, x_limits = NULL, x_breaks = NULL, x_labels = NULL, width = 5000, height = 3600, units = "px", res = 300, layout_matrix = NULL, col_names_nicer = TRUE, convert_dv_to_numeric = TRUE )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (measure variable of interest) |
sigfigs |
number of significant digits to round to |
stats |
statistics to calculate for each group.
If |
welch |
logical. Should Welch's t-tests be conducted (instead of Student's t-tests)? By default, TRUE. |
cohen_d |
if |
cohen_d_w_ci |
if |
adjust_p |
the name of the method to use to adjust p-values.
If |
bonferroni |
The use of this argument is deprecated.
Use the 'adjust_p' argument instead.
If |
mann_whitney |
if |
t_test_stats |
if |
round_p |
number of decimal places to which to round p-values (default = 3) |
anova |
logical. Should a one-way ANOVA be conducted and reported? By default, FALSE. |
round_f |
number of decimal places to which to round the f statistic (default = 2) |
round_t |
number of decimal places to which to round the t statistic (default = 2) |
round_t_test_df |
number of decimal places to which to round the degrees of freedom for t tests (default = 2) |
save_as_png |
if |
png_name |
name of the PNG file to be saved. By default, the name will be "compare_groups_results_" followed by a timestamp of the current time. The timestamp will be in the format, jan_01_2021_1300_10_000001, where "jan_01_2021" would indicate January 01, 2021; 1300 would indicate 13:00 (i.e., 1 PM); and 10_000001 would indicate 10.000001 seconds after the hour. |
xlab |
title of the x-axis for the histogram by group.
If |
ylab |
title of the y-axis for the histogram by group.
If |
x_limits |
a numeric vector with values of the endpoints of the x axis. |
x_breaks |
a numeric vector indicating the points at which to place tick marks on the x axis. |
x_labels |
a vector containing labels for the tick marks placed on the x axis. |
width |
width of the PNG file (default = 5000) |
height |
height of the PNG file (default = 3600) |
units |
the units for the |
res |
The nominal resolution in ppi which will be recorded
in the png file, if a positive integer. Used for units
other than the default. By default, |
layout_matrix |
The layout argument for arranging plots and tables
using the |
col_names_nicer |
if |
convert_dv_to_numeric |
logical. Should the values in the dependent variable be converted to numeric for plotting the histograms? (default = TRUE) |
holm |
if |
the output will be a list of (1) a ggplot object (histogram by group); (2) a data.table with descriptive statistics by group; and (3) a data.table with pairwise comparison results. If save_as_png = TRUE, the plot and tables will also be saved on the local drive as a PNG file.
## Not run: 
compare_groups(data = iris, iv_name = "Species", dv_name = "Sepal.Length")
compare_groups(data = iris, iv_name = "Species", dv_name = "Sepal.Length",
  x_breaks = 4:8)
# Welch's t-test
compare_groups(data = mtcars, iv_name = "am", dv_name = "hp")
# A Student's t-test
compare_groups(data = mtcars, iv_name = "am", dv_name = "hp", welch = FALSE)
## End(Not run)
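The pairwise-testing step of compare_groups can be roughly sketched in base R: Welch's t-tests for all group pairs with Holm-adjusted p-values. This reproduces only the testing step, not the histograms or descriptive tables.

```r
# A rough base-R sketch of the pairwise comparisons in compare_groups.
groups <- split(iris$Sepal.Length, iris$Species)
pairs <- utils::combn(names(groups), 2, simplify = FALSE)
p_values <- vapply(pairs, function(pair) {
  # t.test() is a Welch's t-test by default (var.equal = FALSE)
  stats::t.test(groups[[pair[1]]], groups[[pair[2]]])$p.value
}, numeric(1))
names(p_values) <- vapply(pairs, paste, character(1), collapse = " vs ")
stats::p.adjust(p_values, method = "holm")  # Holm-adjusted p-values
```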
Compares whether two correlations from two independent samples are significantly different from each other. See Field et al. (2012, ISBN: 978-1-4462-0045-2).
compare_independent_rs( r1 = NULL, n1 = NULL, r2 = NULL, n2 = NULL, one_tailed = FALSE, round_p = 3, round_z_diff = 2, round_r = 2, print_summary = TRUE, output_type = NULL )
r1 |
correlation in the first sample |
n1 |
size of the first sample |
r2 |
correlation in the second sample |
n2 |
size of the second sample |
one_tailed |
logical. Should the p-value be based on a one-tailed test? (default = FALSE) |
round_p |
(only for displaying purposes) number of decimal places to which to round the p-value (default = 3) |
round_z_diff |
(only for displaying purposes) number of decimal places to which to round the z-score (default = 2) |
round_r |
(only for displaying purposes) number of decimal places to which to round correlation coefficients (default = 2) |
print_summary |
logical. Should the summary be printed? (default = TRUE) |
output_type |
type of the output. If |
the output will be the results of a test comparing two independent correlations.
compare_independent_rs(r1 = .1, n1 = 100, r2 = .2, n2 = 200)
compare_independent_rs(
  r1 = .1, n1 = 100, r2 = .2, n2 = 200, one_tailed = TRUE)
compare_independent_rs(r1 = .506, n1 = 52, r2 = .381, n2 = 51)
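The standard approach (as described in Field et al., 2012) can be sketched by hand: apply Fisher's r-to-z transformation to both correlations and compare them with a z-test. The function name compare_rs_sketch is illustrative; treat compare_independent_rs as authoritative.

```r
# A hedged base-R sketch of the z-test for two independent correlations:
# z_diff = (z1 - z2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)), where z = atanh(r).
compare_rs_sketch <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                     # Fisher's r-to-z
  z2 <- atanh(r2)
  z_diff <- (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 2 * stats::pnorm(-abs(z_diff))  # two-tailed p-value
  c(z_diff = z_diff, p = p)
}
compare_rs_sketch(r1 = .506, n1 = 52, r2 = .381, n2 = 51)
```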
Create a contingency table that takes two variables as inputs
contingency_table( data = NULL, row_var_name = NULL, col_var_name = NULL, row = NULL, col = NULL, output_type = "table" )
data |
a data object (a data frame or a data.table) |
row_var_name |
name of the variable whose values will fill the rows of the contingency table |
col_var_name |
name of the variable whose values will fill the columns of the contingency table |
row |
a vector whose values will fill the rows of the contingency table |
col |
a vector whose values will fill the columns of the contingency table |
output_type |
If |
contingency_table(data = mtcars, row_var_name = "am", col_var_name = "cyl")
contingency_table(row = mtcars$cyl, col = mtcars$am)
contingency_table(mtcars, "am", "cyl", output_type = "dt")
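For comparison, base R's table() produces the same cross-tabulation (without the output-type options that contingency_table provides):

```r
# Base-R equivalent of the cross-tabulation step in contingency_table
base_tab <- table(mtcars$cyl, mtcars$am, dnn = c("cyl", "am"))
base_tab
addmargins(base_tab)  # the same table with row/column totals
```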
Check whether each column in a data.table can be converted to numeric, and if so, convert every such column.
convert_cols_to_numeric( data = NULL, classes = "character", warn_accuracy_loss = TRUE, print_summary = TRUE, silent = FALSE )
data |
a data object (a data frame or a data.table) |
classes |
a character vector specifying classes of columns
that will be converted. For example, if |
warn_accuracy_loss |
logical. Whether to warn the user if converting character to numeric leads to loss of accuracy. (default = TRUE) |
print_summary |
If |
silent |
If |
data_frame_1 <- data.frame(a = c("1", "2"), b = c("1", "b"), c = 1:2)
convert_cols_to_numeric(data = data_frame_1)
data_table_1 <- data.table::data.table(
  a = c("1", "2"), b = c("1", "b"), c = 1:2)
convert_cols_to_numeric(data = data_table_1)
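The conversion check can be sketched in base R, assuming the rule is "convert a character column only if no non-NA value becomes NA under as.numeric()" (a plausible reading of the description above, not the package's exact code):

```r
# A minimal sketch of the convertibility check in convert_cols_to_numeric
df <- data.frame(a = c("1", "2"), b = c("1", "b"), stringsAsFactors = FALSE)
convertible <- vapply(df, function(col) {
  if (!is.character(col)) return(FALSE)
  converted <- suppressWarnings(as.numeric(col))
  # convertible only if conversion introduces no new NAs
  !anyNA(converted[!is.na(col)])
}, logical(1))
df[convertible] <- lapply(df[convertible], as.numeric)
str(df)  # "a" becomes numeric; "b" stays character
```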
Convert elements of a character vector to Excel formulas to preserve the character (string) format when opened in an Excel file.
convert_to_excel_formula(vector = NULL)
vector |
a character vector |
the output will be a character vector formatted as an Excel formula. For example, if an element in the input vector was ".500", this element will be converted to =".500", which will show up as ".500" in Excel, rather than as "0.5".
## Not run: 
# compare the two csv files below
# example 1
dt <- data.table::data.table(a = ".500")
data.table::fwrite(dt, "example1.csv")  # the csv will show "0.5"
# example 2
dt <- data.table::data.table(a = convert_to_excel_formula(".500"))
data.table::fwrite(dt, "example2.csv")  # the csv will show ".500"
## End(Not run)
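The underlying trick is a one-liner: wrap each string in an Excel formula that returns the literal text, so Excel does not coerce it to a number. The helper name to_excel_formula below is illustrative:

```r
# Sketch of the string-to-Excel-formula conversion: '.500' -> '=".500"'
to_excel_formula <- function(x) paste0("=\"", x, "\"")
to_excel_formula(c(".500", "007"))
```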
Estimate the correlation between two variables
correlation_kim( x = NULL, y = NULL, data = NULL, x_var_name = NULL, y_var_name = NULL, ci_range = 0.95, round_r = 2, round_p = 3, output_type = "summary" )
x |
a numeric vector of data values |
y |
a numeric vector of data values |
data |
(optional) a data object (a data frame or a data.table) |
x_var_name |
(optional) name of the first variable (if using a data set as an input) |
y_var_name |
(optional) name of the second variable (if using a data set as an input) |
ci_range |
range of the confidence interval for the correlation
coefficient. If |
round_r |
number of decimal places to which to round correlation coefficients (default = 2) |
round_p |
number of decimal places to which to round p-values (default = 3) |
output_type |
type of the output. If |
## Not run: 
correlation_kim(x = 1:4, y = c(1, 3, 2, 4))
correlation_kim(x = 1:4, y = c(1, 3, 2, 4), ci_range = FALSE)
# output as a data table
correlation_kim(x = 1:4, y = c(1, 3, 2, 4), output_type = "dt")
## End(Not run)
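The core numbers (estimate, p-value, confidence interval) can be reproduced with base R's cor.test; correlation_kim presumably adds the rounding and summary formatting on top:

```r
# Base-R counterpart of correlation_kim's core computation
ct <- stats::cor.test(x = 1:4, y = c(1, 3, 2, 4), conf.level = 0.95)
ct$estimate  # Pearson's r
ct$p.value
ct$conf.int  # 95% confidence interval
```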
Creates a correlation matrix
correlation_matrix( data = NULL, var_names = NULL, row_var_names = NULL, col_var_names = NULL, round_r = 2, round_p = 3, output_type = "rp", numbered_cols = NULL )
data |
a data object (a data frame or a data.table) |
var_names |
names of the variables for which to calculate all pairwise correlations |
row_var_names |
names of the variables that will go on the rows of the correlation matrix |
col_var_names |
names of the variables that will go on the columns of the correlation matrix |
round_r |
number of decimal places to which to round correlation coefficients (default = 2) |
round_p |
number of decimal places to which to round p-values (default = 3) |
output_type |
which value should be filled in cells of the
correlation matrix? If |
numbered_cols |
logical. If |
the output will be a correlation matrix in a data.table format
correlation_matrix(data = mtcars, var_names = c("mpg", "cyl", "wt"))
correlation_matrix(data = mtcars,
  row_var_names = c("mpg", "cyl", "hp"), col_var_names = c("wt", "am"))
correlation_matrix(
  data = mtcars, var_names = c("mpg", "cyl", "wt"), numbered_cols = FALSE)
correlation_matrix(
  data = mtcars, var_names = c("mpg", "cyl", "wt"), output_type = "r")
Plots or tabulates cumulative percentages associated with elements in a vector
cum_percent_plot(vector, output_type = "plot")
vector |
a numeric vector |
output_type |
if |
cum_percent_plot(c(1:100, NA, NA))
cum_percent_plot(mtcars$mpg)
cum_percent_plot(vector = mtcars$mpg, output_type = "dt")
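The tabulation behind the plot can be sketched in base R: cumulative percentages of the sorted non-missing values (the plot version simply draws these points):

```r
# Base-R sketch of the cumulative-percentage tabulation in cum_percent_plot
v_sorted <- sort(stats::na.omit(mtcars$mpg))
cum_pct <- 100 * seq_along(v_sorted) / length(v_sorted)
head(data.frame(value = v_sorted, cumulative_percent = cum_pct))
```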
Returns descriptive statistics for a numeric vector.
desc_stats( vector = NULL, output_type = "vector", sigfigs = 3, se_of_mean = FALSE, ci = FALSE, pi = FALSE, skewness = FALSE, kurtosis = FALSE, notify_na_count = NULL, print_dt = FALSE )
vector |
a numeric vector |
output_type |
if |
sigfigs |
number of significant digits to round to (default = 3) |
se_of_mean |
logical. Should the standard errors around the mean be included in the descriptive stats? (default = FALSE) |
ci |
logical. Should 95% CI be included in the descriptive stats? (default = FALSE) |
pi |
logical. Should 95% PI be included in the descriptive stats? (default = FALSE) |
skewness |
logical. Should the skewness statistic be included in the descriptive stats? (default = FALSE) |
kurtosis |
logical. Should the kurtosis statistic be included in the descriptive stats? (default = FALSE) |
notify_na_count |
if |
print_dt |
if |
if output_type = "vector", the output will be a named numeric vector of descriptive statistics; if output_type = "dt", the output will be a data.table of descriptive statistics.
desc_stats(1:100)
desc_stats(1:100, ci = TRUE, pi = TRUE, sigfigs = 2)
desc_stats(1:100, se_of_mean = TRUE, ci = TRUE, pi = TRUE, sigfigs = 2,
  skewness = TRUE, kurtosis = TRUE)
desc_stats(c(1:100, NA))
example_dt <- desc_stats(vector = c(1:100, NA), output_type = "dt")
example_dt
Returns descriptive statistics by group
desc_stats_by_group( data = NULL, var_for_stats = NULL, grouping_vars = NULL, stats = "all", sigfigs = NULL, cols_to_round = NULL )
data |
a data object (a data frame or a data.table) |
var_for_stats |
name of the variable for which descriptive statistics will be calculated |
grouping_vars |
name(s) of grouping variables |
stats |
statistics to calculate. If |
sigfigs |
number of significant digits to round to |
cols_to_round |
names of columns whose values will be rounded |
the output will be a data.table showing descriptive statistics of the variable for each of the groups formed by the grouping variables.
desc_stats_by_group(data = mtcars, var_for_stats = "mpg",
  grouping_vars = c("vs", "am"))
desc_stats_by_group(data = mtcars, var_for_stats = "mpg",
  grouping_vars = c("vs", "am"), sigfigs = 3)
desc_stats_by_group(data = mtcars, var_for_stats = "mpg",
  grouping_vars = c("vs", "am"), stats = "basic", sigfigs = 2)
desc_stats_by_group(data = mtcars, var_for_stats = "mpg",
  grouping_vars = c("vs", "am"), stats = "basic", sigfigs = 2,
  cols_to_round = "all")
desc_stats_by_group(data = mtcars, var_for_stats = "mpg",
  grouping_vars = c("vs", "am"), stats = c("mean", "median"), sigfigs = 2,
  cols_to_round = "all")
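The by-group aggregation itself can be sketched with base R's aggregate (desc_stats_by_group adds more statistics plus rounding and data.table output):

```r
# Base-R sketch of descriptive statistics by group, mirroring the
# mtcars example above (grouping by "vs" and "am")
by_group <- aggregate(mpg ~ vs + am, data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
by_group
```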
Detach all user-installed packages
detach_user_installed_pkgs(exceptions = NULL, force = FALSE, keep_kim = TRUE)
exceptions |
a character vector of names of packages to keep attached |
force |
logical. Should a package be detached even though other attached packages depend on it? By default, FALSE. |
keep_kim |
logical. If |
## Not run: 
detach_user_installed_pkgs()
## End(Not run)
Return all duplicated values in a vector. This function is a copy of the earlier function, find_duplicates, in Package 'kim'.
duplicated_values(vector = NULL, na.rm = TRUE, sigfigs = 2, output = "summary")
vector |
a vector whose elements will be checked for duplicates |
na.rm |
logical. If |
sigfigs |
number of significant digits to round to in the percent column of the summary (default = 2) |
output |
type of output. If |
the output will be a data.table object (summary), a vector of duplicated values, or a vector of non-duplicated values.
duplicated_values(mtcars$cyl)
duplicated_values(mtcars$cyl, output = "duplicated_values")
duplicated_values(vector = c(mtcars$cyl, 11:20, NA, NA))
duplicated_values(vector = c(mtcars$cyl, 11:20, NA, NA), na.rm = FALSE)
duplicated_values(vector = c(mtcars$cyl, 11:20, NA, NA), na.rm = FALSE,
  sigfigs = 4, output = "duplicated_values")
Alias for the 'convert_to_excel_formula' function. Convert elements of a character vector to Excel formulas to preserve the character (string) format when opened in an Excel file.
excel_formula_convert(vector = NULL)
vector |
a character vector |
the output will be a character vector formatted as an Excel formula. For example, if an element in the input vector was ".500", this element will be converted to =".500", which will show up as ".500" in Excel, rather than as "0.5".
## Not run: 
# compare the two csv files below
# example 1
dt <- data.table::data.table(a = ".500")
data.table::fwrite(dt, "example1.csv")  # the csv will show "0.5"
# example 2
dt <- data.table::data.table(a = excel_formula_convert(".500"))
data.table::fwrite(dt, "example2.csv")  # the csv will show ".500"
## End(Not run)
Exit from a Parent Function
exit_from_parent_function( n = 1, silent = FALSE, message = "Exiting from a parent function" )
n |
the number of generations to go back (default = 1) |
silent |
logical. If |
message |
message to print |
fn1 <- function() {
  print(1)
  print(2)
}
fn1()
fn2 <- function() {
  print(1)
  exit_from_parent_function()
  print(2)
}
fn2()
Conduct a two-way factorial analysis of variance (ANOVA).
factorial_anova_2_way( data = NULL, dv_name = NULL, iv_1_name = NULL, iv_2_name = NULL, iv_1_values = NULL, iv_2_values = NULL, sigfigs = 3, robust = FALSE, iterations = 2000, plot = TRUE, error_bar = "ci", error_bar_range = 0.95, error_bar_tip_width = 0.13, error_bar_thickness = 1, error_bar_caption = TRUE, line_colors = NULL, line_types = NULL, line_thickness = 1, dot_size = 3, position_dodge = 0.13, x_axis_title = NULL, y_axis_title = NULL, y_axis_title_vjust = 0.85, legend_title = NULL, legend_position = "right", output = "anova_table", png_name = NULL, width = 7000, height = 4000, units = "px", res = 300, layout_matrix = NULL )
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable |
iv_1_name |
name of the first independent variable |
iv_2_name |
name of the second independent variable |
iv_1_values |
restrict all analyses to observations having these values for the first independent variable |
iv_2_values |
restrict all analyses to observations having these values for the second independent variable |
sigfigs |
number of significant digits to which to round values in anova table (default = 3) |
robust |
if |
iterations |
number of bootstrap samples for robust ANOVA. The default is set at 2000, but consider increasing the number of samples to 5000, 10000, or an even larger number, if longer processing time is not an issue. |
plot |
if |
error_bar |
if |
error_bar_range |
width of the confidence interval
(default = 0.95 for 95 percent confidence interval).
This argument will not apply when |
error_bar_tip_width |
graphically, width of the segments at the end of error bars (default = 0.13) |
error_bar_thickness |
thickness of the error bars (default = 1) |
error_bar_caption |
should a caption be included to indicate the width of the error bars? (default = TRUE). |
line_colors |
colors of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_types |
types of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_thickness |
thickness of the lines connecting group means, (default = 1) |
dot_size |
size of the dots indicating group means (default = 3) |
position_dodge |
by how much should the group means and error bars be horizontally offset from each other so as not to overlap? (default = 0.13) |
x_axis_title |
a character string for the x-axis title. If no
input is entered, then, by default, the first value of
|
y_axis_title |
a character string for the y-axis title. If no
input is entered, then, by default, |
y_axis_title_vjust |
position of the y axis title (default = 0.85).
By default, |
legend_title |
a character for the legend title. If no input
is entered, then, by default, the second value of |
legend_position |
position of the legend:
|
output |
output type can be one of the following:
|
png_name |
name of the PNG file to be saved.
If |
width |
width of the PNG file (default = 7000) |
height |
height of the PNG file (default = 4000) |
units |
the units for the |
res |
The nominal resolution in ppi which will be recorded in the png file, if a positive integer. Used for units other than the default. If not specified, taken as 300 ppi to set the size of text and line widths. |
layout_matrix |
The layout argument for arranging plots and tables
using the |
The following package(s) must be installed prior to running this function: Package 'car' v3.0.9 (or possibly a higher version) by Fox et al. (2020), https://cran.r-project.org/package=car
If robust ANOVA is to be conducted, the following package(s) must be installed prior to running the function: Package 'WRS2' v1.1-1 (or possibly a higher version) by Mair & Wilcox (2021), https://cran.r-project.org/package=WRS2
by default, the output will be "anova_table"
factorial_anova_2_way(
  data = mtcars, dv_name = "mpg", iv_1_name = "vs", iv_2_name = "am",
  iterations = 100)
anova_results <- factorial_anova_2_way(
  data = mtcars, dv_name = "mpg", iv_1_name = "vs", iv_2_name = "am",
  output = "all")
anova_results
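For orientation, a non-robust version of the same analysis can be run with base R alone: a two-way ANOVA with the interaction term. Note that factorial_anova_2_way relies on the 'car' package, which reports Type III sums of squares; base aov() below uses sequential (Type I) sums of squares, so results can differ in unbalanced designs.

```r
# Base-R (non-robust, Type I) sketch of the two-way factorial ANOVA
fit <- stats::aov(mpg ~ factor(vs) * factor(am), data = mtcars)
summary(fit)
```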
Find duplicated values in a vector
find_duplicates(vector = NULL, na.rm = TRUE, sigfigs = 2, output = "summary")
vector |
a vector whose elements will be checked for duplicates |
na.rm |
logical. If |
sigfigs |
number of significant digits to round to in the percent column of the summary (default = 2) |
output |
type of output. If |
the output will be a data.table object (summary), a vector of duplicated values, or a vector of non-duplicated values.
find_duplicates(mtcars$cyl)
find_duplicates(mtcars$cyl, output = "duplicated_values")
find_duplicates(vector = c(mtcars$cyl, 11:20, NA, NA))
find_duplicates(vector = c(mtcars$cyl, 11:20, NA, NA), na.rm = FALSE)
find_duplicates(vector = c(mtcars$cyl, 11:20, NA, NA), na.rm = FALSE,
  sigfigs = 4, output = "duplicated_values")
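The core check can be done with base R's duplicated(); find_duplicates wraps this with NA handling and a formatted summary table:

```r
# Base-R sketch of the duplicate detection in find_duplicates
v <- c(mtcars$cyl, 11:20)
dup_values <- unique(v[duplicated(v)])  # values appearing more than once
dup_values
table(v)[as.character(dup_values)]      # counts of the duplicated values
```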
Perform Fisher's r-to-Z transformation for given correlation coefficient(s).
fisher_z_transform(r = NULL)
r |
a (vector of) correlation coefficient(s) |
the output will be a vector of Z values which were transformed from the given r values.
fisher_z_transform(0.99)
fisher_z_transform(r = seq(0.1, 0.5, 0.1))
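Fisher's r-to-z transformation is z = 0.5 * ln((1 + r) / (1 - r)), which is exactly base R's atanh(); a one-line sketch (the name fisher_z is illustrative):

```r
# Fisher's r-to-z transformation written out, and its base-R equivalent
fisher_z <- function(r) 0.5 * log((1 + r) / (1 - r))
all.equal(fisher_z(seq(0.1, 0.5, 0.1)), atanh(seq(0.1, 0.5, 0.1)))
```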
Conduct a floodlight analysis for a 2 x Continuous design.
floodlight_2_by_continuous( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, covariate_name = NULL, interaction_p_include = TRUE, iv_level_order = NULL, output = "reg_lines_plot", jitter_x_y_percent = 0, jitter_x_percent = 0, jitter_y_percent = 0, dot_alpha = 0.5, dot_size = 4, interaction_p_value_font_size = 8, jn_point_label_add = TRUE, jn_point_font_size = 8, jn_point_label_hjust = NULL, lines_at_mod_extremes = FALSE, interaction_p_vjust = -3, plot_margin = ggplot2::unit(c(75, 7, 7, 7), "pt"), legend_position = "right", reg_line_types = c("solid", "dashed"), jn_line_types = c("solid", "solid"), jn_line_thickness = 1.5, colors_for_iv = c("red", "blue"), sig_region_color = "green", sig_region_alpha = 0.08, nonsig_region_color = "gray", nonsig_region_alpha = 0.08, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, round_decimals_int_p_value = 3, line_of_fit_thickness = 1, round_jn_point_labels = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the binary independent variable |
dv_name |
name of the dependent variable |
mod_name |
name of the continuous moderator variable |
covariate_name |
name of the variables to control for |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_level_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
output |
type of output (default = "reg_lines_plot"). Possible inputs: "interactions_pkg_results", "simple_effects_plot", "jn_points", "regions", "reg_lines_plot" |
jitter_x_y_percent |
horizontally and vertically jitter dots by a percentage of the respective ranges of x and y values. |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
dot_size |
size of the dots (default = 4) |
interaction_p_value_font_size |
font size for the interaction p value (default = 8) |
jn_point_label_add |
logical. Should the labels for Johnson-Neyman point labels be added to the plot? (default = TRUE) |
jn_point_font_size |
font size for Johnson-Neyman point labels (default = 8) |
jn_point_label_hjust |
a vector of hjust values for Johnson-Neyman point labels. By default, the hjust value will be 0.5 for all the points. |
lines_at_mod_extremes |
logical. Should vertical lines be drawn at the observed extreme values of the moderator if those values lie in significant region(s)? (default = FALSE) |
interaction_p_vjust |
By how much should the label for the
interaction p-value be adjusted vertically?
By default, |
plot_margin |
margin for the plot
By default |
legend_position |
position of the legend (default = "right").
If |
reg_line_types |
types of the regression lines for the two levels
of the independent variable.
By default, |
jn_line_types |
types of the lines for Johnson-Neyman points.
By default, |
jn_line_thickness |
thickness of the lines at Johnson-Neyman points (default = 1.5) |
colors_for_iv |
colors for the two values of the independent variable (default = c("red", "blue")) |
sig_region_color |
color of the significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is statistically significant. |
sig_region_alpha |
opacity for |
nonsig_region_color |
color of the non-significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is not statistically significant. |
nonsig_region_alpha |
opacity for |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
line_of_fit_thickness |
thickness of the lines of fit (default = 1) |
round_jn_point_labels |
To how many digits after the decimal point should the jn point labels be rounded? (default = 2) |
The following package(s) must be installed prior to running this function: Package 'interactions' v1.1.1 (or possibly a higher version) by Jacob A. Long (2020), https://cran.r-project.org/package=interactions. See the following references: Spiller et al. (2013) doi:10.1509/jmr.12.0420; Kim (2021) doi:10.5281/zenodo.4445388
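If the 'interactions' package is not yet installed, it can be added with base R's installer (a standard one-time setup; the requireNamespace guard shown below simply avoids reinstalling an already-available package):

# install the 'interactions' package only if it is not already available
if (!requireNamespace("interactions", quietly = TRUE)) {
  install.packages("interactions")
}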
# typical example
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec")
# add covariates
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  covariate_name = c("cyl", "hp"))
# adjust the jn point label positions
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  jn_point_label_hjust = c(1, 0))
# return regions of significance and nonsignificance
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  output = "regions")
# draw lines at the extreme values of the moderator
# if they are included in the significant region
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  lines_at_mod_extremes = TRUE)
# remove the labels for jn points
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  jn_point_label_add = FALSE)
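The jn_points_disregard_threshold argument described above can also be supplied directly in a call; the value 0.01 below is an arbitrary illustration of a "very small" threshold, not a recommended setting:

# keep Johnson-Neyman points even when they lie close together,
# by using a very small disregard threshold (0.01 here is arbitrary)
floodlight_2_by_continuous(
  data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec",
  jn_points_disregard_threshold = 0.01)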
Conduct a floodlight analysis for a logistic regression with a 2 x Continuous design involving a binary dependent variable.
floodlight_2_by_continuous_logistic( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, interaction_p_include = TRUE, iv_level_order = NULL, dv_level_order = NULL, jn_points_disregard_threshold = NULL, output = "reg_lines_plot", num_of_spotlights = 20, jitter_x_percent = 0, jitter_y_percent = 5, dot_alpha = 0.3, dot_size = 6, interaction_p_value_font_size = 8, jn_point_label_add = TRUE, jn_point_font_size = 8, jn_point_label_hjust = NULL, interaction_p_vjust = -3, plot_margin = ggplot2::unit(c(75, 7, 7, 7), "pt"), legend_position = "right", line_types_for_pred_values = c("solid", "dashed"), line_thickness_for_pred_values = 2.5, jn_line_types = c("solid", "solid"), jn_line_thickness = 1.5, sig_region_color = "green", sig_region_alpha = 0.08, nonsig_region_color = "gray", nonsig_region_alpha = 0.08, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, round_decimals_int_p_value = 3, round_jn_point_labels = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the binary independent variable |
dv_name |
name of the binary dependent variable |
mod_name |
name of the continuous moderator variable |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_level_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
dv_level_order |
order of levels in the dependent variable.
By default, it will be set as levels of the
dependent variable ordered using R's base function |
jn_points_disregard_threshold |
the minimum distance, expressed in
units of the moderator variable, that will be used for several purposes,
such as (1) to disregard a second Johnson-Neyman (JN) point
that differs from the first JN point by
less than this minimum distance, and (2) to determine regions of
significance, for which the p-value of the IV's effect
(the focal dummy variable's effect) on the DV will be calculated
at a candidate JN point plus or minus this minimum distance.
If unsure, a user can enter a very small value
for this argument (e.g., |
output |
type of output (default = "reg_lines_plot"). Possible inputs: "interactions_pkg_results", "simple_effects_plot", "jn_points", "regions", "reg_lines_plot" |
num_of_spotlights |
How many spotlight analyses should be conducted to plot the predicted values at various values of the moderator? (default = 20) |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values (default = 0) |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values (default = 5) |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
dot_size |
size of the dots (default = 6) |
interaction_p_value_font_size |
font size for the interaction p value (default = 8) |
jn_point_label_add |
logical. Should labels for the Johnson-Neyman points be added to the plot? (default = TRUE) |
jn_point_font_size |
font size for Johnson-Neyman point labels (default = 8) |
jn_point_label_hjust |
a vector of hjust values for Johnson-Neyman point labels. By default, the hjust value will be 0.5 for all the points. |
interaction_p_vjust |
By how much should the label for the
interaction p-value be adjusted vertically?
By default, |
plot_margin |
margin for the plot
By default |
legend_position |
position of the legend (default = "right").
If |
line_types_for_pred_values |
types of the lines for plotting
the predicted values
By default, |
line_thickness_for_pred_values |
thickness of the lines for plotting the predicted values (default = 2.5) |
jn_line_types |
types of the lines for Johnson-Neyman points.
By default, |
jn_line_thickness |
thickness of the lines at Johnson-Neyman points (default = 1.5) |
sig_region_color |
color of the significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is statistically significant. |
sig_region_alpha |
opacity for |
nonsig_region_color |
color of the non-significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is not statistically significant. |
nonsig_region_alpha |
opacity for |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
round_jn_point_labels |
To how many digits after the decimal point should the jn point labels be rounded? (default = 2) |
See the following reference(s): Spiller et al. (2013) doi:10.1509/jmr.12.0420; Kim (2023) https://jinkim.science/docs/floodlight.pdf
floodlight_2_by_continuous_logistic(
  data = mtcars, iv_name = "am", dv_name = "vs", mod_name = "mpg")
# adjust the number of spotlights
# (i.e., predict values at only 4 values of the moderator)
floodlight_2_by_continuous_logistic(
  data = mtcars, iv_name = "am", dv_name = "vs", mod_name = "mpg",
  num_of_spotlights = 4)
Conduct a floodlight analysis for a multilevel logistic regression with a 2 x Continuous design involving a binary dependent variable.
floodlight_2_by_continuous_mlm_logistic( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, interaction_p_include = TRUE, iv_level_order = NULL, dv_level_order = NULL, jn_points_disregard_threshold = NULL, output = "reg_lines_plot", num_of_spotlights = 20, jitter_x_percent = 0, jitter_y_percent = 5, dot_alpha = 0.3, dot_size = 6, interaction_p_value_font_size = 8, jn_point_font_size = 8, jn_point_label_hjust = NULL, interaction_p_vjust = -3, plot_margin = ggplot2::unit(c(75, 7, 7, 7), "pt"), legend_position = "right", line_types_for_pred_values = c("solid", "dashed"), line_thickness_for_pred_values = 2.5, jn_line_types = c("solid", "solid"), jn_line_thickness = 1.5, sig_region_color = "green", sig_region_alpha = 0.08, nonsig_region_color = "gray", nonsig_region_alpha = 0.08, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, round_decimals_int_p_value = 3, round_jn_point_labels = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the binary independent variable |
dv_name |
name of the binary dependent variable |
mod_name |
name of the continuous moderator variable |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_level_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
dv_level_order |
order of levels in the dependent variable.
By default, it will be set as levels of the
dependent variable ordered using R's base function |
jn_points_disregard_threshold |
the minimum distance, expressed in
units of the moderator variable, that will be used for several purposes,
such as (1) to disregard a second Johnson-Neyman (JN) point
that differs from the first JN point by
less than this minimum distance, and (2) to determine regions of
significance, for which the p-value of the IV's effect
(the focal dummy variable's effect) on the DV will be calculated
at a candidate JN point plus or minus this minimum distance.
If unsure, a user can enter a very small value
for this argument (e.g., |
output |
type of output (default = "reg_lines_plot"). Possible inputs: "interactions_pkg_results", "simple_effects_plot", "jn_points", "regions", "reg_lines_plot" |
num_of_spotlights |
How many spotlight analyses should be conducted to plot the predicted values at various values of the moderator? (default = 20) |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values (default = 0) |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values (default = 5) |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
dot_size |
size of the dots (default = 6) |
interaction_p_value_font_size |
font size for the interaction p value (default = 8) |
jn_point_font_size |
font size for Johnson-Neyman point labels (default = 8) |
jn_point_label_hjust |
a vector of hjust values for Johnson-Neyman point labels. By default, the hjust value will be 0.5 for all the points. |
interaction_p_vjust |
By how much should the label for the
interaction p-value be adjusted vertically?
By default, |
plot_margin |
margin for the plot
By default |
legend_position |
position of the legend (default = "right").
If |
line_types_for_pred_values |
types of the lines for plotting
the predicted values
By default, |
line_thickness_for_pred_values |
thickness of the lines for plotting the predicted values (default = 2.5) |
jn_line_types |
types of the lines for Johnson-Neyman points.
By default, |
jn_line_thickness |
thickness of the lines at Johnson-Neyman points (default = 1.5) |
sig_region_color |
color of the significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is statistically significant. |
sig_region_alpha |
opacity for |
nonsig_region_color |
color of the non-significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is not statistically significant. |
nonsig_region_alpha |
opacity for |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
round_jn_point_labels |
To how many digits after the decimal point should the jn point labels be rounded? (default = 2) |
See the following reference(s): Spiller et al. (2013) doi:10.1509/jmr.12.0420; Kim (2023) https://jinkim.science/docs/floodlight.pdf
floodlight_2_by_continuous_mlm_logistic(
  data = mtcars, iv_name = "am", dv_name = "vs", mod_name = "mpg")
# adjust the number of spotlights
# (i.e., predict values at only 4 values of the moderator)
floodlight_2_by_continuous_mlm_logistic(
  data = mtcars, iv_name = "am", dv_name = "vs", mod_name = "mpg",
  num_of_spotlights = 4)
Conduct a floodlight analysis for a set of contrasts with a continuous moderator variable.
floodlight_for_contrasts( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, contrasts = NULL, contrasts_for_floodlight = NULL, covariate_name = NULL, interaction_p_include = TRUE, iv_category_order = NULL, heteroskedasticity_consistent_se = "HC4", round_r_squared = 3, round_f = 2, sigfigs = 2, jn_points_disregard_threshold = NULL, print_floodlight_plots = TRUE, output = "reg_lines_plot", jitter_x_percent = 0, jitter_y_percent = 0, dot_alpha = 0.5, dot_size = 4, interaction_p_value_font_size = 6, jn_point_font_size = 6, jn_point_label_hjust = NULL, interaction_p_vjust = -3, plot_margin = ggplot2::unit(c(75, 7, 7, 7), "pt"), legend_position = "right", line_of_fit_types = c("solid", "dashed"), line_of_fit_thickness = 1.5, jn_line_types = c("solid", "solid"), jn_line_thickness = 1.5, sig_region_color = "green", sig_region_alpha = 0.08, nonsig_region_color = "gray", nonsig_region_alpha = 0.08, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, round_decimals_int_p_value = 3, round_jn_point_labels = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the multicategorical independent variable; this variable must have three or more categories. |
dv_name |
name of the dependent variable |
mod_name |
name of the continuous moderator variable |
contrasts |
names of the contrast variables |
contrasts_for_floodlight |
names of the contrast variables for which floodlight analyses will be conducted |
covariate_name |
name of the variables to control for |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_category_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
heteroskedasticity_consistent_se |
which kind of heteroskedasticity-consistent (robust) standard errors should be calculated? (default = "HC4") |
round_r_squared |
number of decimal places to which to round r-squared values (default = 3) |
round_f |
number of decimal places to which to round the f statistic for model comparison (default = 2) |
sigfigs |
number of significant digits to round to
(for values in the regression tables, except for p values).
By default |
jn_points_disregard_threshold |
the minimum distance, expressed in
units of the moderator variable, that will be used for several purposes,
such as (1) to disregard a second Johnson-Neyman (JN) point
that differs from the first JN point by
less than this minimum distance, and (2) to determine regions of
significance, for which the p-value of the IV's effect
(the focal dummy variable's effect) on the DV will be calculated
at a candidate JN point plus or minus this minimum distance.
If unsure, a user can enter a very small value
for this argument (e.g., |
print_floodlight_plots |
If |
output |
output of the function (default = "reg_lines_plot"). Possible inputs: "reg_models", "reg_tables", "reg_tables_rounded", "reg_lines_plot", "all" |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
dot_size |
size of the dots (default = 4) |
interaction_p_value_font_size |
font size for the interaction p value (default = 6) |
jn_point_font_size |
font size for Johnson-Neyman point labels (default = 6) |
jn_point_label_hjust |
a vector of hjust values for Johnson-Neyman point labels. By default, the hjust value will be 0.5 for all the points. |
interaction_p_vjust |
By how much should the label for the
interaction p-value be adjusted vertically?
By default, |
plot_margin |
margin for the plot
By default |
legend_position |
position of the legend (default = "right").
If |
line_of_fit_types |
types of the lines of fit for the two levels
of the independent variable.
By default, |
line_of_fit_thickness |
thickness of the lines of fit (default = 1.5) |
jn_line_types |
types of the lines for Johnson-Neyman points.
By default, |
jn_line_thickness |
thickness of the lines at Johnson-Neyman points (default = 1.5) |
sig_region_color |
color of the significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is statistically significant. |
sig_region_alpha |
opacity for |
nonsig_region_color |
color of the non-significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is not statistically significant. |
nonsig_region_alpha |
opacity for |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
round_jn_point_labels |
To how many digits after the decimal point should the jn point labels be rounded? (default = 2) |
See the following reference, which covers a related topic: Hayes & Montoya (2017) doi:10.1080/19312458.2016.1271116
## Not run: 
# typical example
# copy and modify the 'mtcars' data
# (make sure the data.table package is attached)
mtcars2 <- setDT(data.table::copy(mtcars))
mtcars2[, contrast_1 := fcase(cyl == 4, -2, cyl %in% c(6, 8), 1)]
mtcars2[, contrast_2 := fcase(cyl == 4, 0, cyl == 6, 1, cyl == 8, -1)]
floodlight_for_contrasts(
  data = mtcars2,
  iv_name = "cyl",
  dv_name = "mpg",
  mod_name = "qsec",
  contrasts = paste0("contrast_", 1:2),
  contrasts_for_floodlight = "contrast_2")
## End(Not run)
Conduct a floodlight analysis for a Multicategorical IV x Continuous Moderator design.
floodlight_multi_by_continuous( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, coding = "indicator", baseline_category = NULL, covariate_name = NULL, interaction_p_include = TRUE, iv_category_order = NULL, heteroskedasticity_consistent_se = "HC4", round_r_squared = 3, round_f = 2, sigfigs = 2, jn_points_disregard_threshold = NULL, print_floodlight_plots = TRUE, output = "all", jitter_x_percent = 0, jitter_y_percent = 0, dot_alpha = 0.5, dot_size = 4, interaction_p_value_font_size = 8, jn_point_font_size = 8, jn_point_label_hjust = NULL, interaction_p_vjust = -3, plot_margin = ggplot2::unit(c(75, 7, 7, 7), "pt"), legend_position = "right", line_of_fit_types = c("solid", "dashed"), line_of_fit_thickness = 1.5, jn_line_types = c("solid", "solid"), jn_line_thickness = 1.5, colors_for_iv = c("red", "blue"), sig_region_color = "green", sig_region_alpha = 0.08, nonsig_region_color = "gray", nonsig_region_alpha = 0.08, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, round_decimals_int_p_value = 3, round_jn_point_labels = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the multicategorical independent variable; this variable must have three or more categories. |
dv_name |
name of the dependent variable |
mod_name |
name of the continuous moderator variable |
coding |
name of the coding scheme to use; the current version
of the function allows only the "indicator" coding scheme.
By default, |
baseline_category |
value of the independent variable that will be the reference value against which other values of the independent variable will be compared |
covariate_name |
name of the variables to control for |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_category_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
heteroskedasticity_consistent_se |
which kind of heteroskedasticity-consistent (robust) standard errors should be calculated? (default = "HC4") |
round_r_squared |
number of decimal places to which to round r-squared values (default = 3) |
round_f |
number of decimal places to which to round the f statistic for model comparison (default = 2) |
sigfigs |
number of significant digits to round to
(for values in the regression tables, except for p values).
By default |
jn_points_disregard_threshold |
the minimum distance, expressed in
units of the moderator variable, that will be used for several purposes,
such as (1) to disregard a second Johnson-Neyman (JN) point
that differs from the first JN point by
less than this minimum distance, and (2) to determine regions of
significance, for which the p-value of the IV's effect
(the focal dummy variable's effect) on the DV will be calculated
at a candidate JN point plus or minus this minimum distance.
If unsure, a user can enter a very small value
for this argument (e.g., |
print_floodlight_plots |
If |
output |
output of the function (default = "all"). Possible inputs: "reg_models", "reg_tables", "reg_tables_rounded", "all" |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
dot_size |
size of the dots (default = 4) |
interaction_p_value_font_size |
font size for the interaction p value (default = 8) |
jn_point_font_size |
font size for Johnson-Neyman point labels (default = 8) |
jn_point_label_hjust |
a vector of hjust values for Johnson-Neyman point labels. By default, the hjust value will be 0.5 for all the points. |
interaction_p_vjust |
By how much should the label for the
interaction p-value be adjusted vertically?
By default, |
plot_margin |
margin for the plot
By default |
legend_position |
position of the legend (default = "right").
If |
line_of_fit_types |
types of the lines of fit for the two levels
of the independent variable.
By default, |
line_of_fit_thickness |
thickness of the lines of fit (default = 1.5) |
jn_line_types |
types of the lines for Johnson-Neyman points.
By default, |
jn_line_thickness |
thickness of the lines at Johnson-Neyman points (default = 1.5) |
colors_for_iv |
colors for the two values of the independent variable (default = c("red", "blue")) |
sig_region_color |
color of the significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is statistically significant. |
sig_region_alpha |
opacity for |
nonsig_region_color |
color of the non-significant region, i.e., range(s) of the moderator variable for which the simple effect of the independent variable on the dependent variable is not statistically significant. |
nonsig_region_alpha |
opacity for |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
round_jn_point_labels |
To how many digits after the decimal point should the jn point labels be rounded? (default = 2) |
See the following references: Hayes & Montoya (2017) doi:10.1080/19312458.2016.1271116; Williams (2004) on r-squared values when calculating robust standard errors: https://web.archive.org/web/20230627025457/https://www.stata.com/statalist/archive/2004-05/msg00107.html
## Not run: 
# typical example
floodlight_multi_by_continuous(
  data = mtcars, iv_name = "cyl", dv_name = "mpg", mod_name = "qsec")
## End(Not run)
Create a forest plot using outputs from the 'metafor' package
forest_plot( estimates = NULL, estimate_ci_ll = NULL, estimate_ci_ul = NULL, point_size_range = c(2, 10), error_bar_size = 1, error_bar_tip_height = 0.3, weights = NULL, diamond_x = NULL, diamond_ci_ll = NULL, diamond_ci_ul = NULL, diamond_height = 1.2, diamond_gap_height = 0.3, diamond_1_tip_at_top_y = -0.5, diamond_colors = "black", study_labels = NULL, diamond_labels = NULL, diamond_label_size = 6, diamond_label_hjust = 0, diamond_label_fontface = "bold", diamond_estimate_label_hjust = 0, diamond_estimate_label_size = 6, diamond_estimate_label_fontface = "bold", round_estimates = 2, x_axis_title = "Observed Outcome", vline_size = 1, vline_intercept = 0, vline_type = "dotted", study_label_hjust = 0, study_label_begin_x = NULL, study_label_begin_x_perc = 60, study_label_size = 6, study_label_fontface = "plain", estimate_label_begin_x = NULL, estimate_label_begin_x_perc = 25, estimate_label_hjust = 0, estimate_label_size = 6, estimate_label_fontface = "plain", x_axis_tick_marks = NULL, x_axis_tick_mark_label_size = 6, legend_position = "none", plot_margin = NULL )
estimates |
default = NULL |
estimate_ci_ll |
default = NULL |
estimate_ci_ul |
default = NULL |
point_size_range |
default = c(2, 10) |
error_bar_size |
default = 1 |
error_bar_tip_height |
default = 0.3 |
weights |
default = NULL |
diamond_x |
default = NULL |
diamond_ci_ll |
default = NULL |
diamond_ci_ul |
default = NULL |
diamond_height |
default = 1.2 |
diamond_gap_height |
default = 0.3 |
diamond_1_tip_at_top_y |
default = -0.5 |
diamond_colors |
default = "black" |
study_labels |
default = NULL |
diamond_labels |
default = NULL |
diamond_label_size |
default = 6 |
diamond_label_hjust |
default = 0 |
diamond_label_fontface |
default = "bold" |
diamond_estimate_label_hjust |
default = 0 |
diamond_estimate_label_size |
default = 6 |
diamond_estimate_label_fontface |
default = "bold" |
round_estimates |
default = 2 |
x_axis_title |
default = "Observed Outcome" |
vline_size |
default = 1 |
vline_intercept |
default = 0 |
vline_type |
default = "dotted" |
study_label_hjust |
default = 0 |
study_label_begin_x |
default = NULL |
study_label_begin_x_perc |
default = 60 |
study_label_size |
default = 6 |
study_label_fontface |
default = "plain" |
estimate_label_begin_x |
default = NULL |
estimate_label_begin_x_perc |
default = 25 |
estimate_label_hjust |
default = 0 |
estimate_label_size |
default = 6 |
estimate_label_fontface |
default = "plain" |
x_axis_tick_marks |
default = NULL |
x_axis_tick_mark_label_size |
default = 6 |
legend_position |
default = "none" |
plot_margin |
default = NULL |
forest_plot( estimates = c(2, 3, 4), estimate_ci_ll = c(1, 2, 3), estimate_ci_ul = c(3, 4, 6), weights = 1:3, diamond_x = 2, diamond_labels = "RE", diamond_ci_ll = 1.8, diamond_ci_ul = 2.2, estimate_label_begin_x_perc = 40, x_axis_tick_marks = seq(-2, 6, 2))
Calculate the geometric mean of a numeric vector
geomean(x = NULL, zero_or_neg_convert_to = NA)
x |
a numeric vector |
zero_or_neg_convert_to |
the value to which zero or negative
values will be converted. If |
## Not run: geomean(c(1, 4)) geomean(c(1, 100)) geomean(c(1, 100, NA)) geomean(c(1, 100, NA, 0, -1, -2)) geomean( x = c(1, 100, NA, 0, -1, -2), zero_or_neg_convert_to = 1) geomean(c(1, 100, NA, 1, 1, 1)) ## End(Not run)
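The geometric mean itself is the exponential of the mean of logs, which can be computed directly in base R. The sketch below illustrates only that core formula and ignores the zero/negative-value handling that geomean adds:

```r
# geometric mean as exp of the mean of logs; NA values dropped
geomean_sketch <- function(x) {
  exp(mean(log(x), na.rm = TRUE))
}
geomean_sketch(c(1, 4))    # 2
geomean_sketch(c(1, 100))  # 10
```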
quickly save the current plot with a timestamp
ggsave_quick( name = NULL, file_name_extension = "png", timestamp = NULL, width = 16, height = 9 )
name |
a character string of the file name.
By default, if no input is given ( |
file_name_extension |
file name extension (default = "png").
If |
timestamp |
if |
width |
width of the plot to be saved. This argument will be
directly entered as the |
height |
height of the plot to be saved. This argument will be
directly entered as the |
the output will be a .png image file in the working directory.
## Not run: kim::histogram(rep(1:30, 3)) ggsave_quick() ## End(Not run)
Create a histogram based on the output of the hist function in the graphics package.
histogram( vector = NULL, breaks = NULL, counts = NULL, percent = FALSE, bin_fill_color = "green4", bin_border_color = "black", bin_border_thickness = 1, notify_na_count = NULL, x_axis_tick_marks = NULL, y_axis_tick_marks = NULL, cap_axis_lines = TRUE, x_axis_title = "Value", y_axis_title = NULL, y_axis_title_vjust = 0.85 )
vector |
a numeric vector |
breaks |
a numeric vector indicating breaks for the bins. By default, no input is required for this argument. |
counts |
a numeric vector containing counts for the bins (i.e., heights of the bins). By default, no input is required for this argument. |
percent |
logical. If |
bin_fill_color |
color of the area inside each bin (default = "green4") |
bin_border_color |
color of the border around each bin (default = "black") |
bin_border_thickness |
thickness of the border around each bin (default = 1) |
notify_na_count |
if |
x_axis_tick_marks |
a vector of values at which to place tick marks
on the x axis (e.g., setting |
y_axis_tick_marks |
a vector of values at which to place tick marks
on the y axis (e.g., setting |
cap_axis_lines |
logical. Should the axis lines be capped at the outer tick marks? (default = TRUE) |
x_axis_title |
title for x axis (default = "Value") |
y_axis_title |
title for y axis (default = "Count" or "Percentage",
depending on the value of |
y_axis_title_vjust |
position of the y axis title (default = 0.85). |
the output will be a histogram, a ggplot object.
histogram(1:100) histogram(c(1:100, NA)) histogram(vector = mtcars[["mpg"]]) histogram(vector = mtcars[["mpg"]], percent = TRUE) histogram(vector = mtcars[["mpg"]], x_axis_tick_marks = c(10, 25, 35), y_axis_title_vjust = 0.5, y_axis_title = "Freq", x_axis_title = "Values of mpg")
Create histograms by group to compare distributions.
histogram_by_group( data = NULL, iv_name = NULL, dv_name = NULL, order_of_groups_top_to_bot = NULL, number_of_bins = 40, space_between_histograms = 0.15, draw_baseline = FALSE, xlab = NULL, ylab = NULL, x_limits = NULL, x_breaks = NULL, x_labels = NULL, sigfigs = 3, convert_dv_to_numeric = TRUE )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
order_of_groups_top_to_bot |
a character vector indicating the desired presentation order of levels in the independent variable (from the top to bottom). Omitting a group in this argument will remove the group in the set of histograms. |
number_of_bins |
number of bins for the histograms (default = 40) |
space_between_histograms |
space between histograms (minimum = 0, maximum = 1, default = 0.15) |
draw_baseline |
logical. Should the baseline and the trailing lines to either side of the histogram be drawn? (default = FALSE) |
xlab |
title of the x-axis for the histogram by group.
If |
ylab |
title of the y-axis for the histogram by group.
If |
x_limits |
a numeric vector with values of the endpoints of the x axis. |
x_breaks |
a numeric vector indicating the points at which to place tick marks on the x axis. |
x_labels |
a vector containing labels for the tick marks placed on the x axis. |
sigfigs |
number of significant digits to round to (default = 3) |
convert_dv_to_numeric |
logical. Should the values in the dependent variable be converted to numeric for plotting the histograms? (default = TRUE) |
The following package(s) must be installed prior to running this function: Package 'ggridges' v0.5.3 (or possibly a higher version) by Claus O. Wilke (2021), https://cran.r-project.org/package=ggridges
the output will be a set of vertically arranged histograms (a ggplot object), i.e., one histogram for each level of the independent variable.
histogram_by_group(data = mtcars, iv_name = "cyl", dv_name = "mpg") histogram_by_group( data = mtcars, iv_name = "cyl", dv_name = "mpg", order_of_groups_top_to_bot = c("8", "4"), number_of_bins = 10, space_between_histograms = 0.5 ) histogram_by_group( data = iris, iv_name = "Species", dv_name = "Sepal.Length", x_breaks = 4:8, x_limits = c(4, 8))
Create a histogram
histogram_deprecated_1( vector = NULL, number_of_bins = 30, x_tick_marks = NULL, y_tick_marks = NULL, fill_color = "cyan4", border_color = "black", y_axis_title_vjust = 0.85, x_axis_title = NULL, y_axis_title = NULL, cap_axis_lines = FALSE, notify_na_count = NULL )
vector |
a numeric vector |
number_of_bins |
number of bins for the histogram (default = 30) |
x_tick_marks |
a vector of values at which to place tick marks
on the x axis (e.g., setting |
y_tick_marks |
a vector of values at which to place tick marks
on the y axis (e.g., setting |
fill_color |
color for inside of the bins (default = "cyan4") |
border_color |
color for borders of the bins (default = "black") |
y_axis_title_vjust |
position of the y axis title (default = 0.85). |
x_axis_title |
title for x axis (default = "Value") |
y_axis_title |
title for y axis (default = "Count") |
cap_axis_lines |
logical. Should the axis lines be capped at the outer tick marks? (default = FALSE) |
notify_na_count |
if |
the output will be a histogram, a ggplot object.
histogram_deprecated_1(1:100) histogram_deprecated_1(c(1:100, NA)) histogram_deprecated_1(vector = mtcars[["mpg"]]) histogram_deprecated_1( vector = mtcars[["mpg"]], x_tick_marks = seq(10, 36, 2)) histogram_deprecated_1( vector = mtcars[["mpg"]], x_tick_marks = seq(10, 36, 2), y_tick_marks = seq(0, 8, 2), y_axis_title_vjust = 0.5, y_axis_title = "Freq", x_axis_title = "Values of mpg")
Create a histogram based on the output of the hist function in the graphics package.
histogram_from_hist( vector = NULL, breaks = NULL, counts = NULL, percent = FALSE, bin_fill_color = "green4", bin_border_color = "black", bin_border_thickness = 1, notify_na_count = NULL, x_axis_tick_marks = NULL, y_axis_tick_marks = NULL, cap_axis_lines = TRUE, x_axis_title = "Value", y_axis_title = NULL, y_axis_title_vjust = 0.85 )
vector |
a numeric vector |
breaks |
a numeric vector indicating breaks for the bins. By default, no input is required for this argument. |
counts |
a numeric vector containing counts for the bins (i.e., heights of the bins). By default, no input is required for this argument. |
percent |
logical. If |
bin_fill_color |
color of the area inside each bin (default = "green4") |
bin_border_color |
color of the border around each bin (default = "black") |
bin_border_thickness |
thickness of the border around each bin (default = 1) |
notify_na_count |
if |
x_axis_tick_marks |
a vector of values at which to place tick marks
on the x axis (e.g., setting |
y_axis_tick_marks |
a vector of values at which to place tick marks
on the y axis (e.g., setting |
cap_axis_lines |
logical. Should the axis lines be capped at the outer tick marks? (default = TRUE) |
x_axis_title |
title for x axis (default = "Value") |
y_axis_title |
title for y axis (default = "Count" or "Percentage",
depending on the value of |
y_axis_title_vjust |
position of the y axis title (default = 0.85). |
the output will be a histogram, a ggplot object.
histogram_from_hist(1:100) histogram_from_hist(c(1:100, NA)) histogram_from_hist(vector = mtcars[["mpg"]]) histogram_from_hist(vector = mtcars[["mpg"]], percent = TRUE) histogram_from_hist(vector = mtcars[["mpg"]], x_axis_tick_marks = c(10, 25, 35), y_axis_title_vjust = 0.5, y_axis_title = "Freq", x_axis_title = "Values of mpg")
Create a histogram with outlier bins
histogram_w_outlier_bins( vector = NULL, bin_cutoffs = NULL, outlier_bin_left = TRUE, outlier_bin_right = TRUE, x_tick_marks = NULL, x_tick_mark_labels = NULL, y_tick_marks = NULL, outlier_bin_fill_color = "coral", non_outlier_bin_fill_color = "cyan4", border_color = "black", y_axis_title_vjust = 0.85, x_axis_title = NULL, y_axis_title = NULL, notify_na_count = NULL, plot_proportion = TRUE, plot_frequency = FALSE, mean = TRUE, ci = TRUE, median = TRUE, median_position = 15, error_bar_size = 3 )
vector |
a numeric vector |
bin_cutoffs |
cutoff points for bins |
outlier_bin_left |
logical. Should the leftmost bin be treated as an outlier bin? (default = TRUE) |
outlier_bin_right |
logical. Should the rightmost bin be treated as an outlier bin? (default = TRUE) |
x_tick_marks |
a vector of values at which to place tick marks on the x axis. Note that the first bar spans from 0.5 to 1.5, second bar from 1.5 to 2.5, ... nth bar from n - 0.5 to n + 0.5. See the example. By default, tick marks will be placed at every cutoff point for bins. |
x_tick_mark_labels |
a character vector to label tick marks. By default, the vector of cutoff points for bins will also be used as labels. |
y_tick_marks |
a vector of values at which to place tick marks
on the y axis (e.g., setting |
outlier_bin_fill_color |
color to fill inside of the outlier bins (default = "coral") |
non_outlier_bin_fill_color |
color to fill inside of the non-outlier bins (default = "cyan4") |
border_color |
color for borders of the bins (default = "black") |
y_axis_title_vjust |
position of the y axis title (default = 0.85). |
x_axis_title |
title for x axis (default = "Value"). If
|
y_axis_title |
title for y axis. By default, it will be either "Proportion" or "Count". |
notify_na_count |
if |
plot_proportion |
logical. Should proportions be plotted, as opposed to frequencies? (default = TRUE) |
plot_frequency |
logical. Should frequencies be plotted,
as opposed to proportions? (default = FALSE).
If |
mean |
logical. Should the mean be marked on the histogram? (default = TRUE) |
ci |
logical. Should the 95% confidence interval be marked on the histogram? (default = TRUE) |
median |
logical. Should the median be marked on the histogram? (default = TRUE) |
median_position |
position of the median label as a percentage of height of the tallest bin (default = 15) |
error_bar_size |
size of the error bars (default = 3) |
a ggplot object
histogram_w_outlier_bins(vector = 1:100, bin_cutoffs = seq(0, 100, 10)) histogram_w_outlier_bins(vector = 0:89, bin_cutoffs = seq(0, 90, 10), x_tick_marks = seq(0.5, 9.5, 3), x_tick_mark_labels = seq(0, 90, 30)) histogram_w_outlier_bins(vector = 1:10, bin_cutoffs = seq(0, 10, 2.5)) histogram_w_outlier_bins(vector = 1:5, bin_cutoffs = seq(0, 10, 2.5)) histogram_w_outlier_bins(vector = 1:15, bin_cutoffs = c(5.52, 10.5))
Adjust a vector of p-values using the method proposed by Holm
holm_adjusted_p(p = NULL)
p |
a numeric vector of p-values |
See the following reference: Holm 1979 https://www.jstor.org/stable/4615733 Manual for the 'p.adjust' function in the 'stats' package https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html
holm_adjusted_p(c(.05, .01)) holm_adjusted_p(c(.05, .05, .05))
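Because the references above point to the p.adjust function in the 'stats' package, the same Holm (1979) step-down adjustment can be reproduced in base R (the assumption that kim::holm_adjusted_p matches p.adjust exactly is the author's, not verified here):

```r
# Holm step-down adjustment via the stats package
stats::p.adjust(c(.05, .01), method = "holm")      # 0.05 0.02
stats::p.adjust(c(.05, .05, .05), method = "holm")  # 0.15 0.15 0.15
```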
Create an ID column in each of the data sets. The ID values will span across the data sets.
id_across_datasets( dt_list = NULL, id_col_name = "id", id_col_position = "first", silent = FALSE )
dt_list |
a list of data.table objects |
id_col_name |
name of the column that will contain ID values.
By default, |
id_col_position |
position of the newly created ID column.
If |
silent |
If |
the output will be a list of data.table objects.
# running the examples below requires importing the data.table package. prep(data.table) id_across_datasets( dt_list = list(setDT(copy(mtcars)), setDT(copy(iris)))) id_across_datasets( dt_list = list(setDT(copy(mtcars)), setDT(copy(iris)), setDT(copy(women))), id_col_name = "newly_created_id_col", id_col_position = "last")
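The core idea — ID values that continue from one data set to the next — can be sketched in base R. The names below are illustrative only and do not reflect kim's internals:

```r
dt_list <- list(head(mtcars, 3), head(iris, 2))
# offset each data set's IDs by the total row count of the preceding sets
offsets <- cumsum(c(0, head(vapply(dt_list, nrow, integer(1)), -1)))
# first data set gets IDs 1:3; second gets IDs 4:5
Map(function(d, o) cbind(id = o + seq_len(nrow(d)), d), dt_list, offsets)
```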
Check whether all inputs are identical
identical_all(...)
... |
two or more R objects. If a vector or list is entered as an input, the function will test whether the vector's or list's elements are identical. |
the output will be TRUE if all inputs are identical, or FALSE if not.
identical_all(1:3, 1:3) # should return TRUE identical_all(1:3, 1:3, 1:3, 1:3, 1:3) # should return TRUE identical_all(1:3, 1:3, 1:3, 1:3, 1:3, 1:4) # should return FALSE identical_all(1:10) # should return FALSE identical_all(rep(1, 100)) # should return TRUE identical_all(list(1, 1, 1)) # should return TRUE identical_all(TRUE, FALSE) # should return FALSE identical_all(FALSE, TRUE) # should return FALSE
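One way to express the underlying check in base R is to compare every input against the first. This is a sketch of the idea, not kim's actual implementation:

```r
identical_all_sketch <- function(...) {
  inputs <- list(...)
  # a single vector or list input is tested element by element
  if (length(inputs) == 1) inputs <- as.list(inputs[[1]])
  all(vapply(inputs[-1], identical, logical(1), inputs[[1]]))
}
identical_all_sketch(1:3, 1:3, 1:3)  # TRUE
identical_all_sketch(1:3, 1:4)       # FALSE
```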
Install all dependencies for all functions in Package 'kim'.
install_all_dependencies()
there will be no output from this function. Rather, dependencies of all functions in Package 'kim' will be installed.
## Not run: install_all_dependencies() ## End(Not run)
Calculate kurtosis of the sample using the formula for either (1) the biased estimator or (2) an unbiased estimator of the population kurtosis. Formulas were taken from DeCarlo (1997), doi:10.1037/1082-989X.2.3.292
kurtosis(vector = NULL, unbiased = TRUE)
vector |
a numeric vector |
unbiased |
logical. If |
a numeric value, i.e., kurtosis of the given vector
# calculate the unbiased estimator (e.g., kurtosis value that # Excel 2016 will produce) kim::kurtosis(c(1, 2, 3, 4, 5, 10)) # calculate the biased estimator (e.g., kurtosis value that # R Package 'moments' will produce) kim::kurtosis(c(1, 2, 3, 4, 5, 10), unbiased = FALSE) # compare with kurtosis from 'moments' package moments::kurtosis(c(1, 2, 3, 4, 5, 10))
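For reference, the biased (moments-style) estimator in DeCarlo (1997) is the fourth central moment divided by the squared second central moment. A base-R sketch, assuming this is the formula the unbiased = FALSE option uses:

```r
# biased sample kurtosis: m4 / m2^2
kurtosis_biased <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}
kurtosis_biased(c(1, 2, 3, 4, 5, 10))  # same value as moments::kurtosis
```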
Extract unique elements and get the length of those elements
lenu(x = NULL)
x |
a vector or a data frame or an array or NULL. |
the length of the unique elements in 'x', i.e., the number of unique elements (for a vector) or unique rows (for a data frame or array).
unique(c(10, 3, 7, 10)) lenu(c(10, 3, 7, 10)) unique(c(10, 3, 7, 10, NA)) lenu(c(10, 3, 7, 10, NA)) lenu(c("b", "z", "b", "a", NA, NA, NA))
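For vectors, lenu is conceptually just the composition of length() and unique(); note that NA counts as one unique element. A minimal sketch (vector case only):

```r
lenu_sketch <- function(x) length(unique(x))
lenu_sketch(c(10, 3, 7, 10))      # 3
lenu_sketch(c(10, 3, 7, 10, NA))  # 4
```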
Conduct Levene's test (i.e., test the null hypothesis that the variances in different groups are equal)
levene_test( data = NULL, dv_name = NULL, iv_1_name = NULL, iv_2_name = NULL, round_f = 2, round_p = 3, output_type = "text" )
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable |
iv_1_name |
name of the first independent variable |
iv_2_name |
name of the second independent variable |
round_f |
number of decimal places to which to round the F-statistic from Levene's test (default = 2) |
round_p |
number of decimal places to which to round the p-value from Levene's test (default = 3) |
output_type |
If |
the output of the function depends on the input for output_type. By default, the output will be the results of Levene's test in a text format (i.e., character).
## Not run: levene_test( data = mtcars, dv_name = "mpg", iv_1_name = "vs", iv_2_name = "am") ## End(Not run)
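For comparison, Levene's test for a two-way design can also be run directly with the 'car' package; the assumption that kim::levene_test mirrors this call is the editor's, not stated in the source:

```r
# requires the 'car' package; tests equality of variances across
# the vs-by-am cells of mtcars
car::leveneTest(mpg ~ factor(vs) * factor(am), data = mtcars)
```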
Calculate log odds ratio (i.e., ln of odds ratio), as illustrated in Borenstein et al. (2009, p. 36, ISBN: 978-0-470-05724-7)
log_odds_ratio( data = NULL, iv_name = NULL, dv_name = NULL, contingency_table = NULL, ci = 0.95, var_include = FALSE, invert = FALSE )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (binary outcome) |
contingency_table |
a contingency table, which can be directly entered as an input for calculating the odds ratio |
ci |
width of the confidence interval. Input can be any value
less than 1 and greater than or equal to 0. By default, |
var_include |
logical. Should the output include variance of the log of odds ratio? (default = FALSE) |
invert |
logical. Should the inverse of the odds ratio (i.e., 1 / odds ratio) be returned instead? (default = FALSE) |
## Not run: log_odds_ratio(data = mtcars, iv_name = "vs", dv_name = "am") log_odds_ratio(contingency_table = matrix(c(5, 10, 95, 90), nrow = 2)) log_odds_ratio(contingency_table = matrix(c(5, 10, 95, 90), nrow = 2), invert = TRUE) log_odds_ratio(contingency_table = matrix(c(34, 39, 16, 11), nrow = 2)) log_odds_ratio(contingency_table = matrix(c(34, 39, 16, 11), nrow = 2), var_include = TRUE) ## End(Not run)
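For a 2 x 2 contingency table with cells a, b, c, d, the Borenstein et al. (2009) formulas are ln(ad / bc) for the log odds ratio and 1/a + 1/b + 1/c + 1/d for its variance. A base-R sketch of those formulas (the cell-to-argument mapping here is generic and may not match how the function reads its contingency_table input):

```r
log_odds_ratio_sketch <- function(a, b, c, d) {
  list(log_or = log((a * d) / (b * c)),      # ln of the odds ratio
       var    = 1 / a + 1 / b + 1 / c + 1 / d)  # variance of the log OR
}
log_odds_ratio_sketch(34, 39, 16, 11)
```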
Convert a log odds ratio to Cohen's d (standardized mean difference), as illustrated in Borenstein et al. (2009, p. 47, ISBN: 978-0-470-05724-7)
log_odds_ratio_to_d(log_odds_ratio = NULL, unname = TRUE)
log_odds_ratio |
log odds ratio (the input can be a vector of values), which will be converted to Cohen's d |
unname |
logical. Should the names from the input be removed? (default = TRUE) |
## Not run: log_odds_ratio_to_d(log(1)) log_odds_ratio_to_d(log(2)) ## End(Not run)
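The Borenstein et al. (2009, p. 47) conversion is d = ln(OR) * sqrt(3) / pi, which can be written as a one-line base-R sketch:

```r
log_odds_ratio_to_d_sketch <- function(log_or) log_or * sqrt(3) / pi
log_odds_ratio_to_d_sketch(log(1))  # 0
log_odds_ratio_to_d_sketch(log(2))  # about 0.38
```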
Conduct logistic regression for a model with an interaction between two predictor variables
logistic_reg_w_interaction( data = NULL, dv_name = NULL, iv_1_name = NULL, iv_2_name = NULL, round_p = 3, round_chi_sq = 2, dv_ordered_levels = NULL, iv_1_ordered_levels = NULL, iv_2_ordered_levels = NULL, one_line_summary_only = FALSE, p_value_interaction_only = FALSE, return_dt_w_binary = FALSE )
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable (must be a binary variable) |
iv_1_name |
name of the first independent variable |
iv_2_name |
name of the second independent variable |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_chi_sq |
number of decimal places to which to round chi square statistics (default = 2) |
dv_ordered_levels |
a vector with the ordered levels of the
dependent variable, the first and second elements of which will be
coded as 0 and 1, respectively, to run logistic regression.
E.g., |
iv_1_ordered_levels |
(only if the first independent variable
is a binary variable) a vector with the ordered levels of the first
independent variable, the first and second elements of which will be
coded as 0 and 1, respectively, to run logistic regression.
E.g., |
iv_2_ordered_levels |
(only if the second independent variable
is a binary variable) a vector with the ordered levels of the second
independent variable, the first and second elements of which will be
coded as 0 and 1, respectively, to run logistic regression.
E.g., |
one_line_summary_only |
logical. Should the output simply be a printout of a one-line summary on the interaction term? (default = FALSE) |
p_value_interaction_only |
logical. Should the output simply be a p-value of the interaction term in the logistic regression model? (default = FALSE) |
return_dt_w_binary |
logical. If |
the output will be a summary of logistic regression results, unless set otherwise by arguments to the function.
logistic_reg_w_interaction(data = mtcars, dv_name = "vs", iv_1_name = "mpg", iv_2_name = "am")
Conduct a logistic regression analysis
logistic_regression( data = NULL, formula = NULL, formula_1 = NULL, formula_2 = NULL, z_values_keep = FALSE, constant_row_clean = TRUE, odds_ratio_cols_combine = TRUE, round_b_and_se = 3, round_z = 3, round_p = 3, round_odds_ratio = 3, round_r_sq = 3, round_model_chi_sq = 3, pretty_round_p_value = TRUE, print_glm_default_summary = FALSE, print_summary_dt_list = TRUE, print_model_comparison = TRUE, output_type = "summary_dt_list" )
data |
a data object (a data frame or a data.table) |
formula |
formula for estimating a single logistic regression model |
formula_1 |
formula for estimating logistic regression model 1 of 2 |
formula_2 |
formula for estimating logistic regression model 2 of 2 |
z_values_keep |
logical. Should the z values be kept in the table? (default = FALSE) |
constant_row_clean |
logical. Should the row for the constant be cleared except for b and standard error of b? (default = TRUE) |
odds_ratio_cols_combine |
logical. Should the odds ratio columns be combined? (default = TRUE) |
round_b_and_se |
number of decimal places to which to round b and standard error of b (default = 3) |
round_z |
number of decimal places to which to round z values (default = 3) |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_odds_ratio |
number of decimal places to which to round odds ratios (default = 3) |
round_r_sq |
number of decimal places to which to round R-squared values (default = 3) |
round_model_chi_sq |
number of decimal places to which to round model chi-squared values (default = 3) |
pretty_round_p_value |
logical. Should the p-values be rounded
in a pretty format (i.e., lower threshold: "<.001")?
By default, |
print_glm_default_summary |
logical. Should the default summary output of the glm objects be printed? (default = FALSE) |
print_summary_dt_list |
logical. Should the summaries of logistic regressions in a data table format be printed? (default = TRUE) |
print_model_comparison |
logical. Should the comparison of two logistic regression models be printed? (default = TRUE) |
output_type |
If |
the output will be a summary of logistic regression results, unless set otherwise by the output_type argument to the function.
logistic_regression(data = mtcars, formula = am ~ mpg) logistic_regression( data = mtcars, formula_1 = am ~ mpg, formula_2 = am ~ mpg + wt)
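Under the hood this kind of model is standard glm() with a binomial family; the coefficients and odds ratios can be obtained in base R as follows (the rounding and table formatting described above are what the kim wrapper presumably adds):

```r
fit <- glm(am ~ mpg, family = binomial(), data = mtcars)
summary(fit)    # b, SE, z, and p for each predictor
exp(coef(fit))  # odds ratios
```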
Construct a table of logistic regression results from the given glm object estimating a logistic regression model.
logistic_regression_table( logistic_reg_glm_object = NULL, z_values_keep = FALSE, constant_row_clean = TRUE, odds_ratio_cols_combine = TRUE, round_b_and_se = 3, round_z = 3, round_p = 3, round_odds_ratio = 3, round_r_sq = 3, round_model_chi_sq = 3, pretty_round_p_value = TRUE )
logistic_reg_glm_object |
a glm object estimating a logistic regression model |
z_values_keep |
logical. Should the z values be kept in the table? (default = FALSE) |
constant_row_clean |
logical. Should the row for the constant be cleared except for b and standard error of b? (default = TRUE) |
odds_ratio_cols_combine |
logical. Should the odds ratio columns be combined? (default = TRUE) |
round_b_and_se |
number of decimal places to which to round b and standard error of b (default = 3) |
round_z |
number of decimal places to which to round z values (default = 3) |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_odds_ratio |
number of decimal places to which to round odds ratios (default = 3) |
round_r_sq |
number of decimal places to which to round R-squared values (default = 3) |
round_model_chi_sq |
number of decimal places to which to round model chi-squared values (default = 3) |
pretty_round_p_value |
logical. Should the p-values be rounded
in a pretty format (i.e., with a lower threshold of "<.001")?
(default = TRUE) |
the output will be a summary of logistic regression results.
logistic_regression_table(
  logistic_reg_glm_object =
    glm(formula = am ~ mpg, family = binomial(), data = mtcars))
logistic_regression_table(
  logistic_reg_glm_object =
    glm(formula = am ~ mpg, family = binomial(), data = mtcars),
  z_values_keep = TRUE,
  constant_row_clean = FALSE,
  odds_ratio_cols_combine = FALSE)
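The odds ratios reported in such a table are the exponentiated logistic regression coefficients. A base-R sketch of the idea (illustrative only; this is not the function's actual code, and the function's confidence intervals may be computed differently):

```r
# sketch: odds ratios and Wald confidence limits as
# exponentiated glm coefficients (illustrative only)
fit <- glm(am ~ mpg, family = binomial(), data = mtcars)
exp(cbind(odds_ratio = coef(fit), confint.default(fit)))
```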
Conduct a loglinear analysis
loglinear_analysis(
  data = NULL,
  dv_name = NULL,
  iv_1_name = NULL,
  iv_2_name = NULL,
  iv_1_values = NULL,
  iv_2_values = NULL,
  output = "all",
  round_p = 3,
  round_chi_sq = 2,
  mosaic_plot = TRUE,
  report_as_field = FALSE
)
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable |
iv_1_name |
name of the first independent variable |
iv_2_name |
name of the second independent variable |
iv_1_values |
restrict all analyses to observations having these values for the first independent variable |
iv_2_values |
restrict all analyses to observations having these values for the second independent variable |
output |
type of the output. If |
round_p |
number of decimal places to which to round p-values (default = 3) |
round_chi_sq |
number of decimal places to which to round chi-squared test statistics (default = 2) |
mosaic_plot |
If mosaic_plot = TRUE (default), a mosaic plot will be printed. |
report_as_field |
If |
loglinear_analysis(data = data.frame(Titanic), "Survived", "Sex", "Age")
Detect outliers in a numeric vector using the Median Absolute Deviation (MAD) method and remove or convert them. For more information on MAD, see Leys et al. (2013) doi:10.1016/j.jesp.2013.03.013
mad_remove_outliers(
  x = NULL,
  threshold = 2.5,
  constant = 1.4826,
  convert_outliers_to = NA,
  output_type = "converted_vector"
)
x |
a numeric vector |
threshold |
the threshold value for determining outliers (default = 2.5).
If a value deviates from the median by more than
threshold * MAD, it will be considered an outlier. |
constant |
scale factor for the 'mad' function in the 'stats'
package. It is the constant linked to the assumed distribution.
In the case of normality, constant = 1.4826,
which is the default. |
convert_outliers_to |
the value to which outliers will be converted (default = NA).
For example, if convert_outliers_to = NA, the outliers
will be converted to NA values. |
output_type |
type of the output (default = "converted_vector").
If output_type = "converted_vector", the function will return a copy
of the input vector with outliers converted to the value given by
convert_outliers_to; as shown in the examples, other options include
"cutoff_values", "outliers", and "non_outlier_values". |
## Not run:
mad_remove_outliers(x = c(1, 3, 3, 6, 8, 10, 10, 1000))
mad_remove_outliers(x = c(1, 3, 3, 6, 8, 10, 10, 1000, -10000))
# return the vector with the outliers converted to NA values
mad_remove_outliers(
  x = c(1, 3, 3, 6, 8, 10, 10, 1000, -10000),
  output_type = "converted_vector")
# return the cutoff values for determining outliers
mad_remove_outliers(
  x = c(1, 3, 3, 6, 8, 10, 10, 1000, -10000),
  output_type = "cutoff_values")
# return the outliers
mad_remove_outliers(
  x = c(1, 3, 3, 6, 8, 10, 10, 1000, -10000),
  output_type = "outliers")
# return the non-outlier values
mad_remove_outliers(
  x = c(1, 3, 3, 6, 8, 10, 10, 1000, -10000),
  output_type = "non_outlier_values")
## End(Not run)
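The MAD-based cutoffs described by Leys et al. (2013) can be sketched in base R as follows. This is a minimal illustration of the method, not the function's actual implementation:

```r
# illustrative sketch of MAD-based outlier detection (Leys et al., 2013);
# not the actual code of mad_remove_outliers
x <- c(1, 3, 3, 6, 8, 10, 10, 1000)
threshold <- 2.5
constant <- 1.4826  # scale factor assuming normality
mad_value <- stats::mad(x, constant = constant)
cutoff_low  <- stats::median(x) - threshold * mad_value
cutoff_high <- stats::median(x) + threshold * mad_value
x[x < cutoff_low | x > cutoff_high]  # the outlier(s)
```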
A nonparametric equivalent of the independent t-test
mann_whitney(
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  iv_level_order = NULL,
  sigfigs = 3
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (measure variable of interest) |
iv_level_order |
order of levels in the independent
variable. By default, it will be set as levels of the
independent variable ordered using R's base function sort. |
sigfigs |
number of significant digits to round to |
the output will be a data.table object with all pairwise Mann-Whitney test results
mann_whitney(data = iris, iv_name = "Species", dv_name = "Sepal.Length")
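For a single pair of groups, the same comparison can be run with base R's wilcox.test, which performs the Mann-Whitney (Wilcoxon rank-sum) test. A minimal example (illustrative; the function above presumably handles all pairwise comparisons and formatting on top of something similar):

```r
# Mann-Whitney (Wilcoxon rank-sum) test for one pair of groups in base R
x <- iris$Sepal.Length[iris$Species == "setosa"]
y <- iris$Sepal.Length[iris$Species == "versicolor"]
stats::wilcox.test(x, y)
```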
Prepare a two-column data.table that will be used to fill values in a matrix
matrix_prep_dt(row_var_names = NULL, col_var_names = NULL)
row_var_names |
a vector of variable names, each of which will be the header of a row in the eventual matrix |
col_var_names |
a vector of variable names, each of which will be the header of a column in the eventual matrix |
matrix_prep_dt( row_var_names = c("mpg", "cyl"), col_var_names = c("hp", "gear") )
Mean-center a variable, i.e., subtract the mean of a numeric vector from each value in the numeric vector
mean_center(x)
x |
a numeric vector; though not thoroughly tested, the function can accept a matrix as an input. |
mean_center(1:5)
mean_center(1:6)
# if the input is a matrix
matrix(1:9, nrow = 3)
mean_center(matrix(1:9, nrow = 3))
Conducts a mediation analysis to estimate an independent variable's indirect effect on a dependent variable through a mediator variable. The current version of the package supports only a simple mediation model consisting of one independent variable, one mediator variable, and one dependent variable.
mediation_analysis(
  data = NULL,
  iv_name = NULL,
  mediator_name = NULL,
  dv_name = NULL,
  covariates_names = NULL,
  robust_se = TRUE,
  iterations = 1000,
  sigfigs = 3,
  output_type = "summary_dt",
  silent = FALSE
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
mediator_name |
name of the mediator variable |
dv_name |
name of the dependent variable |
covariates_names |
names of covariates to control for |
robust_se |
if robust_se = TRUE (default), heteroskedasticity-consistent standard errors will be used. |
iterations |
number of bootstrap samples. The default is set at 1000, but consider increasing the number of samples to 5000, 10000, or an even larger number, if longer computation time is not an issue. |
sigfigs |
number of significant digits to round to |
output_type |
type of the output (default = "summary_dt"). The available options are described in the Value section. |
silent |
if silent = FALSE (default), messages will be printed during the analysis. |
This function requires installing Package 'mediation' v4.5.0 (or possibly a higher version) by Tingley et al. (2019), and uses the source code from a function in the package. https://cran.r-project.org/package=mediation
if output_type = "summary_dt" (the default), the output will be
a data.table showing a summary of the mediation analysis results;
if output_type = "mediate_output", the output will be the output
from the mediate function in the 'mediation' package;
if output_type = "indirect_effect_p", the output will be
the p-value associated with the indirect effect estimated
in the mediation model (a numeric vector of length one).
mediation_analysis(
  data = mtcars,
  iv_name = "cyl",
  mediator_name = "disp",
  dv_name = "mpg",
  iterations = 100
)
mediation_analysis(
  data = iris,
  iv_name = "Sepal.Length",
  mediator_name = "Sepal.Width",
  dv_name = "Petal.Length",
  iterations = 100
)
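Per the Details above, the function builds on the 'mediation' package. A rough sketch of the underlying mediation::mediate workflow, assuming the 'mediation' package is installed (the argument choices here are illustrative and may not match the function's internals):

```r
# rough sketch of the underlying mediation::mediate workflow
# (assumes the 'mediation' package is installed; illustrative only)
library(mediation)
m_model <- stats::lm(disp ~ cyl, data = mtcars)       # mediator model
y_model <- stats::lm(mpg ~ cyl + disp, data = mtcars) # outcome model
med <- mediate(
  m_model, y_model, treat = "cyl", mediator = "disp", sims = 100)
summary(med)
```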
Successively merge a list of data.table objects in a recursive fashion. That is, merge the second data table in the list around the first data table in the list; then, around this resulting data table, merge the third data table in the list; and so on.
merge_data_table_list(dt_list = NULL, id = NULL, silent = TRUE)
dt_list |
a list of data.table objects |
id |
name(s) of the column(s) that will contain the ID values in the data tables. The name(s) of the ID column(s) must be identical across the data tables. |
silent |
If silent = TRUE (default), no message will be printed about duplicated ID values or shared column names during merging. |
If there are any duplicated ID values and column names across the data tables, the cell values in the earlier data table will remain intact and the cell values in the later data table will be discarded for the resulting merged data table in each recursion.
a data.table object, which successively merges (joins) a data table around (i.e., outside) the previous data table in the list of data tables.
data_1 <- data.table::data.table(
  id_col = c(4, 2, 1, 3),
  a = 3:6,
  b = 5:8,
  c = c("w", "x", "y", "z"))
data_2 <- data.table::data.table(
  id_col = c(1, 4, 99),
  d = 6:8,
  b = c("p", "q", "r"),
  e = c(TRUE, FALSE, FALSE))
data_3 <- data.table::data.table(
  id_col = c(200, 3),
  f = 11:12,
  b = c(300, "abc"))
merge_data_table_list(
  dt_list = list(data_1, data_2, data_3), id = "id_col")
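The successive merging described above is conceptually a left fold over the list with the pairwise merge. A sketch, assuming merge_data_tables behaves as documented (this is not the function's actual implementation):

```r
# conceptual sketch: successive merging as a fold over the list
# (assumes merge_data_tables as documented; not the actual implementation)
merge_list_sketch <- function(dt_list, id) {
  Reduce(function(acc, dt) merge_data_tables(acc, dt, id = id), dt_list)
}
```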
Merge two data.table objects. If there are any duplicated ID values and column names across the two data tables, the cell values in the first data.table will remain intact and the cell values in the second data.table will be discarded for the resulting merged data table.
merge_data_tables(dt1 = NULL, dt2 = NULL, id = NULL, silent = TRUE)
dt1 |
the first data.table which will remain intact |
dt2 |
the second data.table which will be joined outside of (around) the first data.table. If there are any duplicated ID values and column names across the two data tables, the cell values in the first data.table will remain intact and the cell values in the second data.table will be discarded for the resulting merged data table. |
id |
name(s) of the column(s) that will contain the ID values in the two data tables. The name(s) of the ID column(s) must be identical in the two data tables. |
silent |
If silent = TRUE (default), no message will be printed about duplicated ID values or shared column names during merging. |
a data.table object, which merges (joins) the second data.table around the first data.table.
## Example 1: Typical Usage
data_1 <- data.table::data.table(
  id_col = c(4, 2, 1, 3),
  a = 3:6,
  b = 5:8,
  c = c("w", "x", "y", "z"))
data_2 <- data.table::data.table(
  id_col = c(1, 99, 4),
  e = 6:8,
  b = c("p", "q", "r"),
  d = c(TRUE, FALSE, FALSE))
# check the two example data tables
data_1
data_2
# check the result of merging the two data tables above and
# note how data_1 (the upper left portion) is intact in the resulting
# data table
merge_data_tables(dt1 = data_1, dt2 = data_2, id = "id_col")
# compare the result above with the result from the `merge` function
merge(data_1, data_2, by = "id_col", all = TRUE)
## Example 2: Some values can be converted
data_3 <- data.table::data.table(
  id_col = 99,
  a = "abc",
  b = TRUE,
  c = TRUE)
data_1
data_3
merge_data_tables(data_1, data_3, id = "id_col")
# In the example above, note how the value of TRUE gets
# converted to 1 in the last row of Column 'b' in the resulting data table
## Example 3: A simpler case
data_4 <- data.table::data.table(
  id_col = c(5, 3),
  a = c("a", NA))
data_5 <- data.table::data.table(
  id_col = 1,
  a = 2)
# check the two example data tables
data_4
data_5
merge_data_tables(data_4, data_5, id = "id_col")
## Example 4: Merging data tables using multiple ID columns
data_6 <- data.table::data.table(
  id_col_1 = 3:1,
  id_col_2 = c("a", "b", "c"),
  id_col_3 = 4:6,
  a = 7:9,
  b = 10:12)
data_7 <- data.table::data.table(
  id_col_1 = c(3, 2),
  id_col_3 = c(3, 5),
  id_col_2 = c("a", "b"),
  c = 13:14,
  a = 15:16)
# check the example data sets
data_6
data_7
# merge data sets using the three id columns
suppressWarnings(merge_data_tables(
  dt1 = data_6, dt2 = data_7,
  id = c("id_col_1", "id_col_2", "id_col_3")))
Conduct a two-way mixed analysis of variance (ANOVA).
mixed_anova_2_way(
  data = NULL,
  iv_name_bw_group = NULL,
  repeated_measures_col_names = NULL,
  iv_name_bw_group_values = NULL,
  colors = NULL,
  error_bar = "ci",
  position_dodge = 0.13,
  legend_title = NULL,
  x_axis_expansion_add = c(0.2, 0.03),
  x_axis_title = NULL,
  y_axis_title = "Mean",
  output = "all"
)
data |
a data object (a data frame or a data.table) |
iv_name_bw_group |
name of the between-group independent variable |
repeated_measures_col_names |
names of the columns containing the repeated measures |
iv_name_bw_group_values |
restrict all analyses to observations having these values for the between-group independent variable |
colors |
colors of the dots and lines connecting means
(default = NULL) If there are exactly two repeated measures,
then, by default, |
error_bar |
if error_bar = "ci" (default), error bars will show 95% confidence intervals of the means; if error_bar = "se", error bars will show +/-1 standard error of the means. |
position_dodge |
by how much should the group means and error bars be horizontally offset from each other so as not to overlap? (default = 0.13) |
legend_title |
a character for the legend title. If no input is entered, then, by default, the legend title will be removed. |
x_axis_expansion_add |
inputs for the |
x_axis_title |
a character string for the x-axis title.
If |
y_axis_title |
a character string for the y-axis title
(default = "Mean"). If |
output |
output type can be one of the following:
|
The following package(s) must be installed prior to running this function: Package 'car' v3.0-9 (or possibly a higher version) by Fox et al. (2020), https://cran.r-project.org/package=car
mixed_anova_2_way(
  data = iris,
  iv_name_bw_group = "Species",
  repeated_measures_col_names = c("Sepal.Length", "Petal.Length"))
g1 <- mixed_anova_2_way(
  data = iris,
  iv_name_bw_group = "Species",
  repeated_measures_col_names = c("Sepal.Length", "Petal.Length"),
  error_bar = "se",
  output = "plot")
Find modes of objects
modes_of_objects(...)
... |
R objects. |
the output will be a data.table listing the objects and their modes.
modes_of_objects(
  TRUE, FALSE, 1L, 1:3, 1.1, c(1.2, 1.3),
  "abc", 1 + 2i, intToBits(1L))
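The idea behind the function, reporting the mode of each object, can be sketched with base R's mode function (an illustration, not the function's actual code):

```r
# sketch: report the mode of each R object (illustrative only)
objs <- list(TRUE, 1L, 1:3, 1.1, "abc", 1 + 2i)
vapply(objs, mode, character(1))
# "logical" "numeric" "numeric" "numeric" "character" "complex"
```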
Conduct multiple regression analysis and summarize the results in a data.table.
multiple_regression(
  data = NULL,
  formula = NULL,
  vars_to_mean_center = NULL,
  mean_center_vars = NULL,
  sigfigs = NULL,
  round_digits_after_decimal = NULL,
  round_p = NULL,
  pretty_round_p_value = TRUE,
  return_table_upper_half = FALSE,
  round_r_squared = 3,
  round_f_stat = 2,
  prettify_reg_table_col_names = TRUE,
  silent = FALSE,
  save_as_png = FALSE,
  png_name = NULL,
  width = 1600,
  height = 1200,
  units = "px",
  res = 200
)
data |
a data object (a data frame or a data.table) |
formula |
a formula object for the regression equation |
vars_to_mean_center |
(deprecated) a character vector specifying names of variables that will be mean-centered before the regression model is estimated |
mean_center_vars |
a character vector specifying names of variables that will be mean-centered before the regression model is estimated |
sigfigs |
number of significant digits to round to |
round_digits_after_decimal |
round to the nth digit after the decimal point
(an alternative to sigfigs) |
round_p |
number of decimal places to round p values (overrides all other rounding arguments) |
pretty_round_p_value |
logical. Should the p-values be rounded
in a pretty format (i.e., with a lower threshold of "<.001")?
(default = TRUE) |
return_table_upper_half |
logical. Should only the upper part
of the table be returned? (default = FALSE) |
round_r_squared |
number of digits after the decimal both r-squared and adjusted r-squared values should be rounded to (default 3) |
round_f_stat |
number of digits after the decimal the f statistic of the regression model should be rounded to (default 2) |
prettify_reg_table_col_names |
logical. Should the column names
of the regression table be made pretty (e.g., change "std_beta" to
"Std. Beta")? (default = TRUE) |
silent |
If silent = FALSE (default), messages will be printed during the analysis. |
save_as_png |
if save_as_png = TRUE, the regression table will be saved as a PNG file (default = FALSE). |
png_name |
name of the PNG file to be saved. By default, the name will be "mult_reg_" followed by a timestamp of the current time. The timestamp will be in the format, jan_01_2021_1300_10_000001, where "jan_01_2021" would indicate January 01, 2021; 1300 would indicate 13:00 (i.e., 1 PM); and 10_000001 would indicate 10.000001 seconds after the hour. |
width |
width of the PNG file (default = 1600) |
height |
height of the PNG file (default = 1200) |
units |
the units for the width and height arguments (default = "px") |
res |
The nominal resolution in ppi which will be recorded
in the png file, if a positive integer. Used for units
other than the default. By default, res = 200. |
To include standardized beta(s) in the regression results table, the following package(s) must be installed prior to running the function: Package 'lm.beta' v1.5-1 (or possibly a higher version) by Stefan Behrendt (2014), https://cran.r-project.org/package=lm.beta
the output will be a data.table showing multiple regression results.
multiple_regression(data = mtcars, formula = mpg ~ gear * cyl)
multiple_regression(
  data = mtcars,
  formula = mpg ~ gear * cyl,
  mean_center_vars = "gear",
  round_digits_after_decimal = 2)
multiple_regression(
  data = mtcars,
  formula = mpg ~ gear * cyl,
  png_name = "mtcars reg table 1")
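The effect of mean_center_vars can be illustrated by centering a predictor manually before fitting the model with base R's lm. This is a conceptual sketch of what mean-centering a predictor does, not the function's internal code:

```r
# conceptual sketch: mean-center a predictor, then fit the model
# (illustrates what mean_center_vars = "gear" does before estimation)
d <- mtcars
d$gear <- d$gear - mean(d$gear, na.rm = TRUE)
fit <- stats::lm(mpg ~ gear * cyl, data = d)
summary(fit)$coefficients
```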
Find noncentrality parameter
noncentrality_parameter(t_stat, df, initial_value = 0, ci = 0.95)
t_stat |
the t-statistic associated with the noncentrality parameters |
df |
degrees of freedom associated with the noncentrality parameters |
initial_value |
initial value of the noncentrality parameter for optimization (default = 0). Adjust this value if results look strange. |
ci |
width of the confidence interval associated with the noncentrality parameters (default = 0.95) |
noncentrality_parameter(4.29, 9)
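Confidence limits for a noncentrality parameter are commonly found by solving pt(t, df, ncp) = (1 + ci)/2 and (1 - ci)/2 for ncp numerically. A hedged sketch of that common approach (which may differ from this function's optimization method); the search interval c(-50, 50) is an arbitrary illustration:

```r
# sketch: ncp confidence limits via root-finding on the noncentral t CDF
# (a common approach; not necessarily this function's exact method)
ncp_limits_sketch <- function(t_stat, df, ci = 0.95) {
  lower <- stats::uniroot(
    function(ncp) stats::pt(t_stat, df, ncp) - (1 + ci) / 2,
    interval = c(-50, 50))$root
  upper <- stats::uniroot(
    function(ncp) stats::pt(t_stat, df, ncp) - (1 - ci) / 2,
    interval = c(-50, 50))$root
  c(lower = lower, upper = upper)
}
ncp_limits_sketch(4.29, 9)
```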
Calculate odds ratio, as illustrated in Borenstein et al. (2009, pp. 33-36, ISBN: 978-0-470-05724-7)
odds_ratio(
  data = NULL,
  iv_name = NULL,
  dv_name = NULL,
  contingency_table = NULL,
  ci = 0.95,
  round_ci_limits = 2,
  invert = FALSE
)
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (binary outcome) |
contingency_table |
a contingency table, which can be directly entered as an input for calculating the odds ratio |
ci |
width of the confidence interval. Input can be any value
less than 1 and greater than or equal to 0. By default, ci = 0.95. |
round_ci_limits |
number of decimal places to which to round the limits of the confidence interval (default = 2) |
invert |
logical. Whether the inverse of the odds ratio (i.e., 1 / odds ratio) should be returned. |
## Not run:
odds_ratio(data = mtcars, iv_name = "vs", dv_name = "am")
odds_ratio(data = mtcars, iv_name = "vs", dv_name = "am", ci = 0.9)
odds_ratio(contingency_table = matrix(c(5, 10, 95, 90), nrow = 2))
odds_ratio(
  contingency_table = matrix(c(5, 10, 95, 90), nrow = 2),
  invert = TRUE)
odds_ratio(contingency_table = matrix(c(34, 39, 16, 11), nrow = 2))
## End(Not run)
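For a 2 x 2 contingency table, the odds ratio and its confidence interval can be computed as in Borenstein et al. (2009). This sketch is illustrative only and is not the function's actual code (cell orientation here is an assumption):

```r
# sketch: odds ratio and 95% CI from a 2 x 2 table
# (Borenstein et al., 2009; illustrative cell orientation)
m <- matrix(c(5, 10, 95, 90), nrow = 2)
a <- m[1, 1]; b <- m[2, 1]; c_ <- m[1, 2]; d <- m[2, 2]
or <- (a * d) / (b * c_)
log_or_se <- sqrt(1 / a + 1 / b + 1 / c_ + 1 / d)
ci_95 <- exp(log(or) + c(-1, 1) * stats::qnorm(0.975) * log_or_se)
c(odds_ratio = or, ci_95)
```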
Order rows in a data.table in a specific order
order_rows_specifically_in_dt( dt = NULL, col_to_order_by = NULL, specific_order = NULL )
dt |
a data.table object |
col_to_order_by |
a character value indicating the name of the column by which to order the data.table |
specific_order |
a vector indicating a specific order of the values in the column by which to order the data.table. |
the output will be a data.table object whose rows will be ordered as specified.
order_rows_specifically_in_dt(mtcars, "carb", c(3, 2, 1, 4, 8, 6))
Return outliers in a vector
outlier(x = NULL, iqr = 1.5, na.rm = TRUE, type = 7, unique_outliers = FALSE)
x |
a numeric vector |
iqr |
a nonnegative constant by which interquartile range (IQR)
will be multiplied to build a "fence," outside which observations
will be considered outliers. For example, if iqr = 1.5 (default),
values below the first quartile minus 1.5 * IQR, or above the
third quartile plus 1.5 * IQR, will be considered outliers. |
na.rm |
logical. Should NA values be removed before computing quartiles? (default = TRUE) |
type |
the type argument to be passed onto the quantile function in the 'stats' package (default = 7) |
unique_outliers |
logical. If unique_outliers = TRUE, only the unique outlier values will be returned (default = FALSE). |
the output will be a numeric vector of the outlier values.
# Example 1
outlier(c(1:10, 100))
# The steps below show how the outlier, 100, was obtained
# v1 is the vector of interest
v1 <- c(1:10, 100)
# quantile
stats::quantile(v1)
# first and third quartiles
q1 <- stats::quantile(v1, 0.25)
q3 <- stats::quantile(v1, 0.75)
# interquartile range
interquartile_range <- unname(q3 - q1)
# fence, using the default 1.5 as the factor to multiply the IQR
cutoff_low <- unname(q1 - 1.5 * interquartile_range)
cutoff_high <- unname(q3 + 1.5 * interquartile_range)
v1[v1 < cutoff_low | v1 > cutoff_high]
This function should be applied to cases where the two ranges are inclusive of both endpoints. For example, the function can work for a pair of ranges like [0, 1] and [3, 4] but not for pairs like [0, 1) and (3, 5).
overlapping_interval(
  interval_1_begin = NULL,
  interval_1_end = NULL,
  interval_2_begin = NULL,
  interval_2_end = NULL
)
interval_1_begin |
a number at which the first interval begins (the left INCLUSIVE endpoint of interval 1) |
interval_1_end |
a number at which the first interval ends (the right INCLUSIVE endpoint of interval 1) |
interval_2_begin |
a number at which the second interval begins (the left INCLUSIVE endpoint of interval 2) |
interval_2_end |
a number at which the second interval ends (the right INCLUSIVE endpoint of interval 2) |
the output will be NULL if there is no overlapping region,
or a vector of the endpoints of the overlapping interval if one exists.
overlapping_interval(1, 3, 2, 4) overlapping_interval(1, 2.22, 2.22, 3)
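The underlying logic is simple: two closed intervals overlap if and only if the larger of the two left endpoints is no greater than the smaller of the two right endpoints. A minimal sketch (illustrative; not the function's actual implementation):

```r
# sketch of the overlap logic for closed intervals (illustrative only)
overlap_sketch <- function(a1, a2, b1, b2) {
  lo <- max(a1, b1)
  hi <- min(a2, b2)
  if (lo > hi) NULL else c(lo, hi)
}
overlap_sketch(1, 3, 2, 4)        # 2 3
overlap_sketch(1, 2.22, 2.22, 3)  # 2.22 2.22
```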
A shorthand for the function paste0
Concatenate vectors after converting to character.
p0(..., collapse = NULL, recycle0 = FALSE)
... |
one or more R objects, to be converted to character vectors.
This is the same argument that would be used in the paste0 function. |
collapse |
an optional character string to separate the results.
Not NA_character_.
This is the same argument that would be used in the paste0 function. |
recycle0 |
logical indicating if zero-length character
arguments should lead to the zero-length character(0)
after the sep-phase (which turns into "" in the
collapse-phase, i.e., when collapse is not NULL).
This is the same argument that would be used in the paste0 function. |
paste0("a", "b") p0("a", "b")
List the default packages in R
package_list_default(package_type = c("base", "recommended"))
package_type |
a vector of package types. By default,
package_type = c("base", "recommended"). |
package_list_default() package_list_default(package_type = "base")
Conducts a parallel analysis to determine how many factors to retain in a factor analysis.
parallel_analysis(
  data = NULL,
  names_of_vars = NULL,
  iterations = NULL,
  percentile_for_eigenvalue = 95,
  line_types = c("dashed", "solid"),
  colors = c("red", "blue"),
  eigenvalue_random_label_x_pos = NULL,
  eigenvalue_random_label_y_pos = NULL,
  unadj_eigenvalue_label_x_pos = NULL,
  unadj_eigenvalue_label_y_pos = NULL,
  label_offset_percent = 2,
  label_size = 6,
  dot_size = 5,
  line_thickness = 1.5,
  y_axis_title_vjust = 0.8,
  title_text_size = 26,
  axis_text_size = 22
)
data |
a data object (a data frame or a data.table) |
names_of_vars |
names of the variables |
iterations |
number of random data sets. If no input is entered, this value will be set as 30 * number of variables. |
percentile_for_eigenvalue |
percentile used in estimating bias (default = 95). |
line_types |
types of the lines connecting eigenvalues.
By default, line_types = c("dashed", "solid"). |
colors |
colors of the eigenvalue curves (default = c("red", "blue")). |
eigenvalue_random_label_x_pos |
(optional) x coordinate of the label for eigenvalues from randomly generated data. |
eigenvalue_random_label_y_pos |
(optional) y coordinate of the label for eigenvalues from randomly generated data. |
unadj_eigenvalue_label_x_pos |
(optional) x coordinate of the label for unadjusted eigenvalues |
unadj_eigenvalue_label_y_pos |
(optional) y coordinate of the label for unadjusted eigenvalues |
label_offset_percent |
How much should labels for the eigenvalue curves be offset, as a percentage of the plot's x and y range? (default = 2) |
label_size |
size of the labels for the eigenvalue curves (default = 6). |
dot_size |
size of the dots denoting eigenvalues (default = 5). |
line_thickness |
thickness of the eigenvalue curves (default = 1.5). |
y_axis_title_vjust |
position of the y axis title as a proportion of the range (default = 0.8). |
title_text_size |
size of the plot title (default = 26). |
axis_text_size |
size of the text on the axes (default = 22). |
The following package(s) must be installed prior to running the function: Package 'paran' v1.5.2 (or possibly a higher version) by Alexis Dinno (2018), https://cran.r-project.org/package=paran
parallel_analysis(
  data = mtcars, names_of_vars = c("disp", "hp", "drat"))
# parallel_analysis(
#   data = mtcars, names_of_vars = c("carb", "vs", "gear", "am"))
Calculate percentile rank of each value in a vector
percentile_rank(vector)
vector |
a numeric vector |
percentile_rank(1:5) percentile_rank(1:10) percentile_rank(1:100)
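One common definition of percentile rank is the percentage of values below a given value, plus half the percentage of values equal to it. The sketch below illustrates that definition; the exact definition used by percentile_rank may differ:

```r
# one common definition of percentile rank (illustrative sketch;
# details may differ from percentile_rank): percent of values below,
# plus half the percent of values equal
pr <- function(x) {
  vapply(x, function(v) {
    (sum(x < v) + 0.5 * sum(x == v)) / length(x) * 100
  }, numeric(1))
}
pr(1:5)  # 10 30 50 70 90
```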
Create a pivot table.
pivot_table(
  data = NULL,
  row_names = NULL,
  col_names = NULL,
  function_as_character = NULL,
  sigfigs = 3,
  output = "dt",
  remove_col_names = TRUE
)
data |
a data object (a data frame or a data.table) |
row_names |
names of variables for constructing rows |
col_names |
names of variables for constructing columns |
function_as_character |
function to perform for each cell in the pivot table |
sigfigs |
number of significant digits to which to round values in the pivot table (default = 3) |
output |
type of output. If |
remove_col_names |
logical. Should the column names (i.e., v1, v2, ...) be removed in the data table output? |
the output will be a contingency table in a data.table format
pivot_table(
  data = mtcars, col_names = "am", row_names = c("cyl", "vs"),
  function_as_character = "mean(mpg)")
pivot_table(
  data = mtcars, col_names = "am", row_names = c("cyl", "vs"),
  function_as_character = "sum(mpg < 17)")
pivot_table(
  data = mtcars, col_names = "am", row_names = c("cyl", "vs"),
  function_as_character = "round(sum(mpg < 17) / sum(!is.na(mpg)) * 100, 0)")
Creates a plot of sample means and error bars by group.
plot_group_means(
  data = NULL,
  dv_name = NULL,
  iv_name = NULL,
  na.rm = TRUE,
  error_bar = "ci",
  error_bar_range = 0.95,
  error_bar_tip_width = 0.13,
  error_bar_thickness = 1,
  error_bar_caption = TRUE,
  lines_connecting_means = TRUE,
  line_colors = NULL,
  line_types = NULL,
  line_thickness = 1,
  line_size = NULL,
  dot_size = 3,
  position_dodge = 0.13,
  x_axis_title = NULL,
  y_axis_title = NULL,
  y_axis_title_vjust = 0.85,
  legend_title = NULL,
  legend_position = "right"
)
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable |
iv_name |
name(s) of the independent variable(s). Up to two independent variables can be supplied. |
na.rm |
logical. If |
error_bar |
if |
error_bar_range |
width of the confidence or prediction interval
(default = 0.95 for 95 percent confidence or prediction interval).
This argument will not apply when |
error_bar_tip_width |
graphically, width of the segments at the end of error bars (default = 0.13) |
error_bar_thickness |
thickness of the error bars (default = 1) |
error_bar_caption |
should a caption be included to indicate the width of the error bars? (default = TRUE). |
lines_connecting_means |
logical. Should lines connecting means within each group be drawn? (default = TRUE) |
line_colors |
colors of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_types |
types of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_thickness |
thickness of the lines connecting group means (default = 1) |
line_size |
Deprecated. Use the 'linewidth' argument instead. (default = 1) |
dot_size |
size of the dots indicating group means (default = 3) |
position_dodge |
by how much should the group means and error bars be horizontally offset from each other so as not to overlap? (default = 0.13) |
x_axis_title |
a character string for the x-axis title. If no
input is entered, then, by default, the first value of
|
y_axis_title |
a character string for the y-axis title. If no
input is entered, then, by default, |
y_axis_title_vjust |
position of the y axis title (default = 0.85).
By default, |
legend_title |
a character for the legend title. If no input
is entered, then, by default, the second value of |
legend_position |
position of the legend:
|
by default, the output will be a ggplot object.
If output = "table", the output will be a data.table object.
plot_group_means(data = mtcars, dv_name = "mpg", iv_name = c("vs", "am"))
plot_group_means(
  data = mtcars, dv_name = "mpg", iv_name = c("vs", "am"),
  error_bar = "se")
plot_group_means(
  data = mtcars, dv_name = "mpg", iv_name = c("vs", "am"),
  error_bar = "pi", error_bar_range = 0.99)
# set line colors and types manually
plot_group_means(
  data = mtcars, dv_name = "mpg", iv_name = c("vs", "am"),
  line_colors = c("green4", "purple"),
  line_types = c("solid", "solid"))
# remove axis titles
plot_group_means(
  data = mtcars, dv_name = "mpg", iv_name = c("vs", "am"),
  x_axis_title = FALSE, y_axis_title = FALSE, legend_title = FALSE)
Combines the base R functions paste0 and message
pm(..., collapse = NULL)
... |
one or more R objects, to be converted to character vectors. Input(s) to this argument will be passed onto the paste0 function. |
collapse |
an optional character string to separate the results.
Not |
there will be no output from this function. Rather, a message will be generated from the arguments.
pm("hello", 123)
pm(c("hello", 123), collapse = ", ")
Calculates the population variance, rather than the sample variance, of a vector
population_variance(vector, na.rm = TRUE)
vector |
a numeric vector |
na.rm |
if |
population_variance(1:4)
var(1:4)
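The contrast in the example above can be sketched in base R: the population variance divides the sum of squared deviations by n, whereas var() divides by n - 1.

```r
# Population variance: divide by n rather than n - 1.
pop_var_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sum((x - mean(x))^2) / length(x)
}
pop_var_sketch(1:4)      # 1.25
var(1:4)                 # 1.666667 (sample variance)
var(1:4) * (4 - 1) / 4   # 1.25, i.e., var(x) * (n - 1) / n
```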
Installs, loads, and attaches package(s). If package(s) are not installed, installs them prior to loading and attaching.
prep(
  ...,
  pkg_names_as_object = FALSE,
  silent_if_successful = FALSE,
  silent_load_pkgs = NULL
)
... |
names of packages to load and attach, separated by commas,
e.g., |
pkg_names_as_object |
logical. If |
silent_if_successful |
logical. If |
silent_load_pkgs |
a character vector indicating names of
packages to load silently (i.e., suppress messages that get printed
when loading the packages). By default, |
there will be no output from this function. Rather, packages given as inputs to the function will be installed, loaded, and attached.
prep(data.table)
prep("data.table", silent_if_successful = TRUE)
prep("base", utils, ggplot2, "data.table")
pkgs <- c("ggplot2", "data.table")
prep(pkgs, pkg_names_as_object = TRUE)
prep("data.table", silent_load_pkgs = "data.table")
Round p-values to the desired number of decimals and remove leading 0s before the decimal.
pretty_round_p_value(
  p_value_vector = NULL,
  round_digits_after_decimal = 3,
  include_p_equals = FALSE
)
p_value_vector |
one number or a numeric vector |
round_digits_after_decimal |
how many digits after the decimal point should the p-value be rounded to? |
include_p_equals |
if |
the output will be a character vector with p values, e.g., a vector of strings like "< .001" (or "p < .001").
pretty_round_p_value(0.00001)
pretty_round_p_value(0.00001, round_digits_after_decimal = 4)
pretty_round_p_value(0.00001, round_digits_after_decimal = 5)
# WARNING: the line of code below adds precision that may be unwarranted
pretty_round_p_value(0.00001, round_digits_after_decimal = 6)
pretty_round_p_value(
  p_value_vector = 0.049,
  round_digits_after_decimal = 2,
  include_p_equals = FALSE)
pretty_round_p_value(c(0.0015, 0.0014, 0.0009), include_p_equals = TRUE)
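The leading-zero removal follows APA style. A base-R sketch of the core formatting step is shown below; it omits the "< .001" handling for values that round to zero, which this function also performs.

```r
# Hypothetical sketch: round a p value and drop the leading zero,
# as in APA style. (Does not handle the "< .001" case.)
strip_leading_zero <- function(p, digits = 3) {
  sub("^0", "", formatC(p, format = "f", digits = digits))
}
strip_leading_zero(0.049, digits = 2)  # ".05"
strip_leading_zero(0.0312)             # ".031"
```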
Round correlation coefficients in APA style (7th Ed.)
pretty_round_r(r = NULL, round_digits_after_decimal = 2)
r |
a (vector of) correlation coefficient(s) |
round_digits_after_decimal |
how many digits after the decimal point should the correlation coefficient(s) be rounded to? (default = 2) |
the output will be a character vector of correlation coefficient(s).
pretty_round_r(r = -0.123)
pretty_round_r(c(-0.12345, 0.45678), round_digits_after_decimal = 3)
pretty_round_r(c(-0.12, 0.45), round_digits_after_decimal = 4)
Print current progress inside a loop (e.g., for loop or lapply)
print_loop_progress(
  iteration_number = NULL,
  iteration_start = 1,
  iteration_end = NULL,
  text_before = "",
  percent = 1,
  output_method = "cat"
)
iteration_number |
current number of iteration |
iteration_start |
iteration number at which the loop begins (default = 1) |
iteration_end |
iteration number at which the loop ends. |
text_before |
text to add before "Loop Progress..."
By default, it is set to be blank, i.e., |
percent |
if |
output_method |
if |
for (i in seq_len(250)) {
  Sys.sleep(0.001)
  print_loop_progress(
    iteration_number = i, iteration_end = 250)
}
unlist(lapply(seq_len(7), function(i) {
  Sys.sleep(0.1)
  print_loop_progress(
    iteration_number = i, iteration_end = 7)
  return(i)
}))
Proportion of given values in a vector
proportion_of_values_in_vector(
  values = NULL,
  vector = NULL,
  na.exclude = TRUE,
  output_type = "proportion",
  silent = FALSE,
  conf.level = 0.95,
  correct_yates = TRUE
)
values |
a set of values that will count as successes (hits) |
vector |
a numeric or character vector containing successes (hits) and failures (misses) |
na.exclude |
if |
output_type |
By default, |
silent |
If |
conf.level |
confidence level of the returned confidence interval.
Input to this argument will be passed onto the conf.level argument
in the |
correct_yates |
a logical indicating whether Yates' continuity
correction should be applied where possible (default = TRUE).
Input to this argument will be passed onto the |
proportion_of_values_in_vector(
  values = 2:3,
  vector = c(rep(1:3, each = 10), rep(NA, 10)))
proportion_of_values_in_vector(
  values = 2:3,
  vector = c(rep(1:3, each = 10), rep(NA, 10)),
  output_type = "se")
proportion_of_values_in_vector(
  values = 2:3,
  vector = c(rep(1:3, each = 10), rep(NA, 10)),
  conf.level = 0.99)
proportion_of_values_in_vector(
  values = c(2:3, NA),
  vector = c(rep(1:3, each = 10), rep(NA, 10)),
  na.exclude = FALSE)
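The core proportion in the first example above can be reproduced in base R; the sketch below assumes NAs are excluded, as with na.exclude = TRUE. (The SE and confidence-interval outputs would come from stats::prop.test, to which conf.level and the Yates correction setting are passed.)

```r
# Proportion of successes (values 2 or 3) among non-NA elements.
x <- c(rep(1:3, each = 10), rep(NA, 10))
hits <- x[!is.na(x)] %in% 2:3
mean(hits)  # 0.6666667 (20 hits out of 30)
# A confidence interval could be obtained via prop.test():
prop.test(sum(hits), length(hits))$conf.int
```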
Calculate the Q statistic to test for homogeneity of correlation coefficients. See p. 235 of the book Hedges & Olkin (1985), Statistical Methods for Meta-Analysis (ISBN: 0123363802).
q_stat_test_homo_r(z = NULL, n = NULL)
z |
a vector of z values |
n |
a vector of sample sizes which will be used to calculate the weights, which in turn will be used to calculate the weighted z. |
the output will be a weighted z value.
q_stat_test_homo_r(1:3, c(100, 200, 300))
q_stat_test_homo_r(z = c(1:3, NA), n = c(100, 200, 300, NA))
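The Q statistic from Hedges & Olkin (1985) can be sketched in base R as below. The weights w_i = n_i - 3 follow the standard formula for Fisher-transformed correlations; the exact output format of this function may differ from this sketch.

```r
# Sketch of the Q statistic for homogeneity of correlations
# (Hedges & Olkin, 1985): Q = sum(w_i * (z_i - z_bar)^2),
# with w_i = n_i - 3; under homogeneity, Q ~ chi-square(k - 1).
q_stat_sketch <- function(z, n) {
  w <- n - 3
  z_bar <- sum(w * z) / sum(w)  # weighted mean of the z values
  q <- sum(w * (z - z_bar)^2)
  p <- pchisq(q, df = length(z) - 1, lower.tail = FALSE)
  c(weighted_z = z_bar, q_stat = q, p_value = p)
}
q_stat_sketch(z = 1:3, n = c(100, 200, 300))
```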
Read a csv file
read_csv(name = NULL, head = FALSE, dirname = NULL, ...)
name |
a character string of the csv file name without the
".csv" extension. For example, if the csv file to read is "myfile.csv",
enter |
head |
logical. if |
dirname |
a character string of the directory containing
the csv file, e.g., |
... |
optional arguments for the |
the output will be a data.table object, that is,
an output from the data.table function, fread
## Not run:
mydata <- read_csv("myfile")
## End(Not run)
Read the sole csv file in the working directory
read_sole_csv(head = FALSE, ...)
head |
logical. if |
... |
optional arguments for the |
the output will be a data.table object, that is,
an output from the data.table function, fread
mydata <- read_sole_csv()
mydata <- read_sole_csv(head = TRUE)
mydata <- read_sole_csv(fill = TRUE, nrows = 5)
Returns elements of a character vector that match the given regular expression
regex_match(regex = NULL, vector = NULL, silent = FALSE, perl = FALSE)
regex |
a regular expression |
vector |
a character vector in which to search for regular expression matches, or a data table whose column names will be searched |
silent |
logical. If |
perl |
logical. Should Perl-compatible regexps be used? |
regex_match("p$", names(mtcars))
colnames_ending_with_p <- regex_match("p$", names(mtcars))
Find relative position of a value in a vector that may or may not contain the value
rel_pos_of_value_in_vector(value = NULL, vector = NULL)
value |
a value whose relative position is to be searched in a vector |
vector |
a numeric vector |
a number indicating the relative position of the value in the vector
rel_pos_of_value_in_vector(value = 3, vector = c(2, 4))
rel_pos_of_value_in_vector(value = 3, vector = c(2, 6))
rel_pos_of_value_in_vector(value = 3, vector = 1:3)
Find relative value of a position in a vector
rel_value_of_pos_in_vector(vector = NULL, position = NULL)
vector |
a numeric vector |
position |
position of a vector |
a number indicating the relative value of the position in the vector
rel_value_of_pos_in_vector(vector = c(0, 100), position = 1.5)
rel_value_of_pos_in_vector(vector = 2:4, position = 2)
rel_value_of_pos_in_vector(vector = c(2, 4, 6), position = 2.5)
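Both this function and rel_pos_of_value_in_vector above appear to perform linear interpolation. Assuming that behavior, the examples can be sketched with base R's approx, swapping x and y for the inverse direction.

```r
# Assuming linear interpolation between values at integer positions:
v <- c(2, 4, 6)
approx(x = seq_along(v), y = v, xout = 2.5)$y  # 5
# The inverse (relative position of a value) swaps x and y:
approx(x = v, y = seq_along(v), xout = 3)$y    # 1.5
```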
Remove certain values from a vector
remove_from_vector(values = NULL, vector = NULL, silent = FALSE)
values |
a single value or a vector of values which will be removed from the target vector |
vector |
a character or numeric vector |
silent |
if |
the output will be a vector with the given values removed.
remove_from_vector(values = 1, vector = 1:3)
remove_from_vector(values = NA, vector = c(1:3, NA))
remove_from_vector(values = c(1, NA), vector = c(1:3, NA))
remove_from_vector(values = 1:5, vector = 1:10)
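A base-R sketch of the same idea: unlike setdiff(), subsetting with %in% keeps duplicate elements and their order, and NA also matches via %in%.

```r
# Remove the values 1 and NA from a vector. Note that NA matches
# via %in% (match() treats NA as equal to NA).
x <- c(1:3, NA)
x[!x %in% c(1, NA)]  # 2 3
```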
Remove all user installed packages
remove_user_installed_pkgs(
  exceptions = NULL,
  type_of_pkg_to_keep = c("base", "recommended"),
  keep_kim = FALSE
)
exceptions |
a character vector of names of packages to keep |
type_of_pkg_to_keep |
a character vector indicating types
of packages to keep. The default,
|
keep_kim |
logical. If |
## Not run:
remove_user_installed_pkgs()
## End(Not run)
Conduct a repeated-measures analysis of variance (ANOVA). This analysis is appropriate for a within-subjects experimental design.
repeated_measures_anova(
  data = NULL,
  p_col_name = NULL,
  measure_vars = NULL,
  histograms = TRUE,
  round_w = 2,
  round_epsilon = 2,
  round_df_model = 2,
  round_df_error = 2,
  round_f = 2,
  round_ges = 2
)
data |
a data object (a data frame or a data.table) |
p_col_name |
name of the column identifying participants |
measure_vars |
names of the columns containing repeated measures (within-subjects variables) |
histograms |
logical. If |
round_w |
number of decimal places to which to round W statistic from Mauchly's test (default = 2) |
round_epsilon |
number of decimal places to which to round the epsilon statistic from Greenhouse-Geisser or Huynh-Feldt correction (default = 2) |
round_df_model |
number of decimal places to which to round the corrected degrees of freedom for model (default = 2) |
round_df_error |
number of decimal places to which to round the corrected degrees of freedom for error (default = 2) |
round_f |
number of decimal places to which to round the F statistic (default = 2) |
round_ges |
number of decimal places to which to round generalized eta-squared (default = 2) |
The following package(s) must be installed prior to running the function: Package 'ez' v4.4-0 (or possibly a higher version) by Michael A Lawrence (2016), https://cran.r-project.org/package=ez
## Not run:
repeated_measures_anova(
  data = mtcars, p_col_name = "cyl", measure_vars = c("wt", "qsec"))
## End(Not run)
Replace values in a data.table
replace_values_in_dt(
  data = NULL,
  old_values = NULL,
  new_values = NULL,
  silent = FALSE
)
data |
a data object (a data frame or a data.table) |
old_values |
a vector of old values that need to be replaced |
new_values |
a new value or a vector of new values that will replace the old values |
silent |
If |
replace_values_in_dt(data = mtcars, old_values = 21.0, new_values = 888)
replace_values_in_dt(data = mtcars, old_values = c(0, 1), new_values = 999)
replace_values_in_dt(
  data = mtcars, old_values = c(0, 1), new_values = 990:991)
replace_values_in_dt(
  data = data.table::data.table(a = NA_character_, b = NA_character_),
  old_values = NA, new_values = "")
Estimate coefficients in a multiple regression model by bootstrapping.
robust_regression(
  data = NULL,
  formula = NULL,
  sigfigs = NULL,
  round_digits_after_decimal = NULL,
  iterations = 1000
)
data |
a data object (a data frame or a data.table) |
formula |
a formula object for the regression equation |
sigfigs |
number of significant digits to round to |
round_digits_after_decimal |
round to nth digit after decimal
(alternative to |
iterations |
number of bootstrap samples. The default is set at 1000, but consider increasing the number of samples to 5000, 10000, or an even larger number, if slower handling time is not an issue. |
The following package(s) must be installed prior to running this function: Package 'boot' v1.3-26 (or possibly a higher version) by Canty & Ripley (2021), https://cran.r-project.org/package=boot
## Not run:
robust_regression(
  data = mtcars, formula = mpg ~ cyl * hp, iterations = 100)
## End(Not run)
Round numbers to a flexible number of significant digits. "Flexible" rounding refers to rounding all numbers to the highest level of precision seen among numbers that would have resulted from the 'signif()' function in base R. The usage examples of this function demonstrate flexible rounding (see below).
round_flexibly(x = NULL, sigfigs = 3)
x |
a numeric vector |
sigfigs |
number of significant digits to flexibly round to.
By default, |
the output will be a numeric vector with values rounded to the highest level of precision seen among numbers that result from the 'signif()' function in base R.
# Example 1
# First, observe results from the 'signif' function:
c(0.00012345, pi)
signif(c(0.00012345, pi), 3)
# In the result above, notice how info is lost on some digits
# (e.g., 3.14159265 becomes 3.140000).
# In contrast, flexible rounding retains the lost info in the digits.
round_flexibly(x = c(0.00012345, pi), sigfigs = 3)
# Example 2
# Again, first observe results from the 'signif' function:
c(0.12345, 1234, 0.12, 1.23, .01)
signif(c(0.12345, 1234, 0.12, 1.23, .01), 3)
# In the result above, notice how info is lost on some digits
# (e.g., 1234 becomes 1230.000).
# In contrast, flexible rounding retains the lost info in the digits.
# Specifically, in the example below, 0.12345 rounded to 3 significant
# digits (default) is signif(0.12345, 3) = 0.123 (3 decimal places).
# Because this 3 decimal places is the highest precision seen among
# all numbers, all other numbers will also be rounded to 3 decimal places.
round_flexibly(c(0.12345, 1234, 0.12, 1.23, .01))
# Example 3
# If the input is a character vector, the original input will be returned.
round_flexibly(c("a", "b", "c"))
# Example 4
# If the input is a list (e.g., a data.frame) that contains at least
# one numeric vector, the numeric vector element(s) will be rounded
# flexibly.
round_flexibly(data.frame(a = c(1.2345, 123.45), b = c("a", "b")))
# Example 5
# If the input is a matrix, all numbers will be rounded flexibly.
round_flexibly(matrix(
  c(1.23, 2.345, 3.4567, 4.56789), ncol = 2), sigfigs = 3)
Creates a scatter plot and calculates a correlation between two variables.
scatterplot(
  data = NULL,
  x_var_name = NULL,
  y_var_name = NULL,
  dot_label_var_name = NULL,
  weight_var_name = NULL,
  alpha = 1,
  annotate_stats = TRUE,
  annotate_y_pos_rel = 5,
  annotate_y_pos_abs = NULL,
  annotated_stats_color = "green4",
  annotated_stats_font_size = 6,
  annotated_stats_font_face = "bold",
  line_of_fit_type = "lm",
  ci_for_line_of_fit = FALSE,
  line_of_fit_color = "blue",
  line_of_fit_thickness = 1,
  dot_color = "black",
  x_axis_label = NULL,
  y_axis_label = NULL,
  x_axis_tick_marks = NULL,
  y_axis_tick_marks = NULL,
  dot_size = 2,
  dot_label_size = NULL,
  dot_size_range = c(3, 12),
  jitter_x_y_percent = 0,
  jitter_x_percent = 0,
  jitter_y_percent = 0,
  cap_axis_lines = TRUE,
  color_dots_by = NULL,
  png_name = NULL,
  save_as_png = FALSE,
  width = 13,
  height = 9
)
data |
a data object (a data frame or a data.table) |
x_var_name |
name of the variable that will go on the x axis |
y_var_name |
name of the variable that will go on the y axis |
dot_label_var_name |
name of the variable that will be used to label individual observations |
weight_var_name |
name of the variable by which to weight the individual observations for calculating correlation and plotting the line of fit |
alpha |
opacity of the dots (0 = completely transparent, 1 = completely opaque) |
annotate_stats |
if |
annotate_y_pos_rel |
position of the annotated stats, expressed
as a percentage of the range of y values by which the annotated
stats will be placed above the maximum value of y in the data set
(default = 5). This value will be determined relative to the data.
If |
annotate_y_pos_abs |
as an alternative to the argument
|
annotated_stats_color |
color of the annotated stats (default = "green4"). |
annotated_stats_font_size |
font size of the annotated stats (default = 6). |
annotated_stats_font_face |
font face of the annotated stats (default = "bold"). |
line_of_fit_type |
if |
ci_for_line_of_fit |
if |
line_of_fit_color |
color of the line of fit (default = "blue") |
line_of_fit_thickness |
thickness of the line of fit (default = 1) |
dot_color |
color of the dots (default = "black") |
x_axis_label |
alternative label for the x axis |
y_axis_label |
alternative label for the y axis |
x_axis_tick_marks |
a numeric vector indicating the positions of the tick marks on the x axis |
y_axis_tick_marks |
a numeric vector indicating the positions of the tick marks on the y axis |
dot_size |
size of the dots on the plot (default = 2) |
dot_label_size |
size for dots' labels on the plot. If no
input is entered for this argument, it will be set as
|
dot_size_range |
minimum and maximum size for dots on the plot when they are weighted |
jitter_x_y_percent |
horizontally and vertically jitter dots by a percentage of the respective ranges of x and y values. |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values. |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values |
cap_axis_lines |
logical. Should the axis lines be capped at the outer tick marks? (default = TRUE) |
color_dots_by |
name of the variable that will determine colors of the dots |
png_name |
name of the PNG file to be saved. By default, the name will be "scatterplot_" followed by a timestamp of the current time. The timestamp will be in the format, jan_01_2021_1300_10_000001, where "jan_01_2021" would indicate January 01, 2021; 1300 would indicate 13:00 (i.e., 1 PM); and 10_000001 would indicate 10.000001 seconds after the hour. |
save_as_png |
if |
width |
width of the plot to be saved. This argument will be
directly entered as the |
height |
height of the plot to be saved. This argument will be
directly entered as the |
If a weighted correlation is to be calculated, the following package(s) must be installed prior to running the function: Package 'weights' v1.0 (or possibly a higher version) by Josh Pasek (2018), https://cran.r-project.org/package=weights
the output will be a scatter plot, a ggplot object.
## Not run:
scatterplot(data = mtcars, x_var_name = "wt", y_var_name = "mpg")
scatterplot(
  data = mtcars, x_var_name = "wt", y_var_name = "mpg",
  dot_label_var_name = "hp", weight_var_name = "drat",
  annotate_stats = TRUE)
scatterplot(
  data = mtcars, x_var_name = "wt", y_var_name = "mpg",
  dot_label_var_name = "hp", weight_var_name = "cyl",
  dot_label_size = 7, annotate_stats = TRUE)
scatterplot(
  data = mtcars, x_var_name = "wt", y_var_name = "mpg",
  color_dots_by = "gear")
## End(Not run)
Score items in a scale (e.g., Likert scale items) by computing the sum or mean of the items.
score_scale_items(
  item_list = NULL,
  reverse_item_list = NULL,
  operation = "mean",
  na.rm = FALSE,
  na_summary = TRUE,
  reverse_code_minuend = NULL
)
item_list |
a list of scale items (i.e., list of vectors of ratings) to code normally (as opposed to reverse coding). |
reverse_item_list |
a list of scale items to reverse code. |
operation |
if |
na.rm |
logical. The |
na_summary |
logical. If |
reverse_code_minuend |
required for reverse coding; the number
from which to subtract item ratings when reverse-coding. For example,
if the items to reverse code are measured on a 7-point scale, enter
|
score_scale_items(item_list = list(1:5, rep(3, 5)), reverse_item_list = list(rep(5, 5)), reverse_code_minuend = 6) score_scale_items(item_list = list(c(1, 1), c(1, 5)), reverse_item_list = list(c(5, 3)), reverse_code_minuend = 6, na_summary = FALSE) score_scale_items(item_list = list(c(1, 1), c(1, 5)), reverse_item_list = list(c(5, 1)), reverse_code_minuend = 6, operation = "sum") score_scale_items(item_list = list(1:5, rep(3, 5))) score_scale_items(item_list = list(c(1, NA, 3), c(NA, 2, 3))) score_scale_items(item_list = list(c(1, NA, 3), c(NA, 2, 3)), na.rm = TRUE)
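The scoring logic above can be sketched in base R. This is an illustration of reverse coding and aggregation, not kim's actual source; `score_items_sketch` is a hypothetical helper.

```r
# Minimal sketch of scale scoring with reverse coding (an assumption
# about the computation, not kim::score_scale_items's internals).
score_items_sketch <- function(item_list, reverse_item_list = list(),
                               reverse_code_minuend = NULL,
                               operation = "mean", na.rm = FALSE) {
  # reverse-code by subtracting each rating from the minuend,
  # e.g., on a 7-point scale, minuend 8 maps 1 -> 7 and 7 -> 1
  reversed <- lapply(reverse_item_list,
                     function(item) reverse_code_minuend - item)
  ratings <- do.call(cbind, c(item_list, reversed))
  if (operation == "mean") {
    rowMeans(ratings, na.rm = na.rm)
  } else {
    rowSums(ratings, na.rm = na.rm)
  }
}

score_items_sketch(
  item_list = list(1:5, rep(3, 5)),
  reverse_item_list = list(rep(5, 5)),
  reverse_code_minuend = 6)
```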
Standard error of the mean
se_of_mean(vector, na.rm = TRUE, notify_na_count = NULL)
vector |
a numeric vector |
na.rm |
Deprecated. By default, NA values will be removed before calculation |
notify_na_count |
if |
the output will be a numeric vector of length one, which will be the standard error of the mean for the given numeric vector.
se_of_mean(c(1:10, NA))
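The underlying computation is presumably sd(x) / sqrt(n) after NA removal, which can be verified with base R:

```r
# sd(x) / sqrt(n) after dropping NA values -- presumably what
# se_of_mean() computes, shown here with base R for illustration
se_of_mean_sketch <- function(x) {
  x <- x[!is.na(x)]
  stats::sd(x) / sqrt(length(x))
}
se_of_mean_sketch(c(1:10, NA))  # same as sd(1:10) / sqrt(10)
```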
Calculate the standard error of a percentage. See Fowler, Jr. (2014, p. 34, ISBN: 978-1-4833-1240-8)
se_of_percentage(percent = NULL, n = NULL)
percent |
a vector of percentages; each of the percentage values must be between 0 and 100 |
n |
a vector of sample sizes; number of observations used to calculate each of the percentage values |
se_of_percentage(percent = 40, n = 50) se_of_percentage(percent = 50, n = 10)
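The computation presumably follows the textbook formula sqrt(p * (100 - p) / n) for a percentage p, consistent with the Fowler citation above:

```r
# standard error of a percentage, sqrt(p * (100 - p) / n) --
# an assumption about the computation, matching the cited formula
se_of_percentage_sketch <- function(percent, n) {
  sqrt(percent * (100 - percent) / n)
}
se_of_percentage_sketch(percent = 40, n = 50)  # sqrt(48), about 6.93
```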
Calculate the standard error of a proportion. See Anderson and Finn (1996, p. 364, ISBN: 978-1-4612-8466-6)
se_of_proportion(p = NULL, n = NULL)
p |
a vector of proportions; each of the proportion values must be between 0 and 1 |
n |
a vector of sample sizes; number of observations used to calculate each of the proportion values |
se_of_proportion(p = 0.56, n = 400) se_of_proportion(p = 0.5, n = 10)
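The computation is presumably the standard formula sqrt(p * (1 - p) / n), consistent with the Anderson and Finn citation above:

```r
# standard error of a proportion, sqrt(p * (1 - p) / n) --
# a sketch of the presumed computation, not kim's source
se_of_proportion_sketch <- function(p, n) {
  sqrt(p * (1 - p) / n)
}
se_of_proportion_sketch(p = 0.5, n = 10)  # sqrt(0.025), about 0.158
```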
Set up R environment by (1) clearing the console; (2) removing all objects in the global environment; (3) setting the working directory to the active document (in RStudio only); (4) unloading and loading the kim package.
setup_r_env( clear_console = TRUE, clear_global_env = TRUE, setwd_to_active_doc = TRUE, prep_kim = TRUE )
clear_console |
if |
clear_global_env |
if |
setwd_to_active_doc |
if |
prep_kim |
if |
## Not run: setup_r_env() ## End(Not run)
Set the working directory to the location of the active document in RStudio
setwd_to_active_doc()
there will be no output from this function. Rather, the working directory will be set to the location of the active document.
## Not run: setwd_to_active_doc() ## End(Not run)
Conduct a simple effects analysis to probe a two-way interaction effect. See Field et al. (2012, ISBN: 978-1-4462-0045-2).
simple_effects_analysis( data = NULL, dv_name = NULL, iv_1_name = NULL, iv_2_name = NULL, iv_1_levels = NULL, iv_2_levels = NULL, print_contrast_table = "weights_sums_and_products", output = NULL )
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable (DV) |
iv_1_name |
name of the first independent variable (IV1), whose main effects will be examined in the first set of contrasts |
iv_2_name |
name of the second independent variable (IV2), whose simple effects at each level of IV1 will be examined in the second set of contrasts |
iv_1_levels |
ordered levels of IV1 |
iv_2_levels |
ordered levels of IV2 |
print_contrast_table |
If
|
output |
output can be one of the following: |
By default, the function will print a table of contrasts and a table of simple effects.
factorial_anova_2_way( data = mtcars, dv_name = "mpg", iv_1_name = "vs", iv_2_name = "am", iterations = 100, plot = TRUE) simple_effects_analysis( data = mtcars, dv_name = "mpg", iv_1_name = "vs", iv_2_name = "am")
Conduct a simple slopes analysis, typically to probe a two-way interaction.
simple_slopes_analysis( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, round_focal_value = 2, round_b = 2, round_se = 2, round_t = 2, round_p = 3, focal_values = NULL )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (IV) |
dv_name |
name of the dependent variable (DV) |
mod_name |
name of the moderator variable (MOD) |
round_focal_value |
number of decimal places to which to round the focal values (default = 2) |
round_b |
number of decimal places to which to round coefficients from the regression analysis (default = 2) |
round_se |
number of decimal places to which to round standard error values from the regression analysis (default = 2) |
round_t |
number of decimal places to which to round t statistics from the regression analysis (default = 2) |
round_p |
number of decimal places to which to round p values from the regression analysis (default = 3) |
focal_values |
this input will be used only in cases where the moderator is continuous. In such cases, what are the focal values of the moderator at which to estimate the effect of IV on DV? By default, the values corresponding to the mean of MOD and the mean of MOD +/- 1 SD will be used. |
simple_slopes_analysis( data = mtcars, iv_name = "vs", dv_name = "mpg", mod_name = "am") simple_slopes_analysis( data = mtcars, iv_name = "vs", dv_name = "mpg", mod_name = "hp") simple_slopes_analysis( data = mtcars, iv_name = "disp", dv_name = "mpg", mod_name = "hp") simple_slopes_analysis( data = mtcars, iv_name = "vs", dv_name = "am", mod_name = "hp") simple_slopes_analysis( data = mtcars, iv_name = "disp", dv_name = "am", mod_name = "hp")
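The standard re-centering trick behind simple slopes can be illustrated with base R. This is a sketch of the general technique, not kim's implementation; `simple_slope_at` is a hypothetical helper.

```r
# Re-center the moderator at a focal value and refit; the IV's
# coefficient in the refit model is the simple slope of IV on DV
# at that moderator value.
simple_slope_at <- function(data, iv, dv, mod, focal_value) {
  data$mod_c <- data[[mod]] - focal_value
  f <- stats::reformulate(c(iv, "mod_c", paste0(iv, ":mod_c")),
                          response = dv)
  stats::coef(stats::lm(f, data = data))[[iv]]
}
# simple slope of disp on mpg at the mean of hp
simple_slope_at(mtcars, "disp", "mpg", "hp", mean(mtcars$hp))
```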
Conduct a simple slopes analysis with logistic regression analyses, typically to probe a two-way interaction when the dependent variable is binary.
simple_slopes_analysis_logistic( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, round_b = 2, round_se = 2, round_z = 2, round_p = 3, focal_values = NULL )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (IV) |
dv_name |
name of the dependent variable (DV) |
mod_name |
name of the moderator variable (MOD) |
round_b |
number of decimal places to which to round coefficients from the regression analysis (default = 2) |
round_se |
number of decimal places to which to round standard error values from the regression analysis (default = 2) |
round_z |
number of decimal places to which to round z statistics from the regression analysis (default = 2) |
round_p |
number of decimal places to which to round p values from the regression analysis (default = 3) |
focal_values |
this input will be used only in cases where MOD is continuous. In such cases, what are the focal values of MOD at which to estimate the effect of IV on DV? By default, the values corresponding to the mean of MOD and the mean of MOD +/- 1 SD will be used. |
simple_slopes_analysis_logistic( data = mtcars, iv_name = "vs", dv_name = "am", mod_name = "hp") simple_slopes_analysis_logistic( data = mtcars, iv_name = "disp", dv_name = "am", mod_name = "hp")
Calculate skewness using one of three formulas: (1) the traditional Fisher-Pearson coefficient of skewness; (2) the adjusted Fisher-Pearson standardized moment coefficient; (3) the Pearson 2 skewness coefficient. Formulas were taken from Doane & Seward (2011), doi:10.1080/10691898.2011.11889611
skewness(vector = NULL, type = "adjusted")
vector |
a numeric vector |
type |
a character string indicating the type of skewness to
calculate. If |
a numeric value: the skewness of the given vector
# calculate the adjusted Fisher-Pearson standardized moment coefficient kim::skewness(c(1, 2, 3, 4, 5, 10)) # calculate the traditional Fisher-Pearson coefficient of skewness kim::skewness(c(1, 2, 3, 4, 5, 10), type = "traditional") # compare with skewness from 'moments' package moments::skewness(c(1, 2, 3, 4, 5, 10)) # calculate the Pearson 2 skewness coefficient kim::skewness(c(1, 2, 3, 4, 5, 10), type = "pearson_2")
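The three formulas from Doane & Seward (2011) can be sketched directly in base R. This is a reimplementation for illustration, not kim's source code.

```r
# the three skewness formulas, sketched directly in base R
skewness_sketch <- function(x, type = "adjusted") {
  n <- length(x)
  m <- mean(x)
  # traditional Fisher-Pearson coefficient: m3 / m2^(3/2)
  g1 <- (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^(3 / 2)
  switch(type,
         traditional = g1,
         # adjusted Fisher-Pearson standardized moment coefficient
         adjusted = g1 * sqrt(n * (n - 1)) / (n - 2),
         # Pearson 2 skewness coefficient
         pearson_2 = 3 * (m - stats::median(x)) / stats::sd(x))
}
skewness_sketch(c(1, 2, 3, 4, 5, 10))
```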
Conduct a spotlight analysis for a 2 x Continuous design. See Spiller et al. (2013) doi:10.1509/jmr.12.0420
spotlight_2_by_continuous( data = NULL, iv_name = NULL, dv_name = NULL, mod_name = NULL, logistic = NULL, covariate_name = NULL, focal_values = NULL, interaction_p_include = TRUE, iv_level_order = NULL, output_type = "plot", colors = c("red", "blue"), dot_size = 3, observed_dots = FALSE, reg_lines = FALSE, reg_line_width = 1, reg_line_size = 1, lines_connecting_est_dv = TRUE, lines_connecting_est_dv_width = 1, estimated_dv_dot_shape = 15, estimated_dv_dot_size = 6, error_bar = "ci", error_bar_range = 0.95, error_bar_tip_width = NULL, error_bar_tip_width_percent = 8, error_bar_thickness = 1, error_bar_offset = NULL, error_bar_offset_percent = 8, simp_eff_bracket_leg_ht = NULL, simp_eff_bracket_leg_ht_perc = 2, simp_eff_bracket_offset = NULL, simp_eff_bracket_offset_perc = 1, simp_eff_bracket_color = "black", simp_eff_bracket_line_width = 1, simp_eff_text_offset = NULL, simp_eff_text_offset_percent = 7, simp_eff_text_hjust = 0.5, simp_eff_text_part_1 = "Simple Effect\n", simp_eff_text_color = "black", simp_eff_font_size = 5, interaction_p_value_x = NULL, interaction_p_value_y = NULL, interaction_p_value_font_size = 6, interaction_p_value_vjust = -1, interaction_p_value_hjust = 0.5, x_axis_breaks = NULL, x_axis_limits = NULL, x_axis_tick_mark_labels = NULL, y_axis_breaks = NULL, y_axis_limits = NULL, x_axis_space_left_perc = 10, x_axis_space_right_perc = 30, y_axis_tick_mark_labels = NULL, x_axis_title = NULL, y_axis_title = NULL, legend_title = NULL, legend_position = "right", y_axis_title_vjust = 0.85, round_decimals_int_p_value = 3, jitter_x_percent = 0, jitter_y_percent = 0, dot_alpha = 0.2, reg_line_alpha = 0.5, jn_point_font_size = 6, reg_line_types = c("solid", "dashed"), caption = NULL, plot_margin = ggplot2::unit(c(60, 30, 7, 7), "pt"), silent = FALSE )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the binary independent variable (IV) |
dv_name |
name of the dependent variable (DV) |
mod_name |
name of the continuous moderator variable (MOD) |
logistic |
logical. Should logistic regressions be conducted, rather than ordinary least squares regressions? By default, ordinary least squares regressions will be conducted. |
covariate_name |
name(s) of the variable(s) to control for in estimating conditional values of the DV. |
focal_values |
focal values of the moderator variable at which to estimate IV's effect on DV. |
interaction_p_include |
logical. Should the plot include a p-value for the interaction term? |
iv_level_order |
order of levels in the independent
variable for legend. By default, it will be set as levels of the
independent variable ordered using R's base function |
output_type |
type of output (default = "plot"). Other possible values include "spotlight_results", "dt_for_plotting", "modified_dt" |
colors |
set colors for the two levels of the independent variable
By default, |
dot_size |
size of the observed_dots (default = 3) |
observed_dots |
logical. If |
reg_lines |
logical. If |
reg_line_width |
thickness of the regression lines (default = 1). |
reg_line_size |
deprecated. Use |
lines_connecting_est_dv |
logical. Should lines connecting the estimated values of DV be drawn? (default = TRUE) |
lines_connecting_est_dv_width |
thickness of the lines connecting the estimated values of DV (default = 1). |
estimated_dv_dot_shape |
ggplot value for shape of the dots at estimated values of DV (default = 15, a square shape). |
estimated_dv_dot_size |
size of the dots at estimated values of DV (default = 6). |
error_bar |
if |
error_bar_range |
width of the confidence interval
(default = 0.95 for a 95 percent confidence interval).
This argument will not apply when |
error_bar_tip_width |
graphically, width of the segments at the end of error bars (default = 0.13) |
error_bar_tip_width_percent |
(default) |
error_bar_thickness |
thickness of the error bars (default = 1) |
error_bar_offset |
(default) |
error_bar_offset_percent |
(default) |
simp_eff_bracket_leg_ht |
(default) |
simp_eff_bracket_leg_ht_perc |
(default) |
simp_eff_bracket_offset |
(default) |
simp_eff_bracket_offset_perc |
(default) |
simp_eff_bracket_color |
(default) |
simp_eff_bracket_line_width |
(default) |
simp_eff_text_offset |
(default) |
simp_eff_text_offset_percent |
(default) |
simp_eff_text_hjust |
(default) |
simp_eff_text_part_1 |
The first part of the text for
labeling simple effects.
By default, |
simp_eff_text_color |
color for the text indicating p-values of simple effects (default = "black"). |
simp_eff_font_size |
font size of the text indicating p-values of simple effects (default = 5). |
interaction_p_value_x |
(default) |
interaction_p_value_y |
(default) |
interaction_p_value_font_size |
font size for the interaction p value (default = 6) |
interaction_p_value_vjust |
(default) |
interaction_p_value_hjust |
(default) |
x_axis_breaks |
(default) |
x_axis_limits |
(default) |
x_axis_tick_mark_labels |
(default) |
y_axis_breaks |
(default) |
y_axis_limits |
(default) |
x_axis_space_left_perc |
(default) |
x_axis_space_right_perc |
(default) |
y_axis_tick_mark_labels |
(default) |
x_axis_title |
title of the x axis. By default, it will be set
as input for |
y_axis_title |
title of the y axis. By default, it will be set
as input for |
legend_title |
title of the legend. By default, it will be set
as input for |
legend_position |
position of the legend (default = "right").
If |
y_axis_title_vjust |
position of the y axis title (default = 0.85).
If default is used, |
round_decimals_int_p_value |
To how many digits after the decimal point should the p value for the interaction term be rounded? (default = 3) |
jitter_x_percent |
horizontally jitter dots by a percentage of the range of x values |
jitter_y_percent |
vertically jitter dots by a percentage of the range of y values |
dot_alpha |
opacity of the dots (0 = completely transparent,
1 = completely opaque). By default, |
reg_line_alpha |
(default) |
jn_point_font_size |
(default) |
reg_line_types |
types of the regression lines for the two levels
of the independent variable.
By default, |
caption |
(default) |
plot_margin |
margin for the plot
By default |
silent |
If |
spotlight_2_by_continuous( data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec") # control for variables spotlight_2_by_continuous( data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec", covariate_name = c("cyl", "hp")) # control for variables and adjust simple effect labels spotlight_2_by_continuous( data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec", covariate_name = c("cyl", "hp"), reg_lines = TRUE, observed_dots = TRUE, error_bar_offset_percent = 3, error_bar_tip_width_percent = 3, simp_eff_text_offset_percent = 3, simp_eff_bracket_leg_ht_perc = 2, dot_alpha = 0.2, simp_eff_text_part_1 = "") # spotlight at specific values spotlight_2_by_continuous( data = mtcars, iv_name = "am", dv_name = "mpg", mod_name = "qsec", covariate_name = c("cyl", "hp"), focal_values = seq(15, 22, 1), reg_lines = TRUE, observed_dots = TRUE, dot_alpha = 0.2, simp_eff_text_part_1 = "", simp_eff_font_size = 4, error_bar_offset_percent = 3, error_bar_tip_width_percent = 3, simp_eff_text_offset_percent = 3, simp_eff_bracket_leg_ht_perc = 1, x_axis_breaks = seq(15, 22, 1)) # spotlight for logistic regression spotlight_2_by_continuous( data = mtcars, iv_name = "am", dv_name = "vs", mod_name = "drat", logistic = TRUE)
Standardize a numeric vector (i.e., normalize it, or obtain its z-scores, also known as standard scores)
standardize(x = NULL)
x |
a numeric vector |
the output will be a vector of the standard scores of the input.
standardize(1:10)
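The computation is presumably the usual z-score formula, (x - mean(x)) / sd(x), which base R's scale() also performs:

```r
# z-scoring is (x - mean(x)) / sd(x); base R can do this directly,
# and scale() gives the same result
z <- (1:10 - mean(1:10)) / stats::sd(1:10)
round(z, 2)
```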
This function standardizes all variables for a regression analysis (i.e., the dependent variable and all independent variables) and then conducts a regression with the standardized variables.
standardized_regression( data = NULL, formula = NULL, reverse_code_vars = NULL, sigfigs = NULL, round_digits_after_decimal = NULL, round_p = 3, pretty_round_p_value = TRUE, return_table_upper_half = FALSE, round_r_squared = 3, round_f_stat = 2, prettify_reg_table_col_names = TRUE )
data |
a data object (a data frame or a data.table) |
formula |
a formula object for the regression equation |
reverse_code_vars |
names of binary variables to reverse code |
sigfigs |
number of significant digits to round to |
round_digits_after_decimal |
round to nth digit after decimal
(alternative to |
round_p |
number of decimal places to which to round p-values (default = 3) |
pretty_round_p_value |
logical. Should the p-values be rounded
in a pretty format (i.e., lower threshold: "<.001").
By default, |
return_table_upper_half |
logical. Should only the upper part
of the table be returned?
By default, |
round_r_squared |
number of digits after the decimal point to which both the r-squared and adjusted r-squared values should be rounded (default = 3) |
round_f_stat |
number of digits after the decimal point to which the F statistic of the regression model should be rounded (default = 2) |
prettify_reg_table_col_names |
logical. Should the column names
of the regression table be made pretty (e.g., change "std_beta" to
"Std. Beta")? (Default = |
the output will be a data.table showing multiple regression results.
standardized_regression(data = mtcars, formula = mpg ~ gear * cyl) standardized_regression( data = mtcars, formula = mpg ~ gear + gear:am + disp * cyl, round_digits_after_decimal = 3)
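The core recipe can be sketched with base R (a simplified illustration, not the package's implementation): z-score every variable in the formula, then refit with lm(); the resulting coefficients are standardized betas.

```r
# standardize all model variables, then run an ordinary regression;
# the coefficients of the refit model are standardized betas
d <- data.frame(scale(mtcars[c("mpg", "gear", "cyl")]))
fit <- stats::lm(mpg ~ gear * cyl, data = d)
round(stats::coef(fit), 3)
```

As a sanity check, for a single-predictor model the standardized beta equals the Pearson correlation between the two variables.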
Start kim (update kim; attach default packages; set working directory, etc.) This function requires installing Package 'remotes' v2.4.2 (or possibly a higher version) by Csardi et al. (2021), https://cran.r-project.org/package=remotes
start_kim( update = TRUE, upgrade_other_pkg = FALSE, setup_r_env = TRUE, default_packages = c("data.table", "ggplot2"), silent_load_pkgs = c("data.table", "ggplot2") )
update |
If |
upgrade_other_pkg |
input for the |
setup_r_env |
logical. If |
default_packages |
a vector of names of packages to load and attach.
By default, |
silent_load_pkgs |
a character vector indicating names of
packages to load silently (i.e., suppress messages that get printed
when loading the packages).
By default, |
## Not run: start_kim() start_kim(default_packages = c("dplyr", "ggplot2")) start_kim(update = TRUE, setup_r_env = FALSE) ## End(Not run)
Extract unique elements and sort them
su(x = NULL, na.last = TRUE, decreasing = FALSE)
x |
a vector or a data frame or an array or NULL. |
na.last |
an argument to be passed onto the 'sort' function
(in base R) for controlling the treatment of NA values.
If |
decreasing |
logical. Should the sort be increasing or decreasing?
An argument to be passed onto the 'sort' function (in base R).
By default, |
a vector, data frame, or array like 'x', but with duplicate elements/rows removed and the remaining elements sorted.
su(c(10, 3, 7, 10, NA)) su(c("b", "z", "b", "a", NA, NA, NA))
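Functionally, su() presumably amounts to sort() applied to unique():

```r
# unique() drops duplicates, sort() orders the survivors;
# na.last = TRUE keeps NA at the end rather than dropping it
sort(unique(c(10, 3, 7, 10, NA)), na.last = TRUE)  # 3 7 10 NA
```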
Conduct a t-test for every possible pairwise comparison, with Holm or Bonferroni correction
t_test_pairwise( data = NULL, iv_name = NULL, dv_name = NULL, sigfigs = 3, welch = TRUE, cohen_d = TRUE, cohen_d_w_ci = TRUE, adjust_p = "holm", bonferroni = NULL, mann_whitney = TRUE, mann_whitney_exact = FALSE, t_test_stats = TRUE, sd = FALSE, round_p = 3, anova = FALSE, round_f = 2, round_t = 2, round_t_test_df = 2 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable |
dv_name |
name of the dependent variable |
sigfigs |
number of significant digits to round to |
welch |
Should Welch's t-tests be conducted?
By default, |
cohen_d |
if |
cohen_d_w_ci |
if |
adjust_p |
the name of the method to use to adjust p-values.
If |
bonferroni |
The use of this argument is deprecated.
Use the 'adjust_p' argument instead.
If |
mann_whitney |
if |
mann_whitney_exact |
this is the input for the 'exact'
argument used in the 'stats::wilcox.test' function, which
conducts a Mann-Whitney test. By default, |
t_test_stats |
if |
sd |
if |
round_p |
number of decimal places to which to round p-values (default = 3) |
anova |
Should a one-way ANOVA be conducted and reported?
By default, |
round_f |
number of decimal places to which to round the f statistic (default = 2) |
round_t |
number of decimal places to which to round the t statistic (default = 2) |
round_t_test_df |
number of decimal places to which to round the degrees of freedom for t tests (default = 2) |
the output will be a data.table showing results of all pairwise comparisons between levels of the independent variable.
## Not run: # Basic example t_test_pairwise( data = iris, iv_name = "Species", dv_name = "Sepal.Length") # Welch's t-test t_test_pairwise( data = mtcars, iv_name = "am", dv_name = "hp") # A Student's t-test t_test_pairwise( data = mtcars, iv_name = "am", dv_name = "hp", welch = FALSE) # Other examples t_test_pairwise(data = iris, iv_name = "Species", dv_name = "Sepal.Length", t_test_stats = TRUE, sd = TRUE) t_test_pairwise( data = iris, iv_name = "Species", dv_name = "Sepal.Length", mann_whitney = FALSE) ## End(Not run)
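The pairwise machinery can be sketched with base R: run t.test() on every pair of levels, then adjust the p-values with p.adjust(). This illustrates the general approach; `pairwise_sketch` is a hypothetical helper, not kim's source.

```r
# Welch's t.test() on every pair of levels, then Holm adjustment
pairwise_sketch <- function(data, iv, dv, method = "holm") {
  levels <- sort(unique(as.character(data[[iv]])))
  pairs <- utils::combn(levels, 2, simplify = FALSE)
  p <- vapply(pairs, function(pr) {
    stats::t.test(data[[dv]][data[[iv]] == pr[1]],
                  data[[dv]][data[[iv]] == pr[2]])$p.value
  }, numeric(1))
  data.frame(group_1 = vapply(pairs, `[`, character(1), 1),
             group_2 = vapply(pairs, `[`, character(1), 2),
             p_adjusted = stats::p.adjust(p, method = method))
}
pairwise_sketch(iris, "Species", "Sepal.Length")
```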
Show the frequency and proportion of unique values in a table format
tabulate_vector( vector = NULL, na.rm = TRUE, sort_by_decreasing_count = NULL, sort_by_increasing_count = NULL, sort_by_decreasing_value = NULL, sort_by_increasing_value = NULL, total_included = TRUE, sigfigs = NULL, round_digits_after_decimal = NULL, output_type = "dt" )
vector |
a character or numeric vector |
na.rm |
if |
sort_by_decreasing_count |
if |
sort_by_increasing_count |
if |
sort_by_decreasing_value |
if |
sort_by_increasing_value |
if |
total_included |
if |
sigfigs |
number of significant digits to round to |
round_digits_after_decimal |
round to nth digit after decimal
(alternative to |
output_type |
if |
If output_type = "dt" (the default), the output will be a data.table showing the count and proportion (percent) of each element in the given vector; if output_type = "df", the output will be a data.frame showing the count and proportion (percent) of each element in the given vector.
tabulate_vector(c("a", "b", "b", "c", "c", "c", NA)) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), sort_by_increasing_count = TRUE ) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), sort_by_decreasing_value = TRUE ) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), sort_by_increasing_value = TRUE ) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), sigfigs = 4 ) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), round_digits_after_decimal = 1 ) tabulate_vector(c("a", "b", "b", "c", "c", "c", NA), output_type = "df" )
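The tabulation can be sketched with base R's table(), which does the counting; the percent column is each count over the total. This mirrors the na.rm = TRUE behavior and is an illustration, not kim's source.

```r
# count and percent per unique value, mirroring na.rm = TRUE
v <- c("a", "b", "b", "c", "c", "c", NA)
counts <- table(v, useNA = "no")
data.frame(value = names(counts),
           count = as.integer(counts),
           percent = round(100 * as.integer(counts) / sum(counts), 2))
```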
Calculate tau-squared, the between-studies variance (the variance of the effect size parameters across the population of studies), as illustrated in Borenstein et al. (2009, pp. 72-73, ISBN: 978-0-470-05724-7).
tau_squared(effect_sizes = NULL, effect_size_variances = NULL)
effect_sizes |
effect sizes (e.g., standardized mean differences) |
effect_size_variances |
within-study variances |
Negative values of tau-squared are converted to 0 in the output (see Cheung, 2013; https://web.archive.org/web/20230512225539/https://openmx.ssri.psu.edu/thread/2432)
## Not run:
tau_squared(effect_sizes = c(1, 2), effect_size_variances = c(3, 4))
# a negative tau squared value is converted to 0:
tau_squared(effect_sizes = c(1.1, 1.4), effect_size_variances = c(1, 4))
## End(Not run)
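The estimator behind this function can be illustrated outside R. The snippet below is a hypothetical Python sketch of the DerSimonian-Laird estimator of tau-squared as presented in Borenstein et al. (2009), not the package's own source code; as noted above, negative estimates are truncated at 0.

```python
# Hypothetical Python sketch of the DerSimonian-Laird estimator of
# tau-squared (Borenstein et al., 2009); not the package's R source.
def tau_squared(effect_sizes, effect_size_variances):
    w = [1 / v for v in effect_size_variances]  # fixed-effect weights
    sum_w = sum(w)
    mean_es = sum(wi * y for wi, y in zip(w, effect_sizes)) / sum_w
    q = sum(wi * (y - mean_es) ** 2 for wi, y in zip(w, effect_sizes))
    df = len(effect_sizes) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    return max((q - df) / c, 0)  # negative estimates are truncated at 0
```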
A custom ggplot theme
theme_kim(
  legend_position = "none",
  legend_spacing_y = 1,
  legend_key_size = 3,
  base_size = 20,
  axis_tick_font_size = 20,
  axis_tick_marks_color = "black",
  axis_title_font_size = 24,
  y_axis_title_vjust = 0.85,
  axis_title_margin_size = 24,
  cap_axis_lines = FALSE
)
legend_position |
position of the legend (default = "none") |
legend_spacing_y |
vertical spacing of the legend keys in the unit of "cm" (default = 1) |
legend_key_size |
size of the legend keys in the unit of "lines" (default = 3) |
base_size |
base font size |
axis_tick_font_size |
font size for axis tick marks |
axis_tick_marks_color |
color of the axis tick marks |
axis_title_font_size |
font size for axis title |
y_axis_title_vjust |
position of the y axis title (default = 0.85).
If default is used, |
axis_title_margin_size |
size of the margin between axis title and the axis line |
cap_axis_lines |
logical. Should the axis lines be capped at the outer tick marks? (default = FALSE) |
If the axis lines are to be capped at the ends, the following package(s) must be installed prior to running the function: Package 'lemon' v0.4.4 (or possibly a higher version) by Edwards et al. (2020), https://cran.r-project.org/package=lemon
a ggplot object; there will be no meaningful output from
this function on its own. Instead, this function should be added to another
ggplot object, e.g.,
ggplot(mtcars, aes(x = disp, y = mpg)) + theme_kim()
prep(ggplot2)
ggplot2::ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point() +
  theme_kim()
Indicates whether each value in a vector belongs to top, median, or bottom
top_median_or_bottom(vector)
vector |
a numeric vector |
a character vector indicating whether each element in a vector belongs to "top", "median", or "bottom"
top_median_or_bottom(c(1, 2, 3, NA))
top_median_or_bottom(c(1, 2, 2, NA))
top_median_or_bottom(c(1, 1, 2, NA))
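The classification rule can be sketched in a few lines. The following hypothetical Python snippet illustrates labeling each value relative to the median; it is not the package's R source, and NA handling is omitted for brevity.

```python
# Hypothetical Python sketch of labeling values relative to the median;
# not the package's R source. Missing-value handling is omitted.
import statistics

def top_median_or_bottom(values):
    med = statistics.median(values)
    return ["median" if x == med else "top" if x > med else "bottom"
            for x in values]
```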
Shows frequency and proportion of unique values in a table format. This function is a copy of the earlier function, tabulate_vector, in Package 'kim'
tv(
  vector = NULL,
  na.rm = FALSE,
  sort_by_decreasing_count = NULL,
  sort_by_increasing_count = NULL,
  sort_by_decreasing_value = NULL,
  sort_by_increasing_value = NULL,
  total_included = TRUE,
  sigfigs = NULL,
  round_digits_after_decimal = NULL,
  output_type = "dt"
)
vector |
a character or numeric vector |
na.rm |
if |
sort_by_decreasing_count |
if |
sort_by_increasing_count |
if |
sort_by_decreasing_value |
if |
sort_by_increasing_value |
if |
total_included |
if |
sigfigs |
number of significant digits to round to |
round_digits_after_decimal |
round to nth digit after decimal
(alternative to |
output_type |
if |
if output_type = "dt", which is the default, the output
will be a data.table showing the count and proportion (percent) of each
element in the given vector; if output_type = "df", the output will
be a data.frame showing the count and proportion (percent) of each value
in the given vector.
tv(c("a", "b", "b", "c", "c", "c", NA))
tv(c("a", "b", "b", "c", "c", "c", NA),
  sort_by_increasing_count = TRUE)
tv(c("a", "b", "b", "c", "c", "c", NA),
  sort_by_decreasing_value = TRUE)
tv(c("a", "b", "b", "c", "c", "c", NA),
  sort_by_increasing_value = TRUE)
tv(c("a", "b", "b", "c", "c", "c", NA),
  sigfigs = 4)
tv(c("a", "b", "b", "c", "c", "c", NA),
  round_digits_after_decimal = 1)
tv(c("a", "b", "b", "c", "c", "c", NA),
  output_type = "df")
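The tallying logic itself is simple. The following hypothetical Python sketch shows the count-and-percentage computation analogous to what tv() reports; it is not the package's R source, and sorting and NA options are omitted.

```python
# Hypothetical Python sketch of tallying counts and percentages,
# analogous to tv(); not the package's R source.
from collections import Counter

def tally(values):
    counts = Counter(values)
    total = len(values)
    # map each unique value to (count, percentage of all observations)
    return {v: (n, 100 * n / total) for v, n in counts.items()}
```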
This function is deprecated. Use the function 'factorial_anova_2_way' instead.
two_way_anova(
  data = NULL,
  dv_name = NULL,
  iv_1_name = NULL,
  iv_2_name = NULL,
  iv_1_values = NULL,
  iv_2_values = NULL,
  sigfigs = 3,
  robust = FALSE,
  iterations = 2000,
  plot = TRUE,
  error_bar = "ci",
  error_bar_range = 0.95,
  error_bar_tip_width = 0.13,
  error_bar_thickness = 1,
  error_bar_caption = TRUE,
  line_colors = NULL,
  line_types = NULL,
  line_thickness = 1,
  dot_size = 3,
  position_dodge = 0.13,
  x_axis_title = NULL,
  y_axis_title = NULL,
  y_axis_title_vjust = 0.85,
  legend_title = NULL,
  legend_position = "right",
  output = "anova_table",
  png_name = NULL,
  width = 7000,
  height = 4000,
  units = "px",
  res = 300,
  layout_matrix = NULL
)
data |
a data object (a data frame or a data.table) |
dv_name |
name of the dependent variable |
iv_1_name |
name of the first independent variable |
iv_2_name |
name of the second independent variable |
iv_1_values |
restrict all analyses to observations having these values for the first independent variable |
iv_2_values |
restrict all analyses to observations having these values for the second independent variable |
sigfigs |
number of significant digits to which to round values in anova table (default = 3) |
robust |
if |
iterations |
number of bootstrap samples for robust ANOVA. The default is set at 2000, but consider increasing the number of samples to 5000, 10000, or an even larger number, if longer processing time is not an issue. |
plot |
if |
error_bar |
if |
error_bar_range |
width of the confidence interval
(default = 0.95 for 95 percent confidence interval).
This argument will not apply when |
error_bar_tip_width |
graphically, width of the segments at the end of error bars (default = 0.13) |
error_bar_thickness |
thickness of the error bars (default = 1) |
error_bar_caption |
should a caption be included to indicate the width of the error bars? (default = TRUE). |
line_colors |
colors of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_types |
types of the lines connecting means (default = NULL)
If the second IV has two levels, then by default,
|
line_thickness |
thickness of the lines connecting group means, (default = 1) |
dot_size |
size of the dots indicating group means (default = 3) |
position_dodge |
by how much should the group means and error bars be horizontally offset from each other so as not to overlap? (default = 0.13) |
x_axis_title |
a character string for the x-axis title. If no
input is entered, then, by default, the first value of
|
y_axis_title |
a character string for the y-axis title. If no
input is entered, then, by default, |
y_axis_title_vjust |
position of the y axis title (default = 0.85).
By default, |
legend_title |
a character for the legend title. If no input
is entered, then, by default, the second value of |
legend_position |
position of the legend:
|
output |
output type can be one of the following: |
png_name |
name of the PNG file to be saved.
If |
width |
width of the PNG file (default = 7000) |
height |
height of the PNG file (default = 4000) |
units |
the units for the |
res |
The nominal resolution in ppi which will be recorded in the png file, if a positive integer. Used for units other than the default. If not specified, taken as 300 ppi to set the size of text and line widths. |
layout_matrix |
The layout argument for arranging plots and tables
using the |
Conduct a two-way factorial analysis of variance (ANOVA).
The following package(s) must be installed prior to running this function: Package 'car' v3.0.9 (or possibly a higher version) by Fox et al. (2020), https://cran.r-project.org/package=car
If robust ANOVA is to be conducted, the following package(s) must be installed prior to running the function: Package 'WRS2' v1.1-1 (or possibly a higher version) by Mair & Wilcox (2021), https://cran.r-project.org/package=WRS2
by default, the output will be "anova_table"
## Not run:
two_way_anova(
  data = mtcars,
  dv_name = "mpg",
  iv_1_name = "vs",
  iv_2_name = "am",
  iterations = 100)
anova_results <- two_way_anova(
  data = mtcars,
  dv_name = "mpg",
  iv_1_name = "vs",
  iv_2_name = "am",
  output = "all")
anova_results
## End(Not run)
A collection of miscellaneous functions lacking documentation
und(fn, ...)
fn |
name of the function |
... |
arguments for the function |
the output will vary by function
# correlation
und(corr_text, x = 1:5, y = c(1, 2, 2, 2, 3))
# mean center
und(mean_center, 1:10)
# compare results with base function
scale(1:10, scale = TRUE)
# find the modes
und(mode, c(3, 3, 3, 1, 2, 2))
# return values that are not outliers
und(outlier_rm, c(12:18, 100))
kim::outlier(c(1:10, 100))
Unload all user-installed packages
unload_user_installed_pkgs(exceptions = NULL, force = FALSE, keep_kim = TRUE)
exceptions |
a character vector of names of packages to keep loaded |
force |
logical. Should a package be unloaded even though other
attached packages depend on it? By default, |
keep_kim |
logical. If |
## Not run:
unload_user_installed_pkgs()
## End(Not run)
Updates the current package 'kim' by installing the most recent version of the package from GitHub. This function requires installing Package 'remotes' v2.4.2 (or possibly a higher version) by Csardi et al. (2021), https://cran.r-project.org/package=remotes
update_kim(force = TRUE, upgrade_other_pkg = FALSE, confirm = TRUE)
force |
logical. If |
upgrade_other_pkg |
input for the |
confirm |
logical. If |
there will be no output from this function. Rather, executing this function will update the current 'kim' package by installing the most recent version of the package from GitHub.
## Not run:
if (interactive()) {update_kim()}
## End(Not run)
Convert the variance of a log odds ratio to the variance of a Cohen's d (standardized mean difference), as illustrated in Borenstein et al. (2009, p. 47, ISBN: 978-0-470-05724-7)
var_of_log_odds_ratio_to_var_of_d(var_of_log_odds_ratio = NULL)
var_of_log_odds_ratio |
the variance of a log odds ratio (the input can be a vector of values) |
## Not run:
var_of_log_odds_ratio_to_var_of_d(1)
## End(Not run)
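Because d = log odds ratio × √3/π (Borenstein et al., 2009, p. 47), the variance scales by 3/π². The following hypothetical Python sketch illustrates the conversion; it is not the package's R source.

```python
# Hypothetical Python sketch of the log-odds-ratio-variance to d-variance
# conversion: V_d = V_logOR * 3 / pi^2. Not the package's R source.
import math

def var_of_log_odds_ratio_to_var_of_d(var_of_log_odds_ratio):
    return var_of_log_odds_ratio * 3 / math.pi ** 2
```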
Calculate the variance of a percentage. See Fowler, Jr. (2014, p. 34, ISBN: 978-1-4833-1240-8)
var_of_percentage(percent = NULL, n = NULL)
percent |
a vector of percentages; each of the percentage values must be between 0 and 100 |
n |
a vector of sample sizes; number of observations used to calculate each of the percentage values |
var_of_percentage(percent = 40, n = 50)
var_of_percentage(percent = 50, n = 10)
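The sampling variance of a percentage follows the familiar binomial form, p(100 − p)/n (Fowler, 2014, p. 34). A hypothetical Python sketch of the computation, not the package's R source:

```python
# Hypothetical Python sketch of the sampling variance of a percentage:
# p * (100 - p) / n. Not the package's R source.
def var_of_percentage(percent, n):
    return percent * (100 - percent) / n
```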
Calculate the variance of a proportion. See Anderson and Finn (1996, p. 364, ISBN: 978-1-4612-8466-6)
var_of_proportion(p = NULL, n = NULL)
p |
a vector of proportions; each of the proportion values must be between 0 and 1 |
n |
a vector of sample sizes; number of observations used to calculate each of the proportion values |
var_of_proportion(p = 0.56, n = 400)
var_of_proportion(p = 0.5, n = 100)
var_of_proportion(p = 0.4, n = 50)
var_of_proportion(p = c(0.5, 0.9), n = c(100, 200))
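As with the percentage version above, the variance of a proportion is p(1 − p)/n (Anderson & Finn, 1996, p. 364). A hypothetical Python sketch, not the package's R source:

```python
# Hypothetical Python sketch of the sampling variance of a proportion:
# p * (1 - p) / n. Not the package's R source.
def var_of_proportion(p, n):
    return p * (1 - p) / n
```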
Look up values in a reference data.table and return the values associated with those looked-up values in the reference data.table
vlookup(
  lookup_values = NULL,
  reference_dt = NULL,
  col_name_for_lookup_values = NULL,
  col_name_for_output_values = NULL
)
lookup_values |
a vector of values to look up |
reference_dt |
a data.table containing the values to look up as well as values associated with the looked-up values that need to be returned. |
col_name_for_lookup_values |
in the reference data.table,
name of the column containing |
col_name_for_output_values |
in the reference data.table, name of the column containing values to return (i.e., values associated with the looked-up values that will be the function's output) |
vlookup(
  lookup_values = c(2.620, 2.875),
  reference_dt = mtcars[1:9, ],
  col_name_for_lookup_values = "wt",
  col_name_for_output_values = "qsec")
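The lookup logic amounts to building a mapping from the lookup column to the output column and translating each lookup value through it. A hypothetical Python sketch with made-up data, not the package's R source; here duplicate lookup keys are resolved by the last occurrence, which may differ from the package's behavior.

```python
# Hypothetical Python sketch of a vlookup-style operation over rows
# represented as dicts; not the package's R source.
def vlookup(lookup_values, reference_rows, lookup_col, output_col):
    # map each value in the lookup column to the value in the output column
    mapping = {row[lookup_col]: row[output_col] for row in reference_rows}
    # unmatched lookup values yield None
    return [mapping.get(v) for v in lookup_values]
```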
Estimate the mean effect size in a meta-analysis, as illustrated in Borenstein et al. (2009, pp. 73-74, ISBN: 978-0-470-05724-7)
weighted_mean_effect_size(
  effect_sizes = NULL,
  effect_size_variances = NULL,
  ci = 0.95,
  one_tailed = FALSE,
  random_vs_fixed = "random"
)
effect_sizes |
effect sizes (e.g., standardized mean differences) |
effect_size_variances |
within-study variances |
ci |
width of the confidence interval (default = 0.95) |
one_tailed |
logical. If |
random_vs_fixed |
If |
## Not run:
weighted_mean_effect_size(
  effect_sizes = c(1, 2),
  effect_size_variances = c(3, 4))
weighted_mean_effect_size(
  effect_sizes = c(0.095, 0.277, 0.367, 0.664, 0.462, 0.185),
  effect_size_variances = c(0.033, 0.031, 0.050, 0.011, 0.043, 0.023))
# if effect sizes have a variance of 0, they will be excluded from
# the analysis
weighted_mean_effect_size(
  effect_sizes = c(1.1, 1.2, 1.3, 1.4),
  effect_size_variances = c(1, 0, 0, 4))
## End(Not run)
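For the random-effects case, each study is weighted by the inverse of its total variance, 1/(v_i + tau²), with tau² from the DerSimonian-Laird estimator (Borenstein et al., 2009, pp. 73-74). The following hypothetical Python sketch returns the weighted mean and its standard error; it is not the package's R source, and the CI and one-tailed options are omitted.

```python
# Hypothetical Python sketch of a random-effects weighted mean effect size
# (Borenstein et al., 2009, pp. 73-74); not the package's R source.
import math

def weighted_mean_effect_size(effect_sizes, variances):
    w = [1 / v for v in variances]
    sw = sum(w)
    mean_fixed = sum(wi * y for wi, y in zip(w, effect_sizes)) / sw
    q = sum(wi * (y - mean_fixed) ** 2 for wi, y in zip(w, effect_sizes))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max((q - (len(effect_sizes) - 1)) / c, 0)  # DerSimonian-Laird
    w_star = [1 / (v + tau2) for v in variances]      # random-effects weights
    mean_re = sum(wi * y for wi, y in zip(w_star, effect_sizes)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean_re, se
```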
Calculate the weighted mean correlation coefficient for given correlations and sample sizes. This function uses the Hedges-Olkin method with random effects. See Field (2001) doi:10.1037/1082-989X.6.2.161
weighted_mean_r(r = NULL, n = NULL, ci = 0.95, sigfigs = 3, silent = FALSE)
r |
a (vector of) correlation coefficient(s) |
n |
a (vector of) sample size(s) |
ci |
width of the confidence interval. Input can be any value
less than 1 and greater than or equal to 0. By default, |
sigfigs |
number of significant digits to round to (default = 3) |
silent |
logical. If |
the output will be a list of vectors of correlation coefficient(s).
weighted_mean_r(r = c(0.2, 0.4), n = c(100, 100))
weighted_mean_r(r = c(0.2, 0.4), n = c(100, 20000))
# example consistent with using MedCalc
weighted_mean_r(
  r = c(0.51, 0.48, 0.3, 0.21, 0.6, 0.46, 0.22, 0.25),
  n = c(131, 129, 155, 121, 111, 119, 112, 145))
Calculate the weighted z (for calculating the weighted mean correlation). See p. 231 of Hedges & Olkin (1985), Statistical Methods for Meta-Analysis (ISBN: 0123363802).
weighted_z(z = NULL, n = NULL)
z |
a vector of z values |
n |
a vector of sample sizes which will be used to calculate the weights, which in turn will be used to calculate the weighted z. |
the output will be a weighted z value.
weighted_z(1:3, c(100, 200, 300))
weighted_z(z = c(1:3, NA), n = c(100, 200, 300, NA))
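A hypothetical Python sketch of the weighted average, assuming inverse-variance weights of n − 3 as in the weighted mean correlation above (cf. Hedges & Olkin, 1985, p. 231); this is not the package's R source, and NA handling is omitted.

```python
# Hypothetical Python sketch of a weighted mean of Fisher's z values with
# weights of n - 3; not the package's R source.
def weighted_z(z, n):
    weights = [ni - 3 for ni in n]
    return sum(wi * zi for wi, zi in zip(weights, z)) / sum(weights)
```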
A nonparametric equivalent of the independent t-test
wilcoxon_rank_sum_test( data = NULL, iv_name = NULL, dv_name = NULL, sigfigs = 3 )
data |
a data object (a data frame or a data.table) |
iv_name |
name of the independent variable (grouping variable) |
dv_name |
name of the dependent variable (measure variable of interest) |
sigfigs |
number of significant digits to round to |
the output will be a data.table object with all pairwise Wilcoxon rank-sum test results
wilcoxon_rank_sum_test(
  data = iris, iv_name = "Species", dv_name = "Sepal.Length")
Write to a csv file
write_csv(data = NULL, name = NULL, timestamp = NULL)
data |
a data object (a data frame or a data.table) |
name |
a character string of the csv file name without the
".csv" extension. For example, if the csv file to write to is
"myfile.csv", enter |
timestamp |
logical. Should the timestamp be appended to the file name? |
the output will be a .csv file saved in the working directory,
that is, an output from the data.table function fwrite
## Not run:
write_csv(mtcars, "mtcars_from_write_csv")
write_csv(mtcars)
## End(Not run)
Calculate z-scores (i.e., standardize or obtain the standard scores)
z_score(x = NULL, na.rm = TRUE)
x |
a numeric vector |
na.rm |
logical. If |
the output will be a vector of z-scores.
z_score(1:10)
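Standardization subtracts the mean and divides by the standard deviation. A hypothetical Python sketch using the sample standard deviation (the n − 1 denominator, as in R's sd()); not the package's R source, and NA handling is omitted.

```python
# Hypothetical Python sketch of z-score standardization;
# not the package's R source.
import statistics

def z_score(x):
    m = statistics.mean(x)
    sd = statistics.stdev(x)  # sample SD (n - 1 denominator), as in R's sd()
    return [(xi - m) / sd for xi in x]
```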
Perform the Z-to-r transformation (i.e., the inverse of Fisher's r-to-Z transformation) for given Z value(s).
z_to_r_transform(z = NULL)
z |
a (vector of) Z values |
the output will be a vector of correlation coefficient(s) that are the result(s) of the Z-to-r transformation.
z_to_r_transform(2.646652)
z_to_r_transform(z = -3:3)
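The inverse Fisher transformation is the hyperbolic tangent: r = tanh(z) = (e^(2z) − 1) / (e^(2z) + 1). A hypothetical Python sketch, not the package's R source:

```python
# Hypothetical Python sketch of the Z-to-r (inverse Fisher) transformation;
# not the package's R source.
import math

def z_to_r_transform(z):
    return math.tanh(z)
```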