January 15, 2021

R Factors

Factors in R

Categorise values into discrete categories.
Factors are used to store categorical variables.
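Before R 4.0.0, data.frame() and read.csv() converted character columns to factors by default (stringsAsFactors = TRUE). Since R 4.0.0 the default is FALSE, so strings stay character unless you convert them explicitly.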
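
As a rough illustration of what "storing a categorical variable" means, the sketch below builds a factor from a small made-up vector (`sizes` is a hypothetical name, not from the original post) and looks at the two parts it is made of: the set of levels and the integer codes that point into them.

```r
# Hypothetical example data: shirt sizes stored as plain strings
sizes <- c("M", "S", "L", "M", "S")
size_factor <- factor(sizes)

levels(size_factor)      # distinct categories, sorted: "L" "M" "S"
as.integer(size_factor)  # integer codes into the levels: 2 3 1 2 3
nlevels(size_factor)     # number of levels: 3
```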

factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
dice_vector <- c(3, 2, 4, 4, 5, 3, 2, 6, 8, 6, 2, 6, 5, 3, 4, 5)
dice_factor <-  factor(dice_vector)
logical_levels <- factor(c(TRUE,FALSE,TRUE,FALSE,TRUE,TRUE))
x <- factor(c("single", "married", "married", "single"))
x <- factor(c("single", "married", "married", "single"), levels = c("single", "married", "divorced"))
factor(letters[1:20], labels = "letter")
(x <- factor(c(1, 2, NA), exclude = NULL))
my_factor <- factor(my_vector, levels = c("S", "M", "L"), labels = c("Small", "Medium", "Large"))
factor(z, exclude = "B") 
factor_temp <- factor(temp_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
red$age <- factor(red$age, levels = c('19', '18-24', '25-34', '35-44', '45-54', '55-64', '65'), ordered = TRUE)
a_factor <- factor(sample(c(LETTERS[1:5], NA), 50, replace = TRUE))
bank$marital <- factor(bank$marital)
y <- gl(3, 6, labels = c("apple", "salad", "orange"))    # 3 levels, each repeated 6 times
as.factor(x)
as.factor(vector22)
as.factor(c(rep("male", 8), rep("female", 3)))
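
The `levels`, `labels` and `exclude` arguments used above are easier to follow with concrete input. The sketch below assumes made-up contents for `my_vector` and `z` (the original post does not show them), so treat the values as illustrative only.

```r
# Hypothetical input: "XL" is deliberately not among the declared levels
my_vector <- c("S", "L", "M", "S", "XL")
my_factor <- factor(my_vector,
                    levels = c("S", "M", "L"),
                    labels = c("Small", "Medium", "Large"))
as.character(my_factor)   # "Small" "Large" "Medium" "Small" NA

# exclude drops a value from the levels; matching elements become NA
z <- c("A", "B", "C", "B")
z_factor <- factor(z, exclude = "B")
levels(z_factor)          # "A" "C"
is.na(z_factor)           # FALSE TRUE FALSE TRUE
```

Values that are not listed in `levels` are turned into NA without a warning, so it is worth checking for unexpected NAs after the conversion.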

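factor() works on a vector, typically a single column of a data frame. It cannot be applied to a row of a data frame, because a row is itself a list (a one-row data frame), not an atomic vector.
By default, factor() turns a vector into an unordered factor. Unordered factors support == and !=, but ordering comparisons such as < and > return NA with a warning.
To create an ordered factor, set ordered = TRUE (or call ordered()), and use the levels argument to give the order from lowest to highest.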
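
A small sketch of the difference between unordered and ordered factors when comparing values. The contents of `temp_vector` are assumed here for illustration.

```r
temp_vector <- c("High", "Low", "Medium", "Low")   # hypothetical values

unordered_temp <- factor(temp_vector)
ordered_temp <- factor(temp_vector,
                       levels = c("Low", "Medium", "High"),
                       ordered = TRUE)

# Equality tests work on any factor
unordered_temp[1] == "High"                               # TRUE

# Ordering comparisons on an unordered factor give NA (plus a warning)
suppressWarnings(unordered_temp[1] < unordered_temp[2])   # NA

# Ordered factors compare by position in the levels
ordered_temp[2] < ordered_temp[1]                         # TRUE (Low < High)
ordered_temp > "Low"                                      # TRUE FALSE TRUE FALSE
```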

x[2] <- "divorced"    # modify the second element of x (the value must be an existing level)
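
The assignment above only works because "divorced" was declared as a level. The sketch below shows what happens otherwise; `x2` is a hypothetical copy introduced just for the demonstration.

```r
x <- factor(c("single", "married", "married", "single"),
            levels = c("single", "married", "divorced"))

x[2] <- "divorced"                      # fine: "divorced" is a declared level

x2 <- x
suppressWarnings(x2[3] <- "widowed")    # not a level: the element becomes NA
is.na(x2[3])                            # TRUE

# Extend the levels first, then the assignment succeeds
levels(x) <- c(levels(x), "widowed")
x[3] <- "widowed"
as.character(x)                         # "single" "divorced" "widowed" "single"
```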

levels(dice_factor)
levels(x) <- c(levels(x), "widowed")    # add new level
levels(x)[2] <- "high" # renaming second level
levels(factor_survey_factor) <- c("Female", "Male") # renaming all levels
levels(z) <- list("fruit" = c("apple", "orange"), "veg" = "salad")
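
Assigning a named list to levels() merges several old levels into one new level. A minimal sketch, assuming `z` holds the fruit and vegetable names used in the line above:

```r
z <- factor(c("apple", "salad", "orange", "apple"))   # assumed contents of z
levels(z)            # "apple" "orange" "salad"

# Each list name is a new level; its value lists the old levels to merge
levels(z) <- list("fruit" = c("apple", "orange"), "veg" = "salad")

levels(z)            # "fruit" "veg"
as.character(z)      # "fruit" "veg" "fruit" "fruit"
```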

ds_dropped_vals <- droplevels(ds$colname) # removes unused levels of factor variables
email50_big$number <- droplevels(email50_big$number)
male <- droplevels(male)
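
Why droplevels() is needed: subsetting a factor keeps all of its original levels, even those with no remaining observations. The names below are made up for this sketch.

```r
sizes <- factor(c("S", "M", "L", "M"))   # hypothetical data

small_medium <- sizes[sizes != "L"]
levels(small_medium)     # still "L" "M" "S"
table(small_medium)      # L is listed with a count of 0

trimmed <- droplevels(small_medium)
levels(trimmed)          # "M" "S"
```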

summary(dice_factor)
summary(as.factor(lst))
summary(attenu$station, maxsum = 20)
s <- summary(factor(red, levels = c(0, 1, 2, 3, 4)))
str(FACTOR_R)
is.factor(x)

ordered(x, ...)
ordered(red$age, levels=c('19', '18-24', '25-34', '35-44', '45-54', '55-64', '65'))
mydata$v1 <- ordered(mydata$y, levels = c(1, 3, 5), labels = c("Low", "Medium", "High"))
is.ordered(r_factor)
as.ordered(x)

addNA(x, ifany = FALSE)
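
addNA() turns missing values into a level of their own, which is handy when NAs should show up in counts. A short sketch with made-up data:

```r
f <- factor(c("a", NA, "b", "a"))   # hypothetical data with one missing value
levels(f)            # "a" "b" (NA is not a level)
table(f)             # counts a and b only

f_na <- addNA(f)
levels(f_na)         # "a" "b" NA
table(f_na)          # a: 2, b: 1, <NA>: 1

# With ifany = TRUE the NA level is added only if NAs are present
no_missing <- addNA(factor(c("a", "b")), ifany = TRUE)
levels(no_missing)   # "a" "b"
```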

Elements of a factor are accessed the same way as elements of a vector: use [ ] with positions, negative positions, or logical conditions.
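
For example, reusing dice_factor from earlier in this post (`sixes` is a name introduced only for this sketch):

```r
dice_vector <- c(3, 2, 4, 4, 5, 3, 2, 6, 8, 6, 2, 6, 5, 3, 4, 5)
dice_factor <- factor(dice_vector)

dice_factor[3]                 # third element: 4
dice_factor[c(1, 4)]           # first and fourth elements: 3 4
dice_factor[-1]                # everything except the first element

# Logical indexing; levels are character, so compare against "6"
sixes <- dice_factor[dice_factor == "6"]
length(sixes)                  # 3

# drop = TRUE discards levels that no longer occur in the result
levels(dice_factor[3, drop = TRUE])   # "4"
```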

for (i in factor_columns) {
    bank[[i]] <- factor(bank[[i]])
}
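
The same conversion can be written without an explicit loop. The sketch below uses a tiny stand-in for `bank`, and the contents of `factor_columns` are assumed, since the original post shows neither.

```r
# Hypothetical stand-in for the bank data set
bank <- data.frame(marital = c("single", "married", "single"),
                   job     = c("admin", "tech", "admin"),
                   age     = c(30, 41, 25),
                   stringsAsFactors = FALSE)
factor_columns <- c("marital", "job")   # assumed: names of columns to convert

# lapply() applies factor() to each selected column
bank[factor_columns] <- lapply(bank[factor_columns], factor)

sapply(bank, class)   # marital and job are "factor", age stays "numeric"
```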

table(ds$colname)
table(email50_big$number)

mean(ds$colname)    # numeric columns only; on a factor this returns NA with a warning
med_num_char <- median(email50$num_char)
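
table() counts the observations per level, but mean() and median() need numeric input. When a factor holds numbers, a common approach is to convert through character first. The sketch below uses a shortened, made-up dice factor.

```r
dice_factor <- factor(c(3, 2, 4, 4, 5))   # hypothetical short example

table(dice_factor)                        # counts per level: 2, 3, 4, 5

suppressWarnings(mean(dice_factor))       # NA: mean() is not defined for factors

as.numeric(dice_factor)                   # integer codes, not the values: 2 1 3 3 4
dice_values <- as.numeric(as.character(dice_factor))
dice_values                               # original values: 3 2 4 4 5
mean(dice_values)                         # 3.6
```

Calling as.numeric() directly on a factor returns the internal codes, which is rarely what is wanted.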

