Factors

11 Oct 2022 in Notes / Dataanalytics / R / R Programming Language

Factors are different from Vectors

문자열 벡터와 팩터

문자열 벡터와 팩터

벡터의 cat & print

single, double quotations (인용보어) 참고
큰 인용보어는 문자열 안에 안쓰는게 맞는듯 하다..(예상대로 안나옴)
- 근데 cat에 사용하면 제대로 나옴!
작은 인용보어는 ““안에서 사용하거나 ' 로 하면 된다

> "The 'R' project"
 [1] "The 'R' project"
> 'The "R" project'
 [1] "The \"R\" project"

> a <- 'The \'R\' project for stats'
> b <- "THe \"R\" project for stats"
> a; b;
 [1] "THe 'R' project for stats"
 [1] "The \"R\" project for stats"

> print(a)
 [1] "The 'R' project for stats"
> cat(a, '\n')
 The 'R' project for stats
> print(b)
 [1] "The \"R\" project for stats"
> cat(b, '\n')
 The "R" project for stats

문자열과 벡터, print와 cat의 차이점을 잘 살펴보기

escape 문자

print에서는 \t, ", \n 예상대로 나오지 않는다
cat에서는 나옴

> a <- 'a\'b\"c\td\ne' ;  a  ;
[1] "a'b\"c\td\ne"

> cat(a, '\n')
a'b"c   d
e 

[참고] : print문은 변수 안에 저장 가능, cat함수는 저장 불가능 (선언 시 호출은 되나 다음 호출부터 출력 안됨)

Factor

> x <- rep(c("male", "female"), 5)
> x
 [1] "male"   "female" "male"   "female" "male"   "female" "male"   "female" "male"   "female"

> y <- factor(x)
> y
 [1] male   female male   female male   female male   female male   female
 Levels: female male
> class(y)
 [1] "factor"
> str(y)
 Factor w/ 2 levels "female", "male": 2 1 2 1 2 1 2 1 2 1

factor: 척도, 모두 문자열로 변환된다 => 인용보어가 (필요)없다
Level: 알파벳 오름차 순으로 표기한다. 그에 따른 인덱스를 반환
character 여부: x (vector) 은 is.character 시 TRUE, y (factor) 은 FALSE

> summary( x )
   Length     Class      Mode
       10 character character
> summary(y)
    female   male
        5      5
> table(x)
    x
    female   male
        5      5
> table(y)
    y
    female   male
        5      5

summary : 적절한 요약 통계를 알려준다
- vector: 통계를 내준다 (문자형: 빈도수, 수치형: quadrant별 통계)
- factor: 문자열이기 떄문에 숫에 의한 통계는 아닌, 빈도수를 추출해 준다.
table : factor의 summary와 비슷하지만, 테이블명이 같이 출력된다.
cat 함수 시:
- vector: 내용 출력 male female male female ...
- factor: Level의 index값 출력 2 1 2 1 ...
  - 단, cat( as.factor/character( y ), '\n')하면 vector과 됨
    - level항목 미출력
    - print도 동일

수치 벡터와 팩터

수치 벡터

> x <- rep(1:3, 3)
> x
[1] 1 2 3 1 2 3 1 2 3
> class(x)
[1] "integer"
> str(x)
int [1:9] 1 2 3 1 2 3 1 2 3
> summary(x)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       1       2       2       3       3

수치 팩터**

> y <- factor(x)
> y
[1] 1 2 3 1 2 3 1 2 3
Levels: 1 2 3
> str(y)
Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
> summary(y)
1 2 3
3 3 3

summary의 첫번째 줄(key): 레벨, 둘째줄(value): 빈도값

함수 levels

levels 출력 시 (팩터는 문자열) 벡터로 전환되며 인용보어 사용

> levels(x)
NULL                # 벡터는 level이 없다
> levels(y)
[1] "1" "2" "3"     # 문자열로 변환됨

> levels( rep(c("male", "female"), 5))
NULL
> levels( factor(rep(c("male", "female"), 5)))
[1] "female" "male"

Named Vector
Vecor에도 Level 명 비슷한거 만들 수 있음 / Factor에는 names() 불가능

Level

Factor꺼 / Vector에는 levels()불가능

> names(x) <- LETTERS[1:length(x)]
A B C D E F G H I
1 2 3 1 2 3 1 2 3
> class(x)
[1] "integer"
> str(x)
Named int [1:9] 1 2 3 ...
- attr(*, "names") = chr [1:9] "A" ...
> levels(x)
NULL
> names(x)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I"
> names(y)
NULL

vector, named vector, factor의 차이점을 명확하게 알기

팩터 수준 순서 및 값 편집

값 편집

Level에 없는 값으로 변경/추가 시 Warning
해당 펙터 호출 시 값으로 출력 (바뀐 부분만)

> x
[1] "mid" "low" "mid"
> y
[1] mid low mid
Levels: low mid

> x[3] <- "high"    #문제 없음
> y[3] <- "high"    #Warning 뜸. LEvel에 없는 거
> y[3] <- "low"     # 문제 없음

Level 값 추가

중복값 추가 시 무시함

> y <- factor( y, levels = c(levels(y), "high"))
> y[3] <- "high"    #문제 없음
> y
[1] mid  low  high
Levels: low mid high

Level 값 변경

주의: 값도 함께 바뀐다

> levels(y)[1:3] <- c("nazeum", "zhonggan", "nopem")
> y
[1] zhonggan nazeum   nopem
Levels: nazeum zhonggan nopem

labels 로 초기에 레벨 명 변경하기

> y <- factor(y, levels = c("low", "mid", "high", "very.high"),
+                labels = c( "SMALL", "MEDIUM", "LARGE", "HUGE")); y;
[1] MEDIUM SMALL  MEDIUM
Levels: SMALL MEDIUM LARGE HUGE
#다시 변경
> levels(y)[1:3] <- c("low", "mid", "high"); y;
[1] mid low mid
Levels: low mid high HUGE

as.factor

> a <- sample(1:3, 20, replace = T)
> str(a)
 int [1:20] 1 3 1 2 1 1 3 3 1 2 ...
> summary(a)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.0     2.0     2.1     3.0     3.0 
> b <- as.character(a)
> str(b)
 chr [1:20] "1" "3" "1" "2" "1" "1" "3" "3" "1" "2" "1" "2" "2" "3" "3" ...
> summary(b)
   Length     Class      Mode 
       20 character character 
> c <- as.factor(a)
> str(c)
 Factor w/ 3 levels "1","2","3": 1 3 1 2 1 1 3 3 1 2 ...
> summary(c)
1 2 3 
6 6 8 

연습문제

> x <- factor( rep(c("long", "intmed", "short"), 1:3))
> x
[1] long   intmed intmed short  short  short
Levels: intmed long short

1.변수명 x에 대해 Levels의 순서를 다음과 같이 변환: Levels: short intmed long

  > x <- factor( x, levels = levels(x)[ c( 3, 1, 2 )])
  > x
  [1] long   intmed intmed short  short  short 
  Levels: short intmed long

–밑에 코드 작성 시 값도 함께 바뀜–

OR: levels(x) <- levels(x)[c(3,1,2)]
OR: levels(x)[1:3] <- c("short", "intmed", "long")
OR: levels(x) <- c("short", "intmed", "long")

2. intmed를 intermed로 바꾸기

> levels(x)[2] <- "intermed"
> x
[1] long    intermed   intermed short    short    short   
Levels: short intermed long

Factors

문자열 벡터와 팩터

벡터의 cat & print

Factor

수치 벡터와 팩터

함수 levels

팩터 수준 순서 및 값 편집

as.factor

연습문제

HYPNOTES

Error

문자열 벡터와 팩터

벡터의 cat & print

Factor

수치 벡터와 팩터

함수 levels

팩터 수준 순서 및 값 편집

as.factor

연습문제

Templates (for web app):

Error