[ML] Mondrian Forest - API reference / Properties

노는게제일좋아! 2019. 9. 4. 21:03

This post summarizes the scikit-garden API, focusing mainly on the Mondrian Forest Regressor.

skgarden.mondrian.MondrianForestRegressor

Parameters
- n_estimators (integer, optional (default=10)): The number of trees in the forest.
- max_depth (integer, optional (default=None)): The depth to which each tree is grown. If None, the tree is either grown to full depth or constrained by min_samples_split.
- min_samples_split (integer, optional (default=2)): A node with fewer than min_samples_split samples stops growing, i.e. it is not split further.
- bootstrap (boolean, optional (default=False)): If False, every tree is trained on the entire training dataset. If True, each tree is fit on n_samples drawn with replacement from the training dataset (bagging: each tree's training set is built by sampling with duplicates allowed).
- random_state (int, RandomState instance or None, optional (default=None)): If int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator itself. If None, the random number generator is the RandomState instance used by np.random.
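To make the bootstrap and random_state options concrete, here is a minimal NumPy sketch. The helpers to_random_state and bootstrap_indices are hypothetical names, not part of skgarden: to_random_state mirrors the three documented random_state cases (scikit-learn ships the same logic as sklearn.utils.check_random_state), and bootstrap_indices draws n_samples row indices with replacement, which is what bootstrap=True describes.

```python
import numbers

import numpy as np


def to_random_state(seed):
    """Hypothetical helper following the documented random_state rules."""
    if seed is None:
        # None: reuse the global RandomState instance behind np.random
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # int: seed a fresh generator with it
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # RandomState instance: use it as the generator directly
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


def bootstrap_indices(n_samples, random_state=None):
    """Draw n_samples row indices with replacement (the bootstrap=True case)."""
    rng = to_random_state(random_state)
    return rng.randint(0, n_samples, n_samples)


# bootstrap=False: every tree sees each row 0..7 exactly once
all_rows = np.arange(8)
# bootstrap=True: typically some rows repeat and others are left out
sampled_rows = bootstrap_indices(8, random_state=0)
print(all_rows)
print(sampled_rows)
```

With a fixed integer seed the drawn indices are reproducible, which is why passing an int to random_state makes the bootstrap samples repeatable.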

Methods

MondrianForestRegressor.fit(X, y): Builds a forest from the training set (X, y).
- Parameters
  - X (array-like or sparse matrix of shape = [n_samples, n_features]): The training input samples. Internally, its dtype is converted to dtype=np.float32. If a sparse matrix is provided, it is converted into a sparse csc_matrix.
  - y (array-like, shape = [n_samples] or [n_samples, n_outputs]): The target values (class labels in classification, real numbers in regression).
  - sample_weight (array-like, shape = [n_samples] or None): Sample weights. If None, all samples are weighted equally. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- Returns
  - self (object): Returns the estimator itself.
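The dtype and sparse-format conversion that fit applies to X can be sketched with NumPy and SciPy. prepare_X is a hypothetical helper that only reproduces the two documented rules; it is not skgarden's internal code. With scikit-garden installed, the real entry point would be MondrianForestRegressor from skgarden.mondrian, called as fit(X, y).

```python
import numpy as np
import scipy.sparse as sp


def prepare_X(X):
    """Hypothetical helper applying the documented conversions to X."""
    if sp.issparse(X):
        # sparse input: convert to CSC format with float32 values
        return X.tocsc().astype(np.float32)
    # dense input: convert to a float32 ndarray
    return np.asarray(X, dtype=np.float32)


dense = prepare_X([[1, 2], [3, 4]])
sparse = prepare_X(sp.csr_matrix(np.eye(3)))
print(dense.dtype, sparse.format, sparse.dtype)  # float32 csc float32
```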
MondrianForestRegressor.partial_fit(X, y): Incrementally builds the Mondrian Forest Regressor.
- Parameters
  - X (array-like, shape = [n_samples, n_features]): Input samples. Internally converted to dtype=np.float32.
  - y (array-like, shape = [n_samples]): Input targets.
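partial_fit is meant to be called repeatedly as new data arrives. The sketch below feeds data to any object exposing partial_fit(X, y) one mini-batch at a time. fit_incrementally and RecordingModel are hypothetical stand-ins so the example runs without scikit-garden installed; a MondrianForestRegressor instance could be passed in place of RecordingModel.

```python
import numpy as np


def fit_incrementally(model, X, y, batch_size):
    """Feed (X, y) to model.partial_fit one mini-batch at a time."""
    X = np.asarray(X, dtype=np.float32)  # partial_fit also uses float32 internally
    y = np.asarray(y)
    for start in range(0, X.shape[0], batch_size):
        stop = start + batch_size
        model.partial_fit(X[start:stop], y[start:stop])
    return model


class RecordingModel:
    """Stand-in estimator that only records the batch sizes it receives."""

    def __init__(self):
        self.batch_sizes = []

    def partial_fit(self, X, y):
        self.batch_sizes.append(len(y))
        return self


recorder = fit_incrementally(RecordingModel(), np.zeros((10, 2)), np.zeros(10), batch_size=4)
print(recorder.batch_sizes)  # [4, 4, 2]
```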
MondrianForestRegressor.predict(X, return_std=False)
- Returns the predicted mean and, optionally, the standard deviation.
- For the formulas used to compute the mean and variance, see the source.
- Parameters
  - X (array-like, shape = [n_samples, n_features]): Input samples.
  - return_std (boolean, default=False): Whether to also return the standard deviation.
- Returns
  - y (array-like, shape = (n_samples,)): Predictions for X.
  - std (array-like, shape = (n_samples,)): Standard deviation of the predictions for X.
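The exact formulas are in the source. As a hedged sketch, assuming the forest treats its T trees as an equal-weight mixture with per-tree means μ_t and standard deviations σ_t, the combined mean is the average of the μ_t, and the combined variance is the average of (σ_t² + μ_t²) minus the squared combined mean. forest_mean_std is a hypothetical helper, not skgarden's API.

```python
import numpy as np


def forest_mean_std(tree_means, tree_stds):
    """Combine per-tree predictions as an equal-weight mixture.

    tree_means, tree_stds: arrays of shape (n_estimators, n_samples).
    """
    tree_means = np.asarray(tree_means, dtype=float)
    tree_stds = np.asarray(tree_stds, dtype=float)
    # mixture mean: average of the tree means
    mean = tree_means.mean(axis=0)
    # mixture second moment: average of (variance + mean^2) over trees
    second_moment = (tree_stds ** 2 + tree_means ** 2).mean(axis=0)
    # clip tiny negative values caused by floating-point round-off
    var = np.maximum(second_moment - mean ** 2, 0.0)
    return mean, np.sqrt(var)


mean, std = forest_mean_std([[0.0, 1.0], [2.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]])
print(mean, std)  # [1. 1.] [1. 1.]
```

In the first sample the trees disagree (means 0 and 2, zero spread each), yet the mixture still reports std 1, so disagreement between trees shows up as predictive uncertainty.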
MondrianForestRegressor.weighted_decision_path(X)
- Returns the weighted decision path in the forest.
- Each non-zero value in the decision path determines the weight of that particular node while making predictions.
- Parameters
  - X (array-like, shape = (n_samples, n_features)): Input samples.
- Returns
  - decision_path (sparse csr matrix, shape = (n_samples, n_total_nodes)): A node indicator matrix whose non-zero elements give the weight of each node when making predictions.
  - est_inds (array-like, shape = (n_estimators + 1,)): weighted_decision_path[:, est_inds[i]: est_inds[i + 1]] gives the weighted decision path of estimator i.
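The est_inds boundaries slice the forest-wide matrix into one block per tree. The sketch below builds a small, made-up weighted decision path for two trees with SciPy and slices out each tree's block. Assuming each node stores a mean value (node_means is made-up data), it then forms that tree's prediction as the weight-averaged node mean. tree_block is a hypothetical helper.

```python
import numpy as np
import scipy.sparse as sp

# Made-up weighted decision paths for 2 samples in a 2-tree forest.
# Each tree has 3 nodes (root 0, children 1 and 2); each row of a block sums to 1.
tree0 = sp.csr_matrix(np.array([[0.2, 0.8, 0.0],
                                [0.5, 0.0, 0.5]]))
tree1 = sp.csr_matrix(np.array([[0.6, 0.4, 0.0],
                                [0.3, 0.0, 0.7]]))
decision_path = sp.hstack([tree0, tree1], format="csr")
est_inds = np.array([0, 3, 6])  # n_estimators + 1 column boundaries

# Made-up per-node mean values, in the same column order as decision_path.
node_means = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 6.0])


def tree_block(decision_path, est_inds, i):
    """Columns of the forest-wide matrix that belong to estimator i."""
    return decision_path[:, est_inds[i]:est_inds[i + 1]]


for i in range(len(est_inds) - 1):
    block = tree_block(decision_path, est_inds, i)
    # assumption: tree prediction = weight-averaged node means along the path
    tree_mean = block @ node_means[est_inds[i]:est_inds[i + 1]]
    print(i, block.shape, tree_mean)
```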

skgarden.mondrian.MondrianTreeRegressor

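In a Mondrian tree regressor, splits differ from those of a standard regression tree in the following ways.

At fit time:
- Splits are made independently of the labels.
- The candidate feature is always drawn with probability proportional to its range.
- The candidate threshold is drawn from a uniform distribution whose bounds are the bounds of the candidate feature.
- The split time, which is proportional to the inverse of the size of the bounding box, is also stored.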
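The fit-time rules above can be sketched as a single split-sampling step. Assumption: following the standard Mondrian process, the split time increment is drawn from an exponential distribution whose rate is the sum of the feature ranges (the box's linear dimension), so its expected value is the inverse of the box size; a node's split time would then be its parent's split time plus this increment. sample_mondrian_split is a hypothetical name, not skgarden's internal function.

```python
import numpy as np


def sample_mondrian_split(lower, upper, rng):
    """Sample one Mondrian split for a node with bounding box [lower, upper].

    Returns (split_cost, feature, threshold). The labels are never used.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    ranges = upper - lower
    linear_dim = ranges.sum()  # sum of feature ranges: the box's "size"
    if linear_dim <= 0:
        # degenerate box (all points identical): no split can be drawn
        return np.inf, None, None
    # exponential with rate = linear_dim, so its mean is 1 / linear_dim
    split_cost = rng.exponential(scale=1.0 / linear_dim)
    # feature chosen with probability proportional to its range
    feature = int(rng.choice(len(ranges), p=ranges / linear_dim))
    # threshold drawn uniformly between that feature's bounds
    threshold = rng.uniform(lower[feature], upper[feature])
    return split_cost, feature, threshold


rng = np.random.RandomState(0)
print(sample_mondrian_split([0.0, 0.0], [1.0, 3.0], rng))
```

A wider box yields smaller expected split costs, so large, spread-out regions tend to be split early in the tree.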
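At prediction time:
- Every node on the path from the root to the leaf receives a weight while the prediction is made.
- At each node, the probability that an unseen sample splits away from that node is computed.
- The farther the sample lies from the node's bounding box, the more likely it is to split away.
- Each node's weight is the probability that the unseen sample has not split away before reaching that node, multiplied by the probability that it splits away at that node.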
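The prediction-time weighting can be sketched as follows. Assumption: as in the Mondrian process formulation, the separation probability at a node is 1 - exp(-Δτ · η), where Δτ is the node's split time minus its parent's and η is the total distance by which the sample lies outside the node's bounding box; the leaf absorbs whatever probability remains so the weights sum to 1. separation_prob and path_weights are hypothetical helpers.

```python
import numpy as np


def separation_prob(x, lower, upper, delta_tau):
    """P(unseen sample x splits away at a node with box [lower, upper])."""
    # eta: total distance by which x lies outside the box (0 if inside)
    eta = np.sum(np.maximum(x - upper, 0.0) + np.maximum(lower - x, 0.0))
    return 1.0 - np.exp(-delta_tau * eta)


def path_weights(x, internal_nodes):
    """Weights for the nodes on x's root-to-leaf path.

    internal_nodes: [(lower, upper, delta_tau), ...] from the root down,
    excluding the leaf. Returns one weight per internal node, then the leaf's.
    """
    x = np.asarray(x, dtype=float)
    weights = []
    not_separated = 1.0  # P(x has not split away at any node above)
    for lower, upper, delta_tau in internal_nodes:
        p = separation_prob(x, np.asarray(lower, dtype=float),
                            np.asarray(upper, dtype=float), delta_tau)
        # weight = P(not split away earlier) * P(split away here)
        weights.append(not_separated * p)
        not_separated *= 1.0 - p
    # assumption: the leaf keeps all remaining probability mass
    weights.append(not_separated)
    return np.array(weights)


nodes = [([0.0], [1.0], 0.5), ([0.0], [0.5], 1.0)]  # made-up root and child
print(path_weights([0.2], nodes))  # inside every box: all weight on the leaf
print(path_weights([3.0], nodes))  # outside: weight shifts toward the root
```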

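Parameters
- max_depth (int or None, optional (default=None)): same meaning as in MondrianForestRegressor
- min_samples_split (int, float, optional (default=2)): same meaning as in MondrianForestRegressor
- random_state: same meaning as in MondrianForestRegressor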

Methods

MondrianTreeRegressor.apply(X, check_input=True)
- Parameters
  - X
  - check_input (boolean, (default=True)): Allows bypassing several input checks. Don't use this parameter unless you know what you are doing.
- Returns
  - X_leaves (array_like, shape = [n_samples]): For each data point x in X, returns the index of the leaf x ends up in. Leaves are numbered within [0; self.tree_.node_count), possibly with gaps in the numbering.
MondrianTreeRegressor.decision_path(X, check_input=True)
- Returns the decision path in the tree.
- Parameters
  - X
  - check_input
- Returns
  - indicator (sparse csr array, shape = [n_samples, n_nodes]): A node indicator matrix whose non-zero elements indicate that the sample passes through the corresponding node.
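The node indicator matrix is an ordinary SciPy CSR matrix with one row per sample and one column per node. The sketch below builds one from explicit root-to-leaf paths; paths_to_indicator and the example paths are hypothetical and only illustrate the data structure that decision_path returns.

```python
import numpy as np
import scipy.sparse as sp


def paths_to_indicator(paths, n_nodes):
    """Build an (n_samples, n_nodes) CSR indicator from node-id paths.

    paths[i] lists the node ids sample i passes through (root to leaf).
    """
    indptr = np.zeros(len(paths) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum([len(p) for p in paths])  # row boundaries
    indices = np.concatenate([np.asarray(p, dtype=np.int64) for p in paths])
    data = np.ones(len(indices), dtype=np.int8)  # 1 = "passes through this node"
    return sp.csr_matrix((data, indices, indptr), shape=(len(paths), n_nodes))


ind = paths_to_indicator([[0, 1], [0, 2, 4]], n_nodes=5)
print(ind.toarray())
print(ind[1].indices)  # node ids visited by sample 1
```

Reading ind[i].indices back gives the node ids that sample i passes through, which is usually more convenient than scanning a dense row.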
MondrianTreeRegressor.fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
MondrianTreeRegressor.partial_fit(X, y)
- Incremental building of the Mondrian tree regressor.
MondrianTreeRegressor.predict(X, check_input=True, return_std=False)
MondrianTreeRegressor.weighted_decision_path(X, check_input=True)

Source: https://scikit-garden.github.io/api/
