Getting Started
Installation
End users
pip install distributed-random-forest
Contributors
git clone https://github.com/Bowenislandsong/distributed_random_forest
cd distributed_random_forest
python -m pip install -e ".[dev,docs]"
First Federated Run
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from distributed_random_forest import FederatedRandomForest

# Synthetic 3-class dataset with 20 features.
X, y = make_classification(
    n_samples=1200,
    n_features=20,
    n_classes=3,
    n_informative=10,
    random_state=42,
)

# Hold out a stratified 20% test set for final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y,
)

# Split the training data across 4 simulated clients.
model = FederatedRandomForest(
    n_clients=4,
    rf_params={"n_estimators": 24, "random_state": 42, "voting": "weighted"},
    partition_strategy="dirichlet",
    partition_kwargs={"alpha": 0.5},
    aggregation_strategy="auto",
    execution_backend="thread",
    max_workers=4,
    random_state=42,
)
model.fit(X_train, y_train)
metrics = model.evaluate(X_test, y_test)

# Aggregation strategy picked by the "auto" setting.
print(model.selected_strategy)
print(metrics)
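The example above sets partition_strategy="dirichlet" with partition_kwargs={"alpha": 0.5}. Assuming this follows the common Dirichlet label-skew scheme used in federated learning benchmarks, the sketch below shows the idea: for each class, draw client proportions from Dir(alpha) and deal that class's samples out accordingly. The function dirichlet_partition is hypothetical and is not the library's implementation.

```python
import numpy as np


def dirichlet_partition(y, n_clients, alpha, seed=None):
    """Hypothetical helper: split sample indices across clients with
    per-class proportions drawn from a Dirichlet(alpha) distribution."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    client_indices = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        # Share of this class that each client receives.
        proportions = rng.dirichlet(np.full(n_clients, alpha))
        # Turn cumulative shares into split points within idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


# Usage: inspect the per-client class histogram for a balanced toy label vector.
y_demo = np.repeat([0, 1, 2], 100)
for client, ix in enumerate(dirichlet_partition(y_demo, n_clients=4, alpha=0.5, seed=42)):
    print(client, np.bincount(y_demo[ix], minlength=3))
```

Smaller alpha concentrates each class on fewer clients (stronger non-IID skew), while a large alpha approaches an even split. This sketch does not guard against a client receiving very few samples, which practical implementations often handle by resampling.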
Local Quality Checks
make test
make lint
make docs
make build
If you do not use make, the equivalent commands are:
python -m pytest tests -q
python -m ruff check .
python -m mkdocs build --strict
python -m build
Differential Privacy
Differential privacy (DP) is optional. The built-in DP mode needs no extra packages; enable it with use_differential_privacy=True and set the privacy budget with epsilon:
model = FederatedRandomForest(
    n_clients=5,
    rf_params={"n_estimators": 16, "random_state": 7},
    use_differential_privacy=True,
    epsilon=2.0,
)
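This page does not say which mechanism the built-in DP mode uses. To illustrate what epsilon controls, here is a minimal sketch of the standard Laplace mechanism applied to per-class vote counts, such as those stored in a tree leaf. It is an assumption for illustration only, not the library's implementation, and laplace_noisy_counts is a hypothetical helper. The noise scale is sensitivity / epsilon, so epsilon=2.0 with sensitivity 1 adds Laplace noise of scale 0.5.

```python
import numpy as np


def laplace_noisy_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Hypothetical helper: release counts under epsilon-DP with the
    Laplace mechanism (illustrative, not the library's DP code)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Noise scale b = sensitivity / epsilon: smaller epsilon, more noise.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=counts.shape)
    # Clipping at zero is post-processing, so it keeps the DP guarantee.
    return np.clip(counts + noise, 0.0, None)


# Usage: noisy class counts for one leaf under the budget used above.
print(laplace_noisy_counts([12, 3, 0], epsilon=2.0, seed=7))
```

Rerunning with a smaller epsilon widens the noise, which is the privacy/accuracy trade-off the epsilon parameter exposes.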
If you want external privacy tooling as well, install the optional extra:
python -m pip install -e ".[privacy]"
Reports
Every orchestrated run can export a JSON report:
model.export_report("artifacts/federated-run.json")
That report includes:
- client sample counts and training metrics
- partition summaries
- evaluated aggregation strategies
- validation and final test metrics
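The list above describes what the report contains but not its exact key names, so the hypothetical helper below inspects an exported report without assuming a schema. It loads the file with the standard json module and maps each top-level key to the type of its value; summarize_report is illustrative and not part of the library.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Hypothetical helper: map each top-level key of a JSON report
    to the Python type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Report is not a JSON object; describe the root value instead.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


# Usage, after model.export_report("artifacts/federated-run.json"):
# print(summarize_report("artifacts/federated-run.json"))
```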