Getting started
Installation
End users
```bash
pip install distributed-random-forest
```
Contributors
```bash
git clone https://github.com/Bowenislandsong/distributed_random_forest
cd distributed_random_forest
python -m pip install -e ".[dev,docs]"
```
A plain editable install (`python -m pip install -e .`) also works. Add optional extras such as `dev`, `docs`, or `privacy` only as needed.
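First federated run
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from distributed_random_forest import FederatedRandomForest

# Synthetic 3-class problem with 20 features.
X, y = make_classification(
    n_samples=1200,
    n_features=20,
    n_classes=3,
    n_informative=10,
    random_state=42,
)

# Hold out 20% of the data for testing, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y,
)

model = FederatedRandomForest(
    n_clients=4,
    # Random-forest hyperparameters.
    rf_params={"n_estimators": 24, "random_state": 42, "voting": "weighted"},
    # Dirichlet partitioning: a smaller alpha gives more skewed (non-IID) client splits.
    partition_strategy="dirichlet",
    partition_kwargs={"alpha": 0.5},
    # Let the library choose the aggregation strategy.
    aggregation_strategy="auto",
    # Thread-based execution with up to 4 workers.
    execution_backend="thread",
    max_workers=4,
    random_state=42,
)

model.fit(X_train, y_train)
metrics = model.evaluate(X_test, y_test)
print(model.selected_strategy)  # strategy picked by aggregation_strategy="auto"
print(metrics)
```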
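The `partition_strategy="dirichlet"` option with `partition_kwargs={"alpha": 0.5}` points to a Dirichlet-based non-IID split. The sketch below shows the label-skew variant that is common in federated-learning benchmarks. For each class, it draws client shares from Dirichlet(alpha) and deals that class's samples out accordingly. This illustrates the general technique under that assumption and is not the library's partitioner. `dirichlet_label_partition` is a hypothetical helper.

```python
import numpy as np


def dirichlet_label_partition(y, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper for illustration. For each class, client shares are
    drawn from Dirichlet(alpha) and that class's shuffled indices are split
    by those shares. Smaller alpha concentrates each class on fewer clients.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    client_indices = [[] for _ in range(n_clients)]
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        shares = rng.dirichlet(np.full(n_clients, alpha))
        # Turn cumulative shares into cut points inside this class's indices.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    # Every sample ends up on exactly one client; some clients may get none of a class.
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


if __name__ == "__main__":
    labels = np.repeat([0, 1, 2], 100)
    parts = dirichlet_label_partition(labels, n_clients=4, alpha=0.5, seed=42)
    for client, ix in enumerate(parts):
        # Per-client sample count and class histogram.
        print(client, len(ix), np.bincount(labels[ix], minlength=3))
```

Re-running with a larger alpha (for example `alpha=100`) should give client histograms close to the global class balance. Small alphas leave some clients with few or no samples of a class.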
Build the documentation
```bash
pip install -e ".[docs]"
mkdocs serve    # local preview
# mkdocs build  # static site in ./site
```
Run experiment scripts
| Stage | Command |
|---|---|
| EXP 1: hyperparameters | `python run_exp1_hparams.py` |
| EXP 2: per-client RFs | `python run_exp2_clients.py` |
| EXP 3: federation | `python run_exp3_federation.py` |
| EXP 4: DP federation | `python run_exp4_dp_federation.py` |
| UCI example (accuracy and latency) | `python examples/benchmark_public_dataset.py` (add `--quick` for a short run) |
Local quality checks
If the repository includes a Makefile:
```bash
make test
make lint
make docs
make build
```
Without `make`:
```bash
python -m pytest tests -q
python -m ruff check .
python -m mkdocs build --strict
python -m build
```
Run tests
```bash
pytest tests/ -v
```
With coverage of the `distributed_random_forest` package (requires the `pytest-cov` plugin):
```bash
pytest tests/ -v --cov=distributed_random_forest
```
Targeted suites:
| File | Focus |
|---|---|
| `tests/test_tree_utils.py` | Utilities and tree metrics |
| `tests/test_random_forest.py` | Core RF |
| `tests/test_dp_rf.py` | DP random forest |
| `tests/test_voting.py` | Voting |
| `tests/test_aggregator.py` | Aggregation |
| `tests/test_e2e.py` | End-to-end (synthetic) |
| `tests/test_e2e_public_dataset.py` | End-to-end (UCI breast cancer) |
| `tests/test_datasets.py` | Public dataset loader |
| `tests/test_performance.py` | Accuracy and latency bounds (marked `performance`) |
| `tests/test_examples_run.py` | Example script smoke test |
| `tests/test_parallel_e2e.py` | End-to-end: `n_jobs=1` vs `n_jobs=-1` parity (federated, EXP 3) |
| `tests/test_parallel_stress.py` | Stress: many clients/trees, ranking load |
| `tests/test_parallelism.py` | `resolve_n_jobs` |
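The parallelism suites check `resolve_n_jobs` and `n_jobs=1` vs `n_jobs=-1` parity. The sketch below assumes the package follows the joblib/scikit-learn convention for `n_jobs`, which this page does not confirm. Under that convention, `None` means one worker, `-1` means all CPUs, and `-k` means all CPUs but `k - 1`. It is a hypothetical re-implementation for illustration, not the package's code.

```python
import os


def resolve_n_jobs(n_jobs, cpu_count=None):
    """Map an n_jobs value to a concrete worker count (joblib-style convention).

    Hypothetical sketch: None -> 1, positive n -> n,
    negative n -> cpu_count + 1 + n (never below 1), 0 -> error.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if n_jobs is None:
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs=0 does not request any workers")
    if n_jobs > 0:
        return n_jobs
    # -1 -> all CPUs, -2 -> all but one, and so on.
    return max(1, cpu_count + 1 + n_jobs)


if __name__ == "__main__":
    for value in (None, 1, 4, -1, -2):
        print(value, resolve_n_jobs(value, cpu_count=8))
```

Passing `cpu_count` explicitly keeps the function deterministic in tests, independent of the machine running them.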
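Differential privacy
Differential privacy is optional. The built-in DP mode works without extra packages:
```python
from distributed_random_forest import FederatedRandomForest

model = FederatedRandomForest(
    n_clients=5,
    rf_params={"n_estimators": 16, "random_state": 7},
    use_differential_privacy=True,
    # Privacy budget: smaller values give stronger privacy, usually at some cost in accuracy.
    epsilon=2.0,
)
```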
To also install the optional privacy tooling:
```bash
python -m pip install -e ".[privacy]"
```
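This page does not say how the built-in DP mode spends `epsilon`. One common pattern in differentially private random forests adds Laplace noise to the class counts stored in each leaf. The NumPy sketch below illustrates that generic mechanism and is not the library's implementation. `noisy_leaf_counts` is a hypothetical helper. It assumes the tree structure was chosen without looking at the data, so only the leaf counts need protecting. It also assumes each sample changes one count by 1 (sensitivity 1) and that the budget is split evenly across trees that all see the same data.

```python
import numpy as np


def noisy_leaf_counts(counts, epsilon, n_trees=1, seed=None):
    """Apply the Laplace mechanism to one tree's per-leaf class counts.

    Hypothetical helper for illustration. Assumes:
    - split features/thresholds were chosen independently of the data,
      so only the leaf counts need noise;
    - neighbouring datasets differ by adding or removing one sample,
      which changes exactly one count by 1 (L1 sensitivity 1);
    - all trees see the same data, so the total budget is divided evenly
      across them (sequential composition).
    """
    rng = np.random.default_rng(seed)
    per_tree_epsilon = epsilon / n_trees
    counts = np.asarray(counts, dtype=float)
    noise = rng.laplace(loc=0.0, scale=1.0 / per_tree_epsilon, size=counts.shape)
    # Clipping is post-processing, so it does not weaken the DP guarantee.
    return np.clip(counts + noise, 0.0, None)


if __name__ == "__main__":
    leaf_counts = np.array([[5, 0, 1], [0, 7, 2]])  # 2 leaves x 3 classes
    print(noisy_leaf_counts(leaf_counts, epsilon=2.0, n_trees=16, seed=7).round(2))
```

Splitting `epsilon=2.0` across 16 trees gives each tree a noise scale of 8. This shows how quickly small per-client leaf counts can be swamped as the number of trees grows.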
Reports
Every orchestrated run can export a JSON report:
```python
model.export_report("artifacts/federated-run.json")
```
That report includes:
- client sample counts and training metrics
- partition summaries
- evaluated aggregation strategies
- validation and final test metrics
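Because the report is plain JSON, it can be inspected with the standard library. This page does not name the report's keys, so the sketch below makes no assumption about them. It lists each top-level entry and the JSON type of its value. `summarize_report` is a hypothetical helper, and the default path matches the `export_report` call above.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Load an exported JSON report and describe its top-level entries.

    Hypothetical helper: makes no assumption about key names, it only
    maps each key to the Python type name of its decoded value.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    report_path = Path("artifacts/federated-run.json")
    if report_path.exists():
        for key, kind in summarize_report(report_path).items():
            print(f"{key}: {kind}")
    else:
        print(f"No report at {report_path}; call model.export_report(...) first.")
```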
Next steps
- Supported distributed RF patterns for partitioning, aggregation, and DP layout.
- Code examples for API usage.
- Core concepts for design detail.