shapley-value

Shapley Value Calculator

A comprehensive Python package for calculating Shapley values in cooperative game theory. Shapley values provide a mathematically principled method for fairly distributing payoffs among players based on their marginal contributions to coalitions.

📖 Table of Contents

Overview
Installation
Quick Start
Usage Examples
API Reference
Features
Performance
Testing
Contributing
License

🎯 Overview

The Shapley value is a solution concept from cooperative game theory that assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players. This package provides four approaches:

Class	When to use	Scales to
`ShapleyCombinations`	Pre-defined coalition values (exact)	~15 players
`ShapleyValue`	Pre-defined coalition values, alt algorithm	~15 players
`ShapleyValueCalculator`	Dynamic evaluation function (exact, parallel)	~20 players
`MonteCarloShapleyValue`	Dynamic evaluation function (approximate, parallel)	100+ players

🚀 Installation

pip install shapley-value

⚡ Quick Start

Exact calculation from pre-defined coalition values

from shapley_value import ShapleyCombinations

players = ['Alice', 'Bob', 'Charlie']
coalition_values = {
    (): 0,
    ('Alice',): 10,  ('Bob',): 20,  ('Charlie',): 30,
    ('Alice', 'Bob'): 50,  ('Alice', 'Charlie'): 60,  ('Bob', 'Charlie'): 70,
    ('Alice', 'Bob', 'Charlie'): 100,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
# {'Alice': 16.67, 'Bob': 33.33, 'Charlie': 50.0}

Monte Carlo approximation for large games

from shapley_value import MonteCarloShapleyValue

def my_model(coalition):
    """Any callable – ML model, simulation, or analytic function."""
    return sum(coalition) ** 0.9 if coalition else 0.0

mc = MonteCarloShapleyValue(
    my_model,
    players=list(range(50)),   # 50 players – exact is infeasible
    num_samples=5000,
    random_seed=42,
    n_jobs=-1,                 # use all CPU cores
)
values = mc.calculate_shapley_values()   # Dict[player, float]

📋 Usage Examples

Method 1 – Pre-defined coalition values (`ShapleyCombinations`)

Use this when every coalition value is already known:

from shapley_value import ShapleyCombinations

players = ['Player1', 'Player2', 'Player3']
coalition_values = {
    (): 0,
    ('Player1',): 100,  ('Player2',): 200,  ('Player3',): 300,
    ('Player1', 'Player2'): 450,
    ('Player1', 'Player3'): 500,
    ('Player2', 'Player3'): 600,
    ('Player1', 'Player2', 'Player3'): 900,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
print(shapley_values)

Method 2 – Exact evaluation function (`ShapleyValueCalculator`)

Use this when a callable computes the coalition value and you have ≤ ~20 players:

from shapley_value import ShapleyValueCalculator

def profit_function(coalition):
    if not coalition:
        return 0
    return sum(coalition) + len(coalition) * 10  # synergy bonus

players = [100, 200, 300]

calculator = ShapleyValueCalculator(profit_function, players, n_jobs=-1)
shapley_values = calculator.calculate_shapley_values()
print(shapley_values)

raw_data = calculator.get_raw_data()           # detailed per-coalition data
calculator.save_raw_data('analysis.csv')       # export to CSV

n_jobs follows the scikit-learn convention: 1 = sequential, -1 = all cores, k = exactly k cores.

Method 3 – Monte Carlo approximation (`MonteCarloShapleyValue`)

Use this for large games (20+ players) where exact computation is infeasible. Complexity is O(m × n) where m = num_samples and n = number of players, versus O(2ⁿ) for exact methods.

from shapley_value import MonteCarloShapleyValue

def complex_model(coalition):
    """Expensive callable – e.g. a trained ML model."""
    if not coalition:
        return 0.0
    return float(sum(p ** 1.2 for p in coalition))

players = list(range(1, 51))  # 50 players

mc = MonteCarloShapleyValue(
    complex_model,
    players=players,
    num_samples=5000,   # more samples → lower error
    random_seed=0,      # set for reproducibility
    n_jobs=-1,          # parallelise across all CPU cores
)

# Core result
values = mc.calculate_shapley_values()

# Diagnose how many samples you need
convergence_df = mc.get_convergence_data()  # DataFrame shape (5000, 50)

# Inspect per-permutation marginal contributions
raw_df = mc.get_raw_data()  # columns: iteration, permutation, player, marginal_contribution

Choosing `num_samples`

Players	Suggested `num_samples`	Typical error
≤ 10	1 000	< 0.5 %
10–30	5 000	< 1 %
30–100	10 000–50 000	1–3 %

Use get_convergence_data() to plot running estimates and verify convergence for your specific game.

`n_jobs` quick reference

# Sequential (default – no overhead, good for small games / debugging)
mc = MonteCarloShapleyValue(f, players, n_jobs=1)

# All available CPU cores
mc = MonteCarloShapleyValue(f, players, n_jobs=-1)

# Exactly 4 cores
mc = MonteCarloShapleyValue(f, players, n_jobs=4)

Note: Permutations are generated in a fixed order before the parallel step, so random_seed guarantees bit-identical results regardless of n_jobs.

📚 API Reference

`ShapleyCombinations`

class ShapleyCombinations:
    def __init__(self, players: List[Any])
    def calculate_shapley_values(
        self, coalition_values: Dict[Tuple, float]
    ) -> Dict[Any, float]
    def get_all_coalitions(self) -> List[Tuple]

`ShapleyValueCalculator`

class ShapleyValueCalculator:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        n_jobs: int = 1,   # sklearn-style: 1=seq (default), -1=all cores, k=k cores
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_raw_data(self) -> pd.DataFrame
    def save_raw_data(self, file_path: str) -> None

`MonteCarloShapleyValue`

class MonteCarloShapleyValue:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        num_samples: int = 1000,     # permutations to sample
        random_seed: Optional[int] = None,
        n_jobs: int = 1,             # sklearn-style parallelism
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_convergence_data(self) -> pd.DataFrame   # running estimates per iteration
    def get_raw_data(self) -> pd.DataFrame           # per-permutation detail

`ShapleyValue`

Low-level calculator using the weighted-marginal-contribution formula directly.

class ShapleyValue:
    def __init__(
        self,
        players: List[Any],
        coalition_values: Dict[Tuple[Any, ...], float],
    )
    def calculate_shapley_values(self) -> Dict[Any, float]

✨ Features

Four calculation methods – exact (pre-defined values, evaluation function) and Monte Carlo approximation for large games
sklearn-style n_jobs on both ShapleyValueCalculator and MonteCarloShapleyValue: 1 = sequential, -1 = all cores, k = exactly k cores
Reproducible parallel results – random_seed in MonteCarloShapleyValue gives bit-identical output regardless of n_jobs
Convergence diagnostics – get_convergence_data() lets you plot running estimates and tune num_samples for your accuracy target
Data export – get_raw_data() and save_raw_data() for downstream analysis
Type flexibility – any hashable player type (strings, ints, tuples, …)
Comprehensive test suite – 53 tests covering correctness, reproducibility, parallelism, edge cases, and stress / performance

⚡ Performance

Exact methods (`ShapleyCombinations` / `ShapleyValueCalculator`)

Exact computation enumerates all 2ⁿ coalitions and becomes impractical beyond ~20 players.

Players	Coalitions	Sequential	Parallel (`n_jobs=-1`)	Speedup
5	32	< 0.001 s	< 0.001 s	1×
10	1 024	~ 6.6 s	~ 1.1 s	~ 6×
12	4 096	~ 31 s	~ 2.6 s	~ 12×
15	32 768	~ 4 min	~ 20 s	~ 12×

Monte Carlo (`MonteCarloShapleyValue`)

O(m × n) cost – scales to 100+ players. Parallelism is beneficial when the evaluation function is expensive (e.g. ML model inference, simulations).

Players	`num_samples`	`n_jobs=1`	`n_jobs=-1`	Notes
20	5 000	< 1 s	< 1 s	cheap eval function
50	5 000	< 1 s	< 1 s	cheap eval function
50	10 000	~ 2 s	~ 0.7 s	expensive eval (~1 ms/call)
100	10 000	~ 5 s	~ 1.5 s	expensive eval (~1 ms/call)

Rule of thumb: if your evaluation function is cheap (< 10 µs), use n_jobs=1 to avoid joblib process-spawn overhead. If it’s expensive (ML model, simulation), set n_jobs=-1 for maximum throughput.

🧪 Testing

# Install dev dependencies
pip install -e ".[dev,examples,performance]"

# Run the full test suite (53 tests)
python -m pytest tests/ -v

# Run only stress / performance tests (with printed benchmark table)
python -m pytest tests/test_montecarlo_stress.py -v -s

The suite is organised into:

File	Tests	Coverage
`test_calculator.py`	11	`ShapleyValue`, `ShapleyCombinations`, `ShapleyValueCalculator`
`test_montecarlo.py`	29	Correctness, reproducibility, parallelism, edge cases
`test_montecarlo_stress.py`	13	20–100 player stress, timing bounds, benchmark table

🤝 Contributing

We welcome contributions!

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes and add tests
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Setup

git clone https://github.com/Bowenislandsong/shapley-value.git
cd shapley-value
pip install -e ".[dev,examples,performance]"
python -m pytest tests/

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

📝 Citation

If you use this package in your research or project, please cite it as:

@software{song2026shapley,
  author = {Song, Bowen},
  title = {Shapley Value Calculator},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/Bowenislandsong/shapley-value},
  version = {0.0.9}
}

APA Format:

Song, B. (2026). Shapley Value Calculator (Version 0.0.9) [Computer software]. https://github.com/Bowenislandsong/shapley-value

For more citation formats see CITATION.cff.

🏷️ Version History

0.0.9: PyPI publish on v* tag push; citation / version metadata
0.0.8: ShapleyValueCalculator uses n_jobs (sklearn-style); doc and landing-page sync; example runner includes Monte Carlo example
0.0.7: MonteCarloShapleyValue with sklearn-style n_jobs; comprehensive stress tests; convergence and raw-data diagnostics
0.0.6: Citation information for academic use
0.0.5: CI improvements and Python 3.13 support
0.0.4: Comprehensive examples, parallel processing, performance optimizations
0.0.3: Enhanced parallel processing
0.0.2: Function-based evaluation and data export
0.0.1: Initial release

👥 Authors

Bowen Song – Profile

🔗 Links

For more information about Shapley values and cooperative game theory, see the Wikipedia article.

This site is open source. Improve this page.