shapley-value

Shapley Value Calculator

PyPI Downloads License: MIT Python 3.8+ Project Page

A comprehensive Python package for calculating Shapley values in cooperative game theory. Shapley values provide a mathematically principled method for fairly distributing payoffs among players based on their marginal contributions to coalitions.

πŸ“– Table of Contents

🎯 Overview

The Shapley value is a solution concept from cooperative game theory that assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players. This package provides four approaches:

Class When to use Scales to
ShapleyCombinations Pre-defined coalition values (exact) ~15 players
ShapleyValue Pre-defined coalition values, alt algorithm ~15 players
ShapleyValueCalculator Dynamic evaluation function (exact, parallel) ~20 players
MonteCarloShapleyValue Dynamic evaluation function (approximate, parallel) 100+ players

πŸš€ Installation

pip install shapley-value

⚑ Quick Start

Exact calculation from pre-defined coalition values

from shapley_value import ShapleyCombinations

players = ['Alice', 'Bob', 'Charlie']
coalition_values = {
    (): 0,
    ('Alice',): 10,  ('Bob',): 20,  ('Charlie',): 30,
    ('Alice', 'Bob'): 50,  ('Alice', 'Charlie'): 60,  ('Bob', 'Charlie'): 70,
    ('Alice', 'Bob', 'Charlie'): 100,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
# {'Alice': 16.67, 'Bob': 33.33, 'Charlie': 50.0}

Monte Carlo approximation for large games

from shapley_value import MonteCarloShapleyValue

def my_model(coalition):
    """Any callable – ML model, simulation, or analytic function."""
    return sum(coalition) ** 0.9 if coalition else 0.0

mc = MonteCarloShapleyValue(
    my_model,
    players=list(range(50)),   # 50 players – exact is infeasible
    num_samples=5000,
    random_seed=42,
    n_jobs=-1,                 # use all CPU cores
)
values = mc.calculate_shapley_values()   # Dict[player, float]

πŸ“‹ Usage Examples

Method 1 – Pre-defined coalition values (ShapleyCombinations)

Use this when every coalition value is already known:

from shapley_value import ShapleyCombinations

players = ['Player1', 'Player2', 'Player3']
coalition_values = {
    (): 0,
    ('Player1',): 100,  ('Player2',): 200,  ('Player3',): 300,
    ('Player1', 'Player2'): 450,
    ('Player1', 'Player3'): 500,
    ('Player2', 'Player3'): 600,
    ('Player1', 'Player2', 'Player3'): 900,
}

calculator = ShapleyCombinations(players)
shapley_values = calculator.calculate_shapley_values(coalition_values)
print(shapley_values)

Method 2 – Exact evaluation function (ShapleyValueCalculator)

Use this when a callable computes the coalition value and you have ≀ ~20 players:

from shapley_value import ShapleyValueCalculator

def profit_function(coalition):
    if not coalition:
        return 0
    return sum(coalition) + len(coalition) * 10  # synergy bonus

players = [100, 200, 300]

calculator = ShapleyValueCalculator(profit_function, players, n_jobs=-1)
shapley_values = calculator.calculate_shapley_values()
print(shapley_values)

raw_data = calculator.get_raw_data()           # detailed per-coalition data
calculator.save_raw_data('analysis.csv')       # export to CSV

n_jobs follows the scikit-learn convention: 1 = sequential, -1 = all cores, k = exactly k cores.

Method 3 – Monte Carlo approximation (MonteCarloShapleyValue)

Use this for large games (20+ players) where exact computation is infeasible. Complexity is O(m Γ— n) where m = num_samples and n = number of players, versus O(2ⁿ) for exact methods.

from shapley_value import MonteCarloShapleyValue

def complex_model(coalition):
    """Expensive callable – e.g. a trained ML model."""
    if not coalition:
        return 0.0
    return float(sum(p ** 1.2 for p in coalition))

players = list(range(1, 51))  # 50 players

mc = MonteCarloShapleyValue(
    complex_model,
    players=players,
    num_samples=5000,   # more samples β†’ lower error
    random_seed=0,      # set for reproducibility
    n_jobs=-1,          # parallelise across all CPU cores
)

# Core result
values = mc.calculate_shapley_values()

# Diagnose how many samples you need
convergence_df = mc.get_convergence_data()  # DataFrame shape (5000, 50)

# Inspect per-permutation marginal contributions
raw_df = mc.get_raw_data()  # columns: iteration, permutation, player, marginal_contribution

Choosing num_samples

Players Suggested num_samples Typical error
≀ 10 1 000 < 0.5 %
10–30 5 000 < 1 %
30–100 10 000–50 000 1–3 %

Use get_convergence_data() to plot running estimates and verify convergence for your specific game.

n_jobs quick reference

# Sequential (default – no overhead, good for small games / debugging)
mc = MonteCarloShapleyValue(f, players, n_jobs=1)

# All available CPU cores
mc = MonteCarloShapleyValue(f, players, n_jobs=-1)

# Exactly 4 cores
mc = MonteCarloShapleyValue(f, players, n_jobs=4)

Note: Permutations are generated in a fixed order before the parallel step, so random_seed guarantees bit-identical results regardless of n_jobs.

πŸ“š API Reference

ShapleyCombinations

class ShapleyCombinations:
    def __init__(self, players: List[Any])
    def calculate_shapley_values(
        self, coalition_values: Dict[Tuple, float]
    ) -> Dict[Any, float]
    def get_all_coalitions(self) -> List[Tuple]

ShapleyValueCalculator

class ShapleyValueCalculator:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        n_jobs: int = 1,   # sklearn-style: 1=seq (default), -1=all cores, k=k cores
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_raw_data(self) -> pd.DataFrame
    def save_raw_data(self, file_path: str) -> None

MonteCarloShapleyValue

class MonteCarloShapleyValue:
    def __init__(
        self,
        evaluation_function: Callable[[List[Any]], float],
        players: List[Any],
        num_samples: int = 1000,     # permutations to sample
        random_seed: Optional[int] = None,
        n_jobs: int = 1,             # sklearn-style parallelism
    )
    def calculate_shapley_values(self) -> Dict[Any, float]
    def get_convergence_data(self) -> pd.DataFrame   # running estimates per iteration
    def get_raw_data(self) -> pd.DataFrame           # per-permutation detail

ShapleyValue

Low-level calculator using the weighted-marginal-contribution formula directly.

class ShapleyValue:
    def __init__(
        self,
        players: List[Any],
        coalition_values: Dict[Tuple[Any, ...], float],
    )
    def calculate_shapley_values(self) -> Dict[Any, float]

✨ Features

⚑ Performance

Exact methods (ShapleyCombinations / ShapleyValueCalculator)

Exact computation enumerates all 2ⁿ coalitions and becomes impractical beyond ~20 players.

Players Coalitions Sequential Parallel (n_jobs=-1) Speedup
5 32 < 0.001 s < 0.001 s 1Γ—
10 1 024 ~ 6.6 s ~ 1.1 s ~ 6Γ—
12 4 096 ~ 31 s ~ 2.6 s ~ 12Γ—
15 32 768 ~ 4 min ~ 20 s ~ 12Γ—

Monte Carlo (MonteCarloShapleyValue)

O(m Γ— n) cost – scales to 100+ players. Parallelism is beneficial when the evaluation function is expensive (e.g. ML model inference, simulations).

Players num_samples n_jobs=1 n_jobs=-1 Notes
20 5 000 < 1 s < 1 s cheap eval function
50 5 000 < 1 s < 1 s cheap eval function
50 10 000 ~ 2 s ~ 0.7 s expensive eval (~1 ms/call)
100 10 000 ~ 5 s ~ 1.5 s expensive eval (~1 ms/call)

Rule of thumb: if your evaluation function is cheap (< 10 Β΅s), use n_jobs=1 to avoid joblib process-spawn overhead. If it’s expensive (ML model, simulation), set n_jobs=-1 for maximum throughput.

πŸ§ͺ Testing

# Install dev dependencies
pip install -e ".[dev,examples,performance]"

# Run the full test suite (53 tests)
python -m pytest tests/ -v

# Run only stress / performance tests (with printed benchmark table)
python -m pytest tests/test_montecarlo_stress.py -v -s

The suite is organised into:

File Tests Coverage
test_calculator.py 11 ShapleyValue, ShapleyCombinations, ShapleyValueCalculator
test_montecarlo.py 29 Correctness, reproducibility, parallelism, edge cases
test_montecarlo_stress.py 13 20–100 player stress, timing bounds, benchmark table

🀝 Contributing

We welcome contributions!

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

Development Setup

git clone https://github.com/Bowenislandsong/shapley-value.git
cd shapley-value
pip install -e ".[dev,examples,performance]"
python -m pytest tests/

πŸ“„ License

This project is licensed under the MIT License – see the LICENSE file for details.

πŸ“ Citation

If you use this package in your research or project, please cite it as:

@software{song2026shapley,
  author = {Song, Bowen},
  title = {Shapley Value Calculator},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/Bowenislandsong/shapley-value},
  version = {0.0.9}
}

APA Format:

Song, B. (2026). Shapley Value Calculator (Version 0.0.9) [Computer software]. https://github.com/Bowenislandsong/shapley-value

For more citation formats see CITATION.cff.

🏷️ Version History

πŸ‘₯ Authors


For more information about Shapley values and cooperative game theory, see the Wikipedia article.