
Key Player Statistics for Football Leagues (An Overview of Core Player Data)


Interpreting user request

I need to understand the user's request in Chinese about football league key player data statistics. It's a bit vague, and they could need help with statistical analysis, script creation, or a report. I should ask clarifying questions: which league, data sources, metrics, timeframe, output format, coding language, and if a dataset exists. Also, I'm considering our environment settings; it seems I need to follow protocols regarding running commands judiciously. I plan to gather data, define metrics, and compute relevant statistics.

Proposing coding options

I need to provide a Python code template using pandas for the user's request. I could suggest using CSVs from football-data.co.uk or scraping from FBref, but I must check if network approvals are needed. I’ll propose working offline and ask the user for data or permission to scrape. I can offer three options: 1) they share a CSV, and I’ll write the script; 2) they approve fetching data from certain sites; or 3) they provide the metrics and timeframe. I could include a sample script to calculate key metrics and rankings.

Using StatsBomb data for analysis

I can use the StatsBomb open data from GitHub for both men's and women's football, even without scraping, although it's a bit more complex. Since the user might be Chinese-speaking, I should respond in Chinese. I’ll outline a structured plan with bullet points that clarify the scope, data source, metrics, and output. I’ll also provide a code skeleton, ask key questions, and define next steps. Following the guidelines, I'll keep the formatting light. I can also offer to adapt mapping if the user sends a sample row.

Computing player metrics

I’ll compute metrics like npxG, where we set pens_xG to 0 unless a column exists. Then, I want to calculate values like npxG90, npxG+xA per 90, shot volume, key passes, SCA, GCA, and defensive actions, among others, while checking for missing columns. I plan to group percentiles by position. I can also provide a small notebook or CLI tool if the user wants that. ASCII is fine for coding, and I’ll keep it concise while ensuring clarity in the response.

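Which kind of statistics do you want? First, help me confirm these key details:

  • League and season range (for example, the 2024-25 Premier League, or the last 3 seasons)
  • Data source (do you already have a CSV/Excel file, or should I pull open data from FBref/Understat/StatsBomb?)
  • Metric preferences and output format (leaderboard/table/radar chart; export to CSV/Excel/image; whether to split by position)
  • Minimum playing-time threshold (e.g. ≥900 minutes), and whether to apply per-90 normalization and percentiles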
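The last point (minutes threshold, per-90 normalization, percentiles) can be illustrated with a small sketch. The toy data and the column names `goals90` and `goals90_pct` are assumptions made up for this example, not part of any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values and column names are for illustration only.
players = pd.DataFrame({
    "player":  ["A", "B", "C", "D"],
    "pos":     ["FW", "FW", "FW", "MF"],
    "minutes": [1800, 900, 450, 2700],
    "goals":   [10, 6, 4, 3],
})

MIN_MINUTES = 900  # minimum playing-time threshold

# Keep only players who meet the threshold.
eligible = players[players["minutes"] >= MIN_MINUTES].copy()

# Per-90 rate: total / minutes * 90 (zero minutes become NaN instead of inf).
minutes = eligible["minutes"].replace(0, np.nan)
eligible["goals90"] = eligible["goals"] / minutes * 90

# Percentile rank within each position group (0 to 1, higher is better).
eligible["goals90_pct"] = eligible.groupby("pos")["goals90"].rank(pct=True)

print(eligible[["player", "pos", "minutes", "goals90", "goals90_pct"]])
```

Player C is dropped by the threshold, so a short run of good form in limited minutes does not distort the ranking; the percentile is then computed only among players in the same position group.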

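Suggested statistical dimensions (stratified by position, with per-90 values and percentiles):

  • Attacking output: non-penalty goals, assists, npxG, xA, shots, key passes, SCA/GCA
  • Progression threat: progressive passes/carries, passes into the final third, passes into the penalty area
  • Finishing efficiency: goals minus xG, goals per shot, shots-on-target rate
  • Build-up and link play: passes attempted/completion rate, completion rate for passes into the final third/penalty area
  • Defending and duels: tackles, interceptions, pressures, aerial duel win rate, ground duels
  • Availability: minutes played, starts, position label (FW/MF/DF, or finer-grained)
  • Composite: build a "key player score" from weights or PCA, and output a leaderboard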
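For the PCA option mentioned in the last bullet, one possible sketch is to standardize the metrics and score each player on the first principal component. The function name `pca_score` and the metric values are invented for illustration, and the approach assumes the chosen metrics are positively correlated, which may not hold for a real dataset.

```python
import numpy as np

def pca_score(X):
    """Return each row's score on the first principal component."""
    X = np.asarray(X, dtype=float)
    # Standardize columns so no metric dominates through its scale.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns contribute nothing
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized matrix: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pc1 = Vt[0]
    # The sign of a principal axis is arbitrary; orient it so that
    # higher metric values tend to give a higher score.
    if pc1.sum() < 0:
        pc1 = -pc1
    return Z @ pc1

# Hypothetical per-90 metrics (columns: npxG90, xA90, KP90; rows: players).
metrics = np.array([
    [0.60, 0.30, 2.5],
    [0.40, 0.20, 1.8],
    [0.20, 0.10, 1.0],
    [0.05, 0.05, 0.6],
])
scores = pca_score(metrics)
print(np.argsort(-scores))  # player indices, best first
```

Compared with hand-picked weights, this lets the data decide how much each metric contributes, at the cost of a score that is harder to explain; it is probably best run separately per position group.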

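Below is a Python/pandas script skeleton that you can run directly: it reads a season-level player summary CSV, computes per-90 values, percentiles, and a composite score, and exports leaderboards. You only need to map the column names to your actual data.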

# file: key_players.py
import pandas as pd
import numpy as np

# Configuration
INPUT = "players.csv"            # your season-level player data
OUT_LEADERS = "leaders.csv"      # league leaderboard
OUT_BY_TEAM = "by_team_top.csv"  # top players within each team
MIN_MINUTES = 900                # minimum playing-time threshold
# Coarse position groups, keyed by the first letter of the position label
POS_MAP = {"F": "FW", "M": "MF", "D": "DF", "G": "GK"}

# Expected columns: canonical name -> column name in your CSV.
# Any column that is missing from the CSV is filled with 0.
EXPECTED = {
    "player": "player", "team": "team", "pos": "pos", "minutes": "minutes",
    "goals": "goals", "pens_made": "pens_made", "assists": "assists",
    "shots": "shots", "key_passes": "key_passes",
    "xG": "xG", "xA": "xA", "sca": "sca", "gca": "gca",
    "progressive_passes": "prog_passes",
    "progressive_carries": "prog_carries",
    "passes_into_final_third": "p_final3",
    "passes_into_penalty_area": "p_box",
    "tackles": "tackles", "interceptions": "interceptions",
    "pressures": "pressures",
    "aerials_won": "aerials_won", "aerials_lost": "aerials_lost",
}


def load():
    df = pd.read_csv(INPUT)
    # Normalize column names: rename the CSV columns listed as EXPECTED
    # values to their canonical keys. Edit the values to match your CSV.
    rename = {v: k for k, v in EXPECTED.items() if v in df.columns}
    df = df.rename(columns=rename)
    # Fill missing columns with 0
    for k in EXPECTED:
        if k not in df.columns:
            df[k] = 0

    # Coarse position group from the first letter of the position label
    def map_pos(p):
        if isinstance(p, str) and p:
            c = p[0].upper()
            return POS_MAP.get(c, c)
        return "UNK"

    df["pos_grp"] = df["pos"].apply(map_pos)
    return df


def per90(df):
    m = df["minutes"].replace(0, np.nan)  # avoid division by zero

    def p90(x):
        return x / m * 90

    df["npxG"] = df["xG"] - 0.0  # subtract penalty xG here if you have that column
    df["npxG90"] = p90(df["npxG"])
    df["xA90"] = p90(df["xA"])
    df["G90"] = p90(df["goals"])
    df["A90"] = p90(df["assists"])
    df["Shots90"] = p90(df["shots"])
    df["KP90"] = p90(df["key_passes"])
    df["SCA90"] = p90(df["sca"])
    df["GCA90"] = p90(df["gca"])
    df["ProgAct90"] = p90(df["progressive_passes"] + df["progressive_carries"])
    df["Final3P90"] = p90(df["passes_into_final_third"])
    df["BoxP90"] = p90(df["passes_into_penalty_area"])
    df["DefAct90"] = p90(df["tackles"] + df["interceptions"])
    df["Press90"] = p90(df["pressures"])
    # Aerial duel win rate (NaN when the player contested no aerial duels)
    air_total = df["aerials_won"] + df["aerials_lost"]
    df["AerialWin%"] = np.where(air_total > 0, df["aerials_won"] / air_total, np.nan)
    # Combined attacking output
    df["npxG+xA90"] = df["npxG90"] + df["xA90"]
    # Finishing efficiency (simple version)
    df["G-xG"] = df["goals"] - df["xG"]
    return df


def percentiles(df, cols, by="pos_grp"):
    # Percentile rank (0 to 1) of each metric within its position group;
    # NaN values stay NaN.
    for c in cols:
        df[c + "_pct"] = (df[c].astype(float)
                          .groupby(df[by])
                          .rank(pct=True, method="average"))
    return df


def composite_score(df):
    # Each position group can use its own weights; these are defaults.
    weights = {
        "FW": {"npxG90_pct": 0.35, "xA90_pct": 0.15, "Shots90_pct": 0.10,
               "KP90_pct": 0.10, "SCA90_pct": 0.10, "ProgAct90_pct": 0.05,
               "DefAct90_pct": 0.05, "Press90_pct": 0.05, "AerialWin%_pct": 0.05},
        "MF": {"xA90_pct": 0.20, "KP90_pct": 0.15, "SCA90_pct": 0.15,
               "ProgAct90_pct": 0.15, "Final3P90_pct": 0.10, "BoxP90_pct": 0.05,
               "DefAct90_pct": 0.10, "Press90_pct": 0.05, "npxG90_pct": 0.05},
        "DF": {"DefAct90_pct": 0.30, "Press90_pct": 0.10, "AerialWin%_pct": 0.15,
               "ProgAct90_pct": 0.10, "Final3P90_pct": 0.10, "BoxP90_pct": 0.05,
               "xA90_pct": 0.05, "KP90_pct": 0.05, "npxG90_pct": 0.10},
        "GK": {},  # this script does not score goalkeepers
    }
    cols_for_pct = ["npxG90", "xA90", "Shots90", "KP90", "SCA90", "GCA90",
                    "ProgAct90", "Final3P90", "BoxP90", "DefAct90", "Press90",
                    "AerialWin%"]
    df = percentiles(df, cols_for_pct + ["G-xG"], by="pos_grp")

    scores = []
    for _, row in df.iterrows():
        # Unknown position groups fall back to the midfielder weights
        w = weights.get(row["pos_grp"], weights["MF"])
        s = 0.0
        for k, alpha in w.items():
            v = row.get(k, np.nan)
            if np.isfinite(v):  # skip metrics with no percentile
                s += alpha * v
        scores.append(s)
    df["KeyScore"] = scores
    return df


def main():
    df = load()
    # Filter by minutes played
    df = df[df["minutes"] >= MIN_MINUTES].copy()
    df = per90(df)
    df = composite_score(df)

    # Leaderboard: top 50 per position group
    leaders = (df.sort_values(["pos_grp", "KeyScore"], ascending=[True, False])
                 .groupby("pos_grp")
                 .head(50)
                 .reset_index(drop=True))
    # Export
    keep_cols = ["player", "team", "pos", "pos_grp", "minutes",
                 "npxG90", "xA90", "npxG+xA90", "Shots90", "KP90", "SCA90", "GCA90",
                 "ProgAct90", "Final3P90", "BoxP90", "DefAct90", "Press90",
                 "AerialWin%", "G-xG", "KeyScore"]
    leaders[keep_cols].to_csv(OUT_LEADERS, index=False)

    # Top 5 within each team, listed team by team
    by_team = (df.sort_values(["team", "KeyScore"], ascending=[True, False])
                 .groupby("team")
                 .head(5)
                 .reset_index(drop=True))
    by_team[["team", "player", "pos", "KeyScore",
             "npxG+xA90", "ProgAct90", "DefAct90"]].to_csv(OUT_BY_TEAM, index=False)
    print(f"Saved {OUT_LEADERS} and {OUT_BY_TEAM}")


if __name__ == "__main__":
    main()
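The script leaves `npxG` equal to `xG` because it has no penalty xG column to subtract. If the data does carry penalty information, a helper along these lines could replace that line. The column names `pens_xG` and `pens_att` and the value 0.76 xG per penalty are assumptions for illustration; providers use slightly different penalty values.

```python
import pandas as pd

PEN_XG = 0.76  # assumed xG value of one penalty attempt (an approximation)

def add_npxg(df):
    """Add an npxG column using the best penalty information available."""
    out = df.copy()
    if "pens_xG" in out.columns:        # hypothetical column: penalty xG total
        pen_xg = out["pens_xG"]
    elif "pens_att" in out.columns:     # hypothetical column: penalties taken
        pen_xg = out["pens_att"] * PEN_XG
    else:
        pen_xg = 0.0                    # no penalty data: npxG equals xG
    # Clip at zero so a rough estimate never produces a negative npxG.
    out["npxG"] = (out["xG"] - pen_xg).clip(lower=0)
    return out

demo = pd.DataFrame({"player": ["A", "B"],
                     "xG": [12.0, 3.0],
                     "pens_att": [5, 0]})
print(add_npxg(demo))
```

The fallback order (exact penalty xG, then an estimate from attempts, then no adjustment) keeps the function usable across data sources with different levels of detail.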

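Usage

  • Prepare a players.csv containing at least these columns: player, team, pos, minutes, goals, assists, shots, key_passes, xG, xA; the script still runs if the remaining columns are missing.
  • If your column names differ, change the values on the right-hand side of EXPECTED at the top of the script to your actual column names.
  • Run: python key_players.py. This generates leaders.csv (league leaderboard) and by_team_top.csv (top 5 within each team).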
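To smoke-test the script before wiring in real data, a small file with the minimum columns can be generated first. This is only a sketch: the helper name `write_sample` is hypothetical, and every row is invented rather than taken from real player statistics.

```python
import csv

# Invented rows for a smoke test only; these are not real player statistics.
FIELDS = ["player", "team", "pos", "minutes", "goals", "assists",
          "shots", "key_passes", "xG", "xA"]
ROWS = [
    ["Player A", "Team X", "FW", 2400, 15, 5, 80, 30, 13.2, 4.1],
    ["Player B", "Team X", "MF", 2100, 4, 9, 35, 60, 3.8, 7.5],
    ["Player C", "Team Y", "DF", 2700, 1, 2, 12, 15, 1.1, 1.9],
    ["Player D", "Team Y", "FW", 600, 3, 1, 20, 8, 2.9, 0.8],  # under 900 minutes
]

def write_sample(path="players.csv"):
    """Write the sample rows to a CSV file and return its path."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(ROWS)
    return path

if __name__ == "__main__":
    print("wrote", write_sample())
```

Running this in the same folder as key_players.py produces a players.csv that the script can read; Player D should be absent from both output files because of the 900-minute threshold.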

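Would you like me to:

  1. Adapt the script directly to a CSV you already have and produce output charts;
  2. Fetch data for a specified league and season and clean it into the format above;
  3. Adjust the weights/metrics to build a custom "key player score" and radar charts.

Send me the first few rows of a sample dataset, or tell me the target league and season, and I will keep building this into a reusable pipeline.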
