Python API Reference

The Python API reference serves as the central starting point for developers. It helps them understand functions, classes, and modules and use them efficiently. It provides a structured overview of all publicly accessible elements of the code and how they are used.

In this reference you will find detailed information on:

  • Modules and packages: which modules are available and how they are imported.

  • Functions and methods: descriptions of parameters, return values, possible exceptions, and usage examples.

  • Classes and objects: information on constructors, attributes, and inherited methods.

This documentation is aimed both at newcomers and at experienced developers. It is intended to ease getting started, speed up development, and promote code reuse.

evaluation

MODULE DESCRIPTION
main

Main script with the command line interface.

src

Main module for the evaluation code.

main

Main script with the command line interface.

FUNCTION DESCRIPTION
main

Run the command line interface.

main

main()

Run the command line interface.

Source code in docs/repositories-clones/evaluation/main.py
def main() -> None:
    """Run the command line interface."""
    parser = argparse.ArgumentParser(description="")

    subparsers = parser.add_subparsers(help="subcommand help", dest="subparser_name")

    parser_evaluate = subparsers.add_parser("evaluate", help="Run and evaluate an experiment.")
    parser_evaluate.add_argument(
        "--only-transform",
        action="store_true",
        help="Run only the transformation, without evaluation and summary.",
    )

    _ = subparsers.add_parser("create-dataset", help="Create a synthetic dataset.")
    combine_experiments_parser = subparsers.add_parser(
        "combine-experiments", help="Create a summary for different experiments."
    )
    combine_experiments_parser.add_argument(
        "paths",
        nargs="+",
        help="The paths to the experiment folders to combine.",
        type=Path,
    )

    # extra logic to have evaluate as the default subcommand
    # first parse known args to extract subparser_name
    args, extras = parser.parse_known_args()
    if args.subparser_name is None:
        # if no subparser name is given, use the evaluate subparser
        args = parser_evaluate.parse_args(extras)
        args.subparser_name = "evaluate"
    else:
        # if a subparser name is given, parse the args regularly
        args = parser.parse_args()

    if args.subparser_name == "evaluate":
        results_folder = Path(f"results/{settings.experiment_name}__{settings.experiment_suffix}")
        results_folder.mkdir(parents=True, exist_ok=True)

        # copy the configuration
        copy_config_dir = results_folder / "config"
        # copy files (excluding hidden files)
        shutil.copytree(
            Path("config"),
            copy_config_dir,
            dirs_exist_ok=True,
            ignore=shutil.ignore_patterns(r".*"),
        )
        # write commit hash to copied config directory
        try:
            commit_hash = (
                subprocess.check_output(["/usr/bin/git", "rev-parse", "HEAD"])  # noqa: S603 (command is static)
                .decode("utf-8")
                .strip()
            )
            with open(copy_config_dir / "commit_hash.txt", "w", encoding="utf-8") as f:
                f.write(f"{commit_hash}\n")
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            print("Could not store commit hash:", e)

        print("Running transformation...")
        transform(results_folder=results_folder)

        if not args.only_transform:
            print("Running evaluation...")
            evaluate(results_folder=results_folder)

            print("Running summarize...")
            summarize(input_folders=[results_folder], output_folder=results_folder)

        print("Done.")
    elif args.subparser_name == "create-dataset":
        create_dataset()
    elif args.subparser_name == "combine-experiments":
        output_folder = Path("results") / "combined_experiments" / settings.experiment_suffix
        output_folder.mkdir(parents=True, exist_ok=True)
        summarize(input_folders=args.paths, output_folder=output_folder)
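
A standalone sketch of the "default subcommand" pattern used above: parse known arguments first, and if no subcommand was given, re-parse the remaining arguments with the default subparser. The subcommand names below are made up for illustration and are not part of the repository:

import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest="subparser_name")
parser_run = subparsers.add_parser("run")  # hypothetical default subcommand
parser_run.add_argument("--dry-run", action="store_true")
_ = subparsers.add_parser("other")  # hypothetical second subcommand

# "--dry-run" is passed without naming a subcommand, so it ends up in `extras`
args, extras = parser.parse_known_args(["--dry-run"])
if args.subparser_name is None:
    # no subcommand given: fall back to the default subparser
    args = parser_run.parse_args(extras)
    args.subparser_name = "run"

print(args)  # Namespace(dry_run=True, subparser_name='run')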

src

Main module for the evaluation code.

MODULE DESCRIPTION
create_dataset

Generate a synthetic dataset.

evaluate

Main script to evaluate transformed texts.

indices

Functions to calculate additional indices.

models

Pydantic models.

openai_evaluator

Class for text evaluation using OpenAI.

request_handling

Utility function for making robust synchronous HTTP POST requests with retries and error handling.

settings

Loads the configuration and makes it accessible for other modules.

summarize

Script for extended and summarized evaluation.

transform

Script for LLM evaluation.

utils

Utility functions.

create_dataset

Generate a synthetic dataset.

FUNCTION DESCRIPTION
create_dataset

Create a synthetic dataset.

format_user_prompt

Format the user prompt string with values from the configuration.

generate

Perform a chat completion using a system prompt and a user prompt.

create_dataset
create_dataset()

Create a synthetic dataset.

The parameters are provided by configuration files.

Source code in docs/repositories-clones/evaluation/src/create_dataset.py
def create_dataset() -> None:
    """Create a synthetic dataset.

    The parameters are provided by configuration files.
    """
    data_dir = Path("data")
    config = DatasetCreationSettings()

    user_prompt_format = format_user_prompt(config=config)

    print("=== User Prompt ===")
    print(user_prompt_format)
    print("===================")

    llm_provider = OpenAI(
        api_key=config.llm_config.api.auth.secret.get_secret_value(),
        base_url=str(config.llm_config.api.url),
    )

    dataset = generate(
        llm_provider=llm_provider,
        system_prompt=config.system_prompt,
        user_prompt=user_prompt_format,
        llm_config=config.llm_config,
    )

    print("=== LLM Output ===")
    print(dataset)
    print("==================")

    df = pd.DataFrame(dataset.examples, columns=[settings.input_column_name])

    if config.output_format == "csv":
        output_path = data_dir / f"{config.dataset_name}.csv"
        save_dataframe(df, output_path)
    elif config.output_format == "xlsx":
        output_path = data_dir / f"{config.dataset_name}.xlsx"
        df.to_excel(output_path, index=False)
    print(f"Datei '{output_path}' mit {len(df)} Beispielen erstellt.")
format_user_prompt
format_user_prompt(config)

Format the user prompt string with values from the configuration.

PARAMETER DESCRIPTION
config

Configuration dictionary that must contain the keys: - "user" (str): Prompt template with placeholders. - "num_examples" (int or str): Number of examples to generate. - "description" (str): Description of the dataset/task. - "criteria" (str): Criteria or constraints for the examples.

TYPE: DatasetCreationSettings

RETURNS DESCRIPTION
str

The formatted user prompt.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/create_dataset.py
def format_user_prompt(config: DatasetCreationSettings) -> str:
    """Format the user prompt string with values from the configuration.

    Args:
        config (DatasetCreationSettings): Configuration dictionary that must contain the keys:
            - "user" (str): Prompt template with placeholders.
            - "num_examples" (int or str): Number of examples to generate.
            - "description" (str): Description of the dataset/task.
            - "criteria" (str): Criteria or constraints for the examples.

    Returns:
        str: The formatted user prompt.
    """
    return config.user_prompt.format(
        num_examples=config.num_examples,
        description=config.description,
        criteria=config.criteria,
    )
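
The substitution itself is plain str.format over the configured template. A minimal illustration with a made-up template (not the repository's actual prompt):

template = "Generate {num_examples} examples for: {description}. Criteria: {criteria}"
print(template.format(num_examples=5, description="simplified news texts", criteria="short sentences"))
# Generate 5 examples for: simplified news texts. Criteria: short sentences
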
generate
generate(llm_provider, system_prompt, user_prompt, llm_config)

Perform a chat completion using a system prompt and a user prompt.

PARAMETER DESCRIPTION
llm_provider

An OpenAI client or a compatible LLM provider.

TYPE: OpenAI

system_prompt

The system-level prompt that defines the model's behavior.

TYPE: str

user_prompt

The user input prompt.

TYPE: str

llm_config

Model configuration parameters. Must include the attribute "label" specifying the model name or identifier.

TYPE: LLMConfig

RETURNS DESCRIPTION
Dataset

A Dataset object containing the generated examples.

TYPE: Dataset

Source code in docs/repositories-clones/evaluation/src/create_dataset.py
def generate(
    llm_provider: OpenAI, system_prompt: str, user_prompt: str, llm_config: LLMConfig
) -> Dataset:
    """Perform a chat completion using a system prompt and a user prompt.

    Args:
        llm_provider (OpenAI):
            An OpenAI client or a compatible LLM provider.
        system_prompt (str):
            The system-level prompt that defines the model's behavior.
        user_prompt (str):
            The user input prompt.
        llm_config (LLMConfig):
            Model configuration parameters. Must include the attribute `"label"`
            specifying the model name or identifier.

    Returns:
        Dataset:
            A `Dataset` object containing the generated examples.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    response = llm_provider.responses.parse(
        model=llm_config.label,
        input=messages,
        text_format=Dataset,
    )
    content = response.output_parsed
    return content

evaluate

Main script to evaluate transformed texts.

FUNCTION DESCRIPTION
evaluate

Main function to run the evaluation.

evaluate_folder

Runs the evaluation for the data in the results_folder.

run_index_evaluation

Calculates indices for the texts in a DataFrame.

run_llm_criteria_evaluation

Performs an LLM-based criteria evaluation on the transformed texts in a DataFrame.

evaluate
evaluate(results_folder)

Main function to run the evaluation.

The parameters are provided by configuration files.

PARAMETER DESCRIPTION
results_folder

The path to the folder where the results of this run are stored.

TYPE: Path

Source code in docs/repositories-clones/evaluation/src/evaluate.py
def evaluate(results_folder: Path) -> None:
    """Main function to run the evaluation.

    The parameters are provided by configuration files.

    Args:
        results_folder (Path): The path to the folder where the results of this run
            are stored.
    """
    # filter selected evaluation tasks
    tasks = {key: settings.llm_tasks[key] for key in settings.tasks}

    # setup LLM provider for evaluation
    evaluation_llm_provider = EvaluationProviderOpenAiLike(settings.llm_config.evaluation)

    # folder to store the evaluation data
    results_folder.mkdir(parents=True, exist_ok=True)

    evaluation_dirs = []

    for directory, _, files in results_folder.walk():
        if settings.transformed_data_filename in files:
            evaluation_dirs.append(directory)

    evaluation_dirs = sorted(evaluation_dirs, key=str)
    for directory in evaluation_dirs:
        evaluate_folder(
            results_folder=directory,
            evaluation_llm_provider=evaluation_llm_provider,
            tasks=tasks,
            original_column=settings.input_column_name,
            transformed_column=settings.output_column_name,
        )
evaluate_folder
evaluate_folder(*, results_folder, evaluation_llm_provider, tasks, original_column, transformed_column)

Runs the evaluation for the data in the results_folder.

PARAMETER DESCRIPTION
results_folder

Path to the folder where data and results are stored.

TYPE: Path

evaluation_llm_provider

An object that provides the evaluate method to perform the task comparison evaluation.

TYPE: EvaluationProviderOpenAiLike

tasks

A dictionary where the keys are task identifiers, and the values are task descriptions used for the task evaluation.

TYPE: dict

original_column

Name of the column containing the original text.

TYPE: str

transformed_column

Name of the column containing the transformed text.

TYPE: str

RETURNS DESCRIPTION
None

None

Source code in docs/repositories-clones/evaluation/src/evaluate.py
def evaluate_folder(
    *,
    results_folder: Path,
    evaluation_llm_provider: EvaluationProviderOpenAiLike,
    tasks: dict,
    original_column: str,
    transformed_column: str,
) -> None:
    """Runs the evaluation for the data in the results_folder.

    Args:
        results_folder (Path): Path to the folder where data and results are stored.
        evaluation_llm_provider (EvaluationProviderOpenAiLike): An object that provides
            the `evaluate` method to perform the task comparison evaluation.
        tasks (dict): A dictionary where the keys are task identifiers,
            and the values are task descriptions used for the task evaluation.
        original_column (str): Name of the column containing the original text.
        transformed_column (str): Name of the column containing the transformed text.

    Returns:
        None
    """
    print(f"Evaluate transformed text in {results_folder}\n")
    # load dataframe with original and transformed text
    evaluation_df = pd.read_csv(
        results_folder / settings.transformed_data_filename,
        delimiter=settings.csv_separator,
    )

    # run LLM-based evaluation and compute indices
    run_llm_criteria_evaluation(
        evaluation_df=evaluation_df,
        openai_provider=evaluation_llm_provider,
        tasks=tasks,
        transformed_column=transformed_column,
        original_column=original_column,
    )

    run_index_evaluation(
        evaluation_df=evaluation_df,
        transformed_column=transformed_column,
        original_column=original_column,
    )

    if len(settings.score_weighting) > 0:
        # calculate score if a score weighting is specified
        calculate_weighted_average(evaluation_df, settings.score_weighting, result_column="Score")

    evaluation_filepath = results_folder / settings.evaluation_output_filename
    print(f"Saving evaluation data for '{results_folder}' to '{evaluation_filepath}'")
    save_dataframe(df=evaluation_df, filepath=evaluation_filepath)
    print("")
run_index_evaluation
run_index_evaluation(evaluation_df, transformed_column, original_column)

Calculates indices for the texts in a DataFrame.

PARAMETER DESCRIPTION
evaluation_df

The DataFrame containing the texts to be evaluated.

TYPE: DataFrame

transformed_column

Name of the column containing the transformed text.

TYPE: str

original_column

Name of the column containing the original text.

TYPE: str

RETURNS DESCRIPTION
None

The function modifies the evaluation_df DataFrame in-place, adding new columns for the results.

TYPE: None

Source code in docs/repositories-clones/evaluation/src/evaluate.py
def run_index_evaluation(
    evaluation_df: pd.DataFrame,
    transformed_column: str,
    original_column: str,
) -> None:
    """Calculates indices for the texts in a DataFrame.

    Args:
        evaluation_df (pd.DataFrame): The DataFrame containing the texts to be evaluated.
        transformed_column (str): Name of the column containing the transformed text.
        original_column (str): Name of the column containing the original text.

    Returns:
        None: The function modifies the `evaluation_df` DataFrame in-place,
            adding new columns for the results.
    """

    # using partial for closure in loop, see
    # https://docs.astral.sh/ruff/rules/function-uses-loop-variable/
    def _wrapper(row: pd.Series, index_fn: Callable) -> bool | int | float | None:
        return index_fn(
            original_text=row[original_column],
            transformed_text=row[transformed_column],
        )

    wrapped_selected_indices = {}
    for index_name in settings.indices:
        if index_name not in INDEX_FUNCTIONS:
            raise ValueError(
                f"There is no function associated with the supplied index name '{index_name}'. "
                "Please check the 'indices' key of your configuration."
            )

        wrapped_selected_indices[index_name] = partial(
            _wrapper,
            index_fn=INDEX_FUNCTIONS[index_name],
        )

    for index_name, wrapped_index_fn in wrapped_selected_indices.items():
        print(f'Calculating index "{index_name}"...')

        index_result = evaluation_df.progress_apply(wrapped_index_fn, axis=1)
        evaluation_df[index_name] = index_result
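
The wrapper construction above relies on functools.partial rather than a lambda defined inside the loop. A standalone sketch (with made-up index functions) of why that matters:

from functools import partial


def char_count(text):
    return len(text)


def word_count(text):
    return len(text.split())


index_fns = {"chars": char_count, "words": word_count}

late_bound = {}
early_bound = {}
for name, fn in index_fns.items():
    late_bound[name] = lambda text: fn(text)  # closes over `fn`, which keeps changing
    early_bound[name] = partial(lambda text, fn: fn(text), fn=fn)  # binds the current `fn`

print(late_bound["chars"]("two words"))   # 2 -> wrong, uses word_count (the last `fn`)
print(early_bound["chars"]("two words"))  # 9 -> correct, uses char_count
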
run_llm_criteria_evaluation
run_llm_criteria_evaluation(evaluation_df, openai_provider, tasks, transformed_column, original_column, output_column_prefix='')

Performs an LLM-based criteria evaluation on the transformed texts in a DataFrame.

The function compares the texts in the transformed_column of the DataFrame with the original texts in the original_column based on specific tasks provided in the tasks dictionary. Each task in the dictionary is evaluated, and the results are stored in-place in new columns named '{output_column_prefix}{task_key}' in the DataFrame. The evaluation results are then converted into boolean values indicating whether the task comparison criteria are met.

PARAMETER DESCRIPTION
evaluation_df

The DataFrame containing the texts to be evaluated.

TYPE: DataFrame

openai_provider

An object that provides the evaluate method to perform the task comparison evaluation.

TYPE: EvaluationProviderOpenAiLike

tasks

A dictionary where the keys are task identifiers, and the values are task descriptions used for the task comparison.

TYPE: dict

transformed_column

Name of the column containing the transformed text.

TYPE: str

original_column

Name of the column containing the original text.

TYPE: str

output_column_prefix

Prefix for the column names of the results.

TYPE: str DEFAULT: ''

RETURNS DESCRIPTION
None

The function modifies the evaluation_df DataFrame in-place, adding new columns for the results.

TYPE: None

Source code in docs/repositories-clones/evaluation/src/evaluate.py
def run_llm_criteria_evaluation(
    evaluation_df: pd.DataFrame,
    openai_provider: EvaluationProviderOpenAiLike,
    tasks: dict[str, str],
    transformed_column: str,
    original_column: str,
    output_column_prefix: str = "",
) -> None:
    """Performs an LLM-based criteria evaluation on the transformed texts in a DataFrame.

    The function compares the texts in the transformed_column of the DataFrame with the original texts
    in the original_column based on specific tasks provided in the `tasks` dictionary.
    Each task in the dictionary is evaluated, and the results are stored in-place in new columns named
    '{output_column_prefix}{task_key}' in the DataFrame. The evaluation results are then converted
    into boolean values indicating whether the task comparison criteria are met.

    Args:
        evaluation_df (pd.DataFrame): The DataFrame containing the texts to be evaluated.
        openai_provider (EvaluationProviderOpenAiLike): An object that provides the `evaluate`
            method to perform the task comparison evaluation.
        tasks (dict): A dictionary where the keys are task identifiers,
            and the values are task descriptions used for the task comparison.
        transformed_column (str): Name of the column containing the transformed text.
        original_column (str): Name of the column containing the original text.
        output_column_prefix (str): Prefix for the column names of the results.

    Returns:
        None: The function modifies the `evaluation_df` DataFrame in-place,
            adding new columns for the results.
    """
    print("Run LLM Criteria Evaluation...")
    for task_key, task_description in tasks.items():
        column_name = f"{output_column_prefix}{task_key}"

        def provider_evaluate(row: pd.Series, description: str) -> str:
            return openai_provider.evaluate(
                evaluate_input=row[transformed_column],
                system_prompt="evaluate_task_comparison",
                prompt_input=[row[original_column], description],
            )

        # using partial, as lambda inside loop leads to unexpected behavior,
        # see https://pylint.readthedocs.io/en/latest/user_guide/messages/warning/cell-var-from-loop.html
        evaluation_df[column_name] = evaluation_df.progress_apply(
            partial(provider_evaluate, description=task_description),
            axis=1,
        )
        evaluation_df[column_name] = evaluation_df[column_name].apply(convert_str_to_bool)

indices

Functions to calculate additional indices.

FUNCTION DESCRIPTION
example_index

An example index which serves as a template for new indices.

llm_hallucination_index

Performs an LLM-based hallucination evaluation on the original and transformed text.

example_index
example_index(*, original_text, transformed_text)

An example index which serves as a template for new indices.

Calculates the difference in length between original and transformed text.

PARAMETER DESCRIPTION
original_text

The column containing the original text.

TYPE: Series

transformed_text

The column containing the transformed text.

TYPE: Series

RETURNS DESCRIPTION
float

The difference in length between original and transformed text.

TYPE: float

Source code in docs/repositories-clones/evaluation/src/indices.py
def example_index(
    *,
    original_text: str,
    transformed_text: str,
) -> float:
    """An example index which serves as a template for new indices.

    Calculates the difference in length between original and transformed text.

    Args:
        original_text (pd.Series): The column containing the original text.
        transformed_text (pd.Series): The column containing the transformed text.

    Returns:
        float: The difference in length between original and transformed text.
    """
    original_length = len(original_text)
    transformed_length = len(transformed_text)
    return float(transformed_length - original_length)
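
Since example_index only compares string lengths, it can be tried directly; the import path below follows the file location listed above and is otherwise an assumption:

from src.indices import example_index

print(example_index(original_text="A long original sentence.", transformed_text="Short."))
# -19.0: the transformed text is 19 characters shorter than the original
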
llm_hallucination_index
llm_hallucination_index(*, original_text, transformed_text)

Performs an LLM-based hallucination evaluation on the original and transformed text.

The function checks for the transformed text if hallucinations occur in comparison to the original text.

PARAMETER DESCRIPTION
original_text

The column containing the original text.

TYPE: str

transformed_text

The column containing the transformed text.

TYPE: str

RETURNS DESCRIPTION
bool | None

bool or None: The result of the evaluation

Source code in docs/repositories-clones/evaluation/src/indices.py
def llm_hallucination_index(
    *,
    original_text: str,
    transformed_text: str,
) -> bool | None:
    """Performs an LLM-based hallucination evaluation on the original and transformed text.

    The function checks for the transformed text if hallucinations occur in
    comparison to the original text.

    Args:
        original_text (str): The column containing the original text.
        transformed_text (str): The column containing the transformed text.

    Returns:
        bool or None: The result of the evaluation
    """
    openai_provider = EvaluationProviderOpenAiLike(settings.llm_config.evaluation)
    return convert_str_to_bool(
        openai_provider.evaluate(
            evaluate_input=transformed_text,
            system_prompt="evaluate_hallucination",
            prompt_input=original_text,
        )
    )

models

Pydantic models.

MODULE DESCRIPTION
api_input

Pydantic models for API input parameters.

api_output

Pydantic models for API output parameters.

general

Load and check settings from YAML.

llm_input

Pydantic models for LLM configuration.

api_input

Pydantic models for API input parameters.

CLASS DESCRIPTION
SimplifyInput

Input model for /simplify endpoint to simplify text input.

SimplifyInput

Bases: BaseModel

Input model for /simplify endpoint to simplify text input.

input_text (str): The text to be simplified.
language_model (str): The identifier of the language model to use.

Source code in docs/repositories-clones/evaluation/src/models/api_input.py
class SimplifyInput(BaseModel):
    """Input model for /simplify endpoint to simplify text input.

    input_text (str): The text to be simplified.
    language_model (str): The identifier of the language model to use.
    """

    input_text: str
    language_model: str
api_output

Pydantic models for API output parameters.

CLASS DESCRIPTION
SimplifyOutput

Represents the result of a text simplification process.

SimplifyOutput

Bases: BaseModel

Represents the result of a text simplification process.

ATTRIBUTE DESCRIPTION
input_text

The original input text.

TYPE: str

simplified_text

The simplified version of the input text.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/models/api_output.py
class SimplifyOutput(BaseModel):
    """Represents the result of a text simplification process.

    Attributes:
        input_text (str): The original input text.
        simplified_text (str): The simplified version of the input text.
    """

    input_text: str
    simplified_text: str
general

Load and check settings from YAML.

CLASS DESCRIPTION
BaseTransformation

Base type for a transformation.

Dataset

Represents a dataset of text samples.

DatasetCreationSettings

Contains specific settings for dataset creation.

LLMConfig

Configuration for the list of available large language models.

PostConfig

Configuration for async_post request to other microservices.

Settings

The combined settings for the evaluation.

YamlSettings

A settings class that can read YAML files.

BaseTransformation

Bases: BaseModel

Base type for a transformation.

ATTRIBUTE DESCRIPTION
type

The type of the transformation.

TYPE: str

label

The human-readable label of the transformation.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/models/general.py
class BaseTransformation(BaseModel):
    """Base type for a transformation.

    Attributes:
        type (str): The type of the transformation.
        label (str): The human-readable label of the transformation.
    """

    model_config = ConfigDict(extra="allow")
    type: str
    label: str
Dataset

Bases: BaseModel

Represents a dataset of text samples.

ATTRIBUTE DESCRIPTION
examples

The list of text samples contained in the dataset.

TYPE: list[str]

Source code in docs/repositories-clones/evaluation/src/models/general.py
class Dataset(BaseModel):
    """Represents a dataset of text samples.

    Attributes:
        examples (list[str]):
            The list of text samples contained in the dataset.
    """

    examples: list[str]
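
A quick, self-contained check of the model; it is re-declared here so the snippet runs on its own:

from pydantic import BaseModel


class Dataset(BaseModel):
    """Mirror of the Dataset model above, for a standalone check."""

    examples: list[str]


ds = Dataset(examples=["First sample text.", "Second sample text."])
print(ds.model_dump())  # {'examples': ['First sample text.', 'Second sample text.']}
print(Dataset.model_validate({"examples": ["parsed from a plain dict"]}))
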
DatasetCreationSettings

Bases: YamlSettings

Contains specific settings for dataset creation.

ATTRIBUTE DESCRIPTION
dataset_name

The name of the dataset, used in the output filename.

TYPE: str

num_examples

The number of examples to prompt the LLM for (might be inaccurate).

TYPE: str

description

Short description/title of the dataset.

TYPE: str

criteria

Positive criteria describing the target outcome.

TYPE: str

llm_config

Configuration of the LLM.

TYPE: DatasetCreationLLM

system_prompt

The system prompt of the LLM.

TYPE: str

user_prompt

The user prompt template where num_examples, description and criteria can be inserted.

TYPE: str

output_format

Whether to output CSV or Excel files.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/models/general.py
class DatasetCreationSettings(YamlSettings):
    """Contains specific settings for dataset creation.

    Attributes:
        dataset_name (str): The name of the dataset, used in the output filename.
        num_examples (str): The number of examples to prompt the LLM for (might be inaccurate).
        description (str): Short description/title of the dataset.
        criteria (str): Positive criteria describing the target outcome.
        llm_config (DatasetCreationLLM): Configuration of the LLM.
        system_prompt (str): The system prompt of the LLM.
        user_prompt (str): The user prompt template where num_examples,
            description and criteria can be inserted.
        output_format (str): Whether to output CSV or Excel files.
    """

    model_config = SettingsConfigDict(yaml_file="config/create_dataset.yaml", extra="forbid")
    dataset_name: str
    num_examples: str
    description: str
    criteria: str
    llm_config: DatasetCreationLLM
    system_prompt: str
    user_prompt: str
    output_format: Literal["csv", "xlsx"] = "csv"
LLMConfig

Bases: YamlSettings

Configuration for the list of available large language models.

ATTRIBUTE DESCRIPTION
evaluation

The configuration for the LLM used for evaluation.

TYPE: EvaluationLLM

Source code in docs/repositories-clones/evaluation/src/models/general.py
class LLMConfig(YamlSettings):
    """Configuration for the list of available large language models.

    Attributes:
        evaluation (EvaluationLLM): The configuration for the LLM used for evaluation.
    """

    model_config = SettingsConfigDict(yaml_file="config/llm_parameters.yaml", extra="forbid")
    evaluation: EvaluationLLM
PostConfig

Bases: BaseModel

Configuration for async_post request to other microservices.

The default values in this class can be overwritten by those values stated in configs/general.yml.

ATTRIBUTE DESCRIPTION
model_config

Used to ignore other services, which are defined in the config.

TYPE: ConfigDict

max_attempts

Maximum number of requests before returning status code 424.

TYPE: PositiveInt

timeout_in_s

Maximum waiting duration before timeout (in seconds).

TYPE: PositiveInt

Source code in docs/repositories-clones/evaluation/src/models/general.py
class PostConfig(BaseModel):
    """Configuration for async_post request to other microservices.

    The default values in this class can be overwritten by those values stated in configs/general.yml.

    Attributes:
        model_config (ConfigDict): Used to ignore other services, which are defined in the config.
        max_attempts (PositiveInt): Maximum number of requests before returning status code 424.
        timeout_in_s (PositiveInt):  Maximum waiting duration before timeout (in seconds).
    """

    model_config = ConfigDict(extra="ignore")

    max_attempts: PositiveInt = 1
    timeout_in_s: PositiveInt = 180
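
Because of extra="ignore", a larger configuration section can be passed in unchanged and unknown keys are simply dropped. A self-contained check (model re-declared here so the snippet runs on its own):

from pydantic import BaseModel, ConfigDict, PositiveInt


class PostConfig(BaseModel):
    """Mirror of the PostConfig model above, for a standalone check."""

    model_config = ConfigDict(extra="ignore")

    max_attempts: PositiveInt = 1
    timeout_in_s: PositiveInt = 180


cfg = PostConfig.model_validate({"max_attempts": 5, "unrelated_service": {"url": "http://example.invalid"}})
print(cfg)  # max_attempts=5 timeout_in_s=180
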
Settings

Bases: YamlSettings

The combined settings for the evaluation.

ATTRIBUTE DESCRIPTION
experiment_name

The name of the experiment, also used as a folder name.

TYPE: str

backend_endpoint

The url of the backend.

TYPE: AnyHttpUrl

data_files

The names of the data_files without file extension.

TYPE: list[str]

replications

The number of replications to run.

TYPE: PositiveInt

input_column_name

The name of the column with the input texts.

TYPE: str

output_column_name

The name of the column for the transformed texts.

TYPE: str

transformations

The transformations to apply to the text and evaluate.

TYPE: dict

tasks

The tasks to use for the evaluation.

TYPE: list

indices

The indices to compute for the evaluation.

TYPE: list

map

The map from metrics to human-readable labels.

TYPE: dict

score_weighting

The (non-normalized) weighting to calculate the score.

TYPE: dict

llm_tasks

The task definitions for the tasks evaluated by the LLM.

TYPE: dict

llm_config

The configuration for the LLM used for the evaluation.

TYPE: LLMConfig

transformed_data_filename

Filename to store the transformed texts.

TYPE: str

evaluation_output_filename

Filename to store the single evaluation results.

TYPE: str

transformation_metadata_filename

Filename to store metadata about the transformation.

TYPE: str

transformation_label_column

The name of the column where the transformation label is stored.

TYPE: str

csv_separator

The CSV separator used to read and write CSV files.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/models/general.py
class Settings(YamlSettings):
    """The combined settings for the evaluation.

    Attributes:
        experiment_name (str): The name of the experiment, also used as a folder name.
        backend_endpoint (AnyHttpUrl): The url of the backend.
        data_files (list[str]): The names of the data_files without file extension.
        replications (PositiveInt): The number of replications to run.
        input_column_name (str): The name of the column with the input texts.
        output_column_name (str): The name of the column for the transformed texts.
        transformations (dict): The transformations to apply to the text and evaluate.
        tasks (list): The tasks to use for the evaluation.
        indices (list): The indices to compute for the evaluation.
        map (dict): The map from metrics to human-readable labels.
        score_weighting (dict): The (non-normalized) weighting to calculate the score.
        llm_tasks (dict): The task definitions for the tasks evaluated by the LLM.
        llm_config (LLMConfig): The configuration for the LLM used for the evaluation.
        transformed_data_filename (str): Filename to store the transformed texts.
        evaluation_output_filename (str): Filename to store the single evaluation results.
        transformation_metadata_filename (str): Filename to store metadata about the transformation.
        transformation_label_column (str): The name of the column where the transformation label is stored.
        csv_separator (str): The CSV separator used to read and write CSV files.
    """

    model_config = SettingsConfigDict(yaml_file="config/evaluation.yaml", extra="forbid")
    experiment_name: str
    backend_endpoint: AnyHttpUrl = "http://backend:8000/"
    data_files: list[str]
    replications: PositiveInt = 3
    input_column_name: str
    output_column_name: str = "Transformed"
    transformations: dict[str, BaseTransformation]
    tasks: list[str]
    indices: list[str] | tuple[str] = tuple()
    map: dict[str, str]
    score_weighting: dict[str, NonNegativeFloat] = Field(default_factory=dict)
    llm_tasks: dict[str, str]
    llm_config: LLMConfig = LLMConfig()

    transformed_data_filename: str = "input_transformed.csv"
    evaluation_output_filename: str = "evaluation_results.csv"
    transformation_metadata_filename: str = "transformation_metadata.yaml"

    transformation_label_column: str = "Model"
    csv_separator: str = ";"

    _experiment_suffix: str = PrivateAttr(
        default_factory=lambda: datetime.now().strftime("%Y%m%d-%H%M%S")
    )

    @computed_field
    @property
    def experiment_suffix(self) -> str:
        """The experiment suffix to distinguish different experiment runs."""
        return self._experiment_suffix
experiment_suffix property
experiment_suffix

The experiment suffix to distinguish different experiment runs.
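
The suffix is generated once per Settings instance via a private attribute and exposed read-only through a computed field. A standalone sketch of that pattern, using a made-up Run model rather than the repository's Settings:

from datetime import datetime

from pydantic import BaseModel, PrivateAttr, computed_field


class Run(BaseModel):
    _suffix: str = PrivateAttr(default_factory=lambda: datetime.now().strftime("%Y%m%d-%H%M%S"))

    @computed_field
    @property
    def suffix(self) -> str:
        return self._suffix


run = Run()
print(run.suffix)        # e.g. 20240131-142501
print(run.model_dump())  # computed fields are included in dumps: {'suffix': '...'}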

YamlSettings

Bases: BaseSettings

A settings class that can read YAML files.

METHOD DESCRIPTION
settings_customise_sources

Define the sources and their order for loading the settings values.

Source code in docs/repositories-clones/evaluation/src/models/general.py
class YamlSettings(BaseSettings):
    """A settings class that can read YAML files."""

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> tuple[PydanticBaseSettingsSource, ...]:
        """Define the sources and their order for loading the settings values.

        Args:
            settings_cls: The Settings class.
            init_settings: The `InitSettingsSource` instance.
            env_settings: The `EnvSettingsSource` instance.
            dotenv_settings: The `DotEnvSettingsSource` instance.
            file_secret_settings: The `SecretsSettingsSource` instance.

        Returns:
            A tuple containing the sources and their order for loading the settings values.
        """
        return (
            init_settings,
            env_settings,
            dotenv_settings,
            file_secret_settings,
            YamlConfigSettingsSource(settings_cls),
        )
settings_customise_sources classmethod
settings_customise_sources(settings_cls, init_settings, env_settings, dotenv_settings, file_secret_settings)

Define the sources and their order for loading the settings values.

PARAMETER DESCRIPTION
settings_cls

The Settings class.

TYPE: type[BaseSettings]

init_settings

The InitSettingsSource instance.

TYPE: PydanticBaseSettingsSource

env_settings

The EnvSettingsSource instance.

TYPE: PydanticBaseSettingsSource

dotenv_settings

The DotEnvSettingsSource instance.

TYPE: PydanticBaseSettingsSource

file_secret_settings

The SecretsSettingsSource instance.

TYPE: PydanticBaseSettingsSource

RETURNS DESCRIPTION
tuple[PydanticBaseSettingsSource, ...]

A tuple containing the sources and their order for loading the settings values.

Source code in docs/repositories-clones/evaluation/src/models/general.py
@classmethod
def settings_customise_sources(
    cls,
    settings_cls: type[BaseSettings],
    init_settings: PydanticBaseSettingsSource,
    env_settings: PydanticBaseSettingsSource,
    dotenv_settings: PydanticBaseSettingsSource,
    file_secret_settings: PydanticBaseSettingsSource,
) -> tuple[PydanticBaseSettingsSource, ...]:
    """Define the sources and their order for loading the settings values.

    Args:
        settings_cls: The Settings class.
        init_settings: The `InitSettingsSource` instance.
        env_settings: The `EnvSettingsSource` instance.
        dotenv_settings: The `DotEnvSettingsSource` instance.
        file_secret_settings: The `SecretsSettingsSource` instance.

    Returns:
        A tuple containing the sources and their order for loading the settings values.
    """
    return (
        init_settings,
        env_settings,
        dotenv_settings,
        file_secret_settings,
        YamlConfigSettingsSource(settings_cls),
    )
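
A hedged, self-contained sketch of how this source ordering behaves, using a temporary YAML file and a made-up DemoSettings class (not the repository's Settings):

import tempfile
from pathlib import Path

from pydantic_settings import BaseSettings, SettingsConfigDict, YamlConfigSettingsSource

yaml_path = Path(tempfile.mkdtemp()) / "demo.yaml"
yaml_path.write_text("experiment_name: demo\nreplications: 2\n", encoding="utf-8")


class DemoSettings(BaseSettings):
    model_config = SettingsConfigDict(yaml_file=str(yaml_path))

    experiment_name: str
    replications: int = 1

    @classmethod
    def settings_customise_sources(
        cls, settings_cls, init_settings, env_settings, dotenv_settings, file_secret_settings
    ):
        # Same ordering as above: the YAML file has the lowest precedence.
        return (
            init_settings,
            env_settings,
            dotenv_settings,
            file_secret_settings,
            YamlConfigSettingsSource(settings_cls),
        )


print(DemoSettings())                # experiment_name='demo' replications=2 (from YAML)
print(DemoSettings(replications=5))  # init kwargs take precedence over the YAML file
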
llm_input

Pydantic models for LLM configuration.

CLASS DESCRIPTION
APIAuth

Defines authentication settings for the LLM.

DatasetCreationLLM

Configuration of a Large Language Model for the dataset creation.

EvaluationLLM

Configuration of a Large Language Model.

LLMAPI

Defines the API connection to the LLM.

LLMInference

Defines Inference parameters.

LLMPromptConfig

Defines the structure of a LLM prompt configuration.

LLMPrompts

Defines the selectable LLM Prompts.

APIAuth

Bases: BaseModel

Defines authentication settings for the LLM.

ATTRIBUTE DESCRIPTION
secret_path

File path where the api token or credentials are stored.

TYPE: FilePath

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class APIAuth(BaseModel):
    """Defines Authentification settings for LLM.

    Attributes:
        secret_path (FilePath): File path where the api token or credentials are stored.
    """

    secret_path: FilePath

    @property
    def secret(self) -> SecretStr:
        """The secret variable."""
        with open(self.secret_path) as file:
            return SecretStr(file.read().strip())
secret property
secret

The secret variable.
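
A hedged sketch showing that the token is read lazily from the file and masked when printed; the model is re-declared here so the snippet runs on its own:

import tempfile
from pathlib import Path

from pydantic import BaseModel, FilePath, SecretStr


class APIAuth(BaseModel):
    """Mirror of the APIAuth model above, for a standalone check."""

    secret_path: FilePath

    @property
    def secret(self) -> SecretStr:
        with open(self.secret_path) as file:
            return SecretStr(file.read().strip())


token_file = Path(tempfile.mkdtemp()) / "token.txt"
token_file.write_text("sk-demo-token\n", encoding="utf-8")

auth = APIAuth(secret_path=token_file)
print(auth.secret)                     # ********** (masked)
print(auth.secret.get_secret_value())  # sk-demo-token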

DatasetCreationLLM

Bases: BaseModel

Configuration of a Large Language Model for the dataset creation.

ATTRIBUTE DESCRIPTION
label

Model name which is used in API call, e.g. ollama tag.

TYPE: str

api

API information.

TYPE: LLMAPI

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class DatasetCreationLLM(BaseModel):
    """Configuration of a Large Language Model for the dataset creation.

    Attributes:
        label (str): Model name which is used in API call, e.g. ollama tag.
        api (LLMAPI): API information.
    """

    label: str
    api: LLMAPI
EvaluationLLM

Bases: BaseModel

Configuration of a Large Language Model.

ATTRIBUTE DESCRIPTION
label

Model name which is used in API call, e.g. ollama tag.

TYPE: str

api

API information.

TYPE: LLMAPI

inference

Inference parameters.

TYPE: LLMInference

prompt_yaml_file

Path to prompts.

TYPE: Path

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class EvaluationLLM(BaseModel):
    """Configuration of a Large Language Model.

    Attributes:
        label (str): Model name which is used in API call, e.g. ollama tag.
        api (LLMAPI): API information.
        inference (LLMInference): Inference parameters.
        prompt_yaml_file (Path): Path to prompts.
    """

    model_config = ConfigDict(extra="ignore")

    label: str
    api: LLMAPI
    inference: LLMInference
    prompt_yaml_file: Path

    @property
    def prompt_config(self) -> LLMPromptConfig:
        """The system prompts for the model read from a YAML file."""
        return LLMPromptConfig(**_load_yml_config(self.prompt_yaml_file))
prompt_config property
prompt_config

The system prompts for the model read from a YAML file.

LLMAPI

Bases: BaseModel

Defines the API connection to the LLM.

ATTRIBUTE DESCRIPTION
url

URL to model.

TYPE: AnyHttpUrl

auth

Authentication settings for the LLM.

TYPE: APIAuth | None

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class LLMAPI(BaseModel):
    """Defines API-Connection to LLM.

    Attributes:
        url (AnyHttpUrl): URL to model.
        auth (APIAuth | None): Authentication settings for the LLM.
    """

    url: AnyHttpUrl
    auth: APIAuth | None = None
LLMInference

Bases: BaseModel

Defines Inference parameters.

ATTRIBUTE DESCRIPTION
temperature

Randomness / variation of the output. High values indicate more creativity.

TYPE: PositiveFloat

top_p

Threshold for sampling only from the most likely tokens.

TYPE: PositiveFloat

frequency_penalty

Reduces the likelihood of repeating tokens based on their existing frequency in the text.

TYPE: float

presence_penalty

Encourages the model to introduce new tokens by penalizing tokens that have already appeared.

TYPE: float

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class LLMInference(BaseModel):
    """Defines Inference parameters.

    Attributes:
        temperature (PositiveFloat): Randomness / variation of the output. High values indicate more creativity.
        top_p (PositiveFloat): Threshold for sampling only from the most likely tokens.
        frequency_penalty (float): Reduces the likelihood of repeating tokens based on their existing frequency
            in the text.
        presence_penalty (float): Encourages the model to introduce new tokens by penalizing tokens that have
            already appeared.
    """

    temperature: float = 0.7
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0
LLMPromptConfig

Bases: BaseModel

Defines the structure of a LLM prompt configuration.

ATTRIBUTE DESCRIPTION
model_config

Used to ignore other services, which are defined in the config.

TYPE: ConfigDict

system_prompts

System prompts.

TYPE: LLMPrompts

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class LLMPromptConfig(BaseModel):
    """Defines the structure of a LLM prompt configuration.

    Attributes:
        model_config (ConfigDict): Used to ignore other services, which are defined in the config.
        system_prompts (LLMPrompts): System prompts.
    """

    model_config = ConfigDict(extra="ignore")

    system_prompts: LLMPrompts
LLMPrompts

Bases: BaseModel

Defines the selectable LLM Prompts.

ATTRIBUTE DESCRIPTION
model_config

Used to ignore other services, which are defined in the config.

TYPE: ConfigDict

evaluate_hallucination

Prompt for hallucination evaluation.

TYPE: str

evaluate_task_comparison

Prompt for task evaluation.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/models/llm_input.py
class LLMPrompts(BaseModel):
    """Defines the selectable LLM Prompts.

    Attributes:
        model_config (ConfigDict): Used to ignore other services, which are defined in the config.
        evaluate_hallucination (str): Prompt for hallucination evaluation.
        evaluate_task_comparison (str): Prompt for task evaluation.
    """

    model_config = ConfigDict(extra="ignore")

    evaluate_hallucination: str = ""
    evaluate_task_comparison: str = ""

openai_evaluator

Class for text evaluation using OpenAI.

CLASS DESCRIPTION
EvaluationProviderOpenAiLike

Class for text evaluation with OpenAI-compatible LLM provider.

EvaluationProviderOpenAiLike

Class for text evaluation with OpenAI-compatible LLM provider.

ATTRIBUTE DESCRIPTION
llm

Definition of the LLM configuration.

TYPE: LLM

client

OpenAI-like client performing chat completion.

TYPE: OpenAI

METHOD DESCRIPTION
evaluate

Take an object of type str as input and return a model-generated readability evaluation as output.

Source code in docs/repositories-clones/evaluation/src/openai_evaluator.py
class EvaluationProviderOpenAiLike:
    """Class for text evaluation with OpenAI-compatible LLM provider.

    Attributes:
        llm (LLM): Definition of the LLM configuration.
        client (OpenAI): OpenAI-like client performing chat completion.

    Methods:
        evaluate: Take an object of type str as input and return a model-generated readability evaluation as output.
    """

    def __init__(self, llm: EvaluationLLM) -> None:
        """Initialise the class.

        Args:
            llm (LLM): Definition of the LLM configuration.
        """
        self.llm: EvaluationLLM = llm
        self.client = OpenAI(
            api_key=llm.api.auth.secret.get_secret_value(), base_url=str(llm.api.url)
        )

    def evaluate(
        self,
        evaluate_input: str,
        system_prompt: str = "evaluate",
        prompt_input: str | list | None = None,
    ) -> str:
        """Take a string as input and return a model-generated evaluation of the text as output.

        Args:
            evaluate_input (str): Input to the model.
            system_prompt (str): Key of the evaluation prompt
            prompt_input (str | list | None): Prompt input

        Returns:
            str: Model-generated response text.
        """
        system_prompt_content = self.llm.prompt_config.system_prompts.dict().get(system_prompt)
        if system_prompt_content:
            if system_prompt in ["evaluate_task", "evaluate_hallucination"]:
                system_prompt_content = system_prompt_content.format(prompt_input=prompt_input)
            elif system_prompt == "evaluate_task_comparison":
                system_prompt_content = system_prompt_content.format(
                    prompt_input_1=prompt_input[0], prompt_input_2=prompt_input[1]
                )
            else:
                pass
        messages = [
            {"role": "system", "content": system_prompt_content},
            {"role": "user", "content": evaluate_input},
        ]
        logger.info(f"Messages: {messages}")
        response: ChatCompletion = self._generate(messages, response_format="text")
        content: str = response.choices[0].message.content  # type: ignore

        logger.info(f"Content: {content}")

        return content

    def _generate(self, messages: list, response_format: str = "text") -> ChatCompletion:
        """Take a list of messages as input and return a model-generated message as output.

        Args:
            messages (list): Messages as input to the model.
            response_format (str): Format of the response.

        Returns:
            ChatCompletion: Model-generated response.
        """
        try:
            response: ChatCompletion = self.client.chat.completions.create(
                model=self.llm.label,
                messages=messages,
                response_format={"type": response_format},
                frequency_penalty=self.llm.inference.frequency_penalty,
                presence_penalty=self.llm.inference.presence_penalty,
                temperature=self.llm.inference.temperature,
                top_p=self.llm.inference.top_p,
                stream=False,
            )
            logger.debug(f"Response from LLM-Client: {response}")

        except Exception as e:
            msg = f"{self.llm.label} API call of Chat-Completion to LLM failed: {e}"
            logger.error(msg)
            raise RuntimeError(msg) from e

        return response
evaluate
evaluate(evaluate_input, system_prompt='evaluate', prompt_input=None)

Take a string as input and return a model-generated evaluation of the text as output.

PARAMETER DESCRIPTION
evaluate_input

Input to the model.

TYPE: str

system_prompt

Key of the evaluation prompt

TYPE: str DEFAULT: 'evaluate'

prompt_input

Prompt input

TYPE: str | list | None DEFAULT: None

RETURNS DESCRIPTION
str

Model-generated response text.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/openai_evaluator.py
def evaluate(
    self,
    evaluate_input: str,
    system_prompt: str = "evaluate",
    prompt_input: str | list | None = None,
) -> str:
    """Take a string as input and return a model-generated evaluation of the text as output.

    Args:
        evaluate_input (str): Input to the model.
        system_prompt (str): Key of the evaluation prompt
        prompt_input (str | list | None): Prompt input

    Returns:
        str: Model-generated response text.
    """
    system_prompt_content = self.llm.prompt_config.system_prompts.dict().get(system_prompt)
    if system_prompt_content:
        if system_prompt in ["evaluate_task", "evaluate_hallucination"]:
            system_prompt_content = system_prompt_content.format(prompt_input=prompt_input)
        elif system_prompt == "evaluate_task_comparison":
            system_prompt_content = system_prompt_content.format(
                prompt_input_1=prompt_input[0], prompt_input_2=prompt_input[1]
            )
        else:
            pass
    messages = [
        {"role": "system", "content": system_prompt_content},
        {"role": "user", "content": evaluate_input},
    ]
    logger.info(f"Messages: {messages}")
    response: ChatCompletion = self._generate(messages, response_format="text")
    content: str = response.choices[0].message.content  # type: ignore

    logger.info(f"Content: {content}")

    return content

request_handling

Utility function for making robust synchronous HTTP POST requests with retries and error handling.

FUNCTION DESCRIPTION
post_with_retries

Makes a synchronous POST request with retries via httpx and parses the response into a Pydantic model.

post_with_retries
post_with_retries(url, response_model, config, request_options, service_name='service')

Makes a synchronous POST request with retries via httpx and parses the response into a Pydantic model.

PARAMETER DESCRIPTION
url

The URL to post to.

TYPE: str

response_model

Expected Pydantic response model.

TYPE: Type[ServiceApiOutput]

config

Configuration with timeout and max_attempts.

TYPE: PostConfig

request_options

Data and headers for the request.

TYPE: dict

service_name

Name of the service (for logging).

TYPE: str DEFAULT: 'service'

RETURNS DESCRIPTION
ServiceApiOutput | list[ServiceApiOutput]

An instance (or list) of the given response model populated with the response data.

RAISES DESCRIPTION
RuntimeError

If all retries fail or response validation fails.

Source code in docs/repositories-clones/evaluation/src/request_handling.py
def post_with_retries(
    url: str,
    response_model: type[ServiceApiOutput],
    config: PostConfig,
    request_options: dict[str, Any],
    service_name: str = "service",
) -> ServiceApiOutput | list[ServiceApiOutput]:
    """Makes a synchronous POST request with retries via httpx and parses the response into a Pydantic model.

    Args:
        url (str): The URL to post to.
        response_model (Type[ServiceApiOutput]): Expected Pydantic response model.
        config (PostConfig): Configuration with timeout and max_attempts.
        request_options (dict): Data and headers for the request.
        service_name (str): Name of the service (for logging).

    Returns:
        An instance (or list) of the given response model populated with the response data.

    Raises:
        RuntimeError: If all retries fail or response validation fails.
    """
    logger.debug(f"Communication with {service_name} configured using {config}")

    for attempt in range(1, config.max_attempts + 1):
        if attempt > 1:
            logger.warning(f"Retrying request to {service_name} ({attempt}/{config.max_attempts})")

        try:
            with httpx.Client(timeout=config.timeout_in_s) as client:
                response = client.post(url, **request_options)

            try:
                result_dict = response.json()
            except ValueError as e:
                msg = f"{service_name} returned invalid JSON. Response: {response.text}"
                logger.error(msg)
                raise RuntimeError(msg) from e

            if response.status_code == HTTPStatus.OK:
                logger.debug(f"Response from {service_name}: {result_dict}")
                try:
                    if isinstance(result_dict, list):
                        return [response_model.model_validate(item) for item in result_dict]
                    return response_model.model_validate(result_dict)
                except (TypeError, ValidationError) as e:
                    msg = f"Invalid response structure for {service_name}: {e}"
                    logger.error(msg)
                    raise RuntimeError(msg) from e

            else:
                msg = (
                    f"{service_name} failed with status {response.status_code}. "
                    f"Response: {response.text}"
                )
                logger.error(msg)
                raise RuntimeError(msg)

        except httpx.RequestError as e:
            if attempt < config.max_attempts:
                logger.warning(
                    f"Request to {service_name} failed (attempt {attempt}): {e}. Retrying..."
                )
                time.sleep(3)
            else:
                msg = (
                    f"Could not connect to {service_name} after {config.max_attempts} attempts: {e}"
                )
                logger.critical(msg)
                raise RuntimeError(msg) from e
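
A hedged usage sketch that assumes a running backend; the endpoint path, payload, and import locations are assumptions based on the models documented above, not confirmed by the repository:

from src.models.api_output import SimplifyOutput  # assumed import paths, see the model docs above
from src.models.general import PostConfig
from src.request_handling import post_with_retries

result = post_with_retries(
    url="http://backend:8000/simplify",  # assumed endpoint, based on the SimplifyInput docs
    response_model=SimplifyOutput,
    config=PostConfig(max_attempts=3, timeout_in_s=60),
    request_options={"json": {"input_text": "Ein langer, komplizierter Satz.", "language_model": "demo-model"}},
    service_name="backend",
)
print(result.simplified_text)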

settings

Loads the configuration and makes it accessible for other modules.

summarize

Script for extended and summarized evaluation.

FUNCTION DESCRIPTION
barplot_mean_limits

Plots a DataFrame as a bar chart.

merge_evaluation_results

Merges all evaluations stored in subfolders of the input folders.

summarize

The main function to run the summarization.

summarize_evaluation_results

Create and save summary plots and tables to the given output folder.

barplot_mean_limits
barplot_mean_limits(df, save_path=None)

Plots a DataFrame as a bar chart.

Includes minimum and maximum values and optional saving to file.

PARAMETER DESCRIPTION
df

Data frame containing data for plotting.

TYPE: DataFrame

save_path

Optional path to save the plot (default is None).

TYPE: Path DEFAULT: None

Source code in docs/repositories-clones/evaluation/src/summarize.py
def barplot_mean_limits(df: pd.DataFrame, save_path: Path | None = None) -> None:
    """Plots a DataFrame as a bar chart.

    Includes minimum and maximum values and optional saving to file.

    Args:
        df (pd.DataFrame): Data frame containing data for plotting.
        save_path (Path): Optional path to save the plot (default is None).
    """
    # Plot DataFrame using matplotlib by creating an Axes object, `ax`
    yerr = np.stack(
        [
            (df.mean() - df.min()).values,
            (df.max() - df.mean()).values,
        ],
        axis=1,
    )
    # ensure positive values (sometimes very small negative values occur
    # due to numerical errors)
    yerr = np.maximum(0, yerr)
    df.mean().T.plot(
        kind="bar",
        figsize=(9, 4),
        width=0.8,
        yerr=yerr,
        capsize=2,
    )

    # Setting chart attributes
    plt.title("Evaluation")
    plt.ylabel(r"Values $[\uparrow]$")
    plt.xticks(rotation=65)  # Rotate x-axis labels if necessary for readability

    # Setting legend position and title
    plt.legend(bbox_to_anchor=(1.04, 1), loc="upper left", title="Models", fontsize="small")

    # Adjusting the layout to fit all elements properly
    plt.subplots_adjust(bottom=0.35, right=0.65)
    plt.grid(visible=True)

    if save_path:
        # Save plot as an image file
        plt.savefig(save_path, dpi=300)
        plt.close()  # Close the plot to free memory if stored in a file.
    else:
        # Display the plot directly if no save path provided
        plt.show()
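
Example (a minimal, hypothetical sketch: the model names, metric columns and import path are placeholders, not part of the evaluation code):

import pandas as pd
from pathlib import Path

from src.summarize import barplot_mean_limits  # import path assumed from the repository layout

# Hypothetical per-run metric values for two models.
scores = pd.DataFrame(
    {
        "Model": ["A", "A", "B", "B"],
        "Readability": [0.71, 0.75, 0.62, 0.66],
        "Faithfulness": [0.88, 0.84, 0.91, 0.90],
    }
)

# As in summarize_evaluation_results, a grouped object is passed so that mean,
# min and max are taken per group; the plot is written to disk instead of shown.
barplot_mean_limits(scores.groupby("Model"), save_path=Path("summary_example.png"))
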
merge_evaluation_results
merge_evaluation_results(input_folders, *, combined_experiments)

Merges all evaluations stored in subfolders of the input folders.

PARAMETER DESCRIPTION
input_folders

The input folders with the experiment results.

TYPE: list[Path]

combined_experiments

Whether the merged results come from multiple experiments.

TYPE: bool

RETURNS DESCRIPTION
merged_evaluation_df

The data frame containing the evaluation results for all transformations, as well as the calculated score.

TYPE: DataFrame

Source code in docs/repositories-clones/evaluation/src/summarize.py
def merge_evaluation_results(
    input_folders: list[Path],
    *,
    combined_experiments: bool,
) -> pd.DataFrame:
    """Merges all evaluations stored in subfolders of the input folders.

    Args:
        input_folders (list[Path]): The input folders with the experiment results.
        combined_experiments (bool): Whether the merged results come from multiple experiments.

    Returns:
        merged_evaluation_df (pd.DataFrame): The data frame containing the
            evaluation results for all transformations, as well as the
            calculated score.
    """
    llm_results_files = get_files_with_name(input_folders, settings.evaluation_output_filename)

    eval_files = []
    for file_path in llm_results_files:
        with open(
            file_path.parent / settings.transformation_metadata_filename,
            encoding="utf-8",
        ) as metadata_file:
            metadata = yaml.safe_load(metadata_file)
            label = (
                metadata.get("label_combined_experiments", metadata["label"])
                if combined_experiments
                else metadata["label"]
            )

        dataset_name = file_path.parent.parent.parent.name
        eval_df = pd.read_csv(file_path, sep=settings.csv_separator)
        eval_df["Run"] = file_path.parent.name
        eval_df[settings.transformation_label_column] = label
        eval_df["Dataset"] = dataset_name
        eval_files.append(eval_df)

    merged_evaluation_df = pd.concat(eval_files)

    return merged_evaluation_df
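
Example (a hedged sketch: the experiment folder names are placeholders, and each run subfolder is expected to contain the evaluation CSV and the metadata YAML configured in settings):

from pathlib import Path

from src.summarize import merge_evaluation_results  # import path assumed from the repository layout

merged_df = merge_evaluation_results(
    [Path("results/experiment_a"), Path("results/experiment_b")],  # placeholder folders
    combined_experiments=True,  # more than one experiment folder is merged
)
print(merged_df[["Run", "Dataset"]].drop_duplicates())
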
summarize
summarize(input_folders, output_folder)

The main function to run the summarization.

The parameters are provided by configuration files.

PARAMETER DESCRIPTION
input_folders

The paths to the folders with the results.

TYPE: list[Path]

output_folder

The path to the folder where the results of this run are stored.

TYPE: Path

Source code in docs/repositories-clones/evaluation/src/summarize.py
def summarize(input_folders: list[Path], output_folder: Path) -> None:
    """The main function to run the summarization.

    The parameters are provided by configuration files.

    Args:
        input_folders (list[Path]): The paths to the folders with the results.
        output_folder (Path): The path to the folder where the results of this run
            are stored.
    """
    # Mappings
    ordered_transformation_labels = [t.label for t in settings.transformations.values()]

    combined_experiments = len(input_folders) > 1

    output_folder.mkdir(parents=True, exist_ok=True)
    merged_df = merge_evaluation_results(
        input_folders,
        combined_experiments=combined_experiments,
    )
    summarize_evaluation_results(
        output_folder,
        merged_df,
        ordered_transformation_labels,
        combined_experiments=combined_experiments,
    )
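
Example (a hedged sketch with placeholder folder names; the metric, column and transformation settings are read from the configuration files):

from pathlib import Path

from src.summarize import summarize  # import path assumed from the repository layout

summarize(
    input_folders=[Path("results/my_experiment__run1")],
    output_folder=Path("results/my_experiment__run1/summary"),
)
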
summarize_evaluation_results
summarize_evaluation_results(output_folder, merged_evaluation_df, ordered_transformation_labels, *, combined_experiments)

Create and save summary plots and tables to the given output folder.

PARAMETER DESCRIPTION
output_folder

The folder to store the summarized results.

TYPE: Path

merged_evaluation_df

The data frame with the evaluation results for all transformations.

TYPE: DataFrame

ordered_transformation_labels

The labels of the transformations in the order they will appear in the output.

TYPE: list

combined_experiments

Whether the merged results come from multiple experiments.

TYPE: bool

Source code in docs/repositories-clones/evaluation/src/summarize.py
def summarize_evaluation_results(
    output_folder: Path,
    merged_evaluation_df: pd.DataFrame,
    ordered_transformation_labels: list[str],
    *,
    combined_experiments: bool,
) -> None:
    """Create and save summary plots and tables to the given output folder.

    Args:
        output_folder (Path): The folder to store the summarized results.
        merged_evaluation_df (pd.DataFrame): The data frame with the evaluation
            results for all transformations.
        ordered_transformation_labels (list): The labels of the transformations
            in the order they will appear in the output.
        combined_experiments (bool): Whether the merged results come from multiple experiments.
    """
    if not combined_experiments:
        # restore the order given in the config, not implemented for multiple experiments
        merged_evaluation_df[settings.transformation_label_column] = pd.Categorical(
            merged_evaluation_df[settings.transformation_label_column],
            categories=ordered_transformation_labels,
            ordered=True,
        )
        merged_evaluation_df = merged_evaluation_df.sort_values(
            settings.transformation_label_column
        )

    # rename columns from identifiers to human-readable labels
    merged_evaluation_df = merged_evaluation_df.rename(columns=settings.map)

    # store as CSV and Excel
    print("Saving detailed results to file:")
    detailed_results_csv_filepath = output_folder / "detailed_results.csv"
    print(f"- CSV: '{detailed_results_csv_filepath}'")
    merged_evaluation_df.to_csv(detailed_results_csv_filepath, sep=settings.csv_separator)
    detailed_results_excel_filepath = output_folder / "detailed_results.xlsx"
    print(f"- Excel: '{detailed_results_excel_filepath}'")
    merged_evaluation_df.to_excel(detailed_results_excel_filepath)

    # select numeric and boolean columns for summary output
    numeric_bool_columns = merged_evaluation_df.select_dtypes(
        include=["number", "bool"]
    ).columns.tolist()
    # score is specified manually below so that it is always included, remove it here
    score_name = settings.map.get("Score", "Score")
    if score_name in numeric_bool_columns:
        numeric_bool_columns.remove(score_name)

    selected_columns = [
        "Run",
        settings.transformation_label_column,
        "Dataset",
        score_name,
    ] + numeric_bool_columns
    # filter out non-existing columns (e.g., Score might be missing)
    selected_columns = [name for name in selected_columns if name in merged_evaluation_df.columns]
    # calculate mean step by step, relevant if the number of replications
    # or number of examples differs between datasets
    run_vals = (
        # select columns for grouping and the relevant numeric values to average
        merged_evaluation_df[selected_columns]
        # average over examples (mean of a given dataset, model and replication)
        .groupby(by=["Run", settings.transformation_label_column, "Dataset"], observed=True)
        .mean()
        # average over datasets (mean of a replication and model)
        .groupby(by=[settings.transformation_label_column, "Run"], observed=True)
        .mean()
    )

    # save barplot of the averaged results
    barplot_filepath = output_folder / "summary.png"
    print(f"Saving summarized results as barplot to '{barplot_filepath}'")
    barplot_mean_limits(
        run_vals.groupby(by=[settings.transformation_label_column], observed=True),
        barplot_filepath,
    )

    print("Saving summary table:")
    run_vals = run_vals.groupby(by=[settings.transformation_label_column], observed=True)
    summary_table_markdown_filepath = output_folder / "summary.md"
    print(f"- Markdown: {summary_table_markdown_filepath}")
    save_evaluation_to_md(run_vals, summary_table_markdown_filepath, score_name=score_name)

    # store statistics calculated over the replications
    run_vals.describe().to_csv(
        output_folder / "summary_statistics_replications.csv",
        sep=settings.csv_separator,
    )
    run_vals.describe().to_excel(output_folder / "summary_statistics_replications.xlsx")

    # store averaged results as CSV and Excel
    summary_table_csv_filepath = output_folder / "summary.csv"
    print(f"- CSV: {summary_table_csv_filepath}")
    run_vals.mean().to_csv(summary_table_csv_filepath, sep=settings.csv_separator)
    summary_table_excel_filepath = output_folder / "summary.xlsx"
    print(f"- Excel: {summary_table_excel_filepath}")
    run_vals.mean().to_excel(summary_table_excel_filepath)

transform

Script for LLM evaluation.

FUNCTION DESCRIPTION
copy_column

Copies the texts in the input_column of a DataFrame to output_column.

execute_transform

Transform the text and save original and transformed text to disk.

generate

Sends a SimplifyInput object to a backend service and returns the simplified text.

transform

The main function to run the transformation.

transform_column

Transforms the texts in the input_column of a DataFrame using the backend.

copy_column
copy_column(df, *, input_column, output_column)

Copies the texts in the input_column of a DataFrame to output_column.

PARAMETER DESCRIPTION
df

The DataFrame containing the texts to be copied from the input_column.

TYPE: DataFrame

input_column

The name of the column to be transformed.

TYPE: str

output_column

The name of the column where the result is stored.

TYPE: str

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: The updated DataFrame with a new output_column containing the copied texts.

RAISES DESCRIPTION
ValueError

The input_column is not in the provided data frame.

Source code in docs/repositories-clones/evaluation/src/transform.py
def copy_column(
    df: pd.DataFrame,
    *,
    input_column: str,
    output_column: str,
) -> pd.DataFrame:
    """Copies the texts in the input_column of a DataFrame to output_column.

    Args:
        df (pd.DataFrame): The DataFrame containing the texts to be copied from the input_column.
        input_column (str): The name of the column to be transformed.
        output_column (str): The name of the column where the result is stored.

    Returns:
        pd.DataFrame: The updated DataFrame with a new output_column containing the copied texts.

    Raises:
        ValueError: The input_column is not in the provided data frame.
    """
    print("Copying...")
    try:
        df[output_column] = df[input_column]
        return df
    except KeyError as e:
        raise ValueError(
            f"The column to copy ('{input_column}') is not present in the provided data frame.\n"
            f"Available columns are: {list(df.columns)}.\nPlease adapt your configuration "
            "or add the column to your data file."
        ) from e
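
Example (a minimal sketch with made-up column names):

import pandas as pd

from src.transform import copy_column  # import path assumed from the repository layout

df = pd.DataFrame({"simple_text": ["Ein kurzer Satz.", "Noch ein Satz."]})

# Copy a manually simplified column so it can be evaluated like a model output.
df = copy_column(df, input_column="simple_text", output_column="transformed_text")

# A missing input column raises a ValueError listing the available columns.
try:
    copy_column(df, input_column="does_not_exist", output_column="transformed_text")
except ValueError as error:
    print(error)
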
execute_transform
execute_transform(*, name, label, data_filename, results_folder, transform_fn)

Transform the text and save original and transformed text to disk.

PARAMETER DESCRIPTION
name

The name of the transformation.

TYPE: str

label

The label of the transformation for end users.

TYPE: str

data_filename

The name of the data file in the data folder, without the .csv file extension.

TYPE: str

results_folder

The folder to store the transformed texts.

TYPE: Path

transform_fn

A transform function that receives the original dataframe and returns the modified dataframe. Usually the function adds a column with the transformed text.

TYPE: Callable

Source code in docs/repositories-clones/evaluation/src/transform.py
def execute_transform(
    *,
    name: str,
    label: str,
    data_filename: str,
    results_folder: Path,
    transform_fn: Callable,
) -> None:
    """Transform the text and save original and transformed text to disk.

    Args:
        name (str): The name of the transformation.
        label (str): The label of the transformation for end users.
        data_filename (str): The name of the data file in the data folder, without
            the .csv file extension.
        results_folder (Path): The folder to store the transformed texts.
        transform_fn (Callable): A transform function that receives the original
            dataframe and returns the modified dataframe. Usually the function
            adds a column with the transformed text.
    """
    # read the input data
    data_folder = Path("data")
    csv_data_path = data_folder / f"{data_filename}.csv"
    xlsx_data_path = data_folder / f"{data_filename}.xlsx"
    if csv_data_path.is_file():
        df = pd.read_csv(csv_data_path, delimiter=settings.csv_separator)
        data_path = csv_data_path
        if xlsx_data_path.is_file():
            print(
                f"{data_filename}: Both CSV ({csv_data_path}) and Excel ({xlsx_data_path}) files found. Using CSV..."
            )
    elif xlsx_data_path.is_file():
        df = pd.read_excel(xlsx_data_path)
        data_path = xlsx_data_path
    else:
        raise ValueError(
            f"Could not find input file for data filename {data_filename}. "
            f"Neither '{csv_data_path}' nor '{xlsx_data_path}' exist."
        )

    # execute the transformation
    print(f"Transforming text for: {name}...")
    transformed_df = transform_fn(df=df)

    if transformed_df is None:
        print("Transformation did not succeed.")
        return

    # ensure that the output folder exists
    results_folder.mkdir(parents=True, exist_ok=True)

    # store relevant metadata
    metadata = {
        "name": name,
        "label": label,
        # more detailed label to distinguish between experiments with the same name
        "label_combined_experiments": f"{label} ({settings.experiment_name} - {settings.experiment_suffix})",
        "data_path": str(data_path),
    }

    with open(
        results_folder / settings.transformation_metadata_filename,
        "w",
        encoding="utf-8",
    ) as file:
        yaml.dump(metadata, file, default_flow_style=False)

    transformed_data_filepath = results_folder / settings.transformed_data_filename
    print(f"Saving original and transformed text to '{transformed_data_filepath}'")
    save_dataframe(transformed_df, transformed_data_filepath)
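
Example (a hedged sketch: the data file, folder names and the trivial pass-through transform are placeholders; data/example.csv or data/example.xlsx must exist):

from pathlib import Path

import pandas as pd

from src.transform import execute_transform  # import path assumed from the repository layout

def identity_transform(df: pd.DataFrame) -> pd.DataFrame:
    # Placeholder transform function: returns the data unchanged.
    return df

execute_transform(
    name="identity",
    label="Identity (no change)",
    data_filename="example",  # resolved to data/example.csv or data/example.xlsx
    results_folder=Path("results/example/identity/01"),
    transform_fn=identity_transform,
)
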
generate
generate(input_text, endpoint)

Sends a SimplifyInput object to a backend service and returns the simplified text.

The function serializes the input Pydantic model to JSON, constructs the full URL from a base URL and the given endpoint, and performs a synchronous HTTP POST request with retries. The response is validated against the SimplifyOutput model.

PARAMETER DESCRIPTION
input_text

The input data to be simplified, as a Pydantic model.

TYPE: SimplifyInput

endpoint

The API endpoint to append to the backend base URL.

TYPE: str

RETURNS DESCRIPTION
str

The simplified text returned by the backend service.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/transform.py
def generate(input_text: SimplifyInput, endpoint: str) -> str:
    """Sends a SimplifyInput object to a backend service and returns the simplified text.

    The function serializes the input Pydantic model to JSON, constructs the full
    URL from a base URL and the given endpoint, and performs a synchronous HTTP POST
    request with retries. The response is validated against the SimplifyOutput model.

    Args:
        input_text (SimplifyInput): The input data to be simplified, as a Pydantic model.
        endpoint (str): The API endpoint to append to the backend base URL.

    Returns:
        str: The simplified text returned by the backend service.
    """
    url = urljoin(str(settings.backend_endpoint), endpoint)
    json = input_text.model_dump()

    post_config = PostConfig()

    headers = {"Content-Type": "application/json", "Accept": "application/json"}

    request_options = {
        "json": json,
        "headers": headers,
    }

    result = post_with_retries(
        url=url,
        response_model=SimplifyOutput,
        config=post_config,
        request_options=request_options,
        service_name="Chat",
    )

    return result.simplified_text
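
Example (a hedged sketch: it assumes a running backend at settings.backend_endpoint and that SimplifyInput is importable from the same module; the model name is a placeholder):

from src.transform import SimplifyInput, generate  # import paths assumed from the repository layout

request = SimplifyInput(
    input_text="Ein komplizierter Satz mit vielen Nebensätzen.",
    language_model="my-model",  # must be listed as an active LLM in the backend configuration
)
print(generate(request, endpoint="/simplify"))
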
transform
transform(results_folder)

The main function to run the transformation.

The parameters are provided by configuration files.

PARAMETER DESCRIPTION
results_folder

The path to the folder where the results of this run are stored.

TYPE: Path

Source code in docs/repositories-clones/evaluation/src/transform.py
def transform(results_folder: Path) -> None:
    """The main function to run the transformation.

    The parameters are provided by configuration files.

    Args:
        results_folder (Path): The path to the folder where the results of this run
            are stored.
    """
    input_column = settings.input_column_name
    output_column = settings.output_column_name

    transformations = {}
    for transformation_name, transformation in settings.transformations.items():
        match transformation.type:
            case "manual":
                transformations[transformation_name] = {
                    "function": partial(
                        copy_column,
                        input_column=transformation.column,
                        output_column=output_column,
                    ),
                    "label": transformation.label,
                }
            case "backend":
                transformations[transformation_name] = {
                    "function": partial(
                        transform_column,
                        endpoint="/simplify",
                        language_model=transformation.model_name,
                        input_column=input_column,
                        output_column=output_column,
                    ),
                    "label": transformation.label,
                }
            case _:
                print(f"Invalid configuration. Unknown transformation type '{transformation.type}'")
                sys.exit(1)

    results_folder.mkdir(parents=True, exist_ok=True)

    for name, transformation in transformations.items():
        for data_file in settings.data_files:
            for replication in range(settings.replications):
                execute_transform(
                    name=name,
                    label=transformation["label"],
                    data_filename=data_file,
                    results_folder=results_folder / data_file / name / f"{replication + 1:02}",
                    transform_fn=transformation["function"],
                )
transform_column
transform_column(df, endpoint, language_model, *, input_column, output_column)

Transforms the texts in the input_column of a DataFrame using the backend.

The function applies the text transformation to each entry in the input_column by converting the text into a SimplifyInput object and using the generate function to generate the transformed version of the text. The result is stored in a new column output_column.

PARAMETER DESCRIPTION
df

The DataFrame containing the texts to be transformed in the input_column.

TYPE: DataFrame

endpoint

Backend endpoint used to perform generation.

TYPE: str

language_model

The LLM from the backend used to generate text. Needs to be listed as active LLM in the backend configuration.

TYPE: str

input_column

The name of the column to be transformed.

TYPE: str

output_column

The name of the column where the result is stored.

TYPE: str

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: The updated DataFrame with a new output_column containing the simplified texts.

Source code in docs/repositories-clones/evaluation/src/transform.py
def transform_column(
    df: pd.DataFrame,
    endpoint: str,
    language_model: str,
    *,
    input_column: str,
    output_column: str,
) -> pd.DataFrame:
    """Transforms the texts in the input_column of a DataFrame using the backend.

    The function applies the text transformation to each entry in the input_column
    by converting the text into a `SimplifyInput` object and using the `generate` function to generate
    the transformed version of the text. The result is stored in a new column output_column.

    Args:
        df (pd.DataFrame): The DataFrame containing the texts to be transformed in the input_column.
        endpoint (str): Backend endpoint used to perform generation.
        language_model (str): The LLM from the backend used to generate text. Needs to be listed
            as active LLM in the backend configuration.
        input_column (str): The name of the column to be transformed.
        output_column (str): The name of the column where the result is stored.

    Returns:
        pd.DataFrame: The updated DataFrame with a new output_column containing the simplified texts.
    """
    print("Transforming...")

    def parse_to_simplify_input(input_text: str) -> SimplifyInput:
        return SimplifyInput(input_text=input_text, language_model=language_model)

    df[output_column] = (
        df[input_column]
        .apply(parse_to_simplify_input)
        .progress_apply(lambda x: generate(x, endpoint=endpoint))
    )

    return df
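
Example (a hedged sketch with placeholder column and model names; a running backend is required, and progress_apply needs the tqdm pandas integration):

import pandas as pd
from tqdm import tqdm

from src.transform import transform_column  # import path assumed from the repository layout

tqdm.pandas()  # enables Series.progress_apply, in case the module does not already do so

df = pd.DataFrame({"original_text": ["Ein komplizierter Satz.", "Noch einer."]})

df = transform_column(
    df,
    endpoint="/simplify",
    language_model="my-model",
    input_column="original_text",
    output_column="transformed_text",
)
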

utils

Utility functions.

FUNCTION DESCRIPTION
calculate_weighted_average

Calculates the weighted average for a given weighting.

convert_str_to_bool

Converts a string to a Boolean value.

get_files_with_name

Finds all files with a given filename in a directory and its subdirectories.

save_dataframe

Saves the given DataFrame as a CSV file.

save_dataframe_to_md

Saves a dataframe to a Markdown file.

save_evaluation_to_md

Saves a dataframe with mean, min and max to a Markdown file.

sort_columns_semi_manual

Arranges the data frame with the columns in front first and the rest sorted alphabetically.

calculate_weighted_average
calculate_weighted_average(df, weighting, result_column)

Calculates the weighted average for a given weighting.

The average is stored in a new column in-place.

PARAMETER DESCRIPTION
df

The dataframe containing the values

TYPE: DataFrame

weighting

The weighting as dictionary with the column names as keys and the corresponding weights as values (weights do not have to sum to one).

TYPE: dict[str, float]

result_column

Name of the column to store the result in.

TYPE: str

Source code in docs/repositories-clones/evaluation/src/utils.py
def calculate_weighted_average(
    df: pd.DataFrame, weighting: dict[str, float], result_column: str
) -> None:
    """Calculates the weighted average for a given weighting.

    The average is stored in a new column in-place.

    Args:
        df (pd.DataFrame): The dataframe containing the values
        weighting (dict[str, float]): The weighting as dictionary with the column
            names as keys and the corresponding weights as values (weights do not have
            to sum to one).
        result_column (str): Name of the column to store the result in.
    """
    weighting_keys = list(weighting.keys())
    weighting_values = list(weighting.values())
    df[result_column] = df[weighting_keys].mul(weighting).sum(axis=1) / sum(weighting_values)
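
Example (a minimal sketch with made-up metric columns and weights):

import pandas as pd

from src.utils import calculate_weighted_average  # import path assumed from the repository layout

df = pd.DataFrame({"Readability": [0.8, 0.6], "Faithfulness": [0.9, 0.7]})

# First row: (2 * 0.8 + 1 * 0.9) / (2 + 1) = 0.8333...
calculate_weighted_average(
    df,
    weighting={"Readability": 2.0, "Faithfulness": 1.0},
    result_column="Score",
)
print(df["Score"].tolist())  # [0.8333..., 0.6333...]
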
convert_str_to_bool
convert_str_to_bool(value)

Converts a string to a Boolean value.

If the string is "true" (case-insensitive), True is returned. If the string is "false" (case-insensitive), False is returned. For all other input values, None is returned.

PARAMETER DESCRIPTION
value

The value as a string that should be converted to a Boolean.

TYPE: str

RETURNS DESCRIPTION
bool | None

bool or None: The converted Boolean value (True or False), or None if the string contains a different value.

Source code in docs/repositories-clones/evaluation/src/utils.py
def convert_str_to_bool(value: str) -> bool | None:
    """Converts a string to a Boolean value.

    If the string is "true" (case-insensitive), `True` is returned.
    If the string is "false" (case-insensitive), `False` is returned.
    For all other input values, `None` is returned.

    Args:
        value (str): The value as a string that should be converted to a Boolean.

    Returns:
        bool or None: The converted Boolean value (`True` or `False`), or `None`
            if the string contains a different value.
    """
    if value.lower() == "true":
        return True
    if value.lower() == "false":
        return False
    return None
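
Example:

from src.utils import convert_str_to_bool  # import path assumed from the repository layout

print(convert_str_to_bool("True"))   # True
print(convert_str_to_bool("FALSE"))  # False
print(convert_str_to_bool("yes"))    # None (not a recognized Boolean string)
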
get_files_with_name
get_files_with_name(directory, filename)

Finds all files with a given filename in a directory and its subdirectories.

PARAMETER DESCRIPTION
directory

The root directory/directories of the search.

TYPE: Path or list[Path]

filename

The filename to search for.

TYPE: str

RETURNS DESCRIPTION
list[Path]

list of Path: The files with name filename.

Source code in docs/repositories-clones/evaluation/src/utils.py
def get_files_with_name(directory: Path | list[Path], filename: str) -> list[Path]:
    """Finds all files with a given filename in a directory and its subdirectories.

    Args:
        directory (Path or list[Path]): The root directory/directories of the search.
        filename (str): The filename to search for.

    Returns:
        list of Path: The files with name filename.
    """
    paths = []

    if isinstance(directory, Path):
        directory = [directory]
    for root_dir in directory:
        for subdir, _, files in Path(root_dir).walk():
            if filename in files:
                paths.append(subdir / filename)
    return paths
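
Example (a hedged sketch with placeholder folder and file names; note that Path.walk() requires Python 3.12 or newer):

from pathlib import Path

from src.utils import get_files_with_name  # import path assumed from the repository layout

metadata_paths = get_files_with_name(
    [Path("results/experiment_a"), Path("results/experiment_b")],
    "transformation_metadata.yaml",  # placeholder; the real name comes from settings
)
for path in metadata_paths:
    print(path)
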
save_dataframe
save_dataframe(df, filepath)

Saves the given DataFrame as a CSV file.

PARAMETER DESCRIPTION
df

The DataFrame to be saved.

TYPE: DataFrame

filepath

The path to save the dataframe.

TYPE: Path

Source code in docs/repositories-clones/evaluation/src/utils.py
def save_dataframe(df: pd.DataFrame, filepath: Path) -> None:
    """Saves the given DataFrame as a CSV file.

    Args:
        df (pd.DataFrame): The DataFrame to be saved.
        filepath (Path): The path to save the dataframe.
    """
    df.to_csv(
        filepath,
        index=False,
        encoding="utf-8",
        sep=settings.csv_separator,
    )
save_dataframe_to_md
save_dataframe_to_md(df, file_path, score_name='Score')

Saves a dataframe to a Markdown file.

PARAMETER DESCRIPTION
df

The dataframe to save.

TYPE: DataFrame

file_path

The location to save it.

TYPE: Path

score_name

The name of the score column which is listed first.

TYPE: str DEFAULT: 'Score'

Source code in docs/repositories-clones/evaluation/src/utils.py
def save_dataframe_to_md(df: pd.DataFrame, file_path: Path, score_name: str = "Score") -> None:
    """Saves a dataframe to a Markdown file.

    Args:
        df (pd.DataFrame): The dataframe to save.
        file_path (Path): The location to save it.
        score_name (str): The name of the score column which is listed first.
    """
    df = sort_columns_semi_manual(df, [score_name])
    tab = df.fillna("--").to_markdown(floatfmt=".2f")

    # Save to file
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(tab + "\n")
save_evaluation_to_md
save_evaluation_to_md(grouped_df, file_path, score_name='Score')

Saves a dataframe with mean, min and max to a Markdown file.

PARAMETER DESCRIPTION
grouped_df

The DataFrameGroupBy or DataFrame object to save.

TYPE: DataFrameGroupBy | DataFrame

file_path

The location to save it.

TYPE: Path

score_name

The name of the score column which is listed first.

TYPE: str DEFAULT: 'Score'

Source code in docs/repositories-clones/evaluation/src/utils.py
def save_evaluation_to_md(
    grouped_df: pd.core.groupby.generic.DataFrameGroupBy | pd.DataFrame,
    file_path: Path,
    score_name: str = "Score",
) -> None:
    """Saves a dataframe with mean, min and max to a Markdown file.

    Args:
        grouped_df (pd.core.groupby.generic.DataFrameGroupBy | pd.DataFrame):
            The DataFrameGroupBy or DataFrame object to save.
        file_path (Path): The location to save it.
        score_name (str): The name of the score column which is listed first.
    """
    stats_df = grouped_df.describe(percentiles=[])
    stats_df = stats_df.drop(["count", "std"], axis=1, level=1)

    mean_df = grouped_df.mean()
    for col in mean_df.columns:
        if col not in stats_df.columns.get_level_values(0):
            continue

        mean_df[col] = [
            f"{stats_df[col].loc[row]['mean']:.2f} "
            f"({stats_df[col].loc[row]['min']:.2f} - "
            f"{stats_df[col].loc[row]['max']:.2f})"
            for row in stats_df[col].index
        ]
    save_dataframe_to_md(mean_df, file_path, score_name)
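
Example (a minimal sketch with made-up groups and metrics; pandas needs the tabulate package for to_markdown):

from pathlib import Path

import pandas as pd

from src.utils import save_evaluation_to_md  # import path assumed from the repository layout

df = pd.DataFrame(
    {
        "Model": ["A", "A", "B", "B"],
        "Score": [0.80, 0.84, 0.70, 0.74],
        "Readability": [0.75, 0.77, 0.65, 0.69],
    }
)

# Writes one Markdown row per model with "mean (min - max)" cells, Score column first.
save_evaluation_to_md(df.groupby("Model"), Path("summary_example.md"))
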
sort_columns_semi_manual
sort_columns_semi_manual(df, front=(), **kwargs)

Arranges the data frame with the columns in front first and the rest sorted alphabetically.

The function filters out non-existing columns in front.

PARAMETER DESCRIPTION
df

The data frame to sort.

TYPE: DataFrame

front

The first column names in the desired order.

TYPE: list of strings DEFAULT: ()

kwargs

Additional keyword arguments passed to the built-in sorted function.

TYPE: dict DEFAULT: {}

RETURNS DESCRIPTION
DataFrame

pd.DataFrame: Data frame with the reordered columns.

Source code in docs/repositories-clones/evaluation/src/utils.py
def sort_columns_semi_manual(
    df: pd.DataFrame, front: list[str] = (), **kwargs: dict
) -> pd.DataFrame:
    """Arranges the data frame with the columns in front first and the rest sorted alphabetically.

    The function filters out non-existing columns in `front`.

    Args:
        df (pd.DataFrame): The data frame to sort.
        front (list of strings): The first column names in the desired order.
        kwargs (dict): Additional keyword arguments passed to the built-in sorted function.

    Returns:
        pd.DataFrame: Data frame with the reordered columns.
    """
    # filter non-existing columns
    front = [col for col in front if col in df.columns]
    cols = list(front) + sorted([col for col in df.columns if col not in front], **kwargs)
    return df[cols]
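
Example (a minimal sketch with made-up column names):

import pandas as pd

from src.utils import sort_columns_semi_manual  # import path assumed from the repository layout

df = pd.DataFrame(columns=["Readability", "Score", "Run", "Faithfulness"])

# "Score" and "Run" come first; the remaining columns are sorted alphabetically.
print(sort_columns_semi_manual(df, front=["Score", "Run"]).columns.tolist())
# ['Score', 'Run', 'Faithfulness', 'Readability']

# Keyword arguments are forwarded to sorted(), e.g. to reverse the alphabetical order.
print(sort_columns_semi_manual(df, front=["Score"], reverse=True).columns.tolist())
# ['Score', 'Run', 'Readability', 'Faithfulness']
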