# API Reference

This document provides detailed API documentation for the QPERA project components. It is generated from the source code to ensure accuracy.
## 1. Main Entry Point
### qpera.main
This is the central orchestration module for running all experiments. It parses command-line arguments to select and execute predefined experiment configurations.
Key Function:

The main CLI entry point that parses arguments (`--algo`, `--dataset`, `--privacy`, etc.) to select and run experiment configurations from the `EXPERIMENT_CONFIGS` list.
Experiment Configuration:

The core of this module is the `EXPERIMENT_CONFIGS` list, which defines the matrix of experiments to be run. Each entry is a dictionary specifying the algorithm, module, function, and dataset.
```python
# Located in qpera/main.py
EXPERIMENT_CONFIGS = [
    {"algo": "CBF", "module": CBF, "func": "cbf_experiment_loop", "dataset": "movielens"},
    {"algo": "CF", "module": CF, "func": "cf_experiment_loop", "dataset": "movielens"},
    {"algo": "RL", "module": RL, "func": "rl_experiment_loop", "dataset": "movielens", "rows": 14000},
    # ... and more combinations for amazonsales and postrecommendations
]
```
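For orientation, the dispatch can be pictured as below. This is a minimal sketch, not QPERA's actual code: the `--algo` and `--dataset` flags come from the description above, while the filtering loop, the `getattr` lookup, and the bare `experiment_func()` call are assumptions about how the configuration entries are consumed.

```python
# Hypothetical sketch of the CLI dispatch in qpera/main.py.
import argparse

def main():
    parser = argparse.ArgumentParser(description="Run QPERA experiments")
    parser.add_argument("--algo", choices=["CBF", "CF", "RL"])
    parser.add_argument("--dataset")
    # ... the real parser also handles --privacy and other flags
    args = parser.parse_args()

    for config in EXPERIMENT_CONFIGS:  # the list documented above
        # Skip entries that do not match the requested algorithm/dataset.
        if args.algo and config["algo"] != args.algo:
            continue
        if args.dataset and config["dataset"] != args.dataset:
            continue
        # Resolve the experiment loop on the algorithm module and run it;
        # the real call passes TOP_K, want_col, privacy options, etc.
        experiment_func = getattr(config["module"], config["func"])
        experiment_func()
```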
## 2. Algorithm Implementations
This section details the core recommendation algorithm experiment loops.
### Collaborative Filtering
#### qpera.CF
Implements the collaborative filtering experiment loop using the Cornac library.
Main Function:

```python
def cf_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... and other parameters for privacy/personalization
) -> None
```
- Model: `cornac.models.BPR` (Bayesian Personalized Ranking); a usage sketch follows below.
- Hyperparameters: `k=100` (factors), `max_iter=100`, `learning_rate=0.01`.
- Metrics Computed: Includes precision, recall, F1, MRR, MAE, RMSE, NDCG, coverage, and personalization scores.
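The listed hyperparameters map directly onto Cornac's `BPR` constructor. A minimal training-and-scoring sketch (the UIR triples and the seed are placeholders, not taken from QPERA's code):

```python
import cornac
from cornac.data import Dataset

# Placeholder (user, item, rating) triples; QPERA builds these from its datasets.
uir_triples = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0)]
train_set = Dataset.from_uir(uir_triples)

# Hyperparameters as documented above; the seed is an assumption for reproducibility.
bpr = cornac.models.BPR(k=100, max_iter=100, learning_rate=0.01, seed=42)
bpr.fit(train_set)

# Score all items for the first internal user index, then rank them.
scores = bpr.score(user_idx=0)
top_items = scores.argsort()[::-1][:10]
```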
### Content-Based Filtering
#### qpera.CBF
Implements the content-based filtering experiment loop.
Main Function:

```python
def cbf_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... (same parameters as cf_experiment_loop)
) -> None
```
- Model: A `TfidfRecommender` that uses TF-IDF vectorization on item features (the underlying technique is sketched below).
- Features: Primarily uses the `genres` column for similarity calculation.
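To illustrate the technique (this is not QPERA's `TfidfRecommender` implementation), TF-IDF over a `genres` column can be sketched with scikit-learn; the toy data below is invented:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy item table; QPERA's datasets carry a similar 'genres' column.
items = pd.DataFrame({
    "title": ["Toy Story", "Heat", "Jumanji"],
    "genres": ["Animation Children Comedy", "Action Crime Thriller", "Adventure Children Fantasy"],
})

# Vectorize genre strings and compute pairwise cosine similarity.
tfidf = TfidfVectorizer()
genre_matrix = tfidf.fit_transform(items["genres"])
similarity = cosine_similarity(genre_matrix)

# Most similar item to "Toy Story" (excluding itself).
most_similar = similarity[0].argsort()[::-1][1]
print(items.loc[most_similar, "title"])  # "Jumanji" (shares "Children")
```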
### Reinforcement Learning
The RL implementation is distributed across several modules, orchestrated by `RL.py`.
#### qpera.RL
This module contains the main experiment loop for the Reinforcement Learning approach.
Main Function:

```python
def rl_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... (same parameters as cf_experiment_loop)
) -> None
```
## 3. Data Processing
### Dataset Loading
#### qpera.datasets_loader
Handles loading and basic preprocessing of datasets.
Core Function:

- Supported Datasets: `"movielens"`, `"amazonsales"`, `"postrecommendations"`.
- Functionality: Loads data from CSV, samples if `num_rows` is specified, and selects columns based on `want_col` (a minimal sketch follows below).
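The described behavior amounts to a short pandas routine. The function name `load_dataset` and the CSV path scheme are hypothetical; only the `num_rows` and `want_col` parameters come from the documentation above:

```python
from typing import Optional

import pandas as pd

def load_dataset(dataset: str, num_rows: Optional[int] = None, want_col: Optional[list] = None) -> pd.DataFrame:
    # Hypothetical path scheme; the real loader resolves paths internally.
    df = pd.read_csv(f"data/{dataset}.csv")
    if num_rows is not None:
        # The real loader may sample differently (e.g. randomly); head() is an assumption.
        df = df.head(num_rows)
    if want_col is not None:
        df = df[want_col]  # keep only the requested columns
    return df
```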
### Data Manipulation
#### qpera.data_manipulation
Contains functions to apply privacy and personalization transformations to the data.
Core Functions:

```python
def hide_data_in_dataframe(data, hide_type, columns_to_hide, fraction_to_hide, records_to_hide, seed)
```
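The exact semantics of `hide_type` and `records_to_hide` are not documented here. As a sketch of the general idea (the function name `hide_values` and the NaN-masking strategy are assumptions), hiding a fraction of values in selected columns might look like:

```python
import numpy as np
import pandas as pd

def hide_values(data: pd.DataFrame, columns_to_hide: list, fraction_to_hide: float, seed: int) -> pd.DataFrame:
    """Replace a random fraction of values in the given columns with NaN."""
    rng = np.random.default_rng(seed)  # seeded for reproducible hiding
    hidden = data.copy()
    n_hide = int(len(hidden) * fraction_to_hide)
    for col in columns_to_hide:
        # Pick distinct row positions to mask in this column.
        rows = rng.choice(len(hidden), size=n_hide, replace=False)
        hidden.loc[hidden.index[rows], col] = np.nan
    return hidden
```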
## 4. Evaluation & Tracking
### Metrics
#### qpera.metrics
A collection of custom and third-party evaluation metrics.
Accuracy & Ranking Metrics:

- `precision_at_k`, `recall_at_k`, `f1`, `mrr`, `ndcg_at_k`

Coverage & Diversity Metrics:

- `user_coverage`, `item_coverage`
- `personalization` (based on Jaccard similarity; a sketch follows below)
- `intra_list_similarity`

Error Metrics:

- `mae`, `rmse` (from Microsoft Recommenders)
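As an illustration of the Jaccard-based `personalization` idea (a sketch, not necessarily QPERA's exact formula): the score is high when users receive dissimilar top-k lists, so a common formulation is one minus the mean pairwise Jaccard similarity between lists:

```python
from itertools import combinations

def personalization(recommendations: list) -> float:
    """1 - mean pairwise Jaccard similarity of users' top-k lists.
    Returns 1.0 when all lists are disjoint, 0.0 when all are identical."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    pairs = list(combinations(recommendations, 2))
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Two users share one of three recommended items:
print(personalization([["i1", "i2", "i3"], ["i3", "i4", "i5"]]))  # 0.8
```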
### MLflow Integration
#### qpera.log_mlflow
Handles all logging of experiments to the MLflow Tracking server.
Main Function:

```python
def log_mlflow(
    dataset: str,
    top_k: pd.DataFrame,
    metrics: dict,
    # ... many other parameters for logging context
) -> None
```

- Experiment Naming: Runs are grouped under per-algorithm experiment names such as "MLflow Collaborative Filtering".
- Logged Artifacts: Logs parameters, all computed metrics, model files, and dataset samples to ensure full reproducibility (a minimal logging sketch follows below).
- Representative Sampling: Uses a `_create_representative_sample` function to log a stratified sample of the data for inspection.
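These are standard MLflow tracking calls. A minimal sketch of the logging flow, using the experiment name shown above; the run name, values, and artifact path are placeholders:

```python
import mlflow

mlflow.set_experiment("MLflow Collaborative Filtering")

with mlflow.start_run(run_name="cf-movielens"):  # run name is a placeholder
    mlflow.log_params({"TOP_K": 10, "dataset": "movielens"})
    mlflow.log_metrics({"precision_at_k": 0.12, "ndcg_at_k": 0.21})
    mlflow.log_artifact("dataset_sample.csv")    # e.g. the representative sample
```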
## 5. Reinforcement Learning Components
This section details the modules specific to the RL-based recommendation approach.
### Knowledge Graph Utilities
#### qpera.rl_utils
Provides constants and helper functions for the RL pipeline.
Key Constants:

- `DATASET_DIR`, `TMP_DIR`: Define file paths for datasets and temporary RL artifacts.
- `KG_RELATION`: A dictionary defining the structure and valid connections between entities in the knowledge graph (see the sketch after this list).
- `PATH_PATTERN`: Defines valid multi-hop paths for generating recommendations and explanations.
- Entity and relation name constants (`USERID`, `WATCHED`, etc.).
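The nesting of such a mapping can be pictured as below. Only `USERID` and `WATCHED` appear in this documentation; `MOVIE`, `GENRE`, `BELONGS_TO`, and the string values are invented here to show the head-entity → relation → tail-entity structure:

```python
# Hypothetical illustration of the KG_RELATION nesting; the real entity and
# relation names live in qpera/rl_utils.py.
USERID, MOVIE, GENRE = "userid", "movie", "genre"
WATCHED, BELONGS_TO = "watched", "belongs_to"

KG_RELATION = {
    USERID: {
        WATCHED: MOVIE,        # a user --watched--> movie edge
    },
    MOVIE: {
        WATCHED: USERID,       # reverse direction for path traversal
        BELONGS_TO: GENRE,     # a movie --belongs_to--> genre edge
    },
    GENRE: {
        BELONGS_TO: MOVIE,
    },
}
```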
Key Functions:

- `save_embed`, `load_embed`: Save/load trained embeddings.
- `save_kg`, `load_kg`: Save/load the constructed knowledge graph.
- `save_labels`, `load_labels`: Save/load user-item interaction labels for training/testing.
- `cleanup_dataset_files`: Removes temporary files generated during the RL pipeline.
### RL Environment
#### qpera.rl_kg_env
Defines the reinforcement learning environment built on the knowledge graph.
Core Classes:

- `KGState`: A class to construct the state representation for the agent, combining user embeddings with path history.
- `BatchKGEnvironment`: Manages the agent's interaction with the knowledge graph, including state transitions and rewards, for a batch of users (a generic interaction loop is sketched below).
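The batched episode structure can be shown with a self-contained toy environment. This is only a sketch of the interaction pattern: the `reset`/`batch_step` interface, the fixed horizon, and the dummy states and rewards are assumptions, not `BatchKGEnvironment`'s actual API.

```python
import numpy as np

class ToyBatchEnv:
    """Stand-in with a batched, gym-like interface (hypothetical)."""

    def __init__(self, batch_size: int, max_steps: int = 3):
        self.batch_size, self.max_steps, self.t = batch_size, max_steps, 0

    def reset(self):
        self.t = 0
        return np.zeros((self.batch_size, 4))            # dummy state vectors

    def batch_step(self, actions):
        self.t += 1
        next_state = np.random.rand(self.batch_size, 4)  # dummy transition
        rewards = np.random.rand(self.batch_size)        # dummy per-user rewards
        done = self.t >= self.max_steps                  # fixed-horizon episode
        return next_state, rewards, done

env = ToyBatchEnv(batch_size=2)
state = env.reset()
done = False
while not done:
    actions = np.zeros(env.batch_size, dtype=int)        # placeholder policy
    state, rewards, done = env.batch_step(actions)
```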
### RL Agent & Evaluation
#### qpera.rl_test_agent
Contains the core PPO agent and evaluation logic.
Core Classes & Functions:

- `ActorCritic(nn.Module)`: The policy and value network for the PPO agent (a minimal architectural sketch follows below).
- `batch_beam_search(...)`: Performs beam search to generate recommendation paths in the knowledge graph.
- `run_evaluation(...)`: Evaluates the generated paths against test data and computes metrics.
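A minimal PyTorch sketch of the actor-critic pattern named above (the layer sizes, shared trunk, and action-mask handling are assumptions; the actual network in `qpera.rl_test_agent` may differ):

```python
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared trunk with separate policy (actor) and value (critic) heads."""

    def __init__(self, state_dim: int, act_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden_dim, act_dim)    # action logits
        self.critic = nn.Linear(hidden_dim, 1)         # state-value estimate

    def forward(self, state, action_mask):
        h = self.trunk(state)
        logits = self.actor(h)
        # Invalidate actions that are not legal KG edges from the current node.
        logits = logits.masked_fill(~action_mask, float("-inf"))
        return F.softmax(logits, dim=-1), self.critic(h)
```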