# API Reference

This document provides detailed API documentation for the QPERA project components. It is generated from the source code to ensure accuracy.
## 1. Main Entry Point
### qpera.main
This is the central orchestration module for running all experiments. It parses command-line arguments to select and execute predefined experiment configurations.
Key Function:

The main CLI entry point that parses arguments (`--algo`, `--dataset`, `--privacy`, etc.) to select and run experiment configurations from the `EXPERIMENT_CONFIGS` list.
Experiment Configuration:

The core of this module is the `EXPERIMENT_CONFIGS` list, which defines the matrix of experiments to be run. Each entry is a dictionary specifying the algorithm, module, function, and dataset.
```python
# Located in qpera/main.py
EXPERIMENT_CONFIGS = [
    {"algo": "CBF", "module": CBF, "func": "cbf_experiment_loop", "dataset": "movielens"},
    {"algo": "CF", "module": CF, "func": "cf_experiment_loop", "dataset": "movielens"},
    {"algo": "RL", "module": RL, "func": "rl_experiment_loop", "dataset": "movielens", "rows": 14000},
    # ... and more combinations for amazonsales and postrecommendations
]
```
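For orientation, the dispatch can be pictured as below. This is a minimal sketch, not QPERA's actual code: the `--algo` and `--dataset` flags come from the description above, while the filtering loop, the `getattr` lookup, and the bare `experiment_func()` call are assumptions about how the configuration entries are consumed.

```python
# Hypothetical sketch of the CLI dispatch in qpera/main.py.
import argparse

def main():
    parser = argparse.ArgumentParser(description="Run QPERA experiments")
    parser.add_argument("--algo", choices=["CBF", "CF", "RL"])
    parser.add_argument("--dataset")
    # ... the real parser also handles --privacy and other flags
    args = parser.parse_args()

    for config in EXPERIMENT_CONFIGS:  # the list documented above
        # Skip entries that do not match the requested algorithm/dataset.
        if args.algo and config["algo"] != args.algo:
            continue
        if args.dataset and config["dataset"] != args.dataset:
            continue
        # Resolve the experiment loop on the algorithm module and run it;
        # the real call passes TOP_K, want_col, privacy options, etc.
        experiment_func = getattr(config["module"], config["func"])
        experiment_func()
```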
## 2. Algorithm Implementations
This section details the core recommendation algorithm experiment loops.
### Collaborative Filtering
#### qpera.CF
Implements the collaborative filtering experiment loop using the Cornac library.
Main Function:

```python
def cf_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... and other parameters for privacy/personalization
) -> None
```
- Model: `cornac.models.BPR` (Bayesian Personalized Ranking); a usage sketch follows below.
- Hyperparameters: `k=100` (factors), `max_iter=100`, `learning_rate=0.01`.
- Metrics Computed: Includes precision, recall, F1, MRR, MAE, RMSE, NDCG, coverage, and personalization scores.
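The listed hyperparameters map directly onto Cornac's `BPR` constructor. A minimal training-and-scoring sketch (the UIR triples and the seed are placeholders, not taken from QPERA's code):

```python
import cornac
from cornac.data import Dataset

# Placeholder (user, item, rating) triples; QPERA builds these from its datasets.
uir_triples = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0)]
train_set = Dataset.from_uir(uir_triples)

# Hyperparameters as documented above; the seed is an assumption for reproducibility.
bpr = cornac.models.BPR(k=100, max_iter=100, learning_rate=0.01, seed=42)
bpr.fit(train_set)

# Score all items for the first internal user index, then rank them.
scores = bpr.score(user_idx=0)
top_items = scores.argsort()[::-1][:10]
```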
### Content-Based Filtering
#### qpera.CBF
Implements the content-based filtering experiment loop.
Main Function:

```python
def cbf_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... (same parameters as cf_experiment_loop)
) -> None
```
- Model: A `TfidfRecommender` that uses TF-IDF vectorization on item features (the underlying technique is sketched below).
- Features: Primarily uses the `genres` column for similarity calculation.
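To illustrate the technique (this is not QPERA's `TfidfRecommender` implementation), TF-IDF over a `genres` column can be sketched with scikit-learn; the toy data below is invented:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy item table; QPERA's datasets carry a similar 'genres' column.
items = pd.DataFrame({
    "title": ["Toy Story", "Heat", "Jumanji"],
    "genres": ["Animation Children Comedy", "Action Crime Thriller", "Adventure Children Fantasy"],
})

# Vectorize genre strings and compute pairwise cosine similarity.
tfidf = TfidfVectorizer()
genre_matrix = tfidf.fit_transform(items["genres"])
similarity = cosine_similarity(genre_matrix)

# Most similar item to "Toy Story" (excluding itself).
most_similar = similarity[0].argsort()[::-1][1]
print(items.loc[most_similar, "title"])  # "Jumanji" (shares "Children")
```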
### Reinforcement Learning
The RL implementation is distributed across several modules, orchestrated by `RL.py`.
#### qpera.RL
This module contains the main experiment loop for the Reinforcement Learning approach.
Main Function:

```python
def rl_experiment_loop(
    TOP_K: int,
    dataset: str,
    want_col: list,
    # ... (same parameters as cf_experiment_loop)
) -> None
```
## 3. Data Processing
### Dataset Loading
#### qpera.datasets_loader
Handles loading and basic preprocessing of datasets.
Core Function:

- Supported Datasets: `"movielens"`, `"amazonsales"`, `"postrecommendations"`.
- Functionality: Loads data from CSV, samples if `num_rows` is specified, and selects columns based on `want_col` (a minimal sketch follows below).
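The described behavior amounts to a short pandas routine. The function name `load_dataset` and the CSV path scheme are hypothetical; only the `num_rows` and `want_col` parameters come from the documentation above:

```python
from typing import Optional

import pandas as pd

def load_dataset(dataset: str, num_rows: Optional[int] = None, want_col: Optional[list] = None) -> pd.DataFrame:
    # Hypothetical path scheme; the real loader resolves paths internally.
    df = pd.read_csv(f"data/{dataset}.csv")
    if num_rows is not None:
        # The real loader may sample differently (e.g. randomly); head() is an assumption.
        df = df.head(num_rows)
    if want_col is not None:
        df = df[want_col]  # keep only the requested columns
    return df
```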
### Data Manipulation
#### qpera.data_manipulation
Contains functions to apply privacy and personalization transformations to the data.
Core Functions:

```python
def hide_data_in_dataframe(data, hide_type, columns_to_hide, fraction_to_hide, records_to_hide, seed)
```
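The exact semantics of `hide_type` and `records_to_hide` are not documented here. As a sketch of the general idea (the function name `hide_values` and the NaN-masking strategy are assumptions), hiding a fraction of values in selected columns might look like:

```python
import numpy as np
import pandas as pd

def hide_values(data: pd.DataFrame, columns_to_hide: list, fraction_to_hide: float, seed: int) -> pd.DataFrame:
    """Replace a random fraction of values in the given columns with NaN."""
    rng = np.random.default_rng(seed)  # seeded for reproducible hiding
    hidden = data.copy()
    n_hide = int(len(hidden) * fraction_to_hide)
    for col in columns_to_hide:
        # Pick distinct row positions to mask in this column.
        rows = rng.choice(len(hidden), size=n_hide, replace=False)
        hidden.loc[hidden.index[rows], col] = np.nan
    return hidden
```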
## 4. Evaluation & Tracking
### Metrics
#### qpera.metrics
A collection of custom and third-party evaluation metrics.
Accuracy & Ranking Metrics:

- `precision_at_k`, `recall_at_k`, `f1`, `mrr`, `ndcg_at_k`

Coverage & Diversity Metrics:

- `user_coverage`, `item_coverage`
- `personalization` (based on Jaccard similarity; a sketch follows below)
- `intra_list_similarity`

Error Metrics:

- `mae`, `rmse` (from Microsoft Recommenders)
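As an illustration of the Jaccard-based `personalization` idea (a sketch, not necessarily QPERA's exact formula): the score is high when users receive dissimilar top-k lists, so a common formulation is one minus the mean pairwise Jaccard similarity between lists:

```python
from itertools import combinations

def personalization(recommendations: list) -> float:
    """1 - mean pairwise Jaccard similarity of users' top-k lists.
    Returns 1.0 when all lists are disjoint, 0.0 when all are identical."""
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b)

    pairs = list(combinations(recommendations, 2))
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Two users share one of three recommended items:
print(personalization([["i1", "i2", "i3"], ["i3", "i4", "i5"]]))  # 0.8
```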
### MLflow Integration
#### qpera.log_mlflow
Handles all logging of experiments to the MLflow Tracking server.
Main Function:

```python
def log_mlflow(
    dataset: str,
    top_k: pd.DataFrame,
    metrics: dict,
    # ... many other parameters for logging context
) -> None
```

- Experiment Naming: Runs are grouped under per-algorithm experiment names such as "MLflow Collaborative Filtering".
- Logged Artifacts: Logs parameters, all computed metrics, model files, and dataset samples to ensure full reproducibility (a minimal logging sketch follows below).
- Representative Sampling: Uses a `_create_representative_sample` function to log a stratified sample of the data for inspection.
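These are standard MLflow tracking calls. A minimal sketch of the logging flow, using the experiment name shown above; the run name, values, and artifact path are placeholders:

```python
import mlflow

mlflow.set_experiment("MLflow Collaborative Filtering")

with mlflow.start_run(run_name="cf-movielens"):  # run name is a placeholder
    mlflow.log_params({"TOP_K": 10, "dataset": "movielens"})
    mlflow.log_metrics({"precision_at_k": 0.12, "ndcg_at_k": 0.21})
    mlflow.log_artifact("dataset_sample.csv")    # e.g. the representative sample
```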
## 5. Reinforcement Learning Components
This section details the modules specific to the RL-based recommendation approach.
### Knowledge Graph Utilities
#### qpera.rl_utils
Provides constants and helper functions for the RL pipeline.
Key Constants:

- `DATASET_DIR`, `TMP_DIR`: Define file paths for datasets and temporary RL artifacts.
- `KG_RELATION`: A dictionary defining the structure and valid connections between entities in the knowledge graph (see the sketch after this list).
- `PATH_PATTERN`: Defines valid multi-hop paths for generating recommendations and explanations.
- Entity and relation name constants (`USERID`, `WATCHED`, etc.).
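The nesting of such a mapping can be pictured as below. Only `USERID` and `WATCHED` appear in this documentation; `MOVIE`, `GENRE`, `BELONGS_TO`, and the string values are invented here to show the head-entity → relation → tail-entity structure:

```python
# Hypothetical illustration of the KG_RELATION nesting; the real entity and
# relation names live in qpera/rl_utils.py.
USERID, MOVIE, GENRE = "userid", "movie", "genre"
WATCHED, BELONGS_TO = "watched", "belongs_to"

KG_RELATION = {
    USERID: {
        WATCHED: MOVIE,        # a user --watched--> movie edge
    },
    MOVIE: {
        WATCHED: USERID,       # reverse direction for path traversal
        BELONGS_TO: GENRE,     # a movie --belongs_to--> genre edge
    },
    GENRE: {
        BELONGS_TO: MOVIE,
    },
}
```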
Key Functions:

- `save_embed`, `load_embed`: Save/load trained embeddings.
- `save_kg`, `load_kg`: Save/load the constructed knowledge graph.
- `save_labels`, `load_labels`: Save/load user-item interaction labels for training/testing.
- `cleanup_dataset_files`: Removes temporary files generated during the RL pipeline.
### RL Environment
#### qpera.rl_kg_env
Defines the reinforcement learning environment built on the knowledge graph.
Core Classes:

- `KGState`: A class to construct the state representation for the agent, combining user embeddings with path history.
- `BatchKGEnvironment`: Manages the agent's interaction with the knowledge graph, including state transitions and rewards, for a batch of users (a generic interaction loop is sketched below).
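The batched episode structure can be shown with a self-contained toy environment. This is only a sketch of the interaction pattern: the `reset`/`batch_step` interface, the fixed horizon, and the dummy states and rewards are assumptions, not `BatchKGEnvironment`'s actual API.

```python
import numpy as np

class ToyBatchEnv:
    """Stand-in with a batched, gym-like interface (hypothetical)."""

    def __init__(self, batch_size: int, max_steps: int = 3):
        self.batch_size, self.max_steps, self.t = batch_size, max_steps, 0

    def reset(self):
        self.t = 0
        return np.zeros((self.batch_size, 4))            # dummy state vectors

    def batch_step(self, actions):
        self.t += 1
        next_state = np.random.rand(self.batch_size, 4)  # dummy transition
        rewards = np.random.rand(self.batch_size)        # dummy per-user rewards
        done = self.t >= self.max_steps                  # fixed-horizon episode
        return next_state, rewards, done

env = ToyBatchEnv(batch_size=2)
state = env.reset()
done = False
while not done:
    actions = np.zeros(env.batch_size, dtype=int)        # placeholder policy
    state, rewards, done = env.batch_step(actions)
```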
### RL Agent & Evaluation
#### qpera.rl_test_agent
Contains the core PPO agent and evaluation logic.
Core Classes & Functions:

- `ActorCritic(nn.Module)`: The policy and value network for the PPO agent (a minimal architectural sketch follows below).
- `batch_beam_search(...)`: Performs beam search to generate recommendation paths in the knowledge graph.
- `run_evaluation(...)`: Evaluates the generated paths against test data and computes metrics.
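A minimal PyTorch sketch of the actor-critic pattern named above (the layer sizes, shared trunk, and action-mask handling are assumptions; the actual network in `qpera.rl_test_agent` may differ):

```python
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    """Shared trunk with separate policy (actor) and value (critic) heads."""

    def __init__(self, state_dim: int, act_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden_dim, act_dim)    # action logits
        self.critic = nn.Linear(hidden_dim, 1)         # state-value estimate

    def forward(self, state, action_mask):
        h = self.trunk(state)
        logits = self.actor(h)
        # Invalidate actions that are not legal KG edges from the current node.
        logits = logits.masked_fill(~action_mask, float("-inf"))
        return F.softmax(logits, dim=-1), self.critic(h)
```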