[Poster] WebLLM - Adapting Large Language Models for Anti-Tracking
Presented at PETS, 2025
Online tracking has become nearly ubiquitous on the web, with 95% of desktop and 94% of mobile websites containing at least one tracker. Users rely on anti-tracking tools to protect themselves; however, trackers continually devise new techniques to evade detection, creating a need for new detection tools. Previous attempts to solve this problem rely heavily on ground truth and extensive feature engineering, and require a new pipeline for each form of tracking.
In this poster, we explore the feasibility of a single classification framework for the web tracking problem. We use LLMs because they show great promise for domain-specific adaptation. First, we explore prompt engineering approaches across a variety of models and find them insufficient to beat state-of-the-art methods. Then, we develop a novel prompt optimization pipeline that uses LLM-based feedback, improving on manual prompt engineering. Finally, we fine-tune Google's Gemma 3 models with LoRA and achieve better accuracy than state-of-the-art models. We find that LLMs require very little ground truth and no feature engineering. We test this pipeline on different tracking tasks (URL classification, query parameter classification, and cookie classification) and find that it works without modification.
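The LoRA fine-tuning step can be illustrated with a minimal numeric sketch. The dimensions, rank, and scaling factor below are illustrative assumptions, not the poster's actual configuration: the key idea is that the pretrained weight stays frozen while only two small low-rank matrices are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a real Gemma 3 layer is far larger.
d_out, d_in, r, alpha = 8, 8, 2, 16  # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, init to 0

def lora_forward(x):
    # Adapted layer: W x + (alpha / r) * B A x.
    # Only A and B receive gradients; W is never updated.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted layer initially matches the frozen one,
# so fine-tuning begins from the pretrained model's behavior.
assert np.allclose(lora_forward(x), W @ x)
```

This initialization choice (B = 0) is what makes LoRA a safe, low-cost adaptation: training perturbs the frozen model only through the small A and B matrices, which is why so little labeled ground truth is needed.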