An Introduction to Trackers and the Data They Collect

2024-09-05 15:00 / data privacy, trackers

Trackers are everywhere and come in many different forms. Some tracking methods are more invasive than others; this post aims to explain what "trackers" are and how they work, and to give examples of the data they collect.

A word on Fingerprinting

Fingerprinting, tracking, and trackers go hand-in-hand. Generally, fingerprinting focuses on reliably identifying unique users - this identity is often then used by tracking mechanisms.

Fingerprinting uses the information found on or sent by your device to identify you and subsequently track you. Unique properties such as browser settings, device information, and connected networks (including nearby cell towers) are the building blocks of a unique user profile.

Fingerprinting takes into account your entire device and factors directly surrounding it - your browser, connected networks, the operating system (exact build and version), browser settings, browser add-ons/extensions, hardware of the device, and more. Fingerprinting is as invasive as it is accurate; for example, the EFF has estimated that only 1 in 286,777 browsers will share the same fingerprint. Naturally, some fingerprinting techniques are more invasive than others.
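To make the idea concrete, here is a minimal sketch of how a set of attributes could be combined into a single identifier. The attribute names and values are made up for illustration, and real fingerprinting scripts gather far more signals and use their own methods; this only shows why a small change in any one attribute yields a different fingerprint.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Derive a stable identifier from a set of device/browser attributes."""
    # Serialize with sorted keys so the same attributes always hash the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute set; none of these values come from a real device.
device_a = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "timezone": "America/Chicago",
    "screen": "1920x1080",
    "language": "en-US",
    "extensions": ["adblock", "dark-theme"],
}
# Same device, different screen resolution.
device_b = dict(device_a, screen="2560x1440")

print(fingerprint(device_a) == fingerprint(dict(device_a)))  # True
print(fingerprint(device_a) == fingerprint(device_b))        # False
```

The more attributes that go into the hash, the less likely two different devices are to collide, which is what makes fingerprints both accurate and hard to shake.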

Fingerprinting has evolved so much that it is next to impossible to simply "block all fingerprinting" techniques. In fact, it has been demonstrated that on some devices - such as smartphones - some fingerprinting attacks cannot be blocked.

What are trackers?

Trackers come in many forms. Generally, regardless of what form of tracking is used, trackers are designed to either 1) track you across different websites, web services, and web apps or 2) collect identifying information about you and your device(s). Frequently, they do both.

Many proponents of tracking say that it benefits users because they get personalized recommendations. While this is true, it comes at the price of the user's privacy; often, users don't understand just what this privacy cost entails, even if they are aware of it.

As we collectively learn more about tracking (and fingerprinting) technologies and methods, what data they can capture, and just who that data is shared with, we should strongly consider: Is the price worth it?

Tracking pixels

Tracking pixels (also known as web beacons or pixel tags) are 1x1-pixel images typically embedded in web pages, ads, and emails. They load like any other image, but they're so small that they're effectively invisible to the human eye.

Once loaded, tracking pixels typically capture data about user activity and engagement. For example, tracking pixels are nearly ubiquitous in emails - particularly marketing emails - where they track if and when a recipient opens a message; they can report the exact date and time an email was opened, the location of the device used to open it, and the email client it was opened in.

Source: Ionos
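As a rough illustration of why simply loading an image leaks this information, the sketch below shows what a server could record from a single pixel request. The URL, the `rid` and `campaign` query parameters, and the `record_pixel_hit` function are all hypothetical; actual email platforms use their own formats.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def record_pixel_hit(url: str, headers: dict, client_ip: str) -> dict:
    """Build the log entry a server could keep for one pixel request."""
    query = parse_qs(urlparse(url).query)
    return {
        # Identifier baked into the image URL when the email was sent (hypothetical name).
        "recipient_id": query.get("rid", [None])[0],
        "campaign": query.get("campaign", [None])[0],
        # The moment the image was fetched, i.e. when the email was opened.
        "opened_at": datetime.now(timezone.utc).isoformat(),
        # The IP address can be mapped to an approximate location.
        "client_ip": client_ip,
        # The User-Agent header usually reveals the email client or browser.
        "user_agent": headers.get("User-Agent"),
    }


hit = record_pixel_hit(
    "https://mail.example.com/open.gif?rid=abc123&campaign=fall-sale",
    {"User-Agent": "ExampleMail/5.2 (iPhone)"},
    "203.0.113.7",
)
print(hit)
```

Nothing here requires JavaScript or cookies; the request for the image is itself the signal.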

Tracking pixels are also commonly found in advertisements, where they are often called "ad trackers" - though this term can also cover other tracking methods, such as tracking cookies. In displayed advertisements, they can track views, clicks, referral sources, and conversions.

Some websites use tracking pixels as well, typically in combination with analytics software and tracking cookies, though of course this varies immensely between websites. Some even use multiple tracking pixels, many of which are actually JavaScript code provided by third parties (like the Meta Pixel).

Tracking pixels are versatile; depending on where they are embedded and who is using them, they can collect many pieces of data, such as (but not limited to):

Browsing history
Search history
Location data
Device information (type, operating system, etc)
Conversions
Clicked ads within the ad network
Data submitted to websites
Email client information
Email open data (whether and when an email was opened)
Information from cookies stored on your device (are you signed into another service?)
Social media profile information

Meta Pixel

A rather well-known and infamous tracking pixel is the Meta Pixel (formerly: Facebook Pixel). It is embedded on millions of websites and has been shown to be ruthlessly efficient in collecting massive amounts of data - including sensitive data such as tax record information, veteran statuses, addresses, and even health information.

The Meta Pixel was the subject of privacy litigation in 2022 and 2023 (and likely will be going forward), mostly because it facilitated the sharing of sensitive data and information - especially data that was not to be shared with third parties "by law" (think: violating HIPAA) without express consent from users.

Tracking Cookies

Cookies are bits of data that websites store in your web browser. You've probably heard of first-party and third-party cookies; first-party cookies are placed by the website itself, whereas third-party cookies are placed by a different entity whose content loads on that same website, such as ads served by an ad network.

Some cookies are essential for certain websites to function. Many websites also use cookies to remember details that would be annoying for users to re-enter on every visit. For example, a cookie holding your login session can reauthenticate you to a website on a subsequent visit - you don't have to log in again. Cookies may also retain saved preferences (dark mode or light mode?) and other details between visits.

Source: CookieYes

Of course, some cookies in your browser may also be tracking cookies. Note that tracking cookies don't do the active tracking themselves; rather, they store information (often personal information) that websites interested in tracking your activity can retrieve and read later. Tracking cookies can be first-party or third-party, but are commonly third-party.

Tracking cookies can store the following for later collection by the websites or hosts that read them:

Browsing history (visited sites)
Length of stay on a given site
Interaction data with a site (ex: how many pages you click through)
Advertisements clicked
Purchased products
Information put into web forms (even without pressing the submit button)
Location data
Personal information given to a website
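To show how a third-party cookie can tie visits on unrelated sites together, here is a simplified simulation. The `ThirdPartyTracker` class, the `tid` cookie name, and the site names are all invented for this sketch; a real ad network is far more elaborate, but the core mechanism of "set an ID once, read it back everywhere the tracker is embedded" is the same.

```python
import uuid
from http.cookies import SimpleCookie


class ThirdPartyTracker:
    """Toy tracker embedded on many sites (hypothetical, for illustration)."""

    def __init__(self):
        self.visits = {}  # tracking ID -> list of sites where it was seen

    def handle_request(self, cookie_header: str, embedding_site: str):
        cookies = SimpleCookie(cookie_header)
        if "tid" in cookies:
            # Returning browser: reuse the ID it sent back.
            tid = cookies["tid"].value
        else:
            # New browser: mint a fresh random ID.
            tid = uuid.uuid4().hex
        self.visits.setdefault(tid, []).append(embedding_site)

        # Ask the browser to keep the ID for a year.
        response = SimpleCookie()
        response["tid"] = tid
        response["tid"]["max-age"] = 60 * 60 * 24 * 365
        response["tid"]["path"] = "/"
        return tid, response.output(header="Set-Cookie:")


tracker = ThirdPartyTracker()
# First visit: the browser has no cookie for the tracker's domain yet.
tid, set_cookie = tracker.handle_request("", "news.example")
# Later visit to a different site that embeds the same tracker.
tracker.handle_request(f"tid={tid}", "shop.example")

print(set_cookie)
print(tracker.visits[tid])  # ['news.example', 'shop.example']
```

The cookie itself only holds an ID; the browsing history accumulates on the tracker's side, which matches the point that cookies store information rather than doing the tracking themselves.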

Once upon a time, with Safari and Firefox blocking third-party cookies by default and laws such as the EU's GDPR coming into effect, third-party cookies were seemingly on their way out. Big Tech companies - such as Google and Microsoft - sought to kill the cookie and rolled out plans to do so. Specifically, the initiative to phase out tracking cookies was spearheaded by Google in anticipation of rolling out its Topics API/Privacy Sandbox (which was initially "FLoC").

However, in July 2024, Google announced it was no longer planning to disable tracking cookies by default. Compared to the years-long strategy of rolling out the Topics API while simultaneously eliminating cookies, this decision was considered rather sudden and came without much explanation from Google; one theory is that advertisers didn't or wouldn't readily switch to Google's Topics API, which would be a blow to Google's revenue.

Tracking URLs

Tracking URLs, also commonly referred to as Tracking Links, embed unique identifiers in URLs. They're immensely popular in emails but are also commonly used when linking to external websites from a web page.

Generally, tracking URLs include added UTM parameters designed for tracking incoming traffic (and its source) to a website or other web property. However, they don't necessarily have to use UTM parameters to "track" clicks; for example, many URL shorteners double as tracking URLs, since every shortened URL is unique.

Depending on the defined parameters, a tracking URL can capture:

Clicks
Page views
Impressions
Bounce rates
Average time on site

Generally speaking, tracking URLs sit on the less invasive end of the spectrum. They're also "easy" to fix; UTM parameters live in the query string at the end of the URL, and removing them before connecting to the web server effectively removes the "tracking." Tracking URLs are often used alongside other tracking methods.
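Removing UTM parameters can be sketched in a few lines. The example below assumes that anything whose name starts with `utm_` is a tracking parameter and leaves everything else alone; the URL is made up, and other tracking parameters (or unique shortened links) would need different handling.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_utm(url: str) -> str:
    """Return the URL with any utm_* query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_utm("https://example.com/post?id=42&utm_source=newsletter&utm_medium=email"))
# https://example.com/post?id=42
```

Parameters the page actually needs (here, `id`) survive, so the link still works after cleaning.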

Analytics software

For simplicity's sake, this section lumps app analytics (like Google Firebase) and other related analytics under this category.

First, I want to emphasize that not all analytics software is super invasive or otherwise tracking users across the web. As an example, this website uses open-source analytics software, doesn't capture data considered personally identifiable information, doesn't share or send analytics data to third parties, and only captures click-throughs on this site - feel free to visit the privacy policy for more details.

With that said, some of the most popular analytics software on websites - like Google Analytics - is generally invasive by default. Analytics software may capture data such as (but not limited to):

Actions taken by users on a website
Transactions/conversions
Referrer information (which may include sensitive tokens, especially if the browser by default forwards the entire URL string to the server)
Time Zone information
Location information
Device information
Crash data information (typically for apps)
Information entered into forms on the website
Information in third-party cookies
User demographics

In the case of Google Analytics, while Google "discourages" website owners from processing "personally identifiable information" using Google Analytics, it doesn't mean that it does not happen; what Google defines as PII may not match privacy laws/legal definitions of PII. Additionally, Google itself may use and aggregate data collected via Google Analytics on websites - often in combination with its own tracking methods and insights - which could be used to target users with ads.

Of course, while Google Analytics is estimated to be in use on more than half of the world's most popular websites, it is not the only invasive analytics software/platform out there. Other analytics software may collect data and track users similarly; and across the various analytics products, different configurations (and in some cases, misconfigurations) can lean towards being more aggressive, further undermining user privacy in similar ways to Google Analytics.

Apps also regularly use and include analytics software, implemented similarly to analytics software found on websites. App analytics may also capture crash data information (which can contain identifying device information), performance information (which can contain identifying device information), identifiers, and app usage information. When it comes to displaying targeted ads in an app, analytics software, SDKs, and tracking pixels are often used in combination.

Depending on the app and its analytics stack, collected data may be further shared with third parties - including for targeted advertising and marketing - and may even wind up as part of state-level digital surveillance.

What data do trackers collect?

Since trackers come in different forms, the data they collect varies; the lists given for each tracking method earlier in this post are examples, not exhaustive. Many websites also use a combination of different tracking techniques - often to collect as much data on users and visitors as possible.

Rather than giving an exhaustive list of what data trackers can and do collect, consider a few examples from the wild that demonstrate the data various tracking mechanisms collect - including blunders such as mishandling data or straight-up lying about what certain tracking technologies can and do collect:

In July 2024, the US Department of Education was sued for sharing students' Free Application for Federal Student Aid (FAFSA) information with Facebook. Specifically, the Meta pixel transmitted student data from the FAFSA website in a hashed, but easily reversible format.
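To see why a hashed value can still be "easily reversible," consider the sketch below. It assumes, purely for illustration, that the identifier is an email address hashed with unsalted SHA-256 after lowercasing and trimming; the exact scheme used in this case is not described here. Anyone who already holds a list of candidate email addresses can hash each one and look for a match.

```python
import hashlib
from typing import Iterable, Optional


def hash_email(email: str) -> str:
    """Unsalted SHA-256 of a normalized email (assumed scheme, for illustration)."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def reverse_hash(target_hash: str, known_emails: Iterable[str]) -> Optional[str]:
    """Recover the email behind a hash by hashing every known candidate."""
    lookup = {hash_email(email): email for email in known_emails}
    return lookup.get(target_hash)


# A hashed identifier as it might be transmitted by a tracker.
transmitted = hash_email("  Student@Example.edu ")

# A platform with a large user base already knows many email addresses.
candidates = ["someone@example.com", "student@example.edu", "other@example.org"]
print(reverse_hash(transmitted, candidates))  # student@example.edu
```

Hashing hides the value from casual inspection, but it does not anonymize it from a party that can enumerate likely inputs.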

In April 2024, Cerebral, a telehealth company, was fined $7 million after it "misled consumers into believing their health information was protected, while embedded trackers sent details about treatment and more to third parties." These third parties included Meta (Facebook, Instagram), TikTok, and Google.

In November 2022, an investigation by The Markup uncovered that tax preparation companies - H&R Block, TaxAct, and TaxSlayer - were sending financial information, names, and email addresses to Meta (via the Meta Pixel). TaxAct also used Google Analytics on its website, which allegedly sent financial information such as income, refund amounts, and filing status to Google.

Enter: Server-side tracking

Most techniques described in this post thus far have been client-side tracking. However, server-side tracking also exists, typically functioning alongside (or in some cases, in place of) client-side tracking mechanisms.

Server-side tracking is harder for the end user to block because the processing of interaction/collected data happens, well, on the server. For example, server-side tracking generally does not store cookies on a user's device; therefore, there is "nothing" for the client (user) to block. Rather than relying on the client's device and browser to process and send data, server-side tracking typically uses application programming interfaces (APIs) to communicate with other common tracking/analytics platforms like Meta and Google Analytics.

Most adblocking technologies are countermeasures against the client-side tracking methods described throughout this post, so they do little against data collection that happens server-side. That said, server-side tracking is generally more complex and costly to set up and maintain, so client-side tracking is still highly prevalent.
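The following sketch shows the general shape of server-side forwarding: the website's own server assembles an event from data it already has about the request and prepares an API call to an analytics platform. The endpoint URL, the payload fields, and the `build_forward_request` function are hypothetical and do not correspond to any real platform's API; the request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical collection endpoint; real platforms define their own APIs.
COLLECT_URL = "https://analytics.example.com/collect"


def build_forward_request(event_name: str, user_id: str,
                          client_ip: str, user_agent: str) -> urllib.request.Request:
    """Prepare the server-to-server call that forwards one user event."""
    payload = {
        "event": event_name,
        "user_id": user_id,
        # Details the server saw on the visitor's original request.
        "client_ip": client_ip,
        "user_agent": user_agent,
    }
    return urllib.request.Request(
        COLLECT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_forward_request("purchase", "user-481", "203.0.113.7", "ExampleBrowser/1.0")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Because this call originates from the server, the visitor's browser never sees a request to the analytics domain, which is why browser extensions and DNS filters have nothing to intercept.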

Blocking trackers

Blocking trackers (which often come bundled with ads) comes with privacy and security benefits. Some display ads may actually be malicious, potentially delivering malware or malicious scripts, but that is beyond the scope of this post.

Trackers can be blocked at the browser, device, or network level. It's common to combine these approaches - which ones mostly depends on the user. Two common methods for blocking trackers are setting the device or network to use a DNS provider that offers domain filtering and using an adblocking extension/add-on in the browser.
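Domain filtering boils down to checking each hostname against a blocklist before resolving or requesting it. Below is a minimal sketch of that check; the blocklist entries are placeholders, and actual filtering DNS providers and adblockers rely on much larger, curated lists and richer rule formats.

```python
# Placeholder blocklist; real lists contain many thousands of domains.
BLOCKLIST = {"tracker.example", "ads.example.net"}


def is_blocked(hostname: str, blocklist=BLOCKLIST) -> bool:
    """Return True if the hostname or any of its parent domains is blocklisted."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" so subdomains of a blocked domain match.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


print(is_blocked("pixel.tracker.example"))  # True
print(is_blocked("example.org"))            # False
```

Matching on parent domains matters because trackers frequently serve content from many subdomains of the same registered domain.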

Even so, it remains important to use blocking tools that are actually effective and don't in turn engage in their own data collection.

Final thoughts

Tracking methods vary, with some being more "naturally" invasive than others. Tracking pixels are highly versatile and exist across apps, websites, and email alike. Tracking URLs also exist across apps, websites, and links embedded in emails. Analytics, depending on the software and configuration in question, can also be used to track and profile users across the internet.

With that said, stay safe out there!
