Metadata and Your Privacy

2023-10-13 13:00 / data privacy, metadata

The importance of metadata to user privacy is simply underemphasized.

Metadata can tell the whole story without anyone ever reading the message contents; with files, metadata can reveal potentially sensitive information beyond whatever is contained inside the file.

People break up over metadata, are arrested over metadata, and are killed over metadata. In many cases, the leaking or capture of metadata is just as privacy-invasive as gaining access to and reading message contents, despite downplaying by the many private and public entities that rely on metadata collection.

What is metadata?

In its most basic form, metadata is the data associated with a message but not included in its contents. Common metadata frequently includes the time of a message sent and who it was sent to.

These may seem like two insignificant data points at first glance, but a lot of the “story” can be told just using these two data points.

In many cases, we don’t need to read the explicit contents of the message itself to get the “story.” Sometimes, who sent/received a message and when the message was sent/received answers important-enough questions that make the content of the message irrelevant, such as:

Who are they messaging?
What time was the message sent? Received?
How often are two people messaging?

For rather obvious reasons, who a message was sent to can be significant enough all on its own, especially if the users have been identified with rather unique identifiers, such as a phone number attached to a SIM card.

Knowing when a message was sent to a user can establish a pattern, especially if the central servers relaying messages are logging and storing these particular data points – as many do.

Over time, just with tracking these two metadata data points alone, we can establish patterns - for example, User X may message User Y every Sunday at 5:00pm for approximately an hour.
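As a rough illustration of how little is needed, the sketch below groups a metadata log by sender, recipient, weekday, and hour, then reports the combinations that recur. The log entries, the names `LOG` and `recurring_slots`, and the threshold are all made up for this example; real logs would be messier and far larger.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical metadata log: (sender, recipient, ISO timestamp). No message contents.
LOG = [
    ("user_x", "user_y", "2023-09-03T17:02:00"),
    ("user_x", "user_y", "2023-09-10T17:05:00"),
    ("user_x", "user_y", "2023-09-17T16:58:00"),
    ("user_x", "user_z", "2023-09-12T09:30:00"),
    ("user_x", "user_y", "2023-09-24T17:01:00"),
]

def recurring_slots(log, min_count=3):
    """Count messages per (sender, recipient, weekday, hour) and keep frequent slots."""
    counts = Counter()
    for sender, recipient, stamp in log:
        when = datetime.fromisoformat(stamp)
        counts[(sender, recipient, WEEKDAYS[when.weekday()], when.hour)] += 1
    return {slot: n for slot, n in counts.items() if n >= min_count}

for (sender, recipient, weekday, hour), n in recurring_slots(LOG).items():
    print(f"{sender} -> {recipient}: {n} messages on {weekday}s around {hour}:00")
```

Nothing here touches message contents; two columns and a timestamp are enough to surface a weekly habit.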

Even on end-to-end encrypted platforms, sometimes you’ll find that while the message itself is encrypted, the metadata is made available to the servers handling the message. In this case, at minimum, metadata is transmitted to the servers of whatever communication service you’re using - be it a messenger or an email service provider.
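A minimal sketch of that split, assuming a simplified message "envelope": the payload is opaque ciphertext, while the routing fields stay readable to the relaying server. The `Envelope` class, its field names, and the sample values are hypothetical and do not reflect any particular messenger's wire format.

```python
from dataclasses import dataclass

@dataclass
class Envelope:
    sender: str        # readable by the server
    recipient: str     # readable by the server
    sent_at: str       # readable by the server
    ciphertext: bytes  # end-to-end encrypted payload, opaque to the server

def server_view(envelope):
    """Return what a relaying server could log without holding any decryption key."""
    return {
        "sender": envelope.sender,
        "recipient": envelope.recipient,
        "sent_at": envelope.sent_at,
        "size_bytes": len(envelope.ciphertext),
    }

# Placeholder identifiers and bytes, for illustration only.
msg = Envelope("+15550100", "+15550199", "2023-10-13T13:00:00Z", b"\x8f\x02\x1c\xaa")
print(server_view(msg))
```

Even with the payload unreadable, the server still learns who talked to whom, when, and roughly how much was said.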

If the provider respects user privacy, then likely this data wouldn’t be logged (for an excessive amount of time), analyzed, aggregated, shared, or otherwise used outside of proper message routing.

Of course, some messaging platforms use metadata for more than just message routing. They may use user metadata for targeted advertising, user tracking across multiple platforms, data to feed to machine-learning algorithms like automated spam detection (or "AI"), or as a “commodity” for selling (data brokers, etc).

Many free and popular, yet privacy-unfriendly, email providers also use metadata, which in email usually takes the form of email headers. "Free" email providers often scan and use user metadata to train the machine-learning algorithms behind spam, phishing, and malicious message filtering.

If the email provider also offers other products - perhaps a calendar - the data captured from the platform’s email scanning may be used to create calendar events.

For example, if you’ve received a flight itinerary to your inbox, the platform may infer from the message metadata you are planning a trip and place it on the calendar.

While this may seem a rather innocuous and convenient action, it at least means the provider/server has access to the metadata in the email headers and is using this metadata outside of message routing purposes. We can also assume they have access to your calendar and it wouldn't be unreasonable to assume they have access to the messages in your inbox.

Metadata and communications

The specific metadata attached to communications can be especially broad, since technically anything outside the message contents can be labeled as metadata. It would therefore be exceedingly difficult to give a definitive, all-inclusive list here, but common metadata frequently includes:

Who sent/received a message (including unique identifiers)
Time message was sent
Server handling the message (specifically in email)
Location of device when message was sent
Client used to send a message (specifically in email)

Metadata and files

Photos

Most of this post focuses on metadata associated with messages. However, photos and videos also have associated metadata.

For example, by default on iOS, pictures taken with the device’s camera automatically have metadata associated with them; the most notable piece of this photo metadata is GPS location.

File and photo metadata can include:

Location details of a photo
Timestamp/date details of a photo
Timestamp and author of a file’s last edit
File type and size

Exchangeable image file format (EXIF) is the specific metadata standard for digital images, and can also include much more detailed metadata such as:

Date/Timestamps
Device ID numbers
Camera settings
Image metrics (pixel dimension, resolution, file size)

In most cases, devices - such as smartphones - include exact timestamps and GPS coordinates in images taken with the device’s camera by default. It’s often difficult to remove the majority of this EXIF metadata without specialized software.
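To show how directly those coordinates can be read, here is a small sketch. EXIF stores latitude and longitude as degrees, minutes, and seconds (each a rational number) plus a hemisphere reference letter; the function below converts that form into decimal degrees that can be pasted into a map. The function name and the sample values are made up for illustration, and reading the tags out of an actual image file would need an EXIF-reading tool or library, which is not shown here.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert EXIF-style ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den))
    plus a hemisphere letter (N, S, E, W) into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value  # southern and western hemispheres are negative
    return float(value)

# Made-up sample values in the shape a GPS-tagged photo would carry.
latitude = dms_to_decimal(((40, 1), (26, 1), (46, 1)), "N")
longitude = dms_to_decimal(((79, 1), (58, 1), (56, 1)), "W")
print(f"{latitude:.6f}, {longitude:.6f}")
```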

On iOS, users can disable the geolocation tagging using GPS via the Photos app by tapping on an image, tapping on the information tab, tapping “adjust” for the location, and then selecting “No Location.”

Most users are unaware their images contain this potentially sensitive data. As a result of this lack of awareness, users frequently share photos to the internet containing this data. This data can be harvested by anyone - including the platform on which it was shared - with the means and know-how to do so.

An obvious answer some may have to this problem is to tell users to “only share photos with trusted contacts.” While this is generally good advice (especially for more personal photos or videos), it is but one solution to a multi-faceted problem. Photos and associated metadata may be unintentionally shared with other parties…

Do you have iCloud enabled? Photos from your iDevice may be automatically synced with Apple’s iCloud servers and while stored encrypted, Apple has the keys for decryption. (This can be mitigated by enabling Apple’s Advanced Data Protection).
Syncing/storing photos on a non-privacy friendly or unencrypted storage provider like Google Photos? You’re sharing your metadata, like location data and timestamp of photos, with Google.
Uploaded a photo to Instagram? Instagram may ingest your photo EXIF metadata.
Used Imgur to share an interesting bird from your hike on Reddit? In addition to sharing metadata with Imgur or Reddit, anyone who views your photo may be able to download it and extract its EXIF data.
Used a photo in an AI generator app? The platform may have ingested associated metadata alongside using your photo beyond spawning that AI-generated image.

Files

Files (that are not photos) also have associated metadata. Common file types that frequently retain and attach metadata include Word files (.docx) and PDF files (.pdf).

These files can contain a lot of metadata, such as:

Author names
Which “author” last saved the document
Comments (which can contain additional information)
Tags
Title/subject
Details about the machine where the file was created (such as hostnames)

Fortunately, it is possible to remove file metadata, just like it is for EXIF data in photos prior to uploading to a cloud service. For example, in Microsoft Word and Adobe Acrobat, users can use an in-built function to erase document properties and PII.

Cloud Storage

Many users use cloud storage services - like iCloud, Google Photos, Dropbox, and others - to store files and photos and sync/access them across different devices. These services generally have access to your file and photo metadata.

If the cloud storage provider does not run their servers in-house, then third-parties such as the server infrastructure provider (hosting) may have access to PII gathered from file properties and metadata.

Even if the cloud storage provider runs their own infrastructure, if their servers know the decryption keys for files stored, then they could effectively access this data for themselves or share access with other third-parties.

Additionally, since their servers know the decryption keys, these services may also have direct access to your files themselves (not just the metadata). This access can also be given to third parties, such as government entities or server infrastructure providers (if applicable).

How is metadata used?

Metadata has valid uses, even though there are ways to transmit communications without metadata or with minimal exposure of it.

Usually, metadata use (and collection) is dependent upon the protocol and platform itself; some protocols use metadata and by extension some platforms generate/use/collect metadata more than others.

There are a wide variety of uses - some rather invasive, some legitimate - for metadata. Of course, what is ultimately acceptable to a user is dependent on that same user’s threat model.

1. Message routing

Very generally speaking, metadata is used by clients and servers to route (and, in the process of routing, validate) communications to the appropriate parties. In some applications and protocols, metadata such as the recipient and timestamp is required to successfully route messages.

However, there are protocols and messaging implementations that 1) severely limit the amount of metadata exposed to servers or 2) eliminate the “need” for the server to have access to even limited amounts of metadata for the purposes of message routing.

2. Investigations

Metadata is often used in investigations, ranging from law enforcement investigations to open source intelligence (OSINT) investigations.

Most commonly, for law enforcement, we see a high interest in location data; the biggest example as of writing is the revelation of Fog Data Science, where local police could purchase a relatively low-cost subscription tool that provided location histories of millions of devices.

Interestingly, Fog Data Science didn’t explicitly collect the data themselves. It's not as if Fog Data Science created something like a downloadable app for end users and then accessed and collected data that way.

Rather, they aggregated data from hundreds of third-party apps that do collect location data. These weren’t necessarily apps “traditionally” associated with location data usage like GPS apps or ride-sharing apps. These apps collecting and sharing location data range wildly. Much of this data was collected in the background. Conveniently for Fog Data Science, these apps seemingly all shared location data with third parties - such as Fog Data Science!

So, since the apps collecting and then sharing data with third parties vary so widely, apps collecting/sharing location data could include the likes of WhatsApp and email clients of privacy-unfriendly email providers…

Naturally, investigations may have interest in other communication metadata such as:

Who sent a message
When a message was sent
On which device a message was sent/received
Information associated with a handle/account

Even file and photo metadata prove valuable to investigations of all types, as photos containing rather sensitive EXIF data - like location coordinates - may be inadvertently shared/published publicly.

For example, John McAfee’s location in Guatemala was exposed after Vice Magazine posted a photo taken with an iPhone; the iPhone’s camera is a “GPS-enabled” camera and the photo was not scrubbed of location EXIF data prior to posting.

Shortly after Vice Magazine posted that photo, John McAfee, wanted by Belize authorities for his alleged involvement in a shooting, sought asylum in Guatemala (where he was located, as told by the posted photo).

Remember this: people are accused of many things based on metadata. Divorces happen over metadata; metadata can compromise operations security (OPSEC); and metadata can “make” an open source investigation or take an investigation further. Many things can be correctly inferred from capturing and examining metadata.

People are even arrested and convicted based on metadata. According to General Michael Hayden (former director of the CIA and NSA), “We kill people based on metadata.”

3. Marketing

Metadata is often used in various marketing/advertising campaigns.

Data brokers may use metadata to compile central repositories on millions of people and then sell it to whoever has the desire and money to purchase it. AdTech may use metadata either directly collected from their own platforms or received from another party to then display targeted or re-targeted advertisements to users who fit a defined data profile.

It seems with the targeted and re-targeted marketing landscape (a hefty part of “surveillance capitalism”), any byte of data is sought after; this includes the metadata associated with communications.

If the 2023 landscape of WhatsApp has taught us anything, it is that AdTech is likely highly interested in communication metadata across many different communication platforms; phone calls, contact books/addresses, emails, messaging, and social media direct messaging are all fair game. Specific examples can include:

Email subject, to, and CC lines
Location an email or message was opened
“High” engagement with specific social media “influencers”
Who you communicate with on a given platform, and how often
Whether a link was clicked in an email or message

These data points (and many more) can be used to “hyper personalize” a marketing approach, which can include (but is not limited to) displaying targeted ads, sending “lead” emails to your inbox, displaying similar ads to your frequent contacts, or recommending purchases based on email links you’ve followed.

As a note, like law enforcement, marketing and AdTech have a high interest in location data. When tied with communications, location data can provide a wealth of information, which can be used for further profiling, marketing, and sharing/selling.

Just collecting location data over time is significant; someone can easily learn where you live, work, go to school, work out, grocery shop, and more just from gathering location data. Combined with communication metadata, complete outsiders can gain real insight into who you communicate with, when communication happens, and from where communication occurs.
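A deliberately crude sketch of that kind of inference: given timestamped location pings, guess a "home" as the most frequent rounded coordinate seen overnight. The pings, the night-hour window, and the rounding precision are all assumptions chosen for this illustration; real location analysis would use far more data and better clustering.

```python
from collections import Counter
from datetime import datetime

# Hypothetical location pings: (ISO timestamp, latitude, longitude).
PINGS = [
    ("2023-10-02T02:10:00", 40.4461, -79.9822),
    ("2023-10-02T13:05:00", 40.4406, -79.9959),
    ("2023-10-03T01:45:00", 40.4462, -79.9821),
    ("2023-10-03T23:30:00", 40.4460, -79.9823),
    ("2023-10-04T12:58:00", 40.4407, -79.9958),
]

def likely_home(pings, night_start=22, night_end=6, precision=3):
    """Return the most common rounded coordinate observed during night hours."""
    cells = Counter()
    for stamp, lat, lon in pings:
        hour = datetime.fromisoformat(stamp).hour
        if hour >= night_start or hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    if not cells:
        return None
    return cells.most_common(1)[0][0]

print(likely_home(PINGS))
```

Swapping the hour window for working hours would give an equally rough guess at a workplace.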

They can also, with reasonable accuracy, infer or model your future actions - will you message Bob from around the location of The Only Burger Joint in Town today at 12:00 like you have every Monday and Wednesday for the previous three months?

4. Machine learning

Depending on the platform, metadata can be used to train machine learning algorithms; this is seen frequently with “free” and privacy-unfriendly services that take user data as “payment” for their services:

WhatsApp uses metadata to train its AI, which is partially responsible for content moderation and spam detection on the platform.
Google likely uses email message (header) metadata to train its AI-powered spam filter for its Gmail service (though it claims scanned email data is not used for advertising purposes). It also provides an application programming interface (API) for developers to tap into and use Gmail metadata.
Microsoft uses metadata to train its AI assistant, Cortana; Cortana shares data with Microsoft and can’t be easily disabled on Windows 10/11.
Facebook Messenger uses metadata to feed and train its various algorithms across the Facebook platform. Until Facebook implements end-to-end encryption for its Facebook Messenger, it also has direct access to metadata and message contents.

Naturally, using real user metadata can have privacy implications.

For example, many of these “free” services collect metadata that allows accurate location pinpointing even if permission to GPS is denied. Perhaps your important email to your professor is never delivered because the machine learning algorithm driving the spam filter falsely flags your message as spam and prevents it from ever being delivered because you sent it from the metro instead of your apartment.

Perhaps an email provider inadvertently adds an "event" to your calendar because the machine-learning algorithm detected a receipt plus a location - but the calendar is shared with others. Perhaps you are swept up in a law enforcement investigation because an algorithm used metadata from your communications with your brother - who is the suspect in a drug investigation.

The issue compounds because most privacy policies won’t specify their direct collection or use of metadata. Essentially, due to the lack of transparency, once your metadata is collected, you don't know the entire picture of how it will be used.

How to protect metadata?

Protecting metadata can be... fairly complicated.

As mentioned, sometimes metadata has valid uses and it’s possible to use metadata in a way that shows respect to user privacy.

Typically, the “best way” to protect metadata is to limit the amount of metadata produced; the easiest way to accomplish this is to not share metadata in the first place.

This may mean using a communication platform (email or messenger) that doesn’t require access to extensive metadata in order to render services in the first place. Or using a cloud storage provider that also end-to-end encrypts metadata on the client (device) side prior to upload. Or using an email provider that minimizes information transmitted via email headers.

In most cases, if metadata must be generated and/or used, it should be either 1) minimal or 2) encrypted so that it’s unreadable by the server handling the request(s).

Tools for metadata redaction

Files

Depending on the file type, sometimes potentially sensitive information can be redacted within the program or app that created the file. For example, Microsoft Word has built-in capability to remove metadata without relying on macros or any custom plugins.

However, this may not be enough for some users, which is understandable: Microsoft Word primarily deals with .docx files, and it may not redact all metadata in the file. Other file types, such as .pdf files, can also store metadata. Fortunately, software for removing (and examining) metadata does exist:

ExifTool

Available for Windows, Linux, and macOS. The original tool for reading, writing, and editing (including removing) metadata in many different file and photo formats. ExifTool is frequently used in computer forensic investigations. ExifTool does not have a graphical user interface (GUI) and can only be used via the command line.
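For readers who would rather script ExifTool than type commands by hand, here is a small sketch of a Python wrapper. It relies on two ExifTool arguments: `-all=` clears all writable metadata, and `-gps:all=` clears only the tags in the GPS group. The helper names and the `hike.jpg` filename are made up for this example, and the wrapper assumes `exiftool` is installed and on the PATH.

```python
import shutil
import subprocess

def exiftool_strip_command(path, gps_only=False):
    """Build an ExifTool command that removes GPS tags only, or all metadata."""
    selector = "-gps:all=" if gps_only else "-all="
    return ["exiftool", selector, path]

def strip_metadata(path, gps_only=False):
    """Run ExifTool if it is installed; return True when the command succeeded."""
    if shutil.which("exiftool") is None:
        return False  # ExifTool not found on PATH
    result = subprocess.run(exiftool_strip_command(path, gps_only), check=False)
    return result.returncode == 0

# Show the command that would be run for a hypothetical photo.
print(" ".join(exiftool_strip_command("hike.jpg", gps_only=True)))
```

By default ExifTool keeps a backup of the untouched file next to the edited one (with `_original` appended to the name), so that backup should be deleted or kept private before sharing.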

Photos

As mentioned earlier in this post, photos can include EXIF data considered sensitive. If photos with this sensitive EXIF data are shared, then the sensitive EXIF data is also unintentionally shared with third parties.

Particularly, EXIF photo data taken with GPS-enabled cameras (which includes smartphones!) can reveal location data to anyone with the means to view the metadata. Photos can also include exact date and time taken and related Device ID numbers of the device that took the photo.

Metapho

Only available for iOS. Allows editing and deleting selected metadata, such as location and time taken. The “Safe Share” feature allows sharing a photo stripped of all metadata without making copies of the image.

ExifEraser

Only available for Android. Supports removal of EXIF metadata, XMP metadata, and ICC profile data.

Use a secure messenger

With messengers and messaging platforms, metadata can include information associated with an “account” and metadata attached to messages.

Some message platforms - even those advertised as end-to-end encrypted messengers - collect and store metadata, such as to whom and when a message was sent/received/opened. They may also require direct access to contacts, aggregate data attained from third parties, and share data (like location data) with third parties.

Account creation with some messengers may require a valid email address or a SIM-connected phone number for use. This email address and/or phone number can become metadata as it is often an identifier for the user on the messaging platform.

It’s difficult to directly address these issues from the user side. Mitigations, such as denying GPS location and refraining from transmitting sensitive data over a messaging platform, can be taken; however, the most effective way to minimize metadata generation, collection, and transmission on a messenger is to switch to a secure messaging alternative.

While not all messaging alternatives eliminate transmitting metadata altogether, secure messaging alternatives do respect the privacy of the user by not using this metadata outside of message routing. Most secure messengers do not require excessive personal information and do not engage in data collection or sharing of their own.

For messaging platform suggestions, users are highly encouraged to review the options presented in avoidthehack’s recommendations for secure messenger alternatives to messaging platforms like WhatsApp and Facebook Messenger.

Use an encrypted email provider

Email, as a protocol in general, creates and transmits a wealth of metadata; this is an issue with the protocol itself and is tough for even secure email providers to wholly address. Like messaging platforms, email metadata frequently includes information associated with an email account and metadata (email headers) attached to messages.

Email headers, the metadata associated with sending/receiving email messages, can expose a lot of information. Some of this includes the sender's IP address, the names of the servers handling the message, and the email client used to draft/send an email. Most encrypted email providers attempt to minimize the metadata transmitted and/or potentially available to sending/receiving servers.
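To see what headers give away, the sketch below parses a raw message with Python's standard `email` package and pulls out the fields mentioned above. The message itself is invented (example.com addresses and a documentation-range IP), and `exposed_headers` is a hypothetical helper; real messages typically carry many more headers, and not every client sets `User-Agent` or `X-Mailer`.

```python
from email import message_from_string

# Invented raw message for illustration; the body is never inspected below.
RAW = """\
Received: from laptop.example.net (203.0.113.7) by mx.example.com; Fri, 13 Oct 2023 13:00:02 +0000
From: Alice <alice@example.com>
To: Bob <bob@example.org>
Date: Fri, 13 Oct 2023 13:00:00 +0000
Subject: Lunch
User-Agent: ExampleMail/1.0

See you at noon.
"""

def exposed_headers(raw):
    """Collect header fields that reveal sender, recipient, timing, client, and relays."""
    message = message_from_string(raw)
    return {
        "from": message["From"],
        "to": message["To"],
        "date": message["Date"],
        "client": message["User-Agent"] or message["X-Mailer"],
        "relays": message.get_all("Received", []),
    }

for field, value in exposed_headers(RAW).items():
    print(f"{field}: {value}")
```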

With the popular email services providers, account creation frequently requires PII - which is associated with the account, which is associated with messages sent/received - such as a valid phone number and first/last name. Many popular email providers collect much metadata over time for their own use; in many cases, the email provider even has access to users’ inboxes, even if they are not reading each and every message sent or received.

Encrypted and privacy-respecting email providers do not collect metadata or require PII to establish/use an account, even if metadata is transmitted over/to the provider's servers. Encrypted email providers use zero-knowledge implementations, ensuring not even the service provider's servers have access to users' inboxes.

For email providers, users are encouraged to use an encrypted email provider - specifically one that implements zero-knowledge encryption on its servers, which helps prevent email scanning and "snooping" by the provider. Encrypted email providers may not totally encrypt transmitted metadata, but do strive to minimize metadata; metadata generally isn't actively collected/consumed by the provider's servers for purposes other than message routing.

Users are highly encouraged to review the options presented in avoidthehack’s recommendations for secure and encrypted email providers; at minimum, most encrypted email providers minimize metadata transmitted when sending an email message.

Use a secure-by-design cloud storage provider

As mentioned, files and photos have associated metadata. This metadata may be unencrypted, or encrypted but directly accessible to the cloud storage provider. This access may extend to other third parties like government entities or even the server infrastructure provider (hosting), if applicable.

Secure cloud storage providers implement zero-knowledge encryption for files uploaded to users' accounts; they go beyond just encrypting and securely storing the files because they also encrypt file metadata that would otherwise be made available to the provider's servers.

Many of the popular cloud storage providers do not engage in client-side encryption or encrypting file metadata of files prior to upload to their servers. These providers also know and hold the decryption key for the files stored on their servers, so they could decrypt and access the files on a whim -- for example, to investigate a suspected terms of service violation.

If using any other cloud storage service, users can encrypt their files prior to uploading to the cloud. This is good practice, even for recommended secure-by-design cloud storage providers. After all, the cloud is just someone else's computer!

For secure (and privacy-respecting) cloud storage providers, users are highly encouraged to review options presented in avoidthehack’s recommendations for secure cloud storage providers.

Minimize metadata sharing across online activities

Metadata is generated with nearly every activity performed online. Some of this metadata generated may be unavoidable, or exceedingly hard to minimize. However, this is not the case for all metadata!

In addition to using more privacy-friendly messengers, email providers, and cloud storage providers, users can take basic steps to minimize unintended (meta)data sharing for common online actions, such as but not limited to:

Scrub photos of GPS location prior to sharing, publishing, or posting with anyone on any platform.
- Users may also want to consider scrubbing the exact timestamp on the photo.
Encrypt files prior to upload on any cloud storage service.
Scrub files like .docx and PDFs of unnecessary metadata prior to sharing or publishing online.

Final thoughts

Metadata is just as valuable as direct message contents.

Metadata alone can tell most of the “story” without ever revealing the explicit content of a communication. File metadata, specifically photo EXIF, can add sensitive information to “regular” or seemingly innocent photos or shared files, which risks user privacy.

The best way for users to protect metadata is to either refrain from sharing it or use strong encryption. Users should rely on strong encryption to keep their data secure and private.

Users are advised to use secure messengers and encrypted email providers to reduce the likelihood their metadata is used for other purposes and to avoid unintended data sharing.

Users are also advised to use secure cloud storage providers where possible - ideally, regardless of the cloud storage provider, users would encrypt their files prior to upload to any file storage/sharing service.

Users should take care to at least remove location coordinates from photos prior to sharing them, whether between “trusted contacts” or on the internet.

With that said, stay safe out there!
