Sanitizing User Agents

Well-behaved browsers and bots send their user agents in the format you'd expect, but all sorts of issues can arise with user agents outside the norm.

As well as finding problems with user agents, our preprocessor sanitizes malformed user agents, cleaning out as much junk as possible.

Fix encoding problems with user agents

User agents can get encoded strangely by a proxy or a script that isn't handling them correctly. Maybe they were stored incorrectly in a database somewhere, or maybe a script transmitted them badly or failed to escape the special characters, and now an already messy user agent is even worse.

When you send user agents to our API, we run them through our preprocessor first to remove as many issues as we can, including incorrectly encoded user agents, unescaped characters and more.

So for example, if your visitors are sending incorrectly encoded user agent strings, we can help you identify and fix these problems. Turn:

Mozilla/5.0+(Linux;+Android+10;+POT-LX1)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/80.0.3987.132+Mobile+Safari/537.36

Into:

Mozilla/5.0 (Linux; Android 10; POT-LX1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36
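As a rough illustration of this kind of fix (not our actual preprocessor), the sketch below undoes form-style encoding with Python's standard urllib.parse.unquote_plus. The function name decode_plus_encoded and the "no spaces but some plus signs" heuristic are assumptions made for this example.

```python
from urllib.parse import unquote_plus


def decode_plus_encoded(user_agent: str) -> str:
    """Undo form-style encoding when a user agent looks encoded."""
    # Heuristic (an assumption): a genuine user agent nearly always contains
    # spaces, so a string with '+' but no spaces was probably form-encoded.
    if " " not in user_agent and "+" in user_agent:
        return unquote_plus(user_agent)
    return user_agent


raw = ("Mozilla/5.0+(Linux;+Android+10;+POT-LX1)+AppleWebKit/537.36+"
       "(KHTML,+like+Gecko)+Chrome/80.0.3987.132+Mobile+Safari/537.36")
print(decode_plus_encoded(raw))
```

A user agent that already contains spaces is returned unchanged, so a legitimate plus sign inside a normal string is left alone.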

Remove multiple types of GUIDs found in some user agents

We've seen a lot of user agents like this:

Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/30.0.1599.12 Mobile/11A465 Safari/8536.25 (3B92C18B-D9DE-4CB7-A02A-22FD2AF17C8F)

So we tidy them up like this:

Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) CriOS/30.0.1599.12 Mobile/11A465 Safari/8536.25
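If you want to try something similar yourself, a regular expression for the standard 8-4-4-4-12 hexadecimal GUID layout covers the example above. This is a minimal sketch: the name strip_guids is hypothetical, and it only handles a GUID that is bare or wrapped in its own parentheses, which is an assumption and far narrower than what a real preprocessor has to deal with.

```python
import re

# Standard 8-4-4-4-12 hexadecimal GUID layout.
GUID = r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"

# Remove a GUID wrapped in its own parentheses first, then any bare GUID.
WRAPPED_GUID = re.compile(r"\s*\(" + GUID + r"\)")
BARE_GUID = re.compile(r"\s*\b" + GUID + r"\b")


def strip_guids(user_agent: str) -> str:
    """Return the user agent with GUID-shaped fragments removed."""
    cleaned = WRAPPED_GUID.sub("", user_agent)
    cleaned = BARE_GUID.sub("", cleaned)
    return cleaned.strip()


ua = ("Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 "
      "(KHTML, like Gecko) CriOS/30.0.1599.12 Mobile/11A465 Safari/8536.25 "
      "(3B92C18B-D9DE-4CB7-A02A-22FD2AF17C8F)")
print(strip_guids(ua))
```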

Remove random identifiers in user agents

Many user agents have device IDs or other random identifiers in them. We've even seen a few "anonymizer" plugins that add unique random strings to user agents. We have no idea why, since surely this makes you more targetable, but it doesn't matter: our preprocessor does a great job of stripping them out.

This is very important when you need to save user agents to your database or system. Otherwise you'll end up with thousands of "almost-identical" user agents: normal user agents that differ only by a single random GUID or string, so your system saves each one as a unique record.

For example, we identify and remove the random number in this user agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64; WOW64; rv:41.0) Gecko/20100101 Firefox/49.0.2 (x86 de) Anonymisiert durch AlMiSoft Browser-Anonymisierer 96034752

It becomes:

Mozilla/5.0 (Windows NT 10.0; Win64; x64; WOW64; rv:41.0) Gecko/20100101 Firefox/49.0.2 (x86 de) Anonymisiert durch AlMiSoft Browser-Anonymisierer
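To see why this matters for storage, here is a small sketch that counts distinct records before and after cleaning. The rule and the name strip_anonymizer_id are hypothetical and written only for this one anonymizer; the second and third numeric suffixes are invented for the illustration.

```python
import re
from collections import Counter

# Hypothetical rule: drop the numeric suffix this anonymizer appends.
ANONYMIZER_ID = re.compile(r"(Browser-Anonymisierer)\s+\d+\s*$")


def strip_anonymizer_id(user_agent: str) -> str:
    """Remove the trailing random number, keeping the anonymizer name."""
    return ANONYMIZER_ID.sub(r"\1", user_agent)


BASE = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; WOW64; rv:41.0) "
        "Gecko/20100101 Firefox/49.0.2 (x86 de) "
        "Anonymisiert durch AlMiSoft Browser-Anonymisierer")

# The first suffix comes from the example above; the other two are made up.
seen = [f"{BASE} {suffix}" for suffix in ("96034752", "11223344", "55667788")]

raw_counts = Counter(seen)
clean_counts = Counter(strip_anonymizer_id(ua) for ua in seen)

# Three distinct records before cleaning, one record seen three times after.
print(len(raw_counts), len(clean_counts))
```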

Hundreds of different user-agent-specific cases

There are too many to list here, but we continue to identify user agents that have weird random fragments in them, and we extend our preprocessor to remove the random bits in them. For example, we turn something like:

Mozilla/5.0 (Linux; Android 5.1.1; S3 Build/LMY49F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.120 MQQBrowser/6.2 TBS/045230 Safari/537.36 scancode_vc/213 scancode_vcname/2.45.50 scancode_cuid/|0 scancode_token/1_XPXQH3c5HRPtFHkSwi3sCCURmT25QfxM scancode_channel/20200826huidu zyb_jsBridge/1, jsBridge_jsInterface/1 jsBridge_isNewJsBridge/1 jsBridge_vc/2.3.2 jsBridge_os_version/5.1.1

Into this still messy but no longer random user agent:

Mozilla/5.0 (Linux; Android 5.1.1; S3 Build/LMY49F; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/77.0.3865.120 MQQBrowser/6.2 TBS/045230 Safari/537.36 scancode_vc/213 scancode_vcname/2.45.50 scancode_cuid/|0 scancode_token scancode_channel/20200826huidu zyb_jsBridge/1, jsBridge_jsInterface/1 jsBridge_isNewJsBridge/1 jsBridge_vc/2.3.2 jsBridge_os_version/5.1.1

And instead of having thousands of records in your database like this:

Mozilla/5.0 CK={tiPpw0h4g7kjv0Od1HEGU6ReTr/lBjWQSeRYO97yzccRl5XVZTG8x2djaWnFc/e1WsZkHjr5bhkfEubfovS63SmK7tarn+nVGRMVWpLyANs=} (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E;
Mozilla/5.0 CK={tiPpw0h4g7mshyRoltrFTQJg5Y391GuYZU3E7M5C9BlqrLzycZEnIGdjaWnFc/e1MWo8JPRze84fEubfovS63U63hi0J8jSpJNhrhR0we8A=} (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E;
Mozilla/5.0 CK={tiPpw0h4g7l3Hp4Wa8dl/R3YcIY+vvrSZGejye2RwiA1ymrRJOuR2mdjaWnFc/e1bx+qsNhXdjAfEubfovS63b+WCBL0tLhZclpKWAk343I=} (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E;

Our preprocessor keeps it neat for you by reducing them all down to:

Mozilla/5.0 CK={} (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; .NET4.0C; .NET4.0E;
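One way to organize case-by-case fixes like these is an ordered table of pattern and replacement pairs. The sketch below is only an illustration: the two rules are guesses reverse-engineered from the examples above, not our real rule set, and the names RULES and apply_rules are hypothetical.

```python
import re

# Each rule is (compiled pattern, replacement). Both rules are assumptions
# based on the examples above, not the preprocessor's actual rules.
RULES = [
    # Keep the scancode_token key but drop its per-session value.
    (re.compile(r"\b(scancode_token)/\S+"), r"\1"),
    # Empty the braces after CK= while keeping the marker itself.
    (re.compile(r"\bCK=\{[^}]*\}"), "CK={}"),
]


def apply_rules(user_agent: str) -> str:
    """Apply every rule in order and return the cleaned user agent."""
    for pattern, replacement in RULES:
        user_agent = pattern.sub(replacement, user_agent)
    return user_agent


TAIL = (" (compatible; MSIE 7.0; Windows NT 10.0; WOW64; Trident/7.0; "
        ".NET4.0C; .NET4.0E;")
variants = [
    "Mozilla/5.0 CK={tiPpw0h4g7kjv0Od1HEGU6ReTr/lBjWQSeRYO97yzccRl5XVZTG8x2"
    "djaWnFc/e1WsZkHjr5bhkfEubfovS63SmK7tarn+nVGRMVWpLyANs=}" + TAIL,
    "Mozilla/5.0 CK={tiPpw0h4g7mshyRoltrFTQJg5Y391GuYZU3E7M5C9BlqrLzycZEnIG"
    "djaWnFc/e1MWo8JPRze84fEubfovS63U63hi0J8jSpJNhrhR0we8A=}" + TAIL,
]

# Both variants collapse to the same cleaned string.
unique = {apply_rules(ua) for ua in variants}
print(unique)
```

Keeping the rules in a list means a newly discovered case becomes one more entry rather than another branch in the code.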

We've got lots of user-agent-specific checks and fixes for all the weird user agents we see.

Some unknown web browser extension or script adds strange random fragments to its user agent, resulting in user agents like:

Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; W3M0Xunz-49; rv:11.0) like Gecko
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; md436sgz-83; rv:11.0) like Gecko
Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/9.0; a3h54kgz-57; rv:11.0) like Gecko

And so on. In our extensive collection of user agents, we've found these fragments in several different places: in the middle, at the end, and so on. We identify and remove them, returning a far neater user agent string in the user_agent_sanitized field, which you can optionally use if you want.
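For the three examples above, the random fragment has a recognizable shape: eight alphanumeric characters, a hyphen, and two digits, sitting as its own semicolon-separated token. The sketch below removes exactly that shape. Both the pattern and the name strip_random_token are assumptions drawn from these three strings only, and the sketch does not cover the other positions where such fragments appear.

```python
import re

# Assumed shape, based only on the three examples above: eight alphanumeric
# characters, a hyphen, two digits, as a standalone ';'-separated token.
RANDOM_TOKEN = re.compile(r";\s*[A-Za-z0-9]{8}-\d{2}(?=\s*[;)])")


def strip_random_token(user_agent: str) -> str:
    """Remove the random token together with its leading semicolon."""
    return RANDOM_TOKEN.sub("", user_agent)


samples = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; W3M0Xunz-49; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; md436sgz-83; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.2; WOW64; Trident/9.0; a3h54kgz-57; rv:11.0) like Gecko",
]
cleaned = [strip_random_token(ua) for ua in samples]
for ua in cleaned:
    print(ua)
```

A pattern this narrow is deliberate: anything looser risks deleting genuine tokens such as device model names.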

User agents with MAC addresses in them

We've noticed what look like MAC addresses in a number of user agents. This is very strange: browsers, scripts, and programs shouldn't be sending their MAC address in their user agents. We identify them and remove them if you want.
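A MAC address is usually written as six pairs of hexadecimal digits joined by colons or hyphens, which makes it fairly easy to spot. The sketch below is a guess at how such a check might look; the user agent in it is invented for the illustration, and strip_mac_addresses is a hypothetical name.

```python
import re

# Six hex pairs joined by one consistent separator (':' or '-').
# The backreference \1 rejects strings that mix the two separators.
MAC_ADDRESS = re.compile(
    r"\b[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}\b"
)


def strip_mac_addresses(user_agent: str) -> str:
    """Remove MAC-shaped fragments and collapse the leftover whitespace."""
    without_mac = MAC_ADDRESS.sub("", user_agent)
    return " ".join(without_mac.split())


# Invented user agent; it does not come from real traffic.
sample = "ExampleClient/2.1 00:1A:2B:3C:4D:5E Linux"
print(strip_mac_addresses(sample))
```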

User agents with checksums in them?

We've seen a lot of user agents with what look like hashes or checksums in them, and we work to remove them. For example, we turn:

Podbean/iOS (http://podbean.com) 4.5.1 - 42e1a53e413871204ad43bf95c9c5c94

Into:

Podbean/iOS (http://podbean.com) 4.5.1
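Hashes tend to show up as long runs of hexadecimal digits of a fixed length. As a sketch under that assumption, the snippet below removes standalone runs of 32, 40, or 64 hex characters (the lengths of MD5, SHA-1, and SHA-256 digests written in hex), along with a leading " - " separator when present. The name strip_hex_digests is hypothetical.

```python
import re

# Standalone hex runs of 64, 40, or 32 characters, optionally preceded
# by a " -" separator. The word boundaries stop partial matches inside
# longer alphanumeric strings.
HEX_DIGEST = re.compile(
    r"(?:\s+-)?\s*\b(?:[0-9a-fA-F]{64}|[0-9a-fA-F]{40}|[0-9a-fA-F]{32})\b"
)


def strip_hex_digests(user_agent: str) -> str:
    """Remove hash-like hex runs and trim surrounding whitespace."""
    return HEX_DIGEST.sub("", user_agent).strip()


ua = "Podbean/iOS (http://podbean.com) 4.5.1 - 42e1a53e413871204ad43bf95c9c5c94"
print(strip_hex_digests(ua))
```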

We built the preprocessor to maintain our own user agent database, and now you can get the same features if you want.

Find out how we check user agents for problems

As well as sanitizing user agents, we can also spot many different things that are "wrong" with a user agent, so that you don't get tricked.

Get started now

The API is free to use and easy to set up, so why not get started right now?

Do you have a question? Get in touch! We'd love to help you.
