Normal, well-behaved browsers and bots send the user agents you'd expect. However, all sorts of issues can arise with user agents outside the norm.
As well as finding problems with user agents, our preprocessor cleans as much junk as possible out of any malformed user agents.
User agents can get encoded strangely by a proxy or a script that isn't handling them correctly. Maybe they've been stored incorrectly in a database somewhere, or maybe a script didn't transmit them properly or didn't escape their special characters, and now an already messy user agent is even worse.
When you send user agents to our API, we run them through our preprocessor first to remove as many issues as we can, including incorrectly encoded user agents, unescaped characters and more.
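To make that concrete, here is a minimal Python sketch of this kind of cleanup. It is an illustration only, not the preprocessor's actual code: the function name `sanitize_user_agent` and the heuristics in it (when to URL-decode, which characters to drop) are assumptions made for the example.

```python
import html
import re
from urllib.parse import unquote

# Assumed for illustration: treat ASCII control characters as junk.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def sanitize_user_agent(raw: str) -> str:
    """Hypothetical helper: undo a few common encoding mistakes."""
    ua = raw

    # Heuristic (an assumption): a user agent that contains percent-escapes
    # but no literal spaces was probably URL-encoded as a whole.
    if " " not in ua and PERCENT_ESCAPE.search(ua):
        ua = unquote(ua)

    # Undo HTML entity escaping such as &quot; or &amp;.
    ua = html.unescape(ua)

    # Replace stray control characters (CR, LF, tabs, NUL) with spaces,
    # then collapse any runs of spaces that this leaves behind.
    ua = CONTROL_CHARS.sub(" ", ua)
    ua = re.sub(r" {2,}", " ", ua)

    # Drop surrounding whitespace and quotes left over from bad storage.
    return ua.strip().strip("\"'").strip()


print(sanitize_user_agent("Mozilla%2F5.0%20%28X11%3B%20Linux%20x86_64%29"))
```

A real sanitizer has to cope with far more cases than this (double encoding, wrong character sets, truncated strings), so treat the sketch as a starting point rather than a replacement.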
Many user agents contain GUIDs, device IDs or other random identifiers. We've even seen a few "anonymizer" plugins that add unique random strings to user agents (we have no idea why; this would make you more targetable!). It doesn't matter: our preprocessor does a great job of stripping them out.
This is very important when you need to save user agents to your database or system. Otherwise you'll end up with thousands of "almost-identical" user agents: normal user agents that differ only by a single random GUID or string, each of which your system will save as its own unique record.
For example, instead of having thousands of records in your database like this:
Our preprocessor keeps it neat for you by reducing them all down to:
This is just one of many, many things our user agent sanitizer can fix for you.
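The identifier stripping described above can be sketched in a few lines of Python. This is a simplified illustration, not how the preprocessor is actually implemented: the function name `strip_guids`, the regular expression (which only matches GUIDs in the standard 8-4-4-4-12 hexadecimal layout) and the sample user agents are all made up for the example.

```python
import re

# Assumed pattern: GUIDs in the standard 8-4-4-4-12 hexadecimal layout.
GUID_RE = re.compile(
    r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def strip_guids(user_agent: str) -> str:
    """Hypothetical helper: remove GUIDs so near-duplicates collapse."""
    cleaned = GUID_RE.sub("", user_agent)
    # Tidy the "; )" left behind when the GUID was the last item in a
    # parenthesised comment, then collapse repeated whitespace.
    cleaned = re.sub(r";\s*\)", ")", cleaned)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()


# Made-up user agents that differ only by an embedded GUID.
samples = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; "
    "3f2504e0-4f89-11d3-9a0c-0305e82c3301) AppleWebKit/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; "
    "A1B2C3D4-E5F6-7890-ABCD-EF1234567890) AppleWebKit/537.36",
]

# Both samples reduce to the same string, so only one record is stored.
unique = {strip_guids(ua) for ua in samples}
print(unique)
```

Storing the cleaned string (or using it as the lookup key) is what keeps the "almost-identical" variants from each becoming a separate database record.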
We built the preprocessor to maintain our own user agent database, and now you can use the same features if you want.
As well as sanitizing user agents, we can also spot many different things that are "wrong" with a user agent, so that you don't get tricked.