We've been deciphering user agents for 11 years and during this time we've handled hundreds of millions of user agents in all sorts of formats and states (good, broken, spammy, malicious and just plain weird).
This is 11 years of experience distilled down to a list of bullet points to help you make sure that the user agent for the software or script you're building serves you, your users and the internet well.
Lots and lots of spam bots do this; some websites output a list of "recently seen" user agents on their site, so spam bots have tried to exploit this by putting links they want visited (in order to drive up the Google PageRank) in the user agent.
Including the full HTML link tags makes it very clear that you're not being legitimate and are trying to scam backlinks out of websites that don't handle dodgy user agents properly; in fact, the WhatIsMyBrowser.com parser/API will always detect a user agent like this as being abusive/spammy.
It's very common (and helpful!) for bots (crawlers, analysers etc) to include a link to an informative page about the bot, so that curious webmasters can find out more... just don't enclose that link in anchor tags.
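As a rough illustration of the difference, here is a minimal Python sketch that flags anchor tags in a user agent. The bot name `ExampleBot`, the URL and the regular expression are assumptions made up for this example; this is not the detection logic the WhatIsMyBrowser.com parser actually uses.

```python
import re

# Matches an opening or closing HTML anchor tag, e.g. <a href="..."> or </a>.
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def contains_html_link(user_agent: str) -> bool:
    """Return True if the user agent embeds an HTML anchor tag."""
    return ANCHOR_TAG.search(user_agent) is not None


# Hypothetical bot name and URL, used only for illustration.
good = "ExampleBot/1.2.0 (+https://example.com/bot-info)"
bad = 'ExampleBot/1.2.0 <a href="https://example.com/">example</a>'

print(contains_html_link(good))  # False
print(contains_html_link(bad))   # True
```

The plain URL in `good` still tells a webmaster where to read about the bot, without looking like an attempt to plant a backlink.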
While our systems are modern and have no problems with UTF-8 or even Emojis in user agents, many other systems on the web that handle user agents are quite old and can't deal with extended character sets. Emojis in user agents are a definite "no"!
So if you're creating a new user agent for your browser, bot, app or script, it's a very good idea to keep your user agent as simple as possible - only use letters, numbers and basic symbols (brackets, hyphens, slashes etc).
As per the previous guideline, you should only ever use a very basic character set - A to Z, 0 to 9 and some basic symbols like forward slashes, hyphens, underscores and parentheses - you should also take care to never include any "encoded" symbols, eg %2F or %20.
Our API includes a user agent preprocessor to get rid of these problems as best as possible, but it may still introduce issues and cause our user agent system to mark the user agents as "weird".
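If you want to check your own user agent against the two character guidelines above before you ship it, something like the following Python sketch would do. The exact allow-list of symbols is an assumption (the guidelines only name a few examples), and `charset_problems` is a hypothetical helper, not part of our API.

```python
import re

# Letters, digits, spaces and a small set of basic symbols.
# The exact set of symbols is an assumption made for this sketch.
SAFE_CHARS = re.compile(r"[A-Za-z0-9 ()\[\]/._;:,+\-]*")

# A percent sign followed by two hex digits, e.g. %2F or %20.
PERCENT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def charset_problems(user_agent: str) -> list:
    """Return a list of character-related problems found in the user agent."""
    problems = []
    if PERCENT_ENCODED.search(user_agent):
        problems.append("percent-encoded sequence")
    if not SAFE_CHARS.fullmatch(user_agent):
        problems.append("character outside the basic set")
    return problems


print(charset_problems("MyExampleBrowser/4.21 (Windows NT 10.0; Win64; x64)"))
# []
print(charset_problems("MyExampleBrowser%2F4.21"))
# ['percent-encoded sequence', 'character outside the basic set']
```

Note that the percent sign itself is not in the allow-list, so an encoded symbol trips both checks.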
Always include the full version number of your software; don't abbreviate it to the major version.
Firefox breaks this rule, so instead of reporting 50.1.2 in its user agent, it always reports 50.0. The rationale the developers give for this is that showing your full version number somehow helps attackers know if your browser is vulnerable to a particular exploit; however this doesn't really make sense. If you were malicious, you could just attempt the attack regardless and it would either work or not.
The problem is that it prevents pages like our homepage from determining if you're actually up to date or not. As such, we've had to scale back our version checking for Firefox - because now we only know if you're running the correct major version, not the revision as well.
Generally speaking, you shouldn't make your entire user agent something like MyExampleBrowser/4.21.
Sometimes bots (crawlers etc) do have very simplistic user agents consisting of nothing but a fragment to identify themselves, so unless your software is a bot, you should at least include some kind of fragment/s which also indicate the Operating System (Windows/macOS/Linux/iOS/Android etc). It's also great if you include a relevant version number as well; our parser will then be able to display much more detailed and helpful information for your users.
You should do this even if you only ever plan to develop your browser for one OS/Platform. It lets us tell your users that they have "My Example Browser 4 on Windows 10" instead of just "My Example Browser 4". It's really helpful for tech support to have as many relevant details as possible.
If it's relevant, you could also include the hardware architecture of the device as well; although unlikely there might be a slight difference in functionality between hardware architectures, so it's helpful to tech support to know if you're running My Example Browser on Intel or ARM. We try to detect hardware architecture so if you can, you could include it too.
When you include a Windows OS string, it's great to use the Windows NT x; fragment, as we will translate those into actual versions of Windows. If you're releasing it on Linux, it's actually not much help to include the kernel version (although it doesn't hurt...). A better solution would be to include the distribution name fragment, so that we can show that to your users, eg. Ubuntu/18.04.
As a general guideline, here's a list of handy fragments to choose from. It doesn't matter to us where/which order you put these fragments in the user agent:
|Windows NT 10.0;||Running on Windows 10|
|Windows NT 6.2;||Running on Windows 8|
|Windows NT 6.1;||Running on Windows 7|
|Intel Mac OS X 10_14_4;||Running on Mac OS X 10.14.4 - Note that traditionally Safari's OS (and iOS) version fragments use underscores instead of periods to separate version components; our parser works with either, but other parsers may not be as forgiving, so it's probably safest to stick with underscores so your browser is correctly detected by other user agent parsers.|
|iPhone OS 12_2||Running on iOS 12.2 (on an iPhone)|
|iPad OS 12_2||Running on iOS 12.2 (on an iPad)|
|Android 9.0;||Running on Android 9.0|
|Linux Ubuntu/10.04||Running on Ubuntu 10.04 - note that just the "Ubuntu" fragment is enough for our parser to pick up that it's Linux, but other user agent parsers don't seem to do this, so including the "Linux" fragment helps them too.|
These fragments are mutually exclusive - so don't make your user agent say that it's running on Windows and macOS! If your browser supports both of those operating systems, only include the correct fragment for that operating system.
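To make the "one OS fragment only" rule concrete, here is a small Python sketch that assembles a user agent from a software name, a full version number and exactly one OS fragment taken from the table above. The dictionary keys, the `build_user_agent` helper and the `x64` architecture fragment are hypothetical names chosen for this example.

```python
# OS fragments copied from the table above; the dictionary keys are
# hypothetical labels invented for this sketch.
OS_FRAGMENTS = {
    "windows10": "Windows NT 10.0",
    "windows7": "Windows NT 6.1",
    "macos": "Intel Mac OS X 10_14_4",
    "ubuntu": "Linux Ubuntu/10.04",
}


def build_user_agent(name, version, os_key, arch=None):
    """Build '<name>/<full version> (<one OS fragment>[; <arch>])'."""
    parts = [OS_FRAGMENTS[os_key]]  # exactly one OS fragment, never two
    if arch:
        parts.append(arch)  # optional hardware architecture fragment
    return f"{name}/{version} ({'; '.join(parts)})"


print(build_user_agent("MyExampleBrowser", "4.21.3", "windows10", "x64"))
# MyExampleBrowser/4.21.3 (Windows NT 10.0; x64)
```

Because the OS fragment is looked up from a single key, a build for one platform can't accidentally claim to be running on another.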
It's nice to include a fragment to indicate which rendering engine you're using (WebKit, Blink, etc), as well as the version number of that rendering engine.
Unless your software is a bot or crawler, don't put the fragments "bot", "crawler" or "spider" anywhere in the user agent. Many webmasters filter bot traffic based on whether the user agent contains these fragments, and so if you're making an actual "web browser", your users may run into these filters by mistake.
Even if you're writing a very well behaved and sensible bot, it may encounter issues that you need to know about: maybe it's gotten stuck in some infinite loop and is endlessly crawling a page or section on a website. Or, sysadmins may see your bot in their logs and want to know more about it, in order to decide to allow or block it.
You should always include a URL for your bot (whether it's a crawler, analyser, site monitor or otherwise). It will let confused, frustrated or curious sys admins contact you to enquire or to help you.
Because we display a big listing of user agents (all of which have been submitted to our site by various visitors, many of whom submit fake or modified user agents), we take special care to keep it work-safe. To prevent your user agent getting blocked by our sanitiser, make sure that your user agent doesn't contain anything which might cause it to appear "rude". We're by no means puritans, we just don't want it shown on our user agent listing.
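Here's a minimal Python sketch of a bot that sends an informative user agent with every request, using the standard library's `urllib.request`. The bot name, version and info URL are placeholders invented for this example, and the `+URL` style inside brackets is just one common convention seen in crawlers such as Googlebot, not a requirement of ours.

```python
import urllib.request

# Hypothetical bot details, used only for illustration.
BOT_NAME = "ExampleBot"
BOT_VERSION = "1.2.0"
BOT_INFO_URL = "https://example.com/bot-info"

# Plain URL, no HTML tags: a sys admin can copy it straight out of their logs.
USER_AGENT = f"{BOT_NAME}/{BOT_VERSION} (+{BOT_INFO_URL})"


def build_request(url: str) -> urllib.request.Request:
    """Create a request object that carries the bot's user agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


request = build_request("https://example.com/")
print(request.get_header("User-agent"))
# ExampleBot/1.2.0 (+https://example.com/bot-info)
```

Passing the resulting object to `urllib.request.urlopen` would send the request with that header; the sketch stops short of that so it doesn't touch the network.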
Make sure to never include any sort of name, user name or company name in the user agent. This might seem strange to even need to specify, but we've had to hide a whole collection of user agents where it seems that the systems administrator has set some kind of group policy to append the name of the organisation the users work for to the user agent!
This isn't the point of the user agent string in the first place, and also it's a terrible security and privacy issue. We make an effort to not allow these kinds of user agents to show on our listing or in our user agent database, but it's much better if software (and sys admins!) don't leak this info in the first place. Look after your users!
It's fine to group certain related fragments together with brackets if you want, but some user agents get sent through entirely enclosed by brackets ( ) or quotation marks or apostrophes. We mark those user agents as "weird" and don't show them in our listing.
We see a fair few truncated user agents, eg: "Mozilla/5.0 (Linux". Looking for mismatched brackets is one of the ways we mark those user agents as "weird" (which among other things stops them appearing in our user agent listing).
If your user agent contains brackets, just make sure they are matched - that is to say, for every opening bracket there's also a closing bracket too.
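A bracket check like this is easy to run yourself. The following Python sketch is one simple way to do it (a depth counter over round brackets); it's an illustration, not the code our "weirdness" detector actually runs.

```python
def brackets_balanced(user_agent: str) -> bool:
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in user_agent:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing bracket appeared before its opening bracket.
                return False
    # Any bracket still open at the end means the string was cut short.
    return depth == 0


print(brackets_balanced("Mozilla/5.0 (X11; Linux x86_64)"))  # True
print(brackets_balanced("Mozilla/5.0 (Linux"))               # False
```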
This one might seem a little weird, but one thing you can do to avoid getting caught by our "weirdness" detector is to include some spaces in your user agent!
The reason for this is that we've seen heaps of user agents that are just mangled together strings of characters and symbols. We've got code which detects this as "weird" and subsequently won't show them on our user agent listing. We also mark them as weird if any of our customers parse them through our user agent API.
For the most part, this works great, but occasionally we see user agents like: A/9/Compal/TicWatch#C2/skipjack/unknown/QCX3/l15942351593945735227/-/413721875/-/mobvoi/64/65/- which is a legitimate user agent for a smart-watch - however because it doesn't have any spaces in it and also has a fair amount of symbols/slashes in it, it gets picked up as "weird" and won't be shown on our site.
We've manually coded exceptions for this case, but if you don't want your user agent flagged for our customers and hidden from our user agent listing, it's a great idea to break it up with normal spaces and punctuation.
For example, we've seen user agents like:
You can see the "MyExampleBrowser" with the same version number as AppleWebKit. This doesn't look very good and is probably wrong anyway! You should include the software version number behind your fragment.
If it's at all possible, try to make the version number that comes after your main software fragment look like or be the actual version number (eg. don't just have a build number or big string of numbers, and include the point-release as well when possible).
Some software seems to use a build number or some other kind of internal marker after its software fragment. Our parser likes to show users the Major version number (eg "My Example Browser 4") and so if you just include a big long string of numbers it looks like "My Example Browser 64619842" which doesn't look as nice.
For an example of what not to do, consider RealPlayer's fragment in this AOL Browser user agent:
Did you notice the "R1" at the very end? It's not clear at first glance that this relates to the Real Player extension. It should probably have been something like RealOnePlayer/1.3).
It's common for browser user agents to mimic more popular web browsers' user agents whilst also adding their own fragment to differentiate themselves - especially if your browser is actually based on a mainstream browser.
In other words, a web browser team may base their web browser off the Chromium/Blink rendering engine, and so under the hood it's essentially the Chrome browser... as such, they choose to make their user agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36 FooBar/1.5.2. If any web server is looking through logs, it's likely that they'll interpret that user agent as Chrome (and ignore the FooBar fragment right at the end).
Because you told us about your new user agent, our user agent parser would already realise that your user agent actually belongs to the FooBar browser and would decode it as such.
So far, so good... what this tip is suggesting to you is that your imitation user agent should not be something like:
So if you're basing your user agent on a more popular one, that's fine, just don't include contradictory fragments.
In a very similar point to the one above about not mimicking two different browsers, don't include fragments that indicate the software is running on both Windows and macOS (or iOS and Android, etc). This will cause the user agent to be marked as "weird" and handled differently.
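As an illustration, a check for contradictory OS fragments could look something like this Python sketch. The marker strings are assumptions chosen for the example: it deliberately looks for "Intel Mac OS X" rather than "Mac OS X" because genuine iOS user agents typically contain "like Mac OS X", and it leaves "Linux" out because Android user agents usually include it. Our real detection is more involved than this.

```python
# Substrings that (for this sketch) are taken to signal each operating system.
OS_MARKERS = {
    "Windows": ("Windows NT",),
    "macOS": ("Intel Mac OS X",),
    "iOS": ("iPhone OS", "iPad OS"),
    "Android": ("Android",),
}


def claimed_operating_systems(user_agent: str) -> list:
    """List every operating system the user agent claims to be running on."""
    return [
        os_name
        for os_name, markers in OS_MARKERS.items()
        if any(marker in user_agent for marker in markers)
    ]


def has_conflicting_os(user_agent: str) -> bool:
    """Return True if more than one operating system is claimed."""
    return len(claimed_operating_systems(user_agent)) > 1


print(has_conflicting_os("FooBar/1.5.2 (Windows NT 10.0; Win64; x64)"))
# False
print(has_conflicting_os("FooBar/1.5.2 (Windows NT 10.0; Intel Mac OS X 10_14_4)"))
# True
```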
Our database of user agents has thousands and thousands of user agents which we've detected as "weird" and won't display in our listing. A very common issue with user agents we see is that they're something like:
Note the "User Agent=" fragment at the start of both of them. Clearly, some malfunctioning bit of software sent these requests, as these are not normal Firefox or Internet Explorer user agents. Since they aren't "real" user agents and we don't want them displayed on our listing, the system automatically flags them as "weird" and won't display them unless we manually "okay" them.
Sometimes we run into a user agent which is real but has been caught by our system and has to be manually marked as being fine, for example:
Mozilla/5.0 (compatible; Yahoo! SearchMonkey 1.0; http://developer.yahoo.com/searchmonkey/useragent)
We've manually okayed it, but until then, we weren't including it in our listing. It's best to avoid this term. If you've got a user agent like this that you want us to manually set to display in our listing, use the contact form and we'll mark it as good.
Make sure the Mozilla fragment is right at the start of the user agent.
Why? Because we sometimes see user agent strings in our database which are actually two user agents combined together; usually it's because someone has tried to manually change their user agent but has made a mistake and pasted in two different user agents.
Sometimes we see user agents like:
Notice how there's a second Mozilla fragment in the middle? That's bad. If you look at the whole string, it's quite obviously two user agents which have somehow been joined together.
We detect this by noting "Mozilla" fragments anywhere but at the start and we tentatively hide them from appearing in our user agent listing or database.
To avoid this, if you decide you need "Mozilla" in your User Agent, make sure to put it at the very start.
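The check described above is simple enough to sketch in a couple of lines of Python. `has_misplaced_mozilla` is a hypothetical helper written for this example, not a function from our API.

```python
def has_misplaced_mozilla(user_agent: str) -> bool:
    """Return True if 'Mozilla' appears anywhere other than the very start."""
    # Searching from index 1 skips a legitimate 'Mozilla' at position 0.
    return user_agent.find("Mozilla", 1) != -1


single = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) FooBar/1.5.2"
joined = single + " Mozilla/5.0 (X11; Linux x86_64)"

print(has_misplaced_mozilla(single))  # False
print(has_misplaced_mozilla(joined))  # True
```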
Overly long user agents tend to indicate that there's something wrong with your user agent: perhaps it's intentionally malicious/spammy, or perhaps there was a problem with how it was sent to our servers (eg. one of our API customers sent it malformed). Any user agents longer than 512 characters get marked as "weird" by us and won't appear in the database. We often see user agents with long repeating fragments, long totally random strings or several user agents joined together.
Our database can handle the long user agents (although we don't include them in our user agent database dumps), however other websites which also record or handle user agent strings may struggle if their database isn't configured properly. It's best to keep your software's user agent well under 256 characters long.
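The two length thresholds mentioned above (512 characters for "weird", and the 256 character guideline) can be turned into a quick self-check. In this Python sketch the constant names, the `length_status` helper and its return labels are invented for illustration.

```python
WEIRD_LIMIT = 512        # longer than this gets marked as "weird"
RECOMMENDED_LIMIT = 256  # guideline: stay well under this


def length_status(user_agent: str) -> str:
    """Classify a user agent by length: 'ok', 'long' or 'too long'."""
    length = len(user_agent)
    if length > WEIRD_LIMIT:
        return "too long"
    if length >= RECOMMENDED_LIMIT:
        return "long"
    return "ok"


print(length_status("MyExampleBrowser/4.21.3 (Windows NT 10.0; x64)"))  # ok
print(length_status("x" * 300))                                         # long
print(length_status("x" * 600))                                         # too long
```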
Let whatismybrowser.com know about your new software! We are always happy to add detection for developers who contact us to let us know of the software that they're developing. Please include a user agent (including examples of any variations) along with basic explanations of each and any unique fragments so that we can test that we're detecting it the right way.
Doing this will ensure that the thousands of companies who use our API will know what software their users are using.
Seriously, we live and breathe user agents (yes, as weird as that sounds) and we're always happy to contribute our 2 cents on your thoughts about your user agent. If you're developing a new browser, app, script or anything else that sends user agent headers, we're happy to tell you what we think and give you some constructive feedback.
Just head over to our contact form and send us your thoughts!
We hope this guide helped; if you have any questions or suggestions for it, let us know through our Contact Page.
If you need help parsing user agents, then please check out our User agent parsing API. It's extremely powerful and detailed and does more than just parse user agents - you can get the latest software version numbers for all the major web browsers too!