This article covers some advanced tips and tricks for including or excluding exactly the results you want using property hashes.
Every banner contains a hash property, which is the numeric hash of the data property. You've probably seen other hashes such as MD5 or SHA1, but at Shodan we use a numeric hash for more efficient storage and searching. For example, here is a simplified banner showing both its data property and the hash value calculated from that property:
{
    "data": "SOMETHING",
    "hash": -82460412
}
As you'd expect, there's a corresponding hash filter that lets you search for banners based on the hash value. It may not seem obvious at first, but there are some cool things you can do with this filter.
If you've been searching Shodan for services like Telnet, you've probably run into results that don't contain any main banner text.
Empty banners always have a hash value of zero, which makes it possible to exclude empty search results by adding -hash:0 to the search query. For example, here's a search query to find Telnet services on port 23 that don't have an empty banner:
port:23 -hash:0
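The same query can be run from a script. The following is a minimal sketch, assuming the official shodan Python library is installed and you have an API key; the helper names exclude_empty and search_non_empty are hypothetical and only illustrate the idea.

```python
def exclude_empty(query: str) -> str:
    """Append -hash:0 so banners with an empty data property are filtered out."""
    query = query.strip()
    # Don't add the filter twice if the caller already included it.
    if "-hash:0" in query.split():
        return query
    return f"{query} -hash:0".strip()


def search_non_empty(api_key: str, query: str) -> dict:
    """Run the query against the Shodan API (requires network access)."""
    import shodan  # assumption: installed via `pip install shodan`

    api = shodan.Shodan(api_key)
    return api.search(exclude_empty(query))
```

Calling search_non_empty("YOUR_API_KEY", "port:23") would send port:23 -hash:0 to the API; the matching banners should then be available under the "matches" key of the returned dictionary.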
Some banners are too short or only contain generic information. Check out the following HTTP banner as an example:
How would you search for more devices? A search query such as HTTP/1.1 200 would return nearly all the web servers in Shodan, since it would match every banner that partially contains those terms.
Instead, we can simply search using hash:-1212692304 to get more results. The hash value was obtained by looking at the full banner via the API. Finding more devices with the same banner would be nearly impossible without the hash filter.
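As a rough illustration of obtaining the hash from the API, the sketch below turns banners into hash queries. It assumes a host lookup result shaped like the one returned by the shodan Python library's host() method (a dictionary whose "data" list holds the banners); the function names hash_query and queries_for_host are hypothetical.

```python
def hash_query(banner: dict) -> str:
    """Build a search query matching banners with the same data hash."""
    return f"hash:{banner['hash']}"


def queries_for_host(host: dict) -> dict:
    """Map each port in a host lookup result to its hash query.

    Assumes `host` looks like the result of shodan.Shodan(key).host(ip),
    i.e. a dict with a "data" list of banner dicts.
    """
    return {
        banner["port"]: hash_query(banner)
        for banner in host.get("data", [])
        if "hash" in banner
    }


# Example with the simplified banner shown earlier in this article:
example = {"data": "SOMETHING", "hash": -82460412}
print(hash_query(example))
```

Negative hash values need no special handling: the minus sign is part of the number, as in hash:-1212692304, whereas a leading minus in front of the filter name (-hash:0) negates the filter.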
For web servers, the crawlers also calculate a numeric hash of the website. The HTML from the web server is stored in the http.html property, so it isn't covered by the top-level hash property (which is a hash of the data field only). In addition, the HTTP headers often contain date and time information, which means the top-level hash typically differs even between websites or devices of the same type. To find identical websites, use http.html_hash, which is available as a property, a search filter and a facet. It is a numeric hash of the HTML for the website (as stored in the http.html property).
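One way to use the facet is to ask for the most common http.html_hash values in a result set and then search for each one. The sketch below assumes the shodan Python library, whose count() method accepts a facets argument, and assumes facet results arrive as a list of entries with "value" and "count" keys; the helper names are hypothetical.

```python
def html_hash_queries(facet_entries: list) -> list:
    """Turn http.html_hash facet entries into (query, count) pairs."""
    return [
        (f"http.html_hash:{entry['value']}", entry["count"])
        for entry in facet_entries
    ]


def top_html_hashes(api_key: str, query: str) -> list:
    """Fetch the most common HTML hashes for a query (requires network access)."""
    import shodan  # assumption: installed via `pip install shodan`

    api = shodan.Shodan(api_key)
    result = api.count(query, facets=["http.html_hash"])
    # Assumed response shape: {"facets": {"http.html_hash": [{"value": ..., "count": ...}]}}
    return html_hash_queries(result["facets"]["http.html_hash"])
```

Each returned query can be pasted into the search box or passed back to the API to list every website serving that exact HTML.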