Make Search Engines Great Again!
Search engines are part of everyday life and help us quickly find the information we are looking for. But anyone who has used search engines for a long time knows their pitfalls. Companies have figured out how to optimize their websites and fight for the top spots in your search results. As a consequence, we end up with results that are not necessarily high quality and instead contain websites that have gamed the system to get on top.
In our field (information security) we use search engines on a daily basis to look up technical write-ups, documentation, vulnerability details, and more. Wouldn’t it be great if we could improve at least our tiny corner of the internet, so that we become more productive and surface high-quality information that is actually relevant?
With the recent release of Brave Search Goggles we have been given the opportunity to do just that. I highly recommend that you start off by reading Brave’s quickstart guide linked above and also dive into their whitepaper if you have a chance.
The description below is a TL;DR of what Brave Goggles does.
Goggles enable anyone, be it individuals or a community, to alter the ranking of Brave search by using a set of instructions (rules and filters). Anyone can create, apply, or extend a Goggle. Essentially Goggles act as a custom re-ranking on top of Brave’s search index.
So Goggles let the end user modify how the ranking works by applying rules server-side. More specifically, the rules are applied to Brave’s “expanded recall set”, which is described below.
The instructions defined in a Goggle are not applied to Brave Search’s entire index, but to what we call the “expanded recall set,” which in turn is a function of the query. The set of candidate URLs can be in the tens of thousands, which is often more than enough to observe a noticeable effect; however, there are no guarantees that all possible URLs are surfaced (in search terminology, we have no guarantees on recall).
In other words, Goggles do not apply to the whole Brave Search index, but to the expanded recall set, which is a function of the input query. So if the target pages aren’t in the recall set, or aren’t in the Brave Search index at all, they won’t be captured by the Goggle.
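To make the recall-set limitation concrete, here is a toy sketch of re-ranking over a candidate set. It is not Brave’s algorithm: the `rerank` function, the score values, and the multipliers are all assumptions made up for illustration. The point is only that rules can reorder or drop candidates, but can never add a URL that was not in the set to begin with.

```python
from urllib.parse import urlparse


def rerank(candidates, boost=frozenset(), downrank=frozenset(), discard=frozenset()):
    """Re-rank (url, score) pairs by host.

    Toy model only: the 2.0 and 0.5 multipliers are arbitrary, and the
    real ranking function used by Brave Search is not described here.
    """
    reranked = []
    for url, score in candidates:
        host = (urlparse(url).hostname or "").lower()
        if host in discard:
            continue  # dropped from the results entirely
        if host in boost:
            score *= 2.0
        elif host in downrank:
            score *= 0.5
        reranked.append((url, score))
    # Highest score first. Only URLs present in `candidates` can appear.
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    recall_set = [
        ("https://a.example/post", 1.0),
        ("https://b.example/post", 0.9),
        ("https://c.example/post", 0.8),
    ]
    for url, score in rerank(recall_set, boost={"c.example"}, discard={"a.example"}):
        print(url, score)
```

Boosting a host that never made it into `recall_set` has no effect in this model, which mirrors the "no guarantees on recall" caveat above.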
I’ve been waiting for the release of Goggles since the whitepaper was first published. Once it came out I jumped on the opportunity to solve this for my daily searches. As a result the Netsec Goggle was born. This was created using submissions sourced from the subreddit /r/netsec. I’ve been having a great experience using this Goggle ever since. The Goggle was somewhat reproducible, but it wasn’t very easy to update and modify the data and algorithm. What if someone wanted to reproduce what I did but with different subject matter?
We’re excited to release narwhalizer to the public, and not just for information security. This tool combines the work of Ethan Dalool, who created timesearch, and Jason Baumgartner, who created pushshift.io. Generating a custom Goggle from one or more subreddits should be easy for anyone with even a little Docker experience.
Subreddit moderation and a voting system allow you to use submissions from popular subreddits to curate your search engine experience. Maybe you want to create a Goggle that surfaces domains that are popular on both /r/programming and /r/unix. With narwhalizer you can easily do that. You can also tweak some options when creating your Goggle:
- Score threshold – Set the minimum score for a single submission to be included.
- Cut-off time – Only include submissions after a specific date.
- Frequency of domain – Only include domains that appear in multiple submissions.
- Top domains exclusion behavior – You can choose how the Alexa Top 1K domains are treated, with options such as exclude, include, discard, and downrank.
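The options above can be pictured with a short Python sketch. This is not narwhalizer’s actual code: the `build_rules` function, the shape of the submission dictionaries, the fixed strength of 2, and the exact meaning given to each top-domain behavior are assumptions for illustration. The emitted lines follow the `$boost`, `$downrank`, and `$discard` style of rule described in Brave’s Goggles documentation, so check them against the quickstart guide before relying on them.

```python
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_rules(submissions, min_score, cutoff, min_frequency,
                top_domains=frozenset(), top_behavior="exclude"):
    """Turn subreddit submissions into Goggle-style rule lines.

    submissions: dicts with 'url', 'score', 'created_utc' (hypothetical shape).
    cutoff: timezone-aware datetime; older submissions are ignored.
    top_behavior (assumed semantics): 'exclude' emits no rule for a top
    domain, 'include' treats it like any other domain, 'discard' and
    'downrank' emit the matching rule instead of a boost.
    """
    counts = Counter()
    for sub in submissions:
        if sub["score"] < min_score:               # score threshold
            continue
        if sub["created_utc"] < cutoff.timestamp():  # cut-off time
            continue
        domain = domain_of(sub["url"])
        if domain:
            counts[domain] += 1

    rules = []
    for domain, seen in sorted(counts.items()):
        if seen < min_frequency:                   # frequency of domain
            continue
        if domain in top_domains and top_behavior != "include":
            if top_behavior == "discard":
                rules.append(f"$discard,site={domain}")
            elif top_behavior == "downrank":
                rules.append(f"$downrank=2,site={domain}")
            continue                               # 'exclude': no rule at all
        rules.append(f"$boost=2,site={domain}")
    return rules


if __name__ == "__main__":
    now = datetime(2022, 7, 1, tzinfo=timezone.utc).timestamp()
    subs = [
        {"url": "https://www.example.org/a", "score": 50, "created_utc": now},
        {"url": "https://example.org/b", "score": 60, "created_utc": now},
    ]
    cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)
    print("\n".join(build_rules(subs, 10, cutoff, 2)))
```

Counting domains only after the score and date filters means a domain has to earn its place through several well-received, recent submissions, which is the kind of signal the voting system is meant to provide.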
We tried to make it easy enough for people to experiment and create their own Goggles. Give it a try and see if it improves your search engine experience. We know that it did for us.
Forces Unseen is a specialized cybersecurity consulting firm helping companies with application and infrastructure security.
Check out our other blog posts as well.