Websurfx - An open source alternative to Searx which aggregates results from other search engines (metasearch engine) without ads while keeping privacy and security in mind. (programming.dev)

submitted 2 years ago* (last edited 2 years ago) by neon_arch@programming.dev to c/rust@programming.dev


Introduction

Hello everybody. About 5 months ago I started building an alternative to the Searx metasearch engine called Websurfx, which brings many improvements and features that Searx lacks, such as speed, security, high levels of customization, and more. It still lacks many features, which will be added in future release cycles, but we now have everything stabilized and are nearing our first release, v1.0.0. So I would like to get some feedback on my project, because feedback is a really valuable part of this project.

In the next part I share the reason this project exists, what we have done so far, the goal of the project, and what we are planning to do in the future.

Why does it exist?

The primary purpose of the Websurfx project is to create a fast, secure, and privacy-focused metasearch engine. While there are numerous metasearch engines available, not all of them guarantee the security of their search engine, which is critical for maintaining privacy. Memory flaws, for example, can expose private or sensitive information, which is never a good thing. Websurfx is written in Rust, which ensures memory safety and removes such issues. There is also the added problem of spam, ads, and inorganic results, for which most engines still have no foolproof answer. Many metasearch engines also lack important features like advanced image search, which is required by many graphic designers, content providers, and others. Websurfx attempts to improve the user experience by providing these and other features, such as custom filtering and Micro-apps or Quick results (like a calculator, currency exchange, etc. in the search results).

Preview

Home Page

Search Page

404 Page

What Do We Provide Right Now?

Ad-Free Results.
12 colorschemes and a simple theme by default.
Ability to filter content using filter lists (coming soon).
Speed, Privacy, and Security.

In Future Releases

We are planning to move to the Leptos framework, which will help us provide more privacy through feature-based compilation that allows the user to choose between different privacy levels. It will look something like this:

Default: It will use WASM and JS with CSR and SSR.
Hardened: It will use SSR only, with some JS.
Hardened-with-no-scripts: It will use SSR only, with no JS at all.

Goals

Organic and Relevant Results
Ad-Free and Spam-Free Results
Advanced Image Search (providing searches based on color, size, etc.)
Dorking Support (in other words, advanced search query syntax such as using AND, NOT, and OR in search queries)
Privacy, Security, and Speed.
Support for low-memory devices (you will be able to host Websurfx on low-memory devices like phones, tablets, etc.).
Quick Results and Micro-Apps (providing quick apps like a calculator and currency exchange in the search results).
AI Integration for Answering Search Queries.
High Level of Customizability (providing more colorschemes and themes).

Benchmarks

Well, I will not compare my benchmark to other metasearch engines and Searx, but here is the benchmark for speed.

Number of workers/users: 16
Number of searches per worker/user: 1
Total time: 75.37s
Average time per search: 4.71s
Minimum time: 2.95s
Maximum time: 9.28s

Note: This benchmark was performed on a 1 Mbps internet connection.

Installation

To get started, clone the repository, edit the config file located in the websurfx directory, and install the Redis server by following the instructions located here. Then build the project and run the Redis server and the websurfx server using the following commands.

git clone https://github.com/neon-mmd/websurfx.git
cd websurfx
cargo build -r
redis-server --port 8082 &
./target/release/websurfx

Once you have started the server, open your preferred web browser and navigate to http://127.0.0.1:8080 to start using Websurfx.

Check out the docs for docker deployment and more installation instructions.

Call to Action: If you like the project then I would suggest leaving a star on the project as this helps us reach more people in the process.

"Show your love by starring the project"

Project Link:

https://github.com/neon-mmd/websurfx

top 32 comments


[-] asdfasdfasdf@lemmy.world 7 points 2 years ago

Based on the benchmarks, it looks like it's not running searches concurrently?

[-] neon_arch@programming.dev 6 points 2 years ago

Thanks for pointing this out. I just improved this by upgrading the algorithm to use tokio::spawn, so I think I will update these benchmarks soon.
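For readers following along: the numbers in the post fit sequential execution, since 16 searches at about 4.71 s each add up to roughly the 75.37 s total. Below is a minimal sketch of the difference between running searches one by one and running them concurrently. It uses std::thread::scope as a stand-in for tokio::spawn so that it runs without external crates, and `fake_search` with its 50 ms sleep is a hypothetical placeholder, not Websurfx code.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one upstream search; it only sleeps.
fn fake_search(id: u64) -> u64 {
    thread::sleep(Duration::from_millis(50));
    id
}

/// Runs the searches one after another: total time is roughly the sum.
fn run_sequential(n: u64) -> (Vec<u64>, Duration) {
    let start = Instant::now();
    let results: Vec<u64> = (0..n).map(fake_search).collect();
    (results, start.elapsed())
}

/// Runs every search on its own thread: total time is roughly the slowest one.
fn run_concurrent(n: u64) -> (Vec<u64>, Duration) {
    let start = Instant::now();
    let results: Vec<u64> = thread::scope(|s| {
        // Spawn all searches first, then join them in order.
        let handles: Vec<_> = (0..n)
            .map(|id| s.spawn(move || fake_search(id)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    (results, start.elapsed())
}

fn main() {
    let (_, seq) = run_sequential(16);
    let (_, conc) = run_concurrent(16);
    println!("sequential: {seq:?}, concurrent: {conc:?}");
}
```

With real network requests the gain depends on the slowest upstream engine and on the connection, so the benchmark would need to be rerun to see the actual effect.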

[-] asdfasdfasdf@lemmy.world 1 points 2 years ago

Nice! Happy to help.

[-] neon_arch@programming.dev 2 points 2 years ago

Thanks, I am very grateful for that :).

[-] smollittlefrog@lemdro.id 3 points 2 years ago

Hey, which engines are intended to be supported in the future?

Under src/engines I could find files for duckduckgo and searx. Are both already fully supported? Do you intend to support Google and Yandex in the future?

[-] neon_arch@programming.dev 4 points 2 years ago

Hello :).

The searx and duckduckgo engines are fully supported right now, and we are looking forward to supporting more engines as well. It's just that we need some help with the process because, you know, there are too many engines to work on :).

[-] orizuru@lemmy.sdf.org 3 points 2 years ago

Interesting, I'll be keeping an eye on this. Thanks for sharing!

I'm currently self hosting SearXNG. The must-have features for me are the custom filters and the actively maintained docker image. Will definitely give it a go if they get implemented.

[-] neon_arch@programming.dev 2 points 2 years ago

Thanks for taking a look at my project :).

The custom filter is about to be added soon; the PR for it is just waiting to be merged. Once that is merged, we will have the custom filter feature available. As for the Docker image, it is already available; I would suggest taking a look at this section of the docs:

https://github.com/neon-mmd/websurfx/blob/rolling/docs/installation.md#docker-deployment

Here we cover how to get our project deployed via Docker.

[-] orizuru@lemmy.sdf.org 1 points 2 years ago

Ah cool, thanks!

Will definitely try it now. It's good to have options (Searx just recently became unmaintained).

Are there any plans to have an official docker hub image? I'm asking because my workflow involves keeping the containers up to date with watchtower.

[-] neon_arch@programming.dev 1 points 2 years ago

Sorry for the delay in the reply.

Ok, thanks for suggesting this. I have not thought about this area in particular, but I would be really interested in having the Docker image uploaded to Docker Hub. The only issue is that the app requires the config file, blocklist, and allowlist to be included within the Docker image. So the question is: if a prebuilt image is provided, is it possible to edit them within the Docker container? If so, then it is ok. Otherwise it would still be good, but it would limit the usage to users who are satisfied with the default config, while others would still need to build the image manually, which is not very great.

Also, as a side comment in case you have missed it, some updates on the project:

We have just recently got the custom filter lists feature merged. If you wish to take a look at this PR, here.
Also, recently there has been ongoing work on getting new themes added; an active discussion is going on about that topic and some theme proposals have been made. Here is a quick preview of one of the themes and what it might look like:

Home Page

Search Page

[-] orizuru@lemmy.sdf.org 1 points 2 years ago* (last edited 2 years ago)

Sorry for the delay in the reply.

No need to apologize! Thank you for working on this. :)

The only issue is that the app requires the config file, blocklist, and allowlist to be included within the Docker image. So the question is: if a prebuilt image is provided, is it possible to edit them within the Docker container? If so, then it is ok. Otherwise it would still be good, but it would limit the usage to users who are satisfied with the default config, while others would still need to build the image manually, which is not very great.

I'm not familiar with the websurfx codebase, but I don't see why it wouldn't work.

I'm currently self-hosting SearXNG on a VPS, but I started by having it just locally. The important bit of that blog post is this:

docker run -d --rm \
              -p 8080:8080 \
              -v "${HOME}/searxng:/etc/searxng" \
              -e "BASE_URL=http://localhost:8080/" \
              searxng/searxng

I use the -v flag to mount a directory in my home to the config directory inside the docker container. SearXNG then writes the default config files there, and I can just edit them normally in ~/searxng/.

By using a mounted volume like this, the configs are persistent, so I can restart the docker container without losing them.

[-] neon_arch@programming.dev 2 points 2 years ago* (last edited 2 years ago)

Ahh, I see. Why didn't I remember before that I can do something like this? Thanks for the help :). Actually, the thing is I am not very good at Docker, and I am in the process of finding someone who can work on this area, for example on reducing build times, caching, etc. One of the things we want to improve right now is the build time: I am using a layered caching approach, but it still takes about 800 seconds, which is not very great. So if you are interested, I would suggest making a PR at our repository. We would be glad to have you as one of the project contributors, and maybe in the future as a maintainer too. Currently, the Dockerfile looks like this:

FROM rust:latest AS chef
# We only pay the installation cost once,
# it will be cached from the second build onwards
RUN cargo install cargo-chef

WORKDIR /app

FROM chef AS planner
COPY . . 
RUN cargo chef prepare --recipe-path recipe.json

FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
# Build dependencies - this is the caching Docker layer!
RUN cargo chef cook --release --recipe-path recipe.json

# Build application
COPY . .
RUN cargo install --path .

# We do not need the Rust toolchain to run the binary!
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/public/ /opt/websurfx/public/
# -- 1
COPY --from=builder /app/websurfx/config.lua /etc/xdg/websurfx/config.lua
# -- 2
COPY --from=builder /app/websurfx/allowlist.txt /etc/xdg/websurfx/allowlist.txt
# -- 3
COPY --from=builder /app/websurfx/blocklist.txt /etc/xdg/websurfx/blocklist.txt
COPY --from=builder /usr/local/cargo/bin/* /usr/local/bin/
CMD ["websurfx"]

Note: The lines marked 1, 2, and 3 in the Dockerfile are for the user-editable files, i.e. the config file and the custom filter lists.

[-] orizuru@lemmy.sdf.org 2 points 2 years ago

You're welcome!

I can have a look in my free time for fun. Will let you know if I manage to do it. 😅

[-] neon_arch@programming.dev 2 points 2 years ago

Ok no problem :). If you need any help regarding anything, just DM us/me here or at our Discord server. We would be glad to help :).

[-] Eikichi@lemmy.ml 3 points 2 years ago

Ty! Once I'm on my PC, I will definitely give it a try.

[-] neon_arch@programming.dev 2 points 2 years ago

Thanks for trying out our project :). If you need any help, please feel free to open an issue at our project.

https://github.com/neon-mmd/websurfx/issues

[-] caralice@mas.to 2 points 2 years ago

@neon_arch is searx not open source??

[-] lynx@sh.itjust.works 5 points 2 years ago

It is https://github.com/searx/searx but you should use searxng instead: https://github.com/searxng/searxng

[-] neon_arch@programming.dev 2 points 2 years ago* (last edited 2 years ago)

Yes, it is, but I just wanted to emphasize that my project is also open source because if I don't add this then it can raise some doubts whether it is open source or not. So to make it clear, I added it.

[-] lynx@sh.itjust.works 2 points 2 years ago

How do you rank the results from different search engines?

[-] neon_arch@programming.dev 2 points 2 years ago

Hello again :)

Sorry for the delayed reply.

Right now, we do not have ranking in place, but we are planning to have it soon. Our goal is to make it as organic as possible, so you don't get unrelated results when you query something through our engine.

What the project does is take the user query, along with various search parameters if necessary, and pass it to the upstream search engines. It gets each engine's results with a GET request to that engine. Once all the results are gathered, we bring them into a common form so we can aggregate them, and then remove duplicate results from the aggregated results. If the same result is returned by two engines, we put both engines' names against that search result. That's all that is going on, in simple terms 🙂. If you have more questions, feel free to open an issue at our project; I would be glad to answer.
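The aggregation and de-duplication step described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual code: the `RawResult` and `AggregatedResult` types, the `aggregate` function, and the choice of the URL as the de-duplication key are all hypothetical.

```rust
use std::collections::HashMap;

/// One search result as returned by a single upstream engine (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct RawResult {
    url: String,
    title: String,
    engine: String,
}

/// A merged result that remembers every engine that returned it (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct AggregatedResult {
    url: String,
    title: String,
    engines: Vec<String>,
}

/// Merges results from all engines, keyed by URL, keeping first-seen order.
fn aggregate(raw: Vec<RawResult>) -> Vec<AggregatedResult> {
    // Maps a URL to its position in `merged`.
    let mut index: HashMap<String, usize> = HashMap::new();
    let mut merged: Vec<AggregatedResult> = Vec::new();

    for r in raw {
        let existing = index.get(&r.url).copied();
        match existing {
            // Duplicate URL: only record the additional engine name.
            Some(i) => {
                if !merged[i].engines.contains(&r.engine) {
                    merged[i].engines.push(r.engine);
                }
            }
            // First time this URL is seen: keep the result.
            None => {
                index.insert(r.url.clone(), merged.len());
                merged.push(AggregatedResult {
                    url: r.url,
                    title: r.title,
                    engines: vec![r.engine],
                });
            }
        }
    }
    merged
}

fn main() {
    let raw = vec![
        RawResult { url: "https://a.example".into(), title: "A".into(), engine: "duckduckgo".into() },
        RawResult { url: "https://a.example".into(), title: "A".into(), engine: "searx".into() },
    ];
    for result in aggregate(raw) {
        println!("{} ({}) from {:?}", result.title, result.url, result.engines);
    }
}
```

Keying on the exact URL string is the simplest choice; a real implementation would probably normalize URLs first (trailing slashes, tracking parameters) so that near-identical links also merge.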

[-] INeedMana@piefed.zip 2 points 2 months ago

If you don't mind, I have a few noob questions:

1. From a very high POV, how does it work? I write a query and it passes it to a bunch of engines and then does some internal ranking of results?
2. Can I configure engine weights?
3. Is some metadata stored (for example which engine response I chose) or is it rather stateless?
4. Technically speaking it would still be my IP querying those external engines, right? So any kind of "bubbling" they do based on IP is still in effect?
5. Is the general idea that we would have a few shared instances of those, or rather that everyone should be hosting an instance for themselves?
6. Do you foresee that the external engines might block Websurfx once they discover their ads don't land?

[-] neon_arch@programming.dev 2 points 2 months ago* (last edited 2 months ago)

Yes, sure, no problem, I will answer all your questions one by one:

1. Yes, that's how it works from a high POV, but in a privacy-oriented way similar to SearXNG (so only your IP address is shared with the upstream engine, but you can hide that using a VPN or the proxy feature in Websurfx). Additionally, we rerank the search results after fetching them from the upstream engines to make them more organic (though we admit it is not the best yet, and contributions would help us a lot in that area for sure. 😅)
2. No, not currently, but you can open a feature request issue about it at our project here.
3. Yes, it is stored as a config and as a cookie, and you can also export the cookie from the UI and then import it again (this can be useful if you change browsers or clear cookies for some reason).
4. Yes, I explained this and how to get around it in the first point. 😅
5. I think it is better to self-host than to depend on an instance, because you still have many privacy issues with VPSs (it is still someone else's computer). But we do provide privacy-enhancing features, like encrypting cached results, which can help prevent the VPS from spying on users' search results in the cache.
6. Not really. As far as I know, they don't block on the basis of their ads not being displayed, but there is still a chance that you could get flagged by some engines, because they want users to use their own search engines so they can sell the user as their product. SearXNG has the same problem, so there is nothing we can do about it, but we can improve the bot evasion system so that fewer engines do it, and we would welcome contributions in that area. 😅

Also, if you have more questions about anything, feel free to ask. We would be very glad to answer them 😊, and we appreciate them 👍 too, as they also help clarify other people's doubts.

[-] INeedMana@piefed.zip 2 points 2 months ago

Thank you :)

5. What I meant by that question was rather "what's the design vision?"
Is the intended usage more like everyone should have their own, or rather those should be shared with more people (to lump up different topics originating from the same machine)

[-] neon_arch@programming.dev 2 points 2 months ago* (last edited 2 months ago)

Sorry for the delay in the response.

Thank you :)

You're welcome 💐

What I meant by that question was rather “what’s the design vision?” Is the intended usage more like everyone should have their own, or rather those should be shared with more people (to lump up different topics originating from the same machine)

The design vision for the Websurfx engine is to protect the user's privacy and to be a really secure and fast search engine, while also being really customizable, safe (you choose the level of protection you want for safe search, even using filter lists to filter out specific websites from the search results), highly themeable, and providing organic and ad-free search results, so that the user gets a search engine which is entirely their own.

As for whether you should self-host it or use an instance, that depends on your threat model. Many people may prefer self-hosting because they do not want their data served from a computer that is not their own.
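To make the filter-list idea mentioned above a bit more concrete, here is a minimal sketch of filtering result URLs against a blocklist and an allowlist. It rests on assumptions: patterns are matched as plain case-insensitive substrings, an allowlist match overrides a blocklist match, and the names `is_allowed` and `filter_results` are hypothetical. The actual list format and matching rules in Websurfx may differ.

```rust
/// Returns true when `url` should be kept in the results.
/// Assumption: an allowlist match overrides a blocklist match.
fn is_allowed(url: &str, blocklist: &[&str], allowlist: &[&str]) -> bool {
    let url = url.to_lowercase();
    // Explicitly allowed entries are always kept.
    if allowlist.iter().any(|p| url.contains(p.to_lowercase().as_str())) {
        return true;
    }
    // Otherwise keep the URL only if no blocklist pattern matches.
    !blocklist.iter().any(|p| url.contains(p.to_lowercase().as_str()))
}

/// Keeps only the URLs that pass the filter lists, preserving their order.
fn filter_results<'a>(urls: &[&'a str], blocklist: &[&str], allowlist: &[&str]) -> Vec<&'a str> {
    urls.iter()
        .copied()
        .filter(|u| is_allowed(u, blocklist, allowlist))
        .collect()
}

fn main() {
    let urls = [
        "https://spam.example/page",
        "https://docs.example/rust",
        "https://spam.example/good",
    ];
    let kept = filter_results(&urls, &["spam.example"], &["spam.example/good"]);
    println!("{kept:?}");
}
```

Substring matching keeps the sketch dependency-free; pattern-based lists (for example regular expressions) would give finer control at the cost of an extra crate.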

[-] hascat@programming.dev 1 points 2 years ago* (last edited 2 years ago)

Ad-Free Results

How are you compensating the search engines you query?

[-] neon_arch@programming.dev 1 points 2 years ago

Hello again :)

Sorry for the delayed reply.

Essentially, we achieve the ad-free results when we fetch the results from the upstream search engines: we take only the non-ad results from all of them, bring them into a form where they can be aggregated, and then aggregate them. That's how we achieve it.

[-] oyzmo@lemmy.world 1 points 2 months ago

Looks great, I will try this 🤩

[-] neon_arch@programming.dev 1 points 2 months ago

Thanks ❤ for taking a look at our project. If you need any help, ask here or DM me. I would be very glad to help. 😊

[-] oyzmo@lemmy.world 0 points 2 months ago

Hmmm, got it up and running but it times out on search. Logs show: Error(Io(Custom { kind: TimedOut, error: "timed out" }), "http://www.useragentstring.com/pages/useragentstring.php?name=Firefox") Error(Io(Custom { kind: TimedOut, error: "timed out" }), "http://www.useragentstring.com/pages/useragentstring.php?name=Firefox")

probably some settings I have gotten wrong 😅 why does it need to contact http://www.useragentstring.com/ ?

[-] neon_arch@programming.dev 1 points 2 months ago

Sorry for the delay in the response.

Do you have the useragentstring.com website blocked by anything? Like using nextdns or filter lists of any sort. 😅

It needs to contact this website to get a random user agent string, which it uses to spoof the user agent of the user's search query and protect the user's privacy (that's why it is important).
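As an illustration of the idea only (not how Websurfx implements it), here is a minimal sketch of picking a user agent at random, with a local fallback list for the case where the remote list cannot be fetched, such as the timeout in the logs above. Everything here is hypothetical: the `FALLBACK_AGENTS` entries, the function names, and the time-based seed, which is a crude substitute for a proper random number generator so that the sketch needs no external crates.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical local fallback list, used when the remote list is unavailable.
const FALLBACK_AGENTS: [&str; 3] = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:120.0) Gecko/20100101 Firefox/120.0",
];

/// Picks one entry from `pool` using `seed`; returns None for an empty pool.
fn pick_user_agent<'a>(seed: u64, pool: &[&'a str]) -> Option<&'a str> {
    if pool.is_empty() {
        return None;
    }
    let index = (seed % pool.len() as u64) as usize;
    Some(pool[index])
}

/// Crude seed from the clock's sub-second nanoseconds (not cryptographically random).
fn time_seed() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| u64::from(d.subsec_nanos()))
        .unwrap_or(0)
}

/// Uses the fetched list when the fetch succeeded, otherwise the local fallback.
fn choose_agent(fetched: Result<Vec<&'static str>, String>, seed: u64) -> &'static str {
    let pool = match fetched {
        Ok(list) if !list.is_empty() => list,
        _ => FALLBACK_AGENTS.to_vec(),
    };
    pick_user_agent(seed, &pool).unwrap_or(FALLBACK_AGENTS[0])
}

fn main() {
    // Simulate the failed fetch and fall back to the local list.
    let agent = choose_agent(Err("timed out".to_string()), time_seed());
    println!("User-Agent: {agent}");
}
```

A fallback like this would let searches continue when the lookup site is blocked or slow, at the cost of a smaller and possibly outdated pool of user agents.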

[-] oyzmo@lemmy.world 0 points 2 months ago

thanks. ah, probably my pihole then. weird though, I was able to access the webpage from my browser. any ideas? I just moved from my old qnap to a terramaster nas this weekend. I'll give it another shot in a week or two :)

this post was submitted on 07 Sep 2023

38 points (93.2% liked)
