SybilRank for Identifying Sybils in Farcaster (Case Study)

This case study explores the development and implementation of a Sybil detection algorithm for Farcaster, using samples from OP Airdrops and Citizen House governance. We analyzed various data sources, including on-chain activity (POAPs, attestations, and multisig membership) as well as Farcaster data. Analysis of the Airdrop 5 sample revealed that 76% of addresses showed some on-chain activity, with OpenRank being, in our view, the best predictor of human activity. Still, the behaviours identified may contribute only marginally to predicting Sybils, and accuracy needs vary significantly between use cases, so research teams will need to refine the assumptions and questions before taking further steps.

Need

Since the end of last year, our team has been working on an OP grant to develop a Sybil detection algorithm for Farcaster, aiming to enhance decentralisation by improving the accuracy of identifying Sybil accounts. The project involves defining requirements, designing and implementing the algorithm, integrating it with existing infrastructure, and evaluating its effectiveness. Success was to be measured by the algorithm's accuracy, adoption rate, and impact on governance security.

We’ve spoken with a few potential users of the algorithm and identified two key usage scenarios:

  1. OP Airdrops (main scenario). There have been 5 airdrops to date, each reaching ~50k addresses. While some addresses looked suspicious and there were past efforts to restrict them, false-positive rankings led to significant dissatisfaction in the community. The rewarding mechanics also change from drop to drop, and some rewards (e.g. for gas spending) offer no incentive for Sybil attacks. The team could use the algorithm to monitor the current level of Sybils in the group and make better decisions in the future.
  2. Citizen House (secondary scenario). Recently, Citizen House started inviting guest voters to RetroPGF in small batches handpicked via GitHub or Farcaster. There is high confidence that the current samples contain no Sybils, but scaling from roughly two hundred voters to thousands requires a way to filter out Sybils and avoid governance capture. In this case, false positives are far preferable to letting some Sybils in.

Solution

Our initial capabilities included:

  • Tracking on-chain data such as POAPs, Attestations (e.g. citizens), and delegates on the OP chain
  • Membership in SAFE multisigs (unfortunately their API recently went down under attack, so we couldn’t collect some of this data)
  • Delegates data (from Agora)
  • Farcaster data (following graphs and OpenRank)
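Combining these sources per address could look like the following sketch. The field and class names here are illustrative assumptions, not our actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AddressFeatures:
    """Illustrative per-address record combining the data sources above."""
    address: str
    poap_count: int = 0
    attestation_count: int = 0
    is_safe_signer: bool = False            # SAFE multisig membership
    is_delegate: bool = False               # delegate status (from Agora)
    openrank_score: Optional[float] = None  # None = no Farcaster presence found

    def has_any_activity(self) -> bool:
        # An address counts as "active" if any tracked signal is present
        return (self.poap_count > 0
                or self.attestation_count > 0
                or self.openrank_score is not None)
```

Keeping the raw subscores in one record (rather than a single blended score) also accommodates the request above to adapt the formula to different situations.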

Researching the use cases already showed us:

  • Different accuracy needs between cases, including a request for raw subscore data so the formula could be adapted to different situations
  • Both cases currently had mostly analytical needs, rather than a specific process-integration request (the latter being nice to have rather than a must-have)
  • To research the model’s usefulness for predicting Sybils, we decided to use the Airdrop 5 sample, since it has far more on-chain data traces than guest voters and other airdrops; this also prioritizes the Airdrop use case for now

Results

While not a sole predictor of Sybils, OpenRank could be the most useful signal for identifying real humans

  • 76% of addresses showed at least some on-chain activity (POAPs, attestations, or OpenRank), but that doesn’t mean the rest (or those with low activity) are clearly Sybils
  • Among the identified parameters, OpenRank seemed the hardest to fake, while multisig membership, POAPs, and attestations can provide meaningful data but are more prone to hijacking
  • OpenRank also has the best coverage among the parameters (57% of addresses have some score, compared with ~43% who have POAPs or attestations)
  • While some share of Sybils is expected in the sample, it is likely lower than the 24% that showed none of the activity we tracked (not to mention those with minimal activity)
  • So the identified parameters on their own might not be accurate predictors of Sybils
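A minimal sketch of how coverage figures like those above could be computed over per-address records. The sample data and field names here are toy assumptions, not real airdrop data:

```python
def coverage(addresses, predicate):
    """Share of addresses for which `predicate` holds, as a fraction."""
    if not addresses:
        return 0.0
    return sum(1 for a in addresses if predicate(a)) / len(addresses)

# Toy sample: one dict per address with its tracked signals
sample = [
    {"poaps": 3, "attestations": 0, "openrank": 0.8},
    {"poaps": 0, "attestations": 2, "openrank": None},
    {"poaps": 0, "attestations": 0, "openrank": None},
    {"poaps": 1, "attestations": 1, "openrank": 0.2},
]

active = coverage(sample, lambda a: a["poaps"] > 0
                  or a["attestations"] > 0
                  or a["openrank"] is not None)
with_openrank = coverage(sample, lambda a: a["openrank"] is not None)
print(active, with_openrank)  # 0.75 0.5 on this toy sample
```

The same helper works for any subscore, which makes it easy to compare coverage across parameters as in the bullets above.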

Still, the parameters correlate with the rewards received and can advance understanding of the sample’s health

  • Attestations have the strongest correlation with reward amounts (0.21), followed by OpenRank (0.19), while POAPs show the weakest correlation (0.1)

Figure: Average reward per attestation count

  • The average reward for addresses with an OpenRank score was 90% higher than for those without one
  • These additional parameters could thus be used as a health check for further airdrops
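The correlations above are plain Pearson coefficients between a parameter and the reward amount. A self-contained sketch (the data below is illustrative, not from the actual sample):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data: attestation counts vs. reward amounts (illustrative only)
attestations = [0, 1, 2, 0, 5, 3]
rewards = [100, 150, 300, 120, 400, 250]
print(round(pearson(attestations, rewards), 2))  # → 0.95
```

In practice a library routine such as `scipy.stats.pearsonr` would also report a p-value, which matters when, as here, the coefficients themselves are small.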

Interesting Facts

  • We found that the airdrop samples had a good amount of “health” for Sybil-prediction purposes, since they didn’t encourage such behaviour and offer vast data on addresses. So airdrop data could be used for Sybil identification in Farcaster where needed
  • Some record breakers: Wuestenigel has 3,059 POAPs and mrbreadsmith.eth has 835 attestations
  • There are only 11 citizens and 10 delegates in the sample

Next steps

  • Discuss the findings with the community
  • Analyse limitations and improvements
    • using more proof-of-humanity protocols (e.g. Holonym)
    • picking up address activity on other chains as well
    • interaction with smart contracts (Gitcoin etc.)
  • Identify new questions / use cases
  • Decide if the support/further development is needed