This case study covers the development and implementation of a Sybil detection algorithm for Farcaster, using samples from OP Airdrops and Citizen House governance. We analysed several data sources, including on-chain activity (POAPs, attestations, multisig membership) and Farcaster data. Analysis of the Airdrop 5 sample showed that 76% of addresses had some on-chain activity, with OpenRank being, in our view, the best predictor of human activity. Still, the behaviours we identified may contribute only modestly to predicting Sybils, and accuracy requirements vary significantly between use cases, so the research team needs to revisit its assumptions and questions before taking further steps.
Need
Since the end of last year, our team has been working on an OP grant to develop a Sybil detection algorithm for Farcaster, enhancing decentralisation by improving the accuracy of identifying Sybil accounts. The project involves defining requirements, designing and implementing the algorithm, integrating it with existing infrastructure, and evaluating its effectiveness. Success was to be measured by the algorithm’s accuracy, adoption rate, and impact on governance security.
We spoke with a few potential users of the algorithm and identified two key usage scenarios:
- OP Airdrops (main scenario). There have been five airdrops to date, each reaching ~50k addresses. While some addresses looked suspicious and there were past efforts to restrict them, false positives led to significant dissatisfaction in the community. Reward mechanics also change from drop to drop, and some rewards (e.g. for gas spending) create no incentive for Sybil attacks. The team could use the algorithm to monitor the current level of Sybils among recipients and make better decisions for future drops.
- Citizen House (secondary scenario). Recently, Citizen House started inviting guest voters to RetroPGF in small batches, handpicked via GitHub or Farcaster. There is high confidence that the current cohorts contain no Sybils, but scaling from a couple of hundred voters to thousands requires a way to screen out Sybils and avoid governance capture. In this case, false positives are much more acceptable than letting Sybils in.
Solution
Our initial capabilities included:
- Tracking on-chain data such as POAPs, attestations (e.g. citizen attestations) and delegate status on the OP chain
- Membership in SAFE multisigs (unfortunately, their API recently went down under attack, so we could not collect some of this data)
- Delegate data (from Agora)
- Farcaster data (follows and OpenRank scores)
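To make the data model concrete, below is a minimal sketch of what a per-address record combining these signals could look like. The field names and types are illustrative assumptions, not the actual schema used in the project.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AddressSignals:
    """Hypothetical per-address record combining the tracked signals.

    Field names are illustrative assumptions, not the project's actual schema.
    """
    address: str
    poap_count: int = 0                     # POAPs held by the address
    attestation_count: int = 0              # attestations (e.g. citizen attestation)
    is_delegate: bool = False               # appears in the Agora delegate data
    safe_multisig_memberships: int = 0      # SAFE multisigs the address belongs to
    farcaster_fid: Optional[int] = None     # linked Farcaster account, if any
    openrank_score: Optional[float] = None  # OpenRank score from the Farcaster graph

    def has_onchain_activity(self) -> bool:
        # An address counts as "active" if any tracked signal is present
        return (
            self.poap_count > 0
            or self.attestation_count > 0
            or self.openrank_score is not None
        )
```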
Already from researching the use cases, we saw:
- Different accuracy needs between the cases, including a request to expose the raw subscore data so the formula can be adapted to different situations (a sketch of such a weighted formula follows this list)
- Both cases currently have primarily analytical needs rather than a specific process-integration request (the latter being nice to have rather than must have)
- To research how useful the model is for predicting Sybils, we decided to use the Airdrop 5 sample, as it has far more on-chain data traces than the guest voter samples and other airdrops; this also prioritises the airdrop use case for now
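As a rough illustration of the "raw subscores plus custom formula" idea above, here is a minimal sketch of how normalised subscores could be combined with use-case-specific weights. The subscore names, weights and normalisation are assumptions for illustration, not the algorithm's actual formula.

```python
# Minimal sketch: combine raw subscores (assumed normalised to [0, 1]) with
# weights tuned per use case. All names and numbers below are illustrative.

def combined_score(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of the subscores that are present."""
    total_weight = sum(weights.get(k, 0.0) for k in subscores)
    if total_weight == 0:
        return 0.0
    return sum(subscores[k] * weights.get(k, 0.0) for k in subscores) / total_weight

# Airdrop monitoring wants to avoid false positives; Citizen House prefers
# false positives over letting Sybils in, so it might weight signals differently.
airdrop_weights = {"openrank": 0.5, "attestations": 0.3, "poaps": 0.2}
citizen_house_weights = {"openrank": 0.7, "attestations": 0.3, "poaps": 0.0}

example = {"openrank": 0.8, "attestations": 0.4, "poaps": 0.1}
print(combined_score(example, airdrop_weights))        # 0.54
print(combined_score(example, citizen_house_weights))  # 0.68
```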
So we:
- Collected data on various samples (airdrops, RetroPGF voting)
- Compared the samples and identified a number of insights (specifically on the Airdrop 5 sample)
- Created an interface for individual testing and an API for developers (an example call is sketched after this list):
- Test interface: https://farcaster-sybil-rank-algo.vercel.app/
- API: https://farcaster-sybil-rank-algo.vercel.app/api-docs
- GitHub:
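For developers who want to try the API programmatically, a hypothetical call could look like the sketch below. The actual routes, parameters and response shape are documented at https://farcaster-sybil-rank-algo.vercel.app/api-docs; the path and field names used here are placeholders, not the real schema.

```python
# Hypothetical example of querying the API for a single address.
# Check the api-docs linked above for the real routes and response fields.
import requests

BASE_URL = "https://farcaster-sybil-rank-algo.vercel.app"

def fetch_score(address: str) -> dict:
    # Placeholder route and query parameter; adapt to the documented API
    resp = requests.get(f"{BASE_URL}/api/score", params={"address": address}, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    print(fetch_score("0x0000000000000000000000000000000000000000"))
```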
Results
While not a sole predictor of Sybils, OpenRank may be the most useful signal for identifying real humans
- 76% of addresses showed at least some on-chain activity (POAPs, attestations or OpenRank), but this does not mean the rest (or those with low activity) are clearly Sybils
- Among the identified parameters, OpenRank seemed to be the hardest to fake, while multisig membership, POAPs and attestations provide some meaningful data but are more prone to manipulation
- OpenRank also has the best coverage of the parameters (57% of addresses have a score, compared to ~43% that have POAPs or attestations)
- While some percentage of Sybils is expected in the sample, it is potentially lower than the 24% of addresses that have no tracked activity (not to mention those with minimal activity)
- So the identified parameters on their own might not be an accurate predictor of Sybils (a sketch of the coverage computation behind these figures follows this list)
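For reference, here is a sketch of the coverage computation behind the percentages above, assuming a table with one row per airdrop address. The column names and toy data are illustrative, not the actual dataset.

```python
# Coverage/health-check sketch over a toy sample; the commented percentages
# are the figures reported for the real Airdrop 5 sample, not this toy data.
import pandas as pd

df = pd.DataFrame({
    "address":      ["0xaaa", "0xbbb", "0xccc", "0xddd"],
    "poap_count":   [3, 0, 0, 0],
    "attestations": [0, 1, 0, 0],
    "openrank":     [0.42, None, 0.10, None],  # None = no OpenRank score
})

has_openrank = df["openrank"].notna()
has_poap_or_attestation = (df["poap_count"] > 0) | (df["attestations"] > 0)
has_any_activity = has_openrank | has_poap_or_attestation

print("OpenRank coverage:        ", has_openrank.mean())            # reported ~57%
print("POAP/attestation coverage:", has_poap_or_attestation.mean()) # reported ~43%
print("Any tracked activity:     ", has_any_activity.mean())        # reported ~76%
print("No tracked activity:      ", (~has_any_activity).mean())     # reported ~24%
```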
Still, the parameters correlate with the rewards received and can advance our understanding of sample health
- Attestations have the strongest correlation with reward amounts (0.21), followed by OpenRank (0.19), and POAPs show the weakest correlation (0.1)
- The average reward for addresses with an OpenRank score was 90% higher than for those without one
- These additional parameters could thus be used as a health check for future airdrops (a sketch of such a check follows below)
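A minimal sketch of how such a health check could be computed is shown below, assuming a dataframe with a reward amount and signal counts per address. The column names and the use of Pearson correlation are assumptions; the reported figures (0.21, 0.19, 0.1 and the 90% uplift) come from the Airdrop 5 sample, not from this code.

```python
import pandas as pd

def sample_health_report(df: pd.DataFrame) -> None:
    """Print reward correlations and the OpenRank reward uplift for a sample.

    Expects columns 'reward', 'attestations', 'openrank', 'poap_count'
    (hypothetical names; adapt to the real dataset).
    """
    for col in ["attestations", "openrank", "poap_count"]:
        corr = df["reward"].corr(df[col].fillna(0))  # Pearson correlation by default
        print(f"corr(reward, {col}) = {corr:.2f}")

    with_rank = df.loc[df["openrank"].notna(), "reward"].mean()
    without_rank = df.loc[df["openrank"].isna(), "reward"].mean()
    uplift = with_rank / without_rank - 1
    print(f"Reward uplift for addresses with an OpenRank score: {uplift:.0%}")
```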
Interesting Facts
- We found that the airdrop samples were reasonably “healthy” in terms of Sybil prediction, as they did not encourage such behaviour and provide vast data on addresses. The airdrop data could therefore be used for Sybil identification in Farcaster where needed
- Some record breakers: Wuestenigel has 3059 POAPs and mrbreadsmith.eth has 835 attestations
- There are only 11 citizens and 10 delegates in the sample
Next steps
- Discuss the findings with the community
- Analyse limitations and improvements
    - Using more proof-of-humanity protocols (e.g. Holonym)
    - Picking up address activity on other chains as well
    - Interactions with smart contracts (Gitcoin etc.)
- Identify new questions / use cases
- Decide whether further support/development is needed