Should machine-generated proposals be accepted?
I think we should figure out how to deal with them, because last week I processed 8 new proposals, and some of them had signs of being machine-generated.
Good topic. AI is having a major impact on FOSS project development at the moment. Just as an FYI for now, I’ll drop in the AI Policy that the Fedify project uses: https://github.com/fedify-dev/fedify/blob/6c1f6e6f6410d62cada116fce21b47adb6defe30/AI_POLICY.md
I think it would be fair to ask whether a proposal showing signs of AI/ML generation should demonstrate that an implementation has signed on.
We don't have this requirement for human-authored FEPs. Perhaps it's a good idea there too.
I’ve seen multiple appeals by developers that FEPs must be related to an implementation. I understand the reasons, and they are not invalid either. Yet I don’t think it is a good idea, for two reasons:
It exudes an “only if you code do you count” tech-bro vibe that is antithetical to what the fediverse stands for. Many people already complain about the tech-heavy culture and the gap that exists between them and the developers, which makes it very hard for their voices to be heard.
It serves to keep the fediverse firmly in app-centric territory, in a way that increases the risk that the fediverse derails over time. The app gets the highest priority, and “users” (i.e. people) can appeal for features to the individual devs or teams who own said apps, who will then, as a secondary priority, figure out how to pragmatically hammer the feature onto the fedi wire. Interoperability is of tertiary concern. While this may lead to great apps, it not only risks recentralization around app platforms, but also does not account for the healthy direction and good social experience of the social network as a whole. The ecosystem languishes while apps thrive.
There’s a big difference between machine-generated and machine-assisted.
For example, I use LLMs to sanity-check ideas, find gaps, and polish wording, but the actual thinking and context are mine. In that case, calling it “machine-generated” feels wrong.
To me, the line is authorship: if the person understands, owns, and can defend the proposal, it should be fine. If it’s just AI output with little real understanding behind it, that’s where it becomes a problem.
How would we go about challenging those proposals?
Arnold: there are certainly many with ideas and the will to produce FEPs. I don't wish to disincentivize or restrict that work, but merely state that I feel standards work without implementation (or even intent to implement) is work produced in a vacuum.
@skavish It's hard to draw a line between machine-generated and machine-assisted.
Some open source projects now require contributors to disclose when a part of the work was done by a machine. I am wondering if we could use a similar approach with FEPs.
Not necessarily. Filing a FEP is intended to collect feedback from the fediverse and developer community and get the best outcome. And even devs might design something in advance, before the coding is well underway. This relates to the mention of post-facto interoperability, which is good as a working method, but not if it is used exclusively, with the burden on the dev or others (in case they don’t create one) to have FEPs catch up after the fact. There is a sweet spot between post-facto interop that accepts protocol decay, and upfront design as e.g. the ActivityPub API intends to do, based on lessons learned in the past. In an ideal situation, implementation and standardization evolve in lockstep with each other.
Also, I would not directly equate FEP-related work with “standards work”. The FEP Process is part of a larger bottom-up standardization process, and it exists at the lower end of it, where “raw material” from the ecosystem is further chiseled. The W3C SocialCG provides further formalization and standardization in this model. It does not work perfectly at the moment, and there may be better mechanisms to evolve grassroots standards. These have my interest and focus re: social experience design.
I agree. That’s why I think @skavish’s suggestion is more apropos.
Spelling correction, grammar checking, markdown formatting, FEP template instantiation are all “work done by a machine”. And what’s the granularity of the disclosure (document, paragraph, sentence, phrase, word, general concept)?
I know you probably mean more specifically “work done by, or assisted by, an LLM”, but I think this demonstrates the challenges in defining an effective policy.
This sounds reasonable to me and I think this could be applied to both human and AI-assisted FEP output. The FEP process currently doesn’t require any minimum quality level for submissions. There are some I’ve seen with only a few paragraphs for a complex topic and where the author explicitly refuses to discuss it further. Those FEPs effectively die. I think the same thing would happen with a low-quality LLM-generated slop FEP that the submitter couldn’t or wouldn’t defend.
What suggestion? To focus on authorship?
How should that work in practice?
Yeah, it wasn’t meant as a practical suggestion, more like an idea.
If you want to make it practical, I’d probably focus on behavior:
Author has to actually respond to questions
They should be able to explain decisions in their own words
The proposal should evolve based on feedback
If someone can’t or won’t do that, it doesn’t really matter if it was AI or not, it’ll die anyway. That’s probably enough without trying to police how it was written.
That said, I’d still reject something outright if it clearly looks fully AI-generated: generic wording, no concrete details, no trade-offs, no awareness of existing work, or lots of confident but vague statements.
I do admit this is subjective and easier said than done, but we’re living through a pretty confusing and transformative time, and everybody is trying to figure out how to deal with it.
Spelling correction, grammar checking, markdown formatting, FEP template instantiation are all “work done by a machine”
@stevebate I think "machine-generated" captures the essence of the problem very well. This means a submitter didn't do the work that is required to gain the deep understanding of a subject, so a meaningful discussion will not be possible.
There are some I’ve seen with only a few paragraphs for a complex topic and where the author explicitly refuses to discuss it further
I am not aware of such cases. In general, even a few paragraphs could be enough, because the discussion that follows is just as important as the text itself.
-----
Related: Wikipedia now prohibits generated articles:
https://en.wikipedia.org/wiki/Wikipedia:Writing_articles_with_large_language_models
Related: Wikipedia now prohibits generated articles:
…
Some editors may have similar writing styles to LLMs. More evidence than just stylistic or linguistic signs is needed to justify sanctions, and it is best to consider the text’s compliance with core content policies and recent edits by the editor in question.
Given that FEPs have very few and very loose content policies, how do you propose judging that a submission is completely machine-generated (versus assisted)? How do you determine how deep the submitter’s understanding of the subject is? One person’s meaningful conversation is sometimes another’s nonsense.
Who makes the decision and how? AFAICT, you’re currently the only active facilitator of the FEP process (and thanks for that). Is it your sole decision? Is a panel of judges going to be created? Or do you have something else in mind? And why wouldn’t we also apply the same criteria (understanding of the subject, ability/willingness to have a meaningful discussion, etc.) to human-generated submissions?
@stevebate@socialhub.activitypub.rocks said in Machine-generated FEPs:
And why wouldn’t we also apply the same criteria (understanding of the subject, ability/willingness to have a meaningful discussion, etc.) to human-generated submissions?
We absolutely should. If you spew out a FEP, machine-generated or not, and you disappear afterward, don't be surprised when the FEP gets withdrawn.
This harks back to an important discussion from the early days of the FEP Process. An inclusive process that involves QA on the substance of a FEP before submission cannot be sustained without a different organizational structure and much more direct involvement from ecosystem participants. Currently, the W3C SocialCG is the organization best positioned to host such activities.
I am not proposing anything yet, and I am not the only active facilitator. But yes, I could be the one who makes the decision if everyone else leaves.
Maybe we should. I started this thread to figure it out.
Currently, the FEP policy is to withdraw FEPs after two years of no updates. This is a long time, but sometimes even two years is not enough; see this discussion for example: https://codeberg.org/fediverse/fep/issues/95.