FEP-c118: Content licensing support

indieterminacy · May 8, 2023, 1:10pm

[this got lost in the ether but was cached, I’m pushing… maybe the conversation has moved on or I should have checked this draft]
A practical suggestion, which looks like a good start.
https://spdx.org/rdf/terms/

It’s of course worth pointing out that the definitions are in RDF, thereby permitting a user/team to create their own definitions and linking.

I’d posit that this is a useful thing in the domain of licensing and the fediverse, especially when bespoke expectations can be interrogated using things like SPARQL.

From what I see SPDX usually is operating within a file (as opposed to a dedicated field in support of a document/content). As such there are subtleties regarding how to chain components (and the conclusion in terms of licensing) as well as explicit/implicit representations.

For instance, I’ve been thinking about ‘injecting hashes’, whereby content is hashed excluding annotations or notation (if not breaking down the content into smaller chunks). If such a hash is identified elsewhere then (casually) such inferences can be added (with all the caveats of provenance and context).
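A minimal sketch of the ‘injecting hashes’ idea, assuming annotations are written inline as `[[...]]` (a made-up convention for this example only) and that SHA-256 over whitespace-normalised text is an acceptable fingerprint. Neither assumption comes from SPDX or from the FEP.

```python
import hashlib
import re

# Hypothetical convention: annotations are written inline as [[...]].
ANNOTATION = re.compile(r"\[\[.*?\]\]", re.DOTALL)


def content_hash(text: str) -> str:
    """Hash the text with annotations removed and whitespace normalised."""
    stripped = ANNOTATION.sub("", text)
    normalised = " ".join(stripped.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


original = "The quick brown fox."
annotated = "The quick [[note: colour?]] brown fox."

# The same underlying content yields the same hash, with or without annotations,
# so a match found elsewhere can be linked back to the original.
same_content = content_hash(original) == content_hash(annotated)
```

Hashing smaller chunks would work the same way, by calling `content_hash` on each chunk.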

aschrijver · May 8, 2023, 5:12pm

I like that idea, @indieterminacy. Pinging @TimBray on this one as author of the FEP.

bobwyman · May 8, 2023, 5:31pm

@indieterminacy, why do you propose reliance on SPDX rather than W3C ODRL? What about SPDX makes it superior to ODRL?

See:

TimBray · May 8, 2023, 5:40pm

Meh. The notion of using something that already exists is attractive, but the failure-to-success ratio of everything involving RDF is terrible.

ODRL doesn’t look terrible.

aschrijver · May 8, 2023, 6:05pm

Yeah, and as an externality it railroads full-blown DRM into the Fediverse, establishing it as a “commercial space”.

(PS: To me that looks terrible. But maybe I’m seeing this wrong, or a subset of ODRL may avoid this)

When talking “Social Web” we’d until now like to expand that to “Open Social Web” and public data being open data. With “content licensing” that would boil down to “This is the open license I publish this under”.

It may be that ODRL is optimal for expressing “digital rights”, but that goes much further than “content licensing”. And in that light other suggestions, like SPDX or whatever, gain an advantage imho if they avoid the externality.

bobwyman · May 8, 2023, 7:33pm

Arnold wrote:

It may be that ODRL is optimal for expressing “digital rights”, but that goes much further than “content licensing”. And in that light other suggestions, like SPDX or whatever, gain an advantage imho if they avoid the externality.

I may be missing something subtle in the distinction you make between “digital rights” and “content licensing.” I don’t understand how the two are different. What am I missing? What externality is avoided by using SPDX rather than ODRL?

I can’t see how the use of SPDX prevents, or even discourages, the use of any license terms that might be expressed via ODRL. However, I believe that the use of ODRL is more likely to result in users’ intent being respected because ODRL provides a machine-readable expression language for individual granted rights, while SPDX usually just links to licenses that must be read and interpreted by humans. My sense is that if SPDX is used, many implementers would simply declare that it is unreasonable to expect them to read and translate into code the terms of dozens of license documents written in impenetrable legalese.

Another issue is that SPDX, since it typically just references existing non-machine-readable licenses, makes it hard to express exceptions. As we all know, one of the reasons motivating this discussion is the desire of some to prevent search engine indexing of their content. But, as far as I know, there aren’t any existing licenses that explicitly address search engine indexing as something distinct from other uses. Thus, we might find that an SPDX solution would lead to people choosing licenses that prohibit more than they really want to prohibit. (Note: ODRL has vocabulary for controlling indexing.) There are dozens of licenses in use today because there are a wide variety of needs. Instead of proliferating the number of licenses as new needs are discovered, why not support a level of expression granularity that affords users an ability to compose machine-readable grants that address their unique needs?

In any case, any use of ODRL should probably be defined via a profile document which could remove anything in the default ODRL that was considered inherently bad. That profile would also define SocialWeb vocabulary that isn’t already in the default ODRL Vocabulary. And, it should make it very clear that a SocialWeb Rights Expression Language should only be used to grant rights which are otherwise withheld or reserved by law. Use of the SocialWeb Rights Expression Language would only be supported when it makes data more “open” than it otherwise would be by default.

bob wyman

aschrijver · May 9, 2023, 7:05am

Thank you for this response, which offers good handholds. I am not for or against any particular format, just don’t want to see full-blown DRM and its horrors enter the Fediverse. A “content license” is a subset of things that constitute “digital rights”. DRM horror scenarios involving digital rights might be instance admins receiving threatening emails from a law firm with a list of “illegally boosted copyrighted content” and demands to immediately unboost and provide stern warning to users, or else… (just saying, idk what copyright/patent trolls can dream up).

I agree the intent is important. Something that specifically doesn’t have scope into DRM realms, and only allows the culture of openness we love on the Fediverse, is also communicating intent. Even safeguarding it.

If such a profile could do the trick, then it may be okay. But I do feel that the barrier to DRM has been lowered. After all implementations will support ODRL. Now you only need to bring in a different Profile.

how · May 9, 2023, 7:33am

Attribution can simply be a reference to the original post — whatever helps the reader to link the content to its origin. On the web we call it a hyperlink.
Share Alike is more difficult, as it should convey the conditions, so the licensing terms should be passed on.

Having an SPDX reference seems to be shorter and more compact than a full ODRL policy. However, it should be interesting to describe each SPDX license in ODRL terms. This would provide a number of advantages:

SPDX licenses would be machine-readable
Licensing terms would become explicit (e.g., the difference between GPL-3.0-or-later and AGPL-3.0-or-later would be shown to cover distribution over the network, making the AGPL a very straightforward and simple license to understand)
A license category of so-called permissive and restrictive licenses would become obvious and enable people to understand what is permitted and what is restricted, encouraging further reciprocal sharing and fading out extraction (e.g., DRM).
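To make “describe each SPDX license in ODRL terms” concrete, here is a rough sketch of a lookup table that expands an SPDX identifier into an ODRL-style policy. The action names are my reading of the ODRL common vocabulary and the Creative Commons namespace, and the mapping itself is illustrative and has not been legally reviewed. `SPDX_TO_ODRL` and `policy_for` are hypothetical names.

```python
import json

ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"
CC_SHARE_ALIKE = "http://creativecommons.org/ns#ShareAlike"

# Illustrative mapping only: which actions a license permits and which
# duties it attaches. A real profile would need careful legal review.
SPDX_TO_ODRL = {
    "CC-BY-4.0": {
        "actions": ["reproduce", "distribute", "derive"],
        "duties": ["attribute"],
    },
    "CC-BY-SA-4.0": {
        "actions": ["reproduce", "distribute", "derive"],
        "duties": ["attribute", CC_SHARE_ALIKE],
    },
}


def policy_for(spdx_id: str, target: str) -> dict:
    """Expand an SPDX identifier into an explicit ODRL-style policy."""
    terms = SPDX_TO_ODRL[spdx_id]
    return {
        "@context": ODRL_CONTEXT,
        "@type": "Set",
        "uid": f"{target}#policy",
        "permission": [
            {
                "target": target,
                "action": action,
                "duty": [{"action": duty} for duty in terms["duties"]],
            }
            for action in terms["actions"]
        ],
    }


policy = policy_for("CC-BY-SA-4.0", "https://example.com/notes/1")
print(json.dumps(policy, indent=2))
```

With a table like this, the compact SPDX identifier can travel with the post while the expanded, machine-readable terms are looked up on demand.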

One thing we certainly want to avoid is to have a one-liner becoming burdened with licensing terms, and a grotesque inflation of bandwidth usage to satisfy lawyers and paranoid or repressive data regimes.

Indeed, under copyright law, if the license expires, then copyright is enforced (until the ever-growing limit of author death + n years; n = 70 in the US and EU at this point, but Disney might want to bribe some Congressmen into extending it again, although Mickey Mouse® might be superseded with iconoclast superheroes® who can joyfully save the world by destroying it.)

That said, I think a common charter is better than individual licenses, and so expressing conditions and policies consistently for what politics the Fediverse wants to achieve seems superior to me than any tinfoil hat legalese that may be conveyed by overly cautious corporate lawyers to protect the self-interest of their, hmm, assets. If fedizens want to enable federation while removing extractive powers of corporations or derivative use by AI, data extractivists, marketing ploys and abusive government agencies, then it’s important to be able to describe “licensing terms” in terms of machine-readable policies, but of course, no machine will respect that, since they’re operated by humans working for corporations. You know you’ve seen this before.

I would be curious about what Ted Nelson would do. And also how @cwebber’s goblins handle permissions, since Spritely includes object capabilities, and content licensing terms might be part of it.

Maybe what we need is a Commons Data Profile that removes all market incentives over the social data generated by the Fediverse.

indieterminacy · May 9, 2023, 11:01am

ODRL looks pretty decent.

In the end such distinctions (such as from @TimBray regarding RDF being terrible) end up falling into differences of philosophy rather than technical or economic concerns.

As an analogy, the pervasive use of plastics is practical and allows people to produce, distribute and consume but is abhorrent to those who want to ‘do the right thing’ and don’t mind doing things slower, at greater cost and with less convenience.

As a practical concern, trying to remember why I had originally wanted to suggest SPDX: it has capabilities for checksums and a reference for the checksum algorithm (which a scan for ‘checksum’ and ‘hash’ in the ODRL documentation you provided didn’t turn up).

I believe that just as one has functional package management for domains such as building code for toolkits, one should have it for content.

For example, Guix is excellent for reproducible research, with the inputs chained and verified not only with respect to inputs but also the hashes of the code and scripts.
We should be mindful about the need to have content assets as well as code assets provided with such objective terms. It should not be confined to the rigours of scientific enquiry but be capable of modelling content or, more practically, of tracking edits and updates or blocking content by its hash.

I guess this is not the most common type of concern but I consider hashes an important thing for a wide range of uses.

I could of course have overlooked ODRL’s approach for this, similarly there are other important facets concerning SPDX for or against.

I feel that any added complexity from SPDX could be mitigated by hashing such expressions and then letting an individual entity use the hashes to whitelist/blacklist things according to economic and legal criteria.

I can expand further and can attempt to be clearer, just thought I’d express this anyway.

indieterminacy · May 9, 2023, 11:10am

I think time-sensitive rights are an important area, though I wouldn’t say that such things are ‘giving them up’, so long as they are within the purview of somebody rationally making choices and them being fulfilled.

When it comes to rights, I’m mindful of conflicts.

I recall that Fred Astaire put in his will that he must not have a biopic about his life, which the bounty of his wealth was no doubt sufficient to make his estate oblige.

The problem here is that complying meant that his dance partner, Ginger Rogers, was in effect vetoed too, as it’s impossible to tell her tale without Fred Astaire featuring.

When it comes to rights, it’s often the case that people’s rights and community rights are taken away because of inequality within systems.

bobwyman · May 9, 2023, 7:42pm

ODRL can be just as terse as SPDX if one uses the odrl:hasPolicy tag to point to an external file containing ODRL. For instance, all you need is:

"odrl:hasPolicy": "http://example.com/policy.odrl"
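As a sketch of how that one-liner might sit on an ActivityStreams object: the helper below adds the `odrl` prefix to the object’s `@context` and attaches the policy link. How FEP-c118 would actually carry the property is not settled in this thread, so the function name `with_policy` and the context handling are assumptions.

```python
import json

ODRL_NS = "http://www.w3.org/ns/odrl/2/"


def with_policy(obj: dict, policy_url: str) -> dict:
    """Return a copy of an ActivityStreams object that points at an external policy."""
    out = dict(obj)
    context = out.get("@context", [])
    if not isinstance(context, list):
        context = [context]
    # Declare the odrl: prefix so that "odrl:hasPolicy" expands to a full IRI.
    out["@context"] = context + [{"odrl": ODRL_NS}]
    # Plain string as in the post above; a strict JSON-LD processor may want
    # this value typed as an IRI instead.
    out["odrl:hasPolicy"] = policy_url
    return out


note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "id": "https://example.com/notes/1",
    "content": "Hello, fediverse",
}

licensed = with_policy(note, "http://example.com/policy.odrl")
print(json.dumps(licensed, indent=2))
```

The per-post overhead is one context entry and one property, regardless of how long the referenced policy is.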

To make things easy for people, what we could do is define a SocialWeb profile (like the Creative Commons ODRL Profile) that includes the definition of those policies most commonly expected to be used for SocialWeb content. These policies would differ from existing ones in that they would address SocialWeb specific concerns like search-engine-index rights, etc. The ODRL-encoded policy files would be hosted at some well-known location in much the same way that namespaces, etc. are commonly hosted. Ideally, most people would select from those consensus policies, but those who had unusual requirements would be free to define them either in their entirety or as exceptions (Constraints) amending the standard policies. Over time, if patterns or trends were found in the use of Constraints, then we could add new policies to the profile and to the shared repository.

Absolutely! Of course those who want anything other than minor departures from existing, pre-defined policies should be strongly encouraged to create them in ODRL files that are referenced by using odrl:hasPolicy, rather than embedding them in each piece of content published. One might implement instance-specific rules that say things like: “Your post will be delayed or rejected if your policy statement is larger than your message content…”

I believe ODRL covers what you’re looking for. An ODRL Asset is any resource or a collection of resources that are the subject of a Rule. Assets can either be stand-alone, or “partOf” an AssetCollection. But, since Assets can be just about anything, ODRL doesn’t define any type-specific attributes for describing or naming Assets. So, if we wanted to use hashes to identify parts of an object as individual assets, or as partOf some AssetCollection, we could define those Asset attributes in a SocialWeb profile.

Such horror stories are more likely to occur if we remain stuck with today’s human-readable and often incomprehensible licences. If we can move to a requirement that grants must be machine-readable, then it will be much harder to trap or trick people. The rule for admins should be: “Don’t do anything not clearly allowed under copyright law with any object that has policies or rules that you don’t understand.” Admins might even decide not to handle content with opaque policies. So, if some lawyer links to a non-machine-readable license, or uses a machine-readable policy that includes non-standard vocabulary, they should expect that their content simply won’t get distribution. We can use ODRL, as a machine-readable syntax, to force people to be explicit and open about the rights they are withholding.

Remember also that, in the absence of a contract between publisher and consumer, the use of any Rights Expression Language is inherently limited to granting rights which would otherwise be withheld. Thus, the Rights Expression Language can only be used to make content more open and more freely used. Without a contractual relationship, nothing anyone inserts in their content allows them to restrict rights that are not already reserved to them by law.

indieterminacy · May 10, 2023, 9:38am

Thanks for articulating these reassurances.

I suppose, given your emphasis on machine-readability as a criterion, there would be a benefit in crowd-sourcing opinions regarding how the addition or removal of conditions inside license components should impact criteria for being able to use (or not use) an asset.

I expect there is a natural cost of investigation and risk regarding using ambiguous licensing arrangements, which means that people gravitate towards the opinion of larger organisations/cooperatives.

For example, I’ve seen discussions in the Guix-Devel mailing list about being risk averse about certain licenses being ambiguous, as well as potential incompatibilities wrt using multiple licenses together.

I recall something about OpenZFS once.

Here is an example about mixing licenses as a quick concern:
https://mail.gnu.org/archive/html/guix-devel/2016-08/msg00308.html

Is there a possibility that we could check in with specialist communities such as Guix with very specific questions to ensure that any adoption is in line with pro libresoft best practices and likely to encourage better reproducibility and interoperability of content?

Any suggestions as to what this would look like?

bobwyman · May 10, 2023, 8:43pm

Soliciting input and expertise from a broad range of sources is always a good idea!

There are sometimes issues with license compatibility. This is one of the very important reasons that it is necessary to provide mechanisms to allow exceptions to standard policies. As you pointed out, there is an issue when using OpenSSL in systems licensed under GPLv3. A similar issue arises when you try to combine code licensed under GPLv2 with GPLv3 code. (The GNU site states clearly: “there is no legal way to combine code under even GPLv2 with code under GPLv3 in a single program.”) So, in order to avoid such conflicts, exceptions are often used (e.g. “I invoke GPLv3 for this system, excluding the [x, y, …] components which are separately licensed”).

The conflict between OpenSSL and GPLv3 arises whenever OpenSSL is used as a component of a system licensed under GPLv3, or whenever OpenSSL is a component of a system that also uses some GPLv3 component(s). This is because GPLv3 is a “strong copyleft” license and applies to the entire software system, not just to one or more of its components. Without a stated exception, GPLv3 must apply to all of the components of a system, and thus must also apply to OpenSSL – but it can’t. So, those who use OpenSSL as a component, but would like to use GPLv3 licenses for their own work, will often invoke GPLv3 with an exception that specifically excludes coverage of the OpenSSL component. The need to do this is common enough that “GPL linking exception” has its own Wikipedia page.

Fortunately, the strong v. weak copyleft issues appear most frequently in the context of building software from components (linking, compiling, etc.). The idea of “copyleft” does have parallels in the “content” or “data” world, but they are less common.

Yes. I’d like to see a SocialWeb ODRL profile that describes, in detail, the meaning of each machine-readable permission, constraint, obligation, etc. and that also includes a definition of the “non-standard” term. In general, those using non-standard terms should expect that systems that don’t have some out-of-band method for determining granted rights would refuse to process objects with non-standard terms.
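A minimal sketch of the “refuse objects with non-standard terms” behaviour, assuming a policy shaped like ODRL JSON with `permission`, `prohibition` and `obligation` lists. The allow-list `KNOWN_ACTIONS` stands in for whatever a SocialWeb profile would eventually define, and it and the function names are hypothetical.

```python
# Hypothetical allow-list standing in for the terms a SocialWeb profile defines.
KNOWN_ACTIONS = {"reproduce", "distribute", "derive", "index", "archive", "attribute"}
RULE_KEYS = ("permission", "prohibition", "obligation")


def actions_in(policy: dict) -> set:
    """Collect every action named in the policy's rules and their duties."""
    found = set()
    for key in RULE_KEYS:
        for rule in policy.get(key, []):
            # A rule without an action becomes the string "None", which is
            # not in the allow-list and is therefore treated as unknown.
            found.add(str(rule.get("action")))
            for duty in rule.get("duty", []):
                found.add(str(duty.get("action")))
    return found


def unknown_terms(policy: dict) -> set:
    """Return the actions this server has no defined meaning for."""
    return actions_in(policy) - KNOWN_ACTIONS


def should_process(policy: dict) -> bool:
    """Process an object only if every term in its policy is understood."""
    return not unknown_terms(policy)


standard = {"permission": [{"action": "index", "duty": [{"action": "attribute"}]}]}
exotic = {
    "permission": [{"action": "index"}],
    "prohibition": [{"action": "ex:trainModel"}],
}
```

Here `should_process(standard)` is true, while `exotic` is rejected because `ex:trainModel` is not in the allow-list. A server with an out-of-band agreement could extend the set instead of refusing.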

jmking · May 17, 2023, 1:32pm

The maintainers of a Calckey fork (Blajkey) are implementing AP extenders to accommodate personal content licensing and are soliciting feedback: Kaity A (@supakaity) | Blåhaj Zone

strypey · September 20, 2024, 4:39pm

I’m aware I’m a year and a half late to the party here, but as a co-founder of Aotearoa Indymedia and CC Aotearoa/ NZ I have a strong interest in this issue. A few comments of a general nature.

Firstly, I think it’s important we surface and map out the mismatched expectations and submerged assumptions about social norms that underlie discussions about privacy/ redistribution of fediverse posts. For example;

I’m shocked that you think Public posts are not public. This is an example of mismatched expectations, and points to the differing assumptions we bring into the discussion about what using a “Public” posting scope means.

To me, it means they’re publishing their comment, like a letter to the editor in a newspaper, for anyone in the world to read. I think it’s fair to expect people to either stand behind statements they’ve published; or apologise and withdraw them. Anything perceived as placing arbitrary limitations on established rights to link to and quote public-facing web comments will be subject to furious pushback (I know because I’ll be one of the cranky citizen journalists pushing back).

But because I think this, I want software UI to make publishing (eg to the open web) an informed choice, every time. If people think they’re just chatting privately with their friends in the park, when they’re actually being livestreamed on a public-facing website at publicpark.live, that’s a problem that needs solving. And the solution is not to try to insist, King Canute style, that permission is needed from each person who happens to appear on this public livestream to view it.

I expanded on this a couple of months ago, in a comment on a Fediverse Ideas issue about Quoting Fediverse discussions.

Secondly, IANAL but I think there’s an important distinction here; whether a fediverse post is treated as publishing (like a newspaper article), in which case copyright law applies, or communication (like a private letter), in which case it probably doesn’t.

Eg I’ve never seen anyone suggest that a comment in a public IRC channel is a copyrightable work, requiring permission to copy or archive (although it might be). Even though there are probably IRC clients that allow browsing of public IRC rooms on the web, without authentication. Blog comments OTOH do seem to be treated as publishing, because creating an account to comment in a commercial Walled Garden usually includes some kind of rights grant to allow the host to store and display it (and usually less innocent things too …).

Thirdly, I agree with the comments that concerns about copying, indexing, etc, of fediverse posts are mainly about privacy, not the sorts of publishing rights addressed by copyright law. @bobwyman makes a good point that a robust system for allowing people to express how they do and don’t want their posts to be used, can address both sets of concerns. Also that this can be done in a way that bakes in open content defaults, and that automates as much of this as possible, reducing the amount of time fedizens and admins have to spend thinking about it.

But I suspect this is more complicated than it needs to be. As I said in that Fediverse Ideas comment, I think it would be simpler to make a clear distinction between public and non-public posts, at both the protocol and UX levels;

A public post is publishing. Like anything else on the open web, it’s fair game. It’s legitimate to index, archive or quote it, with or without consent. If someone doesn’t like me doing that, as part of an independent media operation, they can send me a cease and desist letter citing their copyright monopoly, and I’ll start talking to someone like the Freedom of the Press Foundation et al about pro bono representation. Because in the digital age, those rights are essential to journalistic freedom, and we will not surrender them easily.
A private post is not publishing. Even if it’s posted using a hypothetical “fediverse-only” scope that means it can be seen by anyone signed into a fediverse app. Like an email to a private mailing list - or anything else published within an authenticated communication space - consent is required to do anything with it. Except distribute it within whatever defines that space (eg subscribers to the mailing list), and quote it in a reply within that same space.

If fediverse apps make a clear distinction between the two when people are posting, and fediverse software consistently respects the distinction, I suspect a lot of the debates around this will dry up.

However, that still leaves the issue of what license applies to posts from publishing-orientated apps like PeerTube, FunkWhale and CastoPod, and blogging apps that federate over AP, like WriteFreely, Plume, WordPress-plugin and Ghost. PeerTube, for example, allows a license from the CC suite to be added to a video. It would be great for that license to propagate through the verse with the video, in machine-readable and human-readable ways. I think @bobwyman is right that ODRL sounds like a good standard to wrap in an FEP for this.

jmking · September 20, 2024, 5:46pm

I think one of the ways this is different from publishing an op-ed in a newspaper is that people are publishing “public” posts to a network, and that network is defined (by them, perhaps incorrectly or not entirely understood) as the connected servers. It’s not publishing to a freely-accessible newspaper, it’s more like publishing to a closed subscription list of opted-in readers on a group mailing list.

I am aware that in Mastodon’s implementation, a “blocked” or “defederated” server is not in fact blocked. User members mostly believe that their public messages are only being broadcast to the non-blocked servers on the network, but a Mastodon block (and I’m unaware what the implementation may be in other ActivityPub platforms) only stops content coming in, not going out, and allegedly defederated servers are still able to GET any and all messages from the originating server. I understand Mastodon has an option for authorized fetching which can curtail that activity, but the Mastodon team strongly recommends against enabling that feature because of its significant impact on the underlying service.

My understanding of many users’ intent is that “public” means “public to the network I believe I am connected to”, not “public to absolutely anyone in the world”. I believe a heck of a lot of people think “public” means sending a message to some subset of the however many million accounts known to exist on AP services, and certainly are always surprised to see or hear that their messages are easily consumed by blocked servers. When I was subjected to a couple of days of hate, calls to kill myself, videos of beheadings, and various other torrents of abuse and obscenity earlier this year, I found it originated from a server that I believed to be unable to consume my messages, which then led to off-network abuse. Imagine my surprise. Like @TimBray I was shocked to learn of this, and I’m a competent technologist. The growing audience using ActivityPub is not necessarily a technically-savvy actor. I have personally had to counsel and educate folks who believed their harassers were blocked but have been nonetheless subjected to extreme harassment and abuse, explaining that the protocol and its implementations freely make their content available to individual and server actors that have been “defederated” by the author. This is a not-infrequent occurrence.

Publishing content to a public web site is not cognitively the same as publishing to a network of connected services, no matter the technical truth beneath that assumption.

I would suggest therefore that

Activities may additionally be addressed to the special “public” collection, with the identifier https://www.w3.org/ns/activitystreams#Public […] Activities addressed to this special URI shall be accessible to all users, without authentication.

be made clearer that this is a special collection, and possibly not the desired default, and/or amended to exclude users or actors that are known to be blocked by the originating service. If public shall always mean public including known sources of abuse and harassment, then it’s likely public as a collection should be reserved for folks who are willing to take that chance with fully informed consent.
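To illustrate what “amended to exclude users or actors that are known to be blocked” could look like on the serving side, here is a rough sketch. It assumes the server can identify the requesting actor (for example through a signed fetch) and keeps a set of blocked domains. `may_serve` and its parameters are hypothetical and not part of ActivityPub or any implementation.

```python
from typing import Optional
from urllib.parse import urlparse

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def as_list(value) -> list:
    """Normalise an addressing field that may be missing, a string, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def may_serve(activity: dict, requester: Optional[str], blocked_domains: set) -> bool:
    """Decide whether to hand an activity to the actor (if any) requesting it."""
    recipients = set(as_list(activity.get("to"))) | set(as_list(activity.get("cc")))
    if requester is not None:
        # Known-blocked actors get nothing, even for Public posts.
        if urlparse(requester).hostname in blocked_domains:
            return False
        if requester in recipients:
            return True
    # Unidentified requests can only ever see Public posts.
    return PUBLIC in recipients


post = {"to": [PUBLIC], "cc": "https://example.social/users/a/followers"}
blocked = {"bad.example"}

print(may_serve(post, "https://bad.example/users/troll", blocked))
print(may_serve(post, None, blocked))
```

Note the second call: an unsigned request for a Public post is still served, because the server cannot tell who is asking. That is the gap described above, and closing it means requiring identification for every fetch.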

In other words, some work remains to be done to either (a) educate the user audience that public means public, it’s equivalent to publishing on the open web; and/or (b) educate the implementers that public might not be the desired default; and/or (c) revisit the intent and impact of the public collection.

I understand this is not in scope for content licensing - which is the intent of this FEP - but the context is, I believe, relevant.

Edit: I also want to point out that the value proposition for many is “post this content to anyone on this network who uses this protocol and isn’t blocked by me or my server”. It seems the protocol doesn’t really accommodate this value proposition, which is a shame.

strypey · September 20, 2024, 6:48pm

Great comments @jmking. All examples of what I meant when I said;

If we have clarity about what everyone’s assumptions and expectations are, and where they clash with the technical reality, then we’re in a place where we can educate fedizens about assumptions and expectations we may not be aware of, and reform the technical reality so that it meets everyone’s needs, in an unambiguous way.

Yes. Exactly what I meant by requiring that;

Because the alternative is to remove the public-to-open-web publishing scope from the fediverse entirely, which would devolve it into a janky Matrix without encryption. I can’t see the value in that. We might as well all fold up our tents and stand up Matrix servers, and then we get E2EE into the bargain.

EDIT: This comment was unfair to Matrix. Messages in public Matrix rooms can be linked on the open web. So removing the public-to-open-web publishing scope from the fediverse would reduce it to a very janky version of Matrix indeed.

bobwyman · September 20, 2024, 8:28pm

Copyright law makes no such distinction. Copyright applies to all creative acts of speech, writing, etc. Yes, even “a comment in a public IRC channel” is subject to copyright.

jmking · September 20, 2024, 8:35pm

Agreed with your points, no question, but I think that even with increased education to implementers and/or fedizens, there’s still a gap between followers-only and public that means “public to this network” that the protocol could address. So I suppose we are then at the point you raise of

surface and map out the mismatched expectations and submerged assumptions

… barring some maybe-not-far-enough-reach polls on Fediverse, how best to approach this?

WRT

Just to nitpick on this a bit, “all users” in this context is following the Overview that describes

A client to server protocol (so users, including real-world users, bots, and other automated processes, can communicate with ActivityPub using their accounts on servers, from a phone or desktop or web application or whatever)

and

In ActivityPub, a user is represented by “actors” via the user’s accounts on servers.

My read of the document is that a user is not a random public http request, but a user of ActivityPub (“including real-world users, bots, and other automated processes”) with an account on a server, represented by an Actor.

jmking · September 20, 2024, 8:36pm

even “a comment in a public IRC channel” is subject to copyright

So is this post.