
Who Owns Information in the Era of AI?

Published 02/23/2024


Originally published by CXO REvolutionaries.

Written by Tony Fergusson, CISO in Residence, and Sam Curry, VP & CISO in Residence, Zscaler.

Mark Twain, the distinguished American author, once wrote, “The kernel, the soul, let us go further and say the substance, the bulk, the actual and valuable material of all human utterances is plagiarism.” The statement appeared in a letter Twain wrote to Helen Keller in 1903, more than a decade after the then twelve-year-old girl had been accused of copying another author’s work. One hundred and twenty years later, Hollywood writers ended a strike in which they claimed artificial intelligence was doing the same.

You may be surprised to learn that Helen Keller, the deaf and blind author, educator, and co-founder of the ACLU, was accused of plagiarism. It seems absurd to think that someone contending with such profound disabilities was stealing others’ written material. Yet, that is precisely what some say happened in 1892. Helen Keller produced a short story, The Frost King, which bore striking similarities to a tale by Margaret Canby. The ensuing public drama ultimately led to Helen Keller and her interpreter leaving their special needs school in disgrace.

What does a 19th-century plagiarism accusation have to do with cybersecurity? It reinforces that intellectual property (IP) is an ambiguous but legally potent concept, one that can damage your organization’s reputation and finances. For a sense of how serious this issue is, read what the NDAs of IP giants have to say about “intellectual contamination.” The advent of AI tools introduces a new dynamic to concerns over IP theft. If a disabled child deprived of sight and hearing can be accused of stealing ideas, is AI destined for the same fate?


Who owns an idea?

Where do ideas come from, and who owns them? Those answers are beyond the scope of this article, but let’s identify a couple of primary factors for our purposes. Many of our thoughts and ideas germinate from our sensory experiences in the world. We reference this database of personal experiences while navigating life. It helps us anticipate events, predict probable outcomes, avoid danger, and develop preferences that guide our behavior.

Another primary source of our thoughts is exposure to other ideas. We encounter an idea and then reference, improve, or innovate upon it. One scientist’s discovery lays a foundation for the next. A writer pens a tale that spawns countless derivative stories following a similar theme. This second source of ideas is where IP theft and legal risk enter the picture. Ideas often follow common themes, but when they appear too similar, they invite the risk of legal retribution.

Helen Keller’s ordeal is relevant to modern cybersecurity because her experience was largely bereft of the first source of ideas: personal sensory experience. Without sight or hearing, her ability to build a sensory database of experience was limited. As such, she relied heavily on the second source, exposure to ideas from others.

Now, consider the plight of modern AI tools such as ChatGPT. Artificial intelligence has no natural senses and does not interact with the physical world. The only source from which it can derive “ideas” is exposure to information from others. All of its “thoughts,” predictions, and analyses must be derived from information provided by outside sources. This makes AI a prime candidate for accusations of plagiarism and intellectual theft.

Yet, is AI doing something that humans do not? How many of humanity’s stories follow the pattern of the Hero’s Journey monomyth? How many college graduates repurpose, adapt, and reproduce knowledge they acquired through higher education? Do we question the originality of their thoughts, given that much of what they know is based on the ideas, observations, and discoveries of others?

Some endeavors such as music and writing use common resources for creative expression. Western music is rooted in major and minor scales. Songs may follow the same chord progressions but are differentiated by other factors like melody, rhythm, and lyrics. Likewise, works of literature use many of the same words but tell different stories by arranging them in a unique way.

Large language models (LLMs) like ChatGPT are adept at arranging words in a convincing and useful manner. They are intended to mimic human capabilities, which is why they use the same lexicon as human writers and parse words into familiar patterns. If an LLM generates a familiar-looking response, it is because it is mimicking the style and verbiage of content people have produced. Yet, if you ask it to write a murder mystery, it will not reproduce a story by Conan Doyle or Agatha Christie.


AI-generated property risks

Accusations of intellectual theft will likely plague AI-generated content and create business risks for the foreseeable future. Organizations using the same AI tools will get similar results when prompting the algorithm with near-identical queries. For example, people asking ChatGPT, “How do I bake chocolate chip cookies?” will receive remarkably similar answers. If some of these people post the cookie-baking steps on their websites, it may appear that they have copied each other. Who owns the cookie-baking guide? ChatGPT generated the steps, but the AI’s “knowledge” came solely from others’ content.

This problem becomes more complex when AI handles the personal data of customers, partners, and third parties. Several nations have regulations strictly governing the use and storage of personal data. Any AI-generated content that leaks or exposes someone’s sensitive data creates legal and financial risks. In 2019, I argued that personal data, like private property, should require the informed consent of its owners to be used. While this is not the current state of the law, you can shield your organization from many business risks by acting as if it were.

Some companies currently try to hedge copyright and plagiarism risk exposure through internal policies. For example, a business may make it a fireable offense for its engineers to look up existing patents. This rule is a response to willful patent infringement costing up to three times as much as accidental infringement. A policy ensuring engineers remain unaware of existing patents is a reasonable protective measure under such circumstances. However, AI tools are not worried about losing their jobs, so they must be governed in other ways.

Perhaps the most well-known risk of using AI tools is data leakage. This problem was widely publicized in May 2023, when Samsung discovered employees had leaked sensitive data to ChatGPT. Imagine the nightmare of determining who owns that sensitive data now. Once Samsung employees submitted it to ChatGPT, they volunteered it as training data for the AI. Now, when ChatGPT produces content based on that data, who owns it? This is precisely the kind of costly legal dilemma your organization should seek to avoid.


Safely using AI

In this article, we’ve identified many problems involving AI’s relationship with intellectual property, but what is the solution? This question is especially relevant as a recent industry survey showed that 95% of organizations use AI tools. Furthermore, 57% of IT leaders allow their use without restrictions. This is a recipe for disaster.

An excellent first step in reducing AI-based business risk is determining where and how to use the technology. If your organization uses third-party AI tools, set policies and restrictions that protect your business from IP-related risks. Block any sites hosting AI tools you do not need, monitor the tools you allow, and ensure they operate within secure business parameters.
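As a minimal sketch of what such a policy might look like in practice, the Python snippet below implements a simple domain-based allow/block decision for AI tool traffic. The domain names, categories, and three-way decision are hypothetical illustrations for this article, not recommendations or any vendor's actual API.

```python
# Hypothetical sketch: a simple domain-based policy check for AI tool access.
# The domains below are illustrative placeholders, not a real blocklist.

from urllib.parse import urlparse

# AI tools the organization has reviewed and explicitly permits
ALLOWED_AI_DOMAINS = {
    "chat.openai.com",  # permitted, subject to monitoring
}

# Known AI tool domains blocked pending security review
BLOCKED_AI_DOMAINS = {
    "example-ai-notetaker.com",   # hypothetical unreviewed tool
    "example-code-assistant.io",  # hypothetical unreviewed tool
}

def ai_access_decision(url: str) -> str:
    """Return 'allow', 'block', or 'inspect' for a requested URL."""
    host = urlparse(url).hostname or ""
    if host in ALLOWED_AI_DOMAINS:
        return "allow"    # approved tool; traffic is still logged
    if host in BLOCKED_AI_DOMAINS:
        return "block"    # unapproved AI tool
    return "inspect"      # unknown destination; apply default web policy

if __name__ == "__main__":
    for url in ("https://chat.openai.com/chat",
                "https://example-ai-notetaker.com/upload"):
        print(url, "->", ai_access_decision(url))
```

In a real deployment, this decision logic would live in a secure web gateway or proxy policy rather than in application code, but the principle is the same: an explicit, reviewable list of what is allowed, what is blocked, and what requires inspection.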

Of course, there is always the danger that people will use unmanaged devices to submit sensitive material to AI platforms. In this case, a robust DLP strategy (and toolkit) can be extremely useful. It is essential to have visibility into your environment and insight into where and how data moves. Knowing where your sensitive data resides and how it is accessed makes governing its interaction with AI simpler. Fortunately, the AI revolution is bringing new capabilities to DLP: it is driving major advancements in discovering, classifying, and managing dark data, which has been a major roadblock for many DLP projects in the past. AI can also assist with building new policies to protect and secure your data.
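To make the idea concrete, here is a minimal, hypothetical sketch of a DLP-style check that scans outbound text for sensitive patterns before it reaches an external AI tool. The patterns are deliberately simplified placeholders; production DLP products rely on far richer detection, classification, and context.

```python
# Hypothetical sketch: a lightweight DLP-style check that inspects text
# before it is submitted to an external AI tool. The patterns below are
# simplified illustrations, not production-grade detectors.

import re

SENSITIVE_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "payment_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "internal_marker": re.compile(r"\bCONFIDENTIAL\b", re.IGNORECASE),
}

def scan_for_sensitive_data(text: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]

def safe_to_submit(prompt: str) -> bool:
    """Block submission to an AI tool if the prompt trips any pattern."""
    findings = scan_for_sensitive_data(prompt)
    if findings:
        print(f"Blocked: prompt matched {findings}")
        return False
    return True

if __name__ == "__main__":
    print(safe_to_submit("Summarize our public product roadmap."))    # True
    print(safe_to_submit("CONFIDENTIAL: draft of the merger terms"))  # False
```

In practice, a check like this would run inline in a proxy or endpoint agent so it also covers unmanaged-device paths, which is exactly where ad hoc AI usage tends to slip past policy.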

Stanford and MIT released a study showing AI can be a workforce multiplier. Unfortunately, many companies are taking a laissez-faire approach to adopting AI and assuming all will be well. If all does not go well, AI-generated problems may stack up faster than your organization’s ability to remediate them. It is worth mentioning that Helen Keller suffered long-term effects from the Frost King incident. In the years that followed, she admitted she was never sure whether the thoughts she expressed were her own. She dealt with this anxiety and self-doubt by never writing fiction again. Many organizations may likewise consider banning AI tools to avoid potential negative consequences. While this approach may offer short-term relief, it can ultimately lead to your organization falling behind the competition.
