In today’s competitive innovation economy, digital technologies and artificial intelligence (AI) tools are increasingly used in business transformation and growth. In-house counsel at companies across industries need to be prepared
to understand how to identify and mitigate commercial issues and risks related to AI, data and IP in a digital business context.
Recently Smart & Biggar partnered with the ACC Ontario Chapter to bring together a diverse panel of IP experts and in-house
counsel for an engaging webinar roundtable discussion to explore strategies for managing IP and compliance risks in a digital world.
Moderated by Smart & Biggar principal Graham Hood, a lawyer and trademark agent, the panel included Neil Padgett, a lawyer and patent agent and currently Associate General Counsel for intellectual
property at Shopify; life science compliance and regulatory lawyer Alice Tseng; and lawyer and patent agent Patrick Roszell. Alice Tseng and Patrick Roszell are both also principals at Smart & Biggar.
This article provides a summary of key insights and takeaways from the panel, with strategies for in-house counsel to leverage and protect a company’s valuable portfolio, both offensively and defensively, to mitigate risk and secure commercial value.
Please note the speaker views expressed in this presentation summary are for guidance only and do not reflect the views or position of the companies they represent. Furthermore, those views may be general in nature and should not be taken as legal advice.
Roundtable Discussion Replay
- Defining and identifying artificial intelligence (AI)
- Rights to data: AI tools and data ownership
- How to protect against data use issues in future AI projects
- Managing IP Rights in AI
- Conclusion: Advice to in-house counsel
1. Defining and identifying artificial intelligence (AI)
Question: How would you characterize AI and what are some examples of AI in a business context?
(Neil Padgett): Artificial intelligence is quite a broad term in computer science, encompassing a variety of computing techniques with many different views and approaches. At its core, Artificial Intelligence (AI) refers to enabling computers to perform tasks that would otherwise require the application of human intelligence. In practice, this often involves the use of Machine Learning. Machine learning refers to algorithms
and techniques that allow machines to learn to perform tasks based on data rather than on explicit instruction. There are a number of different types of machine learning, suited to various types of inputs and desired outputs.
Typically, machine learning involves collecting a large data set (called a training data set) and using that training data set to generate a model of a specific phenomenon or relationship. Once trained, the model can be used to make
inferences or predictions about the future. Each item in the data set represents an instance of the thing you are modelling and is accompanied with a label designating a known outcome, defined classification, etc. The training data set is effectively a large data set of input data that you use as a sample for your model. How you train a model can vary. Once you’ve built a model, you can feed it new input data to use the model to predict the appropriate outputs.
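As a rough illustration of the train-then-predict flow described above, here is a minimal sketch in Python. The feature values, the labels and the "nearest average" technique are assumptions chosen for brevity; they are not drawn from the panel discussion and real systems use far richer models.

```python
from statistics import mean


def train(examples):
    """Build a model from (features, label) pairs: one average point per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose average point is closest to the new input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training data set: each item has two made-up numeric
# features and a label designating the known outcome.
training_set = [
    ([0.9, 0.8], "positive"),
    ([0.8, 0.9], "positive"),
    ([0.1, 0.2], "negative"),
    ([0.2, 0.1], "negative"),
]

model = train(training_set)            # training step
prediction = predict(model, [0.7, 0.75])  # new, previously unseen input
print(prediction)
```

The point of the sketch is only that the model is derived entirely from the labelled training data: change the data and the predictions change with it.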
(Alice Tseng): An example of AI in the healthcare space
An example of AI in the healthcare space is a company creating an AI tool for lung cancer by building a training data set from digital images of x-rays sourced from various hospitals in Ontario. Once the AI model has been developed, radiologists can upload new x-ray images into the AI tool to get diagnostic results in near real time.
- Purpose of the AI tool: to detect lung cancer; or classify lung cancer (Stages 1-4); or segment (i.e., draw borders around areas of interest on the image, such as tumours, to make further processing and analysis by a clinician easier).
- Benefits of the AI tool:
- Automated – faster results (in seconds after images have been uploaded) lead to cost savings in the long term (fewer radiologists required)
- More consistent, accurate – given radiologist error rates at around 3% to 5% in image interpretations, AI may play an important role in preventing medical errors. This is because AI applications are able to process large amounts of data without being affected by radiologists’ lapses in memory, fatigue, illness or other human factors
- Objective output – potentially less bias since training data set is from multiple sources (e.g., multiple hospitals and multiple health care professionals, not just one person)
- Fact-specific benefits – this AI tool can result in less radiation exposure for patients since it can impact the care that a patient receives. If the AI tool is very accurate, it may allow a patient to be assessed using an x-ray only, without requiring additional screening (e.g., CT or MRI) after the x-ray to be certain of the diagnosis
- How can it be used:
- Secondary diagnostic tool – this tool is likely to be a “secondary diagnostic tool” in Canada in the near term (i.e., supporting the radiologist’s decision-making). In other jurisdictions, such as Nigeria, this type of AI tool may be a “primary diagnostic tool” since Nigeria has a population of 200 million people but an insufficient number of radiologists (5 to 500 radiologists).
Question: How would engineers build an AI system to solve a real-world problem like this?
(Patrick Roszell): Building AI systems to solve real-world problems involves machine learning and training using a data set
When using an AI system to solve a business problem, it will likely involve machine learning and require a training data set. The data set would be used to “train” a machine learning model that the AI system would then employ to generate
results or solutions.
“Training” - machine learning models are created by inputting numerous instances of input data, each paired with a known corresponding output. A machine learning algorithm uses those pairs of inputs and outputs to infer relationships
between input data and the desired type of output data. There are various types of algorithms, such as “deep learning”, “neural network” and “classifiers.”
In the context of the example mentioned above, the input data set would consist of individual images such as x-rays, and the output data would be the known corresponding diagnoses. The x-rays may be characterized by specific features (not just
the image), such as the relative proportion of light and dark grays as well as shapes in an x-ray image.
That data set would be input to an algorithm to generate a model that links characteristics of input images to one or more possible diagnoses. Later x-rays characterized by the same features could then be input into the model to get a predicted
diagnosis. The data input from the training set is critical for the machine to do its job properly and generate useful results and solutions.
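To make the idea of "features" concrete, the following sketch reduces a grayscale image to the relative proportion of dark and light pixels, one of the characteristics mentioned above. The function name, the brightness cut-offs (85 and 170) and the tiny sample image are all hypothetical and chosen only for illustration.

```python
def brightness_features(image, dark_below=85, light_above=170):
    """Summarize a grayscale image (rows of 0-255 values) as two proportions.

    The cut-off values are illustrative assumptions, not clinical thresholds.
    """
    pixels = [value for row in image for value in row]
    total = len(pixels)
    dark = sum(1 for value in pixels if value < dark_below)
    light = sum(1 for value in pixels if value > light_above)
    return {"dark_fraction": dark / total, "light_fraction": light / total}


# Hypothetical 2 x 4 "image"; a real x-ray would have millions of pixels.
tiny_image = [
    [10, 20, 200, 210],
    [15, 120, 130, 220],
]

features = brightness_features(tiny_image)

# A training pair links the features to a known outcome (label is made up).
training_pair = (features, "no finding")
print(training_pair)
```

Many such pairs, one per image, would make up the training data set that the learning algorithm consumes.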
(Neil Padgett): Data and machine learning are the core of any AI tool
At the core of the “tool” is the machine learning model, which is produced or “trained” by the learning algorithm based on the training data set. Essentially, the data set combined with the machine learning techniques is what yields the tool. It’s very important to consider who owns the rights to the data.
2. Rights to data: AI tools and data ownership
Question: What about rights to data, and data ownership in an AI tool?
(Patrick Roszell): Data ownership and rights to data used in AI tools is a key legal issue
In industrial settings, AI tools may be built on top of data sets that are licensed in, or existing systems that generate or collect data for other purposes. Often, data may be sourced from, collected or generated by some kind of specialized system.
For example, data may be generated by process control or automation software or hardware, or by sensors installed for such controls. Sometimes, third-party technology may be involved in this data generation or collection, which raises
the possibility that third parties may have rights in the data.
In the context of our example, consider that at least some of the imaging databases come from hospitals that have contracts with a particular supplier of imaging machines and associated software. Those contracts may include contracts of sale for
the physical hardware, and license agreements, software as service agreements or similar agreements that touch on software used to operate the imaging machines and the associated data.
(Alice Tseng): Personal information
With current privacy laws, organizations can typically only use personal information (PI) for the purposes for which the individual consented, unless otherwise permitted. However, an organization may want to use PI collected for
one purpose for a secondary purpose.
In healthcare (e.g., x-rays taken for healthcare purposes sold to a third party to develop AI software for commercial purposes), questions include: can PI be used for a new purpose when consent was originally obtained for a different purpose,
or can the information simply be de-identified and used without express consent?
The analysis depends on various factors, including:
- Which law applies – there are many provincial privacy laws governing personal health information held by a health information custodian, or PI held by public bodies (e.g., hospitals).
- What does the privacy consent originally agreed to say? Does it consider de-identification or anonymization of PI at all (do you already have de-identified information, or are you dealing with the PI in the form originally
collected)? If the information was de-identified for one purpose (e.g., for safety review purposes), then even without specific consent to use the de-identified information for AI purposes, it may be reasonable to use the already
de-identified information for AI purposes too, as long as the consent did not affirmatively say that de-identified information would not be used for purposes not already specified
- How will de-identified information be used? To improve the quality of care for your hospital’s patients or to provide to law enforcement?
- Are you a service provider or an organization with control of the PI? A service provider is typically only allowed to collect, use and disclose PI as set out in the contract between it and the organization. Even if the organization is permitted from a privacy perspective to de-identify PI and use it for AI purposes, whether the service provider also has this right must be considered from a contractual perspective.
These are only some of the considerations regarding whether, from a privacy perspective, you can use PI (which was originally collected for one purpose) for AI purposes without seeking consent again.
Current legal landscape for Personal Information in Canada
This past summer, Bill C-27, the Digital Charter Implementation Act, 2022, was tabled (1st reading June 16, 2022), re-introducing the Consumer Privacy Protection Act along with the new AI and Data Act.
Section 20 of the proposed Consumer Privacy Protection Act states: “An organization may use an individual’s personal information without their knowledge or consent to de-identify the information.”
Policy reasons to permit de-identification without consent include:
- AI requires large volumes of data – huge administrative burden to seek consent, including obtaining updated contact information or if patient is inaccessible (e.g., elderly or has died), etc.
- If consent is obtained, the data may be skewed because individuals who consent tend to have different characteristics than those who do not consent (e.g., disease severity, age, education, race, etc.).
Overall, the government is leaning towards making it easier to de-identify PI to be used for AI purposes.
(Patrick Roszell): Contractual data use restrictions and ownership issues to consider and watch out for
It's very possible that the rights of technology suppliers may be implicated in datasets. As a practical matter, the likelihood of these rights coming into play may depend on the specialization or intelligence of the products used for data acquisition, as well as the method by which the data is stored.
For instance, in the imaging example, a supplier of imaging devices may have its own special image processing to clean up images acquired using its machines and may be concerned about how that processed data is used. Therefore, it is not uncommon for vendors of specialized equipment to assert some measure of control over data collected using the equipment. Another possibility is that the imaging devices are operated using specialized software which is itself subject to restrictions – if data is generated by the software, it raises a possibility that “use” of the data is akin to “use” of the software and therefore subject to the same restrictions. Contrast that with something simpler – such as an off-the-shelf digital thermometer that simply outputs a stream of temperature values. The vendor may not care what is done with those temperature values.
So, an important thing to look for in contracts, and software contracts in particular, is what they say about who owns the data and what restrictions they place on the ways in which the data can be used. This can vary depending on the parties involved and range from favourable to the technology vendor (e.g., the technology vendor owns the data, and the customer has only a license for limited use) to favourable to the end user (e.g., the user owns all data). Whichever side of the relationship your organization is on, it is important to consider whether these provisions suit your needs.
Common terms are limitations on external use; limitations of use to specific purposes; prohibitions on commercial use; and prohibitions on creating competing or derivative products. It is therefore important to consider where and how the AI tool will be offered and used. Also be aware that the usage and purposes of the AI tool may be very different from the uses originally envisioned for the underlying data, which may be a potential source of problems.
In relation to our example, a prohibition on creating derivative products of images, or limitation to internal use may not pose an issue for clinical use of the images by physicians and may not trouble a hospital contracting with an imaging supplier. However, those limitations could be much more problematic if the data is to be used to create a competing AI product.
There are also rights that apply not to the data itself, but to the products derived from the data or associated IP rights.
Often contracts relating to data may refer to derivative works or improvements and either prohibit development of such works or define ownership. In the context of our example, you could imagine a contract for providing clinical imaging systems or software which provides that derivative products, improvements or IP produced using the data are to be owned by the vendor. When the contract is negotiated, derivative products may be envisioned as covering things such as enhanced versions of images, and not independent software tools. However, if the images are used to build a software tool that uses AI, the AI model itself could be considered a derivative work and may be subject to limitations or ownership provisions. In the case of AI, the algorithm and the tool that is ultimately produced are derived from the data, and in a sense may be seen as more tightly coupled to the data they use than in some other more traditional examples of software tools.
3. How to protect against data use issues in future AI projects
Question: There are various pitfalls and risks that can stem from contracts that aren’t ‘forward-looking’ or don’t strongly consider future uses of data. What are some of the steps that in-house counsel can take to protect against improper data use and avoid these ownership issues for future AI projects?
(Patrick Roszell): Taking a broad view of how data might ultimately be used within your organization is helpful. It’s important to recognize the data itself as an asset and securing your freedom to use it as it fits your organization down the road is worthwhile.
First, organizations are more aware of AI and the associated concerns with data. That awareness is greater than it was even a few years ago, and it is an excellent first step to preventing issues with data use.
Ultimately, you want to secure broad rights to data, and avoid significant restrictions. This tends to be more easily done at the outset of a relationship, as opposed to later, when new applications have been created and value crystallized.
Operational questions may also arise. You may need to keep track of the data you have within your organization, which data is subject to the rights of others and which isn’t, and how to maintain separation between the two (how to keep track of which is which). If data is restricted, maintaining a separation and not mixing it with less encumbered data will help prevent data issues in the future.
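One way an engineering team might support this kind of tracking is a simple registry that records, for each data set, where it came from and which contractual restrictions attach to it. The sketch below is purely illustrative: the record fields, the restriction labels (loosely modelled on the common contract terms mentioned earlier) and the data set names are all hypothetical, and a check like this is no substitute for reading the underlying agreements.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a hypothetical data inventory."""
    name: str
    source: str
    # Uses that the governing contract prohibits (labels are assumptions).
    restrictions: frozenset = field(default_factory=frozenset)


def blocked_uses(records, intended_uses):
    """Map each data set name to the intended uses its restrictions prohibit."""
    conflicts = {}
    for record in records:
        clash = sorted(record.restrictions & set(intended_uses))
        if clash:
            conflicts[record.name] = clash
    return conflicts


registry = [
    DatasetRecord(
        "vendor_processed_images",
        "imaging supplier contract",
        frozenset({"commercial_use", "derivative_products"}),
    ),
    DatasetRecord("in_house_sensor_logs", "collected internally"),
]

# Before starting a project, compare its intended uses with the registry.
conflicts = blocked_uses(registry, {"commercial_use", "internal_research"})
print(conflicts)
```

Keeping restricted and unrestricted data sets as separate registry entries (and separate storage locations) makes it easier to show later which data went into which model.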
(Alice Tseng): Organizations should consider ensuring there is a broad right to de-identify information (ideally, do not use restrictive language, such as saying information will be de-identified “for safety purposes” only).
(Neil Padgett): Training a machine explicitly to do its job
In most cases (e.g., to some engineers building systems), AI algorithms are a black box: training data is plugged in and a model can make predictions or recommendations when given future data inputs. Generally, algorithms do this by inferring relationships between input training data and intended outputs. However, the basis for the resulting model is not explicit or is not easily understood by a human operator and this can raise some issues.
For example, generalization – how well does the model perform for previously-unseen input data? Error in generalization can creep in many ways. A model may not perform well because of a lack of diversity in a training set: a model trained on
images of different breeds of dogs probably won’t perform well when faced with a picture of a cat. Or, even if there was an appropriate training set, the training of the model may have been done in a way that resulted in what is called overfitting,
where the model may perform very well on the training data but can perform poorly on previously unseen input data.
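The overfitting effect can be shown with a toy example. In the sketch below, one "model" simply memorizes every training example, while another learns a single cut-off value. The data points are made up, and one training label is deliberately wrong to stand in for noise; none of this comes from the panel discussion.

```python
def nearest_neighbour_model(training):
    """Memorize every training example; predict the label of the closest one."""
    def predict(x):
        return min(training, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def threshold_model(training):
    """Learn one cut-off: the midpoint between the two class averages."""
    zeros = [x for x, label in training if label == 0]
    ones = [x for x, label in training if label == 1]
    cutoff = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: 1 if x >= cutoff else 0


def accuracy(model, data):
    """Fraction of (input, label) pairs the model gets right."""
    return sum(model(x) == label for x, label in data) / len(data)


# Hypothetical data: the true pattern is "label 1 when x is 5 or more".
# The pair (2, 1) is deliberately mislabelled to simulate noisy training data.
training = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
unseen = [(1.6, 0), (2.2, 0), (3.4, 0), (6.4, 1), (8.6, 1)]

memorizer = nearest_neighbour_model(training)
simple = threshold_model(training)

print("memorizer: train", accuracy(memorizer, training), "unseen", accuracy(memorizer, unseen))
print("threshold: train", accuracy(simple, training), "unseen", accuracy(simple, unseen))
```

The memorizing model scores perfectly on its own training data because it has also learned the mislabelled point, and that is exactly what hurts it on the unseen inputs near that point. The simpler model gets the noisy training point "wrong" but generalizes better.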
Beyond issues related to the performance of the model, a model that performs well in terms of achieving the outputs it was trained for based on the inputs it was trained with may still be considered flawed from a legal perspective because that training
data might, in effect, have trained the model to reproduce bias in the input data set. Consider, for example, a training set of mortgage loan underwriting decisions. There's a risk that the model trained on that data might reproduce
the bias of the human underwriters. Or, in the previous example, if there were biases in the radiologists' judgment and you use that data to build your training set, then your AI tool might reproduce the same bias in its outputs.
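A first, rough check for this kind of inherited bias is to compare outcome rates across groups in the historical decisions before they are used as training labels. The sketch below assumes hypothetical group names and made-up decisions, and the ratio it computes is just one simple metric chosen for illustration. A gap in rates does not by itself establish unjustified bias, and equal rates do not rule it out.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approvals per group from (group, approved) pairs."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        approved[group] += int(was_approved)
    return {group: approved[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group rate divided by highest; 1.0 means equal rates."""
    return min(rates.values()) / max(rates.values())


# Hypothetical historical underwriting decisions used as training labels.
historical_decisions = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = approval_rates(historical_decisions)
ratio = rate_ratio(rates)
print(rates, round(ratio, 2))
```

The same comparison can be run on a trained model's outputs: if the gap in the historical data reappears in the predictions, the model is likely reproducing the pattern it was trained on.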
(Alice Tseng): How bias can impact results generated by AI tools
A goal of AI in healthcare is to ensure medical treatment is as objective and accurate as possible. This is difficult to achieve if the data used to train the AI model is biased. An AI model can only be as good as the data it is fed. Different types
of AI bias include racial bias, gender bias, bias in linguistics, socioeconomic and geographic bias, for example.
An example, from a non-healthcare perspective, involves Amazon. Amazon attempted to create its own recruiting tool (i.e., the AI tool would identify top candidates from a pool of resumes). The training data set was resumes sent to Amazon
in the last 10 years, but since these resumes were disproportionately male (given the prevalence of men in the tech industry generally), the AI tool was found to disadvantage women. For example, resumes saying the candidate studied at women-only
universities (e.g., Wellesley), or was captain of women’s chess club were disadvantaged. [Please see the video recording for Alice’s discussion of 3 other examples of bias.]
The proposed AI and Data Act states that organizations that are responsible for an AI system are required to determine if it is a “high-impact system” and, if so, must establish measures to identify, assess and mitigate the risk of
harm and biased output (which refers to situations when individuals could be adversely affected, without justification, from a human rights perspective).
What are ways to minimize bias? Be aware of the potential for bias and try to ensure data sets used are as complete as possible and assumptions are as accurate as possible (e.g., health costs incurred by an individual in the past being used as a proxy
for their medical need, as described in the first example in the video recording). AI is in its infancy and hopefully progress will be made on the issue of bias as AI is better understood. Increasing transparency will help to address/correct AI
bias (e.g., in the first example in video recording, the flawed assumption was only realized because the company shared its algorithm with researchers). Finally, increased diversity among those who help develop AI tools will increase awareness
of issues with bias in AI since those negatively impacted are often able to identify problems related to bias earliest.
(Neil Padgett): Consequences of relying on a biased model
The consequences of relying on a biased model will depend on how much risk is associated with the model reproducing the bias or having other errors.
Determine what you are using the AI for. Is the training data diverse, or is it drawn from a limited set of circumstances? Is the AI highlighting a diagnosis? Is there a risk of bias in the input set? If so, what types of bias, and what steps have been taken to mitigate them? How often
will you update the model once the AI is deployed? These are all things that can help measure the quality of the model, which in turn translates into risk. A higher-quality model may have a lower risk.
You can also ask questions about the model. Will it get things wrong? What happens if it does? Is there liability or reputational risk? In some circumstances, there is a higher impact if the AI makes certain types of errors. In other circumstances,
even if the AI makes an error, the impact may be lower.
(Alice Tseng): Opacity with respect to AI can be considered in two ways
- lack of awareness that AI is being used to make a decision; and
- if we know AI is being used, how did the decision arise, what data and algorithm were used to make that decision?
Currently, there is no law in Canada requiring transparency when an AI system is being used, but there are proposed laws to change that. Under both the proposed Consumer Privacy Protection Act and the AI and Data Act, there are transparency requirements.
As part of the principle of Openness and Transparency, the proposed Consumer Privacy Protection Act imposes disclosure obligations regarding an organization’s use of any “automated decision system” to make predictions,
recommendations or decisions about an individual which could have a “significant impact” on them, similar to how privacy legislation already requires organizations to disclose information about how they handle an individual’s
personal information (e.g., privacy policies). The proposed Act also allows individuals to make requests to organizations for an explanation of predictions, recommendations or decisions made about them using an “automated decision system”
which could have a significant impact on them, similar to how current privacy law allows individuals to make requests to organizations regarding how their personal information has been handled.
Under the AI and Data Act, persons who manage the operation of (as well as make available for use) a “high-impact system” (not defined) must publish on a publicly available website certain information, including how the system
is used, as well as the types of content the system generates and the decisions, recommendations or predictions it makes.
(Alice Tseng): What kinds of liability might an organization face associated with the use of AI?
Under the proposed AI and Data Act, liability can be quite high - $10 million or 3% of global revenue in penalties. There are three types of provisions in the proposed AI and Data Act which can result in an even more severe penalty
(e.g., $25 million or 5% of global revenue in penalties). Same as GDPR in linking fine to a percentage of global revenue.
(Alice Tseng): What other types of liability exist with AI?
Misdiagnosis or delayed diagnosis (e.g., through poor triaging of patients) can attract liability. If the AI tool prioritizes patients in a discriminatory manner, there can be a liability. Professional liability outside of the medical context is also
possible, such as a lawyer who only uses an AI tool to predict litigation success or for due diligence purposes but the AI tool is not sufficiently accurate. AI tools in consumer products like self-driving cars may attract liability such as property
damage or personal injury in the event of accident. Liability may also be found in the form of economic loss (e.g., an insurance company relying on an AI tool provides incorrect quotes) and reputational loss (e.g., Microsoft chatbot tweeted racist
and misogynistic comments).
Question: When there is a liability, who is responsible? Identifying and allocating legal liability in AI faces many challenges
Second, causality is complicated by the various contributors typically involved in the creation and operation of an AI system, including data providers, developers, programmers, users, and the AI system itself. This is the multi-stakeholder problem: it is difficult
to determine which stakeholder or stakeholders to hold responsible.
(Alice Tseng): Liability arising from failure to follow best practices and industry standards
Legal liability will be influenced by best practices and industry standards. Was there informed consent by patients for use of the AI? If the use of AI is standard of care, is the standard to use AI to assist MDs with medical diagnosis or to replace
MDs? Liability ironically may sometimes be greater for MD if MD maintains discretion in the decision-making process. Can or should an MD go against an AI recommendation, including ignoring a recommendation that a patient has lung cancer and should
be treated? Conversely, if the use of AI in the practice of medicine becomes standard of care, MDs could be liable if they don’t use AI.
One reason why managing AI risk is so important – there are few insurance products which currently cover AI risk since it’s challenging for insurers to assess risk given the lack of experience with loss due to AI. Similar to cybersecurity
insurance eons ago.
4. Managing IP Rights in AI
Question: What kinds of IP issues can subsist and arise in AI?
(Patrick Roszell): Patent rights on the underlying technology, data and methods are key
As with any kind of technology, patent rights are important. One thing that is unique in the context of AI is that there are a number of different aspects of a particular AI system that could be protected with patents.
One can think about AI systems in terms of three stages or sets of functions: data acquisition, data processing and data output. In a given case, there could be patentable subject matter in any one of those sets of functions or in a combination of those sets of functions. The development of AI tools is generally premised on assembling high-quality, ideally large data sets. In some cases, there may be inventive subject matter in how the data sets are assembled.
Sometimes, a new type of data may be what enables the use of AI – like a new type of imaging. Other times, existing data has to be transformed in order to enable effective AI treatment. For example, images may be translated into histograms showing
pixel brightness data, or a signal trace may be converted to frequency domain.
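The two transformations mentioned above can be sketched in a few lines. The code below is an illustrative assumption rather than anyone's patented method: it counts pixel brightness values into a small number of bins, and it applies a naive discrete Fourier transform to convert a signal trace into frequency-domain magnitudes. The sample image and signal are made up.

```python
import cmath
import math


def brightness_histogram(image, bins=4):
    """Count pixels (values 0-255) falling into equal-width brightness bins."""
    counts = [0] * bins
    width = 256 / bins
    for row in image:
        for value in row:
            counts[min(int(value // width), bins - 1)] += 1
    return counts


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns one magnitude per frequency bin."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


# Hypothetical 2 x 3 grayscale image.
image = [[0, 60, 70], [130, 200, 255]]
histogram = brightness_histogram(image)

# Hypothetical signal trace: a sine wave completing 2 cycles over 16 samples.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
magnitudes = dft_magnitudes(signal)
peak_bin = max(range(len(signal) // 2), key=lambda k: magnitudes[k])

print(histogram, peak_bin)
```

In both cases the raw data is replaced by a compact summary (bin counts, or the dominant frequency) that a learning algorithm can work with more easily than the original pixels or samples.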
Data processing might mean advances in how the actual model is constructed. For example, the algorithms used to produce the model may be new, or there may be inventiveness in the features defined to characterize the data set. Output might mean advances
to broader products or processes that result from the use of AI. For example, there may be artifacts of AI use in a product, or the steps of a process may be changed.
There are lots of ways AI or AI-adjacent technology can be protected with patents and in turn, there are also potentially many aspects that could be subject to third-party patent rights. Companies that are working on or with AI tend to be aware that
they are breaking new ground and tend to be working to patent what they can.
(Neil Padgett): Challenges to protecting AI with patents
In many countries, patents cannot be obtained for mere abstract ideas, although there are subtle and not-so-subtle differences in the way we draw the lines.
In Canada, the law requires that a patent be directed to a practical application of an abstract idea, rather than an abstract idea itself. There are several ways of establishing a practical application in a given circumstance. For example, one way
to establish physicality (a discernible physical effect or change) is by describing a given application of AI in detail: how it works, how it is used, how the data is retrieved, the problem you're using it to solve, and how each of
these may interface with the physical world.
Patent offices may generally take the view that simply applying standard machine learning techniques to a problem is not enough for a patent. What this means is that a patent application related to machine learning is going to need to include a lot of
good detail in order to satisfy the traditional requirements of patentability like non-obviousness and enablement. How do you generate the training data set? Is there any nuance in that? What about how you generate the trained model? How is the
model used? How does it fit into the overall system that provides the practical application? This means patent applications for machine learning-related inventions cannot be prepared “on the cheap”: you will need to make detailed disclosure
and this costs in terms of both time and agency fees.
This also means you cannot necessarily hold back when you are filing a patent application related to machine learning and so you may need to consider whether you should be filing a patent application and making detailed disclosure, or would it be
better to maintain the invention as a trade secret?
(Neil Padgett): Considerations for deciding whether to patent or keep a trade secret
First, a threshold question may be whether your system is likely to be amenable to patenting. If not, then trade secret may be the best option you’ve got.
Second, even if you could get a patent, you need to ask yourself, how broad a scope of protection will you be able to obtain? If you get a patent, will you be able to determine if competitors infringe?
Making this kind of assessment generally requires a detailed understanding of the invention and how it relates to the previous state of the art, from data acquisition all the way to how AI recommendations are used as well as your business objectives
and the competitive landscape.
5. Conclusion: Advice to in-house counsel dealing with IP and compliance risks in AI and a digital world
There are many legal and compliance issues and IP risks to be considered when developing an AI tool and this article has touched on many of the common themes involved, including:
- Understand the implications of developing and implementing an AI tool
- Understand the use of data from a privacy perspective and whether you have the right to use the data (e.g., in a training data set)
- Keep track of the personal information you collect and the privacy policies/consents you have
- Consider and address potential bias issues in the proposed data set and algorithm
- Look ahead to issues which may arise in future AI projects
It’s critical for in-house counsel to engage proactively with their business teams and ensure they are aware of the risks that could apply. It’s also important for in-house teams to seek outside advice from experts in technology and IP
law, regulatory compliance and data privacy at the outset of new initiatives involving digital technologies and AI.
The preceding is intended as a timely update on Canadian intellectual property and technology law. The content is informational only and does not constitute legal or professional advice. To obtain such advice, please communicate with our offices directly.
1. 6(5) This Act does not apply in respect of personal information that has been anonymized.