6 best practices to reduce bias in AI

Data collection and selection, input from business experts, correction methods and more: an overview of the ways to limit bias in machine learning and deep learning models.

Data science experts know it well: bias is inherent in any processing of statistical data. Like human cognitive biases, biases in AI take many forms and can have more or less serious effects. Some cases have made headlines. The first that comes to mind is Amazon’s recruiting tool, which discriminated against women until it was shut down in 2018. Trained on the e-commerce giant’s hiring history, in which mostly men had been recruited, the AI concluded that it should favor male candidates. AI can also draw unfortunate correlations. Gregor Martinon, trusted AI scientific lead at Quantmetry, cites the case of patients arriving at a hospital emergency department, where an AI performed triage in place of the triage physician. “This attempt at automation produced counter-intuitive results,” he observes. Learning from historical data, the AI recommended that asthmatic patients with pneumonia could wait longer.

Beyond these easily identifiable failures, experts point to more “subtle” but deep-rooted biases. “If you type ‘CEO’ into Google Images, 85% of the photos are of men. The search engine can argue that men are indeed over-represented among senior executives. But this only fossilizes a socially unacceptable situation,” says Gregor Martinon.

1. Be aware of the existence of multiple biases

Evaluation, exclusion, consolidation, presentation, confirmation, attribution bias… According to a post published by KPMG France, some 180 biases that distort our judgment have been identified to date. They can arise during data collection and selection, then during the training of the algorithm and finally, once it is in production, in the interpretation of its results. The first step toward not falling victim to them is to be fully aware of their existence and their diversity. Broadly speaking, Gregor Martinon identifies three major families: historical bias, which reproduces past biases (Amazon); causal bias, which establishes unfortunate relationships without any causal reasoning (the asthmatic patients); and representation bias, which stems from a lack of diversity in the dataset (Google Images).

2. Be careful about data acquisition

Input data is the first source of bias, so particular care must be taken when collecting it. “The dataset used to train the model must be representative of the target population, for example by not forgetting to include ethnic minorities in proportions that reflect humanity as a whole,” Gregor Martinon points out. “Otherwise, we perpetuate existing biases.” A simple answer would be to say that “the more data there is, the more accurate the signal,” but in his view this only goes so far. “The image would indeed be sharper, but the perspective would remain the same.”
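The representativeness check described above can be approximated by comparing each group’s share in the training data with its share in the target population. The Python sketch below assumes the target shares are already known (for example from census figures); the function names `representation_gaps` and `underrepresented`, the 5-point tolerance and the data are hypothetical choices for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, target_shares):
    """Share of each group in the training sample minus its target share.

    sample_groups: one group label per training example.
    target_shares: group label -> expected share in the target population
        (an assumption supplied by the user, e.g. from census figures).
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    return {
        group: round(counts.get(group, 0) / total - share, 4)
        for group, share in target_shares.items()
    }


def underrepresented(gaps, tolerance=0.05):
    """Groups whose sample share falls short of the target by more than `tolerance`."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Hypothetical sample: 80 examples from group "A", 20 from group "B",
# checked against an assumed 50/50 target population.
sample = ["A"] * 80 + ["B"] * 20
gaps = representation_gaps(sample, {"A": 0.5, "B": 0.5})
print(gaps)                    # {'A': 0.3, 'B': -0.3}
print(underrepresented(gaps))  # ['B']
```

This only checks proportions. As the quote above notes, adding more data from the same skewed source would leave these gaps unchanged.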

According to him, the acquisition process therefore needs a 360-degree perspective. How was the data obtained? Is it representative? “Today, most image and text processing models are trained on poorly cleaned datasets scraped from the Internet, with all the biases they contain,” laments Gregor Martinon. For the expert, the point is to pay close attention to sensitive variables such as gender, ethnic origin or place of residence. “Based on a level of education or a place of residence, it is unfortunately possible to infer an ethnic origin and recommend products or job offers according to a specific sociotype. LinkedIn, for example, has a location proxy that differentiates its job offers.”
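One simple way to screen for the kind of proxy mentioned above (a place of residence standing in for ethnic origin) is to check how well a candidate feature predicts a sensitive attribute on its own. The sketch below is a hypothetical illustration that predicts the sensitive attribute by majority vote within each value of the feature. More formal measures such as mutual information or Cramér’s V serve the same purpose. The function name and data are assumptions.

```python
from collections import Counter, defaultdict


def proxy_predictability(proxy_values, sensitive_values):
    """Estimate how much a candidate feature reveals about a sensitive attribute.

    For each value of the proxy, predict the sensitive attribute's most
    frequent value in that slice; return (accuracy, baseline), where the
    baseline always predicts the overall most frequent sensitive value.
    Computed in-sample, so high-cardinality features (e.g. fine-grained
    addresses) will look more predictive than they really are.
    """
    slices = defaultdict(Counter)
    for proxy, sensitive in zip(proxy_values, sensitive_values):
        slices[proxy][sensitive] += 1
    n = len(sensitive_values)
    correct = sum(counter.most_common(1)[0][1] for counter in slices.values())
    baseline = Counter(sensitive_values).most_common(1)[0][1] / n
    return correct / n, baseline


# Hypothetical data: a coarse "area" feature and a sensitive group label.
area = ["zone_1"] * 4 + ["zone_2"] * 4
group = ["g1", "g1", "g1", "g2", "g2", "g2", "g2", "g1"]
accuracy, baseline = proxy_predictability(area, group)
print(accuracy, baseline)  # 0.75 0.5
```

A clear gap between accuracy and baseline, as here, suggests the feature carries information about the sensitive attribute and deserves scrutiny; in practice it should be measured on held-out data.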

3. Measure the effect of bias

Once we have representative data, we can start modeling. At this stage, it is appropriate to define a metric to measure bias and detect any deviation.
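The article does not name a specific metric. Two commonly used ones compare how often a model makes a positive decision for each group: the demographic parity difference (the gap between the highest and lowest selection rates) and the disparate impact ratio (the lowest rate divided by the highest). A minimal Python sketch, with hypothetical function names and data:

```python
def selection_rates(predictions, groups):
    """Share of positive decisions (prediction == 1) within each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {group: positives[group] / totals[group] for group in totals}


def parity_metrics(predictions, groups):
    """Demographic parity difference and disparate impact ratio across groups."""
    rates = selection_rates(predictions, groups)
    lowest, highest = min(rates.values()), max(rates.values())
    return {
        # 0 means every group is selected at the same rate
        "demographic_parity_difference": highest - lowest,
        # 1 means parity; lower values mean the least-selected group lags behind
        "disparate_impact_ratio": lowest / highest if highest > 0 else float("nan"),
    }


# Hypothetical shortlisting decisions (1 = shortlisted) for ten candidates.
preds = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
groups = ["man"] * 5 + ["woman"] * 5
print(selection_rates(preds, groups))  # {'man': 0.8, 'woman': 0.4}
print(parity_metrics(preds, groups))
```

A ratio of 0.5, as here, would fall well below the 0.8 threshold of the US “four-fifths rule”, which is often used as a reference point. Which metric fits depends on the use case; libraries such as Fairlearn provide tested implementations.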

There are interpretability methods such as the graphical approaches ICE (individual conditional expectation) and PDP (partial dependence plot), which visually establish the relationship between a variable and the predicted outcome. Other approaches, such as LIME (local interpretable model-agnostic explanations) and SHAP (SHapley Additive exPlanations), estimate the effect on the results of modifying one or more of these variables. A post by the firm Avisia helps in choosing the most suitable method. MLOps platforms such as France’s Dataiku or the US-based DataRobot integrate these methods. Hyperscalers also offer their own bias detection solutions as cloud services, as Amazon Web Services does with SageMaker Clarify. Microsoft and Facebook offer open source tools, InterpretML and Fairness Flow respectively.
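To make the ICE/PDP distinction concrete: an ICE curve tracks one individual’s prediction as a single feature is varied, and the PDP is the average of those curves. The Python sketch below computes both by hand for any model given as a function; `toy_model`, the feature names and the data are hypothetical. In practice, scikit-learn’s `sklearn.inspection.partial_dependence` covers this.

```python
def ice_curves(model, rows, feature, grid):
    """ICE: one curve per individual, predicting while `feature` sweeps `grid`
    and every other feature keeps that individual's value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)  # copy so the original row is untouched
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """PDP: the average of the ICE curves at each grid value."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(curve[i] for curve in curves) / len(curves) for i in range(len(grid))]


# Hypothetical model in which age only matters when income exceeds 10.
def toy_model(x):
    age_effect = 0.1 * x["age"] if x["income"] > 10 else 0.0
    return 0.5 * x["income"] + age_effect


people = [{"age": 30, "income": 20}, {"age": 30, "income": 5}]
grid = [20, 40]
print(ice_curves(toy_model, people, "age", grid))          # [[12.0, 14.0], [2.5, 2.5]]
print(partial_dependence(toy_model, people, "age", grid))  # [7.25, 8.25]
```

One ICE curve rises while the other stays flat, a difference the averaged PDP blurs; this is why the two views are often plotted together.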

Once biases have been identified, which correction method should be adopted? Gregor Martinon takes the example of a recruitment strategy that aims to hire as many women as men. “To correct a representation bias, the weight given to female candidates can be increased in order to reach parity,” the expert stresses.
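The article does not say how the overweighting is done. One common approach is inverse-frequency reweighting: each training example receives a weight inversely proportional to the size of its group, so that each group contributes equally to training. A minimal Python sketch, with a hypothetical function name and data:

```python
from collections import Counter


def balancing_weights(groups):
    """Per-example weights giving every group the same total weight.

    An example from group g gets n / (k * n_g), where n is the number of
    examples, k the number of groups and n_g the size of group g, so
    under-represented groups are up-weighted.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


# Hypothetical training set: 75 male and 25 female candidates.
groups = ["man"] * 75 + ["woman"] * 25
weights = balancing_weights(groups)
print(weights[0], weights[-1])  # 0.6666666666666666 2.0
```

Many scikit-learn estimators accept such weights through the `sample_weight` argument of `fit`. Reweighting balances each group’s influence on training but does not by itself guarantee balanced decisions, so the bias metric should be measured again afterwards.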

4. Choose explainability

The fight against bias relates to the concept of AI explainability, one of the seven requirements defined by the European Commission for so-called trustworthy AI. Gregor Martinon believes that “all the predictions of an AI must be explainable.” A bank is required, for example, to explain its criteria for granting or refusing credit. “An adviser must be able to explain to Mrs. Michu that her loan was refused because of her age, her level of debt or her insufficient income.”

In this quest for explainable AI, or XAI, the choice of algorithm is decisive. Business rules, decision trees, linear regression and expert systems offer easily interpretable results. However, they are not suited to complex uses such as image recognition. Neural networks then take over, but at the cost of great opacity.
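The credit example shows why linear models are easy to explain: the score is a sum of per-feature contributions, so the factors that pushed an application below the threshold can be read off directly. The Python sketch below uses hypothetical features, coefficients and a hypothetical threshold; it is not a real scoring model.

```python
# Hypothetical linear credit score: coefficients are illustrative only,
# not taken from any real bank or model.
WEIGHTS = {"income_k": 0.04, "debt_ratio": -3.0, "age": 0.01}
INTERCEPT = -1.0
THRESHOLD = 0.0


def score_with_explanation(applicant):
    """Score an applicant and list each feature's contribution to the score."""
    contributions = {f: w * applicant[f] for f, w in WEIGHTS.items()}
    score = INTERCEPT + sum(contributions.values())
    decision = "granted" if score >= THRESHOLD else "refused"
    # Most negative contributions first: these are the main reasons for a refusal
    reasons = sorted(contributions.items(), key=lambda item: item[1])
    return decision, score, reasons


decision, score, reasons = score_with_explanation(
    {"income_k": 30, "debt_ratio": 0.5, "age": 40}
)
print(decision)  # refused
for feature, contribution in reasons:
    print(f"{feature}: {contribution:+.2f}")
# debt_ratio: -1.50
# age: +0.40
# income_k: +1.20
```

A neural network offers no such direct reading, which is where post hoc methods like LIME or SHAP come in.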

To avoid the “black box” effect, a balance must be struck between performance and transparency. In a white paper titled “Artificial intelligence, stay in control of your future!”, the IT services company Business and Decision proposes selecting algorithms from a comparison table based on criteria of interpretability and risk of bias.

5. Involve business experts at all levels

The best way to reduce bias is to keep constant human oversight of the model. KPMG France partner Romain Lamot stresses the importance of involving end users. “Business experts must be kept in the loop throughout the life cycle of an AI project, from defining the scope of use and designing the model to monitoring it over time,” he explains.

Beyond their business acumen, these non-specialists in AI also have the advantage of viewing the results with more distance. Their involvement is all the more important as the democratization of AI and citizen development gain ground. “Business users will adopt pre-built AI models,” observes Frederick Como. “This requires an initial phase of awareness-raising and training so that they can develop a critical attitude.”

Gregor Martinon extends this awareness effort to end users. “A bank adviser, doctor or police officer assisted by an AI must be aware of its limitations and be able to question its decisions.”

6. Raise awareness among teams of data scientists

Let’s start with the first people concerned in the hunt for bias: the data science team. “Ethics is not always part of engineering school curricula, and AI specialists are sometimes trained on the job,” laments Gregor Martinon. “Above all, they were educated in a culture of performance.” He also raises the issue of representation within data science teams. The stereotypical data scientist is a white man between 25 and 35 years old. As a result, some questions of equality or of the inclusion of ethnic minorities may be overlooked. “Data or uses that are harmless to some can be traumatic for others,” the expert points out.

That said, awareness initiatives are multiplying. Associations such as Women in AI and Black in AI are working to diversify the data science community. The Impact AI collective, which brings together companies, IT services firms, specialized players and start-ups, is committed to ethical and responsible artificial intelligence. We can also cite the Data for Good association and its Hippocratic oath for data scientists, or the Labelia label for responsible and trusted AI.
