December 29, 2019 Data Science

Brief recap of the Open Data Science Conference – San Francisco, October 2019

On AI ROI: the questions you need to be asking – Kerstin Frailey, Metis

Success is unpredictable in AI – feasibility is often unknown before a project begins. Projects are esoteric – they require highly specialized training. The application is new – methods for tracking ROI haven’t been adapted for AI, so managing AI is a challenge.

Performance is volatile, and the lifecycle is iterative. Feedback loops and responses to AI interventions speed up the expiration of data and its dependent models.

Targets are fuzzy – executives don’t have experience with AI projects so they can’t set clear expectations. Data scientists inevitably miss the intangible goals

Data science teams strive to achieve good data science, which doesn’t always translate to achieving business goals.

Data scientists are often not informed of the business strategy.

AI ROI is urgent – companies continue to double down on investments even without seeing ROI on their investments. Investments will dry up

Planning – what type of ROI should we expect? Some projects are explorations for gaining knowledge, some produce recommendations or inform decisions, and some are automated interventions, e.g. a fraud prevention project that intervenes to stop fraudulent transactions from going through. Identifying fraudulent transactions isn’t the impact – reducing the costs due to fraud is. Sanity check – is this a pure automation play? Could descriptive statistics do the job without AI? Does it just seem fun but promise no clear impact? Is it more cost-effective than pre-existing or vendor-based solutions? Ensure that the project is strategically positioned, and check whether the system can be leveraged elsewhere.
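To make the fraud example concrete, here is a minimal sketch of measuring ROI in terms of reduced fraud cost rather than the number of fraudulent transactions identified. All figures, field names, and the cost of a false positive are hypothetical assumptions for illustration, not numbers from the talk.

```python
from dataclasses import dataclass


@dataclass
class FraudModelOutcome:
    blocked_fraud_amount: float     # fraud losses avoided by blocking (assumed)
    false_positive_count: int       # legitimate transactions wrongly blocked (assumed)
    cost_per_false_positive: float  # assumed support/churn cost per wrongly blocked transaction
    project_cost: float             # assumed cost to build and run the system for the period


def fraud_roi(outcome: FraudModelOutcome) -> float:
    """Return ROI as (net benefit - project cost) / project cost."""
    net_benefit = (
        outcome.blocked_fraud_amount
        - outcome.false_positive_count * outcome.cost_per_false_positive
    )
    return (net_benefit - outcome.project_cost) / outcome.project_cost


# Hypothetical year: $500k of fraud blocked, 2,000 false positives at $25 each,
# and $300k spent on the project.
example = FraudModelOutcome(500_000.0, 2_000, 25.0, 300_000.0)
roi = fraud_roi(example)
print(f"ROI: {roi:.0%}")
```

A model that flags more fraud but blocks many legitimate customers can score worse on this measure than a more conservative one, which is the distinction between identifying fraud and reducing its cost.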

Development – what kind of errors do we prefer? When we compare models and present them to the business, we have to show we’ve thought about this. What kind of volatility can we handle? Volatility in performance is expected; ask your stakeholders what they can accept. Benchmark – simulate, and use ghost mode and a control group to see how performance compares. Sanity check – is the solution user-friendly? Can the solution be scaled to address the entirety of the problem? Are we building feedback loops? Will the data become stale quickly? Will we need to update this model often?
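One way to picture the ghost mode benchmark: the candidate model scores the same transactions as the current process, but its flags are only logged, never acted on. Afterwards the two sets of flags are compared by error type. This is a rough sketch with made-up labels and flags; the function name and data are assumptions.

```python
def error_counts(flags, labels):
    """Count both error types for a list of 0/1 flags against 0/1 true labels."""
    false_positives = sum(1 for f, y in zip(flags, labels) if f and not y)
    false_negatives = sum(1 for f, y in zip(flags, labels) if not f and y)
    return {"false_positives": false_positives, "false_negatives": false_negatives}


# Hypothetical log: 1 = fraud, 0 = legitimate.
labels = [1, 0, 0, 1, 0, 1, 0, 0]     # what actually happened
incumbent = [1, 0, 1, 0, 0, 0, 0, 1]  # flags from the process in production
shadow = [1, 0, 0, 1, 0, 0, 1, 0]     # flags from the candidate, logged only

incumbent_errors = error_counts(incumbent, labels)
shadow_errors = error_counts(shadow, labels)
print("incumbent:", incumbent_errors)
print("shadow:   ", shadow_errors)
```

Reporting false positives and false negatives separately, instead of a single accuracy figure, lets stakeholders say which kind of error they would rather live with.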

Deployment – is it performing close to expectations? Are business metrics moving as expected in response to the data science metrics? Is variability close to expectations?

Governance – Sanity check – are we monitoring feedback loops? Are we iterating to the point of overoptimizing? Are we duct-taping updates and iterations?

Peter Welinder, OpenAI – Learning Robot Dexterity

In order to interact with the world, we have to make contact with it. OpenAI wants to combine learning and manipulation to allow robots to carry out useful tasks. They decided to test this on a Rubik’s Cube – they can do this in a fraction of a second

Past – high robotics expertise

Future – all you need is learning

  1. Deep reinforcement learning – learning by trial and error; like teaching a dog. Drawback is that it takes a long time to get results
  2. Simulation to reality
  3. Cool results
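As a toy illustration of learning by trial and error, the sketch below uses tabular Q-learning on a five-cell corridor. This is a deliberately simplified stand-in, not deep reinforcement learning and not OpenAI's method; the environment, reward, and hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 5  # corridor positions 0..4; position 4 is the goal
ACTIONS = ("left", "right")


def step(state, action):
    """Move one cell; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            left, right = q[state]
            # Explore at random with probability epsilon, or when there is no preference yet.
            if rng.random() < epsilon or left == right:
                a = rng.randrange(2)
            else:
                a = 0 if left > right else 1
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q_table = train()
# Greedy action in each non-goal state after training.
policy = [ACTIONS[max(range(2), key=lambda a: q_table[s][a])] for s in range(N_STATES - 1)]
print(policy)
```

Even in this tiny world the agent has to stumble into the goal before it learns anything, which hints at why trial and error takes a long time on a real robot and why training in simulation is attractive.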

Harry Glaser, Sisense – sources of bias: strategies for tackling inherent bias in AI

AI judge developed by UCL computer scientists

AI could identify gang crimes and ignite an ethical firestorm

False positives are a societal issue here; when a machine treats a specific geography as “gang related”, the result can be severe punishments.

Another example is using facial recognition during TSA security checks. This was an issue for people of Asian descent. You have to build a team that represents the wider world that you plan to apply your model to; the broader and more diverse the team, the better the outcomes of the models.

AI unchallenged runs a strong risk of delivering immoral outcomes. It is your job and responsibility as a data professional to use your skills to be the moral compass of your organization and make it right.

Turning data teams into superheroes – Sisense – think about human outcomes

Google employees quit over controversial Pentagon work: targeting ads vs. targeting drone strikes. The company got to the right outcome because the engineers who developed the system took ownership and understood the difference between the outcomes.

How can you incorporate data ethically?

Holistic metrics – grade yourself on metrics you think match well with societal outcomes

Representative & diverse teams – key to building models with a positive outcome on society

Check your sources – get diverse sources and consider the bias in your data

Who do you report to?

Data professionals reporting directly to sales lean towards the biases of sales objectives. Centralized data teams are more likely to remove bias.

You need a Chief Data Officer who thinks holistically about the ethical use of data; the CDO is the conscience of the organization.

You are the conscience of AI – this is your responsibility

Building AI Products: delivery vs. discovery

Companies face challenges getting data science to work for them. The disciplines involved have different jobs:

Information technology – integrate, deploy, and manage finished products in production

Software engineering – design and code new products using best practices

Data engineering – build data pipelines that collect, organize, and validate data

Data science – discover the unknown patterns in data and algorithms that add business value

Data science is different. It’s cross-functional – engineering, product, marketing, finance. It must work autonomously – separate from the traditional engineering product lifecycle, self-organizing and self-managing. It’s also experimental – form a hypothesis, analyze data, make predictions, run backtests and A/B tests. And it’s self-sustaining – not a cost center; it generates revenue.

Problem is companies hoard data; data stores are a cost sink.

Rapid prototyping is key for data science: back-of-the-envelope calculations and simple experiments. Don’t make plans – make tests. Repeat until it works.
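A simple experiment of this kind can be as small as an A/B test checked with a two-proportion z-test. The sketch below uses only the standard library; the conversion counts are invented, and the choice of test is my assumption rather than something prescribed in the talk.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100 of 1,000 users convert on A, 130 of 1,000 on B.
z, p_value = two_proportion_z_test(100, 1_000, 130, 1_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A quick check like this is often enough to decide whether an idea deserves a second iteration before any larger plan is drawn up.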

Kirk Borne – Adapting Machine Learning Algorithms to Novel Use Cases

It’s not about telling the business about the coolness of your algorithm – it’s about connecting it to their needs and using storytelling to do this

Innovations are inspired by data, informed by data, enabled by data, and create value from data

Confucius says “study your past to know your future” – machine learning

Travel sites raise prices for Mac users because they assume those users make more money and would be willing to pay more

The most important thing in your data is the metadata

Hilary Mason – getting specific about algorithm bias

Facial recognition products fail to do a good job with darker shades of skin: 99.7% accuracy for white males versus 65.3% for darker-skinned females.
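A gap like this only shows up when accuracy is broken out by group instead of reported as one overall number. Here is a minimal sketch of that breakdown; the group names and records are placeholders, not data from the talk.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, predicted_label, actual_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}


# Placeholder evaluation set: overall accuracy is 75%, which hides the gap.
records = [
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_b", "male", "female"),
    ("group_b", "female", "female"),
    ("group_b", "male", "female"),
    ("group_b", "male", "male"),
]

per_group = accuracy_by_group(records)
print(per_group)
```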

Sources of bias enter at different stages; machine learning can amplify bias

People are more likely to assume algorithms are objective or error-free – even if they’re given the option of a human override

Algorithms are more likely to be implemented with no appeals process in place

Algorithms are often used at scale

The privileged are processed by people; the poor are processed by algorithms (Cathy O’Neil)

