DATAcated Conference – May 2021 Summaries
Thank you to the DATAcated Volunteers for providing these amazing summaries of the DATAcated Conference sessions: industry edition (May 18-19, 2021).
FINANCIAL SERVICES: Written by Maisie Tan
DATA INTENSIVE APPLICATIONS IN FINANCIAL SERVICES
By: Jordan Tigani
Data applications in financial services are very diverse. Aside from algorithmic trading, we also look at providing rich personalized customer experiences, real-time fraud detection, and dynamic risk analytics – all of which will require a database capable of supporting the complexity, quantity, and rate of change. In designing a data-intensive application, Jordan shares three things that we need to consider: (1) fast ingestion, (2) low latency, and (3) high concurrency.
CRAWL, WALK, RUN: DEFINING A STRATEGIC ROADMAP FROM MESSY DATA TO AI & BEYOND
By: Tiffany Perkins-Munn Ph.D.
Organizations aspire to incorporate AI into their core functions, but the reality is that 80% of a data scientist’s time is spent cleaning data. To reach AI and beyond, Dr. Tiffany shares a strategic roadmap to become data-driven. We start with educating stakeholders on the value of data. We then collect and clean data and stitch everything together to enhance stories and generate insights. She acknowledges that the start is a very tedious process. One way to get organizational buy-in and to keep the momentum high is through sharing of use cases.
SINGLE SOURCE OF TRUTH: ASPIRATIONAL OR OUTDATED (OR BOTH)?
By: Deborah Williams
We often hear about the need to establish a Single Source of Truth. This morning, Debbie counters that what we are really trying to achieve is AUTHENTICITY. She suggests that instead of enforcing a tightly controlled supply chain, we need to allow for usability and flexibility for downstream users. To achieve this, we can start by establishing accountability and investing in lineage and catalog tools. These are especially critical to meet regulatory standards in a financial environment, and to ensure that our systems are running and processing transactions 24/7.
THEY’RE BANKING ON US! HOW NEOBANKS ARE GAINING MARKET SHARE
By: Laura McDonald
Compared to traditional banks, neobanks are digitally native by nature. Huge amounts of data are tracked for reasons such as gaining a niche market segment, developing a wider portfolio, continued customer acquisition, and even making the shift from growth to profitability. Laura discusses the popularity of ESG frameworks among millennials, wherein potential customers are drawn because of shared values. She expresses that in time, we will see larger corporations partnering with neobanks to bring about a digitally focused and innovative mindset into current structures.
Healthcare: Written by Alice Giontella
Brilliant guests involved in the healthcare system have shared their experiences with the DATAcated community.
- “To be a good chief data officer you need to understand well the market, your company, and the technology”.
The healthcare section of the DATAcated conference started with Charles Holive. His talk was about analytics monetization and business growth in healthcare.
Every artificial intelligence, analytics, or data initiative should have an accurate business outcome behind it, one that focuses on the stakeholders’ business metrics.
Data analytics is not only about the data, but about how to turn it into impactful information and knowledge. Monetization was born from the need of industries and systems to buy this knowledge to create value, instead of building it internally. In healthcare, optimizing the workflow to prioritize patient care based on clinical data would be important not only for the clinical outcome itself but also for the financial outcome associated with it.
- “Does M.D. stand for more data?”
The floor then went to Prof. Shafi Ahmed and John Nosta, who started an interesting discussion on how data has affected clinical practice. Nowadays many clinical parameters are digitized, which means they could be available everywhere.
Digitization would lead to demonetization on one side, and potentially to the democratization of healthcare on the other side.
Digitization would also lead to a new way of approaching clinical practice. Think, for example, of a pill with a mini camera on it that might replace invasive endoscopic analysis, or of smartwatches and other small devices that monitor blood pressure, heart rate, etc., accessible to everybody online.
Healthcare is shifting towards a personalized approach, which creates a more collaborative environment with the patient.
Even if it is true that digitization has brought plenty of new technologies and applications, it is also true that the healthcare system is still not ready to introduce them. The introduction of telemedicine is still a slow, ongoing process, but we are certainly on the right track.
- Analytics in Pharmaceuticals
The last speaker of the healthcare section, John K Thomson, brought us to the world of pharmaceutical companies. He gave an interesting overview of the big strides made in healthcare and pharmaceutical companies as the usage of computational and synthetic data has increased.
Nowadays, in drug discovery, it is possible to identify new drugs using only computational analysis, skipping the wet lab part and going straight to the trial phase.
The creation of synthetic human data has made it possible to simulate the response to a drug across the different permutations of the human condition. With clinical trials, such an approach would have taken years and years!
Energy: Written by Jerome Wittersheim
Sweta Kotha – Data Scientist at TIBCO
Renewed Energy: Innovating with Data and ML in the Energy Industry
TIBCO are at the forefront of oil and gas data. Their proven AI solutions have helped track and analyze drilling data in real time, enabling TIBCO to successfully engage with a wide range of organizations in the energy industry.
Thanks to TIBCO’s ability to provide solutions to the energy industry, the company is now managing large volumes of data from wind and solar farms. This data, combined with weather and other geo data, helps predict future energy production needs.
The demonstration focused on Texas and looked at the last 10 years of weather data as well as wind turbine and solar panel data across a range of metrics.
Data science is very effective at picking up on seasonal differences, even though the variety and, at times, the messiness of the data available can be challenging. Cross-referencing and interpolating wind farm data, combined with weather station data, is very powerful.
Data is then displayed on a map, with the end goal being the development of a comprehensive dashboard.
Site meta data is available at different granularities, highlighting areas of interest and showing different facets of the data.
Wind variables provide different metrics with an emphasis on geo and time data to help track and compare weather conditions across different months and maximize consistency.
It is also possible to look at a season’s output against previous years and data science models can work on long term/seasonal predictions.
Future predictions using TIBCO Data Science can be based on a range of models, including linear regression. TIBCO Data Science allows you to compare models against each other.
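To make the idea of comparing models concrete, here is a minimal sketch in plain Python. It does not use TIBCO Data Science; the monthly output figures, the two candidate models (a linear trend and a seasonal-naive baseline) and all function names are assumptions chosen for illustration only.

```python
from statistics import mean


def fit_line(ys):
    """Least-squares line y = a + b*t through ys observed at t = 0, 1, 2, ..."""
    ts = list(range(len(ys)))
    t_bar, y_bar = mean(ts), mean(ys)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    return y_bar - slope * t_bar, slope


def mean_abs_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


# Invented monthly output (MWh): one seasonal shape, growing by 5 MWh per year.
year1 = [120, 130, 150, 160, 140, 110, 90, 85, 100, 125, 135, 128]
year2 = [v + 5 for v in year1]
year3 = [v + 5 for v in year2]  # held out for evaluation

train = year1 + year2

# Model A: linear trend extrapolated over the next 12 months.
intercept, slope = fit_line(train)
linear_forecast = [intercept + slope * t for t in range(len(train), len(train) + 12)]

# Model B: seasonal naive, i.e. repeat the same month of the previous year.
seasonal_forecast = list(year2)

errors = {
    "linear trend": mean_abs_error(year3, linear_forecast),
    "seasonal naive": mean_abs_error(year3, seasonal_forecast),
}
best_model = min(errors, key=errors.get)
print(errors, "->", best_model)
```

Scoring both models on the same held-out year is what makes the comparison fair; on strongly seasonal data like this invented series, a plain linear trend tends to lose to even a simple seasonal baseline.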
You can also aggregate data from multiple sources. In this case, both wind and solar data are used together and in context.
Results can be constantly refreshed both for time series variables and projections. It is also possible to view many spatial aspects of the data.
A number of questions focused on the security of data. TIBCO works with a range of partners to add different layers of security to their data infrastructure.
Geoffrey Cann – Speaker, Trainer, Author at Geoffrey Cann
Bits, Bytes and Barrels: The digital transformation of Oil and Gas
Geoffrey’s career spans 35 years and he has worked all over the world. Geoffrey also published a book – Bits, Bytes and Barrels: The digital transformation of Oil and Gas.
Geoffrey is passionate about the oil and gas industry and keen to yell words of encouragement about how data can help this industry.
He lives on the wild west coast, where he can see seaplanes from his front door. The planes hide a truth: the oil and gas industry is built to last.
Although oil and gas runs on data, the relationship between the two is like a relationship with a bad roommate – it includes dangerous features.
There is a saying that the energy industry invented big data but only uses ½% of the data it has.
This is why the industry is in desperate need of people who know how to use data, and there are fortunes to be made.
One example of a new development is the use of photos of drill cuttings, which geologists use to work out what is in the sand. Using algorithms to piece these photos together, and deriving information at the sand-grain level from the data while drilling, cuts the analysis time from the traditional six-week wait.
It is estimated that there is 2TB of data available for every single well.
Rogue 7 provides another example of a data-enabled solution, supporting human operation of pipelines. It uses ML models to predict performance 4 hours ahead of time, which helps operators immensely. This means a lot when you can’t easily build a new pipeline and pipeline data was never collected in the past.
In another example, data helped with maintenance of 50K projects every week – cutting the amount of man-hours required to perform similar operations.
The oil and gas industry runs on data – and is in desperate need of people. There is a complex backlog of data, at scale, that has never even been touched.
The big question is how we keep everything running whilst we transition to greener energies. How do you keep the lights on, and cars and flights running?
There is a complete shift in thinking and mindset. The challenge here is not technological.
Kimberly Sorrell, MBA – Director of Information Management at Southern Company
Data & Analytics: Predicting Equipment Health for Key Plant Equipment
The first message is about data literacy. Southern Company created a whole program about it. Culture is also very important, especially as organizations move towards more digitization and utilizing more data.
At Southern Company, there have been a lot of conversations about condition-based maintenance using sensor data. There was an opportunity to do this in 2019 with a specific project and lessons have been learnt since.
How can you leverage data to move to a predictive maintenance model? Equipment may go down and maintenance is expensive – there is big equipment and specialist contractors are involved.
Southern talked about different kinds of data and looked at what data was available. This led to questions such as “What does quality look like?” or “Is this objective achievable?”, and to a willingness to “fail forward fast”: look at the results and quickly decide how to proceed down the correct path.
This process helps figure out which vendors are doing a good job. It highlights training opportunities or whether some vendors are just doing a better job.
The project was divided into 3 phases. Generally speaking, Southern doesn’t let things fail, which makes predicting failure harder – how do you predict failure in this instance? But can they predict the health of the equipment?
A big part of the process involved educating users/people. Synergy happens when there is an explainable model of why the output is what it is. If people don’t have that understanding then they won’t trust the process.
Phase 2 involved creating a statistical model dashboard.
For example, here there are 4 units and within each unit 4 pumps – a traffic light dashboard highlights issues, and the business was fully involved in determining those risk levels.
If something turns red, there is a higher risk of failure, and the model helped predict a couple of failures successfully. After trying a few models, the AI approach used was a random forest / XGBoost paradigm.
At a high level, it is working really well and Southern are looking at other pieces of equipment.
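The traffic-light idea can be sketched in a few lines of Python. This is only an illustration, not Southern Company’s implementation: the failure probabilities stand in for the output of whatever model is used, and the thresholds (0.3 and 0.7), the unit and pump names, and the function name are hypothetical. In the project described above, the business helped determine the actual risk levels.

```python
def risk_colour(failure_probability, amber_at=0.3, red_at=0.7):
    """Map a model's failure probability to a traffic-light colour.

    The thresholds are hypothetical defaults; in practice they would be
    agreed with the business.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    if failure_probability >= red_at:
        return "red"
    if failure_probability >= amber_at:
        return "amber"
    return "green"


# Invented model scores for a few pumps (unit, pump) -> failure probability.
scores = {
    ("unit 1", "pump A"): 0.05,
    ("unit 1", "pump B"): 0.42,
    ("unit 2", "pump A"): 0.81,
    ("unit 2", "pump B"): 0.12,
}

dashboard = {equipment: risk_colour(p) for equipment, p in scores.items()}

for (unit, pump), colour in dashboard.items():
    print(f"{unit} / {pump}: {colour}")
```

Keeping the thresholds as explicit parameters makes the mapping easy to explain to users, which matters given the point above about trust in the model.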
Questions revolved around security and potential attacks. Whilst Kimberly is not a security expert she confirmed a lot of conversations about security, both physical and non-physical, were on-going.
Bernard Marr – Founder, CEO, Author at Bernard Marr & Co
Transformation & DATAcation of the Energy Sector
Bernard is passionate about the use of technology /AI / ML / Data to do things that matter – for example in the healthcare and energy industries.
However, because of the energy crisis and the climate crisis we are facing (this could dwarf the pandemic), the energy sector really is key.
How can you use data to tackle that one problem? The current model is unsustainable – it is one of the most important challenges in the world.
The transition will be achieved thanks to these three pillars:
- Decarbonization: more renewable energy produced.
- Decentralization: a more efficient distribution of the energy sources available, and how energy is fed back into the grid.
- Digitization/ datafication: use technology to optimize energy production.
Intelligent devices form part of the solution leading to “Energy 4.0”. AI and predictive analytics help predict demand and equipment failure at wind turbines and solar panels.
IoT devices such as the Nest smart home thermostat help users reduce their energy consumption.
Blockchain enables the production of secure smart contracts – this could be transformative. Block Z gives access to transparent and traceable green energy.
Quantum computing, with its sheer power, can help solve the challenge of energy usage.
Digital twins play out real-world IoT data on computers to try and model how energy will be utilized. This could also go all the way to virtual replicas of power plants and grids.
Bernard worked with Shell to devise their digital strategy. He looked at key elements in the organization to improve decision making by using data. This included better data literacy, culture, data quality, etc.
From top to bottom, there was not enough data literacy. Shell partnered with Udemy to create courses tailored to the organization.
They created a data translator role to act as a bridge between science and business.
Shell also now hold regular data hackathons where anyone in the business can come along to discuss their data challenges.
There is a need to have the right data in place before you can create a useful data visualization tool.
Understanding customers is also critical. Shell use AI to monitor and predict consumption of electric cars and to supply the power efficiently, using smart energy grids which work well.
GE, for example, use the Internet of Energy to work out how energy is flowing.
Smarter operations can be leveraged with data, including on existing infrastructure, with better drilling routes and increased security.
AI and data analytics are also used to analyse long contracts and identify specific paragraphs using natural language processing.
One of the questions related to governance and the need to have a good data governance policy in place and to secure every bit of data. Blockchain can answer some of those challenges. For example, where does your green energy come from and how is it verified?
Tom Moroney – Member of Arria’s Global Senior Advisory Council
Lyndsee Manna – EVP, Business Development and Strategic Partnerships at Arria
The Oil Industry Gamechanger: Connecting data to the decision with NLG
The subjects of data literacy and digital twins raised in the previous talk are very close to Lyndsee and Tom.
Their talk builds on those subjects and is about the enablement of the articulate enterprise and the articulate oil and gas field, using technology to accelerate data understanding.
The impact of data in oil and gas has evolved, from surveillance to monitoring, supply chain, finance and more.
What has changed in the last 10 years? The industry is awash with data both for inflow and outflow.
There is a huge amount of data and the volume, velocity and accuracy of that data have increased exponentially. Everyone has a device. Data literacy has evolved as a key enabler and the thought process has changed and improved.
So, now what?
Centralization of data storage in the cloud is key. Having sound data models and integration helps drive sense-making.
Moving people along the data literacy curve is also critical. Each discipline has its own data (geo, design, etc.) and understands the underlying model – how it is pieced together to make sense. This is critical to driving a data model strategy and to how you integrate and consume data.
Make smarter decisions, with speed, accuracy and consistency. The maturity of decision making and the value loop set the winners apart. They are the ones that can drive value out of data. Closing the value loop is equally important. What is the asset I am trying to optimise (e.g. a well or an oil field) and what data needs to be collected to inform decisions – the data plan? Assessing the options and trade-offs going through the loop is how you drive value.
How much value you can derive and how big the loop is will determine your level of maturity. You build a bigger picture that moves you from what to why, and capture value to determine the best outcome possible.
Moving from reporting to forecasting and using ML/AI to drive up the context of value is therefore key. Articulate the how, why and what next, to explain the meaning for the business.
Tom is a pioneer in the use of Natural Language in the oil and gas industry. The discipline can bridge the gap between centralised data and action taking:
What is the sophistication of the intelligence you want to drive? An oil and gas field that speaks to your users? There are huge amounts of wastage. It is estimated that 60% of an engineer’s time is wasted looking for the right data and NLP can really help step an organisation forward in that respect.
How you contextualise the data, for example using red/yellow alerts to build a trend and narrative, can have transformative effects.
Business units have to know what the value drivers are – what is the information needed to address those decisions. But capturing too many variables is not necessary as there are too many repetitions. Instead you should store and concentrate on the minimum amount necessary to fulfil a specific business goal.
Other considerations include data generation, which has to be aggregated and integrated. You need to be well trained and versed as not all data is equal. Data should be structured and you need an understanding of the source of the data.
A rigorous data strategy for the company that can be executed at the business unit level, along with real time data and an understanding of lag and level of accuracy are critical in building proper data definitions.
In other words “Garbage in – garbage out.”
RETAIL: Written by Jessica Uwoghiren
Creating Effective “Next Best Offers” in Retail
Speaker: Tom Davenport
- Next Best Offers (NBOs) are targeted offers or proposed actions based on analysis of your customer’s shopping patterns i.e., shopping history, purchasing context & product attributes
- Offers could range from services to information to new products.
- These offers are usually determined using machine learning and/or business rules and may involve a human filter.
- Steps for creating effective NBOs include: Strategy Design, Know Your Customer (KYC), Know Your Offer (KYO), Know the Purchase Context, Analysis & Execution & Feedback/Adaptation.
- On Strategy Design, the team needs to have a defined direction and figure out what variables drive the predictions they intend to make. On KYC, demographics and psychographics play a huge role. On KYO, you need to understand the attributes of each offer and classify your offers/products to allow for effective predictions.
- In terms of Purchase context, you need to understand the customer’s behaviors. In Analysis & Execution, your predictive models are created using ML and integrated into your existing platform. Feedback and Adaptation is where ML Ops play a huge role to ensure your models are being updated.
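As a rough illustration of how machine learning scores and business rules can combine into a next best offer, here is a minimal Python sketch. It is not the method presented in the session: the offers, the propensity values (stand-ins for the output of a trained model), the channel rule and every name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    category: str
    propensity: float  # stand-in for an ML model's predicted acceptance rate


def next_best_offer(offers, already_owned, channel, blocked_by_channel):
    """Apply business rules first, then pick the highest-scoring offer.

    Rules used here (both hypothetical): never offer something the customer
    already has, and skip categories that do not suit the purchase context.
    """
    blocked = blocked_by_channel.get(channel, set())
    eligible = [
        o for o in offers
        if o.name not in already_owned and o.category not in blocked
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda o: o.propensity)


offers = [
    Offer("extended warranty", "service", 0.31),
    Offer("loyalty card", "information", 0.22),
    Offer("premium headphones", "product", 0.27),
]

# Purchase context: bulky or high-touch services are not offered at self-checkout.
blocked_by_channel = {"self-checkout": {"service"}}

choice = next_best_offer(
    offers,
    already_owned={"loyalty card"},
    channel="self-checkout",
    blocked_by_channel=blocked_by_channel,
)
print(choice.name if choice else "no offer")
```

The feedback and adaptation step would then record whether the customer accepted the offer and feed that back into the propensity model.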
2300+ Outlets, 30+ Brands, 20+ Data Sources, One Analytics Platform
Speakers: Jon Steenkamp & Pete Vomocil
- TFG has been using the Pyramid Analytics platform since 2018, and this session took us through the journey of migrating disparate data sources into one robust platform. TFG has over 29 brands across continents such as Africa, Europe & Australia, with ~3200 stores in Africa alone. It also had 4 distribution centers in South Africa and 4 manufacturing plants as of the time of this session.
- TFG was in search of better performance for their analytics tech stack as they were heavily reliant on Microsoft Excel and it would typically take an analyst 3.5 hours every Monday morning to finish a report.
- With the Pyramid platform, that story is different now. In a matter of seconds, they can now feed all the data sources from their outlets and facilities (foot traffic, ERP, Point-of-Sales, HR, E-Commerce data etc.) into one platform and pull reports on a case-by-case basis.
- Consistency in the data sources was a big issue for them which is something they constantly improve on using data governance strategies.
- TFG also has a digital boardroom which gives each person a view of their department’s metrics.
- On user adoption for the Pyramid platform, Jon says “You do not try to convince people, you try to help them. When they see that it is better for them, they keep using it.”
- In addition, owing to the number of acquisitions TFG has made, the Pyramid platform helps them integrate new data sources and roll out quickly.
Moving from Supplier to Partner: Analyzing Retail Sales Data
Speaker: Kate Herman, Senior Director of Advanced Analytics at the Master Lock Company
- Master Lock supplies thousands of retail stores with their range of products. Kate spoke about how their Analytics team uses data to better understand their end-customers. Being a manufacturer/supplier, it is not easy to get access to end-customers as retail stores serve as the middlemen.
- “Location is important”, Kate said. She provided a case-study of how her team successfully identified substitution items especially during COVID-19. The team developed a hypothesis on where demand will continue to grow based on real estate market.
- The hypothesis was that demand for Master Lock products will be higher in areas where there is a real estate boom.
- Data was used to develop the testing strategy
- To pick test stores and/or target customers, they needed to partner with their retailers
- To end the discussion, Kate mentioned that to move from supplier to partner with your retailers, communication is vital, and data is only one piece of the puzzle. She said, “You need to show that you are willing to go the extra mile in uncovering insights from the data you have, and the retail stores will be more willing to partner with you for success.”
SPORTS: Written by Naman Jain
Hyoun Park – If you have historical data, predictive analysis can give you good results. He talked about the 5 key sports analytics lessons to bring to work.
- Find the Right Champion – the right person is someone who works with both player and team data to get good results.
- Choose or Create Appropriate Granularity – when outcomes are sparse, track the processes instead, such as how quickly the players respond or move, their situation and their health.
- Remove Extraneous Details – if you have millions of data points, reduce them to the data that matters so you can understand and analyse it better.
- Stand by Tough Decisions, Even if They Break the Status Quo – don’t worry about failures, just go for it. Take data analytics into consideration and trust it. You will get good results.
- When the Data Gets Better, Change Your Approach – don’t just measure speed, measure spin as well. Change your analytics approach when you have additional data.
Ken Jee – Data in sports is an uneven playing field. He talked about the importance of taking analytics and data seriously in sports. Organisations are growing after investing in analytics. The perfect example he gave was how MLB teams are taking advantage of it.
But why is there unevenness in sports analytics? Because of:
- Privacy and secrecy: teams are reinventing analytics from the ground up because it is not shared by others.
- Uneven access to data across teams.
- Lack of access to talent, as limited access to data affects the job market.
Sports analytics will give fairly marginal returns while also improving players’ performance. It won’t show immediate results, so organizations need to invest in it.
Neda Tabatabaie – She talked about the data that is essential in the business of sports. Business intelligence plays a key role in advancing the sports industry.
She discussed typical BI team roles:
CRM (Customer Relationship Management), Sponsorship, Pricing, Analytics, Reporting, Research and Survey
She further talked about major challenges faced by the sports industry, such as:
- Unknown customers
- Lack of diversity; the nature of the sports business creates data silos
- Different preferences among different generations of fans
- Recruiting challenges, especially in the job market (the biggest challenge)
- Difficulty quantifying the passion of fans
- Parking location, loudness of music, and many more challenges.
Arty Smith – He took everyone onto the basketball court.
He talked about how the NBA uses data analytics to improve team play. He showed a scatter plot of around 5 million data points of NBA players scoring from different positions.
He further talked about the relation (which isn’t linear) between points scored and distance from the basket, both with and without considering average points scored per shot, using line graphs. He also compared two heat maps, from 1988-89 and 2019-20. Data analytics has really helped NBA teams to clearly visualise the better scoring areas and improve their strategies.
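One way to see why the relation between distance and scoring is not linear is to compute average points per shot, which is the make rate multiplied by the value of the shot. The Python sketch below uses invented shot counts, not the figures from the talk, and the zone names and function name are assumptions.

```python
def points_per_shot(made, attempted, shot_value):
    """Average points per attempt: make rate times the value of the shot."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return shot_value * made / attempted


# Invented counts per zone: (shots made, shots attempted, points per make).
zones = {
    "at the rim": (650, 1000, 2),
    "mid-range": (400, 1000, 2),
    "three-point": (360, 1000, 3),
}

value_by_zone = {
    zone: points_per_shot(made, attempted, value)
    for zone, (made, attempted, value) in zones.items()
}

for zone, value in value_by_zone.items():
    print(f"{zone}: {value:.2f} points per shot")
```

With numbers like these, the make rate falls steadily with distance, but points per shot dips in the mid-range and rises again beyond the three-point line because those shots are worth an extra point.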
Food & Beverage: Written by Robert Robinson
DATAcated Industry Food and Beverage Recap
Excuse me, bartender there’s AI in my Beer, Data-driven process optimization for the beverage industry
Martin once asked how he could make German beer better. One answer he received was to create beer “the Belgium way.” Not content with that answer, he first looked at how beer was made in the past. For a long time, it was created using “gut” feelings. He wanted to translate beer language into data language. He discussed the brewing process and its diverse use-case landscape, which included predictive maintenance for cost analysis, predicting how long to use the filter, forecasting energy demand, and discovering methods to optimize malt yield to get the perfect beer. He also demonstrated the importance of mapping the business problem using an impact and feasibility chart, and showed a look at hop alpha predictions. He pointed out that teamwork is the key to success. The results he highlighted were that quick prototyping on one malt source yields good results, and that a model based on the full data is as good as the measurement accuracy of the yield measurements. He is an avid user of RapidMiner.
Kate: I need a perfect German beer!
The voice of the customer: Using NLP and Social Media to Extract Customer Feedback
Kendall talked about using social media as a new means of getting customer feedback. In the past, customer feedback was gathered through surveys found at the bottom of your receipt. She pointed out that incorporating social media such as Twitter would give a broader and more honest picture of customer feedback. Once she receives the data, she uses an extensive process to clean it using the Python language. From there, Natural Language Processing (NLP) is used to look for keywords, both positive and negative. She used a Kaggle dataset to showcase an example. An essential item she pointed out was that your text analytics should be based on timing and context. If you use data from ten years ago, your results might be a bit out of date. Some applications of using NLP and social media are targeted advertisements, customer engagement hashtags, finding areas of improvement, guiding relevant public communications, flagging problematic behavior, and studying a competitor’s behavior.
Kate: You covered so much in a short amount of time.
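The clean-then-count approach described in this session can be sketched with Python’s standard library. This is a simplified illustration rather than Kendall’s actual pipeline: the keyword lists, the sample posts and the function names are all invented, and a real project would likely use a proper NLP library and a much larger vocabulary.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real project would use a far richer lexicon.
POSITIVE = {"love", "great", "fresh", "friendly"}
NEGATIVE = {"cold", "slow", "rude", "stale"}


def clean(text):
    """Lower-case a post, drop links, mentions and hashtags, return word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[@#]\w+", " ", text)       # remove @mentions and #hashtags
    return re.findall(r"[a-z']+", text)


def keyword_counts(posts):
    """Count how many positive and negative keywords appear across all posts."""
    counts = Counter()
    for post in posts:
        for token in clean(post):
            if token in POSITIVE:
                counts["positive"] += 1
            elif token in NEGATIVE:
                counts["negative"] += 1
    return counts


# Invented sample posts.
posts = [
    "Love the fresh fries @BurgerPlace! https://t.co/abc",
    "Service was slow and the coffee was cold #fail",
]

feedback = keyword_counts(posts)
print(dict(feedback))
```

Filtering the posts by date before counting would be one simple way to respect the point about timing and context.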
Wine + Data: An Unlikely Union
It’s an unlikely pairing of data and wine. She wanted to reimagine the wine journey through the lens of data. First, we have data from the vineyard, which contains weather and yield data. Next, we take the grape into the winery, which provides data about how the wine is produced. After that, we have sales and marketing information. Finally, the consumers provide data (feedback) about what they think of the wine. She also talked about how very little business intelligence or data analytics is being used for wine creation, which creates opportunity within the wine industry. She admits there is an issue with the data due to hold-over laws about alcohol from Prohibition. She would love for others to educate themselves about data and develop ISL (Information as a Second Language).
Cathy: Kate, you pronounced my last name correctly.
Would you like fries with that? – Recommended algorithms in the restaurant Industry
Enrique started by asking the question, “Do we need Data Science or Algorithms?” His strategy is suggestive marketing. He believes in using both collaborative filtering and content-based filtering to suggest products or up-sells to the customers. Certain items, such as muffins and coffee or a hamburger and French fries, are usually sold together, which are known trends. Some models that he works with also go against the trends. He also talked about the differences between exploring and exploiting. He also pointed out the importance of learning statistics, which is currently left out of, or not emphasized in, most data science programs. For anyone interested in a “data career,” he had this advice: Learn SQL, don’t neglect Microsoft Excel, and treat a data career like ordering a pizza: you gotta know what you like if you want the most out of it.
Kate: Yes, I would like fries with that!
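The ideas of suggesting items that sell together and of balancing exploring against exploiting can be sketched as follows. This is a simplified illustration and not Enrique’s system: it uses plain co-purchase counts as a stand-in for collaborative filtering and an epsilon-greedy rule for exploration, and the orders, menu, epsilon value and function names are all hypothetical.

```python
import random
from collections import Counter
from itertools import permutations


def co_purchase_counts(orders):
    """For each item, count how often every other item appears in the same order."""
    counts = {}
    for order in orders:
        for a, b in permutations(set(order), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest(item, counts, menu, epsilon=0.1, rng=random):
    """Epsilon-greedy suggestion for a customer who just ordered `item`.

    With probability epsilon (or if the item has no history) explore by
    suggesting a random other menu item; otherwise exploit the known trend
    and suggest the most frequent companion.
    """
    candidates = [m for m in menu if m != item]
    if not candidates:
        return None
    if rng.random() < epsilon or item not in counts:
        return rng.choice(candidates)  # explore
    return counts[item].most_common(1)[0][0]  # exploit


# Invented order history.
orders = [
    ["burger", "fries"],
    ["burger", "fries", "cola"],
    ["coffee", "muffin"],
]
menu = ["burger", "fries", "cola", "coffee", "muffin"]

counts = co_purchase_counts(orders)
print(suggest("burger", counts, menu, epsilon=0.0))  # always exploits
print(suggest("burger", counts, menu, epsilon=0.1, rng=random.Random(7)))
```

Exploring occasionally is what lets a system discover pairings that go against the known trends, at the cost of sometimes making a weaker suggestion.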