Great leaps forward in scientific discovery are often accompanied by risk and unintended consequences. Pushing the boundaries of AI with synthetic data is no different. AI innovations are emerging from research and development labs staffed by computer scientists, data scientists, cognitive scientists, mathematicians, statisticians, physicists, and other applied researchers working across a broad cross-section of industry, and this pace of innovation presents challenges for business leaders. Before diving headfirst into the deeper technical concepts, it is worth recognizing that embracing synthetic data requires a new way of thinking, outlined below.
AI and synthetic data are here to stay. Like the enterprise data warehouse initiatives that corporations rolled out in the mid-90s, leaders can benefit by developing a Synthetic Data Strategy. The key pillars of such a strategy should include a framework, method, and process for starting small with AI, failing fast using synthetic data, and learning quickly to find the value proposition that supports the organization’s strategic plan.
Choose Smart Partners
AI is a field that spans dozens of technologies. Choosing smart partners who are best-of-breed in their respective fields goes a long way toward reducing the cost and risk associated with technology adoption cycles. Starting and supporting an internal community of AI stakeholders is an ideal tactic for further reducing that risk.
Beware the Trojan Horse
The citizens of Troy were overwhelmed by Greek soldiers hiding within the great wooden horse. Beware the AI start-up bearing gifts: AI is still high risk, and common sense still rules the day. Business leaders should trust but verify that new AI technologies map to the corporate strategic plan and are backed by real products.
Embrace Data Parallelism
Batch processing has been the hallmark of corporate computing, dating back to when the first commercial mainframes entered the marketplace. Today, modern computing architectures and technologies (such as NVIDIA GPUs) make it possible to process data in parallel on thousands of cores. Synthetic data generation and machine learning are ideally suited to exploiting these data parallel technologies.
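To make the data-parallel pattern concrete, here is a minimal, illustrative sketch in Python using only the standard library. It splits synthetic-record generation into independent batches, each with its own seeded random stream; the batch function, distribution, and parameters are assumptions for illustration. Python threads share the GIL, so a production system would map this same shape onto worker processes or GPU kernels, but the fan-out structure is identical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def generate_batch(args):
    """Generate one batch of synthetic transaction amounts.

    Each batch gets its own seeded Random instance, so batches are
    independent and reproducible -- the property that makes the whole
    job embarrassingly parallel.
    """
    seed, size = args
    rng = random.Random(seed)
    return [round(rng.lognormvariate(3.0, 1.0), 2) for _ in range(size)]

def generate_parallel(n_batches=8, batch_size=10_000):
    """Fan independent batches out across workers and flatten the results."""
    jobs = [(seed, batch_size) for seed in range(n_batches)]
    with ThreadPoolExecutor() as pool:
        batches = pool.map(generate_batch, jobs)
    return [amount for batch in batches for amount in batch]

data = generate_parallel(n_batches=4, batch_size=5_000)
print(len(data))  # 20000
```

Because no batch depends on another, doubling the worker count (or moving the inner loop onto thousands of GPU cores) scales throughput without changing the program's shape.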
Use Cross-Disciplinary Teams to Discover the Outliers
Generating synthetic data is an exercise in combinatorial enumeration, a topic drawn from computer science and discrete mathematics. Naively generating a matrix of all possible permutations can bring the hardware to a screeching halt, because the number of combinations grows multiplicatively with every attribute added. Using a cross-disciplinary team of internal and external experts will enable model builders to identify which permutations and outliers are most likely to represent future events.
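As a rough illustration of how quickly the combination space grows, the sketch below counts the full Cartesian product over a handful of hypothetical categorical features (the feature names and values are invented for the example) and shows random sampling as a tractable alternative to exhaustive enumeration.

```python
import math
import random

# Hypothetical categorical features for a synthetic customer record.
FEATURES = {
    "region": ["NA", "EMEA", "APAC", "LATAM"],
    "segment": ["consumer", "smb", "enterprise"],
    "channel": ["web", "mobile", "branch", "phone"],
    "tenure_band": ["<1y", "1-3y", "3-10y", "10y+"],
}

def full_grid_size(features):
    """Size of the full Cartesian product -- multiplicative in each feature."""
    return math.prod(len(values) for values in features.values())

def sample_combinations(features, k, seed=0):
    """Draw k random combinations instead of enumerating all of them."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in features.items()}
        for _ in range(k)
    ]

print(full_grid_size(FEATURES))  # 4 * 3 * 4 * 4 = 192
```

Four small features already yield 192 combinations; add a few dozen realistic attributes and the full grid becomes unstorable, which is why expert-guided selection and sampling of high-probability permutations matters.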
Automate with DevOps
Development and operations (DevOps) teams are automating the building, testing, continuous integration, and continuous delivery of IT solutions. Synthetic data generation can leverage this same automation, making continuous learning a reality with little human intervention. Automation is also a winning strategy against the armies of low-cost programmers in the global marketplace.
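The idea can be sketched as two small functions a CI pipeline might call on every build: one deterministically regenerates a synthetic dataset, the other acts as a quality gate before any model trains on it. The schema, ranges, and thresholds here are illustrative assumptions, not a prescribed implementation.

```python
import json
import random

def generate_dataset(path, n=1_000, seed=42):
    """Deterministically regenerate the synthetic dataset.

    Because the output depends only on the seed, a CI job can rebuild
    it on every run and still get reproducible training data.
    """
    rng = random.Random(seed)
    rows = [
        {"id": i, "amount": round(rng.uniform(1.0, 500.0), 2)}
        for i in range(n)
    ]
    with open(path, "w") as f:
        json.dump(rows, f)
    return rows

def validate_dataset(path):
    """A lightweight quality gate the pipeline runs before training."""
    with open(path) as f:
        rows = json.load(f)
    if not rows:
        raise ValueError("synthetic dataset is empty")
    if any(not (0.0 < row["amount"] <= 500.0) for row in rows):
        raise ValueError("amount outside expected range")
    return len(rows)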
Navigate the Bounded Rationality of Organizational Data
Production data reaches back in time to models of rational thinking bounded by environments that no longer exist. A database schema designed in the 90s is unlikely to have captured the nuances of customer behavior that have emerged since the 2000s, and the world continues to change rapidly. Innovating on real data is therefore a minefield of hidden risks. Synthetic data provides a blank canvas and a conceptual space for developers to think about problems and solutions in a whole new way. Real data is a liability for any team that seeks to innovate in the field of AI.
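As a toy illustration of that blank canvas, the sketch below synthesizes a customer record containing behavioral fields (mobile sessions, voice-assistant usage, chat channels) that a legacy 90s schema would never have anticipated. All field names and distributions are hypothetical, chosen only to show that synthetic generation is not constrained by an inherited schema.

```python
import random

def synth_modern_customer(rng):
    """Synthesize a record with behaviors no 90s-era schema captured.

    Field names are illustrative assumptions, not a real schema.
    """
    return {
        "customer_id": rng.randrange(1_000_000),
        "mobile_sessions_per_week": rng.randint(0, 50),
        "uses_voice_assistant": rng.random() < 0.3,
        "preferred_channel": rng.choice(["app", "chat", "social", "branch"]),
    }

rng = random.Random(7)
record = synth_modern_customer(rng)
print(sorted(record))
```

Nothing in this generator is inherited from a production system; the modeler is free to encode whatever behaviors the team believes matter today.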