How do databases support AI algorithms?
By Donald Conway
Databases have always been able to do simple administrative work, such as finding particular records that match certain criteria, for example, all users who are between 20 and 30 years old. Lately, database companies have been adding artificial intelligence routines to their products so that users can apply these smarter and more sophisticated algorithms to their own data stored in the database.
AI algorithms are also finding a home below the surface, where they help optimize internal tasks like reindexing or query planning. These new features are often billed as automation because they relieve the user of housekeeping work. Developers are encouraged to let them do their job and forget about them.
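The age-range lookup mentioned above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the `users` table, its columns, and the sample rows are assumptions made up for the example, not anything taken from a particular product.

```python
import sqlite3

# In-memory database with a hypothetical `users` table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Ana", 19), ("Ben", 20), ("Chloe", 27), ("Dev", 30), ("Eli", 41)],
)


def users_in_age_range(conn, low, high):
    """Return names of users whose age is between low and high, inclusive."""
    rows = conn.execute(
        "SELECT name FROM users WHERE age BETWEEN ? AND ? ORDER BY name",
        (low, high),
    )
    return [name for (name,) in rows]


matches = users_in_age_range(conn, 20, 30)
print(matches)
```

The filter is a fixed rule supplied by the person writing the query. Nothing in it is learned from the data, which is the contrast the rest of the article draws with AI routines.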
However, there is much more interest in AI routines that are open to users. These machine learning algorithms can classify data and make smarter decisions that evolve and adapt over time. They can unlock new use cases and improve the flexibility of existing algorithms.
In many cases, integration is largely pragmatic and essentially cosmetic. The calculations are no different from what would occur if the data were exported and sent to a separate AI program. Within the database, the AI routines are separate and simply take advantage of internal access to the data. Sometimes this faster access can speed up the process dramatically. When the data set is large, just moving it can take a great deal of time.
The integration can also limit the analysis to algorithms that are officially part of the database. If users want to implement a different algorithm, they must go back to the old process of exporting the data in the correct format and importing it into the AI routine.
The integration can take advantage of some of the newer in-memory distributed databases that easily distribute the load and data storage across multiple machines. These can easily handle a large amount of data. If a complex analysis is necessary, it may not be difficult to increase the CPU capacity and RAM allocated to each machine.
Some AI-powered databases can also take advantage of GPU chips. Some AI algorithms use the highly parallel architecture of GPUs to train machine learning models and run other algorithms. There are also some custom chips specially designed for AI that can dramatically speed up analysis.
However, one of the biggest advantages may be the standard interface, which is often SQL, a language that is already familiar to many programmers. Many software packages already interact easily with SQL databases. For someone who wants more AI analysis, using it is no more complex than learning a few new SQL statements.
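To make the "just another SQL statement" idea concrete, here is a small sketch in which a scoring routine is called from inside an ordinary query. It uses SQLite's user-defined functions through Python's sqlite3 module. The function name `churn_score`, the `accounts` table, and the coefficients are all hypothetical, and the coefficients are picked by hand rather than trained.

```python
import math
import sqlite3

# Hypothetical, hand-picked coefficients standing in for a trained model.
WEIGHT_PER_LOGIN = -0.8
BIAS = 2.0


def churn_score(logins_last_month):
    """Logistic score in (0, 1): fewer logins means a higher churn score."""
    z = BIAS + WEIGHT_PER_LOGIN * logins_last_month
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("churn_score", 1, churn_score)

conn.execute("CREATE TABLE accounts (id INTEGER, logins_last_month INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 0), (2, 3), (3, 10)])

# The model is applied row by row inside the query; no data is exported.
at_risk = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM accounts "
        "WHERE churn_score(logins_last_month) > 0.5 ORDER BY id"
    )
]
print(at_risk)
```

Any tool that can already send SQL to the database could run the same query without knowing how the score is computed.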
What are established companies doing?
Artificial intelligence is a very competitive field now. All the major database companies are exploring ways to integrate AI algorithms with their tools. In many cases, a company offers so many options that it is impossible to summarize them all here.
Oracle has integrated AI into its databases in a variety of ways, and the company offers a broad set of options in almost every corner of its stack. At the lower levels, some developers, for example, are running machine learning algorithms in the Python interpreter that is built into the Oracle database. There are also more integrated options like Oracle Machine Learning for R, a version of R that analyzes data stored in Oracle databases. Many of the services are incorporated at higher levels, for example, as analysis features in the data science and analytics tools.
IBM also has a number of artificial intelligence tools that are integrated with its various databases, and the company sometimes calls Db2 “the artificial intelligence database.” At the lowest level, the database includes functions in its version of SQL that address common parts of building AI models, such as linear regression. These can be threaded together in custom stored procedures for training. Many IBM AI tools, such as Watson Studio, are designed to connect directly to the database to speed up model construction.
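As a rough illustration of what a regression function exposed through SQL looks like, the sketch below defines a least-squares slope aggregate and calls it from a query. It runs on SQLite through Python's sqlite3 module and is a stand-in written for this example, not Db2's implementation. The name `regr_slope` and its (y, x) argument order are borrowed from the aggregate of that name found in several SQL dialects, and the `sales` table and its values are made up.

```python
import sqlite3


class RegrSlope:
    """Aggregate computing the least-squares slope of y on x in one pass."""

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = self.sum_xy = self.sum_xx = 0.0

    def step(self, y, x):
        # Skip rows with missing values, as SQL aggregates usually do.
        if x is None or y is None:
            return
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xy += x * y
        self.sum_xx += x * x

    def finalize(self):
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None  # slope is undefined for fewer than two distinct x values
        return (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom


conn = sqlite3.connect(":memory:")
conn.create_aggregate("regr_slope", 2, RegrSlope)

# Hypothetical table; the rows follow revenue = 2 * ad_spend + 3 exactly.
conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 5), (2, 7), (3, 9), (4, 11)])

(slope,) = conn.execute("SELECT regr_slope(revenue, ad_spend) FROM sales").fetchone()
print(slope)
```

Because the aggregate only accumulates running sums, the fit happens where the rows live and only a single number leaves the database.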
Hadoop and its ecosystem of tools are commonly used to analyze large data sets. While they are often viewed more as data processing pipelines than as databases, there is often a database like HBase buried within them. Some people use the Hadoop distributed file system to store data, sometimes in CSV format. A variety of AI tools are already integrated into the Hadoop pipeline using tools like Submarine, making it a database with built-in AI.
All major cloud companies offer databases and artificial intelligence products. The amount of integration between any particular database and any particular AI tool varies substantially, but it is often quite easy to connect the two. Amazon Comprehend, a natural language text analysis tool, accepts data from S3 buckets and stores results in many locations, including some AWS databases. Amazon SageMaker can access data in S3 buckets or Redshift data lakes, sometimes using SQL through Amazon Athena. While it is a fair question whether these count as true integration, there is no question that they simplify the journey.
In the Google cloud, the AutoML tool for automated machine learning can obtain data from BigQuery databases. Firebase ML offers a number of tools to address common challenges for mobile device developers, such as image classification. It will also deploy any trained TensorFlow Lite models to work with your data.
Microsoft Azure also offers a collection of databases and artificial intelligence tools. The Databricks tool, for example, is based on the Apache Spark pipeline and comes with connections to Azure Cosmos DB, Azure's Data Lake storage, and other databases like Neo4j or Elasticsearch that may be running within Azure. Azure Data Factory is designed to find data in the cloud, both in databases and in generic storage.
What are the upstarts doing?
Several database startups also highlight their direct support for machine learning and other artificial intelligence routines. SingleStore, for example, offers fast analytics for tracking incoming telemetry in real time. This data can also be annotated by various AI models as it is ingested.
MindsDB adds machine learning routines to standard databases such as MariaDB, PostgreSQL, or Microsoft SQL Server. It extends SQL to include features that learn from the data already in the database to make predictions and classify objects. These functions are also easily accessible in more than a dozen business intelligence applications, such as Salesforce’s Tableau or Microsoft’s Power BI, that work closely with SQL databases.
Many companies bury the database deep inside the product and sell only the service itself. Riskified, for example, tracks financial transactions using artificial intelligence models and offers protection to merchants through “chargeback guarantees.” The tool ingests transactions and maintains historical data, but there is little discussion of the database layer.
In many cases, companies that might bill themselves as pure AI companies are also database providers. After all, the data must be stored somewhere. H2O.ai, for example, is just one of the cloud AI providers that offer integrated data prep and AI analytics. However, its data storage is more hidden, and many people think of software like H2O.ai first for its analytical power. Still, it can both store and analyze the data.
Is there anything the built-in AI databases can’t do?
Adding AI routines directly to a database’s feature set can simplify the lives of database developers and administrators. It can also make the analysis a bit faster in some cases. But beyond the convenience and speed of working with a single data set, this does not offer any large and lasting advantage over exporting the data and importing it into a separate program.
The process can also limit developers, who may choose to explore only the algorithms that are implemented directly within the database. If an algorithm is not part of the database, it is not an option.
Of course, many problems cannot be solved with machine learning or artificial intelligence. Integrating AI algorithms with the database does not change the power of the algorithms; it merely accelerates them.