Cognitive Computing and Big Data Analytics

Index

Introduction
Framework of Cognitive Computing and Big Data
    Definition
    Computing with Heterogeneous Data
Applications in Cognitive Computing including Big Data
    For Transportation Systems
    For the Environment
    Urban Computing for Urban Energy Consumption
    Urban Computing for the Economy
    Urban Computing for Public Safety
Typical Technology
    Urban Data Management Techniques
    Techniques for Managing Data Sparsity
    Visualization of Big Data
    Optimization Techniques
    Information Security
Future Directions
Conclusion

There is a problem within Big Data: there is too much information and not enough talent to handle it. The supply of analysts and data scientists cannot keep up with the ever-increasing demand for this type of talent. This shortage matters because even the most advanced data platforms are useless without experienced professionals to operate and maintain them. How do we solve this problem? More training and better academic programs? Maybe, but what if there were another solution? What if we instead trained computers to do the work for us, or at least made data tools easier to manage? Improvements in cognitive computing are bringing this reality ever closer.

Introduction

Sensing technologies and large-scale computing infrastructures have produced a variety of big data in urban spaces (e.g., human mobility, air quality, traffic patterns, and geographic data). Such data carries rich knowledge about a city and its population and can help address the challenges cities face if used correctly. Motivated by the opportunity to build smarter cities, we can develop a vision of computing that aims to unleash the power of knowledge from large, heterogeneous data collected in urban spaces and to apply this information to the major problems our cities face today.
In short, we aim to address big challenges in big cities using big data. Cognitive computing will bring a high level of fluidity to analysis. Natural language processing lets personnel who are not familiar with the language of data interact with programs and platforms the way humans interact with each other. Platforms built with AI technology could therefore translate regular speech and requests into data queries, accepting simple commands in normal language, and then deliver the answers in the same way the requests were received. With a feature like this, it would be much easier for anyone to work in the data field.

Framework of Cognitive Computing and Big Data

Definition

In the urban context considered here, cognitive computing is a process of acquiring, integrating, and analyzing big and heterogeneous data generated by different sources in urban spaces, such as sensors, devices, vehicles, buildings, and humans, to address the main issues cities face (e.g., air pollution, increased energy consumption, and traffic congestion). It connects unobtrusive and ubiquitous sensing technologies, advanced data management and analytical models, and new visualization methods to create win-win solutions that improve the environment, the quality of human life, and city operation systems. Cognitive computing also helps us understand the nature of urban phenomena and even predict the future. It is an interdisciplinary field that merges computer science with traditional fields such as transportation, civil engineering, economics, ecology, and sociology in the context of urban spaces.

Computing with Heterogeneous Data

Learning mutually reinforcing knowledge from heterogeneous data: Solving urban challenges involves a wide range of factors (e.g., exploring air pollution involves studying traffic flow, meteorology, and land use simultaneously).
However, existing data mining and machine learning techniques usually handle one type of data; for example, computer vision deals with images, and natural language processing relies on texts. Treating features extracted from different data sources the same way (for example, simply concatenating them into one feature vector and feeding it into a classification model) does not achieve the best performance. Furthermore, using multiple data sources in an application leads to a high-dimensional feature space, which usually exacerbates the data sparsity problem. If not managed properly, multiple data sources could actually degrade the performance of a model. This calls for advanced data analytics models that can learn mutually reinforcing knowledge across heterogeneous data generated by different sources, including sensors, people, vehicles, and buildings.

Effective and efficient learning: Many urban computing scenarios (e.g., traffic anomaly detection and air quality monitoring) require immediate responses. Beyond simply increasing the number of machines to speed up computation, we must integrate data management, data mining, and machine learning algorithms into one computing framework to provide an effective and efficient knowledge discovery capability. Additionally, traditional data management techniques are typically designed for a single-modality data source. An advanced management methodology that can organize multimodal data (such as streaming, geospatial, and textual data) well is still missing. Computing with multiple heterogeneous data sources is therefore a fusion of data and algorithms.

Visualization: Huge amounts of data bring a huge amount of information that needs better presentation. A good visualization of the original data could inspire new ideas for solving a problem, while a visualization of the computed results can reveal knowledge in an intuitive way to aid decision making. Data visualization can also suggest correlation or causality between different factors.
Multimodal data in urban computing scenarios leads to high-dimensional views, such as spatial, temporal, and social views, for visualization. Connecting different types of data across different views and detecting patterns and trends is challenging. Furthermore, when dealing with different types and huge volumes of data, it becomes even more difficult for exploratory visualization to give people an interactive way to generate new hypotheses. This requires integrating instant data mining techniques into a visualization framework, which is still lacking in urban computing.

Applications in Cognitive Computing including Big Data

For Transportation Systems

Finding fast driving routes saves both drivers' time and energy, since traffic congestion wastes a lot of gas. Extensive studies have been conducted to learn historical traffic patterns, estimate traffic flows in real time, and predict future traffic conditions on individual road segments from floating car data, such as vehicle GPS trajectories, WiFi, and GSM signals. However, work on modeling city-level traffic patterns is still rare.

Taxis are an important mode of travel that sits between public and private transportation, providing almost door-to-door travel services. In big cities like New York and Beijing, people usually wait a non-trivial time before catching a free taxi, while taxi drivers are eager to find passengers. Effectively connecting passengers with free taxis is of great importance for saving waiting time, increasing taxi drivers' profits, and reducing unnecessary traffic and energy consumption.

By 2050, 70% of the world's population is expected to live in cities. Municipal planners will face an increasingly urbanized and polluted world, with cities around the world suffering from overly stressed road transport networks.
Building more effective public transport systems as alternatives to private vehicles has therefore become an urgent priority, both to ensure a good quality of life and a cleaner environment and to remain economically attractive to citizens, potential investors, and employees. Mass public transport systems, combined with integrated fare management and advanced traveler information systems, are considered key enablers for managing mobility better.

For the Environment

Without effective and adaptive planning, the rapid progress of urbanization will become a potential threat to the environment of cities. Recently, pollution has been increasing around the world in different aspects of the environment, such as air quality, noise, and waste. Protecting the environment while modernizing people's lives is of paramount importance in urban computing.

Urban Computing for Urban Energy Consumption

The rapid progress of urbanization is consuming more and more energy, which calls for technologies that can sense city-scale energy cost, improve energy infrastructure, and ultimately reduce energy consumption.

Urban Computing for the Economy

The dynamics of a city (for example, human mobility and changes in the number of points of interest (POIs) in a category) can indicate the performance of the city's economy. For example, the number of movie theaters in Beijing continued to increase from 2008 to 2012, reaching 260. This could mean that more and more people living in Beijing would like to watch movies in a theater. Conversely, when some POI categories disappear from a city, this denotes declining business. Similarly, human mobility could indicate the unemployment rate of some large cities, thus helping to predict the performance of a stock market.

Urban Computing for Public Safety

Major events, pandemics, serious accidents, environmental disasters, and terrorist attacks represent additional threats to public safety and order.
The wide availability of different types of urban data gives us the possibility, on the one hand, to learn from history how to manage these threats correctly and, on the other, to detect them in a timely manner or even predict them in advance.

Typical Technology

Urban Data Management Techniques

Data generated in urban spaces is usually associated with a spatial or spatiotemporal property. For example, road networks and POIs are frequently used spatial data in urban spaces; weather data, surveillance video, and electricity consumption are temporal data (also called time series or streams). Other data sources, such as traffic flows and human mobility, have spatial and temporal properties at the same time. Temporal data can also be associated with a location, thus becoming a kind of spatiotemporal data (for example, the temperature of a region and the electricity consumption of a building). Consequently, good urban data management techniques should be able to manage spatial and spatiotemporal data efficiently.

Furthermore, an urban computing system usually needs to exploit a variety of heterogeneous data. In many cases, these systems need to respond quickly to users' instant queries (for example, predicting traffic conditions or forecasting air pollution). Without data management techniques that can organize multiple heterogeneous data sources, the subsequent data mining process cannot gain knowledge from these sources quickly. For example, without an efficient spatiotemporal indexing structure that organizes POIs, road networks, traffic, and human mobility data well in advance, feature extraction alone in the U-Air project would take a few hours. Such a delay would prevent the application from reporting a city's air quality every hour.

Techniques for Managing Data Sparsity

There are many reasons that lead to a data sparsity problem.
For example, a user would only check in at a few places in a location-based social networking service, and some places might have no visitors at all. If we put users and locations into a matrix where each entry denotes the number of visits by a user to a location, the matrix is very sparse; that is, many entries have no value. If we further consider the activities (such as shopping, dining, and sports) that a user can perform at a location as a third dimension, a tensor can be formulated. Naturally, the tensor is even sparser. Data sparsity is a general challenge that has been studied for years in many computing tasks.

Visualization of Big Data

When we talk about data visualization, many people think only of visualizing raw data and presenting the results generated by data mining processes. The former can reveal correlations between different factors, thus suggesting features for a machine learning model. As mentioned above, spatiotemporal data is widely used in urban computing. For a complete analysis, the data must be considered from two complementary perspectives: as spatial distributions that change over time (i.e., spaces in time) and as profiles of local temporal variation distributed over space. However, data visualization is not just about displaying raw data and presenting results. Exploratory visualization becomes even more important in urban computing.

Semi-supervised Learning and Transfer Learning

Semi-supervised learning: Semi-supervised learning is a class of supervised learning tasks and techniques that also use unlabeled data for training, typically a small amount of labeled data together with a large amount of unlabeled data. Many machine learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. There are several semi-supervised learning methods, such as generative models, graph-based methods, and co-training.
Specifically, co-training is a semi-supervised learning technique that requires two views of the data. It assumes that each example is described by two different sets of features that provide different and complementary information about an instance. Ideally, the two feature sets of each instance are conditionally independent given the class, and the class of an instance can be accurately predicted from each view alone. Co-training can produce a better inference result because one classifier correctly labels data that the other classifier had previously misclassified.

Transfer learning: An important assumption in many machine learning and data mining algorithms is that the training data and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, sometimes we have a classification task in one domain of interest but only have enough training data in another domain, where the latter data might be in a different feature space or follow a different distribution. Unlike semi-supervised learning, which assumes that labeled and unlabeled data share the same distribution, transfer learning allows the domains, tasks, and distributions used in training and testing to differ. In the real world, we see many examples of transfer learning. For example, learning to recognize tables can help you recognize chairs.

Optimization Techniques

First, many data mining tasks can be solved using optimization methods, such as matrix factorization and tensor decomposition. Examples include location-activity recommendation and research on inferring refueling behavior. Second, the learning process of many machine learning models is actually based on optimization and approximation algorithms, such as maximum likelihood, gradient descent, and EM (expectation-maximization).
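As a minimal sketch of how the first two points fit together, the Python example below factorizes a small, hypothetical user-location visit matrix with plain gradient descent, fitting only the observed entries. The data, the function names factorize and observed_rmse, and all hyperparameters are illustrative assumptions and are not taken from any system mentioned in this essay.

```python
import numpy as np


def observed_rmse(visits, mask, U, V):
    """Root-mean-square error of U @ V.T, measured on observed entries only."""
    err = mask * (visits - U @ V.T)
    return float(np.sqrt((err ** 2).sum() / mask.sum()))


def factorize(visits, mask, rank=2, lr=0.01, reg=0.01, epochs=5000, seed=0):
    """Approximate `visits` by U @ V.T using full-batch gradient descent.

    Only entries where mask == 1 contribute to the loss, so missing
    entries of the sparse matrix are ignored rather than treated as zero.
    """
    rng = np.random.default_rng(seed)
    n_users, n_places = visits.shape
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((n_places, rank))
    for _ in range(epochs):
        err = mask * (visits - U @ V.T)   # residual on observed entries
        grad_U = -err @ V + reg * U       # gradient of the regularized squared loss
        grad_V = -err.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V


if __name__ == "__main__":
    # Hypothetical visit counts: rows are users, columns are locations.
    # A zero here is assumed to mean "no observation", not "zero visits".
    visits = np.array([
        [5, 4, 0, 0, 1],
        [4, 5, 0, 1, 0],
        [0, 0, 5, 4, 0],
        [1, 0, 4, 5, 0],
    ], dtype=float)
    mask = (visits > 0).astype(float)
    U, V = factorize(visits, mask)
    print("RMSE on observed entries:", observed_rmse(visits, mask, U, V))
    print("Estimated full matrix:")
    print(np.round(U @ V.T, 2))
```

Multiplying U by the transpose of V yields estimates for the entries that were never observed, which is one common way to cope with the data sparsity discussed earlier.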
Third, results from operations research can be applied to an urban computing task when combined with other techniques, such as database algorithms. For example, the ridesharing problem has been studied in operations research for many years. It has been shown to be NP-hard if we want to minimize the total travel distance of a group of people planning to share rides. As a result, it is really difficult to apply existing solutions to a large group of users, especially in an online application. The dynamic taxi ridesharing system T-Share combined spatiotemporal database techniques with optimization algorithms to significantly reduce the number of taxis that need to be checked. As a result, the service can be provided online to answer instant queries from millions of users. Another example combined a PCA-based anomaly detection algorithm with L1 minimization techniques to diagnose the traffic flows that lead to a traffic anomaly. The spatiotemporal properties and dynamics of urban computing applications also pose new challenges to current operations research.

Information Security

Information security is also non-trivial for an urban computing system that collects data from multiple sources and communicates with millions of devices and users. Common problems that might occur in urban computing systems include data security (e.g., ensuring that received data is intact, fresh, and non-repudiable), authentication between different sources and clients, and intrusion detection in a hybrid system (one that connects the digital and physical worlds).

Future Directions

Although many research projects on urban computing have been conducted in recent years, some technologies are still missing or not well studied.

Balanced crowd sensing: Data generated through crowd sensing is not evenly distributed across geographic areas and time spans. In some locations we may have much more data than we actually need. A method of.