Bibliography
In earlier work we attempted to propose an unbreakable CAPTCHA and found that a time limit is effective in preventing computers from recognizing characters accurately, whereas, given unlimited time, computers can eventually recognize any text-based CAPTCHA. One common way to hinder character recognition by computers is distortion, and adding noise is also effective. However, these kinds of prevention also make it difficult for human beings to recognize the characters. As a solution to these problems, our team proposed an effective text-based CAPTCHA algorithm based on amodal completion. Our CAPTCHA imposes a large computational cost on computers, while amodal completion helps human beings recognize the characters almost instantly. The CAPTCHA has since evolved with aftereffects and combinations of complementary colors. We evaluated it against deep learning, which is attracting the most attention because it is faster and more accurate than existing machine recognition methods. In this paper, we add jagged lines to the edges of the characters, since edges are among the most important cues for recognition by deep learning. We also evaluate how much the jagged lines reduce recognition by human beings and how much they hinder recognition by computers, and we confirm the effect of our method on deep learning.
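As a rough illustration of the kind of edge perturbation described (not the authors' algorithm), the sketch below toggles random pixels along the outline of a binary glyph to produce a jagged edge; the toggle probability and the toy glyph are illustrative assumptions.

```python
import numpy as np

def jag_edges(glyph, toggle_prob=0.35, seed=0):
    """Roughen the outline of a binary glyph (1 = ink, 0 = background)
    by randomly toggling pixels that lie on the foreground/background edge."""
    rng = np.random.default_rng(seed)
    g = glyph.astype(bool)
    # A pixel is an edge pixel if any 4-neighbour differs from it.
    padded = np.pad(g, 1, mode="edge")
    neighbours = [padded[:-2, 1:-1], padded[2:, 1:-1],
                  padded[1:-1, :-2], padded[1:-1, 2:]]
    edge = np.zeros_like(g)
    for n in neighbours:
        edge |= (n != g)
    # Randomly flip a fraction of the edge pixels to create a jagged outline.
    flips = edge & (rng.random(g.shape) < toggle_prob)
    return (g ^ flips).astype(np.uint8)

# Toy 'I'-shaped glyph, just for demonstration.
glyph = np.zeros((12, 8), dtype=np.uint8)
glyph[2:10, 3:5] = 1
print(jag_edges(glyph))
```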
Community question answering (cQA) has become an important topic due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions; the lexical gap, however, poses a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations using the category metadata of cQA pages for question retrieval, via two novel category-powered models: a basic model called MB-NET and an enhanced model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of the sets of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, large-scale automatic evaluation experiments show that promising and significant performance improvements can be achieved.
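For illustration of the general Fisher-kernel idea (mapping a variable-size set of word vectors to a fixed-length vector), here is a minimal sketch that keeps only the gradients with respect to the GMM means, a common Fisher-vector simplification; the embedding dimensionality, number of Gaussians, and random vectors are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(word_vecs, gmm):
    """Map an (n_words, dim) matrix to a fixed-length vector via the
    gradient of the GMM log-likelihood w.r.t. the component means."""
    X = np.atleast_2d(word_vecs)
    gamma = gmm.predict_proba(X)                   # (n_words, K) soft assignments
    diff = X[:, None, :] - gmm.means_[None, :, :]  # (n_words, K, dim)
    grad_mu = np.einsum("nk,nkd->kd", gamma, diff / gmm.covariances_[None, :, :])
    grad_mu /= X.shape[0] * np.sqrt(gmm.weights_)[:, None]  # rough Fisher normalization
    return grad_mu.ravel()                         # fixed length K * dim

rng = np.random.default_rng(0)
vocab_vectors = rng.normal(size=(1000, 50))        # stand-in for learned embeddings
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(vocab_vectors)

question = rng.normal(size=(7, 50))                # 7 word vectors -> one 200-d vector
print(fisher_vector(question, gmm).shape)          # (200,)
```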
Protecting Critical Infrastructures (CIs) against contemporary cyber attacks has become a crucial as well as complex task. Modern attack campaigns, such as Advanced Persistent Threats (APTs), leverage weaknesses in an organization's business processes and exploit vulnerabilities in several systems to hit their target. Although their life-cycle can last for months, these campaigns typically go undetected until they achieve their goal. They usually aim at exfiltrating data or causing service disruptions, and can also undermine human safety. Novel detection techniques and incident handling approaches are therefore required to effectively protect CI networks and react to this type of threat in a timely manner. Correlating large amounts of data collected from a multitude of relevant sources is necessary, and sometimes required by national authorities, to establish cyber situational awareness and allow suitable countermeasures to be adopted promptly in case of an attack. In this paper we propose three novel methods for security information correlation designed to discover relevant insights and support the establishment of cyber situational awareness.
Institutions use the information security (InfoSec) policy document as a set of rules and guidelines to govern the use of institutional information resources. However, a common problem is that these policies are often not followed or complied with. This study explores the extent to which the problem lies with the policy documents themselves. InfoSec policies are written in natural language, which is prone to ambiguity and misinterpretation. Consequently, such policies may be ambiguous, making them hard, if not impossible, for users to comply with. A case study approach with content analysis was adopted, and the research explores the extent of the problem using a case study of an educational institution in South Africa.
Keystroke dynamics analysis has been applied successfully to the verification of passwords and other fixed short texts as a means of reducing their inherent security limitations, because their length and the fact that they are typed often make their characteristic timings fairly stable. Free-text analysis, on the other hand, was neglected until recent years due to the inherent difficulty of dealing with short-term behavioral noise and long-term effects on typing rhythm. In this paper we examine finite context modeling of keystroke dynamics in free text and report promising results for user verification on an extensive data set, collected in a real-world environment outside the laboratory, which we make publicly available.
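As a simplified illustration of context-based timing models for free-text keystrokes (a per-digraph latency profile, not the paper's finite context modeling scheme), consider the sketch below; the timings, tolerance, and scoring rule are illustrative assumptions.

```python
from collections import defaultdict
import statistics

def digraph_latencies(events):
    """events: list of (key, press_time_ms). Returns {(k1, k2): [latencies]}."""
    model = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        model[(k1, k2)].append(t2 - t1)
    return model

def score(profile, sample, tolerance=1.5):
    """Fraction of sample digraph latencies within `tolerance` standard
    deviations of the enrolled user's mean latency for that digraph."""
    hits = total = 0
    for digraph, lats in digraph_latencies(sample).items():
        if digraph not in profile or len(profile[digraph]) < 2:
            continue
        mu = statistics.mean(profile[digraph])
        sd = statistics.stdev(profile[digraph]) or 1.0
        for lat in lats:
            total += 1
            hits += abs(lat - mu) <= tolerance * sd
    return hits / total if total else 0.0

enrolled = digraph_latencies([("t", 0), ("h", 110), ("e", 205),
                              ("t", 400), ("h", 505), ("e", 610)])
print(score(enrolled, [("t", 0), ("h", 120), ("e", 222)]))
```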
Detecting early trends indicating cognitive decline can allow older adults to better manage their health, but current assessments present barriers that preclude such continuous monitoring by consumers. To explore the effects of cognitive status on computer interaction patterns, the authors collected typed text samples from older adults with and without pre-mild cognitive impairment (PreMCI) and constructed statistical models from keystroke and linguistic features to differentiate between the two groups. Using both feature sets, they obtained a 77.1 percent correct classification rate with 70.6 percent sensitivity, 83.3 percent specificity, and a 0.808 area under the curve (AUC). These results are in line with current assessments for MCI, a more advanced condition, but are obtained with an unobtrusive method. This research contributes a combination of features for text and keystroke analysis and enhances understanding of how clinicians, or older adults themselves, might monitor for PreMCI through patterns in typed text. It has implications for embedded systems that can enable healthcare providers and consumers to proactively and continuously monitor changes in cognitive function.
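For reference, the reported metrics (sensitivity, specificity, AUC) for a binary PreMCI-vs-control classifier can be computed as sketched below with scikit-learn; the toy labels and scores are placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Toy ground-truth labels (1 = PreMCI, 0 = control) and classifier scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.2, 0.35, 0.1, 0.8, 0.6])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
auc = roc_auc_score(y_true, y_score)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} AUC={auc:.3f}")
```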
Document image binarization is performed to segment foreground text from the background in badly degraded documents. In this paper, a comprehensive survey is conducted of several state-of-the-art document image binarization techniques. After describing these techniques, their performance is compared using various evaluation metrics that are widely used in document image analysis and recognition. On the basis of this comparison, the adaptive contrast method is found to be the best performing method. Accordingly, the partial results that we have obtained with the adaptive contrast method are reported, and its mathematical model and block diagram are described in detail.
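The adaptive contrast method itself combines several steps; as a rough, much simpler illustration of adaptive document binarization, the sketch below uses the Sauvola local threshold from scikit-image on a synthetic degraded page (window size, k, and the toy image are illustrative assumptions).

```python
import numpy as np
from skimage.filters import threshold_sauvola

def binarize(gray, window_size=25, k=0.2):
    """Return a binary image: True where the pixel is foreground (ink)."""
    thresh = threshold_sauvola(gray, window_size=window_size, k=k)
    return gray < thresh   # darker than the local threshold -> text

# Synthetic degraded page: dark strokes on an unevenly lit, noisy background.
rng = np.random.default_rng(0)
page = 0.8 + 0.1 * rng.random((64, 64))        # bright, noisy background
page[20:24, 10:50] = 0.2                       # a dark "text" stroke
print(binarize(page).sum(), "foreground pixels")
```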
As a highly valuable cultural heritage, palm leaf manuscripts offer a new challenge for document analysis systems due to the specific characteristics of their physical support. In the search for an optimal binarization method for palm leaf manuscript images, creating new ground truth binarized images is a necessary step. However, because ground truthing involves human intervention, the effect of subjectivity on the construction of the ground truth binarized image must be analysed and reported. In this paper, we present an experiment under real conditions to analyse the existence of human subjectivity in the construction of ground truth binarized images of palm leaf manuscripts and to quantify the resulting ground truth variability with several binarization evaluation metrics.
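Two of the commonly used binarization evaluation metrics (F-measure and PSNR) can quantify how far one binarized image deviates from another, e.g. two annotators' ground truths; a minimal sketch with toy images follows (the images are illustrative, not manuscript data).

```python
import numpy as np

def f_measure(pred, gt):
    """pred, gt: boolean arrays, True = foreground."""
    tp = np.logical_and(pred, gt).sum()
    precision = tp / pred.sum() if pred.sum() else 0.0
    recall = tp / gt.sum() if gt.sum() else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def psnr(pred, gt):
    """PSNR between two binary images treated as 0/1 intensities."""
    mse = np.mean((pred.astype(float) - gt.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(1.0 / mse)

gt = np.zeros((32, 32), dtype=bool); gt[10:20, 5:25] = True   # one annotator's strokes
alt = gt.copy(); alt[10:20, 23:27] = True                     # a second annotator's version
print(f"F-measure={f_measure(alt, gt):.3f}  PSNR={psnr(alt, gt):.2f} dB")
```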
The development of the internet has been accompanied by the growth of another domain: cyber-crime. Users can be exposed to illegal activity, so it has become important to make the technology reliable. Phishing techniques prominently involve email messages: phishing emails are socially engineered messages that either link to a hosted phishing website or carry malware code that executes some action when the URL is clicked. Lexically analyzing the URLs can enhance performance and help to differentiate between legitimate email and phishing URLs. As assessed in this study, combining lexical analysis of phishing URLs with textual analysis of the email makes classification successful and results in highly precise anti-phishing.
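As an illustration of the kind of lexical URL features such a classifier might use, here is a short sketch; the feature set and the suspicious-token list are illustrative assumptions, not the study's exact features. The resulting dictionaries could be fed to any standard classifier.

```python
from urllib.parse import urlparse

SUSPICIOUS_TOKENS = ("login", "verify", "update", "secure", "account")  # illustrative list

def lexical_features(url):
    """Simple lexical features often used to separate phishing from benign URLs."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": "@" in url,
        "has_ip_host": host.replace(".", "").isdigit(),
        "suspicious_tokens": sum(tok in url.lower() for tok in SUSPICIOUS_TOKENS),
    }

print(lexical_features("http://secure-login.example-bank.accounts-verify.tk/update?id=123"))
```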
With the growing success of big data, many challenges have appeared. Timeliness, scalability and privacy are the main problems that researchers attempt to address. Privacy preservation is now a highly active domain of research, and many works and concepts have emerged within this theme. One of these concepts is de-identification: a specific area that consists of finding and removing sensitive information, either by replacing it, encrypting it or adding noise to it, using techniques such as cryptography and data mining. In this report, we present a new model for de-identification of textual data using a particular artificial immune system algorithm known as CLONALG.
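For readers unfamiliar with CLONALG, the sketch below shows the generic clonal selection loop at its core (select the best antibodies, clone them, hypermutate inversely to affinity, reselect); the bit-string encoding and toy affinity function are illustrative stand-ins, not the paper's de-identification setup.

```python
import random

def clonalg(affinity, length=20, pop_size=20, n_select=5, clones_per=4,
            generations=60, seed=1):
    """Generic CLONALG over bit strings: select, clone, hypermutate, replace."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=affinity, reverse=True)
        clones = []
        for rank, antibody in enumerate(pop[:n_select]):
            # Higher-affinity antibodies receive a lower mutation rate.
            rate = 0.5 * (rank + 1) / n_select
            for _ in range(clones_per):
                clones.append([b ^ (rng.random() < rate) for b in antibody])
        pop = sorted(pop + clones, key=affinity, reverse=True)[:pop_size]
    return max(pop, key=affinity)

target = [1, 0] * 10
best = clonalg(lambda ab: sum(a == t for a, t in zip(ab, target)))
print(best, "affinity:", sum(a == t for a, t in zip(best, target)))
```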
With the advancement of technology, the world has not only become a better place to live in but has also lost the privacy and security of shared data. Information in any form is never safe from unauthorized access. In this paper we propose an approach for preserving data using visual cryptography: text displayed in sixteen-segment form is broken into two shares that reveal no information about the original images. With this process we have obtained satisfactory results in statistical and structural tests.
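For context, the classical (2,2) visual secret sharing scheme works as sketched below: each secret pixel expands into a 2x2 block; the two shares carry identical blocks for white pixels and complementary blocks for black ones, so stacking (OR-ing) the shares reveals the secret while each share alone looks random. The tiny secret image is illustrative.

```python
import numpy as np

PATTERNS = [np.array([[1, 0], [0, 1]]), np.array([[0, 1], [1, 0]])]  # 2x2 half-black blocks

def make_shares(secret, seed=0):
    """secret: 2-D array, 1 = black pixel. Returns two shares (1 = black subpixel)."""
    rng = np.random.default_rng(seed)
    h, w = secret.shape
    s1 = np.zeros((2 * h, 2 * w), dtype=np.uint8)
    s2 = np.zeros_like(s1)
    for y in range(h):
        for x in range(w):
            p = PATTERNS[rng.integers(2)]
            s1[2*y:2*y+2, 2*x:2*x+2] = p
            # White pixel: same pattern in both shares. Black: complementary pattern.
            s2[2*y:2*y+2, 2*x:2*x+2] = p if secret[y, x] == 0 else 1 - p
    return s1, s2

secret = np.array([[1, 0, 1], [0, 1, 0]])
a, b = make_shares(secret)
stacked = a | b               # stacking transparencies ~ pixel-wise OR
print(stacked)                # black secret pixels become fully black 2x2 blocks
```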
Unstructured data mining has become topical recently due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity of today's data calls for a new storage approach. Thus, NoSQL databases have emerged as the preferred storage facility, since they support unstructured data storage. This creates the need to explore efficient data mining techniques for such NoSQL systems, since the available tools and frameworks designed for RDBMS are often not directly applicable. In this paper, we focus on mining topics and terms, based on clustering, in document-based NoSQL stores. This is achieved by adapting the architectural design of an analytics-as-a-service framework and employing the Viterbi algorithm to enhance the accuracy of term classification in the system. The results from pilot testing of our work show higher accuracy in comparison to some previously proposed techniques, such as parallel search.
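For reference, a standard Viterbi decoder of the kind that could assign the most likely sequence of term classes is sketched below; the tiny two-state model and probabilities are illustrative, not the system's actual parameters.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    V = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        V.append({s: max(((V[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], prev)
                          for prev in states), key=lambda x: x[0]) for s in states})
    # Backtrack from the best final state.
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for layer in reversed(V[1:]):
        state = layer[state][1]
        path.append(state)
    return list(reversed(path))

states = ("topic", "generic")
start = {"topic": 0.4, "generic": 0.6}
trans = {"topic": {"topic": 0.7, "generic": 0.3},
         "generic": {"topic": 0.3, "generic": 0.7}}
emit = {"topic": {"nosql": 0.5, "data": 0.3, "the": 0.2},
        "generic": {"nosql": 0.1, "data": 0.3, "the": 0.6}}
print(viterbi(["the", "nosql", "data"], states, start, trans, emit))
```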
Interactive visualization provides valuable support for exploring, analyzing, and understanding textual documents. Certain tasks, however, require that insights derived from visual abstractions are verified by a human expert perusing the source text. So far, this problem is typically solved by offering overview-detail techniques, which present different views with different levels of abstractions. This often leads to problems with visual continuity. Focus-context techniques, on the other hand, succeed in accentuating interesting subsections of large text documents but are normally not suited for integrating visual abstractions. With VarifocalReader we present a technique that helps to solve some of these approaches' problems by combining characteristics from both. In particular, our method simplifies working with large and potentially complex text documents by simultaneously offering abstract representations of varying detail, based on the inherent structure of the document, and access to the text itself. In addition, VarifocalReader supports intra-document exploration through advanced navigation concepts and facilitates visual analysis tasks. The approach enables users to apply machine learning techniques and search mechanisms as well as to assess and adapt these techniques. This helps to extract entities, concepts and other artifacts from texts. In combination with the automatic generation of intermediate text levels through topic segmentation for thematic orientation, users can test hypotheses or develop interesting new research questions. To illustrate the advantages of our approach, we provide usage examples from literature studies.
Vector space models (VSMs) are mathematically well-defined frameworks that have been widely used in text processing. In these models, high-dimensional, often sparse vectors represent text units. In an application, the similarity of vectors -- and hence of the text units that they represent -- is computed by a distance formula. The high dimensionality of the vectors, however, is a barrier to the performance of methods that employ VSMs. Consequently, a dimensionality reduction technique is employed to alleviate this problem. This paper introduces a new method, called Random Manhattan Indexing (RMI), for the construction of L1 normed VSMs at reduced dimensionality. RMI combines the construction of a VSM and dimension reduction into an incremental, and thus scalable, procedure. To attain its goal, RMI employs sparse Cauchy random projections.
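To convey the general idea (a sketch, not the exact RMI construction), one can project high-dimensional vectors with a sparse random matrix whose nonzero entries are Cauchy-distributed and estimate the L1 distance from the median of absolute coordinate-wise differences in the reduced space; the sparsity level, dimensions, and rescaling are illustrative assumptions.

```python
import numpy as np

def sparse_cauchy_matrix(dim_in, dim_out, density=0.1, seed=0):
    """Random projection matrix with Cauchy-distributed nonzeros (L1-friendly)."""
    rng = np.random.default_rng(seed)
    mask = rng.random((dim_in, dim_out)) < density
    return np.where(mask, rng.standard_cauchy((dim_in, dim_out)), 0.0)

def l1_estimate(px, py):
    """Estimate the L1 distance from two projected vectors via the sample
    median of |px - py| (the median estimator of a Cauchy scale)."""
    return np.median(np.abs(px - py))

rng = np.random.default_rng(1)
x, y = rng.random(10000), rng.random(10000)        # stand-ins for sparse term vectors
R = sparse_cauchy_matrix(10000, 400)
print("true L1:", np.abs(x - y).sum())
# Divide by the density to roughly correct for the sparse mask.
print("estimated L1:", l1_estimate(x @ R, y @ R) / 0.1)
```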
DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models, trained through machine learning. After the 2011 historic milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. As opposed to the original Watson Jeopardy!, whose class distribution of positive-to-negative labels is 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100 000. Such initial extreme class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contribute toward the challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications to increase the computed true labels. In addition, algorithm engineering, such as an implementation of the Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable to overcome the challenges in this first Watson gamification for real-world business problems.
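For readers unfamiliar with the optimization step mentioned above, the sketch below shows Newton-Raphson (IRLS) updates for L2-regularized logistic regression; the toy data and regularization strength are illustrative, not Watson's training setup.

```python
import numpy as np

def newton_logreg(X, y, lam=1.0, iters=25):
    """L2-regularized logistic regression fitted by Newton-Raphson updates."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))            # predicted probabilities
        grad = X.T @ (p - y) + lam * w              # gradient of regularized loss
        W = p * (1 - p)                             # Hessian weights (diagonal)
        H = (X * W[:, None]).T @ X + lam * np.eye(d)
        w -= np.linalg.solve(H, grad)               # Newton step
    return w

rng = np.random.default_rng(0)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])   # bias + 2 features
true_w = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)
print(newton_logreg(X, y, lam=0.1))
```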
In this paper we propose a Twitter sentiment analytics approach that mines opinion polarity about a given topic. Most current semantic sentiment analytics depends on polarity lexicons; however, many key tone words are frequently bipolar. We demonstrate a technique that accommodates the bipolarity of tone words through a context-sensitive tone lexicon learning mechanism, where the context is modeled by the semantic neighborhood of the main target. Performance analysis shows that the ability to contextualize tone word polarity significantly improves accuracy.
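As a toy illustration of resolving a bipolar tone word from its context (not the paper's learned lexicon), the sketch below scores the word's neighborhood against small positive and negative seed lists; the seed lexicons and window size are illustrative assumptions.

```python
POSITIVE_SEEDS = {"great", "love", "win", "fast", "up"}       # illustrative seed lexicons
NEGATIVE_SEEDS = {"bad", "hate", "crash", "slow", "down"}

def contextual_polarity(tone_word, tweet_tokens, window=3):
    """Decide the polarity of a bipolar tone word from its local context."""
    if tone_word not in tweet_tokens:
        return 0
    i = tweet_tokens.index(tone_word)
    context = tweet_tokens[max(0, i - window): i] + tweet_tokens[i + 1: i + 1 + window]
    score = (sum(tok in POSITIVE_SEEDS for tok in context)
             - sum(tok in NEGATIVE_SEEDS for tok in context))
    return (score > 0) - (score < 0)               # +1 positive, -1 negative, 0 undecided

# "unpredictable" reads negative for software but positive for a plot twist.
print(contextual_polarity("unpredictable",
                          "the app is slow and unpredictable and tends to crash".split()))
print(contextual_polarity("unpredictable",
                          "love the unpredictable ending what a great movie".split()))
```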
An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to consumption experiences is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering semantic patterns, in particular in tweets, in consumers' discussions on social media. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that reflect consumers' sentiment polarities, i.e., positive vs. negative opinions, and 2) deriving actual content with a semantic trend from the textual data. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study to discover semantic properties of textual data in a consumption context beyond sentiment analysis. In addition to the major findings, we apply LDA to the same data and draw latent topics that represent consumers' positive and negative opinions on social media.
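The pipeline named above (TF-IDF cosine similarity, K-means clustering, LDA topic modeling) can be sketched with scikit-learn as follows; the placeholder tweets, vectorizer settings, and cluster/topic counts are illustrative assumptions, not the study's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

positive = ["fast delivery and great prices", "love the easy returns"]
negative = ["order arrived late again", "terrible support and late refund"]
tweets = positive + negative

# 1) Similarity between the positive and negative sets of tweets.
tfidf = TfidfVectorizer(stop_words="english").fit(tweets)
sim = cosine_similarity(tfidf.transform(positive), tfidf.transform(negative))
print("cross-set cosine similarity:\n", sim)

# 2) K-means clustering of all tweets in TF-IDF space.
print("clusters:", KMeans(n_clusters=2, n_init=10,
                          random_state=0).fit_predict(tfidf.transform(tweets)))

# 3) LDA topics over raw term counts.
counts = CountVectorizer(stop_words="english").fit(tweets)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts.transform(tweets))
terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    print(f"topic {k}:", [terms[i] for i in topic.argsort()[-3:]])
```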
The huge amount of user log data collected by search engine providers creates new opportunities to understand user loyalty and defection behavior at an unprecedented scale. However, it also poses a great challenge to analyze this behavior and glean insights from the complex, large-scale data. In this paper, we introduce LoyalTracker, a visual analytics system for tracking user loyalty and switching behavior towards multiple search engines from vast amounts of user log data. We propose a new interactive visualization technique (the flow view) based on a flow metaphor, which conveys a visual summary of the loyalty dynamics of thousands of users over time. Two other visualization techniques, a density map and a word cloud, are integrated to enable analysts to gain further insight into the patterns identified by the flow view. Case studies and interviews with domain experts were conducted to demonstrate the usefulness of our technique in understanding user loyalty and switching behavior in search engines.
This paper describes a high-performance and space-efficient memory-resident datastore for text analytics systems, based on a hash table for fast access, a dynamic trie for staging, and a list of Level-Order Unary Degree Sequence (LOUDS) tries for compactness. We achieve efficient memory allocation and data placement by placing frequently accessed keys in the hash table and infrequently accessed keys in the LOUDS tries, without using conventional cache algorithms. Our algorithm also dynamically changes the memory allocation sizes for these data structures according to the remaining available memory. This technique yields 38.6% to 52.9% better throughput than a double-array trie, a conventional fast and compact datastore.
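To make the LOUDS representation concrete, the sketch below encodes a trie breadth-first, emitting one '1' per child followed by a '0' for each node and storing the child labels in level order; the nested-dict trie is an illustrative stand-in for the paper's datastore, and rank/select navigation is omitted.

```python
from collections import deque

def to_louds(trie):
    """Encode a nested-dict trie into a LOUDS bit string plus level-order labels."""
    bits, labels = ["10"], []               # "10" is the super-root pointing at the root
    queue = deque([trie])
    while queue:
        node = queue.popleft()
        bits.append("1" * len(node) + "0")  # one '1' per child, then a terminating '0'
        for label in sorted(node):
            labels.append(label)
            queue.append(node[label])
    return "".join(bits), labels

# Keys "to", "tea", "ten" stored as a character trie.
trie = {"t": {"o": {}, "e": {"a": {}, "n": {}}}}
bits, labels = to_louds(trie)
print(bits)     # 10 10 110 110 0 0 0  -> super-root, root, 't', 'e', 'o', 'a', 'n'
print(labels)   # ['t', 'e', 'o', 'a', 'n']
```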
Word clouds have emerged as a straightforward and visually appealing visualization method for text. They are used in various contexts as a means to provide an overview by distilling text down to those words that appear with the highest frequency. Typically, this is done in a static way as pure text summarization. We think, however, that this simple yet powerful visualization paradigm holds larger potential for text analytics. In this work, we explore the usefulness of word clouds for general text analysis tasks. We developed a prototypical system called the Word Cloud Explorer that relies entirely on word clouds as a visualization method. It equips them with advanced natural language processing, sophisticated interaction techniques, and context information. We show how this approach can be effectively used to solve text analysis tasks and evaluate it in a qualitative user study.
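The "distillation" step underlying any word cloud is essentially a stop-word-filtered frequency count; a minimal sketch (the stop-word list and length cutoff are illustrative):

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "its",
             "are", "so", "for", "on", "this"}

def top_words(text, k=5):
    """Return the k most frequent non-stop-words, i.e. what a word cloud would emphasize."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2).most_common(k)

sample = ("Word clouds distill a text down to its most frequent words. "
          "The frequent words are drawn larger, so the cloud gives a quick overview of the text.")
print(top_words(sample))
```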
Formal methods, models and tools for social big data analytics are largely limited to graph-theoretical approaches, such as social network analysis (SNA) informed by relational sociology, and there are no other unified modeling approaches to social big data that integrate the conceptual, formal and software realms. In this paper, we first present and discuss a theory and conceptual model of social data. Second, we outline a formal model based on set theory and discuss the semantics of the formal model with a real-world social data example from Facebook. Third, we briefly present and discuss the Social Data Analytics Tool (SODATO), which realizes the conceptual model in software and provisions social data analysis based on the conceptual and formal models. Fourth and last, based on the formal model and sentiment analysis of text, we present a method for profiling artifacts and actors and apply it to the analysis of big social data collected from the Facebook page of the fast fashion company H&M.
Keeping up with rapid advances in research in various fields of engineering and technology is a challenging task. Decision makers, including academics, program managers, venture capital investors, industry leaders and funding agencies, not only need to be abreast of the latest developments but must also be able to assess the effect of growth in certain areas on their core business. Although analyst agencies like Gartner and McKinsey provide such reports for some areas, thought leaders of all organisations still need to amass data from heterogeneous collections like research publications, analyst reports, patent applications and competitor information to finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics and information visualization to aid the process of retrieval and analysis. In this paper, we present our work on automated topical analysis and insight generation from large heterogeneous text collections of publications and patents. While most of the earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We present several methods and techniques that help in the analysis and better comprehension of search results, as well as methods for generating insights about emerging and popular trends in research, along with contextual differences between academic research and patenting profiles. We also present novel techniques for presenting topic evolution, helping users understand how a particular area has evolved over time.