
RegEx

The abbreviations regex and regexp denote regular expressions, which are used in theoretical computer science, programming, software development, word processing, and search engine optimization. Regular expressions describe strings and sets of strings in a general, formal way so that they can be found, replaced, manipulated, or processed in documents, source code, or databases.

Example: In a regex-enabled text editor, all links contained in an HTML file are to be displayed. If an expression such as <a href="[^"]*"[^>]*> is entered in the editor's search function, all links that follow the usual HTML link format are displayed. A shorter expression built around .* performs the same task.
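The link search described above can be sketched in Python. This is a minimal illustration that assumes the link pattern has the form <a href="[^"]*"[^>]*>; the HTML snippet and the variable names are hypothetical, and the capturing group around the href value is added here only to make the result easier to read.

```python
import re

# Hypothetical HTML snippet, used only for illustration.
html = (
    '<p>See <a href="https://example.com/a" class="ext">A</a> and '
    '<a href="/b">B</a>.</p>'
)

# Opening link tag: the href value is everything up to the next quote,
# followed by any further attributes up to the closing angle bracket.
link_tag = re.compile(r'<a href="([^"]*)"[^>]*>')

# With one capturing group, findall returns the captured href values.
hrefs = link_tag.findall(html)
print(hrefs)
```

A pattern like this only finds tags where href is the first attribute and is written with double quotes; for arbitrary HTML, a real parser is usually the safer choice.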

General information

The logician and mathematician Stephen Kleene is considered the founder of regex. In 1956 he used the notation of regular sets in a paper on the representation of events in nerve nets and finite automata. This and other works are today fundamental to theoretical computer science. Regular expressions are now used in various fields to simplify operations that would otherwise be labor-intensive and time-consuming.

Depending on the implementation, regex can be used in various programming languages, environments, and text editors: for example, in Perl, PHP, .NET, or JavaScript as part of a library[1], or in EditPad, Emacs, and Notepad++ as a search-and-replace function. In the web analytics tool Google Analytics, regular expressions are also used to filter traffic sources, define segments, and separate the detailed data of a report from other data.

Functionality

The uses of regex are extremely varied. Which regular expressions are possible depends on the notation, and different notations exist in different programming languages and tools. These notations include shell patterns, BRE (Basic Regular Expressions), and ERE (Extended Regular Expressions). The differences arise partly because individual characters, mostly metacharacters (control characters), are used differently from one notation to another.

Generally, characters (terminals) and metacharacters are distinguished. Characters belong to the character set (the alphabet), which contains, for example, digits, letters, and commas. Metacharacters denote operations such as alternation with |, grouping with (), and repetition with *, +, and ?. With ^, character sets can be negated. The metacharacters are instructions for the processing software. Regular characters can stand before or after metacharacters, and their formal meaning differs accordingly. Most implementations work with a dedicated regex engine, which parses and interprets the given regular expressions and checks resources for matches.
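The operations named above can be tried out with Python's re module. This is a small sketch; the helper matches and the sample strings are chosen here for illustration, and other regex engines may differ in detail.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


checks = {
    # Alternation: either alternative is accepted.
    "alternation": matches(r"cat|dog", "dog"),
    # Grouping plus repetition: the group "ab" one or more times.
    "grouping": matches(r"(ab)+", "ababab"),
    # ? makes the preceding character optional.
    "optional": matches(r"colou?r", "color") and matches(r"colou?r", "colour"),
    # * allows zero or more repetitions, so "ggle" is accepted.
    "star": matches(r"go*gle", "ggle"),
    # + requires at least one repetition, so "ggle" is rejected.
    "plus": matches(r"go+gle", "ggle"),
}

# Negation: ^ directly after [ matches any character NOT in the set.
non_digits = re.findall(r"[^0-9]+", "ab12cd345e")

print(checks)
print(non_digits)
```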

  • Regular characters: all digits from 0 to 9, all letters of an alphabet, and some special characters (commas, hyphens, semicolons). Important: the alphabet depends on the character set used (for example, Unicode or ASCII).
  • Character classes: \d, for example, stands for a digit from 0 to 9, whereas \t would find all tabs. Other options, depending on the implementation, are \l for lowercase letters, \s for all whitespace, and \u for uppercase letters.
  • Metacharacters:
 [] () {} | ? + - * ^ $  

A metacharacter can be escaped, that is, treated as a literal character, by placing a backslash before it.
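Character classes and escaping can be illustrated as follows. This sketch assumes Python's re module, which supports \d, \t, and \s but has no \l or \u classes, so [A-Z] is used for uppercase letters here. The sample text is hypothetical.

```python
import re

text = "Room 42\tcosts 3.50 EUR (approx.)"

digits = re.findall(r"\d", text)              # every single digit
has_tab = re.search(r"\t", text) is not None  # is there a tab character?
words = re.split(r"\s+", text)                # split on runs of whitespace
upper = re.findall(r"[A-Z]+", text)           # runs of uppercase letters

# An unescaped "." matches any character; "\." matches only a literal dot.
any_char = re.findall(r"3.5", "3.5 3x5")
literal_dot = re.findall(r"3\.5", "3.5 3x5")

# re.escape puts a backslash in front of every metacharacter in a string,
# so the parentheses and the dot are matched literally.
escaped = re.escape("(approx.)")
found = re.search(escaped, text).group()

print(digits, has_tab, words, upper, any_char, literal_dot, found)
```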

Practical importance

The following methods can be implemented with regular expressions:

  • Pattern matching: Using a string matching algorithm, texts can be checked for patterns. In this case, a regular expression represents a set of strings whose occurrences are matched in the text. The regex specifies the pattern, and the engine checks the pattern against a resource (for example, an HTML document or a text). Where needed, a replacement rule can be specified to modify the matched strings directly. Quantifiers can be used to narrow down the results. Examples: verifying an entered email address for its formal correctness, or searching for top-level domains in a list of URLs.
  • Globbing: File names are combined with placeholders (wildcards), for example to select all files of a particular format. The wildcard "sample.*" would find all files in a file system that are named "sample" but have different file formats, such as .txt or .doc. The asterisk stands for the varying file extensions. Globbing is also exploited in denial-of-service attacks, in which servers are intentionally overloaded.[2]
  • Truncation: In database searches, search terms are often abbreviated or truncated using wildcards. The term "sample*" would find all terms that begin with sample and end with other letters, such as sample match, sample test, or sample example. Truncation expands the search space. Example: In a library search, all entries that contain a given search term can be found.
  • Stemming: In stemming, different morphological variants of a word are traced back to the word stem. Declined and conjugated forms of a word can thus be reduced to their linguistic root. This method is used in information retrieval (for example, by search engines) and in theoretical computer science. Example: Google probably uses a similar procedure in the context of organic search[3].
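The four methods above can be sketched with Python's standard library. All patterns, file names, and word lists below are hypothetical and deliberately simple: the email check covers only the rough formal shape of an address, and the suffix-stripping function is a naive stand-in for stemming, not the algorithm used by any search engine.

```python
import fnmatch
import re

# Pattern matching: rough formal check of an email address.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def looks_like_email(value: str) -> bool:
    return EMAIL.fullmatch(value) is not None


# Pattern matching: extract the top-level domain from each URL.
TLD = re.compile(r"^https?://[^/]+\.([a-z]{2,})(?:[:/]|$)", re.IGNORECASE)
urls = [
    "https://example.com/page",
    "http://shop.example.org",
    "https://example.de:8080/x",
]
tlds = []
for url in urls:
    found = TLD.match(url)
    if found:
        tlds.append(found.group(1))

# Globbing: shell-style wildcard, where * stands for any file extension.
files = ["sample.txt", "sample.doc", "samples.txt", "other.txt"]
globbed = fnmatch.filter(files, "sample.*")

# Truncation: "sample*" expressed as a regex (zero or more word characters).
terms = ["sample", "samples", "sampling", "example"]
truncated = [t for t in terms if re.fullmatch(r"sample\w*", t)]


# Stemming (naive): strip a few common English suffixes.
def naive_stem(word: str) -> str:
    return re.sub(r"(ing|ed|s)$", "", word)


stems = [naive_stem(w) for w in ["tests", "tested", "testing"]]

print(looks_like_email("jane.doe@example.com"), tlds, globbed, truncated, stems)
```

Note that globbing uses its own wildcard syntax rather than regex syntax: in a glob, * means "any characters", while in a regex it repeats the preceding element.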

Importance for search engine optimization

Regex can be very useful for some tasks in search engine optimization[4]. Monitoring and analysis tools such as Google Analytics support regex.[5]

In Google Analytics, regular expressions are used to define filters for IP addresses. Individual filters that exclude the IP addresses of one or more visitors can be defined in the profile settings. This way, traffic from a range of IP addresses is not included in the reports. This is useful if you want to exclude irrelevant visits from the visit statistics, such as those of your own employees.
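An IP range filter of this kind might look like the following sketch. The address range 192.168.1.0 to 192.168.1.25 and the list of visits are hypothetical, and the pattern is tested here with Python's re module; the regex dialect accepted by an analytics tool may differ in detail.

```python
import re

# Hypothetical internal range: 192.168.1.0 to 192.168.1.25.
# The dots are escaped so that they match literal dots only.
INTERNAL_IP = re.compile(r"^192\.168\.1\.([0-9]|1[0-9]|2[0-5])$")

visits = [
    "192.168.1.7",
    "192.168.1.25",
    "192.168.1.26",
    "203.0.113.9",
    "192.168.10.7",
]

# Keep only visits that do NOT come from the internal range.
external = [ip for ip in visits if not INTERNAL_IP.match(ip)]
print(external)
```

The anchors ^ and $ matter here: without them, an address such as 192.168.1.26 would still match on its prefix 192.168.1.2.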

In addition, different segments can be processed in Google Analytics using regex. For example, searches that contain a brand name can be excluded. For this purpose, a segment would be defined that includes only organic traffic and excludes the brand name, which has been defined beforehand using a regex such as "[Ee]xample company" to cover spellings with an uppercase and a lowercase initial letter. In addition, different types of keywords can be excluded to find out how much traffic is generated by two or three specific keywords. The same applies to traffic from other sources such as newsletters, emails, and referral links from external websites.
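The brand exclusion can be sketched as follows. The brand name "example company" and the search queries are hypothetical, and the filtering is done in plain Python rather than in an analytics tool.

```python
import re

# Hypothetical brand name; the character class covers both spellings
# of the initial letter (uppercase and lowercase).
BRAND = re.compile(r"[Ee]xample company")

queries = [
    "Example company login",
    "example company prices",
    "regex tutorial",
    "cheap widgets",
]

# Keep only the queries that do not mention the brand.
non_brand = [q for q in queries if not BRAND.search(q)]
print(non_brand)
```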

Such a tactic can be useful for monitoring social media channels. In this case, a source would be defined by listing the possible sources in a regular expression, for example "facebook|twitter|youtube|LinkedIn". Google Analytics is not the only tool that offers options that can be exploited with regular expressions[6]. Web logs and server environments can also interpret and process regex. Websites can thus be redirected and marked as canonical by means of patterns described by regex.[7]
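Both ideas, matching social media sources and pattern-based redirects, can be sketched in Python. The source names, the URL scheme /old-blog/<number>, and the function redirect_target are assumptions made for illustration; real server environments express such rules in their own configuration syntax.

```python
import re

# Alternation over the social networks; case is ignored so that
# "LinkedIn" and "linkedin" are treated alike.
SOCIAL = re.compile(r"facebook|twitter|youtube|linkedin", re.IGNORECASE)

sources = ["m.facebook.com", "google", "youtube.com", "newsletter", "LinkedIn.com"]
social = [s for s in sources if SOCIAL.search(s)]

# Hypothetical redirect rule: /old-blog/<number> moves to /blog/<number>.
OLD_POST = re.compile(r"^/old-blog/(\d+)/?$")


def redirect_target(path: str) -> str:
    """Return the new path, or the unchanged path if the rule does not apply."""
    return OLD_POST.sub(r"/blog/\1", path)


print(social)
print(redirect_target("/old-blog/42/"), redirect_target("/about"))
```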

Web Links