Theory of Computation - Lecture Slides | CS 1010, Study notes of Computer Science

Material Type: Notes; Class: Introduction to Information Technology; Subject: Computer Science; University: University of Virginia; Term: Fall 2007;

Uploaded on 07/29/2009

koofers-user-18q

Introduction to Computer Science • Robert Sedgewick and Kevin Wayne • http://www.cs.Princeton.EDU/IntroCS

7: Theory of Computation

Introduction to Theoretical CS

Two fundamental questions.
• What can a computer do?
• What can a computer do with limited resources?

General approach.
• Don't talk about specific machines or problems.
• Consider minimal abstract machines.
• Consider general classes of problems.

[Figure: Pentium IV running Linux kernel 2.4.22]

Why Learn Theory

In theory . . .
• Deeper understanding of what a computer is and what computing is.
• Foundation of all modern computers.
• Pure science.
• Philosophical implications.

In practice . . .
• Web search: theory of pattern matching.
• Sequential circuits: theory of finite state automata.
• Compilers: theory of context-free grammars.
• Cryptography: theory of computational complexity.
• Data compression: theory of information.

"In theory there is no difference between theory and practice. In practice there is." -Yogi Berra

Regular Expressions and DFAs

[Figure: the regular expression a* | (a*ba*ba*ba*)* alongside a three-state DFA (states 0, 1, 2) with edges labeled a and b]

Pattern Matching Applications

Test if a string matches some pattern.
• Process natural language.
• Scan for virus signatures.
• Search for information using Google.
• Access information in digital libraries.
• Retrieve information from Lexis/Nexis.
• Search-and-replace in a word processor.
• Filter text (spam, NetNanny, Carnivore, malware).
• Validate data-entry fields (dates, email, URL, credit card).
• Search for markers in human genome using PROSITE patterns.

Parse text files.
• Compile a Java program.
• Crawl and index the Web.
• Read in data stored in TOY input file format.
• Automatically create Java documentation from Javadoc comments.

Regular Expressions: Basic Operations

Regular expression.
Notation to specify a set of strings.

Operation      Regular Expression   Yes                 No
Concatenation  aabaab               aabaab              every other string
Wildcard       .u.u.u.              cumulus, jugulum    succubus, tumultuous
Union          aa | baab            aa, baab            every other string
Closure        ab*a                 aa, abbba           ab, ababa
Parentheses    a(a|b)aab            aaaab, abaab        every other string
Parentheses    (ab)*a               a, ababababa        ε, abbbaa

Regular Expressions: Examples

Regular expression. Notation is surprisingly expressive.

Regular Expression     Description                    Yes                               No
a* | (a*ba*ba*ba*)*    multiple of three b's          bbb, aaa, bbbaababbaa             b, bb, baabbbaa
.*0....                fifth-to-last digit is 0       1000234, 98701234                 111111111, 403982772
.*spb.*                contains the trigraph spb      raspberry, crispbread             subspace, subspecies
gcg(cgg|agg)*ctg       fragile X syndrome indicator   gcgctg, gcgcggctg, gcgcggaggctg   gcgcgg, cggcggcggctg, gcgcaggctg

Generalized Regular Expressions

Regular expressions are a standard programmer's tool.
• Built into Java, Perl, Unix, Python, . . . .
• Additional operations typically added for convenience.
• Ex: [a-e]+ is shorthand for (a|b|c|d|e)(a|b|c|d|e)*.

Operation          Regular Expression   Yes                      No
One or more        a(bc)+de             abcde, abcbcde           ade, bcde
Character classes  [A-Za-z][a-z]*       capitalized, Word        camelCase, 4illegal
Exactly k          [0-9]{5}-[0-9]{4}    08540-1321, 19072-5541   111111111, 166-54-111
Negations          [^aeiou]{6}          rhythm                   decade

Fundamental Questions

Which languages CANNOT be described by any RE?
• Bit strings with equal number of 0s and 1s.
• Decimal strings that represent prime numbers.
• Genomic strings that are Watson-Crick complemented palindromes.
• Many more. . . .

How can we extend REs to describe richer sets of strings?
• Context-free grammar (e.g., Java).

Q. How can we make simple machines more powerful?
Q. Are there any limits on what kinds of problems machines can solve?

Reference: http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html

Summary

Programmer.
• Regular expressions are a powerful pattern matching tool.
• Implement regular expressions with finite state machines.

Theoretician.
• A regular expression is a compact description of a set of strings.
• A DFA is an abstract machine that solves the pattern matching problem for regular expressions.
• DFAs and regular expressions have limitations.

You. Practical application of core CS principles.

Variations.
• Terminology: DFA, FSA, FSM.
• DFAs with output: Moore machines, Mealy machines.

7.1: Extra Slides

[Figure: three-state DFA with states labeled 0 mod 3, 1 mod 3, and 2 mod 3, and transitions on 0 and 1]

Pattern Matching in Google

Google. Supports * as a full-word wildcard and | for union.

Pattern Matching in TiVo

TiVo. WishList has very limited pattern matching.

Reference: page 76, Hughes DirectTV TiVo manual

Order 3 Markov Model: bbbabbabbbbaba

[Figure: states bbb, bab, bba, abb, aba]

Fundamental Questions

Which languages CANNOT be described by any RE?
• Set of all bit strings with equal number of 0s and 1s.
• Set of all bit strings of the form ww where w is some string.
• Many more. . . .

Are Java regular expressions more expressive than REs?
• No: extra rules can be expressed in terms of the minimal core.
• One important exception: back references.
  – \1 matches whatever string (.*) matched
  – words of the form ww where w is some word (beriberi, couscous)

(.*)\1    not-so-regular expression

DFA Applications

Software applications.
• Implementing regular expressions.
• Lexical analysis in compilers.
• Parsing HTML in a web browser.
• Controlling graphical user interfaces.

Hardware applications.
• Bounce filter.
• Traffic lights.
• Dishwashers.
• Remote controls.
• Computer microprocessors.
• . . .

Limitations of FSA

FSA are simple machines.
• N states ⇒ can't "remember" more than N things.
• Some languages require "remembering" more than N things.

Theorem. No FSA can recognize the language of all bit strings with an equal number of 0's and 1's.

A warmup exercise:

[Figure: FSA fragment with edges labeled 0 and 1]

If 01xyz is accepted, then so is 00001xyz.

Limitations of FSA

Theorem.
No FSA can recognize the language of all bit strings with an equal number of 0's and 1's.
• Suppose an N-state FSA can recognize this language.
• Consider the following input, made of N+1 0's followed by N+1 1's: 0000000011111111
• The FSA must accept this string.
• Some state x is revisited during the first N+1 0's, since there are only N states.
• The machine would accept the same string without the intervening 0's: 000011111111
• This string doesn't have an equal number of 0's and 1's.

[Figure: the input 0000000011111111 with the two visits to state x marked]

Text Searching

Build an FSA that accepts all strings that contain 'acat' as a substring.
• tatgacatg
• acacatg

Start building:

[Figure: states φ, a, ac, aca, acat marked N, N, N, N, Y, connected in order by transitions a, c, a, t]

State name represents the longest prefix of acat that the input currently matches.

Text Searching

Build an FSA that accepts all strings that contain 'acat' as a substring.
• tatgacatg
• acacatg

Continue building:

[Figure: the same states φ, a, ac, aca, acat, with additional transitions labeled acgt, cgt, and cgt]
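
The construction above can be sketched in code. This is a minimal Python sketch, not the slides' own implementation. It assumes the input alphabet is a, c, g, t (suggested by the edge labels in the figure), and the names build_dfa and contains are hypothetical. States are numbered 0 to 4 and stand for φ, a, ac, aca, acat.

```python
def build_dfa(pattern, alphabet):
    """Build a transition table: dfa[state][ch] -> next state.

    State k means that the longest prefix of `pattern` matching the end
    of the input read so far has length k. State len(pattern) is the
    accepting state (assumption: once reached, the machine stays there).
    """
    m = len(pattern)
    dfa = []
    for state in range(m + 1):
        row = {}
        for ch in alphabet:
            if state == m:
                row[ch] = m  # pattern already seen: stay in the accepting state
                continue
            seen = pattern[:state] + ch
            # Find the longest prefix of pattern that is a suffix of `seen`.
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[ch] = k
        dfa.append(row)
    return dfa


def contains(dfa, text):
    """Run the DFA over text; True if it ends in the accepting state.

    Characters outside the alphabet used to build the DFA raise KeyError.
    """
    state = 0
    for ch in text:
        state = dfa[state][ch]
    return state == len(dfa) - 1
```

With dfa = build_dfa("acat", "acgt"), both example strings from the slide, tatgacatg and acacatg, are accepted. On acacatg the machine drops from state aca back to state ac when it reads the second c, which is the kind of backward transition the "continue building" step adds.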