0

I have a language L, which simply consists of strings that are URLs, and I need to design and implement a DFA that recognizes L. (e.g: www.test.com). My issue right now is, once you read in everything up to the 'www.', how would you know when to stop reading for the ".com"?

My code so far:

s = input("Would you like to input a string? y/n")
if(s == 'n'):
    exit
dfa = {'':{'w':'ww'}, 'w': {'w': 'ww'}, 'ww': {'w': 'www'},'www': {'.': 'www.'},"}}
def accepts(transitions,initial,accepting,s):
    state = initial
    for c in s:
        state = transitions[state][c]
    return state in accepting
accepts(dfa,0,{0},"www.hi.com")

Any help is appreciated! (Note that I'm temporarily borrowing a function from here just so I can understand the concepts at play.

Michele La Ferla
  • 6,775
  • 11
  • 53
  • 79
witcheR
  • 89
  • 4
  • 13
  • [Searching Google for "DFA in Python"](https://encrypted.google.com/search?q=DFA+in+Python) provides several examples immediately (including [one on Stack Overflow](http://stackoverflow.com/questions/35272592/how-are-finite-automata-implemented-in-code)). What was wrong with them? – Kevin J. Chase Feb 12 '17 at 21:13
  • I understand the concept better with that stack overflow you linked (not sure why I couldn't find it before), but applying it to characters over ints is confusing me. – witcheR Feb 13 '17 at 22:40

2 Answers2

1

A DFA is basically defined by a transition table. This transition table maps each (valid) combination of current state and current input to the corresponding successor state. Such a table can be modelled as a dictionary of dictionaries. For example: The outer dict contains the states as keys and dictionaries as values, those dictionaries in turn each have the valid inputs as keys and the successor state as value.

EDIT: Your chosen example is not ideal, in such that it has a fairly large alphabet (i.e. all possible input characters) of at least [a-zA-Z0-9], the linked answer limited itself to [01] for a reason ;-) Never the less here is how I would start out:

{
# in state '' we have not yet processed/consumed any input
# it is the start state
# the only valid input is a 'w'
'': {'w': 'w'},    

# in state 'w' we a have already consumed a 'w'
# the only valid input is another 'w'   
'w': {'w': 'ww'},

# in state 'ww' we have previously consumed 'ww'
# the only valid input is still only a 'w'  
'ww': {'w': 'www'},

# now the only valid input is a '.'
'www': {'.': 'www.'},

# this is where your example becomes unpractical:
# we have to add a transition for every valid input
# (you could get around this by using a defaultdict and some kind of special signal value, but im not quite sure you are up to that)
'www.': {'a': 'www.*', 'b': 'www.*', ...},

# I used the star in this state name to symbolize multiple (at least one) valid character
# we only leave this state if we encounter a '.' 
'www.*': {'.': 'www.*.', 'a': 'www.*', 'b': 'www.*', ...},

# it should be obvious how to continue from here 
'www.*.': ...
}

EDIT2: Implementation after chat.

from collections import defaultdict

dfa =  {
  'initial': defaultdict(lambda: 'invalid', w='w'),
  'w': defaultdict(lambda: 'invalid', w='ww'),
  'ww': defaultdict(lambda: 'invalid', w='www'),
  'www': defaultdict(lambda: 'invalid', [('.','www.')]),
  'www.': defaultdict(lambda: 'www.*', [('.','invalid')]),
  'www.*': defaultdict(lambda: 'www.*', [('.','www.*.')]),
  'www.*.': defaultdict(lambda: 'www.*', [('c','www.*.c')]),
  'www.*.c': defaultdict(lambda: 'www.*', [('o','www.*.co')]),
  'www.*.co': defaultdict(lambda: 'www.*', [('m','www.*.com'), ('.','www.*.')]),
  'www.*.com': defaultdict(lambda: 'www.*', [('.','www.*.')]),
  'invalid': defaultdict(lambda: 'invalid')
}
def accepts(transitions,initial,accepting,s):
    state = initial
    for c in s:
        state = transitions[state][c]
        print(c, '->', state)
    return state in accepting
print(accepts(dfa,'initial',{'www.*.com', 'www.*.co'},"www.hi.com"))
PeterE
  • 5,715
  • 5
  • 29
  • 51
  • but how does this relate to checking to see if a string is part of a language? – witcheR Feb 12 '17 at 23:47
  • the assignment specifies that the DFA should process the string character wise, so every character is an input, and the states represent the input previously consumed, in such a way that valid words (i.e. words in L) can be marked by selecting valid end states for the DFA. The solution here on stackoverflow that the others linked to, is quite nice. You just have to extent the alphabet to more than 0 and 1. – PeterE Feb 13 '17 at 07:39
  • @subzira Not quite, you process the input characterwise which is correct, but that means that in your transition table you must also only process single character inputs. Apart from that recommendation is to give the states more useful / speaking names, instead of simply numbering them let them symbolize what they represent. I will update my answer with the beginning of a possible transition table. – PeterE Feb 13 '17 at 22:52
  • @subzira You could implement it by adding a transition to state 'www.*.' which goes (back) to state 'www.*' for any input that is NOT 'c' and only 'c' goes on to 'www.*.c'. You would also have to add that logic for the next two inputs 'o' and 'm'. – PeterE Feb 13 '17 at 23:39
  • that seems like an excellent way of avoiding adding a lot of inner dictionaries to handle any int upper or lowercase letter input, but how would you make the DFA do something for anything OTHER than 1 character?. – witcheR Feb 13 '17 at 23:41
1

There is an answer here that explains how this is implemented, but you also ask why a dictionary of dictionaries can account for different states. So from that mentioned answer lets take this example:

dfa = {0:{'0':0, '1':1},
       1:{'0':2, '1':0},
       2:{'0':1, '1':2}}

As you can see the first dictionary contains the numbers 0, 1 and 2, which are dictionaries themselves. These are your states. Inside their dictionaries there is a character that your dfa will read, '0' or '1'. For those read characters it also gives you your next state.

For example:

  1. You start in state 0
  2. You read character '1'
  3. You go to state 1
Community
  • 1
  • 1
Leon Z.
  • 542
  • 1
  • 6
  • 11