Experimental chatbot
Chat with the bot
Click to chat with the bot. Read on for some technical info.
Background
Last spring I started working on a chat bot. I downloaded a dump of Wiktionary and let the bot read it. From that it created two graphs. The first graph held syntactic rules (VERB->PREPOSITION->NOUN etc.). The second graph could be searched to find associations between words (finding a path between ‘socialism’ and ‘color’ could generate ‘socialism’->‘red’->‘color’ etc.).
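As a rough illustration of the association search (a sketch, not the bot’s actual code), a breadth-first search over a small hand-made word graph could look like this. The `find_path` function and the tiny `associations` dictionary are made up for the example; the real graph was built from the Wiktionary dump.

```python
from collections import deque


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest word path from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # word -> the word we reached it from
    queue = deque([start])
    while queue:
        word = queue.popleft()
        for neighbor in graph.get(word, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = word
            if neighbor == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical association graph, for illustration only.
associations = {
    "socialism": ["red", "politics"],
    "red": ["color", "socialism"],
    "politics": ["socialism"],
    "color": ["red"],
}

print(" -> ".join(find_path(associations, "socialism", "color")))
# prints: socialism -> red -> color
```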
It did show pretty cool behavior sometimes. But it had no idea how to answer ‘How are you?’.
So the next step was to add some everyday chat data into the mix. It turned out to be very hard to find any public chat data that I could use for this, probably due to privacy issues. I did find some corpora, but none of them were clean enough (e.g. IRC rooms with many people). I wanted plain one-on-one chat logs, and gigabytes of them.
Then I ran into a chat bot called Cleverbot, and I was amazed how well it worked on everyday phrases. So I thought I would try to do something similar to collect chat data for my bot.
Gamification of the chatting
On top of that I quickly wrote a Flash client with a simple visualization of the bot’s ‘brain’. The more particles, the smarter the bot. The particles don’t actually relate to the number of nodes (phrases) but rather to the strength of the relationships in the graph.
What I would really like to do with the client is gamify it, so that the user gets rewards for chatting with the bot and leveling it up, sort of becoming a part of the bot’s development (just as parents feel involved when their kids are learning). I’m not there yet. This is just a preliminary test to reveal what I need to fix to make the model work.
Server
I had a good idea of how to build such a system, but wasn’t sure how to store the graphs, so I entered ‘graph database’ into Google and Neo4j showed up. After reading up a bit I decided to give it a try. Using it was a breeze. Dealing with SQL databases usually gives me a headache; it’s really frustrating to think in rows and columns for many problems. Neo4j suited my problem exactly and within hours I had it up and running. Each node is a phrase, and each relation points to the next phrase in a chat transcript. Relations are strengthened only when humans walk the path. There are also prediction relations that point more than one node ahead. These give the bot another hint of how to branch as a function of the context.
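To make the model a bit more concrete, here is a minimal in-memory sketch in Python. It does not use Neo4j, and it is not the server’s actual code. The `PhraseGraph` class, the choice of looking exactly two phrases ahead for the prediction relations, and the scoring rule (next-phrase strength plus prediction strength) are all assumptions made for the example.

```python
from collections import defaultdict


class PhraseGraph:
    """Toy phrase graph: nodes are phrases, weighted relations link them."""

    def __init__(self):
        # next_weights[a][b]: how often a human followed phrase a with phrase b
        self.next_weights = defaultdict(lambda: defaultdict(int))
        # predict_weights[a][c]: how often phrase c came two phrases after a
        # (assumption: the real prediction relations may look further ahead)
        self.predict_weights = defaultdict(lambda: defaultdict(int))

    def learn(self, transcript):
        """Strengthen relations along a human chat transcript (a list of phrases)."""
        for i, phrase in enumerate(transcript):
            if i + 1 < len(transcript):
                self.next_weights[phrase][transcript[i + 1]] += 1
            if i + 2 < len(transcript):
                self.predict_weights[phrase][transcript[i + 2]] += 1

    def reply(self, previous, current):
        """Pick the strongest follow-up to `current`, using `previous` as context."""
        candidates = self.next_weights.get(current)
        if not candidates:
            return None  # phrase never seen, or nothing ever followed it
        hints = self.predict_weights.get(previous, {})
        # Assumed scoring: direct strength plus the context hint.
        return max(candidates, key=lambda p: candidates[p] + hints.get(p, 0))


graph = PhraseGraph()
graph.learn(["hi", "how are you?", "fine, thanks", "what are you up to?"])
graph.learn(["hello", "how are you?", "not so great", "oh no, why?"])

print(graph.reply("hi", "how are you?"))     # prints: fine, thanks
print(graph.reply("hello", "how are you?"))  # prints: not so great
```

Both replies follow the same phrase, so the direct relations are tied; it is the prediction relation from the phrase before that decides the branch.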
No content filter yet
I haven’t written any content filter yet, so it will probably be rude, say really ugly things and try to offend you. Don’t take it personally, it’s just those internet trolls thinking they are funny.
Slow?
I’m running it on a free Amazon Micro instance with very limited memory and CPU – so if it’s slow, don’t be surprised.