This is a sample wiki engine, written in Python. It’s meant as a learning aid, not a real tool: it lacks most features, can serve only one user at a time, and stores all page contents in memory – so they are gone when you restart it.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import BaseHTTPServer, urllib, re

class Handler(BaseHTTPServer.BaseHTTPRequestHandler):
    template = u"""<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"><html><head><title>%s</title>
</head><body><h1>%s</h1><pre>%s</pre><form action="" method="POST"
class="editor"><div><textarea name="text">%s</textarea><input type="submit"
value="Save"></div></form></body></html>"""

    def escape_html(self, text):
        """Replace special HTML characters with HTML entities"""
        # "&" goes first, so the entities added afterwards are not escaped again.
        return text.replace(
            "&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")

    def link_repl(self, match):
        """Return HTML for link"""
        title = match.group(1)
        if title in self.server.pages:
            return u"""<a href="%s">%s</a>""" % (title, title)
        return u"""%s<a href="%s">?</a>""" % (title, title)

    def do_HEAD(self):
        """Send response headers"""
        self.send_response(200)
        self.send_header("content-type", "text/html;charset=utf-8")
        self.end_headers()

    def do_GET(self):
        """Send page text"""
        self.do_HEAD()
        page = self.escape_html(urllib.unquote(self.path.strip('/')))
        text = self.escape_html(self.server.pages.get(page, "Empty..."))
        parsed = re.sub(r"\[\[([^]]+)\]\]", self.link_repl, text)
        self.wfile.write(self.template % (page, page, parsed, text))

    def do_POST(self):
        """Save new page text and display it"""
        length = int(self.headers.getheader('content-length'))
        if length:
            text = self.rfile.read(length)
            page = self.escape_html(urllib.unquote(self.path.strip('/')))
            # Skip the leading "text=" (5 characters) of the form data.
            self.server.pages[page] = urllib.unquote_plus(text[5:])
        self.do_GET()

if __name__ == '__main__':
    server = BaseHTTPServer.HTTPServer(("127.0.0.1", 8080), Handler)
    server.pages = {}
    server.serve_forever()
To try this wiki, just run it with a Python 2 interpreter on your computer (the BaseHTTPServer module it imports was renamed to http.server in Python 3), and point your web browser to http://127.0.0.1:8080.
This engine uses the built-in web server from Python’s standard library, BaseHTTPServer, so that you don’t need to set up your own web server or look for hosting services just to play with it. We provide this server with a custom request handler that supports three kinds of requests:
- A HEAD request is made by your web browser to check whether a page exists, whether it changed recently, how large it is, etc. In this code it is handled by the do_HEAD method, and we do only the absolute minimum required by the HTTP protocol:
  - we send the response code 200 (that means “everything is alright”),
  - we say what kind of data the page contains (in this case, an HTML web page encoded as UTF-8),
  - and we send a marker for the end of headers (just an empty line).
- A GET request is performed when your browser needs to download the page contents, so that it can display them. We need to send the headers, just like in the previous case, but we also need to send the page contents.
  - First, we determine the name of the page.
    - The attribute self.path contains the part of the URL without the server name.
    - Because it was inside a URL, all characters not normally allowed there had to be encoded in the form of %XX, where XX is the hexadecimal code of the character. We use the function urllib.unquote to convert it back into normal characters.
    - Now, because our page title is going to be displayed inside HTML, it can’t contain characters like <, > or &, because they could be confused for part of the markup. That’s why we use the function escape_html to convert these three characters into a form that is allowed inside HTML: &lt;, &gt; and &amp;.
  - The next step is to retrieve the actual text of the specified page – we try to take it from our server.pages dictionary, and fall back to “Empty...” if no such page was created yet.
    - Since the page’s text is going to be displayed as a part of HTML, we have to escape it too.
  - We must find all the links in the page text, and convert them into the actual HTML markup for the links.
    - The function re.sub searches for all occurrences of the specified pattern, and replaces them with the results of calling the link_repl function.
    - The function link_repl takes the contents of the first pair of parentheses in our link pattern (that is, everything that was between [[ and ]]), and checks if a page with this name already exists. Then it returns a link to either an existing page or a non-existing one (marked with a question mark).
  - Finally we just put it all together, using our template, and send it to the browser.
- POST requests are made when you fill in a form and send it to the server. In our case this means that you edited the page and clicked “Save”. We need to update the page’s contents and then display the page normally.
  - First, we check how long the form data is, so that we can read it all. If the length is zero, we just skip the whole saving step.
  - We read the data that was sent from the browser.
  - We determine the page’s title, just like before.
  - Now we have to decode the data. It’s encoded just like the path in the URL, only in addition all the spaces are replaced with + characters. Also, because the text field in the form that held the page’s text is called “text”, the data will contain text= at the beginning – that’s why we skip the first 5 characters.
  - We store the new page text in the dictionary and serve the page normally, like we did with GET.
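The decoding and escaping steps described above can be tried out without starting a server. The following is only a sketch: it pulls the same logic out into plain functions, whose names (page_name, render_links, form_text) are made up for this illustration and do not appear in the listing. It also assumes the form data has already been turned into a text string, and it imports the urllib functions from wherever the running Python version keeps them.

```python
import re

try:  # Python 3 keeps these functions in urllib.parse
    from urllib.parse import unquote, unquote_plus
except ImportError:  # Python 2 names, as used in the listing
    from urllib import unquote, unquote_plus

LINK_RE = re.compile(r"\[\[([^]]+)\]\]")


def escape_html(text):
    """Replace the three special HTML characters with entities."""
    # "&" must go first, or the "&" in "&lt;" would be escaped a second time.
    return text.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;")


def page_name(path):
    """Turn a request path such as '/My%20Page' into an escaped page name."""
    return escape_html(unquote(path.strip("/")))


def render_links(text, pages):
    """Convert [[Title]] markers in already-escaped text into HTML links."""
    def link_repl(match):
        title = match.group(1)
        if title in pages:
            return '<a href="%s">%s</a>' % (title, title)
        return '%s<a href="%s">?</a>' % (title, title)
    return LINK_RE.sub(link_repl, text)


def form_text(body):
    """Decode form data of the shape 'text=...' (hypothetical helper)."""
    return unquote_plus(body[len("text="):])


pages = {"Home": "See [[Home]] and [[Other]] & more"}
name = page_name("/Home")
html = render_links(escape_html(pages[name]), pages)
print(name)
print(html)
print(form_text("text=Hello+wiki%21"))
```

Note that the page text is escaped before the links are inserted; doing it the other way round would escape the angle brackets of the generated link markup as well.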
There is a lot of room for improvement:
- We can store the last modification time of the pages, and send it in the headers – then the browser will know if that page changed since it was last downloaded. If the page didn’t change, the browser won’t download it again but show the cached version instead.
- We can send the page size in the headers, so that the browser can display a progress bar when downloading.
- We can send varying response codes depending on what actually happens. For example, if the page doesn’t exist yet, we can send code 404 (Not Found). When the page is first created, we can send 201 (Created), etc. This kind of additional information can be very helpful: for example, web indexing spiders will know not to save all the “Empty...” placeholder pages.
- If you save the page, then follow some link and hit “back”, your browser will attempt to save the page again (and sometimes display an annoying message). It does that because it remembers not only the URL of the page, but also the form data that was sent with it. We can easily prevent this: instead of sending the page’s contents after it was saved, we can respond with a redirect: code 303 (See Other) and a new address to redirect to – in this case the same URL, which the browser then fetches with a plain GET, without the POST data. (Code 301 means a permanent redirect, so it is not a good fit here.) Then the web browser will only remember the new page, without the form data.
- You can remember all the changes made, together with dates and the address from which they came, and display them on a special “recent changes” page.
- You can add more markup rules than just the simple links. Then, if parsing the page text when you send it takes too long, you can also store the “parsed” versions, and only parse when something was changed or when the user requests a refresh (you can recognize it by checking the request headers).
- You can move the editor form to a separate page. In order to do that, you need a way of recognizing whether the user wants the normal page or the editor: you can do that by adding something to the page name, like “edit”.
- Once you have one such special action, you can add more, for example searching.
- You don’t have to just replace the text of a saved page: you can keep the old versions of pages too. Then you can display page history and differences between versions.
- This engine can only serve one page to one user at a time: if more people use the wiki, they have to wait in a queue. This is not a big problem, but you can improve the engine to serve several pages at a time. Remember to add page locking, so that when two people hit “Save” at the same time, the page text is not destroyed!
- Obviously it would be nice to have the pages actually saved somewhere, either into files or a database.
- And lots more…
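Two of these ideas, the varying response codes and the redirect after saving, could look roughly like the sketch below. It makes several assumptions that are not part of the original listing: it uses the Python 3 module names (http.server, urllib.parse), it serves plain text instead of the HTML template to stay short, and the demo function with its port-0 test server exists only to show the responses.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote, unquote_plus


class Handler(BaseHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the example quiet

    def do_GET(self):
        page = unquote(self.path.strip("/"))
        exists = page in self.server.pages
        body = self.server.pages.get(page, "Empty...").encode("utf-8")
        # 404 tells crawlers and caches that this page was not created yet.
        self.send_response(200 if exists else 404)
        # Plain text, so no HTML escaping is needed in this sketch.
        self.send_header("Content-Type", "text/plain;charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        length = int(self.headers.get("Content-Length") or 0)
        data = self.rfile.read(length).decode("utf-8")
        if data.startswith("text="):
            page = unquote(self.path.strip("/"))
            self.server.pages[page] = unquote_plus(data[len("text="):])
        # Redirect instead of rendering, so "back" does not re-send the form.
        self.send_response(303)
        self.send_header("Location", self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()


def demo():
    """Start a throwaway server and return (status, Location, body) tuples."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    server.pages = {}
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    results = []
    for method, body in (("GET", None), ("POST", "text=Hello+wiki"), ("GET", None)):
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request(method, "/Home", body=body)
        response = conn.getresponse()
        results.append((response.status, response.getheader("Location"),
                        response.read().decode("utf-8")))
        conn.close()
    server.shutdown()
    server.server_close()
    return results


for row in demo():
    print(row)
```

The first GET is answered with 404 because the page does not exist yet, the POST with 303 and a Location header pointing back at the page, and the second GET with 200 and the saved text. A real browser follows the 303 on its own and issues that second GET itself.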
I’m also writing down the detailed process by which I came up with this code (minus obvious errors and some frustration with empty POSTs) at Step By Step Wiki Engine.