Wednesday, July 30, 2008

Improving bubble sort

Most people believe that using Quick Sort or Merge Sort, would achieve the best sorting performance over a standard array. Even though this might be true in most (or at least some) cases, there are real world situations where a better solution can be applied.

Let's assume there's a big chance we are trying to sort an already sorted array, or almost-sorted array. What is an almost-sorted array? This is an array which is sorted beginning from some element which is not far from the first element.

Now, let's try to recall what bubble sort looks like:
for i = 1 to length(A)
  for j = length(A) downto i+1
   if A[j] < A[j-1] then
    swap A[j] with A[j-1]

This is a very simple implementation, but would cost O(n^2) even when we deal with an already sorted array. How could this be improved? Let's add a flag:
for i = 1 to length(A)
  swapped = false
  for j = length(A) downto i+1
   if A[j] < A[j-1] then
    swap A[j] with A[j-1]
    swapped = true
  if swapped = false then
   break

As you can probably see, once there's a full iteration in which no elements were swapped, the algorithm finishes. For an already sorted array, performance are now O(n). For an almost-sorted array (beginning item c), performance are O(cn). Not bad...

Monday, July 7, 2008

Dar the mailman

In the past few days, I've been working really hard on my final project. As mentioned before, it has something to do with e-mail and spam. The project requires me (actually us, I have a partner) to act both as a mail sender, and as a mail receiver. This is have to support any modern mail server, and also work with every major web mail providers (where most spam is usually met).


For some years I've known how to telnet to a SMTP server, and use it to send mail (which should be accounted as spam). When it comes to authorized mail servers, this is a bit trickier, as log-in mechanisms are required. Most use SSL or TLS, over semi-standard ports. So, I managed to write a python script to use gmail's SMTP server, using my gmail account (TODO: publish code). This made me happy, because I thought to myself it's gonna be easy. Well, not quite.


As it seems, the other major providers (hotmail, yahoo) would allow you to make telnet connection to their SMTP servers, but the log-in would fail. After a little research I found out that yahoo wants money for SMTP, and hotmail would simply not allow it altogether. Not good.


My next attempt was trying to use the mail provider's APIs. Google offers GData, while yahoo offers some other API. Microsoft offers only undocumented HTTPMail (webdav?) protocol. Once again, only the Google solution worked. Yahoo wants you to register your application, and your domain (hadn't bought one yet for the project) in order for the API to work. As for hotmail, well...

So, where does this leave us? It seems our last option is the one I've been trying to avoid: writing a parser (Perl or Python) for the web pages of the popular web mail providers, and normal SMTP and POP3/IMAP for other mail servers. Very cumbersome, error prone, hard labor and a waste of time.