posted on November 12th, 2008 by Greg in Personal Projects
I’ve been curious to learn more about screen scraping for some time. Then I heard about Beautiful Soup, a Python script that is great for parsing HTML. Since I’ve also been learning Python, I thought now was the perfect time to explore some scraping.
In the past I had some trouble using PHP to parse the official Magic: The Gathering site for new card info when working on my MTG card database. I didn’t spend much time trying to figure that out, but with Python I didn’t have a problem.
After copying Beautiful Soup to my Python path I started typing in some Python at the command line.
from BeautifulSoup import BeautifulSoup as BSoup
import urllib
url = 'http://ww2.wizards.com/gatherer/Index.aspx?setfilter=Shards%20of%20Alara&output=Spoiler'
html = urllib.urlopen(url).read()
soup = BSoup(html)
# print the text of the first cell in every table row
for tr in soup.fetch('tr'):
    if tr.td:
        print tr.td.string
This would output all of the Magic card names on the page (and some other stuff). Here is another example: getting image URLs when you know the value of the id attribute on the img tags.
url = 'http://ww2.wizards.com/gatherer/CardDetails.aspx?&id=175000'
html = urllib.urlopen(url).read()
soup = BSoup(html)
# print the src of every img tag with the known id
for img in soup.findAll(id='_imgCardImage'):
    print img['src']
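To try the same two ideas without hitting the live site, here is a rough sketch that uses only the standard library's html.parser (Python 3) in place of Beautiful Soup. The sample markup, the card names, the image path, and the CardScraper class are all made up for illustration; the real Gatherer pages may well be structured differently.

```python
from html.parser import HTMLParser

# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td>Card One</td><td>extra</td></tr>
  <tr><td>Card Two</td><td>extra</td></tr>
  <tr></tr>
</table>
<img id="_imgCardImage" src="/cards/175000.jpg">
<img id="logo" src="/logo.png">
"""


class CardScraper(HTMLParser):
    """Collects the text of the first <td> in each <tr>, plus the src of
    any <img> whose id matches image_id. Nested tables are not handled."""

    def __init__(self, image_id):
        super().__init__()
        self.image_id = image_id
        self.first_cells = []
        self.image_urls = []
        self._cells_seen = 0          # <td> count within the current row
        self._in_first_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self._cells_seen = 0
        elif tag == "td":
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._in_first_cell = True
                self.first_cells.append("")
        elif tag == "img" and attrs.get("id") == self.image_id:
            self.image_urls.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_first_cell = False

    def handle_data(self, data):
        if self._in_first_cell:
            self.first_cells[-1] += data.strip()


scraper = CardScraper(image_id="_imgCardImage")
scraper.feed(SAMPLE_HTML)
print(scraper.first_cells)
print(scraper.image_urls)
```

Rows without a td are skipped, which mirrors the `if tr.td` check in the first snippet.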
With a little more time cooking the soups I could get all the cards and their images and fill up my database. I just have to find the time now.
posted on October 14th, 2008 by Greg in Personal Projects
I ran into some trouble with Python 2.4 and the Django code I was using. The previous server had 2.5 and I didn’t notice any problems there, so I tried upgrading to 2.5 and changing which version of Python Debian uses as the default (this was on Debian Etch). I was having some difficulty getting a few of the site-packages (like mod_python) to work with 2.5 as the default, so I decided to move to Debian Lenny even though it isn’t as well supported. While doing that I ran into a problem where it doesn’t work well with XFS and Amazon’s Elastic Block Store.
Amazon is looking into the matter, but while trying to figure that out, I realized that AWS doesn’t come with support. There is an extra package you have to purchase, which starts at $100 a month.
That made Amazon look less awesome since I know I am going to need some support at some point. I decided to compare prices and features again and ended up revisiting Slicehost, since I now knew a lot more about setting up a server than I did before.
I posted the steps that I took to set up Apache, MySQL, Django, and a few other things on a clean Ubuntu machine on Code Spatter.
Now I have a WebFaction account for testing and Subversion hosting, and I’m using the Slicehost account for the live version of the site.
Subversion makes it easy to commit on one server and update on the other once the code is stable. I should explore a distributed version control system like Git since it might help out with this in the future.
Update October 21, 2008
The AWS developer community seems to be a good alternative to having direct support from Amazon. The people there are knowledgeable and Amazon reps post frequently. Here is a quote from someone at Amazon about the issue I was having:
We are still investigating the issue and will post an analysis a little later and a workaround. Basically the problem revolves around the interaction between very specific kernel versions, XFS and our version of Xen.
Even though my slice is running fine, I will still be keeping AWS in mind.
posted on September 23rd, 2008 by Greg in Personal Projects
Yesterday I dove into Amazon’s web services to check them out as a solution for a project I’m working on. I followed a guide to set up a Django development server on a default Amazon machine image to start off. Then I decided to go with a Debian AMI and do a full production server. I used apt-get to install the newest versions of Apache, Python, MySQL, mod_python, svn, and some others. Debian turned out to be a lot easier than some other flavors of Linux I have used.
After getting the instance configured the way I wanted it, I saved an image of it to my storage bucket so I could bring it up at any time instead of paying ten cents an hour to keep it running until I need it.
A recent post updates the Amazon Adventure.
posted on August 27th, 2008 by Greg in Personal Projects
I was learning Python and Django earlier to build a social network. So far, I have created the ability for users to
- create an account with e-mail activation
- log in/out
- add other users as friends and confirm friendship that other users requested
- send/reply/forward messages
This was the base for a niche social network to be built upon.
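The friend request and confirmation step from the list above can be sketched in plain Python. This is not the project's code and it leaves Django's models out entirely; FriendGraph, its method names, and the user names are hypothetical and only show the two-step flow.

```python
class FriendGraph:
    """Tracks pending friend requests and confirmed friendships in memory."""

    def __init__(self):
        self.pending = set()   # (requester, recipient) pairs awaiting confirmation
        self.friends = set()   # frozenset({a, b}) for each confirmed friendship

    def request(self, requester, recipient):
        if requester == recipient:
            raise ValueError("cannot befriend yourself")
        if frozenset((requester, recipient)) in self.friends:
            return  # already friends, nothing to do
        self.pending.add((requester, recipient))

    def confirm(self, recipient, requester):
        # Only the user who received the request may confirm it.
        if (requester, recipient) not in self.pending:
            raise KeyError("no pending request from %s" % requester)
        self.pending.remove((requester, recipient))
        self.friends.add(frozenset((requester, recipient)))

    def are_friends(self, a, b):
        return frozenset((a, b)) in self.friends


graph = FriendGraph()
graph.request("alice", "bob")
print(graph.are_friends("alice", "bob"))   # a request alone is not a friendship
graph.confirm("bob", "alice")
print(graph.are_friends("bob", "alice"))
```

Storing each confirmed friendship as a frozenset means the pair is symmetric, so it does not matter which user originally sent the request.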
Soon after completing those features, I discovered Elgg. It’s an open source social network written in PHP. It can do all of those features and more. I am now looking into using that and modifying it for the original goal.
We’ve gone back to Django since Elgg wasn’t the easiest thing to modify. I was hoping they might have used a common PHP framework like CakePHP or CodeIgniter. More on the Django developments in another post soon. On Code Spatter I have posted what I learned about getting Python, PIL, and Django working together.
Update November 12, 2008
If you are looking for an Open Source Social Network written in Django, Pinax is looking really good right now. They have combined many reusable Django apps into one slick project.
Cloud27 is set up as an example of all the features included in Pinax. The contact importing feature is one that I will be adding to my social app, which I built before I knew about Pinax.
posted on April 1st, 2008 by Greg in Personal Projects
Code Spatter is a personal project that I started when I thought it would be useful to have a Weblog about projects and other things involving web development for me and my co-workers to use. It was also a chance to use CyTE for a practical application and start development on MorfU. Both are open source projects that I develop for.
posted on April 1st, 2008 by Greg in Personal Projects
The Guild
Tragedy was a guild in World of Warcraft that had up to 40 members in a single raid event as often as 4-5 nights a week. There was a lot of information that needed to be saved from the raids. It was important to know which members attended them and which monsters were defeated that evening. The monsters would drop loot and it was necessary to know who received the loot. There was a game modification that would store all of this data, but there wasn’t an easy way to get this information onto the website.
posted on April 1st, 2008 by Greg in Personal Projects
All of my web development experience started with Pyrodius.com. I learned PHP and MySQL so the site could add movie reviews dynamically and let users post their own reviews. I created a blog to display the site’s news before I had even heard the word “blog”.
At the moment the site doesn’t have any activity, but I will still use it to learn and test out new software or ideas. I have installed a few versions of phpBB and MediaWiki to test out various ideas. CyTE is also installed there for testing.
There are many things I aspire to do with the site; however, other projects have taken priority for the time being.
posted on April 1st, 2008 by Greg in Personal Projects
MorfU is a project that I conceived that will combine all of the features of wikis, blogs, and forums. The name is an anagram of forum and is pronounced like “morph you”.
Currently, the only development that has been done on it is with Code Spatter, which has only limited blog functionality.
This is a module for CyTE that should be able to be packaged with any installation seamlessly. There hasn’t been much else to test this with as of yet.
posted on March 24th, 2008 by Greg in Personal Projects
I’ve gone through part of the Django tutorial. I installed the latest copy of Django, Python, and MySQL on my desktop (Windows environment) and followed the tutorial through the first three sections. I’ve started to become familiar with the data models and the admin interface.
At the moment I like Django’s default admin interface a little more than the scaffolding that can be used with Ruby on Rails. As for comparing Ruby and Python, I still don’t know enough about either language to make a decision.
Update July 24, 2008
Going outside of the tutorial, I created a few views and templates to get the basic idea. Using the Django Authentication module’s User model, I displayed a few things and plowed through a few of my own mistakes. I’m enjoying learning this.
Update July 30, 2008
Finished the tutorial and moved on to create my own interface for Django’s Authentication. The app can create users, log them in and out, and list them. Simple enough, but I got the hang of the templates and form helpers.
More in this later post.
posted on March 20th, 2008 by Greg in Personal Projects
The Cyberia Template Engine is a project that was developed after Thomas Welfley and I wanted to expand upon the basic template engine used in Valhalla. We expanded on the idea of content slots to instead use keys which could return content as well as additional keys. This would allow the site to be broken into small reusable pieces and reduce the amount of duplicated code. There is also a post handler aspect that will help with error checking and collecting form data.
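The idea of keys that return content containing further keys can be illustrated with a short Python sketch. This is not CyTE's actual code: the {key} placeholder syntax, the render function, and the sample keys are assumptions made up for the example.

```python
import re

KEY_PATTERN = re.compile(r"\{(\w+)\}")  # assumed placeholder syntax: {key}


def render(template, keys, _active=()):
    """Replace each {key} with its content, expanding keys nested in that content."""

    def expand(match):
        name = match.group(1)
        if name in _active:
            raise ValueError("circular key reference: %s" % name)
        if name not in keys:
            return match.group(0)  # leave unknown keys untouched
        # The content of a key may itself contain keys, so render it too.
        return render(keys[name], keys, _active + (name,))

    return KEY_PATTERN.sub(expand, template)


keys = {
    "page": "{header}<p>{body}</p>{footer}",
    "header": "<h1>{site_name}</h1>",
    "footer": "<small>{site_name}</small>",
    "site_name": "Example Site",
    "body": "Hello",
}
print(render("{page}", keys))
```

Here site_name is defined once and reused by both the header and the footer, which is the kind of duplication the key approach is meant to remove.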
For this project I have created the database abstraction layer, the authorization system, and added the ability to package modules easily for future developers that may use the platform.
I have used this template engine on a few sites and in the process I have gathered an abundant repository of useful functions that will be packaged with the release.