Wednesday 12 February 2014

Finding What You're Looking For

A previous blog entry looked at simple searching with find(). Now we will look deeper into building queries. This blog covers:
  • Numbers and searching ranges
  • Arrays and searching in them
  • Building logical expressions
  • Regular expressions for super flexible searching

Numbers and Ranges

Numeric data is inserted into MongoDB without quotes, just as we'd treat a number in any programming language:

db.eg.insert({Name : "Bill", Age : 18})
db.eg.insert({Name : "Ted", Age : 17})

and we can search for exact numbers:

db.eg.find({Age:17})

or those in a range:

db.eg.find({Age: {$lt : 18}})

Other useful operators include:

  • $lte - Less than or equal
  • $gt - Greater than
  • $gte - Greater than or equal
  • $ne - Not equal
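To get a feel for what these operators do, here is a minimal pure-Python sketch that mimics a range query on a list of dictionaries. It is only an illustration and not how MongoDB works internally - the matches() helper and the OPS table are names I made up, and it deliberately ignores documents that lack the field.

```python
import operator

# Map each MongoDB comparison operator to the equivalent Python function.
OPS = {
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$ne": operator.ne,
}

def matches(doc, field, condition):
    """Return True if doc[field] satisfies every operator in condition."""
    if field not in doc:
        # Simplification: this sketch never matches documents without the field.
        return False
    return all(OPS[op](doc[field], target) for op, target in condition.items())

people = [{"Name": "Bill", "Age": 18}, {"Name": "Ted", "Age": 17}]

# Same idea as db.eg.find({Age: {$gte: 17, $lt: 18}})
in_range = [p["Name"] for p in people if matches(p, "Age", {"$gte": 17, "$lt": 18})]
print(in_range)
```

Note that several operators can sit in the same condition, which is how you express a range with both a lower and an upper bound.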

Inserting and Searching Arrays

To match any one of several possible values on the same field, you can supply an array of target values with the $in operator. This is tidier than a chain of $or operators.

db.eg.find({Name : { $in : ["Bill" , "Ted"]}})

Note the [ ] square brackets to denote the array.

You can insert an array as a value in a document:

db.eg.insert({Likes: ["Ice cream","Apples","Chocolate"]})

and search for one or more items in it:

db.eg.find({Likes: { $in: ["Apples","Gin"]}})

With $in, even a single term still needs to be in an array of length one (a plain equality match such as {Likes: "Apples"} also finds documents whose array contains that term):

db.eg.find({Likes: { $in: ["Apples"]}})

The opposite of $in is $nin, which means not in.
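The matching rule for $in can be sketched in plain Python. This is just an illustration under my own assumptions - field_in() is a made-up helper, not part of MongoDB or PyMongo. It treats a scalar field as a one-item list, so the same test works for both the Name and the Likes examples above.

```python
def field_in(doc, field, candidates):
    """Mimic {field: {$in: candidates}} for scalar and array fields."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a scalar as a one-item list so both cases share one test.
    values = value if isinstance(value, list) else [value]
    return any(v in candidates for v in values)

docs = [
    {"Name": "Bill"},
    {"Name": "Ted"},
    {"Likes": ["Ice cream", "Apples", "Chocolate"]},
]

by_name = [d for d in docs if field_in(d, "Name", ["Bill", "Ted"])]
by_likes = [d for d in docs if field_in(d, "Likes", ["Apples", "Gin"])]

# $nin is the negation, so documents without the field are matched too.
not_bill = [d for d in docs if not field_in(d, "Name", ["Bill"])]

print(len(by_name), len(by_likes), len(not_bill))
```

Only one of the candidates needs to appear in the array for a match, which is why the search for Apples or Gin still finds the document.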

Building Logical Expressions

I've already hinted at it: you can build logical expressions for a search using $and, $or, $not and $nor. A simple AND is implemented implicitly by MongoDB if you provide a comma separated list of matches:

db.eg.find({Name: "Bill", Age: 18})

is the same as

db.eg.find({$and: [{Name: "Bill"}, {Age: 18}]})

but we must be explicit if we want to use OR:

db.eg.find({$or: [{Name: "Bill"}, {Age: 17}]})

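$not negates the operator expression it is applied to, and $nor takes an array of expressions and matches only the documents that fail all of them (not this and not that).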
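If the logic is easier to read in Python, here is a small sketch of the same combinations built from predicates. The helper names (eq, and_, or_, nor) and the third document, Rufus, are my own inventions for illustration. The sketch only mimics the behaviour of the operators and does not talk to a database.

```python
def eq(field, value):
    """Predicate for a simple {field: value} match."""
    return lambda doc: doc.get(field) == value

def and_(*preds):
    """True only if every predicate matches, like $and."""
    return lambda doc: all(p(doc) for p in preds)

def or_(*preds):
    """True if at least one predicate matches, like $or."""
    return lambda doc: any(p(doc) for p in preds)

def nor(*preds):
    """True only if no predicate matches, like $nor."""
    return lambda doc: not any(p(doc) for p in preds)

people = [
    {"Name": "Bill", "Age": 18},
    {"Name": "Ted", "Age": 17},
    {"Name": "Rufus", "Age": 40},  # hypothetical extra document
]

def names(pred):
    return [p["Name"] for p in people if pred(p)]

both = names(and_(eq("Name", "Bill"), eq("Age", 18)))
either = names(or_(eq("Name", "Bill"), eq("Age", 17)))
neither = names(nor(eq("Name", "Bill"), eq("Age", 17)))
print(both, either, neither)
```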

Finally, if you want to specify more complex matches on a single value, you can use regular expressions, which are described below:

Searching with Regular Expressions

You can build some reasonably complex searches with a mixture of AND, OR, and NOT, but logical expressions are less useful if you want to match certain classes of string. For example, a search to find all the names that include a number (Like Joe90 or Ben10). For this, we need regular expressions.

There are two ways to tell MongoDB you want to use a regular expression. One is to use the JavaScript notation and the other is to use the MongoDB $regex operator.

The JavaScript notation puts the regular expression between slashes, followed by any options. Here is the syntax:

/expression/options

and the $regex operator makes things explicit:

{ $regex: 'expression', $options: 'options' }

And now some examples from a collection called db.colc:


find({Name: /ev/}) Find names that contain "ev" anywhere
find({Name: /^K/i}) Find names that start with "K", ignoring case (K or k)
find({Name: /[abc]/}) Find names that contain "a" or "b" or "c" anywhere
find({Name: /\w{1,}\d{1,}$/}) Find names that end with some numbers (the $ anchors the match to the end of the name)
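You can try patterns like these without a database by using Python's re module, which uses the same syntax for these simple cases. The sketch below runs four similar patterns over a made-up list of names. The find() helper here is my own and has nothing to do with MongoDB's find(). As far as I know, PyMongo will also accept a compiled pattern as the value in a query, but treat that as something to check against the PyMongo documentation.

```python
import re

# Made-up sample names to test the patterns against.
names = ["Kevin", "karen", "Steve", "Joe90", "Ben10", "Olly"]

def find(pattern, flags=0):
    """Return the names that the pattern matches anywhere."""
    regex = re.compile(pattern, flags)
    return [n for n in names if regex.search(n)]

contains_ev = find(r"ev")                    # "ev" anywhere in the name
starts_with_k = find(r"^K", re.IGNORECASE)   # starts with K or k
has_a_b_or_c = find(r"[abc]")                # contains a, b or c (lower case)
ends_with_digits = find(r"\d+$")             # one or more digits at the end

print(contains_ev)
print(starts_with_k)
print(has_a_b_or_c)
print(ends_with_digits)
```

The $ in the last pattern matters: without it, the digits could appear anywhere in the name rather than only at the end.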

It all looks a bit cryptic, but you get the idea - you can specify pretty much any pattern you want. To do it confidently off the top of your head takes a bit of practice, but it's worthwhile as regular expressions have many other uses. Once you can do it in JavaScript, for example, you can validate web forms before they are submitted.

A bit more information on regular expressions in MongoDB can be found here. For more details on the Javascript method of defining regular expressions, look here.




First Steps in MongoDB

In this article, I'm going to use the mongo shell to play with some simple DB queries. See the post on installation if you don't yet have mongo installed. Here is what I'll be doing today:
  • Starting a new collection to store some simple documents
  • Putting some documents into the collection
  • Listing the contents of the collection
  • Searching the collection for documents with certain values (e.g. Male)
  • Searching the collection for documents with certain keys (e.g. Gender)
  • Removing a document from the collection
Later posts will cover more complex searching using regular expressions and tasks such as listing all keys in a collection.

Get a Collection

A collection is somewhere to put a set of related documents. The relationship is loose - not like a relational database, just things that belong together. You can create one like this:

db.createCollection(colname)

where colname is the name you want to give your new collection, and you refer to existing collections as:

db.colname

For example,

db.createCollection("books")

and you refer to the collection as

db.books

Note the " quotes around the creation (it is a string here) but not around the reference, where it is a database object. Also, you do not have to create the collection before using it.

Not sure what collections your database contains? No problem:

show collections

will tell you.

You can just insert data into a collection that doesn't exist, and one will be created. How to do that is next:

Insert a Document

No surprises here, you insert a document using

db.books.insert(doc)

where doc is defined as a set of key: value pairs like this:

doc={ k1:v1, k2:v2 ... }

for example,

db.books.insert({Name:"On the Road", Author: "Jack Kerouac"})

Or, if we first say

b= db.books

then we can just use

b.insert (.....)

as a shorter way to reference the collection.

Perhaps the next book doesn't have an author, but an editor, so:

db.books.insert({Name:"Collected Poems", Editor: "J.E Bowles"})

Notice, we haven't defined a table structure - we just add documents in this style and new keys are added as needed.
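To see what "new keys are added as needed" means in practice, here is a small Python sketch that holds the two books as dictionaries and collects every key used by at least one of them. It is plain Python working on a list, not a query against the database.

```python
# The two documents inserted above, written as Python dictionaries.
books = [
    {"Name": "On the Road", "Author": "Jack Kerouac"},
    {"Name": "Collected Poems", "Editor": "J.E Bowles"},
]

# Collect every key that appears in at least one document.
all_keys = set()
for book in books:
    all_keys.update(book.keys())

print(sorted(all_keys))  # ['Author', 'Editor', 'Name']
```

Neither document has all three keys, yet both sit happily in the same collection.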

List the Contents of a Collection

The find() command is used to retrieve data from MongoDB:

b.find()

returns every document in the collection. We will learn to iterate over the results later; for now, just see that this produces:

> db.books.find()
{ "_id" : ObjectId("52658e0a84b47fef69ebab5f"), "Name" : "On the Road", "Author" : "Jack Kerouac" }
{ "_id" : ObjectId("52658e4984b47fef69ebab60"), "Name" : "Collected Poems", "Editor" : "J.E Bowles" }

Note the _id given to each document in the collection - MongoDB assigns a unique ID to each one as they are entered.

Search for Certain Values

Searches are defined in much the same way as data is entered, by specifying key: value pairs:

b.find({Name : "Collected Poems"})

This searches for the book with the Name of "Collected Poems". We will look at more complex searches in another blog.

Search for a Certain Key

Perhaps you want to know all the books with an editor, but don't know any of the editors' names. You can search for documents with a certain key like this:

db.books.find( { Editor:{ "$exists": true}})

This is our first peek at the $, which prefixes a set of MongoDB operators, described here. We will look at them in a future blog.
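The idea behind $exists is simply a test for whether a key is present, whatever its value. Here is a minimal Python sketch of that test, using the two books as dictionaries. The has_key() helper is my own illustration. The exists_query dictionary shows how I would expect the same query to be written from Python, where the boolean is True rather than a string.

```python
books = [
    {"Name": "On the Road", "Author": "Jack Kerouac"},
    {"Name": "Collected Poems", "Editor": "J.E Bowles"},
]

# How the query would look as a Python dictionary (note the boolean True).
exists_query = {"Editor": {"$exists": True}}

def has_key(doc, key, should_exist=True):
    """Mimic {key: {$exists: should_exist}} on a single dictionary."""
    return (key in doc) == should_exist

with_editor = [b["Name"] for b in books if has_key(b, "Editor")]
without_editor = [b["Name"] for b in books if has_key(b, "Editor", False)]

print(with_editor)
print(without_editor)
```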

Removing a Document

To remove a document from a collection, use

db.books.remove({Editor:"J.E Bowles"})

Note that the syntax is the same as for find (except the word remove replaces the word find, of course).

Well, that's it for now. Next stop, more detailed searching.

Monday 2 December 2013

Python for MongoDB

This isn't a tutorial on programming Python. If you're new to the language, there are plenty of online resources to get you going. Many are listed here. I've already covered installing Python and the MongoDB package PyMongo, so what are we doing here? Well, two things:
  • A quick look at the bits of Python that are useful for use with MongoDB 
  • Some examples of connecting and manipulating a MongoDB from Python

JSON, dict and find()

MongoDB stores data in binary JSON format. JSON, if you haven't seen it, is a format for storing data in a human readable form. There is a nice quick introduction to it here. You don't need any particular software to use it, though it is supported in pretty much any programming language you might pick. JSON objects are defined as either name:value pairs or arrays. The value part of a name:value pair can be an array or a set of name:value pairs and an array can contain name:value pairs. Some examples will clarify:

Here we define a single object and its properties:

{"Name":"Tom","Animal":"Cat","Activities":["Chasing Mice","Exploding"]}

The object is enclosed in curly braces {}, the names and string values are enclosed in double quotes "" and the array is enclosed in square brackets []. Values can be objects themselves:

{"Cartoon":"Roadrunner", "Main Character":{"Name":"Roadrunner","Speed":"Fast"}}

Values can also be arrays of objects:

{"Cartoon":"Roadrunner", "Character":[{"Name":"Roadrunner","Speed":"Fast"},{"Name":"Wiley Coyote","Speed":"Slow"}]}

JSON supports a number of data types, not just strings as used above. They are string, number and boolean. Objects and arrays are also data types, and there is a null type, which means the value is empty.

Okay, so what? Well, Python has a data structure known as a dictionary, which pretty much mimics the JSON format.

Let's switch to Python now. Run your chosen Python program (I suggested PythonWin for Windows, and that is what I'll use here). Most offer an interactive window for testing short bits of code and that is what we will use.

You can define a dictionary like this:

toon = {"Name":"Tom","Animal":"Cat","Activities":["Chasing Mice","Exploding"]}

There is also the dict() constructor, which takes a list of (key, value) pairs:

toon = dict([("Name","Tom"),("Animal","Cat"),("Activities",["Chasing Mice","Exploding"])])

And you can extract named elements like this:

toon["Name"]

which would return "Tom". There are other useful things you can do with a dictionary - some are explained here.
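A few of those other useful things are sketched below, using the same toon dictionary. The "Rival" key and its value are made up for the example.

```python
toon = {"Name": "Tom", "Animal": "Cat", "Activities": ["Chasing Mice", "Exploding"]}

name = toon["Name"]                     # direct lookup, raises KeyError if missing
rival = toon.get("Rival", "unknown")    # .get() returns a default instead of raising
has_animal = "Animal" in toon           # test whether a key is present
first_activity = toon["Activities"][0]  # values can be lists, indexed as usual

toon["Rival"] = "Jerry"                 # assigning to a new key adds it
keys = sorted(toon)                     # iterating a dictionary yields its keys

print(name, rival, has_animal, first_activity)
print(keys)
```

The .get() form is handy with MongoDB documents, because two documents in the same collection need not have the same keys.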

Finally, we stay with Python, but start using PyMongo to interact with the database. First, we need to import the PyMongo library and get a connection to the database:


import pymongo                    # Import the pymongo library
from pymongo import MongoClient   # and the MongoClient class
client = MongoClient()            # Get a handle on the Mongo client - see note below
db = client.db_name               # Where db_name is the name of the database you want
col = db.colname                  # Get a link to the collection you want to work with

Note - when you get the client, you might need to include authorisation details as follows:

client = MongoClient(databaseURL)
client.db_name.authenticate(username, password)

We can insert our toon dictionary like this:

col.insert(toon)

and search it like this:

col.find({"Name":"Tom"})
col.find({"Activities" : {"$in": ["Exploding"]}})

The find() method returns a cursor which you must iterate through to access each of the documents that a search produces (even if there is only one). This is easy to do:

for d in col.find():
   print(d)    # Or do something else with d

So d is a variable which takes the value of each document returned by the find() in turn. Looking at the find() examples, you should note that the key names ("Name" for example) are in double quotes, as is the $in.


Thursday 24 October 2013

Installing the Software

A Sandpit

Let's build a local sandpit first. You can run MongoDB and Python on an Apache web server to drive web content, but we'll come to that later. Step one is something to play with on your own PC. We will need:

  • A MongoDB server and client
  • The ability to run and develop Python scripts
  • A connection between Python and MongoDB
So let's install three things.

MongoDB

Installing MongoDB is easy. I'm doing it on a Windows PC, but there are versions for other operating systems too. Download yours here. Install instructions are given, but it just involves putting the executables in a directory of your choice.

There are two files of immediate interest: mongod, which is the database daemon, and mongo, which provides a client shell. These are best run from the command line, so if you are using Windows, open two command windows using the cmd program. Run mongod in one and leave it running. In the other window, run mongo and you will see a shell into which you can type commands. The first thing I'd suggest you try is typing

> tutorial

as this runs a nice introductory tutorial.

More on what else to do with your shiny new DB later ...

Python

Once again, there are a number of OS choices for downloading Python. You can see them all here. Pick the right one for your OS (make a note of which you choose, including the 32 or 64 bit choice).

As a Windows user, I also chose PyWin from here. This provides a simple environment for editing and running Python scripts and an interactive window for typing Python commands. Make sure you download the right one for the version of Python you installed (including 32 or 64 bit).

Now all we need is ...

PyMongo

PyMongo allows your Python scripts to talk to your MongoDB database. You can get it here. Got it? Cool.

Python has a nice method for installing packages. There is a script called easy_install (you'll find it in the Python\Scripts folder). In a command window, go to the Python scripts folder and run

easy_install pymongo

Then start your Python editor (PythonWin, for example).

The next three blogs look at first steps in MongoDB, Python and PyMongo.



Tuesday 22 October 2013

Welcome

A Series of Blogs on MongoDB and Python

This is a blog about MongoDB and Python. I'm using them for a new project and I'll document what I learn as I go. MongoDB is a NoSQL database. An unfortunate set of naming decisions, but there you go. NoSQL doesn't even mean No SQL, it means not relational - an alternative to relational database models.

The MongoDB model involves storing data as collections of documents. There are no tables - one difference with the relational model. The definition of a document, while specific in the technical sense, is more general than the English language definition. It needn't be a written document, such as a letter or article or blog entry (though it could be). A document can contain data for any attributes, and each document in a collection can store data about different attributes from other documents in the same collection. Documents are stored as binary JSON objects (BSON), but we'll worry about that later.

For example, we might store details of people:

People
Name:Tom, Gender:Male
Name:Sally, Gender:Female

Now Tom and Sally are documents in the collection, People. So far, so like a relational table, except we can add more data such as:

Name:Jim, Age:25

Note, we don't store Jim's gender but we do store a new attribute - Age. This can be difficult in a relational table where we need to define all the fields at the start (or keep adding fields to every row each time a new one is needed, which is not very nice).

Python has a data structure called a dictionary, which is essentially an associative array. Dictionary objects can be populated from JSON objects, and MongoDB queries in Python can return dictionary objects, so it all works very nicely together.
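As a quick taste of how that fits together, here is a small sketch using Python's standard json module to turn the People example into dictionaries and back again. It doesn't touch MongoDB at all - it just shows JSON text becoming Python dictionaries.

```python
import json

# The People collection from above, written out as JSON text.
people_json = """
[
  {"Name": "Tom", "Gender": "Male"},
  {"Name": "Sally", "Gender": "Female"},
  {"Name": "Jim", "Age": 25}
]
"""

people = json.loads(people_json)  # a JSON array becomes a list of dictionaries
jim = people[2]

print(jim["Age"] + 1)      # 26 - the number arrives as a Python int
print(jim.get("Gender"))   # None - Jim has no Gender key

round_trip = json.dumps(jim)  # and back to JSON text again
print(round_trip)             # {"Name": "Jim", "Age": 25}
```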

In the next blog, I'll get all the software we need installed.