Saturday, April 15, 2017

Generic relations in Django

While using Django's admin to manage different objects, you may want to attach things like audit notes to them, for future reference or extra context. You have two choices -


  1. Add an explicit field to the object. The downside is that you will need to add that extra field to every model where you want notes.
  2. Create a generic model which can be used with different objects in your app.

It's a no-brainer between the two choices: the answer is the 2nd, and generic relations are here to help.

You can check out django-contrib-comments; it's one example of generic relations in action. Let's look at how a Note-like foreign key reference can be added freely to any model in your app.

notes/models.py
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Note(models.Model):
    note = models.TextField("note",
                            max_length=2000)
    date = models.DateTimeField(auto_now_add=True)
    # Below are the mandatory fields for a generic relation
    content_type = models.ForeignKey(
        ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')

    class Meta:
        verbose_name = 'Staff Note'

    def __unicode__(self):
        return self.date.strftime("%B %d, %Y")


Now, this Note model can be referenced from any other model you want to use it with. Here is one example -

mymodel/models.py
from django.contrib.contenttypes.fields import GenericRelation
from django.db import models

from notes.models import Note


class MyModel(models.Model):
    name = models.CharField("Name", max_length=100)
    notes = GenericRelation(Note)

When you run migrations, nothing is added to MyModel's table; since it's a generic relation, Note keeps its references via the Django content type and object id.
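With the relation in place, notes can be created and queried through the related manager. A minimal sketch using the models above (the sample strings are just placeholders):

obj = MyModel.objects.create(name="First")

# content_type and object_id are filled in automatically
# when creating through the related manager.
obj.notes.create(note="Called the customer, left a voicemail.")

# Query back all notes attached to this object.
for staff_note in obj.notes.all():
    print(staff_note.note)

# Note: deleting obj also deletes its notes - GenericRelation
# enables cascade deletion, unlike a bare GenericForeignKey.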

How to add it in admin?

notes/admin.py
from django.contrib.contenttypes.admin import GenericTabularInline
from .models import Note

class NoteInline(GenericTabularInline):
    model = Note
    extra = 0


mymodel/admin.py
from django.contrib import admin

from notes.admin import NoteInline
from .models import MyModel

class MyModelAdmin(admin.ModelAdmin):
    inlines = [NoteInline, ]

admin.site.register(MyModel, MyModelAdmin)

And you are all set. It's an easy-to-plug, efficient, and clean implementation. Good luck!

Django rest framework serializer with self-referential foreign key for comments

Django used to ship a default comments package (django.contrib.comments); since it was deprecated, the package now lives in an external repo. It provides flat comments, so if you want threaded comments, you can use django-threadedcomments.

We are using django-rest-framework (DRF), and we access the threaded comments through it.

resources.py

from rest_framework import serializers

from .serializers import CommentsSerializer


class ObjectSerializer(serializers.ModelSerializer):

    comments = CommentsSerializer(
        source='comments_set',
        many=True)

    class Meta:
        model = ObjectName
        fields = (
            "id",
            # ... other model fields ...
            "comments",
        )


serializers.py

from rest_framework import serializers
from .models import FluentComment

class RecursiveField(serializers.Serializer):
    def to_representation(self, value):
        serializer = self.parent.parent.__class__(
            value,
            context=self.context)
        return serializer.data

class CommentsSerializer(serializers.ModelSerializer):
    children = RecursiveField(many=True)

    class Meta:
        model = FluentComment
        fields = (
            'comment',
            'url',
            'submit_date',
            'id',
            'children',
        )

RecursiveField is one way to serialize self-referential objects; it handles the parent-child relationship by re-instantiating the declaring serializer class (self.parent.parent reaches it because many=True wraps the field in a ListSerializer). Here is another way to do it -


class CommentsSerializer(serializers.ModelSerializer):
    user = UserLightSerializer()

    class Meta:
        model = FluentComment
        fields = (
            'user',
            'comment',
            'submit_date',
            'id',
            'children',
        )

CommentsSerializer._declared_fields[
     'children'] = CommentsSerializer(many=True)
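
One gotcha with both approaches: if the queryset feeding the serializer contains every comment, replies appear nested under their parents and again at the top level. Here is a minimal sketch of restricting the API to root comments, assuming the threaded comments model exposes a parent field (the view name is hypothetical):

from rest_framework import generics

from .models import FluentComment
from .serializers import CommentsSerializer


class CommentListView(generics.ListAPIView):
    serializer_class = CommentsSerializer

    def get_queryset(self):
        # Serve only top-level comments; replies are rendered
        # through the nested 'children' field.
        return FluentComment.objects.filter(parent=None)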


Both options above work fine and should do the magic. Good luck!

Wednesday, January 4, 2017

Limit celery concurrency workers to limit memory usage

Limit the number of concurrent workers for celery message processing.

We are using Sentry for error logging and monitoring. Recently my Sentry server moved from one shared server to another, and it started crashing because memory usage was way over the limit (the shared server has a limit of 1024MB, and my celery worker was taking 2759MB).

When I looked at the log, I was surprised by the number of worker processes -

user - 66MB - 0:06:45 - 21356 - [celeryd: celery@ servername:Worker:MainProcess] -active- (celery worker -B)
user - 68MB - 0:06:41 - 21657 - gunicorn: worker [Sentry]
user - 69MB - 0:06:40 - 21658 - gunicorn: worker [Sentry]
user - 61MB - 0:06:40 - 21659 - [celery beat]
user - 61MB - 0:06:40 - 21660 - [celeryd: celery@servername:Worker-2]
user - 61MB - 0:06:40 - 21661 - [celeryd: celery@servername:Worker-3]
user - 61MB - 0:06:40 - 21662 - [celeryd: celery@servername:Worker-4]
user - 61MB - 0:06:40 - 21663 - [celeryd: celery@servername:Worker-5]
user - 61MB - 0:06:40 - 21664 - [celeryd: celery@servername:Worker-6]
user - 59MB - 0:06:40 - 21665 - [celeryd: celery@servername:Worker-7]
user - 59MB - 0:06:40 - 21666 - [celeryd: celery@servername:Worker-8]
user - 61MB - 0:06:40 - 21667 - [celeryd: celery@servername:Worker-9]
user - 59MB - 0:06:40 - 21668 - [celeryd: celery@servername:Worker-10]
user - 61MB - 0:06:40 - 21669 - [celeryd: celery@servername:Worker-11]
user - 61MB - 0:06:40 - 21670 - [celeryd: celery@servername:Worker-12]
user - 59MB - 0:06:40 - 21671 - [celeryd: celery@servername:Worker-13]
user - 61MB - 0:06:40 - 21672 - [celeryd: celery@servername:Worker-14]
user - 59MB - 0:06:40 - 21673 - [celeryd: celery@servername:Worker-15]
user - 59MB - 0:06:40 - 21675 - [celeryd: celery@servername:Worker-16]
user - 61MB - 0:06:40 - 21677 - [celeryd: celery@servername:Worker-17]
user - 61MB - 0:06:40 - 21678 - [celeryd: celery@servername:Worker-18]
.....
user - 61MB - 0:06:40 - 21678 - [celeryd: celery@servername:Worker-44]


Looking at the above, the host was obviously going to kill the process, as usage was far higher than allowed.

It was happening because of celery's default concurrency setting. As per the documentation -

-c, --concurrency
Number of child processes processing the queue. The default is the number of CPUs available on your system.

In this case the server probably had many more CPUs. So the solution was to restrict it by passing the concurrency you want.


> celery worker -B --concurrency=4

That resolved the issue. You can validate the number of celery processes with -

> ps -ef | grep celeryd
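
If you'd rather not pass the flag on every invocation, the limit can also be pinned in the Celery configuration. A minimal sketch, assuming Celery 3.x-style Django settings (newer Celery versions use the lowercase worker_concurrency key instead):

# settings.py (Celery 3.x style)
CELERYD_CONCURRENCY = 4

# Celery 4+ equivalent on the app object:
# app.conf.worker_concurrency = 4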

Hope it helps!

Saturday, December 17, 2016

RabbitMQ monitoring on New Relic - Webfaction

We have been using celery with Django for some time now, and it uses RabbitMQ as the messaging backend. Sometimes after server updates RabbitMQ goes down, or it slows down under heavy message load, and currently we have no visibility into those cases. In order to monitor the queue and get a notification/alert in any alarming situation, we have set up monitoring on New Relic. Here are the steps -

There are various choices for RabbitMQ monitoring on New Relic, but we used this one.

On your server run following command - 

1> pip install newrelic-plugin-agent

If it's a dedicated server and you have full access, copy the configuration file /opt/newrelic-plugin-agent/newrelic-plugin-agent.cfg to /etc/newrelic/newrelic-plugin-agent.cfg and edit the configuration there. In my case it's a shared Webfaction server, so we installed it under /home/username/newrelic/ (we created the newrelic folder in the home directory).

2> Once you create the newrelic folder, copy the sample config file there.

Update the license key and the user under the Daemon section (the user who runs the process; also make sure that user has proper access to the newrelic folder).

3> Update the settings for rabbitmq

 rabbitmq:
   name: PROD-RABBITMQ
   host: localhost
   port: 15672
   verify_ssl_cert: false
   username:
   password:
   vhosts:
     '/':
       queues: []

In the above, vhosts is optional; if you have a simple setup without many vhosts and queues, you can skip it and the agent will monitor everything.

4> To run in debug mode for testing -
newrelic-plugin-agent -c newrelic-plugin-agent.cfg -f

Once you see that your setup is running fine, start it in background mode by removing -f
newrelic-plugin-agent -c newrelic-plugin-agent.cfg
Notes -

A few things are needed for the monitoring API calls to work (enable the HTTP management UI for RabbitMQ using rabbitmq-plugins) -

> rabbitmq-plugins enable rabbitmq_management

Without the HTTP UI enabled, the monitoring service won't get API access.

> Once it's enabled, you can try curl calls against /api on port 15672

Issues -

Getting nothing reported to newrelic

Reason -
curl -i -u username:password http://localhost:15672/api/vhosts
HTTP/1.1 401 Unauthorized
Server: MochiWeb/1.1 WebMachine/1.10.0 (never breaks eye contact)
Date: Fri, 16 Dec 2016 16:56:44 GMT
Content-Length: 57

{"error":"not_authorised","reason":"Not management user"}

Solution -
I had to create another user specifically for monitoring, with the administrator tag and proper permissions as per the RabbitMQ documentation.
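
For reference, a sketch of the commands involved; 'monitor' and the password are placeholders, and the tag and permissions should follow the RabbitMQ management docs for your setup -

> rabbitmqctl add_user monitor some-password
> rabbitmqctl set_user_tags monitor administrator
> rabbitmqctl set_permissions -p / monitor ".*" ".*" ".*"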

Tuesday, June 21, 2016

Django 1.7, 1.8 - queryset.extra is deprecated, how to do group by on datetime with date

In Django 1.5/1.6 versions -

from django.contrib.auth.models import User
from django.db.models import Count

signup_count = list(User.objects.filter(
            profile__user_type='learner').order_by(
                '-id').extra({
                             'date_only': "date(date_joined)"}).values(
                                 'date_only').annotate(
                                     signup_count=Count('id'))[:40])

It used to return a list of dicts like -
[{ 'date_only': ..., 'signup_count': ... }]

Now that extra is deprecated in newer versions of Django, here is the workaround to get the same result -

from django.db.models.expressions import Func

# Create a custom SQL function wrapping its argument in DATE()
class ExtractDateFunction(Func):

    function = "DATE"

signup_count = list(User.objects.filter(
            profile__user_type='learner').order_by('-id').annotate(
                date_only=ExtractDateFunction("date_joined")).values(
                    'date_only').annotate(
                        signup_count=Count('id'))[:40])

This should give you the same results as before. 
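
As a side note, on Django 1.10+ the built-in TruncDate database function does the same thing without a custom Func subclass:

from django.contrib.auth.models import User
from django.db.models import Count
from django.db.models.functions import TruncDate

signup_count = list(User.objects.filter(
    profile__user_type='learner').order_by('-id').annotate(
        date_only=TruncDate('date_joined')).values(
            'date_only').annotate(
                signup_count=Count('id'))[:40])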

Monday, June 6, 2016

There is no South database module 'south.db.mysql' for your database - Django


Django 1.8+

Recently I came across this error while running my Django application -
python manage.py runserver

There is no South database module 'south.db.mysql' for your database. Please either choose a supported database, check for SOUTH_DATABASE_ADAPTER[S] settings, or remove South from INSTALLED_APPS.

To fix it, you might try looking up south.db.mysql or searching your settings for SOUTH_DATABASE_ADAPTERS, but you won't find either in your project. South simply doesn't work with Django 1.7+, so you have two choices -

1. Manually downgrade to a lower Django version, e.g. 1.6 or so.
pip install Django==1.6.10

2. Uninstall South from your environment (virtual environment), remove 'south' from INSTALLED_APPS, and move to the built-in migration framework.
pip uninstall south
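
If you take the second route and your tables already exist, here is a rough sketch of the follow-up (the exact path depends on your project; check the Django upgrade notes):

# Remove 'south' from INSTALLED_APPS, delete the old South
# migration folders, then regenerate with the built-in framework:
python manage.py makemigrations
python manage.py migrate --fake-initial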


Good luck!

Saturday, January 16, 2016

Search in Django with Haystack using Solr or Elastic Search

Let's say you want to provide search on your Django application - on a specific model, or file search across media files or data files uploaded by users.

Here are the tech solutions for it -

Haystack - modular search for Django
It lets you query on top of any of the following search engines - Solr, ElasticSearch, Xapian, Whoosh.
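
For a feel of the API, here is a minimal Haystack index sketch, assuming a hypothetical Note model with a date field; the template path follows Haystack's conventions -

# search_indexes.py
from haystack import indexes

from notes.models import Note


class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    # The single document=True field holds the text the engine
    # searches; use_template=True renders it from
    # templates/search/indexes/notes/note_text.txt
    text = indexes.CharField(document=True, use_template=True)
    date = indexes.DateTimeField(model_attr='date')

    def get_model(self):
        return Note

    def index_queryset(self, using=None):
        # Which objects are indexed when update_index runs
        return self.get_model().objects.all()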

Solr and ElasticSearch are both built on top of the powerful search library Apache Lucene. Both are free, under the Apache License 2.0.

Interesting presentation on Solr vs ElasticSearch

ElasticSearch is distributed by design, while some functions in Solr don't allow distributed execution. ES offers easy third-party cloud support and is easy to scale by adding or removing nodes; it is real-time and distributed.

Solr and ElasticSearch both provide an admin page; for ES it's called ElasticSearch-Head. ES also provides the concept of a gateway, which allows index recovery if the system crashes.

Use ES if - your index is big and real-time, you have several indices or multi-tenancy requirements, or you want to save administrative effort and cost.
Don't use ES if - your team is already invested in Solr, no real-time search indexing is required, or your indices are relatively small.

Another utility besides ElasticSearch-Head is ElasticSearch-bigdesk, which provides analytics and charts.

With Solr, there are some concerns when real-time index updates and search queries are performed at the same time. For plain vanilla search, though, Solr performs very well and can outperform ES.

You can find more comparison here.

Solr is older than ElasticSearch, so it has a bigger community and more help available online. At the same time, ElasticSearch was built to overcome Solr's scaling limitations. ES is stable, though Solr is more mature. In terms of scalability, ElasticSearch is easier to scale than Solr, but per the documentation that limitation should be gone with Solr 4.0.

Sematext provides support for both Solr and ElasticSearch; you can find a good overview and comparisons across various categories in their series of blog posts.

And now the competition is joined by Amazon CloudSearch; applications hosted on AWS also seem to use CloudSearch widely. Here is a comparison between CloudSearch and Solr. There is no clear winner! Make the choice based on the requirements of your environment, and keep it simple unless more is really required.