Celery, Django and Scrapy: error importing from Django app


I'm using Celery (and django-celery) to allow a user to launch periodic scrapes through the Django admin. This is part of a larger project, but I've boiled the issue down to a minimal example.

Firstly, celery/celerybeat are running daemonized. If I instead run them with celery -A evofrontend worker -B -l info from the Django project dir, I weirdly get no issues at all.

When I run celery/celerybeat as daemons, however, I get a strange import error:

[2016-01-06 03:05:12,292: ERROR/MainProcess] Task evosched.tasks.scrapingtask[e18450ad-4dc3-47a0-b03d-4381a0e65c31] raised unexpected: ImportError('No module named myutils',)
Traceback (most recent call last):
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "evosched/tasks.py", line 35, in scrapingtask
    cs = CrawlerScript('testspider', scrapy_settings)
  File "evosched/tasks.py", line 13, in __init__
    self.crawler = CrawlerProcess(scrapy_settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 209, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 115, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 296, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 30, in from_settings
    return cls(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 21, in __init__
    for module in walk_modules(name):
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "retail/spiders/retail_spider.py", line 16, in <module>
ImportError: No module named myutils

i.e. the spider is having issues importing from the Django project app, despite adding the relevant things to sys.path and doing django.setup().

My hunch is that this may be caused by a "circular import" during initialization, but I'm not sure (see here for notes on the same error).

Celery daemon config

For completeness, the celeryd and celerybeat configuration scripts are:

# /etc/default/celeryd
CELERYD_NODES="worker1"

CELERY_BIN="/home/lee/desktop/pyco/evo-scraping-min/venv/bin/celery"

CELERY_APP="evofrontend"
DJANGO_SETTINGS_MODULE="evofrontend.settings"

CELERYD_CHDIR="/home/lee/desktop/pyco/evo-scraping-min/evofrontend"

CELERYD_OPTS="--concurrency=1"

# Workers should run as an unprivileged user.
CELERYD_USER="lee"
CELERYD_GROUP="lee"

CELERY_CREATE_DIRS=1

and

# /etc/default/celerybeat

CELERY_BIN="/home/lee/desktop/pyco/evo-scraping-min/venv/bin/celery"

CELERY_APP="evofrontend"
CELERYBEAT_CHDIR="/home/lee/desktop/pyco/evo-scraping-min/evofrontend/"

# Django settings module
export DJANGO_SETTINGS_MODULE="evofrontend.settings"

They are largely based on the generic ones, with the Django settings thrown in, and they use the Celery binary in my virtualenv rather than the system one.

I'm also using the generic init.d scripts.

Project structure

As for the project: it lives at /home/lee/desktop/pyco/evo-scraping-min, and all files under it have ownership lee:lee. The dir contains both the Scrapy (evo-retail) and Django (evofrontend) projects, and the complete tree structure looks like

├── evofrontend
│   ├── db.sqlite3
│   ├── evofrontend
│   │   ├── celery.py
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── evosched
│   │   ├── __init__.py
│   │   ├── myutils.py
│   │   └── tasks.py
│   └── manage.py
└── evo-retail
    └── retail
        ├── logs
        ├── retail
        │   ├── __init__.py
        │   ├── settings.py
        │   └── spiders
        │       ├── __init__.py
        │       └── retail_spider.py
        └── scrapy.cfg

Django project relevant files

Now the relevant files. evofrontend/evofrontend/celery.py looks like

# evofrontend/evofrontend/celery.py
from __future__ import absolute_import
import os
from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'evofrontend.settings')

from django.conf import settings

app = Celery('evofrontend')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
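The tree above also shows an evofrontend/evofrontend/__init__.py; its contents aren't listed here, but the standard Celery + Django pattern imports the app there so that @shared_task binds to it. A minimal sketch following that convention (an assumption, not shown in the original listing):

# evofrontend/evofrontend/__init__.py -- sketch, assuming the standard
# Celery + Django convention from the Celery docs.
from __future__ import absolute_import

# Ensure the Celery app is always imported when Django starts, so that
# @shared_task uses this app.
from .celery import app as celery_app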

The potentially relevant settings from the Django settings file, evofrontend/evofrontend/settings.py, are

import os
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))

INSTALLED_APPS = (
    ...
    'djcelery',
    'evosched',
)

# Celery settings
BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/London'
CELERYD_MAX_TASKS_PER_CHILD = 1  # Each worker is killed after one task; this prevents issues with the reactor not being restartable
# Use the django-celery database backend
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
# Used to set up periodic tasks
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
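On the CELERYD_MAX_TASKS_PER_CHILD = 1 comment: Twisted's reactor can only be started once per process, so a worker child that has run one crawl cannot run a second. A standalone sketch of the failure mode this setting avoids (my own illustration, not project code):

# The Twisted reactor is not restartable within one process, so a
# second crawl in the same worker child would die -- hence one task
# per child.
from twisted.internet import reactor, error

reactor.callWhenRunning(reactor.stop)
reactor.run()        # first crawl: reactor starts and stops cleanly
try:
    reactor.run()    # second crawl attempt in the same process
except error.ReactorNotRestartable:
    print 'reactor cannot be restarted -- hence one task per child'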

The tasks.py in the scheduling app, evosched, looks like this (it launches a Scrapy spider using the relevant settings after changing dir):

# evofrontend/evosched/tasks.py
from __future__ import absolute_import
from celery import shared_task
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from django.conf import settings as django_settings


class CrawlerScript(object):
    def __init__(self, spider, scrapy_settings):
        self.crawler = CrawlerProcess(scrapy_settings)
        self.spider = spider  # just a string

    def run(self, **kwargs):
        # Pass the kwargs (usually command line args) to the crawler
        self.crawler.crawl(self.spider, **kwargs)
        self.crawler.start()


@shared_task
def scrapingtask(**kwargs):

    logger.info("Start scrape...")

    # The scrapy.cfg file is here, pointing to the settings...
    base_dir = django_settings.BASE_DIR
    os.chdir(os.path.join(base_dir, '..', 'evo-retail/retail'))
    scrapy_settings = get_project_settings()

    # Run the crawler
    cs = CrawlerScript('testspider', scrapy_settings)
    cs.run(**kwargs)
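For context on how the periodic scrape gets wired up: with CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler", the schedule lives in the database and is normally created through the Django admin. A sketch of the equivalent programmatic registration (the task name matches the @shared_task above, but the label and interval here are hypothetical):

# Sketch: registering the task with djcelery's DatabaseScheduler
# programmatically; normally done via the Django admin.
from djcelery.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(every=1, period='hours')
PeriodicTask.objects.get_or_create(
    name='Hourly retail scrape',          # hypothetical label
    task='evosched.tasks.scrapingtask',   # dotted path to the shared task
    interval=schedule,
)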

The evofrontend/evosched/myutils.py contains (in the minimal example) just:

# evofrontend/evosched/myutils.py
SCRAPY_XHR_HEADERS = 'something'

Scrapy project relevant files

In the complete Scrapy project the settings file looks like

# evo-retail/retail/retail/settings.py
BOT_NAME = 'retail'

import os
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))

SPIDER_MODULES = ['retail.spiders']
NEWSPIDER_MODULE = 'retail.spiders'

and (in the minimal example) the spider is just

# evo-retail/retail/retail/spiders/retail_spider.py
from scrapy.conf import settings as scrapy_settings
from scrapy.spiders import Spider
from scrapy.http import Request
import sys
import django
import os
import posixpath
SCRAPY_BASE_DIR = scrapy_settings['PROJECT_ROOT']
DJANGO_DIR = posixpath.normpath(os.path.join(SCRAPY_BASE_DIR, '../../../', 'evofrontend'))
sys.path.insert(0, DJANGO_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", 'evofrontend.settings')
django.setup()
from evosched.myutils import SCRAPY_XHR_HEADERS

class RetailSpider(Spider):

    name = "testspider"

    def start_requests(self):
        print SCRAPY_XHR_HEADERS
        yield Request(url='http://www.google.com', callback=self.parse)

    def parse(self, response):
        print response.url
        return []

EDIT:

I discovered through lots of trial and error that if the app I'm trying to import from is in the INSTALLED_APPS Django setting, then the import fails, but if I remove the app from there, then there is no longer an import error (e.g. removing evosched from INSTALLED_APPS, the import in the spider goes through fine...). Clearly not a solution, but it may be a clue.
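One way to chase this clue is to check which evosched package actually gets found in each configuration. A debugging snippet (mine, not project code) dropped into the spider right after django.setup():

# Which copy of evosched wins the sys.path race, and where will
# Python look for its submodules?
import evosched
print evosched.__file__   # path of the package that was imported
print evosched.__path__   # search path for evosched.myutils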

EDIT 2

I put a print of sys.path just before the failing import in the spider, and the result was

/home/lee/desktop/pyco/evo-scraping-min/evofrontend/../evo-retail/retail
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/plat-x86_64-linux-gnu
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-tk
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-old
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-dynload
/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages
/home/lee/desktop/pyco/evo-scraping-min/evofrontend
/home/lee/desktop/pyco/evo-scraping-min/evo-retail/retail

EDIT 3

If I import evosched and print dir(evosched), I see "tasks", and if I choose to include such a file I can also see "models", so importing models is possible. I don't see "myutils" though. Even from evosched import myutils fails, and it still fails if the statement is put in a function body rather than at global scope (I thought that might route around a circular import issue...). The direct import evosched works, though... so possibly import evosched.myutils would work. I haven't tried that yet...
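A quick way to test that last idea without restructuring the spider (a sketch; importlib.import_module is equivalent to the import statement):

# Sketch: try the submodule import directly instead of
# "from evosched import myutils".
import importlib
myutils = importlib.import_module('evosched.myutils')
print myutils.SCRAPY_XHR_HEADERS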

It seems the Celery daemon was running using the system's Python and not the Python binary inside the virtualenv. I needed to use

# Python interpreter from environment.
ENV_PYTHON="$CELERYD_CHDIR/env/bin/python"

as mentioned here, to tell celeryd to run using the Python inside the virtualenv.
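A quick way to confirm this diagnosis from inside a running task (a debugging sketch, not project code):

# Print which interpreter the daemonized worker is actually running;
# with the fix above it should point into the virtualenv's bin/.
import sys
print sys.executable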

