Celery, Django and Scrapy: error importing from a Django app
I'm using Celery (and django-celery) to allow a user to launch periodic scrapes through the Django admin. This is part of a larger project, but I've boiled the issue down to a minimal example.
Firstly, celery/celerybeat are running daemonized. If I instead run them with celery -A evofrontend worker -B -l info from the Django project dir, then weirdly I get no issues.
When I run the celery/celerybeat daemons, however, I get a strange import error:
[2016-01-06 03:05:12,292: ERROR/MainProcess] Task evosched.tasks.scrapingtask[e18450ad-4dc3-47a0-b03d-4381a0e65c31] raised unexpected: ImportError('No module named myutils',)
Traceback (most recent call last):
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "evosched/tasks.py", line 35, in scrapingtask
    cs = CrawlerScript('testspider', scrapy_settings)
  File "evosched/tasks.py", line 13, in __init__
    self.crawler = CrawlerProcess(scrapy_settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 209, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 115, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/crawler.py", line 296, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 30, in from_settings
    return cls(settings)
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/spiderloader.py", line 21, in __init__
    for module in walk_modules(name):
  File "/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "retail/spiders/retail_spider.py", line 16, in <module>
ImportError: No module named myutils

i.e. the spider is having issues importing the Django project app, despite adding the relevant dirs to sys.path and doing django.setup().
My hunch is that this may be caused by a "circular import" during initialization, but I'm not sure (see here for notes on the same error).
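Since the foreground run works and the daemonized run doesn't, comparing their environments seems like the obvious first step. A throwaway debug task along these lines could dump the worker's interpreter and import path (a diagnostic sketch only; the debug_env task is not part of the minimal example):

# Hypothetical debug task (not in the project) for comparing the
# daemonized worker's environment with the working foreground run.
from __future__ import absolute_import
import sys
from celery import shared_task

@shared_task
def debug_env():
    # A different sys.executable or sys.path between the two runs
    # would point at the source of the ImportError.
    print(sys.executable)
    for p in sys.path:
        print(p)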
Celery daemon config
For completeness, the celeryd and celerybeat configuration scripts are:
# /etc/default/celeryd
CELERYD_NODES="worker1"
CELERY_BIN="/home/lee/desktop/pyco/evo-scraping-min/venv/bin/celery"
CELERY_APP="evofrontend"
DJANGO_SETTINGS_MODULE="evofrontend.settings"
CELERYD_CHDIR="/home/lee/desktop/pyco/evo-scraping-min/evofrontend"
CELERYD_OPTS="--concurrency=1"
# Workers should run as an unprivileged user.
CELERYD_USER="lee"
CELERYD_GROUP="lee"
CELERY_CREATE_DIRS=1

and
# /etc/default/celerybeat
CELERY_BIN="/home/lee/desktop/pyco/evo-scraping-min/venv/bin/celery"
CELERY_APP="evofrontend"
CELERYBEAT_CHDIR="/home/lee/desktop/pyco/evo-scraping-min/evofrontend/"
# Django settings module
export DJANGO_SETTINGS_MODULE="evofrontend.settings"

They are largely based on the generic ones, with the Django settings thrown in, and using the Celery bin in the virtualenv rather than the system one.
The init.d scripts I'm using are the generic ones.
Project structure
As for the project: it lives at /home/lee/desktop/pyco/evo-scraping-min, and all files under it have ownership lee:lee. The dir contains both the Scrapy (evo-retail) and Django (evofrontend) projects, and the complete tree structure looks like
├── evofrontend
│   ├── db.sqlite3
│   ├── evofrontend
│   │   ├── celery.py
│   │   ├── __init__.py
│   │   ├── settings.py
│   │   ├── urls.py
│   │   └── wsgi.py
│   ├── evosched
│   │   ├── __init__.py
│   │   ├── myutils.py
│   │   └── tasks.py
│   └── manage.py
└── evo-retail
    └── retail
        ├── logs
        ├── retail
        │   ├── __init__.py
        │   ├── settings.py
        │   └── spiders
        │       ├── __init__.py
        │       └── retail_spider.py
        └── scrapy.cfg

Django project relevant files
Now for the relevant files. evofrontend/evofrontend/celery.py looks like
# evofrontend/evofrontend/celery.py
from __future__ import absolute_import

import os

from celery import Celery

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'evofrontend.settings')

from django.conf import settings

app = Celery('evofrontend')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

The potentially relevant settings from the Django settings file, evofrontend/evofrontend/settings.py, are
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
PROJECT_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))

INSTALLED_APPS = (
    ...
    'djcelery',
    'evosched',
)

# Celery settings
BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_TIMEZONE = 'Europe/London'
CELERYD_MAX_TASKS_PER_CHILD = 1  # Each worker is killed after one task; this prevents issues with the reactor not being restartable

# Use the django-celery database backend
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

# Set up the periodic task
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
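For context, with the DatabaseScheduler the periodic scrape gets defined through djcelery's admin models; the programmatic equivalent would be something like the following sketch (the task name, schedule interval and PeriodicTask name here are example values, not from the project):

# Sketch: registering the periodic scrape with djcelery's DatabaseScheduler,
# equivalent to what is normally clicked together in the Django admin.
# The one-hour interval is an arbitrary example.
from djcelery.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(every=1, period='hours')
PeriodicTask.objects.get_or_create(
    name='Periodic retail scrape',
    task='evosched.tasks.scrapingtask',
    interval=schedule,
)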
The tasks.py in the scheduling app, evosched, looks like this (it launches a Scrapy spider, using the relevant settings, after changing dir):

# evofrontend/evosched/tasks.py
from __future__ import absolute_import
from celery import shared_task
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)

import os
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from django.conf import settings as django_settings


class CrawlerScript(object):
    def __init__(self, spider, scrapy_settings):
        self.crawler = CrawlerProcess(scrapy_settings)
        self.spider = spider  # String

    def run(self, **kwargs):
        # Pass the kwargs (usually command line args) to the crawler
        self.crawler.crawl(self.spider, **kwargs)
        self.crawler.start()


@shared_task
def scrapingtask(**kwargs):
    logger.info("Start scrape...")

    # scrapy.cfg file is here, pointing to the settings...
    base_dir = django_settings.BASE_DIR
    os.chdir(os.path.join(base_dir, '..', 'evo-retail/retail'))
    scrapy_settings = get_project_settings()

    # Run the crawler
    cs = CrawlerScript('testspider', scrapy_settings)
    cs.run(**kwargs)
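(As an aside, one possible alternative, sketched below, would be to shell out to the Scrapy CLI from the task instead of embedding CrawlerProcess in the worker; that at least sidesteps the non-restartable reactor, though whether it avoids the import error depends on the subprocess environment. The scraping_subprocess_task name is made up.)

# Sketch of a task that runs the spider via the scrapy CLI in a child
# process rather than via CrawlerProcess inside the Celery worker.
import os
import subprocess

from celery import shared_task
from django.conf import settings as django_settings

@shared_task
def scraping_subprocess_task():
    scrapy_dir = os.path.join(django_settings.BASE_DIR, '..', 'evo-retail/retail')
    # cwd makes scrapy pick up scrapy.cfg, like the os.chdir above does
    subprocess.check_call(['scrapy', 'crawl', 'testspider'], cwd=scrapy_dir)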
The evofrontend/evosched/myutils.py contains (in the minimal example) just:

# evofrontend/evosched/myutils.py
SCRAPY_XHR_HEADERS = 'something'

Scrapy project relevant files
In the complete Scrapy project, the settings file looks like
# evo-retail/retail/retail/settings.py
BOT_NAME = 'retail'

import os
PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))

SPIDER_MODULES = ['retail.spiders']
NEWSPIDER_MODULE = 'retail.spiders'

and (in the minimal example) the spider is just
# evo-retail/retail/retail/spiders/retail_spider.py
from scrapy.conf import settings as scrapy_settings
from scrapy.spiders import Spider
from scrapy.http import Request
import sys
import django
import os
import posixpath

SCRAPY_BASE_DIR = scrapy_settings['PROJECT_ROOT']
DJANGO_DIR = posixpath.normpath(os.path.join(SCRAPY_BASE_DIR, '../../../', 'evofrontend'))
sys.path.insert(0, DJANGO_DIR)

os.environ.setdefault("DJANGO_SETTINGS_MODULE", 'evofrontend.settings')
django.setup()

from evosched.myutils import SCRAPY_XHR_HEADERS


class RetailSpider(Spider):
    name = "testspider"

    def start_requests(self):
        print SCRAPY_XHR_HEADERS
        yield Request(url='http://www.google.com', callback=self.parse)

    def parse(self, response):
        print response.url
        return []

EDIT:
I discovered through lots of trial and error that if the app I'm trying to import from is in the INSTALLED_APPS Django setting, then it fails with the import error, but if I remove the app from there, then there is no longer an import error (e.g. removing evosched from INSTALLED_APPS, the import in the spider goes through fine...). Not a solution, but it may be a clue.
EDIT 2:
I put a print of sys.path just before the failing import in the spider, and the result was
/home/lee/desktop/pyco/evo-scraping-min/evofrontend/../evo-retail/retail
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/plat-x86_64-linux-gnu
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-tk
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-old
/home/lee/desktop/pyco/evo-scraping-min/venv/lib/python2.7/lib-dynload
/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/home/lee/desktop/pyco/evo-scraping-min/venv/local/lib/python2.7/site-packages
/home/lee/desktop/pyco/evo-scraping-min/evofrontend
/home/lee/desktop/pyco/evo-scraping-min/evo-retail/retail

EDIT 3:
If I import evosched and print dir(evosched), I see "tasks", and if I choose to include such a file I can also see "models", so importing models is possible. I don't see "myutils" though. Even from evosched import myutils fails, and it also fails if the import statement is put in a function below rather than at global level (I thought that might route around a circular import issue...). A direct import evosched works... possibly import evosched.myutils would work too; I have not yet tried that.
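(One more check that might narrow it down, sketched here: print where evosched was actually imported from and what sits next to it; a stale copy or a second evosched on sys.path would explain the missing attribute.)

# Diagnostic sketch: which evosched did the worker import, and does its
# directory actually contain myutils.py?
import os
import evosched

print(evosched.__file__)
print(os.listdir(os.path.dirname(evosched.__file__)))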
It seems the Celery daemon was running using the system's Python and not the Python binary inside the virtualenv. You need to use
# Python interpreter from environment.
ENV_PYTHON="$CELERYD_CHDIR/env/bin/python"

as mentioned here, to tell celeryd to run using the Python inside the virtualenv.
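After setting this and restarting the daemons, the fix can be sanity-checked from inside a task; a throwaway sketch (the which_python name is made up):

# Quick check that the daemonized worker now runs on the virtualenv Python.
import sys
from celery import shared_task

@shared_task
def which_python():
    # Should now print the virtualenv interpreter rather than /usr/bin/python
    print(sys.executable)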