OOP in Python (related to Scrapy)


The question is how to share data between objects in a safe, maintainable manner.

Example: I've built a Scrapy application that spawns numerous spiders. Although each spider is connected to a separate pipeline object, I need to compare and sort data between the different pipelines (e.g. I need the outputs sorted by different item attributes: prices, date, etc.), so I need a shared data area. The same applies to the spiders (e.g. I need to count the maximum total requests). My first implementation used class variables for the data shared between spiders/pipelines, and instance variables for each object.

class MyPipeline(object):
    max_price = 0  # class variable, shared between all pipeline instances

    def process_item(self, item, spider):
        if item['price'] > MyPipeline.max_price:
            MyPipeline.max_price = item['price']

(The actual structures are more complex.) I then thought that having a bunch of statics is not OOP, so my next solution was to have a private data class for each class and use it to store the values:

class MyPipelineData:
    def __init__(self):
        self.max_price = 0

class SpidersData:
    def __init__(self, total_requests, pipeline_data):
        self.total_requests = total_requests
        self.pipeline_data = pipeline_data  # the shared data between pipelines

class MyPipeline(object):
    pipeline_data = None

    def process_item(self, item, spider):
        if _data is None:
            _data = spider.data.pipeline_data  # the shared data between pipelines
        if item['price'] > _data.max_price:
            _data.max_price = item['price']

class Spider(scrapy.Spider):
    def __init__(self, spider_data):
        self._data = spider_data  # the same SpidersData object is passed to all spiders

Now I have one instance of the data shared between the pipelines (and the same for the spiders). Is this correct? Should I apply the same OOP approaches in Python as in C++?

From what I understand, the approach you are proposing is to keep in each object a reference to a shared object that captures all of the shared data, and I think this is fine, provided you name it appropriately so the name suggests it's being shared, for readability.

Also, you're hiding the internals of the shared object and encapsulating them inside methods such as process_item(), which I think is important for maintainability (because changes in the internals of the shared object then don't have to affect any other object).
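To push that encapsulation one step further, the shared object itself can own the update logic, so callers never touch its attributes directly. This is an illustrative sketch, not code from the post; `PipelineData` and `record_price` are assumed names:

```python
class PipelineData:
    """Shared data object that hides its internal representation."""

    def __init__(self):
        self._max_price = 0

    def record_price(self, price):
        # Callers only report prices; how the maximum is tracked
        # can change here without affecting any pipeline.
        if price > self._max_price:
            self._max_price = price

    @property
    def max_price(self):
        return self._max_price


shared = PipelineData()
shared.record_price(120)
shared.record_price(80)
# shared.max_price is now 120
```

With this layout, a pipeline's process_item() would call `shared.record_price(item['price'])` instead of reading and writing a `max_price` attribute itself.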

But I'm not sure about the way you are bootstrapping (i.e. initializing) the shared object. You have these two lines

if _data is None:
    _data = ...

which are a little surprising. I didn't quite understand where _data is defined: pipeline_data is assigned None and never assigned anything else, so I'm not sure what you meant there.

If possible, I would prefer to see a function called create_spiders() which creates the shared object first and then creates the different spiders one by one, giving each of them a reference to the shared object. That makes the logic clear.
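A minimal sketch of that bootstrapping idea, under some assumptions: create_spiders() is the function the answer proposes, while MyPipelineData, SpidersData, and the plain Spider class below are simplified stand-ins for the question's classes (a real spider would subclass scrapy.Spider):

```python
class MyPipelineData:
    def __init__(self):
        self.max_price = 0

class SpidersData:
    def __init__(self, total_requests, pipeline_data):
        self.total_requests = total_requests
        self.pipeline_data = pipeline_data  # shared between pipelines

class Spider:
    def __init__(self, name, spider_data):
        self.name = name
        self.data = spider_data  # reference to the shared object

def create_spiders(names):
    """Create the shared object once, then hand the same reference
    to every spider."""
    shared = SpidersData(total_requests=0, pipeline_data=MyPipelineData())
    return [Spider(name, shared) for name in names]

spiders = create_spiders(['prices', 'dates'])
# Every spider holds the same shared object:
assert spiders[0].data is spiders[1].data
```

Because the shared object is built in exactly one place, there is no need for the lazy `if _data is None` check inside process_item().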


However, in the special case where you want the shared object to be a singleton, consider making it a static object in a module named appropriately, maybe globals.py. Then inside the spider code you would see things like

import globals

class SpiderData:
    def update(self):
        self.data.price = 200
        globals.spiders_data_collector.process(self.data)

Inside the module globals you would initialize the object spiders_data_collector. I think this requires less code, which matters for maintainability.
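The globals module might look roughly like this. This is a sketch under assumptions: spiders_data_collector and its process() method are only names taken from the answer's snippet, the SpidersDataCollector class is invented for illustration, and the single file below stands in for a separate globals.py:

```python
# Contents of a hypothetical globals.py

class SpidersDataCollector:
    """Collects data items reported by all spiders."""

    def __init__(self):
        self.items = []

    def process(self, data):
        self.items.append(data)

# Module-level singleton: Python caches imported modules, so every
# `import globals` sees this same instance.
spiders_data_collector = SpidersDataCollector()

# Example usage (in real code this would happen in the spiders):
spiders_data_collector.process({'price': 200})
spiders_data_collector.process({'price': 150})
```

This works because a module's top-level code runs only once per process; all importers share the one collector instance.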

