Just For Coding

Keep learning, keep living …

PowerDNS中PacketCache实现

PowerDNS中DNS解析由各类Backend模块处理。如果解析相关的数据存储于MySQL, Postgres等数据库,Backend需要在这些数据库中查询相应的记录是否存在。查询数据库的性能很低。因而PowerDNS中实现了PacketCache来提高性能。PowerDNS接收到请求后,先在PacketCache中查询是否已经有相应的DNS响应。如果有则直接返回该缓存。否则交由backend处理,处理后再添加到PacketCache中。

PacketCache实现主要位于packetcache.hh和packetcache.cc中。

common_startup.cc中定义了一些全局对象,其中包括一个PacketCache对象PC,PowerDNS中所有线程会共享这个对象。

1
PacketCache PC; //!< This is the main PacketCache, shared accross all threads

来看PacketCache的构造函数,它初始化了一个读写锁和一些变量,接着添加了3个统计变量:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
PacketCache::PacketCache()
{
  pthread_rwlock_init(&d_mut, 0);
  // d_ops = 0;

  d_ttl=-1;
  d_recursivettl=-1;

  S.declare("packetcache-hit");
  S.declare("packetcache-miss");
  S.declare("packetcache-size");

  d_statnumhit=S.getPointer("packetcache-hit");
  d_statnummiss=S.getPointer("packetcache-miss");
  d_statnumentries=S.getPointer("packetcache-size");
}

PowerDNS有多个线程负责接收请求。线程的简化逻辑如下:

1
2
3
4
5
6
7
8
9
10
11
12
  for(;;) {
    if(!(P=N->receive(&question))) { // receive a packet         inline
      continue;                    // packet was broken, try again
    }

    if(P->couldBeCached() && PC.get(P, &cached)) { // short circuit - does the PacketCache recognize this question?
      NS->send(&cached);   // answer it then
      continue;
    }

    distributor->question(P, &sendout);
  }

首先调用NameServer对象的receiver方法接收请求并解析到DNSPacket对象。接着调用DNSPacket::couldBeCached()判断请求是否可以被缓存, 可以看到PowerDNS只缓存Class为IN(Internet)的DNS请求。

1
2
3
4
bool DNSPacket::couldBeCached()
{
  return d_ednsping.empty() && !d_wantsnsid && qclass==QClass::IN;
}

如果请求可以被缓存,则调用PC.get方法查询PacketCache中是否有相应缓存。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
int PacketCache::get(DNSPacket *p, DNSPacket *cached)
{
  extern StatBag S;

  if(d_ttl<0)
    getTTLS();

  if(!((++d_ops) % 300000)) {
    cleanup();
  }

  ...

  if(ntohs(p->d.qdcount)!=1) // we get confused by packets with more than one question
    return 0;

  string value;
  bool haveSomething;
  {
    TryReadLock l(&d_mut); // take a readlock here
    if(!l.gotIt()) {
      S.inc("deferred-cache-lookup");
      return 0;
    }

    uint16_t maxReplyLen = p->d_tcp ? 0xffff : p->getMaxReplyLen();
    haveSomething=getEntryLocked(p->qdomain, p->qtype, PacketCache::PACKETCACHE, value, -1, packetMeritsRecursion, maxReplyLen, p->d_dnssecOk, p->hasEDNS());
  }
  if(haveSomething) {
    (*d_statnumhit)++;
    if(cached->noparse(value.c_str(), value.size()) < 0) {
      return 0;
    }
    cached->spoofQuestion(p); // for correct case
    return 1;
  }

  //  cerr<<"Packet cache miss for '"<<p->qdomain<<"', merits: "<<packetMeritsRecursion<<endl;
  (*d_statnummiss)++;
  return 0; // bummer
}

get方法在第一次被调用时会调用getTTLS方法获取配置,”cache-ttl”为缓存有效时间。

1
2
3
4
5
6
7
void PacketCache::getTTLS()
{
  d_ttl=::arg().asNum("cache-ttl");
  d_recursivettl=::arg().asNum("recursive-cache-ttl");

  d_doRecursion=::arg().mustDo("recursor");
}

每进行300000次查询操作(PacketCache::get和PacketCache::getEntry)时,get方法会调用一次cleanup函数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
void PacketCache::cleanup()
{
  WriteLock l(&d_mut);

  *d_statnumentries=d_map.size();

  unsigned int maxCached=::arg().asNum("max-cache-entries");
  unsigned int toTrim=0;

  unsigned int cacheSize=*d_statnumentries;

  if(maxCached && cacheSize > maxCached) {
    toTrim = cacheSize - maxCached;
  }

  unsigned int lookAt=0;
  // two modes - if toTrim is 0, just look through 10%  of the cache and nuke everything that is expired
  // otherwise, scan first 5*toTrim records, and stop once we've nuked enough
  if(toTrim)
    lookAt=5*toTrim;
  else
    lookAt=cacheSize/10;

  //  cerr<<"cacheSize: "<<cacheSize<<", lookAt: "<<lookAt<<", toTrim: "<<toTrim<<endl;
  time_t now=time(0);

  DLOG(L<<"Starting cache clean"<<endl);
  if(d_map.empty())
    return; // clean

  typedef cmap_t::nth_index<1>::type sequence_t;
  sequence_t& sidx=d_map.get<1>();
  unsigned int erased=0, lookedAt=0;
  for(sequence_t::iterator i=sidx.begin(); i != sidx.end(); lookedAt++) {
    if(i->ttd < now) {
      sidx.erase(i++);
      erased++;
    }
    else
      ++i;

    if(toTrim && erased > toTrim)
      break;

    if(lookedAt > lookAt)
      break;
  }
  //  cerr<<"erased: "<<erased<<endl;
  *d_statnumentries=d_map.size();
  DLOG(L<<"Done with cache clean"<<endl);
}

cleanup函数首先获取”max-cache-entries”选项。如果配置了该选项,并且缓存数目已经超过该选项的值,则应从缓存的map结构中清除一部分过期缓存项。在清除时,搜寻个数为应清除个数的5倍。如果当前缓存个数没有超过“max-cache-entries”, 则搜寻总个数的1/10。

个人感觉,这样实现并不太好,会导致这次请求处理时间过长。可以创建一个独立的线程周期性地清除缓存map中的过期项。

之后,get函数调用getEntryLocked来查找缓存中是否有该请求对应响应结果的缓存,通过map::find实现。

1
2
3
4
5
6
7
8
9
10
11
12
bool PacketCache::getEntryLocked(const string &qname, const QType& qtype, CacheEntryType cet, string& value, int zoneID, bool meritsRecursion,
  unsigned int maxReplyLen, bool dnssecOK, bool hasEDNS)
{
  uint16_t qt = qtype.getCode();
  cmap_t::const_iterator i=d_map.find(tie(qname, qt, cet, zoneID, meritsRecursion, maxReplyLen, dnssecOK, hasEDNS));
  time_t now=time(0);
  bool ret=(i!=d_map.end() && i->ttd > now);
  if(ret)
    value = i->value;

  return ret;
}

如果从缓存中找到了响应数据包,则调用DNSPacket::noparse方法将结果保存到需要发回的响应包的相应结构中。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
int DNSPacket::noparse(const char *mesg, int length)
{
  d_rawpacket.assign(mesg,length);
  if(length < 12) {
    L << Logger::Warning << "Ignoring packet: too short from "
      << getRemote() << endl;
    return -1;
  }
  d_wantsnsid=false;
  d_ednsping.clear();
  d_maxreplylen=512;
  memcpy((void *)&d,(const void *)d_rawpacket.c_str(),12);
  return 0;
}

接着调用DNSPacket::spoofQuestion访求将请求包的请求域名部分复制到响应包的请求域名部分。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
void DNSPacket::spoofQuestion(const DNSPacket *qd)
{
  d_wrapped=true; // if we do this, don't later on wrapup

  int labellen;
  string::size_type i=sizeof(d);

  for(;;) {
    labellen = qd->d_rawpacket[i];
    if(!labellen) break;
    i++;
    d_rawpacket.replace(i, labellen, qd->d_rawpacket, i, labellen);
    i = i + labellen;
  }
}

个人感觉,请求域名部分的复制操作没有必要,缓存中的响应包的域名应该与本次请求包中的域名是相同的。

调用PC.get获取响应后,修改DNS头部的相应标志位及ID后,将响应发送出去。

1
2
3
4
5
cached.d.rd=P->d.rd; // copy in recursion desired bit
cached.d.id=P->d.id;
cached.commitD();

N->send(&cached);

下面来看添加缓存的过程。

如果没有找到缓存,PowerDNS交由distributor处理DNS请求。

1
distributor->question(P, &sendout); // otherwise, give to the distributor

这会调用PacketHandler::question函数。question函数会又会调用questionOrRecurse函数。questionOrRecurse函数在一系列检查后,对于CLASS为IN的请求,调用getAuth判断该DNS服务器是否是请求域名的授权服务器。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
bool PacketHandler::getAuth(DNSPacket *p, SOAData *sd, const string &target, int *zoneId)
{
  bool found=false;
  string subdomain(target);
  do {
    if( B.getSOA( subdomain, *sd, p ) ) {
      sd->qname = subdomain;
      if(zoneId)
        *zoneId = sd->domain_id;

      if(p->qtype.getCode() == QType::DS && pdns_iequals(subdomain, target)) {
        // Found authoritative zone but look for parent zone with 'DS' record.
        found=true;
      } else
        return true;
    }
  }
  while( chopOff( subdomain ) );   // 'www.powerdns.org' -> 'powerdns.org' -> 'org' -> ''
  return found;
}

getAuth函数对域名或其子域调用Backend的getSOA函数获得SOA记录,找到则返回。比如,请求解析”www.foo.com”,首先查找”www.foo.com”是否存在SOA记录。如果没有,接着查找”foo.com”是否存在SOA记录。直到查找部分为”“。如果没有找到SOA记录,返回某种错误的响应。接下来,questionOrRecurse函数以QTYPE::ANY为参数调用Backend的lookup函数,并依次调用B.get获取找到的记录。

1
2
3
4
5
6
7
8
B.lookup(QType(QType::ANY), target, p, sd.domain_id);
rrset.clear();
weDone = weRedirected = weHaveUnauth = 0;

while(B.get(rr)) {
  ...
  rrset.push_back(rr);
}

如果没有找到任何记录,则尝试泛解析,处理过程不详述。

1
2
3
4
5
if(rrset.empty()) {
  if(tryWildcard(p, r, sd, target, wildcard, wereRetargeted, nodata)) {
    ...
  }
}

如果有CNAME记录,则重新针对CNAME目标域名进行上述逻辑。如果找到符合请求的记录,则添加到响应里发送出去。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
if(weRedirected) {
  BOOST_FOREACH(rr, rrset) {
    if(rr.qtype.getCode() == QType::CNAME) {
      r->addRecord(rr);
      target = rr.content;
      retargetcount++;
      goto retargeted;
    }
  }
}
else if(weDone) {
  bool haveRecords = false;
  BOOST_FOREACH(rr, rrset) {
    if((p->qtype.getCode() == QType::ANY || rr.qtype == p->qtype) && rr.qtype.getCode() && rr.auth) {
      r->addRecord(rr);
      haveRecords = true;
    }
  }

  if (haveRecords) {
    if(p->qtype.getCode() == QType::ANY)
      completeANYRecords(p, r, sd, target);
  }
  else
    makeNOError(p, r, rr.qname, "", sd, 0);

  goto sendit;
}

在发送前,会调用PC.insert将响应添加进PacketCache,实现通过map::insert。

1
PC.insert(p, r, r->getMinTTL()); // in the packet cache