    monthly_zeros = [
        ('daily.%d' % d, 0) for d in range(1, 32)]

    # Perform upserts, setting metadata
    db.stats.daily.update(
        {
            '_id': id_daily,
            'metadata': daily_metadata
        },
        {'$inc': dict(daily_zeros)},
        upsert=True)
    db.stats.monthly.update(
        {
            '_id': id_monthly,
            'daily': daily
        },
        {'$inc': dict(monthly_zeros)},
        upsert=True)

This function pre-allocates both the monthly and daily documents at the same time. The performance benefits from separating these operations are negligible, so it's reasonable to keep both operations in the same function.
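
To make the shape of the '$inc' argument concrete, here is a minimal sketch that builds the monthly zero list and converts it to a dict, without touching a database. The name inc_doc is introduced here purely for illustration. Because '$inc' creates any field that does not yet exist and sets it to the increment value, incrementing by zero during an upsert leaves every listed field present with the value 0.

```python
# Build one (field, 0) pair for every possible day of the month (1..31).
monthly_zeros = [('daily.%d' % d, 0) for d in range(1, 32)]

# dict() turns the pairs into the document that would be passed to '$inc'.
# inc_doc is a hypothetical name used only in this sketch.
inc_doc = dict(monthly_zeros)

print(len(inc_doc))        # number of pre-allocated day fields
print(sorted(inc_doc)[:3]) # a few of the dotted field names
```

The dotted keys such as 'daily.1' address fields nested under the 'daily' subdocument, so the stored document ends up with one zeroed counter per day.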

The question now arises as to when to pre-allocate the documents. Obviously, for best performance, they need to be pre-allocated before they are used (although the upsert code will actually work correctly even if it executes against a document that already exists). While we could pre-allocate the documents all at once, this leads to poor performance during the pre-allocation time. A better solution is to pre-allocate the documents probabilistically each time we log a hit:

from random import random
from datetime import datetime, timedelta, time

# Example probability based on 500k hits per day per page
prob_preallocate = 1.0 / 500000

def log_hit(db, dt_utc, site, page):
    # With a small probability, pre-allocate tomorrow's documents
    if random() < prob_preallocate:
        preallocate(db, dt_utc + timedelta(days=1), site, page)
    # Update daily stats doc
    ...
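
As a rough check on the example probability, the sketch below works out what 1.0 / 500000 implies for a page that really does receive 500,000 hits in a day. It assumes each hit is an independent trial; the helper chance_of_preallocation and the variable hits_per_day are names made up for this illustration and do not appear in the original code.

```python
def chance_of_preallocation(prob, hits):
    """Probability that at least one of `hits` independent trials fires."""
    return 1.0 - (1.0 - prob) ** hits

prob_preallocate = 1.0 / 500000
hits_per_day = 500000  # assumed traffic, matching the comment in the example

# Expected number of preallocate() calls triggered in one day.
expected_calls = prob_preallocate * hits_per_day

# Chance that tomorrow's documents get pre-allocated at least once today.
p_at_least_one = chance_of_preallocation(prob_preallocate, hits_per_day)

print("expected calls per day: %.3f" % expected_calls)
print("chance of at least one: %.3f" % p_at_least_one)
```

Under these assumptions the expected number of calls is about one per day, but the chance that at least one call actually happens is only about 63 percent. On the remaining days the document would have to be created by the first logged hit, which works if the logging update is itself an upsert (an assumption, since that part of the code is elided here). A larger probability trades a few redundant pre-allocation calls for a smaller chance of missing a day.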