/user/CompanyWebsiteA/sitelogs
/user/CompanyWebsiteA/useractivity
/user/CompanyWebsiteB/sitelogs
/user/CompanyWebsiteB/useractivity
By structuring the folders in this manner, you can easily implement security
for each folder at the website level, to prevent unauthorized access.
Conversely, if security were not a critical element, you might choose to
reverse the order and store the format of the data first. This would make it
easier to know what each folder contains and to do processing that spans all
websites.
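The two orderings can be compared side by side with a small sketch. The helper names below are hypothetical, and the exact shape of the reversed layout (`/user/<format>/<website>`) is an assumption based on the description above, not a layout given in the text.

```python
from posixpath import join

def site_first(site: str, data_format: str) -> str:
    """Layout shown in the text: /user/<website>/<format>."""
    return join("/user", site, data_format)

def format_first(site: str, data_format: str) -> str:
    """Assumed reversed layout: /user/<format>/<website>."""
    return join("/user", data_format, site)

sites = ["CompanyWebsiteA", "CompanyWebsiteB"]
formats = ["sitelogs", "useractivity"]

# Website first: all data for one site sits under one folder,
# which suits per-site security.
site_first_paths = [site_first(s, f) for s in sites for f in formats]

# Format first: all data of one kind sits under one folder,
# which suits processing that spans every website.
format_first_paths = [format_first(s, f) for f in formats for s in sites]

for path in site_first_paths + format_first_paths:
    print(path)
```

With the format-first ordering, a job that reads every site log would need only the single `/user/sitelogs` folder as input.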
NOTE
ASV doesn't support a directory hierarchy. Instead, you have a
container, and it stores key/value pairs for the data. However, ASV does
allow the forward slash (/) to be used inside a key name (for example,
CompanyWebsiteA/sitelogs/sitelog1.txt). By using the
forward slash, the key keeps the appearance of a folder-based structure.
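The idea of a flat key/value container that only looks hierarchical can be illustrated with a plain dictionary. This is a toy model, not the ASV API: the `container` dict and the `list_folder` helper are hypothetical, and every key other than the one quoted in the note is made up for the example.

```python
# A flat container modeled as a dict: each key is just a string,
# and the slashes inside it carry no special meaning to the store.
container = {
    "CompanyWebsiteA/sitelogs/sitelog1.txt": b"...",
    "CompanyWebsiteA/sitelogs/sitelog2.txt": b"...",      # hypothetical key
    "CompanyWebsiteA/useractivity/day1.txt": b"...",      # hypothetical key
    "CompanyWebsiteB/sitelogs/sitelog1.txt": b"...",      # hypothetical key
}

def list_folder(store: dict, prefix: str) -> list:
    """Return the keys that appear to live under the given 'folder'.

    There is no real directory to open; the listing is a prefix match
    over the key names.
    """
    if not prefix.endswith("/"):
        prefix += "/"
    return sorted(key for key in store if key.startswith(prefix))

site_a_logs = list_folder(container, "CompanyWebsiteA/sitelogs")
print(site_a_logs)
```

Because the "folder" exists only as a shared key prefix, it disappears as soon as no key starts with that prefix.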
You can easily modify the folder structures by using the dfs -cp and -mv
commands. This means that if a particular folder structure isn't working out,
you can try new ones.
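As a rough sketch of such a restructuring, the following builds the move commands that would switch from the website-first layout to a format-first one. It only prints the commands and does not run them. The `hadoop fs` prefix is an assumption about the local installation (the text names only the -cp and -mv options), the `move_commands` helper is hypothetical, and each destination's parent folder is assumed to exist already.

```python
def move_commands(sites, formats):
    """Build one -mv command per (website, format) folder.

    Source:      /user/<website>/<format>
    Destination: /user/<format>/<website>   (assumed target layout)
    """
    commands = []
    for site in sites:
        for data_format in formats:
            src = f"/user/{site}/{data_format}"
            dst = f"/user/{data_format}/{site}"
            # 'hadoop fs' is an assumed command prefix; adjust for your cluster.
            commands.append(["hadoop", "fs", "-mv", src, dst])
    return commands

commands = move_commands(
    ["CompanyWebsiteA", "CompanyWebsiteB"],
    ["sitelogs", "useractivity"],
)

# Print the commands for review instead of executing them.
for command in commands:
    print(" ".join(command))
```

Replacing `-mv` with `-cp` in the sketch would leave the original folders in place, which makes it possible to try a new structure before committing to it.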
Rebalancing Data
Generally, HDFS manages the placement of data across nodes very well.
As discussed previously, it attempts to balance the placement of data to
ensure a combination of reliability and performance. However, as more data
is added to a Hadoop cluster, it is normal to add more nodes to it. This can
lead to the cluster being out of balance; that is, some nodes have more data
and, therefore, more activity than other nodes. It can also lead to certain
nodes having most of the more recently added data, which can create some
issues, because newly added data is often more heavily accessed.
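One way to picture "out of balance" is to compare each node's disk utilization with the cluster-wide average. The sketch below does that for made-up numbers. The node names, sizes, the `out_of_balance` helper, and its `threshold_pct` parameter are all hypothetical, and the code illustrates the concept only; it is not how HDFS itself decides where to move blocks.

```python
def utilization(used: float, capacity: float) -> float:
    """Disk utilization as a percentage of capacity."""
    return 100.0 * used / capacity

def out_of_balance(nodes: dict, threshold_pct: float) -> dict:
    """Return nodes whose utilization differs from the cluster average
    by more than threshold_pct percentage points.

    nodes maps a node name to a (used, capacity) pair in the same unit.
    The result maps each flagged node to its signed deviation.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = utilization(total_used, total_capacity)

    flagged = {}
    for name, (used, capacity) in nodes.items():
        deviation = utilization(used, capacity) - cluster_pct
        if abs(deviation) > threshold_pct:
            flagged[name] = round(deviation, 1)
    return flagged

# Hypothetical cluster: two older, fuller nodes and one newly added node (GB).
nodes = {
    "node1": (800, 1000),
    "node2": (750, 1000),
    "node3": (100, 1000),
}

flagged = out_of_balance(nodes, threshold_pct=10.0)
print(flagged)
```

In this example the cluster average is 55 percent, so the two older nodes sit well above it and the new node well below it; a looser threshold would flag fewer nodes.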
You can rebalance the cluster by using the balancer tool, which is simple to
use. It takes one optional parameter, which defines a threshold of disk usage