Amazon CloudFront: Looking Back, Looking Forward, Making Plans
Looking Back
In 2011 we added a total of seven edge locations to Amazon CloudFront and Route 53. We also added lots of new features, as I documented last year.
Looking Forward
Our newest edge locations are located in Milan, Italy and Osaka, Japan. This brings our total worldwide location count to 26 (see the CloudFront page for a complete list). Each new edge location helps lower latency and improves performance for your end users.
Making Plans
We have additional locations in the pipeline for 2012 and beyond. Our planning process takes a number of factors into account, including notes from our sales team and discussions on the Amazon CloudFront forum. We also collect latency measurements from a number of points around the globe to our current set of locations and correlate them with broadband Internet penetration and existing Amazon CloudFront usage in the area.
I would also like to invite you to participate in the Amazon CloudFront Edge Location Survey. We are very interested in your suggestions for additional locations. We'd also like to learn a bit more about the type of content that you deliver to your customers.
All Aboard
The CloudFront team is hiring. We need some Software Development Engineers, a Senior Systems Engineer, a Senior Software Development Manager, a Product Manager, and a Business Development Representative.
-- Jeff;
New Elastic MapReduce Features: Metrics, Updates, VPC, and Cluster Compute Support (Guest Post)
Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.
-- Jeff;
We're always excited when we can bring features to our customers that make it easier for them to derive value from their data, so it's been a fun month for the EMR team. Here is a sampling of the things we've been working on.
Free CloudWatch Metrics
Starting today, customers can view graphs of 23 job flow metrics within the EMR Console by selecting the Monitoring tab in the Job Flow Details page. These metrics are pushed to CloudWatch every five minutes at no cost to you and include information on:
- Job flow progress including metrics on the number of map and reduce tasks running and remaining in your job flow and the number of bytes read and written to S3 and HDFS.
- Job flow contention including metrics on HDFS utilization, map and reduce slots open, jobs running, and the ratio between map tasks remaining and map slots.
- Job flow health including metrics on whether your job flow is idle, if there are missing data blocks, and if there are any dead nodes.
Please watch this video to see how to view CloudWatch graphs in the EMR Console:
You can also learn more from the Viewing CloudWatch Metrics section of the EMR Developer Guide.
You can view the new metrics in the AWS Management Console:
Further, through the CloudWatch Console, API, or SDK you can set alarms to be notified via SNS if any of these metrics go outside of specified thresholds. For example, you can receive an email notification whenever a job flow is idle for more than 30 minutes, HDFS Utilization goes above 80%, or there are five times as many remaining map tasks as there are map slots, indicating that you may want to expand your cluster size.
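The three example conditions above boil down to simple threshold checks. Here is a minimal Python sketch of that logic. It does not call CloudWatch; the class, field, and alarm names are hypothetical and are not the actual EMR metric names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JobFlowSnapshot:
    """Hypothetical point-in-time readings for one job flow."""
    idle_minutes: float          # how long the job flow has been idle
    hdfs_utilization_pct: float  # percentage of HDFS capacity in use
    map_tasks_remaining: int
    map_slots: int               # total map slots in the cluster


def triggered_alarms(s: JobFlowSnapshot) -> List[str]:
    """Return the names of the example conditions that are breached."""
    alarms = []
    if s.idle_minutes > 30:
        alarms.append("idle")
    if s.hdfs_utilization_pct > 80:
        alarms.append("hdfs_utilization")
    # Five times as many remaining map tasks as map slots suggests
    # that the cluster may be undersized for the remaining work.
    if s.map_slots > 0 and s.map_tasks_remaining >= 5 * s.map_slots:
        alarms.append("map_backlog")
    return alarms


print(triggered_alarms(JobFlowSnapshot(5, 85.0, 400, 40)))
```

In a real setup each condition would be a separate CloudWatch alarm with its own SNS notification; the sketch only shows what each alarm is testing.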
Please watch this video to see how to set EMR alarms through the CloudWatch Console:
Hadoop 0.20.205, Pig 0.9.1, and AMI Versioning
EMR now supports running your job flows using Hadoop 0.20.205 and Pig 0.9.1. To simplify the upgrade process, we have also introduced the concept of AMI versions. You can now provide a specific AMI version to use at job flow launch or specify that you would like to use our "latest" AMI, ensuring that you are always using our most up-to-date features. The following AMI versions are now available:
- Version 2.0.x: Hadoop 0.20.205, Hive 0.7.1, Pig 0.9.1, Debian 6.0.2 (Squeeze)
- Version 1.0.x: Hadoop 0.18.3 and 0.20.2, Hive 0.5 and 0.7.1, Pig 0.3 and 0.6, Debian 5.0 (Lenny)
You can specify an AMI version when launching a job flow in the Ruby CLI using the --ami-version argument (note that you will have to download the latest version of the Ruby CLI):
$ ./elastic-mapreduce --create --alive --name "Test AMI Versioning" --ami-version latest --num-instances 5 --instance-type m1.small
Please visit the AMI Versioning section of the Elastic MapReduce Developer Guide for more information.
S3DistCp for Efficient Copy between S3 and HDFS
We have also made available S3DistCp, an extension of the open source Apache DistCp tool for distributed data copy, that has been optimized to work with Amazon S3. Using S3DistCp, you can efficiently copy large amounts of data between Amazon S3 and HDFS on your Amazon EMR job flow or copy files between Amazon S3 buckets. During data copy you can also optimize your files for Hadoop processing. This includes modifying compression schemes, concatenating small files, and creating partitions.
For example, you can load Amazon CloudFront logs from S3 into HDFS for processing while simultaneously modifying the compression format from Gzip (the Amazon CloudFront default) to LZO and combining all the logs for a given hour into a single file. As Hadoop jobs are more efficient processing a few, large, LZO-compressed files than processing many, small, Gzip-compressed files, this can improve performance significantly.
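To make the "combine all the logs for a given hour" idea concrete, here is a small Python sketch of the grouping step only. It is not S3DistCp and does not copy or recompress anything. The object keys are hypothetical, and the pattern assumes the hour appears in each file name as YYYY-MM-DD-HH.

```python
import re
from collections import defaultdict

# Assumption: each log key contains ".YYYY-MM-DD-HH." somewhere in its name.
HOUR_PATTERN = re.compile(r"\.(\d{4}-\d{2}-\d{2}-\d{2})\.")


def group_by_hour(keys):
    """Map each hour string to the list of log keys recorded in that hour."""
    groups = defaultdict(list)
    for key in keys:
        match = HOUR_PATTERN.search(key)
        if match:  # keys that do not match the pattern are left out
            groups[match.group(1)].append(key)
    return dict(groups)


# Hypothetical keys, for illustration only.
keys = [
    "logs/EXAMPLEDIST.2012-01-15-09.aaaa1111.gz",
    "logs/EXAMPLEDIST.2012-01-15-09.bbbb2222.gz",
    "logs/EXAMPLEDIST.2012-01-15-10.cccc3333.gz",
]
for hour, members in sorted(group_by_hour(keys).items()):
    print(hour, len(members))
```

Each group would become one output file, which is what turns many small inputs into a few large ones.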
Please see Distributed Copy Using S3DistCp in the Amazon Elastic MapReduce documentation for more details and code examples.
cc2.8xlarge Support
Amazon Elastic MapReduce also now supports the new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge). Like other Cluster Compute instances, cc2.8xlarge instances are optimized for high performance computing, giving customers very high CPU capabilities and the ability to launch instances within a high bandwidth, low latency, full bisection bandwidth network. cc2.8xlarge instances provide customers with more than 2.5 times the CPU performance of the first Cluster Compute instance (cc1.4xlarge), more memory, and more local storage at a very compelling cost. Please visit the Instance Types section of the Amazon Elastic MapReduce detail page for more details.
In addition, we are pleased to announce an 18% reduction in Amazon Elastic MapReduce pricing for cc1.4xlarge instances, dropping the total per hour cost to $1.57. Please visit the Amazon Elastic MapReduce Pricing Page for more details.
VPC Support
Finally, we are excited to announce support for running job flows in an Amazon Virtual Private Cloud (Amazon VPC), making it easier for customers to:
- Process sensitive data - Launching a job flow on Amazon VPC is similar to launching the job flow on a private network and provides additional tools, such as routing tables and Network ACLs, for defining who has access to the network. If you are processing sensitive data in your job flow, you may find these additional access control tools useful.
- Access resources on an internal network - If your data is located on a private network, it may be impractical or undesirable to regularly upload that data into AWS for import into Amazon Elastic MapReduce, either because of the volume of data or because of its sensitive nature. Now you can launch your job flow on an Amazon VPC and connect to your data center directly through a VPN connection.
You can launch Amazon Elastic MapReduce job flows into your VPC through the Ruby CLI by using the --subnet argument and specifying the subnet address (note that you will have to download the latest version of the Ruby CLI):
$ ./elastic-mapreduce --create --alive --subnet "subnet-identifier"
Please visit the Running Job Flows on an Amazon VPC section in the Elastic MapReduce Developer Guide for more information.
-- Adam Gray, Product Manager, Amazon Elastic MapReduce.
Amazon S3 Growth for 2011 - Now 762 Billion Objects
As of the end of 2011, there are 762 billion (762,000,000,000) objects in Amazon S3. We process over 500,000 requests per second for these objects at peak times.
Here's the annual growth chart:
This represents year-over-year growth of 192%; S3 grew faster last year than it did in any year since it launched in 2006.
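As a quick sanity check on that figure, the Python sketch below works backward from the two numbers in this post to the object count a year earlier. It assumes that year-over-year growth means (new - old) / old; the function name is just for illustration.

```python
def implied_prior_count(current: float, growth_pct: float) -> float:
    """Count one year earlier, given the current count and growth in percent."""
    return current / (1 + growth_pct / 100)


prior = implied_prior_count(762e9, 192)
print(f"{prior / 1e9:.0f} billion")  # 261 billion
```

In other words, 192% growth means the object count nearly tripled over the year.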
Where are all of these objects coming from? Although we definitely made it easier for you to delete objects using Multi-Object Deletion and Object Expiration, we also gave you plenty of ways to upload new objects using Multipart upload, AWS Direct Connect, and AWS Import/Export.
As you can imagine, building, running, and adding new features to a system as large and as complex as S3 is no simple task. Here are some of the open positions on the S3 team:
- Software Development Engineer
- Senior Software Development Engineer
- Systems Engineer
- Senior Product Manager
- Director
-- Jeff;
New AWS Premium Support Features: Third-Party Software Support and AWS Trusted Advisor
We have added two new benefits to the Gold and Platinum levels of AWS Premium Support. The following features are now in beta testing:
- We now offer third-party support for popular operating systems running on Amazon EC2. We also support a number of pieces of system software.
- The AWS Trusted Advisor monitors your use of AWS and recommends configuration changes and new services that may help save you money, improve system performance, and close security gaps.
Third-Party Support
If you have Gold or Platinum Premium Support, you can now ask questions related to a number of popular operating systems including Microsoft Windows, Ubuntu, Red Hat Linux, SuSE Linux, and the Amazon Linux AMI. You can ask us about system software including the Apache and IIS web servers, the Amazon SDKs, Sendmail, Postfix, and FTP. A team of AWS support engineers is ready to help with setup, configuration, and troubleshooting of these important infrastructure components.
We are also enabling the use of desktop sharing software, giving you the option to share your desktop with a support engineer as needed.
AWS Trusted Advisor
AWS Trusted Advisor draws upon best practices learned from AWS's aggregated operational history of serving hundreds of thousands of AWS customers. The AWS Trusted Advisor inspects your AWS environment and makes recommendations when opportunities exist to save money, improve system performance, or close security gaps. The initial release of the AWS Trusted Advisor includes eight separate checks; we'll be adding more throughout 2012.
The checks are grouped into three families: fault tolerance checks, security audits, and cost optimizations. Here is the initial set of eight checks performed by AWS Trusted Advisor:
- Security Group - Open Ports - This check inspects your security groups and classifies each open port into one of three categories: Green for ports used by common protocols such as SSH and HTTP, Red for ports used by protocols that don't usually need to be open on internet-facing servers (e.g. port 1433 for Microsoft SQL Server), and Yellow for all others.
- Security Group - CIDR Rules - This check inspects your security groups for rules that have errors which might allow more access than may be intended. Some people (me included) often confuse "/0" and "/32" addresses.
- Reserved Instance Recommendations - This check looks at your billing and instance utilization history and recommends optimizations that could be achieved by the purchase of Reserved Instances.
- Unused Elastic IP Addresses - Elastic IP Addresses that are not attached to an Amazon EC2 instance will be flagged since you pay for them if you don't use them.
- EBS Snapshots - This check looks for EBS volumes that don't have a snapshot, or which have only aged snapshots. The Red/Yellow/Green model is also used here: Red if there is no snapshot at all or if the most recent one is very old; Yellow if the most recent snapshot is somewhat old, and Green if the most recent snapshot is reasonably recent (we're still fine tuning the thresholds for these checks).
- Amazon EC2 Availability Zone Balance - This check identifies situations where Amazon EC2 instances are not evenly distributed across Availability Zones, or if (even worse) they are all in the same Availability Zone. The Red/Yellow/Green model is used to characterize the situation.
- Elastic Load Balancer Optimization - This check determines whether instance allocation across Availability Zones for each Load Balancer is balanced.
- Service Limits - This check gives you visibility into the per-account limits and usage of things like instances, Elastic IP addresses, and other resources (in almost every case, limits can be raised using the appropriate online form).
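The "/0" versus "/32" confusion mentioned in the CIDR Rules check is easy to see with Python's standard ipaddress module. This is only an illustration of why the mistake matters, not how AWS Trusted Advisor performs the check; the helper names and the sample address are made up for the example.

```python
import ipaddress


def addresses_allowed(cidr: str) -> int:
    """Number of addresses matched by a CIDR block."""
    return ipaddress.ip_network(cidr, strict=False).num_addresses


def is_open_to_world(cidr: str) -> bool:
    """True when the block matches every address (prefix length 0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


print(addresses_allowed("203.0.113.10/32"))  # 1
print(addresses_allowed("0.0.0.0/0"))        # 4294967296
print(is_open_to_world("203.0.113.10/0"))    # True
```

A rule meant for a single host (/32) that is typed as /0 opens the port to the entire IPv4 address space.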
AWS Trusted Advisor does not have access to customer data. Recommendations are made by analyzing information gathered using a constrained set of internal and documented AWS API calls.
Here's a diagram to show you how it works:
Advice from the AWS Trusted Advisor is made available in several different forms. For certain issues, we will proactively create support cases and notify you that a given check has identified an opportunity for improvement. The AWS Support Engineers are also available to review AWS Trusted Advisor recommendations any time you call in for support. In the future a regular scorecard report will be available, as will an AWS Trusted Advisor Console with support for viewing, running, customizing, and even opting out of certain checks as desired.
These new features are available for all Gold and Platinum customers. What do you think? Leave a comment and let me know.
-- Jeff;
New Tagging for Auto Scaling Groups
You can now add up to 10 tags to any of your Auto Scaling Groups. You can also, if you'd like, propagate the tags to the EC2 instances launched from your groups.
Adding tags to your Auto Scaling groups will make it easier for you to identify and distinguish them.
Each tag has a name, a value, and an optional propagation flag. If the flag is set, then the corresponding tag will be applied to EC2 instances launched from the group. You can use this feature to label or distinguish instances created by distinct Auto Scaling groups. You might be using multiple groups to support multiple scalable applications, or multiple scalable tiers or components of a single application. Either way, the tags can help you to keep your instances straight.
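Here is a small Python model of the tag structure described above. It is a sketch rather than the Auto Scaling API: the class and function names are hypothetical, and the rule that tag names must be unique within a group is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_TAGS_PER_GROUP = 10  # limit stated in this post


@dataclass(frozen=True)
class GroupTag:
    name: str
    value: str
    propagate_at_launch: bool = False  # the optional propagation flag


def validate_tags(tags: List[GroupTag]) -> None:
    """Reject tag sets that exceed the per-group limit or repeat a name."""
    if len(tags) > MAX_TAGS_PER_GROUP:
        raise ValueError(f"at most {MAX_TAGS_PER_GROUP} tags per group")
    names = [t.name for t in tags]
    if len(names) != len(set(names)):  # assumption: names are unique
        raise ValueError("tag names must be unique within a group")


def instance_tags(tags: List[GroupTag]) -> Dict[str, str]:
    """Tags that instances launched from the group would receive."""
    validate_tags(tags)
    return {t.name: t.value for t in tags if t.propagate_at_launch}


tags = [GroupTag("tier", "web", True), GroupTag("owner", "ops")]
print(instance_tags(tags))  # {'tier': 'web'}
```

Only the tag with the flag set shows up on the instances; the other stays on the group alone.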
Read more in the newest version of the Auto Scaling Developer Guide.
-- Jeff;
AWS HowTo: Using Amazon Elastic MapReduce with DynamoDB (Guest Post)
Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.
-- Jeff;
Apache Hadoop and NoSQL databases are complementary technologies that together provide a powerful toolbox for managing, analyzing, and monetizing Big Data. That's why we were so excited to provide out-of-the-box Amazon Elastic MapReduce (Amazon EMR) integration with Amazon DynamoDB, providing customers an integrated solution that eliminates the often prohibitive costs of administration, maintenance, and upfront hardware. Customers can now move vast amounts of data into and out of DynamoDB, as well as perform sophisticated analytics on that data, using EMR's highly parallelized environment to distribute the work across the number of servers of their choice. Further, as EMR uses a SQL-based engine for Hadoop called Hive, you need only know basic SQL while we handle distributed application complexities such as estimating ideal data splits based on hash keys, pushing appropriate filters down to DynamoDB, and distributing tasks across all the instances in your EMR cluster.
In this article, I'll demonstrate how EMR can be used to efficiently export DynamoDB tables to S3, import S3 data into DynamoDB, and perform sophisticated queries across tables stored in both DynamoDB and other storage services such as S3.
We will also use sample product order data stored in S3 to demonstrate how you can keep current data in DynamoDB while storing older, less frequently accessed data, in S3. By exporting your rarely used data to Amazon S3 you can reduce your storage costs while preserving low latency access required for high velocity data. Further, exported data in S3 is still directly queryable via EMR (and you can even join your exported tables with current DynamoDB tables).
The sample order data uses the schema below. This includes Order ID as its primary key, a Customer ID field, an Order Date stored as the number of seconds since epoch, and Total representing the total amount spent by the customer on that order. The data also has folder-based partitioning by both year and month, and you'll see why in a bit.
Creating a DynamoDB Table
Let's create a DynamoDB table for the month of January, 2012 named Orders-2012-01. We will specify Order ID as the Primary Key. By using a table for each month, it is much easier to export data and delete tables over time when they no longer require low latency access.
For this sample, a read capacity and a write capacity of 100 units should be more than sufficient. When setting these values you should keep in mind that the larger the EMR cluster, the more capacity it will be able to take advantage of. Further, you will be sharing this capacity with any other applications utilizing your DynamoDB table.
Launching an EMR Cluster
Please follow Steps 1-3 in the EMR for DynamoDB section of the Elastic MapReduce Developer Guide to launch an interactive EMR cluster and SSH to its Master Node to begin submitting SQL-based queries. Note that we recommend you use at least three instances of m1.large size for this sample.
At the hadoop command prompt for the current master node, type hive. You should see a hive prompt: hive>
As no other applications will be using our DynamoDB table, let's tell EMR to use 100% of the available read throughput (by default it will use 50%). Note that this can adversely affect the performance of other applications simultaneously using your DynamoDB table and should be set cautiously.
SET dynamodb.throughput.read.percent=1.0;
Creating Hive Tables
Outside data sources are referenced in your Hive cluster by creating an EXTERNAL TABLE. First let's create an EXTERNAL TABLE for the exported order data in S3. Note that this simply creates a reference to the data; no data is moved yet.
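CREATE EXTERNAL TABLE orders_s3_export ( order_id string, customer_id string, order_date bigint, total double )
PARTITIONED BY (year string, month string)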
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://elastic-mapreduce/samples/ddb-orders' ;
You can see that we specified the data location, the ordered data fields, and the folder-based partitioning scheme.
Now let's create an EXTERNAL TABLE for our DynamoDB table.
CREATE EXTERNAL TABLE orders_ddb_2012_01 ( order_id string, customer_id string, order_date bigint, total double )
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' TBLPROPERTIES (
"dynamodb.table.name" = "Orders-2012-01",
"dynamodb.column.mapping" = "order_id:Order ID,customer_id:Customer ID,order_date:Order Date,total:Total"
);
This is a bit more complex. We need to specify the DynamoDB table name, the DynamoDB storage handler, the ordered fields, and a mapping between the EXTERNAL TABLE fields (which can't include spaces) and the actual DynamoDB fields.
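To see what the column mapping string expresses, here is a short Python sketch that splits it into Hive column and DynamoDB attribute pairs. This only illustrates the format of the string; it is not how the storage handler parses it, and it assumes attribute names contain no commas. The function name is hypothetical.

```python
def parse_column_mapping(mapping: str) -> dict:
    """Turn 'hive_col:DynamoDB Attr,...' into {hive_col: 'DynamoDB Attr'}."""
    result = {}
    for pair in mapping.split(","):
        # Split on the first colon only, so the attribute name stays whole.
        hive_column, attribute = pair.split(":", 1)
        result[hive_column.strip()] = attribute.strip()
    return result


mapping = ("order_id:Order ID,customer_id:Customer ID,"
           "order_date:Order Date,total:Total")
for hive_column, attribute in parse_column_mapping(mapping).items():
    print(f"{hive_column} -> {attribute}")
```

The Hive side uses identifiers without spaces, while the DynamoDB side keeps the original attribute names such as "Order ID".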
Now we're ready to start moving some data!
Importing Data into DynamoDB
In order to access the data in our S3 EXTERNAL TABLE, we first need to specify which partitions we want in our working set via the ADD PARTITION command. Let's start with the data for January 2012.
Now if we query our S3 EXTERNAL TABLE, only this partition will be included in the results. Let's load all of the January 2012 order data into our external DynamoDB Table. Note that this may take several minutes.
INSERT OVERWRITE TABLE orders_ddb_2012_01
SELECT order_id, customer_id, order_date, total
FROM orders_s3_export ;
Looks a lot like standard SQL, doesn't it?
Querying Data in DynamoDB Using SQL
Now let's find the top 5 customers by spend over the first week of January. Note the use of unix_timestamp, as order_date is stored as the number of seconds since epoch.
SELECT customer_id, sum(total) spend
FROM orders_ddb_2012_01
WHERE order_date >= unix_timestamp('2012-01-01', 'yyyy-MM-dd')
AND order_date < unix_timestamp('2012-01-08', 'yyyy-MM-dd')
GROUP BY customer_id
ORDER BY spend desc
LIMIT 5 ;
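If you want to see the numbers that the two date bounds turn into, the Python sketch below computes them outside of Hive. It assumes midnight UTC; Hive's unix_timestamp interprets the date in the cluster's time zone, so the values on your cluster may differ by the UTC offset. The helper name is hypothetical.

```python
import calendar
import time


def utc_epoch(date_string: str, fmt: str = "%Y-%m-%d") -> int:
    """Seconds since the Unix epoch for midnight UTC on the given date."""
    return calendar.timegm(time.strptime(date_string, fmt))


start = utc_epoch("2012-01-01")
end = utc_epoch("2012-01-08")
print(start, end, end - start)
```

The difference between the two bounds is exactly seven days of seconds, which is the window the WHERE clause selects.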
Querying Exported Data in S3
It looks like customer "c-2cC5fF1bB" was the biggest spender for that week. Now let's query our historical data in S3 to see what that customer spent in each of the final 6 months of 2011. First, though, we have to include the additional data in our working set. The RECOVER PARTITIONS command makes it easy to
We will now query the 2011 exported data for customer "c-2cC5fF1bB" from S3. Note that the partition fields, both month and year, can be used in your Hive query.
SELECT year, month, customer_id, sum(total) spend, count(*) order_count
FROM orders_s3_export
WHERE customer_id = 'c-2cC5fF1bB'
AND month >= 6
AND year = 2011
GROUP BY customer_id, year, month
ORDER by month desc;
Exporting Data to S3
Now let's export the January 2012 DynamoDB table data to a different S3 bucket owned by you (denoted by YOUR BUCKET in the command). We'll first need to create an EXTERNAL TABLE for that S3 bucket. Note that we again partition the data by year and month.
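CREATE EXTERNAL TABLE orders_s3_new_export ( order_id string, customer_id string, order_date bigint, total double )
PARTITIONED BY (year string, month string)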
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://YOUR BUCKET';
Now export the data from DynamoDB to S3, specifying the appropriate partition values for that table's month and year.
INSERT OVERWRITE TABLE orders_s3_new_export
PARTITION (year='2012', month='01')
SELECT * from orders_ddb_2012_01;
Note that if this were the end of a month and you no longer needed low latency access to that table's data, you could also delete the table in DynamoDB. You may also now want to terminate your job flow from the EMR console to ensure you do not continue being charged.
That's it for now. Please visit our documentation for more examples, including how to specify the format and compression scheme for your exported files.
-- Adam Gray, Product Manager, Amazon Elastic MapReduce.
The AWS Storage Gateway - Integrate Your Existing On-Premises Applications with AWS Cloud Storage
Warning: If you don't have a data center, or if all of your IT infrastructure is already in the cloud, you may not need to read this post! But feel free to pass it along to your friends and colleagues.
The Storage Gateway
Our new AWS Storage Gateway service connects an on-premises software appliance with cloud-based storage to integrate your existing on-premises applications with the AWS storage infrastructure in a seamless, secure, and transparent fashion. Watch this video for an introduction:
Data stored in your current data center can be backed up to Amazon S3, where it is stored as Amazon EBS snapshots. Once there, you will benefit from S3's low cost and intrinsic redundancy. In the event you need to retrieve a backup of your data, you can easily restore these snapshots locally to your on-premises hardware. You can also access them as Amazon EBS volumes, enabling you to easily mirror data between your on-premises and Amazon EC2-based applications.
You can install the AWS Storage Gateway's software appliance on a host machine in your data center. Here's how all of the pieces fit together:
The AWS Storage Gateway allows you to create storage volumes and attach these volumes as iSCSI devices to your on-premises application servers. The volumes can be Gateway-Stored (right now) or Gateway-Cached (soon) volumes. Gateway-Stored volumes retain a complete copy of the volume on the local storage attached to the on-premises host, while uploading backup snapshots to Amazon S3. This provides low-latency access to your entire data set while providing durable off-site backups. Gateway-Cached volumes will use the local storage as a cache for frequently-accessed data; the definitive copy of the data will live in the cloud. This will allow you to offload your storage to Amazon S3 while preserving low-latency access to your active data.
Gateways can connect to AWS directly or through a local proxy. You can connect through AWS Direct Connect if you would like, and you can also control the amount of inbound and outbound bandwidth consumed by each gateway. All data is compressed prior to upload.
Each gateway can support up to 12 volumes and a total of 12 TB of storage. You can have multiple gateways per account and you can choose to store data in our US East (Northern Virginia), US West (Northern California), US West (Oregon), EU (Ireland), Asia Pacific (Singapore), or Asia Pacific (Tokyo) Regions.
The first release of the AWS Storage Gateway takes the form of a VM image for VMware ESXi 4.1 (we plan on supporting other virtual environments in the future). Adequate local disk storage, either Direct Attached or SAN (Storage Area Network), is needed for your application storage (used by your iSCSI storage volumes) and working storage (data queued up for writing to AWS). We currently support mounting of our iSCSI storage volumes using the Microsoft Windows and Red Hat iSCSI Initiators.
Up and Running
During the installation and configuration process you will be able to create up to 12 iSCSI storage volumes per gateway. Once installed, each gateway will automatically download, install, and deploy updates and patches. This activity takes place during a maintenance window that you can set on a per-gateway basis.
The AWS Management Console includes complete support for the AWS Storage Gateway. You can create volumes, create and restore snapshots, and establish a schedule for snapshots. Snapshots can be scheduled at 1, 2, 4, 8, 12, or 24 hour intervals. Each gateway reports a number of metrics to Amazon CloudWatch for monitoring.
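The per-gateway limits and the snapshot intervals mentioned in this post can be captured in a small planning check. The Python sketch below is only a convenience for reasoning about a planned layout; it does not talk to the service, and the function name and messages are hypothetical.

```python
from typing import List

MAX_VOLUMES_PER_GATEWAY = 12
MAX_TOTAL_TB_PER_GATEWAY = 12
SNAPSHOT_INTERVALS_HOURS = (1, 2, 4, 8, 12, 24)


def check_gateway_plan(volume_sizes_tb: List[float],
                       snapshot_every_hours: int) -> List[str]:
    """Return a list of problems with a planned gateway (empty if none)."""
    problems = []
    if len(volume_sizes_tb) > MAX_VOLUMES_PER_GATEWAY:
        problems.append("too many volumes")
    if sum(volume_sizes_tb) > MAX_TOTAL_TB_PER_GATEWAY:
        problems.append("total storage exceeds limit")
    if snapshot_every_hours not in SNAPSHOT_INTERVALS_HOURS:
        problems.append("unsupported snapshot interval")
    return problems


print(check_gateway_plan([1, 1, 2], 8))  # []
print(check_gateway_plan([4, 4, 6], 6))
```

The second plan fails twice: 14 TB is over the 12 TB total, and 6 hours is not one of the available snapshot intervals.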
The snapshots are stored as Amazon EBS (Elastic Block Store) snapshots. You can create an EBS volume using a snapshot of one of your local gateway volumes, or vice versa. Does this give you any interesting ideas?
The Gateway in Action
I expect the AWS Storage Gateway will be put to use in all sorts of ways. Some that come to mind are:
- Disaster Recovery and Business Continuity - You can reduce your investment in hardware set aside for Disaster Recovery using a cloud-based approach. You can send snapshots of your precious data to the cloud on a regular and frequent basis and you can use our VM Import service to move your virtual machine images to the cloud.
- Backup - You can back up local data to the cloud without worrying about running out of storage space. It is easy to schedule the backups, and you don't have to arrange to ship tapes off-site or manage your own infrastructure in a second data center.
- Data Migration - You can now move data from your data center to the cloud, and back, with ease.
Security Considerations
We believe that the AWS Storage Gateway will be at home in the enterprise, so I'll cover the inevitable security questions up front. Here are the facts:
- Data traveling between AWS and each gateway is protected via SSL.
- Data at rest (stored in Amazon S3) is encrypted using AES-256.
- The iSCSI initiator authenticates itself to the target using CHAP (Challenge-Handshake Authentication protocol).
Costs
All AWS users are eligible for a free trial of the AWS Storage Gateway. After that, there is a charge of $125 per month for each activated gateway. The usual EBS snapshot storage rates apply ($0.14 per Gigabyte-month in the US-East Region), as do the usual AWS prices for outbound data transfer (there's no charge for inbound data transfer). More pricing information can be found on the Storage Gateway Home Page. If you are eligible for the AWS Free Usage Tier, you get up to 1 GB of free EBS snapshot storage per month as well as 15 GB of outbound data transfer.
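For a rough monthly estimate, you can combine the two rates quoted above. The Python sketch below assumes the US-East snapshot rate and deliberately leaves out outbound data transfer, the free trial, and the Free Usage Tier; the function name is hypothetical.

```python
from decimal import Decimal

GATEWAY_FEE_PER_MONTH = Decimal("125")        # per activated gateway
SNAPSHOT_RATE_PER_GB_MONTH = Decimal("0.14")  # US-East rate quoted above


def monthly_estimate(gateways: int, snapshot_gb: int) -> Decimal:
    """Gateway fees plus snapshot storage; excludes transfer and free tier."""
    return (gateways * GATEWAY_FEE_PER_MONTH
            + snapshot_gb * SNAPSHOT_RATE_PER_GB_MONTH)


print(monthly_estimate(1, 500))  # 195.00
```

Decimal is used instead of float so that the dollar amounts come out exact.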
On the Horizon
As I mentioned earlier, the first release of the AWS Storage Gateway supports Gateway-Stored volumes. We plan to add support for Gateway-Cached volumes in the coming months.
We'll add more features to our roadmap as soon as our users (this means you) start to use the AWS Storage Gateway and send feedback our way.
Learn More
You can visit the Storage Gateway Home Page or read the Storage Gateway User Guide to learn more.
We will be hosting a Storage Gateway webinar on Thursday, February 23rd. Please attend if you would like to learn more about the Storage Gateway and how it can be used for backup, disaster recovery, and data mirroring scenarios. The webinar is free and open to all, but space is limited and you need to register!
-- Jeff;
Launch Relational Database Service Instances in the Virtual Private Cloud
You can now launch Amazon Relational Database Service (RDS) DB instances inside of a Virtual Private Cloud (VPC).
Some Background
The Relational Database Service takes care of all of the messiness associated with running a relational database. You don't have to worry about finding and configuring hardware, installing an operating system or a database engine, setting up backups, arranging for fault detection and failover, or scaling compute or storage as your needs change.
The Virtual Private Cloud lets you create a private, isolated section of the AWS Cloud. You have complete control over IP address ranges, subnetting, routing tables, and network gateways to your own data center and to the Internet.
Here We Go
Before you launch an RDS DB Instance inside of a VPC, you must first create the VPC and partition its IP address range into the desired subnets. You can do this using the VPC wizard pictured above, the VPC command line tools, or the VPC APIs.
Then you need to create a DB Subnet Group. The Subnet Group should have at least one subnet in each Availability Zone of the target Region; it identifies the subnets (and the corresponding IP address ranges) where you would like to be able to run DB Instances within the VPC. This will allow a Multi-AZ deployment of RDS to create a new standby in another Availability Zone should the need arise. You need to do this even for Single-AZ deployments, just in case you want to convert them to Multi-AZ at some point.
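The "at least one subnet in each Availability Zone" rule is a simple set comparison. The Python sketch below shows the idea; it does not call the RDS or VPC APIs, and the function name, zone names, and subnet IDs are all hypothetical.

```python
from typing import Dict, Iterable, Set


def uncovered_zones(region_zones: Iterable[str],
                    subnet_zone: Dict[str, str]) -> Set[str]:
    """Availability Zones in the Region that have no subnet in the group.

    subnet_zone maps each subnet ID in the group to its Availability Zone.
    """
    return set(region_zones) - set(subnet_zone.values())


# Hypothetical zones and subnets, for illustration only.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
group = {"subnet-aaaa": "us-east-1a", "subnet-bbbb": "us-east-1b"}
print(sorted(uncovered_zones(zones, group)))  # ['us-east-1c']
```

An empty result means the group covers every zone, so a Multi-AZ standby could be placed wherever it is needed.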
You can create a DB Security Group, or you can use the default. The DB Security Group gives you control over access to your DB Instances; you can allow access from EC2 instances with specific EC2 Security Group or VPC Security Groups membership, or from designated ranges of IP addresses. You can also use VPC subnets and the associated network Access Control Lists (ACLs) if you'd like. You have a lot of control and a lot of flexibility.
The next step is to launch a DB Instance within the VPC while referencing the DB Subnet Group and a DB Security Group. With this release, you are able to use the MySQL DB engine (we plan to add additional options over time). The DB Instance will have an Elastic Network Interface using an IP address selected from your DB Subnet Group. You can use the IP address to reach the instance if you'd like, but we recommend that you use the instance's DNS name instead since the IP address can change during failover of a Multi-AZ deployment.
Upgrading to VPC
If you are running an RDS DB Instance outside of a VPC, you can snapshot the DB Instance and then restore the snapshot into the DB Subnet Group of your choice. You cannot, however, access or use snapshots taken from within a VPC outside of the VPC. This is a restriction that we have put into place for security reasons.
Use Cases and Access Options
You can put this new combination (RDS + VPC) to use in a variety of ways. Here are some suggestions:
- Private DB Instances Within a VPC - This is the most obvious and straightforward use case, and is a perfect way to run corporate applications that are not intended to be accessed from the Internet.
- Public-facing Web Application with Private Database - Host the web site on a public-facing subnet and the DB Instances on a private subnet that has no Internet access. The application server and the RDS DB Instances will not have public IP addresses.
Your Turn
You can launch RDS instances in your VPCs today in all of the AWS Regions except AWS GovCloud (US). What are you waiting for?
-- Jeff;
AWS Toolkits for Eclipse and Visual Studio Now Support DynamoDB
The AWS Toolkit for Eclipse and the AWS Toolkit for Visual Studio now support Amazon DynamoDB. You can create tables, insert and edit data, initiate table scans, and more.
Here are some screen shots from the AWS Toolkit for Visual Studio.
Create a table:
Edit a multi-valued attribute:
Set up a table scan:
The AWS Toolkit for Visual Studio also contains the latest and greatest version of the AWS SDK for .NET. This version of the SDK includes support for Amazon DynamoDB, in the form of the Amazon.DynamoDB.DocumentModel and Amazon.DynamoDB.TableModel classes and namespaces. More information about the updates to the SDK can be found in the release notes.
Similarly, the AWS Toolkit for Eclipse contains the latest and greatest version of the AWS SDK for Java. This SDK also includes support for Amazon DynamoDB. Per the release notes, you can use the AmazonDynamoDBClient object to send requests directly to Amazon DynamoDB, or you can use the high-level API in the AWS SDK for Java to annotate your Java objects and automatically map them into Amazon DynamoDB.
-- Jeff;
Identity Federation to the AWS Management Console
In August, we announced that AWS Identity and Access Management (IAM) added support for Identity Federation. This enabled customers to use their existing identities (e.g. users) to securely access AWS APIs and resources using IAM's fine-grained access controls, without the need to create an IAM user for each identity.
Today we are announcing that we have extended IAM's Identity Federation functionality to also enable federated users to access the AWS Management Console. This allows you to enable your employees to sign in once to your corporate directory, and then use the AWS Management Console without having to sign in to AWS, providing single sign-on access to AWS.
In my previous post on the topic of Identity Federation, I discussed how you could set up an identity broker, which calls our Security Token Service (STS), requesting temporary security credentials to provide your users access to AWS. You explicitly specify the permissions that these temporary credentials give your users, and you control how long (1 to 36 hours) the credentials remain valid. These same temporary security credentials can now also be used to access the AWS Management Console.
Here's the basic flow:
- User signs in to the enterprise network with their enterprise credentials.
- User browses to an internal site and clicks on Sign in to AWS Management Console.
- Page calls identity broker. Identity broker validates access rights and provides temporary security credentials that include the user's permissions to access AWS. The page includes these temporary security credentials as part of the sign-in request to AWS.
- User is logged in to the AWS Management Console with the appropriate IAM policy.
If you have already built an identity broker, perhaps using our sample application, to enable Identity Federation to AWS service APIs for users in your enterprise directory, you're already most of the way there. All you need to do is implement an internal web page with redirect links to the AWS Management Console, and include the temporary security credentials as part of the sign-in request. Below is a simple Ruby code sample that shows how to do just that (just replace the highlighted items with your own identifiers and URLs):
require 'rubygems'
require 'json'
require 'open-uri'
require 'cgi'
require 'aws-sdk'

# The temporary credentials will normally come from your identity
# broker, but for simplicity we create them in place
sts = AWS::STS.new(:access_key_id => "*** Your AWS Access Key ID ***",
                   :secret_access_key => "*** Your AWS Secret Access Key ***")

# A sample policy for accessing SNS in the console.
policy = AWS::STS::Policy.new
policy.allow(:actions => "sns:*", :resources => :any)
session = sts.new_federated_session(
  "UserName",
  :policy => policy,
  :duration => 3600)

# The issuer parameter specifies your internal sign-in
# page, for example https://mysignin.internal.mycompany.com/.
# The console parameter specifies the URL to the destination tab of the
# AWS Management Console. This example goes to the sns console.
# The signin parameter is the URL to send the request to.
issuer_url = "https://mysignin.internal.mycompany.com/"
console_url = "https://console.aws.amazon.com/sns"
signin_url = "https://signin.aws.amazon.com/federation"

# Create the signin token using temporary credentials,
# including the Access Key ID, Secret Access Key, and security token.
session_json = {
  :sessionId => session.credentials[:access_key_id],
  :sessionKey => session.credentials[:secret_access_key],
  :sessionToken => session.credentials[:session_token]
}.to_json
get_signin_token_url = signin_url + "?Action=getSigninToken&SessionType=json&Session=" + CGI.escape(session_json)
returned_content = URI.parse(get_signin_token_url).read
signin_token = JSON.parse(returned_content)['SigninToken']
signin_token_param = "&SigninToken=" + CGI.escape(signin_token)

# The issuer parameter is optional, but recommended. Use it to direct users
# to your sign-in page when their session expires.
issuer_param = "&Issuer=" + CGI.escape(issuer_url)
destination_param = "&Destination=" + CGI.escape(console_url)
login_url = signin_url + "?Action=login" + signin_token_param + issuer_param + destination_param
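The sample passes :duration in seconds, while the valid lifetime is described earlier in hours (1 to 36). A small helper like the following could guard against out-of-range values before the call is made. This is a sketch only; the helper name federation_duration_seconds is hypothetical and is not part of the AWS SDK.

```ruby
# Allowed lifetime for federated credentials, per the 1 to 36 hour range.
MIN_HOURS = 1
MAX_HOURS = 36

# Converts a lifetime in hours to the seconds value used for :duration,
# raising if the request falls outside the 1 to 36 hour window.
def federation_duration_seconds(hours)
  unless hours.between?(MIN_HOURS, MAX_HOURS)
    raise ArgumentError, "duration must be #{MIN_HOURS} to #{MAX_HOURS} hours, got #{hours}"
  end
  (hours * 3600).to_i
end

puts federation_duration_seconds(1)   # 3600, the value used in the sample
puts federation_duration_seconds(12)  # 43200
```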
You can control the user name displayed in the upper right corner of the AWS Management Console when your user logs in. You can also optionally provide an "Issuer" URL when signing your users in. This URL will then be displayed to the user when their credentials expire, so they can re-authenticate with your identity system before continuing to use the AWS Console.
The following services support Identity Federation to the AWS Management Console today: Amazon EC2, Amazon S3, Amazon SNS, Amazon SQS, Amazon VPC, Amazon CloudFront, Amazon Route 53, Amazon CloudWatch, Amazon RDS, Amazon ElastiCache, Amazon SES, Elastic Load Balancing, and IAM. We'll of course be adding support for additional service consoles over time (the busy Amazon DynamoDB team is already working on it!).
-- Jeff;
Guest Post: Geo-Blocking Content With Amazon CloudFront
Today's guest blogger is Nihar Bihani, a Product Manager on the Amazon CloudFront team.
-- Jeff;
After we launched Amazon CloudFront in November 2008, customers began asking for a way to block access to their content being delivered. We heard a variety of reasons why customers wanted to have detailed control over who is able to download their files from Amazon CloudFront. Some of the more common use cases we heard included customers wanting the ability to block content delivered by Amazon CloudFront so they could sell digital goods only to paying customers on their website, deliver training materials only to their employees and offer secure video streaming for their pay-per-view or subscription access model. We listened to their feedback and we launched Amazon CloudFront's private content feature in late 2009 for download content and in early 2010 for streaming content. These features help customers protect their content by restricting access based on date ranges, IP addresses, and IP address ranges.
More recently, we heard Amazon CloudFront customers ask for another method of blocking access to their content based on the geographic location of their viewers. One use case is a video publisher who may only have rights to distribute video to users in a single country and needs a way to prevent users who aren't in that country from accessing their video. Another is a software delivery company that needs to limit the downloading of their content to certain territories because of licensing terms that prevent users in certain countries from downloading their software. We'll refer to blocking access to certain countries or territories as geo-restriction.
As a result of this customer feedback, we recently published a tutorial that shows how to add geo-restriction logic to your web application using Amazon CloudFront's private content feature in combination with a third-party geo-location product. The geo-location product translates your end user's client IP address into an estimate of the end user's location. The tutorial shows you how to consume this location data and issue an Amazon CloudFront private content URL based on the results. We've included sample code in Java, .NET, and PHP that works with two different geo-location products.
Here's how it works:
- End user requests a webpage on your site.
- Your web server sends the end user's IP address to a geo-location service.
- Geo-location service returns the geographic location for the end user.
- Your web server determines if the end user should have access to your content on Amazon CloudFront. If so, your web server generates an Amazon CloudFront signed URL.
- End user browser requests the content from Amazon CloudFront using the signed URL.
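The decision step in this flow can be sketched in a few lines of Ruby. This is a sketch under stated assumptions, not the tutorial's code: the allowed-country list, the content_response helper, and both callables are hypothetical. The locate callable stands in for the third-party geo-location service and the sign callable stands in for whatever code produces your CloudFront signed URL.

```ruby
# Hypothetical set of country codes allowed to receive the content.
ALLOWED_COUNTRIES = %w[US CA].freeze

# Decides what to hand back to the end user.
#   locate: callable that maps an IP address to a country code
#           (stands in for the third-party geo-location service)
#   sign:   callable that turns a resource URL into a signed URL
#           (stands in for your CloudFront private content signer)
def content_response(client_ip, resource_url, locate:, sign:)
  country = locate.call(client_ip)
  if ALLOWED_COUNTRIES.include?(country)
    { allowed: true, url: sign.call(resource_url) }
  else
    { allowed: false,
      message: "Sorry, this content is not available in your region (#{country || 'unknown'})." }
  end
end

# Stubs used only to exercise the flow; neither performs a real lookup
# or produces a real CloudFront signature.
fake_locate = ->(ip) { { '203.0.113.10' => 'US', '198.51.100.20' => 'FR' }[ip] }
fake_sign   = ->(url) { "#{url}?Expires=0&Signature=PLACEHOLDER&Key-Pair-Id=PLACEHOLDER" }

p content_response('203.0.113.10', 'https://example.com/video.mp4',
                   locate: fake_locate, sign: fake_sign)
p content_response('198.51.100.20', 'https://example.com/video.mp4',
                   locate: fake_locate, sign: fake_sign)
```

Because the application makes the decision, the blocked branch can return a friendly, location-specific message rather than an error code.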
Using Amazon CloudFront and a third-party geo-location service to restrict access to your content from your application also provides you with control over your end user's experience if they are restricted from access. For end users whose access is blocked, your application can display a meaningful message instead of returning an error code. You can also customize the error message you display for your end users according to their location.
You can find the tutorial here. Please take a look and let us know what you think.
Nihar Bihani
Product Manager - Amazon CloudFront
Amazon DynamoDB - Internet-Scale Data Storage the NoSQL Way
We want to make it very easy for you to be able to store any amount of semistructured data and to be able to read, write, and modify it quickly, efficiently, and with predictable performance. We don't want you to have to worry about servers, disks, replication, failover, monitoring, software installation, configuration, or updating, hardware upgrades, network bandwidth, free space, sharding, rearchitecting, or a host of other things that will jump up and bite you at the worst possible time.
We want you to think big, to dream big dreams, and to envision (and then build) data-intensive applications that can scale from zero users up to tens or hundreds of millions of users before you know it. We want you to succeed, and we don't want your database to get in the way. Focus on your app and on building a user base, and leave the driving to us.
Sound good?
Hello, DynamoDB
Today we are introducing Amazon DynamoDB, our Internet-scale NoSQL database service. Built from the ground up to be efficient, scalable, and highly reliable, DynamoDB will let you store as much data as you want and access it as often as you'd like, with predictable performance brought on by the use of Solid State Disk, better known as SSD.
DynamoDB works on the basis of provisioned throughput. When you create a DynamoDB table, you simply tell us how much read and write throughput you need. Behind the scenes we'll set things up so that we can meet your needs, while maintaining latency that's in the single-digit milliseconds. Later, if your needs change, you can simply turn the provisioned throughput dial up (or down) and we'll adjust accordingly. You can do this online, with no downtime and with no impact on the overall throughput. In other words, you can scale up even when your database is handling requests.
We've made DynamoDB ridiculously easy to use. Newly created tables will usually be ready to use within a minute or two. Once the table is ready, you simply start storing data (as much as you want) into it, paying only for the storage that you use (there's no need to pre-provision storage). Again, behind the scenes, we'll take care of provisioning adequate storage for you.
Each table must have a primary index. In this release, you can choose between two types of primary keys: Simple Hash Keys and Composite Hash Keys with Range Keys.
- Simple Hash Keys give DynamoDB the Distributed Hash Table abstraction and are used to index on a unique key. The key is hashed over multiple processing and storage partitions to optimally distribute the workload.
- Composite Hash Keys with Range Keys give you the ability to create a primary key that is composed of two attributes -- a hash attribute and a range attribute. When you query against this type of key, the hash attribute must be uniquely matched but a range (low to high) can be specified for the range attribute. You can use this to run queries such as "all orders from Jeff in the last 24 hours."
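The two key types can be illustrated with a toy in-memory table, written here in Ruby. It is only a sketch of the access pattern (exact match on the hash attribute, low-to-high bounds on the range attribute) and says nothing about how DynamoDB is implemented. The ToyTable class, its partition count, and the sample data are all hypothetical.

```ruby
require 'zlib'

# A toy in-memory table with a composite primary key. It only mimics
# the access pattern; it is not how DynamoDB is implemented.
class ToyTable
  def initialize(partition_count = 4)
    # Each partition maps hash key => { range key => item }.
    @partitions = Array.new(partition_count) { Hash.new { |h, k| h[k] = {} } }
  end

  def put(hash_key, range_key, item)
    partition_for(hash_key)[hash_key][range_key] = item
  end

  # Exact match on the hash key, low..high bounds on the range key.
  def query(hash_key, low, high)
    rows = partition_for(hash_key).fetch(hash_key, {})
    rows.select { |range_key, _| range_key.between?(low, high) }
        .sort_by { |range_key, _| range_key }
        .map { |_, item| item }
  end

  private

  # Hash the key to pick a partition, spreading keys across partitions.
  def partition_for(hash_key)
    @partitions[Zlib.crc32(hash_key.to_s) % @partitions.size]
  end
end

now = 1_326_900_000  # fixed example timestamp (seconds since epoch)
orders = ToyTable.new
orders.put('Jeff', now - 90_000, 'order-1')  # 25 hours ago
orders.put('Jeff', now - 3_600,  'order-2')  # 1 hour ago
orders.put('Adam', now - 60,     'order-3')

# "All orders from Jeff in the last 24 hours"
p orders.query('Jeff', now - 86_400, now)  # ["order-2"]
```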
Each item in a DynamoDB table consists of a set of key/value pairs. Each value can be a string, a number, a string set, or a number set. When you choose to retrieve (get) an item, you can choose between a strongly consistent read and an eventually consistent read based on your needs. The eventually consistent reads consume half as many resources, so there's a throughput consideration to think about.
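The throughput consideration is simple arithmetic. The Ruby sketch below applies only the rule stated above, that an eventually consistent read consumes half as much as a strongly consistent one. It ignores item size and any other factor, and the helper name read_throughput_needed is hypothetical.

```ruby
# Read throughput to provision for a target request rate, using the
# rule that an eventually consistent read costs half as much as a
# strongly consistent one. Item size is ignored in this sketch.
def read_throughput_needed(reads_per_second, consistent: true)
  consistent ? reads_per_second : (reads_per_second / 2.0).ceil
end

puts read_throughput_needed(20, consistent: true)   # 20
puts read_throughput_needed(20, consistent: false)  # 10
puts read_throughput_needed(5,  consistent: false)  # 3 (rounded up)
```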
Sounds great, you say, but what about reliability and data durability? Don't worry, we've got that covered too! When you create a DynamoDB table in a particular region, we'll synchronously replicate your data across servers in multiple zones. You'll never know about (or be affected by) hardware or facility failures. If something breaks, we'll get the data from another server.
I can't stress the operational performance of DynamoDB enough. You can start small (say 5 reads per second) and scale up to 50, 500, 5000, or even 50,000 reads per second. Again, online, and with no changes to your code. And (of course) you can do the same for writes. DynamoDB will grow with you, and it is not going to get between you and success.
As part of the AWS Free Usage Tier, you get 100 MB of free storage, 5 writes per second, and 10 strongly consistent reads per second (or 20 eventually consistent reads per second). Beyond that, pricing is based on how much throughput you provision and how much data you store. As is always the case with AWS, there's no charge for bandwidth between an EC2 instance and a DynamoDB table in the same Region.
You can create up to 256 tables, each provisioned for 10,000 reads and 10,000 writes per second. I cannot emphasize the next point strongly enough: We are ready, willing, and able to increase any of these values; simply click here and provide us with some additional information. Our early customers have, in several cases, already exceeded the default limits by an order of magnitude!
DynamoDB from the AWS Management Console
The AWS Management Console has a new DynamoDB tab. You can create a new table, provision the throughput, set up the index, and configure CloudWatch alarms with a few clicks:
You can enter your throughput requirements manually:
Or you can use the calculator embedded in the dialog:
You can easily set CloudWatch alarms that will fire when you are consuming more than a specified percentage of the throughput that you have provisioned for the table:
You can use the CloudWatch metrics to see when it is time to add additional read or write throughput:
You can easily increase or decrease the provisioned throughput:
Programming With DynamoDB
The AWS SDKs have been updated and now include complete support for DynamoDB. Here are some examples that I put together using the AWS SDK for PHP.
The first step is to include the SDK and create a reference object:
require_once("sdk.class.php");
$DDB = new AmazonDynamoDB(array('credentials' => 'production'));
Creating a table requires three arguments: a table name, a key specification, and a throughput specification:
// Create a table
$Schema = array('HashKeyElement' =>
                array('AttributeName' => 'RecordId',
                      'AttributeType' => AmazonDynamoDB::TYPE_STRING));
$Throughput = array('ReadsPerSecond' => 5, 'WritesPerSecond' => 5);
$Res = $DDB->create_table(array('TableName' => 'Sample',
                                'KeySchema' => $Schema,
                                'ProvisionedThroughput' => $Throughput));
After create_table returns, the table's status will be CREATING. It will transition to ACTIVE when the table is provisioned and ready to accept data. You can use the describe_table function to get the status and other information about the table:
$Res = $DDB->describe_table(array('TableName' => 'Sample'));
print_r($Res->body->Table);
Here's the result as a PHP object:
CFSimpleXML Object
(
    [CreationDateTime] => 1324673829.32
    [ItemCount] => 0
    [KeySchema] => CFSimpleXML Object
        (
            [HashKeyElement] => CFSimpleXML Object
                (
                    [AttributeName] => RecordId
                    [AttributeType] => S
                )
        )
    [ProvisionedThroughput] => CFSimpleXML Object
        (
            [ReadsPerSecond] => 5
            [WritesPerSecond] => 5
        )
    [TableName] => Sample
    [TableSizeBytes] => 0
    [TableStatus] => ACTIVE
)
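Because a new table starts out as CREATING, code that writes right after creating a table usually needs to wait for ACTIVE. Here is a minimal polling sketch, written in Ruby rather than PHP. The fetch_status callable is a hypothetical stand-in for a describe_table call that returns the TableStatus string, and the attempt count and delay are arbitrary defaults.

```ruby
# Polls until the table reports ACTIVE or the attempts run out.
#   fetch_status: callable returning the current status string
#                 (hypothetical stand-in for a describe_table call)
def wait_until_active(fetch_status, max_attempts: 30, delay: 2)
  max_attempts.times do |attempt|
    return true if fetch_status.call == 'ACTIVE'
    # Pause between polls, but not after the final attempt.
    sleep(delay) if attempt < max_attempts - 1
  end
  false
end

# Stub that reports CREATING twice and then ACTIVE.
statuses = %w[CREATING CREATING ACTIVE]
stub = -> { statuses.shift || 'ACTIVE' }
puts wait_until_active(stub, delay: 0)  # true
```

Bounding the number of attempts keeps a stuck table from hanging the caller forever; the false return lets the caller decide what to do next.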
It is really easy to insert new items. You need to specify the data type of each attribute value; here's how you do that (the other data type constants are TYPE_ARRAY_OF_STRINGS and TYPE_ARRAY_OF_NUMBERS):
for ($i = 1; $i < 100; $i++){
print($i);
$Item = array('RecordId' => array(AmazonDynamoDB::TYPE_STRING => (string) $i),
'Square' => array(AmazonDynamoDB::TYPE_NUMBER => (string) ($i * $i)));
$Res = $DDB->put_item(array('TableName' => 'Sample', 'Item' => $Item));
}
Retrieval by the RecordId key is equally easy:
for ($i = 1; $i < 100; $i++){
  $Key = array('HashKeyElement' => array(AmazonDynamoDB::TYPE_STRING => (string) $i));
  // Use the same table name that was passed to create_table
  $Item = $DDB->get_item(array('TableName' => 'Sample',
                               'Key' => $Key));
  print_r($Item->body->Item);
}
Each returned item looks like this as a PHP object:
CFSimpleXML Object
(
    [RecordId] => CFSimpleXML Object
        (
            [S] => 44
        )
    [Square] => CFSimpleXML Object
        (
            [N] => 1936
        )
)
The DynamoDB API also includes query and scan functions. The query function queries primary key attribute values and supports the use of comparison operators. The scan function scans the entire table with optional filtering of the results of the scan. Queries are generally more efficient than scans.
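The efficiency difference comes down to how many items each operation has to examine. The toy Ruby sketch below counts them. The data, the TABLE constant, and both helper functions are hypothetical and only mimic the behavior described here; they do not use the DynamoDB API.

```ruby
# Toy data set: hash key => { range key => value }. Made up for illustration.
TABLE = {
  'Jeff' => { 1 => 'a', 2 => 'b', 3 => 'c' },
  'Adam' => { 1 => 'd', 2 => 'e' }
}.freeze

# Query: jump to one hash key, then compare range keys only within it.
# Returns [matching values, number of items examined].
def query(table, hash_key, range)
  rows = table.fetch(hash_key, {})
  [rows.select { |range_key, _| range.cover?(range_key) }.values, rows.size]
end

# Scan: visit every item in the table, then apply the filter block.
# Returns [matching values, number of items examined].
def scan(table)
  examined = 0
  matches = []
  table.each do |hash_key, rows|
    rows.each do |range_key, value|
      examined += 1
      matches << value if yield(hash_key, range_key, value)
    end
  end
  [matches, examined]
end

p query(TABLE, 'Adam', 1..1)                            # [["d"], 2]
p scan(TABLE) { |hk, rk, _| hk == 'Adam' && rk == 1 }   # [["d"], 5]
```

Both calls return the same item, but the scan touches every item in the table to find it.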
You can also update items, retrieve multiple items, delete items, or delete multiple items. DynamoDB includes conditional updates (to ensure that some other write hasn't occurred within a read/modify/write operation) as well as atomic increment and decrement operations. Read more in the Amazon DynamoDB Developer Guide.
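To show what a conditional update and an atomic counter buy you, here is a toy single-process store in Ruby. It is a sketch of the semantics only: the ToyItemStore class and its method names are hypothetical, and a Mutex stands in for the atomicity that the service provides.

```ruby
# Toy single-process store that mimics two behaviors: a conditional
# write and an atomic counter. A Mutex stands in for the atomicity the
# service provides; this is not DynamoDB's implementation.
class ToyItemStore
  def initialize
    @items = {}
    @lock = Mutex.new
  end

  def get(key)
    @lock.synchronize { @items[key] }
  end

  # Writes new_value only if the stored value still equals expected.
  # Returns true on success, false if another write got there first.
  def put_if(key, expected, new_value)
    @lock.synchronize do
      return false unless @items[key] == expected
      @items[key] = new_value
      true
    end
  end

  # Adds delta to a numeric value in one step and returns the result.
  def add(key, delta)
    @lock.synchronize { @items[key] = (@items[key] || 0) + delta }
  end
end

store = ToyItemStore.new
store.put_if('status', nil, 'pending')       # succeeds: nothing stored yet
p store.put_if('status', 'pending', 'paid')  # true
p store.put_if('status', 'pending', 'void')  # false: value is now 'paid'
p store.add('views', 1)                      # 1
p store.add('views', -1)                     # 0
```

The failed put_if is the useful case: a writer that read 'pending' earlier finds out that its view is stale instead of silently overwriting the newer value.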
And there you have it, our first big release of 2012. I would enjoy hearing more about how you plan to put DynamoDB to use in your application. Please feel free to leave a comment on the blog.
-- Jeff;
AWS Free Usage Tier now Includes Microsoft Windows on EC2
The AWS Free Usage Tier now allows you to run Microsoft Windows Server 2008 R2 on an EC2 t1.micro instance for up to 750 hours per month. This benefit is open to new AWS customers and to those who are already participating in the Free Usage Tier, and is available in all AWS Regions with the exception of GovCloud. This is an easy way for Windows users to start learning about and enjoying the benefits of cloud computing with AWS.
The micro instances provide a small amount of consistent processing power and the ability to burst to a higher level of usage from time to time. You can use this instance to learn about Amazon EC2, support a development and test environment, build an AWS application, or host a web site (or all of the above). We've fine-tuned the micro instances to make them even better at running Microsoft Windows Server.
You can launch your instance from the AWS Management Console:
We have lots of helpful resources to get you started:
- An updated (and even more helpful) Amazon EC2 Microsoft Windows Guide.
- Getting Started Guide: Web Application Hosting for Microsoft Windows.
- The Getting Started Guide includes a new section on Deploying a WordPress Blog.
- Our Windows and .NET Developer Center.
- A brand new AWS Microsite, with a focus on running Windows on Amazon EC2.
- Additional documentation on the AWS free usage tier, including eligibility information and some tips for making the most of it.
Along with 750 instance hours of Windows Server 2008 R2 per month, the Free Usage Tier also provides another 750 instance hours to run Linux (also on a t1.micro), Elastic Load Balancer time and bandwidth, Elastic Block Storage, Amazon S3 Storage, and SimpleDB storage, a bunch of Simple Queue Service and Simple Notification Service requests, and some CloudWatch metrics and alarms (see the AWS Free Usage Tier page for details). We've also boosted the amount of EBS storage space offered in the Free Usage Tier to 30GB, and we've doubled the I/O requests in the Free Usage Tier, to 2 million.
I look forward to hearing more about your experience with this new offering. Please feel free to leave a comment!
-- Jeff;
PS - If you want to learn more about what's next in the AWS Cloud, please sign up for our live event.
AWS Direct Connect - Now Available in Four Additional Locations
AWS Direct Connect lets you create a dedicated network connection between your office, data center, or colocation facility and an AWS Region. You might want to do this for privacy, to reduce your network costs, or to get a more consistent network experience than is possible across the Internet.
We launched AWS Direct Connect in US East (Northern Virginia) this past summer and we expanded it to Silicon Valley shortly thereafter.
Today we are making Direct Connect available in four more locations. Here's the complete list of Regions and the associated data centers:
- US East (Northern Virginia): Equinix Northern Virginia.
- US West (Northern California): Equinix San Jose, CoreSite One Wilshire Los Angeles (New).
- EU West (Ireland): TelecityGroup Docklands, London (New).
- Asia Pacific (Singapore): Equinix Singapore (New).
- Asia Pacific (Tokyo): Equinix Tokyo (New).
Two of the locations listed above are not in the same city as the associated AWS Region. These locations provide you with additional flexibility when connecting to AWS from those cities.
You can initiate the Direct Connect provisioning process by simply filling out a form:
-- Jeff;
Additional Reserved Instance Options for Amazon RDS
Hot on the heels of our announcement of Additional Reserved Instance Options for Amazon EC2, I would like to tell you about a similar option for the Amazon Relational Database Service.
We have added Light and Heavy Utilization Reserved Instances for the MySQL and Oracle database engines. You can save 30% to 55% of your On-Demand DB Instance costs, depending on your usage.
Light Utilization Reserved Instances offer the lowest upfront payment, and are ideal for DB instances that are used sporadically for development and testing, or for short-term projects. You can save up to 30% on a 1-year term and 35% on a 3-year term when compared to the same instance on an On-Demand basis.
Medium Utilization Reserved Instances have a higher upfront payment than Light Utilization Reserved Instances, but a much lower hourly usage fee. They are suitable for workloads that run most of the time, with some variability in usage. Savings range up to 35% for a 1-year term and 48% for a 3-year term when compared to On-Demand. These are the same Reserved Instances that we have offered since August 2010.
Heavy Utilization Reserved Instances are the best value for steady-state production database instances that are destined to be running 24x7. With this type of Reserved Instance you pay an upfront fee and a low hourly rate for every hour of the one or three year term. You can save 41% for a 1-year term and 55% for a 3-year term.
These Reserved Instance offerings allow you to optimize your costs depending on your workload. The table below shows which Amazon RDS offerings you can use to lower your RDS costs. For example, if you need a DB instance for 5 months, a Light Utilization Reserved Instance will provide you the lowest effective cost.
Offering              1-Year Term     3-Year Term
On-Demand             1-3 Months      1-4 Months
Light Utilization     4-8 Months      5-12 Months
Medium Utilization    9-10 Months     13-29 Months
Heavy Utilization     11-12 Months    30-36 Months
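The table translates directly into a lookup. The Ruby sketch below encodes the month ranges exactly as listed and returns the offering for a given expected usage. The helper name cheapest_offering is hypothetical, and the code makes no pricing calculation of its own.

```ruby
# Usage bands from the table, in months of expected use.
OFFERING_BANDS = {
  1 => { 'On-Demand' => 1..3, 'Light Utilization' => 4..8,
         'Medium Utilization' => 9..10, 'Heavy Utilization' => 11..12 },
  3 => { 'On-Demand' => 1..4, 'Light Utilization' => 5..12,
         'Medium Utilization' => 13..29, 'Heavy Utilization' => 30..36 }
}.freeze

# Returns the offering listed for the given expected months of use
# within a 1-year or 3-year term.
def cheapest_offering(months, term_years)
  bands = OFFERING_BANDS.fetch(term_years) do
    raise ArgumentError, "term must be 1 or 3 years, got #{term_years}"
  end
  match = bands.find { |_, range| range.cover?(months) }
  raise ArgumentError, "#{months} months is outside a #{term_years}-year term" unless match
  match.first
end

puts cheapest_offering(5, 1)   # Light Utilization
puts cheapest_offering(5, 3)   # Light Utilization
puts cheapest_offering(30, 3)  # Heavy Utilization
```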
Learn more about this feature and other RDS pricing options on the Amazon RDS pricing page.
As always, we enjoy lowering our prices so that AWS becomes an even better value for you.
-- Jeff;