Documents I've found helpful are:
Some highlights:
One of the simplest optimization tips to limit memory usage is to
  use the appropriate type of display object. For simple shapes that are
  not interactive, use Shape objects. For interactive objects that don’t
  need a timeline, use Sprite objects. For animation that uses a
  timeline, use MovieClip objects.
getSize() returns the size in memory of a specified object.
All primitive types except String use 4 to 8 bytes in memory. A
  Number, which represents a 64-bit value, is allocated 8 bytes by the
  ActionScript Virtual Machine (AVM) if it is not assigned a value.
  The behavior differs for the String type. Benchmark your code to
  determine the most efficient object for the task.
Optimize memory by reusing objects and avoid recreating them whenever
  possible.
Reusing objects reduces the need to instantiate objects, which can be
  expensive. It also reduces the chances of the garbage collector
  running, which can slow down your application.
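A rough sketch of what object reuse can look like in practice. SpritePool is a hypothetical helper class I made up for illustration, not part of the Flash API:

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical minimal pool: hands out Sprites and takes them back
    // instead of creating a new instance for every request.
    public final class SpritePool
    {
        private static var pool:Vector.<Sprite> = new Vector.<Sprite>();

        public static function acquire():Sprite
        {
            // Reuse a parked instance when one exists; allocate only when the pool is empty.
            return pool.length > 0 ? pool.pop() : new Sprite();
        }

        public static function release(sprite:Sprite):void
        {
            // Detach from the display list before parking the instance for reuse.
            if (sprite.parent != null)
            {
                sprite.parent.removeChild(sprite);
            }
            pool.push(sprite);
        }
    }
}
```

Callers would use SpritePool.acquire() where they previously wrote new Sprite(), and SpritePool.release() when they are done with the instance.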
To make sure that an object is garbage collected, delete all
  references to the object.  Memory allocation, rather than object
  deletion, triggers garbage collection.  Try to limit garbage
  collection passes by reusing objects as much as possible. Also, set
  references to null, when possible, so that the garbage collector
  spends less processing time finding the objects. Think of garbage
  collection as insurance, and always manage object lifetimes
  explicitly, when possible.
Setting a reference to a display object to null does not ensure that
  the object is frozen. The object continues to consume CPU cycles until
  it is garbage collected.
The BitmapData class includes a dispose() method. Although dispose()
  removes the pixels from memory, the reference must still be set
  to null to release the object completely.
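A minimal sketch of that two-step cleanup. The bitmap size and the variable name are arbitrary choices for the example:

```actionscript
import flash.display.BitmapData;

// Arbitrary 800x600 transparent bitmap, just for illustration.
var image:BitmapData = new BitmapData(800, 600, true, 0x00000000);

// ... draw into and display the bitmap ...

image.dispose(); // releases the pixel memory right away
image = null;    // drops the reference so the object itself can be collected
```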
Using vectors, especially in large numbers, dramatically increases the
  need for CPU or GPU resources. Using bitmaps is a good way to optimize
  rendering, because the runtime needs fewer processing resources to
  draw pixels on the screen than to render vector content.
When a filter is applied to a display object, the runtime creates two
  bitmaps in memory.  Using externally authored bitmaps helps the
  runtime to reduce the CPU or GPU load.
Use mipmapping sparingly. Although it improves the quality of
  downscaled bitmaps, it has an impact on bandwidth, memory, and speed.
For read-only text, it’s best to use the Flash Text Engine, which
  offers low memory usage and better rendering. For input text,
  TextField objects are a better choice, because less ActionScript code
  is required to create typical behaviors, such as input handling and
  word-wrap.
Using the native event model can be slower and consume more memory
  than using a traditional callback function. Event objects must be
  created and allocated in memory, which creates a performance slowdown.
  For example, when listening to the Event.ENTER_FRAME event, a new
  event object is created on each frame for the event handler.
  Performance can be especially slow for display objects, due to the
  capture and bubbling phases, which can be expensive if the display
  list is complex.
Even if display objects are no longer in the display list and are
  waiting to be garbage collected, they could still be using
  CPU-intensive code.
The concept of freezing is also important when loading remote content
  with the Loader class.
The unloadAndStop() method allows you to unload a SWF file,
  automatically freeze every object in the loaded SWF file, and force
  the garbage collector to run.
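Roughly how that looks with a Loader. This sketch assumes it runs on the main timeline or in a display object that is already on the stage, and "content.swf" is a placeholder file name:

```actionscript
import flash.display.Loader;
import flash.net.URLRequest;

var loader:Loader = new Loader();
addChild(loader);
loader.load(new URLRequest("content.swf")); // placeholder SWF name

// Later, when the loaded content is no longer needed:
loader.unloadAndStop(true); // stops the loaded SWF; true asks the garbage collector to run
removeChild(loader);
loader = null;
```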
The Event.ACTIVATE and Event.DEACTIVATE events allow you to detect
  when the runtime gains or loses focus. As a result, code can be
  optimized to react to context changes.
The activate and deactivate events allow you to implement a similar
  mechanism to the "Pause and Resume" feature sometimes found on mobile
  devices and Netbooks.
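One way to sketch the pause and resume idea is to drop the frame rate while the application is in the background. The standby value of 4 is an arbitrary pick for illustration, and the code assumes stage is available:

```actionscript
import flash.events.Event;

// Remember the normal frame rate so it can be restored later.
var activeFrameRate:Number = stage.frameRate;
var standbyFrameRate:Number = 4; // arbitrary low value for this example

stage.addEventListener(Event.ACTIVATE, onActivate);
stage.addEventListener(Event.DEACTIVATE, onDeactivate);

function onActivate(event:Event):void
{
    // The runtime regained focus: restore the normal frame rate.
    stage.frameRate = activeFrameRate;
}

function onDeactivate(event:Event):void
{
    // The runtime lost focus: slow everything down to save CPU and battery.
    stage.frameRate = standbyFrameRate;
}
```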
Detecting mouse interaction can be CPU-intensive when many interactive
  objects are shown onscreen, especially if they overlap.  When
  possible, consider disabling mouse interaction, which helps your
  application to use less CPU processing, and as a result, reduce
  battery usage on mobile devices.
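For purely decorative content, turning mouse interaction off could look like this (the container is a made-up example object):

```actionscript
import flash.display.Sprite;

// Hypothetical container that only holds decorative, non-interactive children.
var decoration:Sprite = new Sprite();
decoration.mouseEnabled = false;  // the container itself ignores mouse events
decoration.mouseChildren = false; // its children are skipped for mouse detection too
addChild(decoration);
```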
Timers are preferred over Event.ENTER_FRAME events for non-animated
  content that executes for a long time.
A timer can behave in a similar way to an Event.ENTER_FRAME event, but an
  event can be dispatched without being tied to the frame rate. This
  behavior can offer some significant optimization. Consider a video
  player application as an example. In this case, you do not need to use
  a high frame rate, because only the application controls are moving.
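A sketch of the timer approach for something like the video player controls mentioned above. The frame rate and the timer interval are arbitrary values I picked for the example:

```actionscript
import flash.events.TimerEvent;
import flash.utils.Timer;

// Keep the global frame rate low; the controls are updated by the timer instead.
stage.frameRate = 4;

// Fire roughly 30 times per second, independent of the frame rate.
var updateTimer:Timer = new Timer(1000 / 30);
updateTimer.addEventListener(TimerEvent.TIMER, onTick);
updateTimer.start();

function onTick(event:TimerEvent):void
{
    // ... update the controls here ...

    // Ask the runtime to render now instead of waiting for the next frame.
    event.updateAfterEvent();
}
```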
Limit the use of tweening. Doing so saves CPU processing, memory, and
  battery life, which helps content run faster on low-tier devices.
The Vector class allows faster read and write access than the Array
  class.
Array element access and iteration are much faster when using a Vector
  instance than they are when using an Array.
In strict mode the compiler can identify data type errors.
Runtime range checking (or fixed-length checking) increases
  reliability significantly over Arrays.
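A small illustration of a typed, fixed-length Vector. The length of 1000 is arbitrary:

```actionscript
// Typed and fixed-length: element type and size are both known up front.
var samples:Vector.<Number> = new Vector.<Number>(1000, true);

var count:int = samples.length;
for (var i:int = 0; i < count; i++)
{
    samples[i] = Math.random();
}

// Because the Vector is fixed-length, writing past the end throws a
// RangeError at runtime instead of silently growing the collection:
// samples[1000] = 1;
```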
Reduce the amount of code executed by using drawPath(),
  drawGraphicsData(), and drawTriangles(). Fewer lines of
  code can provide better ActionScript execution performance.
Taking advantage of the bubbling of an event can help you to optimize
  ActionScript code execution time. You can register an event handler on
  one object, instead of multiple objects, to improve performance.
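Something along these lines: one listener on a parent container handles clicks for all of its children. The container, the 50 square buttons, and the alpha change are all invented for the example:

```actionscript
import flash.display.Sprite;
import flash.events.MouseEvent;

var container:Sprite = new Sprite();
addChild(container);

// Create 50 small clickable squares without giving each its own listener.
for (var i:int = 0; i < 50; i++)
{
    var button:Sprite = new Sprite();
    button.graphics.beginFill(0x3366CC);
    button.graphics.drawRect(0, 0, 20, 20);
    button.graphics.endFill();
    button.x = i * 24;
    container.addChild(button);
}

// A single handler on the parent receives clicks that bubble up from any child.
container.addEventListener(MouseEvent.CLICK, onClick);

function onClick(event:MouseEvent):void
{
    // event.target is the child that was actually clicked.
    var clicked:Sprite = event.target as Sprite;
    if (clicked != null && clicked != container)
    {
        clicked.alpha = 0.5;
    }
}
```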
When painting pixels, some simple optimizations can be made just by
  using the appropriate methods of the BitmapData class. A fast way to
  paint pixels is to use the setVector() method.
Calling lock() and unlock() prevents the screen from being updated
  unnecessarily. Methods that iterate over pixels, such as getPixel(),
  getPixel32(), setPixel(), and setPixel32(), are likely to be slow,
  especially on mobile devices. If possible, use methods that retrieve
  all the pixels in one call. For reading pixels, use the getVector()
  method, which is faster than the getPixels() method. Also, remember to
  use APIs that rely on Vector objects, when possible, as they are
  likely to run faster.
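To make the pixel advice concrete, here is a sketch that fills a bitmap with random colors in one setVector() call, wrapped in lock() and unlock(). The 200x200 size is arbitrary:

```actionscript
import flash.display.Bitmap;
import flash.display.BitmapData;

var canvas:BitmapData = new BitmapData(200, 200, false, 0x000000);
addChild(new Bitmap(canvas));

// Build all pixel values in a typed, fixed-length Vector first.
var total:int = canvas.width * canvas.height;
var pixels:Vector.<uint> = new Vector.<uint>(total, true);
for (var i:int = 0; i < total; i++)
{
    pixels[i] = Math.random() * 0xFFFFFF;
}

canvas.lock();                         // hold back display updates while writing
canvas.setVector(canvas.rect, pixels); // write every pixel in a single call
canvas.unlock();                       // let the display refresh once
```

Reading works the same way in reverse: canvas.getVector(canvas.rect) returns all pixels as a Vector.<uint> in one call.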
When a String class method is available, it runs faster than the
  equivalent regular expression and does not require the creation of
  another object.
Using the appendText() method provides performance improvements.
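As a quick illustration, compare appendText() with the += operator on the text property. The field name and loop count are just example values:

```actionscript
import flash.text.TextField;

var log:TextField = new TextField();
addChild(log);

for (var i:int = 0; i < 1500; i++)
{
    // Slower alternative: log.text += "line\n";
    log.appendText("line\n");
}
```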
Using the square bracket operator can slow down performance. You can
  avoid using it by storing your reference in a local variable.
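A sketch of the local-variable trick. The count of 5000 and the coordinate ranges are arbitrary:

```actionscript
import flash.display.Sprite;

var count:int = 5000;
var sprites:Vector.<Sprite> = new Vector.<Sprite>(count, true);
var i:int;

for (i = 0; i < count; i++)
{
    sprites[i] = new Sprite();
}

var current:Sprite;
for (i = 0; i < count; i++)
{
    // One bracket lookup per iteration instead of one per property access.
    current = sprites[i];
    current.x = Math.random() * 550;
    current.y = Math.random() * 400;
    current.alpha = Math.random();
}
```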
Calling functions can be expensive. Try to reduce the number of
  function calls by moving code inline.
In the benchmark accompanying this tip, moving the function call
  inline results in code that is more than four times faster.
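For illustration, here is what inlining a simple call such as Math.abs() might look like. The data and names are made up, and any actual speedup depends on the runtime and the code being inlined:

```actionscript
var count:int = 100000;
var values:Vector.<Number> = new Vector.<Number>(count, true);
var absolutes:Vector.<Number> = new Vector.<Number>(count, true);
var i:int;
var value:Number;

for (i = 0; i < count; i++)
{
    values[i] = Math.random() - 0.5;
}

// Function-call version: one call to Math.abs() per element.
for (i = 0; i < count; i++)
{
    absolutes[i] = Math.abs(values[i]);
}

// Inlined version: computes the same absolute values without a call in the loop.
for (i = 0; i < count; i++)
{
    value = values[i];
    absolutes[i] = value < 0 ? -value : value;
}
```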
Even if the off-stage elements are not shown onscreen and are not
  rendered, they still exist on the display list. The runtime continues
  to run internal tests on these elements to make sure that they are
  still off-stage and the user is not interacting with them.
When a display object uses alpha blending, the runtime must combine
  the color values of every stacked display object and the background
  color to determine the final color. Thus, alpha blending can be more
  processor-intensive than drawing an opaque color. This extra
  computation can hurt performance on slow devices.
A higher frame rate expends more CPU cycles and energy from the
  battery than a lower rate.
Runtime code execution fundamentals

Bitmap caching (cacheAsBitmap) caches a vector object, renders it as a
  bitmap internally, and uses that bitmap for rendering.  Bitmap caching
  improves rendering if the cached content is not rotated, scaled, or
  changed on each frame. For any transformation other than translation
  on the x- and y-axes, rendering is not improved.
With cacheAsBitmapMatrix in the AIR mobile profile, you can apply any
  two-dimensional transformation to the object without regenerating the
  cached bitmap. You can also change the alpha property without
  regenerating the cached bitmap.
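A rough sketch of both caching properties together. cacheAsBitmapMatrix only exists in AIR, so this assumes compiling against the AIR SDK for a mobile target; the logo shape is made up:

```actionscript
import flash.display.Sprite;
import flash.geom.Matrix;

// Hypothetical vector artwork to cache.
var logo:Sprite = new Sprite();
logo.graphics.beginFill(0xCC3333);
logo.graphics.drawCircle(0, 0, 40);
logo.graphics.endFill();
addChild(logo);

logo.cacheAsBitmap = true;               // render once to an internal bitmap
logo.cacheAsBitmapMatrix = new Matrix(); // AIR only: identity matrix for the cached bitmap

// With the matrix set, these changes reuse the cached bitmap.
logo.rotation = 45;
logo.scaleX = logo.scaleY = 1.5;
logo.alpha = 0.8;
```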
Only a single cached bitmap is used in memory, and it is shared by all
  instances.
This technique saves CPU resources.
The bitmap caching feature allows you to cache vector content as
  bitmaps to improve rendering performance. This feature is helpful for
  complex vector content and also when used with text content that
  requires processing to be rendered.
Alpha transparency places an additional burden on the runtime when
  drawing transparent bitmap images. You can use the
  opaqueBackground property to bypass that, by specifying a
  color as a background.
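For example, on a cached text field (the field and the white background color are illustrative; pick a color that matches what sits behind the object):

```actionscript
import flash.text.TextField;

var label:TextField = new TextField();
label.text = "Score: 0";
addChild(label);

label.opaqueBackground = 0xFFFFFF; // solid background color, so no alpha channel is needed
label.cacheAsBitmap = true;        // the cached bitmap can now be opaque
```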
In order to leverage GPU acceleration of Flash content with AIR for
  mobile platforms, Adobe recommends that you use renderMode="direct"
  (that is, Stage3D) rather than renderMode="gpu". Adobe officially
  supports and recommends the following Stage3D based frameworks:
  Starling (2D) and Away3D (3D).
Avoid using wmode=transparent or wmode=opaque in HTML embed
  parameters. These modes can result in decreased performance. They can
  also result in a small loss in audio-video synchronization in both
  software and hardware rendering. Furthermore, many platforms do not
  support GPU rendering when these modes are in effect, significantly
  impairing performance.
Application code in the current execution thread continues executing.
Asynchronous operations are scheduled and divided to avoid rendering
  issues. Consequently, it is much easier to have a responsive
  application using asynchronous versions of operations. See Perceived
  performance versus actual performance for more information.
Unlike bitmaps, rendering vector content requires many calculations,
  especially for gradients and complex paths that contain many control
  points. As a designer or developer, make sure that shapes are
  optimized enough.
If your application loads assets such as media or data, cache the
  assets by saving them to the local device. For assets that change
  infrequently, consider updating the cache at intervals.
Use the StageVideo class to take advantage of hardware acceleration to
  present video.
This approach takes full advantage of the underlying video hardware.
  The result is a much lower load on the CPU, which translates into
  higher frame rates on less powerful devices and also less memory
  usage.
Similar to video decoding, audio decoding requires high CPU cycles and
  can be optimized by leveraging available hardware on the device.
The AAC format offers better quality and smaller file size than the
  mp3 format at an equivalent bitrate.
Initialization functions such as constructors are interpreted;
  everything else is JIT-compiled.