Blog

Installing Cocoon on Ubuntu

10/11/10 -- This guide was written for Intrepid, and doesn't work on the latest Ubuntu releases. An updated and working version of the guide is available here.
This guide was prepared with help from a guide written on the GSLIS wiki by Wendell Piez. If you try it and something doesn't work, please e-mail me (quinnd -at- uchicago +dot+ edu). This document is licensed Creative Commons Attribution.
No knowledge of Ubuntu or Unix is assumed; the intended audience is someone who's managed to install Ubuntu and isn't too intimidated by the Terminal.

Step 1: Installing Java SDK

  1. Open the Terminal (Applications > Accessories > Terminal)
  2. Type: sudo apt-get install sun-java6-jdk
    (Hint: You can copy and paste, but in Terminal, pasting is Ctrl + Shift + V)
  3. There'll be some downloading, and you'll have to scroll through a long TOS and agree to it, but then it will install on its own.
  4. Close Terminal.

Step 2: Installing Maven
Derived from Maven in Five Minutes.

  1. Download Maven (choose tar.bz2)
  2. Open the tar.bz2 file; by default, it probably saved to the Desktop
  3. Extract it to the Desktop
  4. Open the Terminal again and type:
    cd /usr/local

    sudo mkdir apache-maven

    At this point, the Terminal will ask you for your sudo password. It's the same as the password you use to log in to Ubuntu. Then:
    cd /home/YOUR_USER_NAME/Desktop (be sure to replace YOUR_USER_NAME with your user name)

    sudo mv apache-maven-2.0.9 /usr/local/apache-maven

    export M2_HOME=/usr/local/apache-maven/apache-maven-2.0.9

    export M2=$M2_HOME/bin

    export PATH=$M2:$PATH

  5. Cross your fingers and type:
    mvn --version
  6. If everything worked right, it should display information about the version of Maven you have installed.
  7. Close Terminal.
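
The export lines in step 4 only apply to the Terminal window they were typed in, so they are lost once that Terminal is closed. Here is a minimal sketch of one way to make them permanent, assuming you use the default bash shell and that ~/.bashrc is the file it reads at startup. persist_maven_env is a hypothetical helper name, and M2 is assumed to point at Maven's bin directory.

```shell
# Hypothetical helper: append the Maven variables to a profile file,
# but only if the file does not already set M2_HOME.
persist_maven_env() {
  profile="$1"      # file to append to, e.g. "$HOME/.bashrc"
  maven_home="$2"   # where Maven was moved to in step 4
  if ! grep -q 'M2_HOME=' "$profile" 2>/dev/null; then
    {
      echo "export M2_HOME=$maven_home"
      echo 'export M2=$M2_HOME/bin'
      echo 'export PATH=$M2:$PATH'
    } >> "$profile"
  fi
}

# Example call (uncomment to run against your own profile):
# persist_maven_env "$HOME/.bashrc" /usr/local/apache-maven/apache-maven-2.0.9
```

After running the example call once, open a new Terminal and mvn --version should work there as well.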

Step 3: Installing Cocoon

Derived from Your first Cocoon application.

  1. Make a directory for your Cocoon install.
    • Using the GUI: go to Places > Home Folder, then in that new window, File > Create Folder.
    • Or, open the Terminal and type: mkdir cocoon
    • In the following text, I'm assuming you make a folder called cocoon in your Home Folder; if you give it a different name or put it somewhere else, you'll have to change the commands accordingly.
  2. Open the Terminal and change directory to your cocoon folder:
    cd cocoon

    mvn archetype:generate -DarchetypeCatalog=http://cocoon.apache.org

  3. This begins the install process.
    • For archetype, choose 2
    • Define value for groupId: - This should be a unique value. A classic value to use is, if you own the namespace myurl.com, you could type com.myurl
    • Define value for artifactId: cocoon
    • Define value for version: 1.0-SNAPSHOT: 1.0.0
    • Define value for package: - groupID.cocoon (i.e. com.myurl.cocoon)
  4. After everything's done installing, you should see [INFO] BUILD SUCCESSFUL
  5. Make sure you're in your cocoon directory in Terminal (does it say ~/cocoon$ right before the cursor?), and type mvn jetty:run
  6. There'll be a lot more installing, but it should conclude with [INFO] Started Jetty Server
  7. Open a browser and go to http://localhost:8888/cocoonTest - you should see a message saying Apache Cocoon: Welcome
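
The groupId and package values in step 3 follow the usual reversed-domain convention. As a small illustration (not part of the install itself), this sketch derives both from a domain name; reverse_domain is a hypothetical helper, and myurl.com is just the example domain used above.

```shell
# Hypothetical helper: turn a domain such as myurl.com into com.myurl
reverse_domain() {
  echo "$1" | awk -F. '{
    out = $NF                                   # start with the last label
    for (i = NF - 1; i >= 1; i--) out = out "." $i
    print out
  }'
}

group_id=$(reverse_domain myurl.com)   # value for groupId
package="$group_id.cocoon"             # value for package (groupId + artifactId)
echo "groupId: $group_id"
echo "package: $package"
```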

Step 4: Cocoon Add-ons
There are a couple of add-ons for Cocoon that are essential, like generators for HTML. If you want to use XSLT 2.0, Saxon 9 is also critical. Possibly less important are the FOP processor (to generate PDFs from XSL-FO), Batik (for SVG) and Forms (to generate forms). If you don't need to use XSLT 2.0, you can skip the first part of this section.

  1. Installing Saxon 9 - a good idea
    1. Open your cocoon directory and navigate to src/main/resources/META-INF/cocoon
    2. Create directory avalon
    3. Create the following files in Text Editor (Applications > Accessories > Text Editor), and place them in the avalon directory:
      1. File named cocoon-core-xslt-saxon.xconf


        class="org.apache.cocoon.components.xslt.TraxProcessor">


      2. File named sitemap-transformers-saxon-transformer.xconf




        saxon


    4. Download http://prdownloads.sourceforge.net/saxon/saxonb9-1-0-5j.zip
    5. Extract the zip file to your Home folder; you can delete everything but saxon9.jar
    6. Open a new Terminal:
      cd cocoon

      mvn install:install-file -DgroupId=net.sf.saxon -DartifactId=saxon -Dversion=9.1.0.5 -Dpackaging=jar -Dfile=../saxon9.jar

    7. Go to cocoon and open pom.xml
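    8. At the bottom of <dependencies>, add:

      <dependency>
        <groupId>net.sf.saxon</groupId>
        <artifactId>saxon</artifactId>
        <version>9.1.0.5</version>
      </dependency>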

    9. If for some reason you only want Saxon 9 and not the ability to generate HTML, skip to the bottom of this section
  2. Installing HTML support
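    • Still in pom.xml, at the bottom of <dependencies>, add:

      <dependency>
        <groupId>org.apache.cocoon</groupId>
        <artifactId>cocoon-html-impl</artifactId>
        <version>1.0.0</version>
      </dependency>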

  3. Installing FOP (for PDFs)
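    • Still in pom.xml, at the bottom of <dependencies>, add:

      <dependency>
        <groupId>org.apache.cocoon</groupId>
        <artifactId>cocoon-fop-impl</artifactId>
        <version>1.0.0</version>
      </dependency>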

  4. Installing Batik (SVG)
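    • Still in pom.xml, at the bottom of <dependencies>, add:

      <dependency>
        <groupId>org.apache.cocoon</groupId>
        <artifactId>cocoon-batik-impl</artifactId>
        <version>1.0.0</version>
      </dependency>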

  5. Installing Forms
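    • Still in pom.xml, at the bottom of <dependencies>, add:

      <dependency>
        <groupId>org.apache.cocoon</groupId>
        <artifactId>cocoon-forms-impl</artifactId>
        <version>1.0.0-RC1</version>
      </dependency>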

  6. There's a list of all blocks, and the syntax for the dependency code is in there someplace.
  7. Once you're done adding dependencies:
    1. If you have a Terminal open with [INFO] Started Jetty Server, close it.
    2. Open a new Terminal:
      cd cocoon

      mvn compile

      After it's done...
      mvn jetty:run
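
If mvn compile complains that it can't find Saxon, it can help to check that the install:install-file command from the Saxon steps actually put the jar in your local Maven repository. This is a rough sketch, assuming the repository is in its default location (~/.m2/repository) and hasn't been moved in Maven's settings; local_repo_jar is a hypothetical helper name.

```shell
# Hypothetical helper: build the path where Maven stores an installed jar,
# given groupId, artifactId and version (default repository layout assumed).
local_repo_jar() {
  group_path=$(echo "$1" | tr '.' '/')   # net.sf.saxon -> net/sf/saxon
  echo "$HOME/.m2/repository/$group_path/$2/$3/$2-$3.jar"
}

jar=$(local_repo_jar net.sf.saxon saxon 9.1.0.5)
if [ -f "$jar" ]; then
  echo "Saxon is installed: $jar"
else
  echo "Saxon not found at: $jar"
fi
```

The groupId, artifactId and version passed in should match both the install:install-file command and the dependency in pom.xml.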

Redirecting the Sitemap
You can add your pipelines to the sitemap.xmap in cocoon/src/main/resources/COB-INF, or (more conveniently) you can tell that base sitemap to look elsewhere for your files.
I'm assuming here that you have a folder called myproject in your Home folder where you have all your files and your sitemap. Please change that, and your user name, accordingly.

Included here is also the code to generate more useful error messages than a blank page.
In sitemap.xmap in cocoon/src/main/resources/COB-INF, at the bottom of the






src="/home/YOUR_USER_NAME/myproject/sitemap.xmap" reload-method="synchron"/>

In this case, your project will be found at http://localhost:8888/cocoonTest/myproject/[things that match your pipelines]. But the URL doesn't have to match the name of the folder with your files. You can change the URL by changing to

Hints and Tips
Every time you restart Ubuntu, you have to restart Cocoon:
cd cocoon

mvn jetty:run
Be sure to keep that Terminal window open while you're working with Cocoon. You can always check if Cocoon is working by going to: http://localhost:8888/cocoonTest.
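
If you'd rather check from the Terminal than from a browser, something like the following should work. It's only a sketch: it assumes curl is installed (sudo apt-get install curl if it isn't), and cocoon_is_up is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the given URL answers without an HTTP error.
cocoon_is_up() {
  # -s: quiet, -f: fail on HTTP errors, -o /dev/null: throw the page away
  curl -s -f -o /dev/null --max-time 5 "$1"
}

if cocoon_is_up "http://localhost:8888/cocoonTest"; then
  echo "Cocoon is running"
else
  echo "Cocoon is not responding; try: cd cocoon && mvn jetty:run"
fi
```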

Bulgarian Dialect Atlas at the 17th Meeting of the Balkan and South Slavic

Andy and I will be demoing the Bulgarian Dialect Atlas on April 17th at the 17th Balkan and South Slavic Conference at the Ohio State University. The slides and handout will be posted after the talk.

While there hasn't been much additional development on the project since its presentation at Balisage last summer, this will be the first presentation for a Slavist audience.

Below is the abstract we submitted for the talk:

An XML-Based Approach to Dialectological Data: The Development of Syllabic Liquids in Bulgarian

The reflexes of syllabic liquids (hereafter CrC) in East South Slavic are strikingly diverse and therefore of interest for linguists working on a wide range of topics. In particular, the distribution of CrC reflexes in standard Bulgarian has been a recurring topic in the phonological literature, due to the empirical observation that the place of the vowel (/ăr/ versus /ră/ or /ăl/ versus /lă/) is conditioned by the syllable structure (Scatton 1974, Scatton 1976, Petrova 1993, Barnes 1997). In this paper, we present a tool to facilitate the examination and analysis of CrC reflexes across the dialects of Bulgarian.

This tool builds upon the word lists in the Bulgarian Dialect Atlas (BDA) by providing more accessible interfaces to the data. The words have been transcribed and marked up using XML to indicate lexeme, reflex, and place of stress (where applicable). Each site is listed with its associated words and geographic coordinates. This metadata is leveraged using XSLT stylesheets to generate views onto the data that would not previously have been possible. Each site has its own profile that shows what percentage of the tokens have which reflex, lists all tokens, and notes tokens of the same lexeme that have different reflexes. The profile for each reflex shows what percentage of sites have that reflex, which reflexes co-occur with it, and which lexemes have the given reflex and a different reflex within a single site. One of the views onto the lexemes is a sort based on how many reflexes are attested for a given lexeme, which provides insight into the lexical diffusion of reflexes. The token view identifies where a token is the unique carrier of its reflex. Dynamically generated maps are provided for most views, using color-coded location markers that better capture the nuances of the data than those found in the printed atlas.

This allows for an extremely detailed micro-analysis of the dynamics of lexical diffusion involved in the development of Bulgarian CrC reflexes, while providing macro-analytic tools that facilitate the identification of larger-scale trends in the data. The enhanced ability that this tool provides to identify locally divergent geographical points enables the easier identification of areas that may be of interest for more in-depth research. The ability to compare CrC reflexes in different environments makes it more feasible to track regional variation not just in the specific tokens attested in the BDA, but also, when multiple reflexes are found, to characterize the functioning of each reflex within the overall grammatical structure of any given dialect. These features will be of use in future research on this topic by enabling the inclusion of Bulgarian dialect data to an extent that was previously not feasible. We will also discuss the applicability of similar markup schemes to other types of data sets.

References

  • Barnes, Jonathan. 1997. “Bulgarian Liquid Metathesis and Syllabification in OT.” in Bošković, Željko, Steven Franks, and William Snyder, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the Connecticut Meeting: 38-53.
  • Petrova, Rossina. 1993. “Prosodic Theory and Schwa Metathesis in Bulgarian.” in Avrutin, Sergey, Steven Franks, and Ljiljana Progovac, eds. Annual Workshop on Formal Approaches to Slavic Linguistics: the MIT Meeting: 319-340.
  • Scatton, Ernest. 1974. “Metathesis of Liquids and [Ъ] and the Bulgarian Verb.” in V Pamet na Prof. Dr. St. Stojkov – Ezikovedski Izsledvanija: 87-90.
  • Scatton, Ernest. 1976. “Liquids, schwa, and vowel-zero alternations in modern Bg.” in Butler, ed. Bulgaria Past and Present. Columbus: 323-327.
  • Stojkov et al., ed. 1964-1975. Bălgarski dialekten atlas. BAN: Sofia.

Do It With Drupal: Drupal In The Cloud

Josh Koenig
drupal.org/user/3313
josh - at - chapterthree.com
getpantheon.com

About the cloud

  • "Cloud" as new model for hosting
  • Traditional hosting = real estate (rack space)
  • Most real estate customers are renters, few love their landlord - landlords sometimes cut corners and do the bare minimum to keep you happy... but you need this
  • Owning comes with lots of responsibilities and hidden costs
  • Large scale projects are expensive, slow, and prone to setbacks
  • "The Cloud" = hosting as an API: on-demand availability
  • Hourly pricing
  • Reliable, reusable start-states: people make mistakes vs. programs that do things and you know exactly what they're going to give you
  • You can say: I want a new server, here's the distro, here's the information, here's the configuration - and I want five of them
  • The cloud = less waste, more freedom, flexibility... but not a silver bullet
  • Performance can vary (don't use it for scientifically accurate benchmarks)
  • Abstractions aren't the same as the real thing (not the same as physical servers - but for what it's worth this hasn't been a problem for Drupal)
  • New tricks to learn - power of API
  • The Cloud is Drupal's destiny - increasing Drupal's reach; you can start with pennies, scale to millions
  • Create products cheaply
  • Grow organically, but still grow fast

Launch a server in the cloud

  • ElasticFox - Amazon control panel for Firefox
  • Amazon just added locations for US west coast
  • Pantheon project: create images for cloud services that are targeted towards Drupal
  • Three images: a high performance production hosting image (all the tricks already done), another for Aegir, another for a continuous integration environment for Drupal
  • Grand vision for world-class Drupal infrastructure for pennies an hour
  • High performance production has the most work since people have been the most interested
  • Ubuntu 9.04 base config, whole LAMP stack, Pressflow pre-installed, memcached, APC, all of it is already there
  • Can monitor processes, do everything you like to do as root
  • v0.8.1 beta - but people are using it in production (in spite of disclaimer)

Who are the cloud providers

  • AWS: most mature, a lot of features, still moving quickly, added a load balancer earlier in the year; they're a utility, not interested in your particular use case; they don't tell people what they're working on or how it works
  • AWS has infrastructure for giving away free images - most don't
  • Rackspace - has Rackspace Cloud Sites (you don't get root, you put your Drupal in there, they scale it for you with mixed results); scaling any particular site requires deep knowledge of it; Rackspace Cloud Servers is better (Slicehost is built on top of Rackspace Cloud Servers)
  • Rackspace is looking to break into the space; willing to do deals, talk to you, etc
  • Voxel: smaller/smarter, also in Asia; cloud product just emerging from beta, but it's good - also lets you intermingle cloud and physical infrastructure
  • And more every day!
  • VPS is becoming quite cloudy (linode.com, slicehost, vps.net)
  • Custom/managed cloud services (security, regulatory compliance issues - people will build a cloud for you: Eucalyptus, Neospire, others)
  • Cloud value-adders: Rightscale, Scalr - cloud/cluster management services
  • Cloudkick - cross-cloud services, managing different cloud providers (want to be able to move servers from one service to another); it's free; open-source LibCloud project to prevent people from getting locked into one provider
  • Cloud tools for Drupal - getpantheon.com

Questions

  • How do you do a cost-analysis? You probably won't see the financial benefits right away if you're going to leave it on all the time. The benefits come from scaling with changing use patterns by adding and removing instances.
  • Cost/benefit comes in disk speed performance - most cloud providers have poorer I/O performance than a physical server
  • How do you solve that problem for Drupal? - All performance/scalability work is about making Drupal do less work
  • Oriented around Drupal doing only what it needs to, and not bogging it down with things like showing the user the same page he saw a minute ago
  • Database replication for read-only queries
  • Use other tools that are better at repeated-action type jobs for those things

What is it good for

  • Testing/continuous integration
  • testing.drupal.org (Drupal testing Drupal) - not in the cloud, but will soon release cloud image of it
  • People can spin these up if Drupal finds itself in a testing bottleneck, just for the day
  • Development infrastructure: new server for each site
  • Putting things like version control (unfuddle, beanstalk)
  • Products and services: Lefora (forums), crowdfactory, olark (start with pennies, scale to millions)
  • Database layer for Drupal can be a choke point - you can duplicate it
  • High availability production hosting: Acquia is on EC2
  • Most cloud infrastructure isn't cheap at this level (running many servers, keeping them always on-line); if you're really big, you'll find yourself going to traditional managed hosting at the top end, because there are some levels of performance that are capped by the virtualization layer
  • Control costs for traffic patterns - geographically centralized audience for most people
  • Turning things on and off to deal with daily peaks - two more servers only on during the day
  • Instances fail, though not much more often than real servers (and remember that instances exist on real servers that do break)
  • Performance can be impacted by other local activity
  • Virtual disks tend to have relatively poor I/O performance
  • Accept the inevitability of failure, embrace the paradigm of "rapid recovery", develop architecture with modular, replaceable parts (images for each server), minimize disk/CPU utilization for menial tasks
  • "RAM is cheap" - the more you can push to things that read/write out of memory, the better
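
To make the "two more servers only on during the day" point concrete, here is a back-of-the-envelope sketch. All the numbers are made up for illustration: the 10 cents per server-hour rate is hypothetical and not any provider's real price, a month is taken as 30 days, and monthly_cost_cents is a hypothetical helper name.

```shell
# Hypothetical helper: monthly cost in cents for a group of identical servers.
monthly_cost_cents() {
  servers="$1"         # how many servers
  hours_per_day="$2"   # how many hours a day each one is running
  rate_cents="$3"      # assumed price per server-hour, in cents
  echo $(( servers * hours_per_day * 30 * rate_cents ))
}

always_on=$(monthly_cost_cents 4 24 10)    # four servers, never turned off
base=$(monthly_cost_cents 2 24 10)         # two servers, never turned off
daytime=$(monthly_cost_cents 2 12 10)      # two extra servers, 12 hours a day
peak_only=$(( base + daytime ))

echo "Four servers always on:          $always_on cents/month"
echo "Two always on, two daytime-only: $peak_only cents/month"
```

With these assumed numbers the second setup costs three quarters of the first; the real saving depends entirely on actual prices and traffic patterns.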

Production hosting in the cloud

  • Monitor your load - you have to look more carefully than just hits
  • Spin up more instances (scale horizontally) as you need more power
    • How does this work?
    • Could be manual process ("we need a server, let's do it") - does need some manual intervention somewhere, though in theory you could script it
    • Amazon offers an auto-scaling feature (when we need more, add servers, up to X number of servers - Amazon AutoScale)
    • AutoScale is simple (and doesn't cost anything, either)
    • How does this work? How do the pieces work together?
    • You need to have an image with all the pieces needed at the system level; you should use version control and have a boot script as part of the image (when the image starts, the script checks out the current code base from the database and all the necessary connections), then AutoScale makes the pieces aware of what's out there
    • You can also do load balancing more manually
    • Role of sysadmin is changing - new set of things where now you don't have to worry about hard drives, but scaling up/down, saving money
    • When you're doing horizontal scaling, you trigger your image to be built, it checks out the code; Amazon also offers virtual drive service (if you're working with an application with a lot of data in file system) - can connect that data quickly
    • Bake in as much as you can to the image, then have automatic processes that fire that get the latest information, check it into infrastructure, start distributing load there
  • Add layers (scale vertically) when bottlenecks emerge
  • Create images for each layer in your infrastructure
  • Use best practices to keep things speedy
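
The "add servers as you need more power, up to X" idea can be sketched as a simple rule. This is not how Amazon's AutoScale works internally; it is only an illustration of the policy, with a hypothetical helper name (desired_instances) and made-up units for load and capacity.

```shell
# Hypothetical helper: how many instances to run for a given load.
#   $1 = current load (e.g. requests per second)
#   $2 = load one instance can handle (assumed, in the same unit)
#   $3 = maximum number of instances you're willing to pay for
desired_instances() {
  load="$1"
  capacity="$2"
  max="$3"
  # Round up: 250 load at 100 per instance needs 3 instances, not 2
  wanted=$(( (load + capacity - 1) / capacity ))
  # Always keep at least one instance running
  if [ "$wanted" -lt 1 ]; then
    wanted=1
  fi
  # Never go above the cap
  if [ "$wanted" -gt "$max" ]; then
    wanted="$max"
  fi
  echo "$wanted"
}

echo "Instances for a load of 250: $(desired_instances 250 100 5)"
```

A script run on a schedule could compare this number with the number of instances currently running and start or stop servers accordingly.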

About best practices

  • Front-side caching: use Pressflow with Varnish and/or Nginx (Drupal 7 will support some of this natively)
  • Drupal is slow: complex, wonderful, brainy tool - if you're looking at the same thing over and over again, go get a tool that does only that, and quickly
  • Use APC and/or Memcached to minimize database queries and eliminate costly unserialize() calls
  • Drupal's native caches are good, but it does it in the database (this isn't the highest performance option, serializing/unserializing big arrays/objects)
  • Architect for vertical scaling by utilizing all service layers, even if it's one box
  • This is what "Mercury" is about
  • CREAM: Cache rules everything around me

Mercury

  • Freely available on Amazon, as VMWare image, in as many ways as we can
  • Also on-demand as a service
  • "Drupal hosting, 200 times faster"
  • Standardized high-performance stack: single server image with everything you want for cluster infrastructure
  • Features: Varnish, HTTP/PHP, APC Cache, Apache Solr, MySQL
  • Make Drupal run fast, hold up under large traffic spikes
  • From one box to cluster
  • If you're running all four layers and are still falling down, you're either doing something horribly right (Twitter) or horribly wrong (all code embedded in PHP content nodes)

Questions

  • Mercury: going to implement configuration management system (BCFG2, probably)
  • Mercury/Pantheon - not Amazon-centric, can roll the stack out anywhere (physical hardware, whatever)
  • You'd probably make your own variant image, and sync as necessary using the configuration management system
  • If you haven't customized things heavily, you can take the latest version of Mercury, re-apply changes, and you're done (if you don't want to use the config management)
  • You can keep old images around for pennies a month
