
Jordi Boggiano

Passionate web developer, specialized in web performance and PHP. Partner at Nelmio, information junkie and speaker.


Typo Squatting and Packagist

Earlier this month an article was published summarizing Nikolai Philipp Tschacher's thesis about typosquatting. In short, typosquatting is a way to attack users of a package manager by registering a package with a name similar to a popular package, hoping that someone will accidentally mistype the name and end up installing the attacker's version, which contains malware.

The thesis mentions https://packagist.org as a good example as we use vendor namespaces:

[...] it is much more secure, if a package is named ntschacher/GoogleScraper instead of just GoogleScraper. The reason is: If the package name is misspelled and not the author name, this will not have any consequences, because the typo version cannot be registered in this namespace, since this author name is already reserved. [...] Because package names are much longer with two attributes, it is more likely that users will copy and paste the package name instead of remembering it.

Despite this mitigating fact, it is still technically possible to squat the vendor name, so I wanted to take a look at our repository data and see if I could spot any bad actors. I wrote a script that basically does the following:

  • Read the list of all vendor names which have packages with at least 1000 downloads, as the others are unlikely targets or at least low value targets.
  • Check the Levenshtein distance of every vendor name against all others.
  • If the distance is 1, then it checks for package names within those two vendors to see if they have any intersecting names. Those are then candidates for being typosquatters.
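The steps above can be sketched roughly as follows. This is not the actual script, just a minimal illustration in Python; the sample data and the function names are made up, and it assumes the vendor list was already filtered down to vendors with at least 1000 downloads.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def find_candidates(vendors):
    """Return vendor pairs at edit distance 1 that share a package name."""
    candidates = []
    for first, second in combinations(sorted(vendors), 2):
        if levenshtein(first, second) != 1:
            continue
        shared = vendors[first] & vendors[second]
        if shared:
            candidates.append((first, second, shared))
    return candidates


# Hypothetical sample data: vendor name -> package names under that vendor.
SAMPLE_VENDORS = {
    "monolog": {"monolog"},
    "momolog": {"monolog"},
    "symfony": {"console", "finder"},
    "nelmio": {"alice"},
}

if __name__ == "__main__":
    for first, second, shared in find_candidates(SAMPLE_VENDORS):
        print(f"{first} / {second}: {sorted(shared)}")
```

Comparing every vendor against every other is quadratic, which is one more reason to filter the list by download count first.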

What did I find? 21 vendor pairs that conflict to some degree. Only one looked like an actual typosquatting attempt, momolog/monolog, and it even said in the package description that it was a demonstration of typosquatting. I deleted it along with 5 other packages that were useless, but the others are still in place. A lot of it is just due to people renaming their vendor names, or simply people that picked similar names but don't seem to be abusing anything.

In the future it would be nice to automate this, or prevent the creation of vendors that are too similar to popular ones. However it is reassuring to see that there is no widespread abuse going on.

June 29, 2016 // PHP

PHP Versions Stats - 2016.1 Edition

Last year I posted stats about PHP versions, and the year before as well, both times in November. However, this year I can't wait for November as I am curious to explore the PHP 7 uptake!

A quick note on methodology, because all these stats are imperfect as they just sample some subset of the PHP user base. I look in the packagist.org logs of the last 28 days for Composer installs done by someone. Composer sends the PHP version it is running with in its User-Agent header, so I can use that to see which PHP versions people are using Composer with.
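To make the counting step concrete, here is a small sketch in Python. It is not the code used for these stats; the sample headers are invented, and the header shape (a "PHP x.y.z" token somewhere in the User-Agent) is an assumption.

```python
import re
from collections import Counter

# Assumed header shape, e.g. "Composer/1.0.0 (Linux; 4.4.0; PHP 5.6.20)".
PHP_VERSION = re.compile(r"PHP (\d+)\.(\d+)\.(\d+)")


def version_shares(user_agents):
    """Return ({full version: %}, {minor version: %}) for the given headers."""
    full, grouped = Counter(), Counter()
    for agent in user_agents:
        match = PHP_VERSION.search(agent)
        if match is None:
            continue  # no PHP version reported, so the request is skipped
        major, minor, patch = match.groups()
        full[f"{major}.{minor}.{patch}"] += 1
        grouped[f"{major}.{minor}"] += 1
    total = sum(full.values())

    def to_percent(counter):
        return {version: round(100 * count / total, 2)
                for version, count in counter.items()}

    return to_percent(full), to_percent(grouped)


# Hypothetical log sample; the last entry is ignored by the parser.
SAMPLE_AGENTS = [
    "Composer/1.0.0 (Linux; 4.4.0; PHP 5.6.20)",
    "Composer/1.0.0 (Linux; 4.4.0; PHP 5.6.21)",
    "Composer/1.1.1 (Darwin; 15.5.0; PHP 7.0.6)",
    "Composer/1.0.3 (Linux; 3.13.0; PHP 5.5.9)",
    "curl/7.47.0",
]

if __name__ == "__main__":
    full, grouped = version_shares(SAMPLE_AGENTS)
    print(full)
    print(grouped)
```

The "All versions" and "Grouped" columns in the tables below correspond to the two dictionaries this returns.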

PHP usage statistics

I have two datasets, from November 2015 and today, which show the progression of various versions. Note that the previous dataset was checking for Composer updates only, while the new one includes installs as well.

November 2015

All versions            Grouped
PHP 5.5.9   29.63%      PHP 5.5   50.68%
PHP 5.6.14   5.63%      PHP 5.6   22.09%
PHP 5.3.3    4.60%      PHP 5.4   15.86%
PHP 5.4.45   3.94%      PHP 5.3    9.90%
PHP 5.6.13   3.39%      PHP 7.0    1.17%

May 2016

All versions            Grouped
PHP 5.5.9   11.87%      PHP 5.6   39.67%
PHP 7.0.6   10.39%      PHP 5.5   29.56%
PHP 5.6.20   8.41%      PHP 7.0   20.24%
PHP 5.6.21   7.69%      PHP 5.4    7.64%
PHP 5.6.19   4.71%      PHP 5.3    2.43%

A few observations: 5.3 dropped to almost nothing, which is great news! 5.4 is also down by about 8 percentage points and is definitely on the way out. 5.5 is still big but less so, while 5.6 got a huge boost to become the main version. The big surprise is that we already have 20% of PHP 7! That is great news only six months after this major release came out.

PHP requirements in Packages

The second dataset is which versions are required by all the PHP packages present on packagist. I only check the require statement in their current master version to see what the latest is.

PHP Requirements - Current Master - May 2016 (+/- diff from November 2015)

5.2 2.51% (-0.3)
5.3 45.26% (-6.43)
5.4 31.69% (-1.76)
5.5 15.48% (+5.29)
5.6 3.52% (+1.84)
7.0 1.54% (+1.34)

A few observations: 5.3/5.4 are declining slowly, 5.5 is taking the bulk of it though which makes me a bit sad :) I wish there was more love for 7 now that it shipped in Ubuntu 16.04.

All in all, it seems like package requires are way behind actual version usage, so I would like to encourage everyone to be a bit more aggressive in bumping PHP requirements when tagging new major releases of their libs. Don't forget that the old code does not go away, it's still there to be used by people using legacy PHP versions.

June 06, 2016 // PHP

Goddamn it.

It's not often that one messes up really bad. But today is my day apparently.

TL;DR: I accidentally wiped a github organization that had a few popular repos on it. But it's all fixed now.

How the heck did this happen?

I was trying to remove a private repository, called nelmio, which incidentally has the same name as the organization it was in, so nelmio/nelmio. Then this happened:

  • I wanted to check repo permissions so I went to https://github.com/nelmio/nelmio/settings/collaboration then followed a team link which led to https://github.com/orgs/nelmio/teams/foo
  • Then I was like OK let's go back in the settings tab to delete this repo, except at this point the settings tab points to https://github.com/organizations/nelmio/settings/profile (i.e. the org settings not the repo)
  • So at the end of the settings tab I see the familiar red delete button, hit it, it tells me to type the repo name (nelmio) as usual, but obviously I've done this many times so I don't read the fine print. It turns out in this case it wanted me to confirm the org name and not repo name.
  • As I clicked "Confirm Delete" I saw that something in the message wasn't quite familiar, but then it reloaded the page and I found myself on the github home. I was like "That's odd!", then more or less 2 seconds later this horrible feeling in my guts was confirmed: the entire org was wiped.

Mitigation

I immediately emailed GitHub support, and am still waiting for an answer. I kinda wish there was a hotline for such cases, even if it was billed 10 bucks a minute :)

After doing so, I started re-pushing repos into a new organization (nelmiobackup). I then changed the packagist.org package URLs to point to this new org, so that at least package installs should continue working relatively normally for the time being.

I did not want to re-create a nelmio org as I thought this might hamper any recovery effort by the github support folks, but someone had the great idea to do that for me, so now that the potential damage is done and they added me as owner, I forked everything from nelmiobackup to nelmio so that it's present at the old URL for people doing installs using composer.lock files that point to the old URL.

I hope GitHub will be able to fix this, but if not apparently http://ghtorrent.org/ has a ton of github data. We'll see what can be done.

Updates

Update1: GH support answered, it seems they can restore. I had to rename nelmio to nelmio-old for now to make room for the restore, so clones with old URLs will temporarily fail again.

Update2: All restored, I only have to re-create teams within the org, no biggie :) Total time was a bit over 1h from deletion, which isn't too bad in the grand scale of things.

P.S: If you feel like comparing this to the left-pad incident, this is quite different because I fucked up accidentally while the guy in question did it intentionally. I guess I can't stop you though.

P.P.S: If you want to laugh or say anything mean, please go away. I don't want to hear it right now. It is stressful enough as it is, and I am doing what I can.

May 31, 2016 // News

Common files in PHP packages

This one started in a peculiar way. Paul M. Jones announced a new version of his Producer tool, I had a look at it and saw that it recommended having a changelog called CHANGES.md by default. This irked me a bit because I always use CHANGELOG.md and hardly ever see that as a file name (it's the little things that matter, right?).

My first thought was to report an issue asking to change the default, but then I thought it's Paul, he will not just take my word for it, he will want hard facts. So here I am two days later. I queried GitHub's API for the file listing (only the root directory) of all PHP packages listed on packagist.org.

Continue reading...

April 21, 2016 // PHP

Composer goes Gold

Five years ago today, Composer was born. In some ways it feels like yesterday, at least it doesn't feel like five years went by. In other ways it seems like a lifetime ago, and I can barely remember what it was like to write PHP code without having a whole ecosystem at my fingertips.

Composer 1.0.0

Today I have the pleasure of announcing that the first 1.0.0 stable release is out and available for immediate download!

It has been a long time coming, but we fixed a few last critical issues in the last few months that finally allow me to take this step. Going forward I plan on releasing more frequently as well ;)

Update channels

One big change that happened recently is that the Composer installer and composer self-update both install stable releases by default. This is great to avoid bad surprises if you run self-update as part of your deployment, but it also means that the feedback loop gets longer for us when we make changes. Therefore I really hope that we can get enough people running frequent self-updates using the preview (alpha/beta/..) and especially snapshot (dev builds) channels.

My recommendation would be to run regular updates for deployment/builds to have stability, run self-update --preview in CI if you can to make sure you test at least pre-release versions. And on dev environments composer self-update --snapshot would give you the latest and shiniest Composer has to offer. This will ensure we spot regressions or mistakes as early as possible, and thus avoid breaking things in stable releases.

Composer Gold Edition

Finally, in an attempt to mark the fact that Composer has finally gone gold, I wanted to do something special.

My girlfriend had a brilliant idea, and a few days and a couple express deliveries later here we are. We made an actual Composer gold master copy of the 1.0 release, on a floppy!

Collector items are no fun if you can't collect them though, so you can head to eBay now to bid on it if you'd like to own it!

Here's to the next five years (for the 2.0, hah.)!

April 04, 2016 // News, PHP

Toran Proxy Updates

Over the last month I spent quite some time bringing Toran Proxy up to speed with the times, and added a few features along the way. I haven't blogged about it in a while so I thought an update was overdue.

Toran what?

First of all, a quick note about Toran Proxy, in case you don't know about it. You can check the website for details, but in short it is a way to host private packages, as well as to mirror github, packagist and others so that if they break down you can still run composer installs from your Toran setup. It is a paid product but the money goes to fund Composer development and Packagist hosting as well, so you will hopefully agree it is for a good cause ;)

Drupal, Magento and WordPress support

v1.3 added the capability to mirror other public repositories, like the WPackagist one for WordPress, the Firegento repo for Magento or Drupal's Packagist setup. These projects have large plugin ecosystems and they have chosen to publish them on their own repositories instead of using Packagist. Toran now lets you add these in the settings so that you can mirror public packages transparently no matter if they come from Packagist or another public repo.

Performance and UI improvements

It used to be a bit slow to run updates with many packages, as it was hitting the PHP application for every package. This has been fixed and updates should now run a lot faster.

As for UI, the new release brings an actual package detail page for your private packages so you can see which versions are available and what they require, as well as trigger instant updates from the UI.


If you haven't yet, go try it out with the personal edition and I hope you will then consider getting a license to use it in your company!

April 04, 2016 // News, PHP

The Road to Monolog 2.0

Monolog's first commit was on February 17th, 2011. That is almost 5 years ago! I have now been thinking for quite a while that it would be nice to start on a v2, and be able to drop some baggage.

One of the main questions when doing a major release is which minimum PHP version to support going forward. Last summer I decided I wanted to do a big jump from 5.3 and directly target PHP 7. It provides a lot of nice features as well as performance improvements, and as Monolog is one of the most installed packages on Packagist I wanted to help nudge everyone towards PHP 7.

Back then 7.0 was not out though, so I played around a bit but I did not make much progress. Another point that was limiting me was that I did not want to bother people adding Monolog to their project via composer require monolog/monolog, as that used to just take the last release available.

However PHP 7.0 is now out, and as you may have seen in my previous post I have fixed the issue in composer require. I also emailed several projects that had dangerous requirements on Monolog a few months ago to ensure they would not upgrade to the 2.0 version accidentally.

The road forward

Monolog's master branch now targets PHP 7, and the branch-alias has been updated to 2.0 so work can now fully begin on the upcoming version. There is an old issue with a list of ideas and tasks for 2.0, but I am open to more ideas. There is also a 2.0 milestone with some more issues and PRs that have to be considered for inclusion.

If you use Monolog a lot and have thoughts on what should change in the design, please open an issue! If you want to help, grab one of those tasks (except those that aren't clear or still need to be decided on) and send a pull request! It's a great chance to play with PHP 7 features if you haven't yet. I took care of some things already but there is plenty more to be done and I definitely can't do it alone.

A word of caution

Please check your composer.json: if you require monolog/monolog dev-master you will have issues next time you update! Please fix that immediately and use ^1.17 instead, as it will ensure you don't upgrade to 2.0 accidentally.
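To illustrate why ^1.17 is safe here, the sketch below models the caret rule for major versions 1 and up: ^1.17 accepts anything from 1.17.0 up to, but not including, 2.0.0. It is a simplified illustration in Python and not Composer's own code; the function names are made up, and it only handles plain numeric stable versions (no 0.x rules, no pre-release or dev versions).

```python
def parse(version: str) -> tuple:
    """Turn "1.17" or "1.17.2" into a (major, minor, patch) tuple."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def caret_allows(constraint: str, candidate: str) -> bool:
    """Check a caret constraint such as "^1.17" (major version >= 1 only)."""
    lower = parse(constraint.lstrip("^"))
    if lower[0] < 1:
        raise ValueError("0.x constraints follow different rules; not handled here")
    upper = (lower[0] + 1, 0, 0)  # next major version is excluded
    return lower <= parse(candidate) < upper


if __name__ == "__main__":
    for candidate in ("1.16.9", "1.17.0", "1.19.2", "2.0.0"):
        print(candidate, caret_allows("^1.17", candidate))
```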

Supporting the past

Obviously, not everyone will upgrade to PHP 7 immediately, and Monolog v2 will probably not be ready and stable for a few months, so Monolog 1 will still be maintained. I don't have a concrete date in mind of when the maintenance will stop, but it is anyway pretty stable so I don't think maintaining it will be a big deal.

There is now a 1.x branch where bug fixes and features applicable to both versions should go, and 1.x releases will be created from there in the future.

December 18, 2015 // News, PHP

New Composer Patterns

Here is a short update on some nice little features that have become available in the last year in Composer.

Checking dependencies for bad patterns

You may know about the composer validate command, but did you know about its new --with-dependencies / -A flag? It lets you validate both your project and all your dependencies at once!

This is quite nice to check if any of your dependencies has unsafe requirements like >= or similar issues. You can also combine it with --strict to make sure that any warning results in a failure exit code, so you can detect warnings in your CI builds for example by checking the command exit code.

Try it out: composer validate -A --strict

Referencing scripts to avoid duplication

You can now reference other scripts by name to avoid having to define the exact same script command in multiple places (e.g. post-update-cmd and post-install-cmd is a common pattern). See the docs for an example. This could be applied to the symfony standard composer.json for example. The referenced script can even be an array of scripts!

Defining your target production environment in composer.json

The config.platform option lets you emulate which platform packages you have available on your prod environment. That way even if you have a more recent PHP version or are missing an extension locally for example, composer will always resolve packages assuming that you have the packages you declared installed.

Let's take a concrete example. If I am running PHP 5.6 in production but use PHP 7 to develop on my machine, I might end up installing a package that depends on PHP 7 and not notice the problem until I deploy and things break on the server. Obviously it is better to develop with the exact same versions to avoid any surprises but this isn't always practical and especially when working on open source libraries I think many don't use VMs but instead work with whatever PHP they have on their host system.

In Composer for example we want to guarantee that we at least work with PHP 5.3, so we tell Composer to fake the PHP version to be 5.3.9 when running updates, no matter what PHP version you run it with. If we did not do this, the symfony/console package we depend on would upgrade to v3; but as symfony/console v3 requires at least PHP 5.5, it does not happen thanks to the platform config.

Excluding paths from the optimized classmap

When you run composer dump-autoload -o to get an optimized autoloader, Composer scans all files and builds a huge classmap, even for packages that define autoload rules as psr-0 or psr-4. This is great but in some cases you have some classes in the psr-4 path that you actually don't want to be included in this optimized map. One typical example of this would be Symfony2 bundles that follow the best practices layout of having all sources at the root of the repo. In this case the psr-4 path is "" (repo root) and there is a Tests/ folder which contains the test classes. Obviously in production we don't want to include those test classes in the optimized class map as it is just a waste. Adding the second line here to the autoload config will make sure they are not included:

"autoload": {
    "psr-4": { "Nelmio\\CorsBundle\\": "" },
    "exclude-from-classmap": ["/Tests/"]
},

Requiring packages easily and safely

For quite a while now we have had the ability of running composer require some/package without specifying the version and Composer just figures out the best requirement for you. However this came with a catch, as it always picked the latest version available. This usually works but if the latest version requires a newer PHP version than what you have on your machine it would actually fail. I fixed that and it now looks at your PHP version (or config.platform.php value) to determine which is the best version to install. This is great because it enables package authors to require PHP 7 in their new package version for example and anyone using composer require will not accidentally get this newer version installed until they are ready and using PHP 7 themselves. More on that note soon!
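The selection rule described above boils down to "newest release that my PHP version can run". Here is a deliberately simplified Python sketch of that idea. It is not how Composer's solver works internally: it only models a single minimum PHP version per release, and the package data and function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """Turn "5.6.20" into (5, 6, 20) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def best_release(releases: Dict[str, str], platform_php: str) -> Optional[str]:
    """Pick the newest release whose minimum PHP version the platform meets."""
    php = as_tuple(platform_php)
    compatible = [
        release for release, min_php in releases.items()
        if as_tuple(min_php) <= php
    ]
    return max(compatible, key=as_tuple, default=None)


# Hypothetical package: release -> minimum PHP version it requires.
RELEASES = {"1.17.2": "5.3.0", "1.19.0": "5.3.0", "2.0.0": "7.0.0"}

if __name__ == "__main__":
    for php in ("5.6.20", "7.0.6", "5.2.17"):
        print(php, "->", best_release(RELEASES, php))
```

With PHP 5.6 the 2.0.0 release is skipped and the newest 1.x is chosen; only a PHP 7 platform gets 2.0.0.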

I hope these tips helped bring a bit more attention to those cool new features we have added!

December 18, 2015 // News, PHP

PHP Versions Stats - 2015 Edition

It's that time of the year again, where I figure it's time to update my yearly data on PHP version usage. Last year's post showed 5.5 as the main winner and 5.3 declining rapidly. Let's see what 2015 brought.

A quick note on methodology, because all these stats are imperfect as they just sample some subset of the PHP user base. I look in the packagist.org logs of the last 28 days for GET /packages.json which represents a composer update done by someone. Composer sends the PHP version it is running with in its User-Agent header, so I can use that to see which PHP versions people are using Composer with. Of course this data set is probably biased towards development machines and CI servers and as such it should also be taken with a grain of salt.

PHP usage statistics

I have two datasets, from November 2014 and today, which show the progression of various versions. Any version below 3% usage has been removed to keep things readable.

November 2014

All versions                       Grouped
Total       11556916  100.00%      Total    11556916  100.00%
PHP 5.5.9    2475970   21.42%      PHP 5.5   5647892   48.87%
PHP 5.4.4    1022498    8.85%      PHP 5.4   3305929   28.61%
PHP 5.5.17    678997    5.88%      PHP 5.3   1716653   14.85%
PHP 5.5.16    529227    4.58%      PHP 5.6    886260    7.67%
PHP 5.3.3     509101    4.41%
PHP 5.3.10    479750    4.15%
PHP 5.6.0     391633    3.39%

November 2015

All versions                       Grouped
Total       14539303  100.00%      Total    14539303  100.00%
PHP 5.5.9    4307667   29.63%      PHP 5.5   7368033   50.68%
PHP 5.6.14    818735    5.63%      PHP 5.6   3211919   22.09%
PHP 5.3.3     669327    4.60%      PHP 5.4   2305984   15.86%
PHP 5.4.45    573003    3.94%      PHP 5.3   1439061    9.90%
PHP 5.6.13    492995    3.39%      PHP 7.0    169411    1.17%

And here are pretty pies thanks to Ashley Hindle

A few observations: 5.3 lost 5% which is good but now I guess we are on a long tail decline of Ubuntu 12.04 machines, plus a lot of libs still test against it on Travis which might bias the numbers a bit. 5.5 is still the major platform with a stable 50%, and 5.6 adoption gained 15% that were lost by 5.4. We also see 7.0 appearing slowly, mostly I assume from travis builds again.

PHP requirements in Packages

The second dataset is which versions are required by all the PHP packages present on packagist. I only check the require statement in their current master version to see what the latest is.

PHP Requirements - Current Master - November 2015 (+/- diff from November 2014)

5.2 1367 2.78% (-0.8%)
5.3 25376 51.69% (-16.17%)
5.4 16418 33.45% (+7.04%)
5.5 5002 10.19% (+8.18%)
5.6 826 1.68% (+1.54%)
7.0 99 0.2% (+0.2%)

A few observations: 5.3 lost quite a bit of ground but it seems to go to both 5.4 and 5.5. Given that 5.4 usage is going down quite a bit I think it's safe to go from 5.3 to 5.5 directly if you are going to bump the version requirement, or I'd even argue for 5.6 as its usage is going up quite strongly and Ubuntu 16.04 should help that as well.

I think PHP 7 should be required more as well, as it comes with quite a few nifty features. I would say it is a good target for a new major version of any lib, but more on that in another post.

November 23, 2015 // PHP

MySQL's GROUP_CONCAT limitations and cascading bad luck

We had an incident today over at Teamup (where I have worked for the last 9 months by the way:) which is worth a quick blog post if it helps save anyone from having a bad day.

We are using MySQL's GROUP_CONCAT feature to fetch a list of ids to delete when cleaning up old demo calendars. You end up with a list of ids in one row, easy to fetch, split it on commas, and done. So far so good. Then we run a few DELETE ... WHERE id IN (...) queries to clean things up in a few tables. So far so good.

However, if you fail to read the fine print in the MySQL docs, you might not have seen this sentence: "The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024." What this means is that a query that worked just fine in testing conditions suddenly started failing in production once the data set hit a critical size. Thanks to another stroke of bad luck, it returned a list of ids truncated right after a comma (3,4,5,) so we had an empty id in our WHERE IN (3,4,5,) clause. Unfortunately, combined with the fact that we had optional relations in some tables (I won't bore you with details), that empty match made it wipe about 60% of the data in those.

Thankfully we have backups on top of the DB replication which let us recover the lost data pretty quickly, and it only affected a small feature in the grand scale of things, but this could have ended much worse so it is worth pointing out a few things:

  • If you use GROUP_CONCAT and expect large amounts of data returned, make sure to increase the limit before executing your query. For example, this sets both the max length for the group concat and the max packet length (which caps the former) to 10MB: SET SESSION group_concat_max_len = 10485760, SESSION max_allowed_packet = 10485760; (use more if you think you need more).
  • Maybe for safety using GROUP_CONCAT should be avoided if you don't know how much data to expect, simply fetching ids and then fetching all rows at the program level does the job too.
  • Do snapshot backups even if you have replication in place, it can save your ass!
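On top of the points above, a defensive check at the application level would have caught this before any DELETE ran. The following is a hypothetical sketch in Python, not the code we run at Teamup: it refuses any result that has reached the length limit (it may have been cut anywhere, including in the middle of a number) and any list that contains an empty or non-numeric entry.

```python
def parse_id_list(raw: str, max_len: int = 1024) -> list:
    """Split a comma-separated id string, refusing results that look truncated.

    max_len should match the group_concat_max_len in effect (1024 by default).
    """
    if not raw:
        return []  # nothing to delete
    if len(raw) >= max_len:
        raise ValueError("result reached group_concat_max_len; it may be truncated")
    ids = []
    for piece in raw.split(","):
        if not piece.strip().isdecimal():
            # An empty piece is exactly what a trailing comma produces.
            raise ValueError(f"unexpected id {piece!r} in list")
        ids.append(int(piece))
    return ids


if __name__ == "__main__":
    print(parse_id_list("3,4,5"))
    try:
        parse_id_list("3,4,5,")
    except ValueError as error:
        print("refused:", error)
```

The length check matters more than the parsing: a list cut in the middle of an id, say 512 turned into 51, still parses fine and would delete the wrong row.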

And now to hope for a more quiet rest of the week!

Edit: There is some good news, MySQL 5.8 might include a fix and turn the current warning for truncation into an error, see http://bugs.mysql.com/bug.php?id=78041

August 11, 2015 // PHP, Web
