Last edited 5 months ago
by Max Mustermann

ExternalContent space

Page outdated: This page is outdated. Please review this page and approve it.


https://github.com/hallowelt/migrate-confluence/blob/main/README.md

# Migrate Confluence XML export to MediaWiki import data

This is a command line tool to convert the contents of a Confluence space into a MediaWiki import data format. See also the [official BlueSpice Helpdesk entry](https://en.wiki.bluespice.com/wiki/Confluence_migration).

## Docker
The migrate-confluence tool is available as a Docker image.

## Workflow

### Export "space" from Confluence
1. Create an export of your Confluence space

Step 1:

<kbd>![Export 1][c001]</kbd>

Step 2:

<kbd>![Export 2][c002]</kbd>

Step 3:

<kbd>![Export 3][c003]</kbd>

2. Save it to a location that is accessible by this tool (e.g. `/tmp/confluence/input/Confluence-export.zip`)
3. Extract the ZIP file (e.g. `/tmp/confluence/input/Confluence-export`)
	1. The folder should contain the files `entities.xml` and `exportDescriptor.properties`, as well as the folder `attachments`

[c001]: doc/images/Confluence_export_space_001.png
[c002]: doc/images/Confluence_export_space_002.png
[c003]: doc/images/Confluence_export_space_003.png
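The expected directory layout can be sanity-checked with a small shell sketch like the following (the function name is ours, not part of the tool; it only verifies the files listed in step 3 above):

```shell
# Sanity check for an extracted Confluence export directory -- a sketch based
# on the expected files listed above (entities.xml, exportDescriptor.properties,
# attachments/).
check_confluence_export() {
	dir="$1"
	if [ -f "$dir/entities.xml" ] && [ -f "$dir/exportDescriptor.properties" ] && [ -d "$dir/attachments" ]; then
		echo "ok: $dir looks like an extracted Confluence space export"
	else
		echo "error: $dir is missing entities.xml, exportDescriptor.properties or attachments/" >&2
		return 1
	fi
}

# usage: check_confluence_export /tmp/confluence/input/Confluence-export
```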

### Migrate the contents
1. Create the "workspace" directory (e.g. `/tmp/confluence/workspace/` )
2. From the parent directory (e.g. `/tmp/` ), run the migration commands
	1. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest analyze --src=/data/input --dest=/data/workspace` to create "working files". After the script has run, you can review those files and adjust them if required (e.g. to apply structural changes).
	2. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest extract --src=/data/input --dest=/data/workspace` to extract all contents, like wikipage contents, attachments and images into the workspace
	3. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert --src=/data/workspace --dest=/data/workspace` (yes, `--src=/data/workspace`) to convert the wikipage contents from Confluence Storage XML to MediaWiki WikiText. For large spaces, see [Parallel convert](#parallel-convert) below.
	4. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest compose --src=/data/workspace --dest=/data/workspace` (yes, `--src=/data/workspace`) to create importable data

If you re-run the scripts, you will need to clean up the "workspace" directory first!
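That cleanup can be sketched as follows, using the example paths from above:

```shell
# Reset the workspace before a re-run (example paths from the workflow above).
WORKSPACE=/tmp/confluence/workspace
rm -rf "$WORKSPACE"
mkdir -p "$WORKSPACE"
```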

### Import into MediaWiki
1. Copy the "workspace/result" directory (e.g. `/tmp/confluence/workspace/result/`) to your target wiki server (e.g. `/tmp/result`)
2. Go to your MediaWiki installation directory
3. Make sure you have the target namespaces set up properly. See `workspace/space-id-to-prefix-map.php` for reference.
4. Make sure [$wgFileExtensions](https://www.mediawiki.org/wiki/Manual:$wgFileExtensions) is set up properly. See `workspace/attachment-file-extensions.php` for reference.
5. Use `php maintenance/importImages.php /tmp/result/images/` to first import all attachment files and images
6. Use `php maintenance/importDump.php /tmp/result/pages.xml` to import the actual pages

You may need to update your MediaWiki search index afterwards.
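For steps 3 and 4, the `LocalSettings.php` setup might look like the following sketch. The namespace ID, namespace names and file extensions below are hypothetical placeholders; the real values come from `workspace/space-id-to-prefix-map.php` and `workspace/attachment-file-extensions.php`.

```php
// LocalSettings.php -- hypothetical values for illustration only
define( 'NS_MYSPACE', 3000 );      // namespace ID, see space-id-to-prefix-map.php
define( 'NS_MYSPACE_TALK', 3001 ); // the talk namespace is always ID + 1
$wgExtraNamespaces[NS_MYSPACE] = 'MYSPACE';
$wgExtraNamespaces[NS_MYSPACE_TALK] = 'MYSPACE_talk';

// Additional file extensions, see attachment-file-extensions.php
$wgFileExtensions = array_merge( $wgFileExtensions, [ 'docx', 'xlsx', 'pptx' ] );
```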

#### Config file
The commands `analyze`, `extract` and `convert` can be configured via a YAML file; see `/doc/config.sample.yaml` for an example.
The configuration file is applied by adding the option `--config /data/config.yaml`.

Not all parameters of `config.sample.yaml` have to be present in your config file; for every parameter that is omitted, the default value is used.

#### Parallel convert

For large Confluence spaces the `convert` step can be slow. You can speed it up by running multiple worker processes in parallel using the `--workers` option.

```bash
docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert \
  --src=/data/workspace --dest=/data/workspace \
  --workers=4
```

The command spawns the requested number of child processes automatically. Each worker handles a disjoint slice of the file list, so every file is converted exactly once. Progress lines are prefixed with `[Worker N]` so you can follow each process individually. If any worker fails the command exits with a non-zero status and reports which workers were affected.

Choose `--workers` based on the number of available CPU cores. A value between 2 and 8 is typical; there is no benefit in exceeding the number of cores on your machine.

> **Note:** `--workers=1` (the default) behaves identically to running without the option — no child processes are spawned.

#### Extension:NSFileRepo compatibility
There is now compatibility with the MediaWiki extension https://www.mediawiki.org/wiki/Extension:NSFileRepo, which restricts access to files and images to a given set of user groups associated with protected namespaces.

If NSFileRepo is used, the images cannot be uploaded with the script `maintenance/importImages.php`; use `extensions/NSFileRepo/maintenance/importFiles.php` instead.

Example: `php extensions/NSFileRepo/maintenance/importFiles.php /tmp/result/images/`

#### User spaces
In Confluence, user spaces are protected. In MediaWiki this is not possible for the namespace `User`. Therefore user spaces are migrated to a namespace `User<username>`, which can be protected in BlueSpice for MediaWiki.

#### Included MediaWiki wikitext templates
- `AttachmentsSectionEnd`
- `AttachmentsSectionStart`
- `Details`
- `DetailsSummary`
- `Excerpt`
- `ExcerptInclude`
- `Info`
- `InlineComment`
- `Layout`
- `Layouts.css`
- `Note`
- `Panel`
- `RecentlyUpdated`
- `SubpageList`
- `SubpageListRow`
- `Tip`
- `Warning`
- `PageTree`
- `SpaceDetails`
- `ViewFile`

Be aware that those pages may be overwritten by the import if they already exist in the target wiki.

#### Included upload files
- `Icon-info.svg`
- `Icon-note.svg`
- `Icon-tip.svg`
- `Icon-warning.svg`

Be aware that those files may be overwritten by the import if they already exist in the target wiki.

#### MediaWiki settings
In case your pages contain a lot of external images (`<img />` elements), be aware that MediaWiki does not show them by default. You need to configure `$wgAllowExternalImages`.
Read https://www.mediawiki.org/wiki/Manual:$wgAllowExternalImages for more information.
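A minimal `LocalSettings.php` fragment for this could be (the second, commented-out setting is the more restrictive prefix-based alternative described on the manual page):

```php
// LocalSettings.php -- render external <img> URLs in wikitext
$wgAllowExternalImages = true;
// or, more restrictively, allow only specific URL prefixes:
// $wgAllowExternalImagesFrom = [ 'https://intranet.example.com/' ];
```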

#### Jira interwiki links
Confluence pages that contain Jira macros are converted to use MediaWiki [interwiki links](https://www.mediawiki.org/wiki/Manual:Interwiki). Two separate prefixes are used because Jira issue keys and JQL queries have different URL patterns:

| Interwiki prefix | Purpose | Example URL pattern |
|---|---|---|
| `jira` | Link to a specific Jira issue by key | `https://jira.example.com/browse/$1` |
| `jira-jql` | Link to a Jira issue list filtered by JQL | `https://jira.example.com/issues/?jql=$1` |

Add both entries to the `interwiki` table of your MediaWiki database, or configure them via [`$wgExtraInterlanguageLinkPrefixes`](https://www.mediawiki.org/wiki/Manual:$wgExtraInterlanguageLinkPrefixes) and the interwiki cache. Replace `https://jira.example.com` with the base URL of your Jira instance.
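At the database level, the two rows might look like the following sketch against the standard MediaWiki `interwiki` table (the base URL is a placeholder; replace it with your Jira instance):

```sql
-- Example rows only; adjust the base URL for your Jira instance
INSERT INTO interwiki (iw_prefix, iw_url, iw_api, iw_wikiid, iw_local, iw_trans)
VALUES
  ('jira',     'https://jira.example.com/browse/$1',      '', '', 0, 0),
  ('jira-jql', 'https://jira.example.com/issues/?jql=$1', '', '', 0, 0);
```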

#### Required MediaWiki extensions
The output generated by the tool contains certain elements that need additional extensions to be enabled.

1. [TemplateStyles](https://www.mediawiki.org/wiki/Extension:TemplateStyles)
2. [ParserFunctions](https://www.mediawiki.org/wiki/Extension:ParserFunctions)
3. [DateTimeTools](https://www.mediawiki.org/wiki/Extension:DateTimeTools)
4. [Checklists](https://www.mediawiki.org/wiki/Extension:Checklists)
5. [SimpleTasks](https://www.mediawiki.org/wiki/Extension:SimpleTasks)
6. [EnhancedUploads](https://www.mediawiki.org/wiki/Extension:EnhancedUploads)
7. [Semantic MediaWiki](https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki)
8. [HeaderTabs](https://www.mediawiki.org/wiki/Extension:HeaderTabs)
9. [SubPageList](https://www.mediawiki.org/wiki/Extension:SubPageList)
10. [TableTools](https://www.mediawiki.org/wiki/Extension:TableTools)
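In `LocalSettings.php` this typically comes down to the following sketch. Note that Semantic MediaWiki is installed via Composer and enabled with `enableSemantics()` rather than `wfLoadExtension()`; see the linked pages for the installation steps of each extension.

```php
// LocalSettings.php
wfLoadExtension( 'TemplateStyles' );
wfLoadExtension( 'ParserFunctions' );
wfLoadExtension( 'DateTimeTools' );
wfLoadExtension( 'Checklists' );
wfLoadExtension( 'SimpleTasks' );
wfLoadExtension( 'EnhancedUploads' );
wfLoadExtension( 'HeaderTabs' );
wfLoadExtension( 'SubPageList' );
wfLoadExtension( 'TableTools' );
enableSemantics( 'example.org' ); // Semantic MediaWiki, installed via Composer
```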

#### Recommended MediaWiki extensions
These extensions are not strictly required but are recommended for full compatibility with the migrated content.

1. [WikiMarkdown](https://www.mediawiki.org/wiki/Extension:WikiMarkdown) - Renders `<markdown>` tags produced from Confluence markdown macros

### Manual post-import maintenance
#### Cleanup Categories
In case the tool cannot migrate content or functionality, it creates a category so you can manually fix issues after the import:
- `Broken_link`
- `Broken_user_link`
- `Broken_page_link`
- `Broken_image`
- `Broken_layout`
- `Broken_macro/<macro-name>`


## Not migrated
- User identities
- Comments
- Various macros
- Various layouts
- Blog posts
- Files of a space which cannot be assigned to a page

## Creating a build
1. Clone this repo
2. Run `composer update --no-dev`
3. Run `box compile` to create the PHAR file in `dist/`. See also https://github.com/humbug/box

# TODO
* Reduce multiple linebreaks (`<br />`) to one
* Remove line breaks and arbitrary formatting (e.g. `<b>`) from headings
* Mask external images (`<img />`)
* Preserve filename of "Broken_attachment"
* Merge multiple `<code>` lines into `<pre>`
* Remove bold/italic formatting from wikitext headings (e.g. `=== '''Some heading''' ===`)
* Fix unconverted HTML lists in wikitext (e.g. `<ul><li>==== Lorem ipsum ====</li><li>'''<span class="confluence-link"> </span>[[Media:Some_file.pdf]]'''</li></ul><ul>`)
* Remove empty confluence storage format fragments (e.g. `<span class="confluence-link"> </span>`, `<span class="no-children icon">`)

1 Embed link to json file and line numbers

https://github.com/hallowelt/mwstake-mediawiki-component-generictaghandler/blob/master/rest-routes.json

```json
[
	{
		"method": "GET",
		"path": "/mws/v1/tags",
		"class": "MWStake\\MediaWiki\\Component\\GenericTagHandler\\Rest\\ListTagsHandler",
		"services": [ "MWStake.GenericTagHandler.TagFactory", "MWStake.InputProcessor.Factory" ]
	},
	{
		"path": "/mws/v1/tags/parse/{tag}",
		"method": "POST",
		"class": "MWStake\\MediaWiki\\Component\\GenericTagHandler\\Rest\\RenderTagHandler",
		"services": [ "MWStake.GenericTagHandler.TagFactory", "TitleFactory" ]
	}
]
```


2 Embed link to js file with lines

var cache = { // eslint-disable-line no-var
	data: {},
	set: function ( key, data ) {
		cache.data[ key ] = data;
	},
	get: function ( key, defaultValue ) {
		return cache.data[ key ] || defaultValue;
	},
	has: function ( key ) {
		return cache.data[ key ] !== undefined;
	},
	delete: function ( key ) {
		if ( cache.has( key ) ) {
			delete ( cache.data[ key ] );
		}
	},
	getCachedPromise: function ( key, callback ) {
		if ( cache.has( key ) ) {
			return cache.get( key );
		}
		const promise = callback();
		cache.set( key, promise );
		promise.done( () => {
			cache.delete( key );
		} );

		return promise;
	}
};

function querySingle( store, property, value, cacheKey, recache, additionalParams ) {
	const dfd = $.Deferred();
	if ( !value || typeof value !== 'string' || value.length < 2 ) {
		return dfd.resolve( {} ).promise();
	}

	if ( !recache && cache.has( cacheKey ) ) {
		dfd.resolve( cache.get( cacheKey ) );
		return dfd.promise();
	}
	mws.commonwebapis[ store ].query( '', Object.assign( {
		filter: JSON.stringify( [
			{
				type: 'string',
				value: value,
				operator: 'eq',
				property: property
			}
		] ),
		limit: 1
	}, additionalParams || {} ) ).done( ( response ) => {
		if ( response.length > 0 ) {
			dfd.resolve( response[ 0 ] );
			return;
		}
		dfd.resolve( {} );
	} ).fail( ( err ) => {
		dfd.resolve( err );
	} );

	return dfd.promise();
}

function queryStore( store, params, cacheKey ) {
	const dfd = $.Deferred();
	const req = $.ajax( {
		method: 'GET',
		url: mw.util.wikiScript( 'rest' ) + '/mws/v1/' + store,
		data: params
	} ).done( ( response ) => {
		if ( response && response.results ) {
			for ( let i = 0; i < response.results.length; i++ ) {
				const result = response.results[ i ];
				if ( !cacheKey ) {
					continue;
				}
				// Replace named placeholders in curly braces with actual values
				const key = cacheKey.replace( /\{([^}]+)\}/g, ( match, p1 ) => result[ p1 ] );
				// if cache key contains a placeholder that is not in the result, skip
				if ( key.indexOf( '{' ) !== -1 ) {
					continue;
				}
				cache.set( key, result );
			}
			dfd.resolve( response.results );
			return;
		}
		dfd.resolve( [] );
	} ).fail( ( err ) => {
		dfd.resolve( err );
	} );
	return dfd.promise( { abort: function () {
		req.abort();
	} } );
}

mws = window.mws || {};
mws.commonwebapis = {
	user: {
		query: function ( query, params ) {
			if ( query ) {
				params = params || {};
				params.query = query;
			}
			return queryStore( 'user-query-store', params, 'user-data-{user_name}' );
		},
		getByUsername: function ( username, recache ) {
			return cache.getCachedPromise( 'promise-user-data-' + username, () => querySingle( 'user', 'user_name', username, 'user-data-' + username, recache ) );
		}
	},
	group: {
		query: function ( query, params ) {
			if ( query ) {
				params = params || {};
				params.query = query;
			}
			return queryStore( 'group-store', params, 'group-{group_name}' );
		},
		getByGroupName: function ( groupname, recache ) {
			return cache.getCachedPromise( 'promise-group-data-' + groupname, () => querySingle(
				'group', 'group_name', groupname, 'group-' + groupname, recache, {
					allowEveryone: true
				}
			) );
		}
	},
	title: {
		query: function ( query, params ) {
			return cache.getCachedPromise( 'promise-title-query', () => queryStore( 'title-query-store', Object.assign( { query: query }, params || {} ) ) );
		},
		getByPrefixedText: function ( prefixedText, recache ) {
			return cache.getCachedPromise( 'promise-title-data-' + prefixedText, () => querySingle(
				'title', 'prefixed', prefixedText, 'title-' + prefixedText, recache
			) );
		}
	},
	file: {
		query: function ( query, params ) {
			return cache.getCachedPromise( 'promise-file-query', () => queryStore( 'file-query-store', Object.assign( { query: query }, params || {} ) ) );
		}
	},
	category: {
		query: function ( query, params ) {
			return cache.getCachedPromise( 'promise-category-query', () => queryStore( 'category-query-store', Object.assign( { query: query }, params || {} ) ) );
		}
	}
};

3 Embed link to php file with render option

```php
<?php

namespace MWStake\MediaWiki\Component\CommonWebAPIs;

use MediaWiki\MediaWikiServices;

class Setup {
	public static function onExtensionFunctions() {
		$endpointManager = MediaWikiServices::getInstance()->getService(
			'MWStakeCommonWebAPIsEndpointManager'
		);
		$endpointManager->enableEndpoints();
	}
}
```