# Migrate Confluence XML export to MediaWiki import data
This is a command line tool to convert the contents of a Confluence space into a MediaWiki import data format. See also the [official BlueSpice Helpdesk entry](https://en.wiki.bluespice.com/wiki/Confluence_migration).
## Docker
The `migrate-confluence` tool is available as a Docker image.
## Workflow
### Export "space" from Confluence
1. Create an export of your Confluence space
Step 1:
<kbd>![Export 1][c001]</kbd>
Step 2:
<kbd>![Export 2][c002]</kbd>
Step 3:
<kbd>![Export 3][c003]</kbd>
2. Save it to a location that is accessible to this tool (e.g. `/tmp/confluence/input/Confluence-export.zip`)
3. Extract the ZIP file (e.g. `/tmp/confluence/input/Confluence-export`)
1. The folder should contain the files `entities.xml` and `exportDescriptor.properties`, as well as the folder `attachments`
[c001]: doc/images/Confluence_export_space_001.png
[c002]: doc/images/Confluence_export_space_002.png
[c003]: doc/images/Confluence_export_space_003.png
### Migrate the contents
1. Create the "workspace" directory (e.g. `/tmp/confluence/workspace/` )
2. From the parent directory (e.g. `/tmp/` ), run the migration commands
1. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest analyze --src=/data/input --dest=/data/workspace` to create the "working files". After the command has finished, you can review those files and adjust them if required (e.g. to apply structural changes).
2. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest extract --src=/data/input --dest=/data/workspace` to extract all contents (wikipage contents, attachments and images) into the workspace.
3. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert --src=/data/workspace --dest=/data/workspace` (yes, `--src /data/workspace/` ) to convert the wikipage contents from Confluence Storage XML to MediaWiki WikiText. For large spaces, see [Parallel convert](#parallel-convert) below.
4. Run `docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest compose --src=/data/workspace --dest=/data/workspace` (yes, `--src /data/workspace/` ) to create importable data
If you re-run the commands, you will need to clean up the "workspace" directory first!
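The four steps above can be wrapped in a small script. This is only a sketch: it assumes Docker is installed and that the input lies under `./confluence/input` as described above.

```shell
#!/bin/bash
set -e
IMAGE=bluespice/migrate-confluence:latest

# Start from a clean workspace; required when re-running the migration
rm -rf confluence/workspace
mkdir -p confluence/workspace

for step in analyze extract convert compose; do
    # analyze/extract read from the input; convert/compose work in-place on the workspace
    if [ "$step" = "analyze" ] || [ "$step" = "extract" ]; then
        src=/data/input
    else
        src=/data/workspace
    fi
    docker run -v "$(pwd)/confluence:/data" "$IMAGE" "$step" --src="$src" --dest=/data/workspace
done
```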
### Import into MediaWiki
1. Copy the `workspace/result` directory (e.g. `/tmp/confluence/workspace/result/`) to your target wiki server (e.g. `/tmp/result`)
2. Go to your MediaWiki installation directory
3. Make sure you have the target namespaces set up properly. See `workspace/space-id-to-prefix-map.php` for reference.
4. Make sure [$wgFileExtensions](https://www.mediawiki.org/wiki/Manual:$wgFileExtensions) is set up properly. See `workspace/attachment-file-extensions.php` for reference.
5. Use `php maintenance/importImages.php /tmp/result/images/` to first import all attachment files and images
6. Use `php maintenance/importDump.php /tmp/result/pages.xml` to import the actual pages
You may need to update your MediaWiki search index afterwards.
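On the wiki server, steps 5 and 6 look like this when run from the MediaWiki installation directory. The installation path and the search-index command are assumptions; which index rebuild script applies depends on your search backend.

```shell
cd /var/www/mediawiki            # your MediaWiki installation directory

# Import attachment files and images first
php maintenance/importImages.php /tmp/result/images/

# Then import the actual pages
php maintenance/importDump.php /tmp/result/pages.xml

# Rebuild the default database search index (CirrusSearch or
# BlueSpiceExtendedSearch installations use their own maintenance scripts)
php maintenance/rebuildtextindex.php
```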
#### Config file
It is possible to use a YAML file to configure the `analyze`, `extract` and `convert` commands. See `/doc/config.sample.yaml` for an example.
The configuration file can be applied by adding the option `--config /data/config.yaml`.
Not all parameters from `config.sample.yaml` have to be present in your config file. For any parameter that is omitted, the default value is used.
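For example, to run `analyze` with a config file placed next to the input data (the file name `config.yaml` is just an example; any path inside the mounted volume works):

```shell
docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest analyze \
    --src=/data/input --dest=/data/workspace \
    --config /data/config.yaml
```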
#### Parallel convert
For large Confluence spaces the `convert` step can be slow. You can speed it up by running multiple worker processes in parallel using the `--workers` option.
```bash
docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert \
--src=/data/workspace --dest=/data/workspace \
--workers=4
```
The command spawns the requested number of child processes automatically. Each worker handles a disjoint slice of the file list, so every file is converted exactly once. Progress lines are prefixed with `[Worker N]` so you can follow each process individually. If any worker fails the command exits with a non-zero status and reports which workers were affected.
Choose `--workers` based on the number of available CPU cores. A value between 2 and 8 is typical; there is no benefit in exceeding the number of cores on your machine.
> **Note:** `--workers=1` (the default) behaves identically to running without the option — no child processes are spawned.
#### Extension:NSFileRepo compatibility
The tool is compatible with the MediaWiki extension [NSFileRepo](https://www.mediawiki.org/wiki/Extension:NSFileRepo), which restricts access to files and images to a given set of user groups associated with protected namespaces.
If NSFileRepo is used, the images can not be imported with the script `maintenance/importImages.php`; use `extensions/NSFileRepo/maintenance/importFiles.php` instead.
Example: `php extensions/NSFileRepo/maintenance/importFiles.php /tmp/result/images/`
#### User spaces
In Confluence, user spaces are protected. In MediaWiki this is not possible for the `User` namespace. Therefore user spaces are migrated to a namespace `User<username>`, which can be protected in BlueSpice for MediaWiki.
#### Included MediaWiki wikitext templates
- `AttachmentsSectionEnd`
- `AttachmentsSectionStart`
- `Details`
- `DetailsSummary`
- `Excerpt`
- `ExcerptInclude`
- `Info`
- `InlineComment`
- `Layout`
- `Layouts.css`
- `Note`
- `Panel`
- `RecentlyUpdated`
- `SubpageList`
- `SubpageListRow`
- `Tip`
- `Warning`
- `PageTree`
- `SpaceDetails`
- `ViewFile`
Be aware that those pages may be overwritten by the import if they already exist in the target wiki.
#### Included upload files
- `Icon-info.svg`
- `Icon-note.svg`
- `Icon-tip.svg`
- `Icon-warning.svg`
Be aware that those files may be overwritten by the import if they already exist in the target wiki.
#### MediaWiki settings
In case your pages contain a lot of external images (`<img />` elements), be aware that MediaWiki does not show them by default. You'd need to configure `$wgAllowExternalImages`.
Read https://www.mediawiki.org/wiki/Manual:$wgAllowExternalImages for more information.
#### Jira interwiki links
Confluence pages that contain Jira macros are converted to use MediaWiki [interwiki links](https://www.mediawiki.org/wiki/Manual:Interwiki). Two separate prefixes are used because Jira issue keys and JQL queries have different URL patterns:
| Interwiki prefix | Purpose | Example URL pattern |
|---|---|---|
| `jira` | Link to a specific Jira issue by key | `https://jira.example.com/browse/$1` |
| `jira-jql` | Link to a Jira issue list filtered by JQL | `https://jira.example.com/issues/?jql=$1` |
Add both entries to the `interwiki` table of your MediaWiki database, or configure them via [`$wgExtraInterlanguageLinkPrefixes`](https://www.mediawiki.org/wiki/Manual:$wgExtraInterlanguageLinkPrefixes) and the interwiki cache. Replace `https://jira.example.com` with the base URL of your Jira instance.
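A minimal sketch of the corresponding rows in the `interwiki` table (replace the base URL with that of your Jira instance; the statement can be run e.g. via `php maintenance/sql.php`):

```sql
INSERT INTO interwiki (iw_prefix, iw_url, iw_api, iw_wikiid, iw_local, iw_trans)
VALUES
  ('jira',     'https://jira.example.com/browse/$1',      '', '', 0, 0),
  ('jira-jql', 'https://jira.example.com/issues/?jql=$1', '', '', 0, 0);
```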
#### Required MediaWiki extensions
The output generated by the tool contains certain elements that need additional extensions to be enabled.
1. [TemplateStyles](https://www.mediawiki.org/wiki/Extension:TemplateStyles)
2. [ParserFunctions](https://www.mediawiki.org/wiki/Extension:ParserFunctions)
3. [DateTimeTools](https://www.mediawiki.org/wiki/Extension:DateTimeTools)
4. [Checklists](https://www.mediawiki.org/wiki/Extension:Checklists)
5. [SimpleTasks](https://www.mediawiki.org/wiki/Extension:SimpleTasks)
6. [EnhancedUploads](https://www.mediawiki.org/wiki/Extension:EnhancedUploads)
7. [Semantic MediaWiki](https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki)
8. [HeaderTabs](https://www.mediawiki.org/wiki/Extension:HeaderTabs)
9. [SubPageList](https://www.mediawiki.org/wiki/Extension:SubPageList)
10. [TableTools](https://www.mediawiki.org/wiki/Extension:TableTools)
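A sketch of the corresponding `LocalSettings.php` entries. Exact installation steps vary per extension (Semantic MediaWiki and SubPageList are typically installed via Composer); treat this as illustrative only:

```php
// LocalSettings.php (sketch)
wfLoadExtension( 'TemplateStyles' );
wfLoadExtension( 'ParserFunctions' );
wfLoadExtension( 'HeaderTabs' );
// ... load the remaining extensions from the list above analogously
enableSemantics( 'wiki.example.com' ); // Semantic MediaWiki
```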
#### Recommended MediaWiki extensions
These extensions are not strictly required but are recommended for full compatibility with the migrated content.
1. [WikiMarkdown](https://www.mediawiki.org/wiki/Extension:WikiMarkdown) - Renders `<markdown>` tags produced from Confluence markdown macros
### Manual post-import maintenance
#### Cleanup Categories
In case the tool can not migrate some content or functionality, it will add the affected page to a category, so you can manually fix issues after the import:
- `Broken_link`
- `Broken_user_link`
- `Broken_page_link`
- `Broken_image`
- `Broken_layout`
- `Broken_macro/<macro-name>`
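After the import, you can list the affected pages per cleanup category, e.g. via the MediaWiki Action API. The wiki URL is a placeholder, and `Broken_link` stands for any of the categories above:

```shell
curl -s "https://wiki.example.com/w/api.php?action=query&list=categorymembers&cmtitle=Category:Broken_link&cmlimit=500&format=json"
```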
## Not migrated
- User identities
- Comments
- Various macros
- Various layouts
- Blog posts
- Files of a space which can not be assigned to a page
## Creating a build
1. Clone this repo
2. Run `composer update --no-dev`
3. Run `box compile` to actually create the PHAR file in `dist/`. See also https://github.com/humbug/box
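The whole build can be sketched as follows, assuming `composer` and `box` (humbug/box) are on your `PATH`:

```shell
git clone https://github.com/hallowelt/migrate-confluence.git
cd migrate-confluence
composer update --no-dev
box compile   # creates the PHAR file in dist/
```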
# TODO
* Reduce multiple linebreaks (`<br />`) to one
* Remove line breaks and arbitrary formatting (e.g. `<b>`) from headings
* Mask external images (`<img />`)
* Preserve filename of "Broken_attachment"
* Merge multiple `<code>` lines into `<pre>`
* Remove bold/italic formatting from wikitext headings (e.g. `=== '''Some heading''' ===`)
* Fix unconverted HTML lists in wikitext (e.g. `<ul><li>==== Lorem ipsum ====</li><li>'''<span class="confluence-link"> </span>[[Media:Some_file.pdf]]'''</li></ul><ul>`)
* Remove empty confluence storage format fragments (e.g. `<span class="confluence-link"> </span>`, `<span class="no-children icon">`)