Time |
Nickname |
Message |
00:21
🔗
|
|
Soni has quit IRC (Ping timeout: 264 seconds) |
00:22
🔗
|
|
Soni has joined #archiveteam |
00:40
🔗
|
|
BlueMax has joined #archiveteam |
00:45
🔗
|
|
vonguard has quit IRC (Read error: Operation timed out) |
02:15
🔗
|
|
Soni has quit IRC (Ping timeout: 264 seconds) |
02:30
🔗
|
|
Soni has joined #archiveteam |
03:43
🔗
|
|
qw3rty113 has joined #archiveteam |
03:45
🔗
|
|
odemg has quit IRC (Read error: Operation timed out) |
03:48
🔗
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
04:00
🔗
|
|
odemg has joined #archiveteam |
05:52
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
05:52
🔗
|
|
Mateon1 has joined #archiveteam |
08:05
🔗
|
|
schbirid has joined #archiveteam |
08:46
🔗
|
|
REiN^ has joined #archiveteam |
09:14
🔗
|
|
Ungstein has joined #archiveteam |
10:12
🔗
|
|
jschwart has joined #archiveteam |
10:13
🔗
|
|
Somebody2 has quit IRC (Read error: Operation timed out) |
10:39
🔗
|
|
BlueMax has quit IRC (Leaving) |
10:53
🔗
|
|
apache2 has joined #archiveteam |
10:54
🔗
|
apache2 |
hey, I'm trying to secure an archive of mods and tools for an old game. archive.org won't copy it due to robots.txt, and archive.is won't take it due to not being text content. where would I go? |
10:54
🔗
|
apache2 |
I could take a WARC snapshot using whatever tool, but not sure where to put it. |
10:55
🔗
|
apache2 |
I suspect it will be a gigabyte or two, max. |
10:56
🔗
|
JAA |
Hi apache2. We can throw it into ArchiveBot perhaps, depending on how the site is structured (won't work for example when downloads are behind a captcha or similar, like on romhacking.net). |
10:56
🔗
|
JAA |
Even then, it will not be accessible in the Wayback Machine, but the data would be preserved and downloadable on IA. |
10:56
🔗
|
JAA |
What site is it? |
10:57
🔗
|
apache2 |
JAA: http://frua.rosedragon.org |
10:58
🔗
|
apache2 |
FRUA is short for "forgotten realms unlimited adventures", a game that let people make their own "gold box" adventure modules. |
10:59
🔗
|
apache2 |
there's no captcha or anything, it's a fairly straightforward website. I suspect the robots.txt is just to prevent paying bandwidth for googlebot and similar |
10:59
🔗
|
JAA |
Yeah, it looks like it should work fine. I'll add it to ArchiveBot. |
11:00
🔗
|
apache2 |
JAA: thank you! I found some links to modules that are not listed in their module listing. would there be a way to verify that they get included in the backup? |
11:01
🔗
|
JAA |
apache2: It seems like it should find everything through http://frua.rosedragon.org/dir.htm . But to verify, you could check the relevant CDX file when the archives are uploaded on IA, which lists all URLs contained in a WARC. |
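The CDX check JAA describes can be sketched in a few lines of Python. This is a minimal sketch, not ArchiveBot's or IA's own tooling: it assumes the classic space-separated CDX layout, where the first line is a legend such as ` CDX N b a m s k r M S V g` and the `a` column holds the original URL. The function names `urls_in_cdx` and `missing_urls` are hypothetical, and the sample record is invented for illustration.

```python
def urls_in_cdx(lines):
    """Collect the original URLs listed in a space-separated CDX file.

    Assumes the first line is a legend like ' CDX N b a m s k r M S V g',
    where the letter 'a' marks the original-URL column.
    """
    it = iter(lines)
    legend = next(it).split()
    if not legend or legend[0] != "CDX":
        raise ValueError("first line does not look like a CDX legend")
    url_col = legend[1:].index("a")  # position of the original-URL field
    urls = set()
    for line in it:
        fields = line.split()
        if len(fields) > url_col:
            urls.add(fields[url_col])
    return urls


def missing_urls(cdx_lines, wanted):
    """Return the wanted URLs that do not appear in the CDX, in input order."""
    present = urls_in_cdx(cdx_lines)
    return [url for url in wanted if url not in present]


# Invented sample data: one legend line and one made-up record.
sample_cdx = [
    " CDX N b a m s k r M S V g",
    "org,rosedragon,frua)/dir.htm 20000101000000 "
    "http://frua.rosedragon.org/dir.htm text/html 200 CHECKSUM - - 1234 0 example.warc.gz",
]
wanted = [
    "http://frua.rosedragon.org/dir.htm",
    "http://frua.rosedragon.org/unlisted-module.zip",  # hypothetical URL
]
print(missing_urls(sample_cdx, wanted))
```

For a real check, pass the lines of the downloaded CDX file (decompressed first if it is gzipped) and the list of module links that are not in the site's own listing. The comparison is an exact string match, so differences such as a trailing slash or `http` versus `https` would show up as missing.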
11:03
🔗
|
apache2 |
JAA: thank you! and just out of curiosity: which tool would you recommend for making my own WARCs? It seems like there are a good handful of tools. I'd prefer something CLI so I can manage it over ssh. |
11:08
🔗
|
JAA |
apache2: I frequently use wpull. Make sure to use version 1.2.3 as 2.0.x is quite unstable and buggy. If you want to grab heavily scripted pages, you'll probably need a browser plus something like warcprox; IA's brozzler tool might be an option to automate that, though I have no experience with it. |
11:08
🔗
|
JAA |
If you have further questions, please come to #archiveteam-bs. |
11:35
🔗
|
|
treora has quit IRC (Remote host closed the connection) |
11:35
🔗
|
|
treora has joined #archiveteam |
12:28
🔗
|
eientei95 |
So http://www.dailymail.co.uk/ blocks IA from archiving any articles |
12:28
🔗
|
eientei95 |
User-agent: ia_archiver |
12:28
🔗
|
eientei95 |
Disallow: /*/article-* |
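To see which paths a rule like `Disallow: /*/article-*` would cover, here is a minimal Python sketch of the wildcard extension to robots.txt, in which `*` matches any run of characters and a trailing `$` anchors the end of the path. It is an illustration under that assumption, not a statement of how ia_archiver itself evaluates the rule. The helper names and the example paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes the common wildcard extension: '*' matches any sequence of
    characters and a trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, pattern):
    """True if the path matches the Disallow pattern from its first character."""
    return robots_pattern_to_regex(pattern).match(path) is not None


rule = "/*/article-*"
# Hypothetical paths, chosen only to exercise the rule.
print(is_disallowed("/news/article-12345/Example.html", rule))
print(is_disallowed("/home/index.html", rule))
```

Python's standard `urllib.robotparser` does not interpret `*` inside paths, which is why the sketch converts the pattern by hand.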
12:29
🔗
|
JAA |
NewsGrabber should grab it anyway. |
12:31
🔗
|
eientei95 |
Cool |
13:22
🔗
|
|
Ungstein has quit IRC (Quit: Leaving.) |
13:28
🔗
|
|
godane has quit IRC (Ping timeout: 255 seconds) |
13:44
🔗
|
|
wp494 has quit IRC (Ping timeout: 260 seconds) |
13:44
🔗
|
|
wp494 has joined #archiveteam |
13:44
🔗
|
|
svchfoo1 sets mode: +o wp494 |
13:48
🔗
|
|
godane has joined #archiveteam |
13:48
🔗
|
|
svchfoo1 sets mode: +o godane |
14:02
🔗
|
|
betamax has quit IRC (Ping timeout: 260 seconds) |
14:26
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
14:39
🔗
|
|
dashcloud has joined #archiveteam |
14:40
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
14:40
🔗
|
|
dashcloud has joined #archiveteam |
14:45
🔗
|
|
betamax has joined #archiveteam |
14:58
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
14:59
🔗
|
|
dashcloud has joined #archiveteam |
15:10
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
15:10
🔗
|
|
dashcloud has joined #archiveteam |
15:28
🔗
|
|
chirlu has joined #archiveteam |
16:39
🔗
|
|
betamax has quit IRC (Read error: Connection reset by peer) |
16:39
🔗
|
|
betamax has joined #archiveteam |
16:54
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
16:55
🔗
|
|
dashcloud has joined #archiveteam |
16:57
🔗
|
|
vonguard has joined #archiveteam |
17:04
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
17:04
🔗
|
|
dashcloud has joined #archiveteam |
17:11
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
17:17
🔗
|
|
dashcloud has joined #archiveteam |
18:13
🔗
|
|
ndiddy has joined #archiveteam |
18:22
🔗
|
|
Somebody2 has joined #archiveteam |
19:18
🔗
|
|
north_kor has joined #archiveteam |
19:18
🔗
|
north_kor |
hi? |
19:18
🔗
|
north_kor |
is anyone here? |
19:20
🔗
|
wp494 |
north_kor: yes, by my count there's 234 of us including you here, we're all ears |
19:22
🔗
|
north_kor |
oh |
19:22
🔗
|
north_kor |
hi |
19:23
🔗
|
north_kor |
is there any benefit to running multiple warrior instances for urlteam_2? / is it limited server-side to 100 scans per second? / are any of the other projects listed currently active? |
19:35
🔗
|
JAA |
north_kor: URLTeam is limited to one item per shortener and IP to prevent rate limit issues IIRC. So no, if your warrior has a sufficient concurrency to run all shorteners, another warrior is useless. |
19:35
🔗
|
north_kor |
thanks |
19:35
🔗
|
JAA |
all *active* shorteners, and there are currently two of them. |
19:37
🔗
|
north_kor |
should I bother doing this at all, since there are already a few people who are running this on dozens of IPs? |
19:39
🔗
|
north_kor |
@JAA |
19:40
🔗
|
north_kor |
how_does_irc_work.jpg |
19:40
🔗
|
JAA |
More capacity always helps. |
19:41
🔗
|
north_kor |
okay |
19:41
🔗
|
|
north_kor has quit IRC (Quit: http://www.mibbit.com ajax IRC Client) |
19:59
🔗
|
|
nertzy has joined #archiveteam |
20:51
🔗
|
|
Soni has quit IRC (Remote host closed the connection) |
20:54
🔗
|
schbirid |
archivebot this wonderful archive please http://www.mtm2.com/MTM2/ |
21:20
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:48
🔗
|
|
theaetetu has quit IRC (Read error: Operation timed out) |
21:51
🔗
|
|
theaetetu has joined #archiveteam |
21:52
🔗
|
|
vonguard has quit IRC (Read error: Operation timed out) |
22:26
🔗
|
|
Soni has joined #archiveteam |
23:29
🔗
|
|
DFJustin has quit IRC (Remote host closed the connection) |
23:34
🔗
|
|
DFJustin has joined #archiveteam |
23:35
🔗
|
|
svchfoo1 sets mode: +o DFJustin |
23:54
🔗
|
|
Zialus has quit IRC (i'm out!) |
23:57
🔗
|
|
Zialus has joined #archiveteam |