#archiveteam 2018-05-13,Sun

Time Nickname Message
00:21 🔗 Soni has quit IRC (Ping timeout: 264 seconds)
00:22 🔗 Soni has joined #archiveteam
00:40 🔗 BlueMax has joined #archiveteam
00:45 🔗 vonguard has quit IRC (Read error: Operation timed out)
02:15 🔗 Soni has quit IRC (Ping timeout: 264 seconds)
02:30 🔗 Soni has joined #archiveteam
03:43 🔗 qw3rty113 has joined #archiveteam
03:45 🔗 odemg has quit IRC (Read error: Operation timed out)
03:48 🔗 qw3rty112 has quit IRC (Read error: Operation timed out)
04:00 🔗 odemg has joined #archiveteam
05:52 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
05:52 🔗 Mateon1 has joined #archiveteam
08:05 🔗 schbirid has joined #archiveteam
08:46 🔗 REiN^ has joined #archiveteam
09:14 🔗 Ungstein has joined #archiveteam
10:12 🔗 jschwart has joined #archiveteam
10:13 🔗 Somebody2 has quit IRC (Read error: Operation timed out)
10:39 🔗 BlueMax has quit IRC (Leaving)
10:53 🔗 apache2 has joined #archiveteam
10:54 🔗 apache2 hey, I'm trying to secure an archive of mods and tools for an old game. archive.org won't copy it due to robots.txt, and archive.is won't take it due to not being text content. where would I go?
10:54 🔗 apache2 I could take a WARC snapshot using whatever tool, but not sure where to put it.
10:55 🔗 apache2 I suspect it will be a gigabyte or two, max.
10:56 🔗 JAA Hi apache2. We can throw it into ArchiveBot perhaps, depending on how the site is structured (won't work for example when downloads are behind a captcha or similar, like on romhacking.net).
10:56 🔗 JAA Even then, it will not be accessible in the Wayback Machine, but the data would be preserved and downloadable on IA.
10:56 🔗 JAA What site is it?
10:57 🔗 apache2 JAA: http://frua.rosedragon.org
10:58 🔗 apache2 FRUA is short for "forgotten realms unlimited adventures", a game that let people make their own "gold box" adventure modules.
10:59 🔗 apache2 there's no captcha or anything, it's a fairly straightforward website. I suspect the robots.txt is just to avoid paying for bandwidth used by googlebot and similar
10:59 🔗 JAA Yeah, it looks like it should work fine. I'll add it to ArchiveBot.
11:00 🔗 apache2 JAA: thank you! I found some links to modules that are not listed in their module listing. would there be a way to verify that they get included in the backup?
11:01 🔗 JAA apache2: It seems like it should find everything through http://frua.rosedragon.org/dir.htm . But to verify, you could check the relevant CDX file when the archives are uploaded on IA, which lists all URLs contained in a WARC.
11:03 🔗 apache2 JAA: thank you! and just out of curiosity: which tool would you recommend for making my own WARCs? It seems like there are a good handful of tools. I'd prefer something cli so I can manage it over ssh.
11:08 🔗 JAA apache2: I frequently use wpull. Make sure to use version 1.2.3 as 2.0.x is quite unstable and buggy. If you want to grab heavily scripted pages, you'll probably need a browser plus something like warcprox; IA's brozzler tool might be an option to automate that, though I have no experience with it.
11:08 🔗 JAA If you have further questions, please come to #archiveteam-bs.
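
[Editor's sketch, not part of the log] JAA's suggestion of checking the CDX file for specific URLs could look roughly like the Python below. It assumes the common space-separated CDX layout whose header line starts with " CDX" and uses the field letter "a" for the original URL; the function names and the example URL hidden.zip are hypothetical, and a real CDX from IA may need adjusting.

```python
def urls_in_cdx(lines):
    """Collect the original URLs listed in a CDX file, given as an iterable of lines.

    Assumption: the file starts with a header such as
    " CDX N b a m s k r M S V g", where the letter "a" marks the
    column holding the original URL.
    """
    urls = set()
    url_col = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CDX":
            # Header line: locate the original-URL column among the field letters.
            url_col = fields[1:].index("a")
            continue
        if url_col is not None and len(fields) > url_col:
            urls.add(fields[url_col])
    return urls


def missing_urls(cdx_lines, wanted):
    """Return the wanted URLs that do not appear in the CDX, in their given order."""
    have = urls_in_cdx(cdx_lines)
    return [u for u in wanted if u not in have]
```

To use it, read the CDX downloaded from the IA item (for example with open(path) as the lines argument) and pass the list of unlisted module links as wanted; anything returned was not captured under that exact URL.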
11:35 🔗 treora has quit IRC (Remote host closed the connection)
11:35 🔗 treora has joined #archiveteam
12:28 🔗 eientei95 So http://www.dailymail.co.uk/ blocks IA from archiving any articles
12:28 🔗 eientei95 User-agent: ia_archiver
12:28 🔗 eientei95 Disallow: /*/article-*
12:29 🔗 JAA NewsGrabber should grab it anyway.
12:31 🔗 eientei95 Cool
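
[Editor's sketch, not part of the log] To see which paths a rule like "Disallow: /*/article-*" covers, the small Python matcher below may help. It assumes the widely implemented wildcard extension ("*" matches any run of characters, a trailing "$" anchors the end), which Python's own urllib.robotparser does not apply; the function names and the sample article path are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any sequence of characters and a trailing
    '$' anchors the end of the path; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.compile(regex)


def is_disallowed(path, disallow_patterns):
    """True if the path matches any Disallow pattern from its start."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)
```

With the rule quoted above, is_disallowed("/news/article-1234/example.html", ["/*/article-*"]) is true, while a path such as "/home/index.html" is not affected.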
13:22 🔗 Ungstein has quit IRC (Quit: Leaving.)
13:28 🔗 godane has quit IRC (Ping timeout: 255 seconds)
13:44 🔗 wp494 has quit IRC (Ping timeout: 260 seconds)
13:44 🔗 wp494 has joined #archiveteam
13:44 🔗 svchfoo1 sets mode: +o wp494
13:48 🔗 godane has joined #archiveteam
13:48 🔗 svchfoo1 sets mode: +o godane
14:02 🔗 betamax has quit IRC (Ping timeout: 260 seconds)
14:26 🔗 dashcloud has quit IRC (Read error: Operation timed out)
14:39 🔗 dashcloud has joined #archiveteam
14:40 🔗 dashcloud has quit IRC (Remote host closed the connection)
14:40 🔗 dashcloud has joined #archiveteam
14:45 🔗 betamax has joined #archiveteam
14:58 🔗 dashcloud has quit IRC (Remote host closed the connection)
14:59 🔗 dashcloud has joined #archiveteam
15:10 🔗 dashcloud has quit IRC (Remote host closed the connection)
15:10 🔗 dashcloud has joined #archiveteam
15:28 🔗 chirlu has joined #archiveteam
16:39 🔗 betamax has quit IRC (Read error: Connection reset by peer)
16:39 🔗 betamax has joined #archiveteam
16:54 🔗 dashcloud has quit IRC (Remote host closed the connection)
16:55 🔗 dashcloud has joined #archiveteam
16:57 🔗 vonguard has joined #archiveteam
17:04 🔗 dashcloud has quit IRC (Remote host closed the connection)
17:04 🔗 dashcloud has joined #archiveteam
17:11 🔗 dashcloud has quit IRC (Remote host closed the connection)
17:17 🔗 dashcloud has joined #archiveteam
18:13 🔗 ndiddy has joined #archiveteam
18:22 🔗 Somebody2 has joined #archiveteam
19:18 🔗 north_kor has joined #archiveteam
19:18 🔗 north_kor hi?
19:18 🔗 north_kor is anyone here?
19:20 🔗 wp494 north_kor: yes, by my count there's 234 of us including you here, we're all ears
19:22 🔗 north_kor oh
19:22 🔗 north_kor hi
19:23 🔗 north_kor is there any benefit to running multiple warrior instances for urlteam_2? / is it limited server-side to 100 scans per second? / are any of the other projects listed currently active?
19:35 🔗 JAA north_kor: URLTeam is limited to one item per shortener and IP to prevent rate limit issues IIRC. So no, if your warrior has sufficient concurrency to run all shorteners, another warrior is useless.
19:35 🔗 north_kor thanks
19:35 🔗 JAA all *active* shorteners, and there are currently two of them.
19:37 🔗 north_kor should I bother doing this at all, since there are already a few people who are running this on dozens of IPs?
19:39 🔗 north_kor @JAA
19:40 🔗 north_kor how_does_irc_work.jpg
19:40 🔗 JAA More capacity always helps.
19:41 🔗 north_kor okay
19:41 🔗 north_kor has quit IRC (Quit: http://www.mibbit.com ajax IRC Client)
19:59 🔗 nertzy has joined #archiveteam
20:51 🔗 Soni has quit IRC (Remote host closed the connection)
20:54 🔗 schbirid archivebot this wonderful archive please http://www.mtm2.com/MTM2/
21:20 🔗 schbirid has quit IRC (Quit: Leaving)
21:48 🔗 theaetetu has quit IRC (Read error: Operation timed out)
21:51 🔗 theaetetu has joined #archiveteam
21:52 🔗 vonguard has quit IRC (Read error: Operation timed out)
22:26 🔗 Soni has joined #archiveteam
23:29 🔗 DFJustin has quit IRC (Remote host closed the connection)
23:34 🔗 DFJustin has joined #archiveteam
23:35 🔗 svchfoo1 sets mode: +o DFJustin
23:54 🔗 Zialus has quit IRC (i'm out!)
23:57 🔗 Zialus has joined #archiveteam