#archiveteam 2018-05-13,Sun

Who | What | When
*** Soni has quit IRC (Ping timeout: 264 seconds)
Soni has joined #archiveteam
[00:21]
.... (idle for 18mn)
BlueMax has joined #archiveteam [00:40]
vonguard has quit IRC (Read error: Operation timed out) [00:45]
................... (idle for 1h30mn)
Soni has quit IRC (Ping timeout: 264 seconds) [02:15]
.... (idle for 15mn)
Soni has joined #archiveteam [02:30]
............... (idle for 1h13mn)
qw3rty113 has joined #archiveteam
odemg has quit IRC (Read error: Operation timed out)
qw3rty112 has quit IRC (Read error: Operation timed out)
[03:43]
odemg has joined #archiveteam [04:00]
....................... (idle for 1h52mn)
Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam
[05:52]
........................... (idle for 2h13mn)
schbirid has joined #archiveteam [08:05]
......... (idle for 41mn)
REiN^ has joined #archiveteam [08:46]
...... (idle for 28mn)
Ungstein has joined #archiveteam [09:14]
............ (idle for 58mn)
jschwart has joined #archiveteam
Somebody2 has quit IRC (Read error: Operation timed out)
[10:12]
...... (idle for 26mn)
BlueMax has quit IRC (Leaving) [10:39]
apache2 has joined #archiveteam [10:53]
<apache2> hey, I'm trying to secure an archive of mods and tools for an old game. archive.org won't copy it due to robots.txt, and archive.is won't take it due to not being text content. where would I go?
I could take a WARC snapshot using whatever tool, but not sure where to put it.
I suspect it will be a gigabyte or two, max.
[10:54]
<JAA> Hi apache2. We can throw it into ArchiveBot perhaps, depending on how the site is structured (won't work for example when downloads are behind a captcha or similar, like on romhacking.net).
Even then, it will not be accessible in the Wayback Machine, but the data would be preserved and downloadable on IA.
What site is it?
[10:56]
<apache2> JAA: http://frua.rosedragon.org
FRUA is short for "forgotten realms unlimited adventures", a game that let people make their own "gold box" adventure modules.
there's no captcha or anything, it's a fairly straight-forward website. I suspect the robots.txt is just to prevent paying bandwidth for googlebot and similar
[10:57]
<JAA> Yeah, it looks like it should work fine. I'll add it to ArchiveBot. [10:59]
<apache2> JAA: thank you! I found some links to modules that are not listed in their module listing. would there be a way to verify that they get included in the backup? [11:00]
<JAA> apache2: It seems like it should find everything through http://frua.rosedragon.org/dir.htm . But to verify, you could check the relevant CDX file when the archives are uploaded on IA, which lists all URLs contained in a WARC. [11:01]
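[Editor's sketch] The CDX check JAA describes could look roughly like the following. This is a minimal sketch, not ArchiveBot's or IA's tooling: it assumes the classic space-delimited CDX layout whose header line names the columns (with `a` marking the original URL), and the helper names `urls_in_cdx` and `missing_urls`, the sample index line, and the `hidden.zip` URL are all hypothetical.

```python
def urls_in_cdx(lines):
    """Return the set of original URLs listed in a CDX index.

    Assumption: the first line is a header such as
    ' CDX N b a m s k r M S V g', where the letter 'a' names the
    original-URL column and fields are separated by spaces.
    """
    lines = iter(lines)
    header = next(lines).split()
    # header[0] is the literal 'CDX'; the remaining letters name the columns.
    if not header or header[0] != "CDX":
        raise ValueError("missing CDX header line")
    url_col = header[1:].index("a")
    urls = set()
    for line in lines:
        parts = line.split()
        if len(parts) > url_col:
            urls.add(parts[url_col])
    return urls


def missing_urls(cdx_lines, wanted):
    """Return the URLs from `wanted` that do not appear in the CDX index."""
    found = urls_in_cdx(cdx_lines)
    return [url for url in wanted if url not in found]


if __name__ == "__main__":
    # Hypothetical index content, for illustration only.
    sample = [
        " CDX N b a m s k r M S V g",
        "org,rosedragon,frua)/dir.htm 20180513110000 "
        "http://frua.rosedragon.org/dir.htm text/html 200 AAAA - - 100 0 example.warc.gz",
    ]
    wanted = [
        "http://frua.rosedragon.org/dir.htm",
        "http://frua.rosedragon.org/hidden.zip",  # hypothetical unlisted module
    ]
    print(missing_urls(sample, wanted))
```

The comparison is an exact string match, so a URL that differs only in scheme, trailing slash, or letter case would be reported as missing.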
<apache2> JAA: thank you! and just out of curiosity: which tool would you recommend for making my own WARCs? It seems like there are a good handful of tools. I'd prefer something cli so I can manage it over ssh. [11:03]
<JAA> apache2: I frequently use wpull. Make sure to use version 1.2.3 as 2.0.x is quite unstable and buggy. If you want to grab heavily scripted pages, you'll probably need a browser plus something like warcprox; IA's brozzler tool might be an option to automate that, though I have no experience with it.
If you have further questions, please come to #archiveteam-bs.
[11:08]
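[Editor's sketch] For readers wanting a starting point with wpull, the snippet below only assembles a command line and prints it, so it can be reviewed before being run over ssh. It is a sketch under assumptions: the helper name `wpull_command` and the WARC name `frua-rosedragon` are hypothetical, the option set is one plausible choice rather than what JAA or ArchiveBot uses, and the options should be checked against `wpull --help` for the installed version.

```python
import shlex


def wpull_command(start_url, warc_name):
    """Build an argv list for a recursive wpull crawl that writes a WARC.

    Assumption: these wget-style options are available in the installed
    wpull version; verify with `wpull --help` before relying on them.
    """
    return [
        "wpull", start_url,
        "--warc-file", warc_name,          # base name of the WARC output
        "--recursive", "--level", "inf",   # follow links without a depth limit
        "--page-requisites",               # also fetch images, CSS and similar
        "--no-robots",                     # do not honour robots.txt
    ]


if __name__ == "__main__":
    cmd = wpull_command("http://frua.rosedragon.org/", "frua-rosedragon")
    # Print a shell-quoted command line instead of executing it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the arguments as a list keeps quoting out of the picture if the command is later passed to `subprocess.run`.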
...... (idle for 27mn)
*** treora has quit IRC (Remote host closed the connection)
treora has joined #archiveteam
[11:35]
........... (idle for 53mn)
<eientei95> So http://www.dailymail.co.uk/ blocks IA from archiving any articles
User-agent: ia_archiver
Disallow: /*/article-*
[12:28]
<JAA> NewsGrabber should grab it anyway. [12:29]
<eientei95> Cool [12:31]
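[Editor's sketch] To see which paths a rule like `Disallow: /*/article-*` covers, the snippet below applies simplified wildcard matching. It is an illustration only, not how IA or NewsGrabber evaluate robots.txt: it handles just `*` and a trailing `$`, ignores `Allow` rules and user-agent grouping, and the function names and example paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Only two pattern features are handled: '*' matches any run of
    characters, and a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path, disallow_patterns):
    """Return True if any Disallow pattern matches the start of `path`."""
    return any(robots_pattern_to_regex(p).match(path) for p in disallow_patterns)


if __name__ == "__main__":
    rules = ["/*/article-*"]  # the rule quoted above
    # Hypothetical paths, for illustration only.
    print(is_disallowed("/news/article-1234/example.html", rules))  # True
    print(is_disallowed("/home/index.html", rules))                 # False
```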
........... (idle for 51mn)
*** Ungstein has quit IRC (Quit: Leaving.) [13:22]
godane has quit IRC (Ping timeout: 255 seconds) [13:28]
.... (idle for 16mn)
wp494 has quit IRC (Ping timeout: 260 seconds)
wp494 has joined #archiveteam
svchfoo1 sets mode: +o wp494
godane has joined #archiveteam
svchfoo1 sets mode: +o godane
[13:44]
betamax has quit IRC (Ping timeout: 260 seconds) [14:02]
..... (idle for 24mn)
dashcloud has quit IRC (Read error: Operation timed out) [14:26]
dashcloud has joined #archiveteam
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam
[14:39]
betamax has joined #archiveteam [14:45]
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam
[14:58]
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam
[15:10]
.... (idle for 18mn)
chirlu has joined #archiveteam [15:28]
............... (idle for 1h11mn)
betamax has quit IRC (Read error: Connection reset by peer)
betamax has joined #archiveteam
[16:39]
.... (idle for 15mn)
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam
vonguard has joined #archiveteam
[16:54]
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam
[17:04]
dashcloud has quit IRC (Remote host closed the connection) [17:11]
dashcloud has joined #archiveteam [17:17]
............ (idle for 56mn)
ndiddy has joined #archiveteam [18:13]
Somebody2 has joined #archiveteam [18:22]
............ (idle for 56mn)
north_kor has joined #archiveteam [19:18]
<north_kor> hi?
is anyone here?
[19:18]
<wp494> north_kor: yes, by my count there's 234 of us including you here, we're all ears [19:20]
<north_kor> oh
hi
is there any benefit to running multiple warrior instances for urlteam_2? / is it limited server-side to 100 scans per second? / are any of the other projects listed currently active?
[19:22]
<JAA> north_kor: URLTeam is limited to one item per shortener and IP to prevent rate limit issues IIRC. So no, if your warrior has a sufficient concurrency to run all shorteners, another warrior is useless. [19:35]
<north_kor> thanks [19:35]
<JAA> all *active* shorteners, and there are currently two of them. [19:35]
<north_kor> should I bother doing this at all, since there are already a few people who are running this on dozens of IPs?
@JAA
how_does_irc_work.jpg
[19:37]
<JAA> More capacity always helps. [19:40]
<north_kor> okay [19:41]
*** north_kor has quit IRC (Quit: http://www.mibbit.com ajax IRC Client) [19:41]
.... (idle for 18mn)
nertzy has joined #archiveteam [19:59]
........... (idle for 52mn)
Soni has quit IRC (Remote host closed the connection) [20:51]
<schbirid> archivebot this wonderful archive please http://www.mtm2.com/MTM2/ [20:54]
...... (idle for 26mn)
*** schbirid has quit IRC (Quit: Leaving) [21:20]
...... (idle for 28mn)
theaetetu has quit IRC (Read error: Operation timed out)
theaetetu has joined #archiveteam
vonguard has quit IRC (Read error: Operation timed out)
[21:48]
....... (idle for 34mn)
Soni has joined #archiveteam [22:26]
............. (idle for 1h3mn)
DFJustin has quit IRC (Remote host closed the connection) [23:29]
DFJustin has joined #archiveteam
svchfoo1 sets mode: +o DFJustin
[23:34]
.... (idle for 19mn)
Zialus has quit IRC (i'm out!)
Zialus has joined #archiveteam
[23:54]
