#archiveteam-bs 2017-05-22,Mon


Who What When
*** ndiddy has joined #archiveteam-bs [00:17]
.... (idle for 16mn)
powerKitt has quit IRC (Ping timeout: 268 seconds)
GLaDOS has joined #archiveteam-bs
[00:33]
...... (idle for 25mn)
ndiddy has quit IRC ()
Sk1d has quit IRC (Ping timeout: 194 seconds)
sheaf has quit IRC (Quit: sheaf)
Sk1d has joined #archiveteam-bs
[01:02]
...... (idle for 25mn)
schbirid2 has joined #archiveteam-bs
schbirid has quit IRC (Read error: Operation timed out)
[01:36]
...... (idle for 25mn)
Asparagir has joined #archiveteam-bs [02:04]
Odd0002 has joined #archiveteam-bs [02:09]
................... (idle for 1h34mn)
Lord_Nigh: Somebody2: that email i sent to info@archive i never received a response to yet, but that was friday so maybe they'll answer it on monday [03:43]
xmc: yes, it is a job, not a lifestyle
well
you know what i mean
[03:44]
Somebody2: Lord_Nigh: It may be longer than that, if it isn't a simple fix. [03:49]
Lord_Nigh: i'm guessing its a regression in the robots.txt parser and its a simple/stupid bug, heck the source code to it is probably available, maybe i can fix it... [03:50]
Somebody2: Heh, I'm not sure where the source code for the new version of the Wayback Machine is.
If you do come up with a patch, that might be likely to get a response sooner
[03:51]
bsmith093: the latest dump of fanfiction.net, 16gb compressed, 54gb uncompressed. 745K stories, https://archive.org/details/Fanfictiondotnet1011dump [04:04]
.... (idle for 16mn)
Lord_Nigh: Somebody2: i'm not sure either
where the source is
[04:20]
...................................... (idle for 3h7mn)
*** BlueMaxim has quit IRC (Read error: Operation timed out) [07:27]
............... (idle for 1h11mn)
Honno has joined #archiveteam-bs [08:38]
Sanqui: I noticed WaybackMachine added an "About this capture" thingy
So you can now identify ArchiveBot crawls
[08:48]
*** GE has joined #archiveteam-bs [08:55]
.... (idle for 19mn)
Honno_ has joined #archiveteam-bs
Honno__ has joined #archiveteam-bs
Honno has quit IRC (Ping timeout: 370 seconds)
Honno_ has quit IRC (Ping timeout: 370 seconds)
Honno_ has joined #archiveteam-bs
[09:14]
Honno__ has quit IRC (Ping timeout: 370 seconds) [09:28]
............. (idle for 1h2mn)
GE has quit IRC (Remote host closed the connection) [10:30]
............. (idle for 1h0mn)
Aoede: bsmith093: nice. how do you generate metadata.sqlite? [11:30]
......... (idle for 40mn)
*** Jonison has joined #archiveteam-bs
Jonison has quit IRC (Read error: Connection reset by peer)
[12:10]
........ (idle for 37mn)
GE has joined #archiveteam-bs [12:47]
........ (idle for 35mn)
sheaf has joined #archiveteam-bs [13:22]
.................... (idle for 1h38mn)
Fletcher has joined #archiveteam-bs
kurt has joined #archiveteam-bs
kvieta has joined #archiveteam-bs
espes__ has joined #archiveteam-bs
SilSte has joined #archiveteam-bs
Kenshin has joined #archiveteam-bs
w0rp has joined #archiveteam-bs
dashcloud has joined #archiveteam-bs
HP has joined #archiveteam-bs
antonizoo has joined #archiveteam-bs
tapedrive has joined #archiveteam-bs
eprillios has joined #archiveteam-bs
chfoo has joined #archiveteam-bs
cf has joined #archiveteam-bs
joepie91 has joined #archiveteam-bs
brayden has joined #archiveteam-bs
hub.dk sets mode: +oo Fletcher brayden
swebb sets mode: +o brayden
jmtd has joined #archiveteam-bs
Smiley has joined #archiveteam-bs
[15:00]
Asparagir has quit IRC (Asparagir) [15:13]
...... (idle for 27mn)
RichardG has joined #archiveteam-bs [15:40]
...... (idle for 27mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[16:07]
..... (idle for 24mn)
powerArch has quit IRC (Remote host closed the connection)
RedType_ has quit IRC (Read error: Operation timed out)
icedice has joined #archiveteam-bs
[16:31]
...... (idle for 28mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[17:02]
...... (idle for 27mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
ndiddy has joined #archiveteam-bs
Asparagir has joined #archiveteam-bs
Honno__ has joined #archiveteam-bs
GE has quit IRC (Remote host closed the connection)
Honno_ has quit IRC (Ping timeout: 370 seconds)
[17:29]
...... (idle for 29mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[18:12]
.......... (idle for 47mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[18:59]
BartoCH has quit IRC (Ping timeout: 260 seconds) [19:13]
..... (idle for 23mn)
GE has joined #archiveteam-bs
C4K3_ is now known as C4K3
[19:36]
.... (idle for 19mn)
BartoCH has joined #archiveteam-bs [19:59]
BartoCH has quit IRC (Ping timeout: 260 seconds)
BartoCH has joined #archiveteam-bs
[20:04]
BartoCH has quit IRC (Ping timeout: 260 seconds) [20:19]
.... (idle for 15mn)
BartoCH has joined #archiveteam-bs [20:34]
powerArch has joined #archiveteam-bs [20:44]
...... (idle for 27mn)
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[21:11]
...... (idle for 26mn)
RedType has joined #archiveteam-bs
RichardG has quit IRC (Read error: Operation timed out)
RichardG has joined #archiveteam-bs
[21:37]
.... (idle for 17mn)
icedice has quit IRC (Ping timeout: 250 seconds) [21:54]
Asparagir: im in yr internet archive archiving yr internets
no seriously, I'm working at Funston today, come say hi if you're around
[21:58]
xmc: ohai! [21:58]
*** xmc sets mode: +o Asparagir [21:59]
Asparagir: Now I'm all super-powerful, thanks!
Step two, get me one of those orbs
Step three, profit.
The WiFi here is about 165 Mbps. :-O
[21:59]
*** GE has quit IRC (Remote host closed the connection) [22:14]
JAA: Alright, my setup for Razer Arena with wpull and PhantomJS seems to work in principle. The main problems are that it still doesn't capture everything (wpull doesn't seem to extract links from the DOM generated by PhantomJS) and that the grab will be quite large due to duplication (each page grabs all the JavaScript, imagery, etc. again through PhantomJS). [22:20]
MrRadar: I think arkiver has a script to dedup WARCs [22:22]
JAA: Yeah, I guess that shouldn't be too difficult. I'm more concerned about the "doesn't capture everything" part. [22:24]
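Editorial note: the deduplication idea mentioned above can be sketched roughly as follows. This is not arkiver's script, which does not appear in this log. It is a minimal illustration that works on in-memory records instead of real WARC files, and the names `Record`, `Deduped`, `payload_digest`, and `dedup_records` are hypothetical. The assumption is that duplicates are detected by payload digest, in the spirit of WARC `revisit` records: the first capture keeps its payload and later identical payloads only point back to it.

```python
import base64
import hashlib
from typing import Dict, Iterable, List, NamedTuple, Optional


class Record(NamedTuple):
    """A captured response (hypothetical stand-in for a WARC response record)."""
    url: str
    payload: bytes


class Deduped(NamedTuple):
    """Result of deduplication for one input record."""
    url: str
    digest: str
    payload: Optional[bytes]   # None when the payload was seen before
    refers_to: Optional[str]   # URL of the first capture with the same payload


def payload_digest(payload: bytes) -> str:
    """Return a SHA-1 digest in base32, a form commonly seen in WARC headers."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def dedup_records(records: Iterable[Record]) -> List[Deduped]:
    """Keep the first copy of each payload; replace repeats with a reference."""
    first_seen: Dict[str, str] = {}  # digest -> URL of first capture
    result: List[Deduped] = []
    for rec in records:
        digest = payload_digest(rec.payload)
        if digest in first_seen:
            # Duplicate payload: drop the bytes, remember where they live.
            result.append(Deduped(rec.url, digest, None, first_seen[digest]))
        else:
            first_seen[digest] = rec.url
            result.append(Deduped(rec.url, digest, rec.payload, None))
    return result
```

In a grab like the one described, the same JavaScript and images fetched once per page would collapse to one stored payload plus small references, which is where most of the size saving would come from.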
..... (idle for 20mn)
*** dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam-bs
[22:44]
JAA: If anyone has any ideas, please let me know. For the record, I'm using wpull 1.2.3 with PhantomJS 2.1.1 and the options --phantomjs --phantomjs-exe /path/to/phantomjs --no-phantomjs-snapshot .
Otherwise, I'll just grab the actual data through the API and ignore the interface.
[23:00]
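Editorial note: for anyone trying to reproduce the setup, the options quoted above could be assembled into a command line like this. Only the three PhantomJS options come from the log. The helper names `build_wpull_command` and `format_command` and the start URL argument are hypothetical, and any other options JAA used (output files, recursion, and so on) are not stated in the log and are deliberately left out.

```python
import shlex
from typing import List


def build_wpull_command(start_url: str,
                        phantomjs_exe: str = "/path/to/phantomjs") -> List[str]:
    """Build an argv list using only the options quoted in the log."""
    return [
        "wpull",
        start_url,                      # assumption: a single start URL
        "--phantomjs",
        "--phantomjs-exe", phantomjs_exe,
        "--no-phantomjs-snapshot",
    ]


def format_command(argv: List[str]) -> str:
    """Render the argv list as a shell-safe string for display."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Placeholder URL; the actual Razer Arena URLs are not given in the log.
    print(format_command(build_wpull_command("https://example.invalid/")))
```

Keeping the command as a list makes it usable directly with `subprocess.run` without going through a shell.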
..... (idle for 21mn)
*** Ravenloft has joined #archiveteam-bs [23:22]
BlueMaxim has joined #archiveteam-bs [23:35]
..... (idle for 21mn)
Odd0002: hmm, is there anywhere other than archive.org that I could go to look for or upload old, late 90's/early 2000's PC games? Archive doesn't seem to have them, and there's almost no information on the internet about these games [23:56]
xmc: archive.org is a good place to upload
i don't know where a good place to find is though
[23:57]
