Time |
Nickname |
Message |
01:15
🔗
|
|
VADemon has joined #archiveteam |
01:26
🔗
|
|
Meeh_ has joined #archiveteam |
01:27
🔗
|
|
raylee has quit IRC (hub.dk irc.underworld.no) |
01:27
🔗
|
|
wm_ has quit IRC (hub.dk irc.underworld.no) |
01:27
🔗
|
|
Atluxity has quit IRC (hub.dk irc.underworld.no) |
01:32
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
01:40
🔗
|
|
primus104 has quit IRC (Leaving.) |
01:45
🔗
|
|
Aranje has joined #archiveteam |
02:00
🔗
|
|
DopefishJ is now known as DFJustin |
02:14
🔗
|
|
VADemon has quit IRC (Quit: left4dead) |
03:17
🔗
|
|
machinedr has joined #archiveteam |
03:52
🔗
|
|
Jonimus has quit IRC (Ping timeout: 252 seconds) |
03:53
🔗
|
|
T31M has quit IRC (Read error: Connection reset by peer) |
03:54
🔗
|
|
T31M has joined #archiveteam |
04:05
🔗
|
Lord_Nigh |
has archiveteam dealt with this mess yet: https://gist.githubusercontent.com/sebadoom/f0eedcba2f39e3e07a1c/raw/c168b48210bf7f85029545743891e7e4f8c95df4/gistfile1.txt |
04:05
🔗
|
Lord_Nigh |
lots of stuff to mirror |
04:32
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
05:10
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
05:14
🔗
|
|
machinedr has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 39.0/20150630154324]) |
05:24
🔗
|
|
machinedr has joined #archiveteam |
05:25
🔗
|
machinedr |
how is PhantomJS working out on wpull? |
05:28
🔗
|
machinedr |
I created a project similar to phantomjs, but based on java |
05:41
🔗
|
|
mistym has joined #archiveteam |
05:45
🔗
|
yipdw |
machinedr: the output is generally pretty good, but it imposes significant system load and we have seen phantomjs processes that don't terminate as expected |
05:45
🔗
|
yipdw |
this is in the archivebot+wpull setup, so it may not be wpull's fault |
05:46
🔗
|
machinedr |
ok |
05:46
🔗
|
yipdw |
the future for archivebot+wpull is probably Chrome-as-crawler |
05:47
🔗
|
machinedr |
as in selenium chrome driver? |
05:48
🔗
|
yipdw |
as in the webkit remote debugging protocol |
05:48
🔗
|
yipdw |
that may be what Selenium+chromedriver uses; I haven't kept up with that |
05:48
🔗
|
yipdw |
anyway that is not a short-term thing |
05:50
🔗
|
machinedr |
yeah, not sure. I saw this issue which mentioned selenium, https://github.com/chfoo/wpull/issues/248 |
05:50
🔗
|
yipdw |
ah |
05:50
🔗
|
yipdw |
that would be nice also |
05:53
🔗
|
yipdw |
it would also get us exactly what we need, which is a web thing that can deal with JS without bombing |
05:54
🔗
|
yipdw |
the rest of wpull (WARC generation, link identification, bookkeeping) seems to do fine |
05:54
🔗
|
yipdw |
oh and scripting, concurrency, queue management, etc |
05:56
🔗
|
machinedr |
yeah I experienced bad performance and crashes using selenium's firefox driver. It motivated me to try making my own driver using only java |
05:56
🔗
|
machinedr |
javafx has a webkit embedded |
06:09
🔗
|
phillipsj |
With the major browsers dropping Java Applet support, I was thinking it was time for a new "hotjava" browser. |
06:11
🔗
|
phillipsj |
https://en.wikipedia.org/wiki/HotJava |
06:11
🔗
|
machinedr |
ironically java's webview does not support applets :) ... at least not out of the box |
06:13
🔗
|
machinedr |
oh wait... maybe http://stackoverflow.com/questions/27949881/java-applet-in-webview |
06:28
🔗
|
|
Fusl has quit IRC (Ping timeout: 255 seconds) |
06:33
🔗
|
|
_0x2A has quit IRC (Read error: Operation timed out) |
06:44
🔗
|
|
bentpins has joined #archiveteam |
06:59
🔗
|
|
machinedr has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 39.0/20150630154324]) |
07:07
🔗
|
|
ruukasu_ has quit IRC (Read error: No route to host) |
07:12
🔗
|
|
ruukasu has joined #archiveteam |
07:20
🔗
|
arkiver |
SketchCow: we're going to do a grab of Reddit. We'll save all posts |
07:20
🔗
|
arkiver |
#deaddit |
07:21
🔗
|
arkiver |
users are starting to remove all their posts, some subreddits are going private and some subreddits have announced they're going to delete themselves |
07:31
🔗
|
|
schbirid has joined #archiveteam |
07:51
🔗
|
|
Aranje has quit IRC (Remote host closed the connection) |
07:52
🔗
|
|
bzc6p_ has joined #archiveteam |
07:57
🔗
|
|
bzc6p has quit IRC (Ping timeout: 600 seconds) |
07:57
🔗
|
|
bzc6p_ is now known as bzc6p |
08:01
🔗
|
|
ohhdemgir has quit IRC (Quit: Leaving) |
08:07
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
08:12
🔗
|
|
primus104 has joined #archiveteam |
08:15
🔗
|
|
wm_ has joined #archiveteam |
08:15
🔗
|
|
raylee has joined #archiveteam |
08:17
🔗
|
|
WinterFox has quit IRC (Ping timeout: 483 seconds) |
08:22
🔗
|
|
ohhdemgir has joined #archiveteam |
08:22
🔗
|
|
WinterFox has joined #archiveteam |
08:23
🔗
|
|
habi has joined #archiveteam |
08:23
🔗
|
|
primus104 has quit IRC (Leaving.) |
08:32
🔗
|
|
philpem has joined #archiveteam |
08:32
🔗
|
|
habi has left |
08:59
🔗
|
|
primus104 has joined #archiveteam |
09:08
🔗
|
|
mistym has joined #archiveteam |
09:11
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
09:13
🔗
|
|
WinterFox has joined #archiveteam |
09:14
🔗
|
|
Ungstein has joined #archiveteam |
09:17
🔗
|
|
mistym has quit IRC (Ping timeout: 483 seconds) |
09:23
🔗
|
|
alt40409 has quit IRC (Ping timeout: 370 seconds) |
09:29
🔗
|
|
WinterFox has quit IRC (Ping timeout: 483 seconds) |
09:39
🔗
|
|
WinterFox has joined #archiveteam |
09:40
🔗
|
arkiver |
SketchCow: last.fm user discovery has started |
10:48
🔗
|
|
vitzli has joined #archiveteam |
11:10
🔗
|
|
mistym has joined #archiveteam |
11:18
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
11:25
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
11:41
🔗
|
|
Fusl has joined #archiveteam |
11:42
🔗
|
|
thewalrus has joined #archiveteam |
11:42
🔗
|
|
thewalrus has left |
11:44
🔗
|
|
_0x2A has joined #archiveteam |
11:52
🔗
|
|
szalwia has joined #archiveteam |
12:25
🔗
|
|
VADemon has joined #archiveteam |
12:28
🔗
|
|
RichardG has joined #archiveteam |
12:30
🔗
|
|
Ungstein1 has joined #archiveteam |
12:30
🔗
|
|
Ungstein has quit IRC (Ping timeout: 265 seconds) |
12:46
🔗
|
|
Rickster has quit IRC (Ping timeout: 252 seconds) |
12:47
🔗
|
|
Muad-Dib has quit IRC (Ping timeout: 252 seconds) |
13:10
🔗
|
|
mistym has joined #archiveteam |
13:11
🔗
|
|
Rickster has joined #archiveteam |
13:18
🔗
|
|
mistym has quit IRC (Read error: Operation timed out) |
13:19
🔗
|
|
signius has quit IRC (Read error: Operation timed out) |
13:32
🔗
|
|
signius has joined #archiveteam |
13:41
🔗
|
|
Emcy_ has joined #archiveteam |
13:44
🔗
|
|
Emcy has quit IRC (Ping timeout: 306 seconds) |
13:45
🔗
|
|
VADemon has quit IRC (Read error: Connection reset by peer) |
13:46
🔗
|
|
VADemon has joined #archiveteam |
14:30
🔗
|
|
WinterFox has quit IRC (Remote host closed the connection) |
14:32
🔗
|
SketchCow |
Take a shot |
14:32
🔗
|
SketchCow |
(Reddit) |
15:12
🔗
|
|
mistym has joined #archiveteam |
15:16
🔗
|
|
mistym has quit IRC (Ping timeout: 252 seconds) |
15:17
🔗
|
arkiver |
Awesome, let's grab reddit |
15:22
🔗
|
|
Ungstein has joined #archiveteam |
15:25
🔗
|
|
Ungstein1 has quit IRC (Ping timeout: 265 seconds) |
15:40
🔗
|
|
bentpins has quit IRC (Read error: Connection reset by peer) |
15:46
🔗
|
|
primus104 has quit IRC (Leaving.) |
16:21
🔗
|
|
primus104 has joined #archiveteam |
16:23
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
16:27
🔗
|
|
ruukasu has joined #archiveteam |
16:36
🔗
|
|
ruukasu has quit IRC (Ping timeout: 265 seconds) |
16:46
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
16:50
🔗
|
|
mistym has joined #archiveteam |
16:54
🔗
|
|
primus104 has quit IRC (Leaving.) |
16:56
🔗
|
|
godane has joined #archiveteam |
16:57
🔗
|
SketchCow |
https://twitter.com/renesugar/status/617736740044836864 |
16:57
🔗
|
|
SN4T14 has joined #archiveteam |
17:20
🔗
|
xmc |
oh my |
17:21
🔗
|
|
primus104 has joined #archiveteam |
17:22
🔗
|
|
vitzli has quit IRC (Quit: Leaving) |
17:28
🔗
|
|
ruukasu has joined #archiveteam |
18:04
🔗
|
|
aaaaaaaaa has joined #archiveteam |
18:14
🔗
|
SketchCow |
Dear Archive Team, |
18:14
🔗
|
SketchCow |
It appears from your Wiki that you successfully archived Windows Live Spaces. I am trying to access my old space and have tried the wayback machine with no success. |
18:14
🔗
|
SketchCow |
Did you succeed in archiving Live Spaces? Is there a way I might be able to access my old 'space'? |
18:14
🔗
|
SketchCow |
Thanks for your excellent work. |
18:14
🔗
|
SketchCow |
Parag |
18:44
🔗
|
|
Stiletto has quit IRC (Ping timeout: 258 seconds) |
18:47
🔗
|
|
Stiletto has joined #archiveteam |
19:07
🔗
|
|
bzc6p has quit IRC (Ping timeout: 600 seconds) |
19:08
🔗
|
|
bzc6p has joined #archiveteam |
19:53
🔗
|
|
habi has joined #archiveteam |
19:53
🔗
|
|
habi has left |
19:54
🔗
|
arkiver |
Channel of our full grab of reddit: #deaddit |
20:03
🔗
|
|
jbaumgart has joined #archiveteam |
20:03
🔗
|
jbaumgart |
hello |
20:10
🔗
|
SketchCow |
Hey |
20:12
🔗
|
|
primus104 has quit IRC (Leaving.) |
20:15
🔗
|
jbaumgart |
did you get the link to the reddit_data torrent? |
20:15
🔗
|
SketchCow |
News to me. Others might know. |
20:15
🔗
|
jbaumgart |
here's the thread I made for it -- https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/ |
20:17
🔗
|
|
jbaumgart has quit IRC (Leaving) |
20:18
🔗
|
|
bzc6p_ has joined #archiveteam |
20:21
🔗
|
Smiley |
magnet:?xt=urn:btih:7690f71ea949b868080401c749e878f98de34d3d&dn=reddit%5Fdata&tr=http%3A%2F%2Ftracker.pushshift.io%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80 |
20:21
🔗
|
Smiley |
the important bit ;) |
20:21
🔗
|
SketchCow |
Yes, I'm grabbing it and will put it on archive.org. |
20:23
🔗
|
|
bzc6p has quit IRC (Read error: Operation timed out) |
20:24
🔗
|
Smiley |
:) |
20:58
🔗
|
|
primus104 has joined #archiveteam |
21:22
🔗
|
|
bzc6p_ is now known as bzc6p |
21:38
🔗
|
schbirid |
who can figure out how to search for licenseurl containing by-nc on https://archive.org/advancedsearch.php ? |
21:38
🔗
|
schbirid |
oh my god that website sucks |
21:45
🔗
|
SketchCow |
:) |
21:45
🔗
|
SketchCow |
Bring it on |
21:47
🔗
|
schbirid |
in a moment, i have to wait for the search field expanding animation to finish |
21:48
🔗
|
|
nox2 has quit IRC (Ping timeout: 252 seconds) |
21:48
🔗
|
SketchCow |
What are you on, a tin can connected to a windmill? |
21:49
🔗
|
SketchCow |
Sounds like someone should be using https://pypi.python.org/pypi/internetarchive |
21:49
🔗
|
SketchCow |
And utilizing ia search |
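A minimal sketch of the suggestion above, assuming the `internetarchive` package from that PyPI page is installed and that its `search_items` function yields one dict per hit with an `identifier` key. `find_identifiers` and its `search` parameter are hypothetical names made up for this illustration; they are not part of the library.

```python
from itertools import islice


def find_identifiers(query, limit=10, search=None):
    """Return up to `limit` item identifiers matching `query`.

    `search` is any callable that takes a query string and returns an
    iterable of dicts; it defaults to internetarchive.search_items.
    """
    if search is None:
        # Imported lazily so this file loads even without the package.
        from internetarchive import search_items
        search = search_items
    results = search(query)
    # islice stops after `limit` hits instead of walking the whole result set.
    return [hit["identifier"] for hit in islice(results, limit)]
```

With network access, a call such as `find_identifiers('licenseurl:"http://creativecommons.org/licenses/by-nc-nd/3.0/"', limit=5)` would be one way to try a licenseurl query without going through the website.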
21:49
🔗
|
schbirid |
not sure i want to tell that to the friend who was asking |
21:49
🔗
|
SketchCow |
I mean, keep trashing it |
21:50
🔗
|
SketchCow |
Because as you know, my gentle and supplicant personality is legendary |
21:50
🔗
|
|
philpem has quit IRC (Remote host closed the connection) |
21:51
🔗
|
SketchCow |
Also, remember our money-back guarantee |
21:51
🔗
|
schbirid |
you can dwell in trash talk as much as you like, the site is not becoming better |
21:51
🔗
|
schbirid |
i guess patches are welcome |
21:52
🔗
|
SketchCow |
Well, as you know, we work day in and day out to make your experience as terrible as possible. |
21:52
🔗
|
SketchCow |
We wait and look over every feature, and if we find it has use or utility, we strip it out |
21:52
🔗
|
SketchCow |
That's what we do. |
21:52
🔗
|
yipdw |
schbirid: licenseurl doesn't look like it's set up for substring search, or the tokens (e.g. by-nc) are too short. using a full URL, e.g. http://creativecommons.org/licenses/by-nc-nd/3.0/, works fine |
21:53
🔗
|
SketchCow |
But, I mean, sure, nothing gets the job done quicker than whining like your subscription copy of Dark Plunders III has an overpriced optional weapon you couldn't hack on a F2P server. |
21:53
🔗
|
SketchCow |
That's how the job gets done. |
21:53
🔗
|
schbirid |
yipdw: zero results here -> https://archive.org/search.php?query=licenseurl%3A%28http%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-nc-nd%2F3.0%2F%29 |
21:53
🔗
|
yipdw |
https://archive.org/search.php?query=licenseurl%3A%22http%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-nc-nd%2F3.0%2F%22 |
21:53
🔗
|
yipdw |
nonzero cardinality there |
21:54
🔗
|
schbirid |
SketchCow: i love the search bar animation, it really adds usability. also image galleries for music! |
21:54
🔗
|
yipdw |
IA's search engine is Solr, or at least it seems Lucene-based |
21:54
🔗
|
yipdw |
it helps to know Lucene syntax |
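The two links pasted above differ only in what wraps the license URL: parentheses in schbirid's, double quotes in yipdw's. In Lucene query syntax, double quotes mark a phrase and parentheses group terms under a field. The sketch below rebuilds both links; the helper names are made up for illustration, and it makes no claim about how archive.org indexes `licenseurl`.

```python
from urllib.parse import quote

SEARCH_PAGE = "https://archive.org/search.php?query="
LICENSE = "http://creativecommons.org/licenses/by-nc-nd/3.0/"


def phrase_query(field, value):
    """field:"value": double quotes ask for the value as one phrase."""
    return f'{field}:"{value}"'


def grouped_query(field, value):
    """field:(value): parentheses only group terms under the field."""
    return f"{field}:({value})"


def search_url(query):
    # safe="" percent-encodes ':' and '/' as well, matching the pasted links.
    return SEARCH_PAGE + quote(query, safe="")


print(search_url(grouped_query("licenseurl", LICENSE)))  # schbirid's form
print(search_url(phrase_query("licenseurl", LICENSE)))   # yipdw's form
```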
21:54
🔗
|
schbirid |
ah, nice |
21:54
🔗
|
dashcloud |
SketchCow: when you get a chance, there's a lot of spam when you search for "Microsoft Office" |
21:54
🔗
|
schbirid |
() is what "contains" from the ui got me |
21:54
🔗
|
yipdw |
that may or may not be right, I forget what that means in Lucene |
21:55
🔗
|
yipdw |
I don't actually know what IA uses for search except that a lot of what I used to do when tweaking Solr installs seems to apply |
21:56
🔗
|
yipdw |
those were dark times |
22:02
🔗
|
|
oldcad has joined #archiveteam |
22:03
🔗
|
|
schbirid has quit IRC (Leaving) |
22:38
🔗
|
|
nox has quit IRC (Read error: Connection reset by peer) |
22:47
🔗
|
SketchCow |
http://pastebin.com/raw.php?i=qYh8E841 |
22:47
🔗
|
SketchCow |
awww yis |
23:02
🔗
|
raylee |
nice! |
23:10
🔗
|
|
WinterFox has joined #archiveteam |
23:17
🔗
|
|
VADemon_ has joined #archiveteam |
23:20
🔗
|
|
VADemon has quit IRC (Read error: Operation timed out) |
23:50
🔗
|
|
DopefishJ has joined #archiveteam |
23:59
🔗
|
|
DFJustin has quit IRC (Ping timeout: 740 seconds) |