#archiveteam 2015-07-05,Sun


Time Nickname Message
01:15 🔗 VADemon has joined #archiveteam
01:26 🔗 Meeh_ has joined #archiveteam
01:27 🔗 raylee has quit IRC (hub.dk irc.underworld.no)
01:27 🔗 wm_ has quit IRC (hub.dk irc.underworld.no)
01:27 🔗 Atluxity has quit IRC (hub.dk irc.underworld.no)
01:32 🔗 philpem has quit IRC (Ping timeout: 252 seconds)
01:40 🔗 primus104 has quit IRC (Leaving.)
01:45 🔗 Aranje has joined #archiveteam
02:00 🔗 DopefishJ is now known as DFJustin
02:14 🔗 VADemon has quit IRC (Quit: left4dead)
03:17 🔗 machinedr has joined #archiveteam
03:52 🔗 Jonimus has quit IRC (Ping timeout: 252 seconds)
03:53 🔗 T31M has quit IRC (Read error: Connection reset by peer)
03:54 🔗 T31M has joined #archiveteam
04:05 🔗 Lord_Nigh has archiveteam dealt with this mess yet: https://gist.githubusercontent.com/sebadoom/f0eedcba2f39e3e07a1c/raw/c168b48210bf7f85029545743891e7e4f8c95df4/gistfile1.txt
04:05 🔗 Lord_Nigh lots of stuff to mirror
04:32 🔗 aaaaaaaaa has quit IRC (Leaving)
05:10 🔗 mistym has quit IRC (Remote host closed the connection)
05:14 🔗 machinedr has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 39.0/20150630154324])
05:24 🔗 machinedr has joined #archiveteam
05:25 🔗 machinedr how is PhantomJS working out on wpull?
05:28 🔗 machinedr I created a project similar to phantomjs, but based on java
05:41 🔗 mistym has joined #archiveteam
05:45 🔗 yipdw machinedr: the output is generally pretty good, but it imposes significant system load and we have seen phantomjs processes that don't terminate as expected
05:45 🔗 yipdw this is in the archivebot+wpull setup, so it may not be wpull's fault
05:46 🔗 machinedr ok
05:46 🔗 yipdw the future for archivebot+wpull is probably Chrome-as-crawler
05:47 🔗 machinedr as in selenium chrome driver?
05:48 🔗 yipdw as in the webkit remote debugging protocol
05:48 🔗 yipdw that may be what Selenium+chromedriver uses; I haven't kept up with that
05:48 🔗 yipdw anyway that is not a short-term thing
05:50 🔗 machinedr yeah, not sure. I saw this issue which mentioned selenium, https://github.com/chfoo/wpull/issues/248
05:50 🔗 yipdw ah
05:50 🔗 yipdw that would be nice also
05:53 🔗 yipdw it would also get us exactly what we need, which is a web thing that can deal with JS without bombing
05:54 🔗 yipdw the rest of wpull (WARC generation, link identification, bookkeeping) seems to do fine
05:54 🔗 yipdw oh and scripting, concurrency, queue management, etc
05:56 🔗 machinedr yeah I experienced bad performance and crashes using selenium's firefox driver. It motivated me to try making my own driver using only java
05:56 🔗 machinedr javafx has a webkit embedded
06:09 🔗 phillipsj With the major browsers dropping Java applet support, I was thinking it was time for a new "hotjava" browser.
06:11 🔗 phillipsj https://en.wikipedia.org/wiki/HotJava
06:11 🔗 machinedr ironically java's webview does not support applets :) ... at least not out of the box
06:13 🔗 machinedr oh wait... maybe http://stackoverflow.com/questions/27949881/java-applet-in-webview
06:28 🔗 Fusl has quit IRC (Ping timeout: 255 seconds)
06:33 🔗 _0x2A has quit IRC (Read error: Operation timed out)
06:44 🔗 bentpins has joined #archiveteam
06:59 🔗 machinedr has quit IRC (Quit: ChatZilla 0.9.91.1 [Firefox 39.0/20150630154324])
07:07 🔗 ruukasu_ has quit IRC (Read error: No route to host)
07:12 🔗 ruukasu has joined #archiveteam
07:20 🔗 arkiver SketchCow: we're going to do a grab of Reddit. We'll save all posts
07:20 🔗 arkiver #deaddit
07:21 🔗 arkiver users are starting to remove all their posts, some subreddits are going private and some subreddits have announced they're going to delete themselves
07:31 🔗 schbirid has joined #archiveteam
07:51 🔗 Aranje has quit IRC (Remote host closed the connection)
07:52 🔗 bzc6p_ has joined #archiveteam
07:57 🔗 bzc6p has quit IRC (Ping timeout: 600 seconds)
07:57 🔗 bzc6p_ is now known as bzc6p
08:01 🔗 ohhdemgir has quit IRC (Quit: Leaving)
08:07 🔗 mistym has quit IRC (Remote host closed the connection)
08:12 🔗 primus104 has joined #archiveteam
08:15 🔗 wm_ has joined #archiveteam
08:15 🔗 raylee has joined #archiveteam
08:17 🔗 WinterFox has quit IRC (Ping timeout: 483 seconds)
08:22 🔗 ohhdemgir has joined #archiveteam
08:22 🔗 WinterFox has joined #archiveteam
08:23 🔗 habi has joined #archiveteam
08:23 🔗 primus104 has quit IRC (Leaving.)
08:32 🔗 philpem has joined #archiveteam
08:32 🔗 habi has left
08:59 🔗 primus104 has joined #archiveteam
09:08 🔗 mistym has joined #archiveteam
09:11 🔗 WinterFox has quit IRC (Remote host closed the connection)
09:13 🔗 WinterFox has joined #archiveteam
09:14 🔗 Ungstein has joined #archiveteam
09:17 🔗 mistym has quit IRC (Ping timeout: 483 seconds)
09:23 🔗 alt40409 has quit IRC (Ping timeout: 370 seconds)
09:29 🔗 WinterFox has quit IRC (Ping timeout: 483 seconds)
09:39 🔗 WinterFox has joined #archiveteam
09:40 🔗 arkiver SketchCow: last.fm user discovery has started
10:48 🔗 vitzli has joined #archiveteam
11:10 🔗 mistym has joined #archiveteam
11:18 🔗 mistym has quit IRC (Read error: Operation timed out)
11:25 🔗 BlueMaxim has quit IRC (Quit: Leaving)
11:41 🔗 Fusl has joined #archiveteam
11:42 🔗 thewalrus has joined #archiveteam
11:42 🔗 thewalrus has left
11:44 🔗 _0x2A has joined #archiveteam
11:52 🔗 szalwia has joined #archiveteam
12:25 🔗 VADemon has joined #archiveteam
12:28 🔗 RichardG has joined #archiveteam
12:30 🔗 Ungstein1 has joined #archiveteam
12:30 🔗 Ungstein has quit IRC (Ping timeout: 265 seconds)
12:46 🔗 Rickster has quit IRC (Ping timeout: 252 seconds)
12:47 🔗 Muad-Dib has quit IRC (Ping timeout: 252 seconds)
13:10 🔗 mistym has joined #archiveteam
13:11 🔗 Rickster has joined #archiveteam
13:18 🔗 mistym has quit IRC (Read error: Operation timed out)
13:19 🔗 signius has quit IRC (Read error: Operation timed out)
13:32 🔗 signius has joined #archiveteam
13:41 🔗 Emcy_ has joined #archiveteam
13:44 🔗 Emcy has quit IRC (Ping timeout: 306 seconds)
13:45 🔗 VADemon has quit IRC (Read error: Connection reset by peer)
13:46 🔗 VADemon has joined #archiveteam
14:30 🔗 WinterFox has quit IRC (Remote host closed the connection)
14:32 🔗 SketchCow Take a shot
14:32 🔗 SketchCow (Reddit)
15:12 🔗 mistym has joined #archiveteam
15:16 🔗 mistym has quit IRC (Ping timeout: 252 seconds)
15:17 🔗 arkiver Awesome, let's grab reddit
15:22 🔗 Ungstein has joined #archiveteam
15:25 🔗 Ungstein1 has quit IRC (Ping timeout: 265 seconds)
15:40 🔗 bentpins has quit IRC (Read error: Connection reset by peer)
15:46 🔗 primus104 has quit IRC (Leaving.)
16:21 🔗 primus104 has joined #archiveteam
16:23 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
16:27 🔗 ruukasu has joined #archiveteam
16:36 🔗 ruukasu has quit IRC (Ping timeout: 265 seconds)
16:46 🔗 godane has quit IRC (Read error: Operation timed out)
16:50 🔗 mistym has joined #archiveteam
16:54 🔗 primus104 has quit IRC (Leaving.)
16:56 🔗 godane has joined #archiveteam
16:57 🔗 SketchCow https://twitter.com/renesugar/status/617736740044836864
16:57 🔗 SN4T14 has joined #archiveteam
17:20 🔗 xmc oh my
17:21 🔗 primus104 has joined #archiveteam
17:22 🔗 vitzli has quit IRC (Quit: Leaving)
17:28 🔗 ruukasu has joined #archiveteam
18:04 🔗 aaaaaaaaa has joined #archiveteam
18:14 🔗 SketchCow Dear Archive Team,
18:14 🔗 SketchCow It appears from your Wiki that you successfully archived Windows Live Spaces. I am trying to access my old space and have tried the wayback machine with no success.
18:14 🔗 SketchCow Did you succeed in archiving Live Spaces? Is there a way I might be able to access my old 'space'?
18:14 🔗 SketchCow Thanks for your excellent work.
18:14 🔗 SketchCow Parag
18:44 🔗 Stiletto has quit IRC (Ping timeout: 258 seconds)
18:47 🔗 Stiletto has joined #archiveteam
19:07 🔗 bzc6p has quit IRC (Ping timeout: 600 seconds)
19:08 🔗 bzc6p has joined #archiveteam
19:53 🔗 habi has joined #archiveteam
19:53 🔗 habi has left
19:54 🔗 arkiver Channel of our full grab of reddit: #deaddit
20:03 🔗 jbaumgart has joined #archiveteam
20:03 🔗 jbaumgart hello
20:10 🔗 SketchCow Hey
20:12 🔗 primus104 has quit IRC (Leaving.)
20:15 🔗 jbaumgart did you get the link to the reddit_data torrent?
20:15 🔗 SketchCow News to me. Others might know.
20:15 🔗 jbaumgart here's the thread I made for it -- https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
20:17 🔗 jbaumgart has quit IRC (Leaving)
20:18 🔗 bzc6p_ has joined #archiveteam
20:21 🔗 Smiley magnet:?xt=urn:btih:7690f71ea949b868080401c749e878f98de34d3d&dn=reddit%5Fdata&tr=http%3A%2F%2Ftracker.pushshift.io%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80
20:21 🔗 Smiley the important bit ;)
20:21 🔗 SketchCow Yes, I'm grabbing it and will put it on archive.org.
20:23 🔗 bzc6p has quit IRC (Read error: Operation timed out)
20:24 🔗 Smiley :)
20:58 🔗 primus104 has joined #archiveteam
21:22 🔗 bzc6p_ is now known as bzc6p
21:38 🔗 schbirid who can figure out how to search for licenseurl containing by-nc on https://archive.org/advancedsearch.php ?
21:38 🔗 schbirid oh my god that website sucks
21:45 🔗 SketchCow :)
21:45 🔗 SketchCow Bring it on
21:47 🔗 schbirid in a moment, i have to wait for the search field expanding animation to finish
21:48 🔗 nox2 has quit IRC (Ping timeout: 252 seconds)
21:48 🔗 SketchCow What are you on, a tin can connected to a windmill?
21:49 🔗 SketchCow Sounds like someone should be using https://pypi.python.org/pypi/internetarchive
21:49 🔗 SketchCow And utilizing ia search
21:49 🔗 schbirid not sure i want to tell that to the friend who was asking
21:49 🔗 SketchCow I mean, keep trashing it
21:50 🔗 SketchCow Because as you know, my gentle and supplicant personality is legendary
21:50 🔗 philpem has quit IRC (Remote host closed the connection)
21:51 🔗 SketchCow Also, remember our money-back guarantee
21:51 🔗 schbirid you can dwell in trash talk as much as you like, the site is not becoming better
21:51 🔗 schbirid i guess patches are welcome
21:52 🔗 SketchCow Well, as you know, we work day in and day out to make your experience as terrible as possible.
21:52 🔗 SketchCow We wait and look over every feature, and if we find it has use or utility, we strip it out
21:52 🔗 SketchCow That's what we do.
21:52 🔗 yipdw schbirid: licenseurl doesn't look like it's set up for substring search, or the tokens (e.g. by-nc) are too short. using a full URL, e.g. http://creativecommons.org/licenses/by-nc-nd/3.0/, works fine
21:53 🔗 SketchCow But, I mean, sure, nothing gets the job done quicker than whining like your subscription copy of Dark Plunders III has an overpriced optional weapon you couldn't hack on a F2P server.
21:53 🔗 SketchCow That's how the job gets done.
21:53 🔗 schbirid yipdw: zero results here -> https://archive.org/search.php?query=licenseurl%3A%28http%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-nc-nd%2F3.0%2F%29
21:53 🔗 yipdw https://archive.org/search.php?query=licenseurl%3A%22http%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby-nc-nd%2F3.0%2F%22
21:53 🔗 yipdw nonzero cardinality there
21:54 🔗 schbirid SketchCow: i love the search bar animation, it really adds usability. also image galleries for music!
21:54 🔗 yipdw IA's search engine is Solr, or at least it seems Lucene-based
21:54 🔗 yipdw it helps to know Lucene syntax
21:54 🔗 schbirid ah, nice
21:54 🔗 dashcloud SketchCow: when you get a chance, there's a lot of spam when you search for "Microsoft Office"
21:54 🔗 schbirid () is what "contains" from the ui got me
21:54 🔗 yipdw that may or may not be right, I forget what that means in Lucene
21:55 🔗 yipdw I don't actually know what IA uses for search except that a lot of what I used to do when tweaking Solr installs seems to apply
21:56 🔗 yipdw those were dark times
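
The two search URLs pasted above differ only in how the licenseurl value is wrapped: parentheses in the one schbirid got from the UI, double quotes in the one yipdw posted. A minimal sketch of building both forms follows, assuming nothing about how archive.org parses them beyond what the log reports. The helper name `build_search_url` and the `phrase` flag are hypothetical, introduced only for illustration; the base URL and the license URL are copied from the messages above.

```python
from urllib.parse import quote

# Base copied from the search URLs pasted in the log above.
SEARCH_BASE = "https://archive.org/search.php?query="


def build_search_url(field: str, value: str, phrase: bool = True) -> str:
    """Build a search URL for field:value (hypothetical helper).

    phrase=True wraps the value in double quotes (the form yipdw posted).
    phrase=False wraps it in parentheses (the form the UI's "contains" produced).
    """
    if phrase:
        term = f'{field}:"{value}"'
    else:
        term = f"{field}:({value})"
    # safe="" percent-encodes ':', '/', '"', '(' and ')' as in the pasted URLs.
    return SEARCH_BASE + quote(term, safe="")


license_url = "http://creativecommons.org/licenses/by-nc-nd/3.0/"
quoted_url = build_search_url("licenseurl", license_url)
grouped_url = build_search_url("licenseurl", license_url, phrase=False)

print(quoted_url)   # quoted form, reported above to return results
print(grouped_url)  # parenthesized form, reported above to return zero results
```

Per the exchange above, only the quoted form returned results at the time; the sketch reproduces the URLs and makes no claim about why the parenthesized form came back empty.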
22:02 🔗 oldcad has joined #archiveteam
22:03 🔗 schbirid has quit IRC (Leaving)
22:38 🔗 nox has quit IRC (Read error: Connection reset by peer)
22:47 🔗 SketchCow http://pastebin.com/raw.php?i=qYh8E841
22:47 🔗 SketchCow awww yis
23:02 🔗 raylee nice!
23:10 🔗 WinterFox has joined #archiveteam
23:17 🔗 VADemon_ has joined #archiveteam
23:20 🔗 VADemon has quit IRC (Read error: Operation timed out)
23:50 🔗 DopefishJ has joined #archiveteam
23:59 🔗 DFJustin has quit IRC (Ping timeout: 740 seconds)
