| Time |
Nickname |
Message |
|
00:02
🔗
|
|
rejon has quit IRC (Read error: Operation timed out) |
|
00:31
🔗
|
|
garyrh has quit IRC (Remote host closed the connection) |
|
00:58
🔗
|
|
Smiley has quit IRC (Ping timeout: 370 seconds) |
|
00:59
🔗
|
|
garyrh has joined #archiveteam-bs |
|
01:02
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
01:19
🔗
|
|
mistym has joined #archiveteam-bs |
|
01:20
🔗
|
|
DFJustin has joined #archiveteam-bs |
|
01:20
🔗
|
|
swebb sets mode: +o DFJustin |
|
01:24
🔗
|
|
primus104 has quit IRC (Leaving.) |
|
01:34
🔗
|
|
egg_ has quit IRC (quit) |
|
01:38
🔗
|
|
nico_ is now known as nico_32 |
|
01:44
🔗
|
|
Smiley has joined #archiveteam-bs |
|
02:21
🔗
|
|
APerti has joined #archiveteam-bs |
|
02:47
🔗
|
|
APerti_ has joined #archiveteam-bs |
|
02:50
🔗
|
|
APerti has quit IRC (Read error: Operation timed out) |
|
03:49
🔗
|
chfoo |
http://thedailywh.at/2015/01/distraction-of-the-day-you-can-now-play-oregon-trail-and-other-ms-dos-games-online/ |
|
04:33
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
|
05:21
🔗
|
|
S_aus_Eur has joined #archiveteam-bs |
|
05:21
🔗
|
|
S_aus_Eur has left |
|
05:29
🔗
|
godane |
so some of the npr morning radio episodes are going to be in real media |
|
05:29
🔗
|
godane |
these real media files don't derive right at all |
|
05:29
🔗
|
godane |
old ones don't have this problem |
|
05:30
🔗
|
godane |
i hope someone can at least look at them to see what is the problem with IA deriving them |
|
05:31
🔗
|
godane |
these are in real media only: https://archive.org/details/npr-morning-edition-01-02-2003 |
|
05:34
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
05:37
🔗
|
|
mistym has joined #archiveteam-bs |
|
05:59
🔗
|
DFJustin |
huh never knew there was a dos oregon trail |
|
07:13
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
07:25
🔗
|
|
APerti_ has quit IRC (Read error: Operation timed out) |
|
07:44
🔗
|
Ctrl-S |
I've started work on a tumblr archiver, here is code so far: https://mega.co.nz/#!bxJFzL4Z!8h1TQHKJT7WvJRkgiZTPkgbO2gDw7a4VbxFSa1Go-k4 |
|
08:06
🔗
|
|
primus104 has joined #archiveteam-bs |
|
08:38
🔗
|
joepie91 |
godane: fwiw, it has derived now |
|
08:38
🔗
|
joepie91 |
Ctrl-S: use some sort of git hosting, please :D |
|
08:38
🔗
|
joepie91 |
especially since you're already using git... |
|
08:39
🔗
|
joepie91 |
(or at the very least tar.gz, zip isn't very good at unix perms) |
|
08:51
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 272 seconds) |
|
08:51
🔗
|
|
GLaDOS has joined #archiveteam-bs |
|
08:51
🔗
|
|
swebb sets mode: +o GLaDOS |
|
09:07
🔗
|
|
brayden has quit IRC (Ping timeout: 607 seconds) |
|
09:54
🔗
|
|
schbirid has joined #archiveteam-bs |
|
09:55
🔗
|
|
brayden has joined #archiveteam-bs |
|
10:01
🔗
|
|
kvieta has quit IRC (Read error: Operation timed out) |
|
10:12
🔗
|
|
kvieta has joined #archiveteam-bs |
|
11:43
🔗
|
|
primus104 has quit IRC (Leaving.) |
|
11:47
🔗
|
|
yan has joined #archiveteam-bs |
|
12:03
🔗
|
Ctrl-S |
is this good enough for you? https://github.com/woodenphone/tumblr-to-db |
|
12:03
🔗
|
Ctrl-S |
still WIP |
|
12:03
🔗
|
godane |
looks like 20100919 marshill hd video doesn't work |
|
12:04
🔗
|
joepie91 |
Ctrl-S: yes, git is good :D |
|
12:04
🔗
|
godane |
so i try to get the tv_sd_progressive version of that video |
|
12:04
🔗
|
Ctrl-S |
goal is to save tumblr blogs to a db so i can scrape remotely and retrieve to my metered home connection |
|
12:04
🔗
|
Ctrl-S |
HTTrack automation just doesn't cut it |
|
12:05
🔗
|
Ctrl-S |
also HTTrack does not remember where it has been |
|
12:05
🔗
|
Ctrl-S |
or rather, it does not understand the difference between posts and the listings |
|
12:13
🔗
|
joepie91 |
Ctrl-S: shouldn't you be using WARC, though? |
|
12:14
🔗
|
Ctrl-S |
Filesize must be minimised |
|
12:14
🔗
|
Ctrl-S |
Purpose is to save the blogs, not to shove into IA |
|
12:14
🔗
|
Ctrl-S |
Most important things are the posts and the media |
|
12:15
🔗
|
joepie91 |
WARC overhead is negligible, really |
|
12:15
🔗
|
joepie91 |
WARC isn't just for IA either :) |
|
12:16
🔗
|
Ctrl-S |
basically the problem this software is supposed to address is: Tumblr makes it really easy to get a blog deleted |
|
12:16
🔗
|
joepie91 |
Ctrl-S: thing is, if you're making HTTP requests anyway, you might as well dump them into a WARC? |
|
12:16
🔗
|
joepie91 |
mm |
|
12:16
🔗
|
Ctrl-S |
I have a metered home connection |
|
12:17
🔗
|
joepie91 |
ok? |
|
12:17
🔗
|
Ctrl-S |
so unless WARC can handle lots of compression, it's not going to be suitable |
|
12:17
🔗
|
joepie91 |
wha |
|
12:17
🔗
|
Ctrl-S |
if it can, i can switch over |
|
12:17
🔗
|
joepie91 |
Ctrl-S: WARC is a storage format |
|
12:17
🔗
|
Ctrl-S |
I know |
|
12:17
🔗
|
joepie91 |
it has nothing to do with your connection |
|
12:17
🔗
|
joepie91 |
at all |
|
12:18
🔗
|
joepie91 |
it stores data that your client has *anyway* |
|
12:18
🔗
|
Ctrl-S |
I plan to run it in another country where data is cheaper |
|
12:18
🔗
|
Ctrl-S |
then pull once it's finished |
|
12:18
🔗
|
joepie91 |
Ctrl-S: what does 'data cap' have to do with WARC? you keep referring to it, but I don't see where it comes into the picture |
|
12:18
🔗
|
Ctrl-S |
to get the data from a remote machine that runs the script to my machine |
|
12:18
🔗
|
joepie91 |
...? |
|
12:19
🔗
|
joepie91 |
I still don't get it... |
|
12:19
🔗
|
Ctrl-S |
me with metered home connection <-> friend with big unmetered pipe <-> internet |
|
12:19
🔗
|
joepie91 |
yes? |
|
12:20
🔗
|
joepie91 |
again, what does this have to do with WARC? |
|
12:20
🔗
|
Ctrl-S |
If i extract data into a DB there is less data to move |
|
12:20
🔗
|
Ctrl-S |
half the HTML will be removed |
|
12:20
🔗
|
Ctrl-S |
or more |
|
12:20
🔗
|
joepie91 |
move from where to where? |
|
12:21
🔗
|
Ctrl-S |
The data follows this path: Tumblr -> Scraper machine -> my machine |
|
12:22
🔗
|
Ctrl-S |
that second link is the bottleneck |
|
12:22
🔗
|
joepie91 |
why would you need to move the WARC to your machine? |
|
12:22
🔗
|
Ctrl-S |
Can't trust remote storage |
|
12:22
🔗
|
joepie91 |
....? |
|
12:22
🔗
|
Ctrl-S |
Much better to have a HDD i can hold myself |
|
12:23
🔗
|
Ctrl-S |
unless WARC is more than HTML with metadata |
|
12:23
🔗
|
joepie91 |
Ctrl-S: I don't really understand where you're seeing a problem |
|
12:23
🔗
|
joepie91 |
you are *already* extracting the content and storing it locally |
|
12:23
🔗
|
joepie91 |
storing the WARC elsewhere doesn't make you lose anything |
|
12:23
🔗
|
joepie91 |
at best it will make you have a WARC in a remote location |
|
12:23
🔗
|
Ctrl-S |
I'm sorry, I don't understand |
|
12:23
🔗
|
snuffy |
extracting the butane from it all into a world class mma fighter how is that bullshit |
|
12:23
🔗
|
joepie91 |
at worst the WARC will be lost and you'll still have the same data as when you're not making a WARC |
|
12:23
🔗
|
Ctrl-S |
I can make it dump to warc |
|
12:24
🔗
|
Ctrl-S |
That's probably easier than using a db |
|
12:24
🔗
|
joepie91 |
Ctrl-S: I'm not saying to replace one with the other |
|
12:24
🔗
|
Ctrl-S |
it's just that I want as small a file size as possible after the download has finished |
|
12:24
🔗
|
joepie91 |
I'm saying that you can *also* dump to WARC |
|
12:24
🔗
|
snuffy |
to replace the police |
|
12:24
🔗
|
Ctrl-S |
I intend to try for both if i add warc stuff |
|
12:24
🔗
|
joepie91 |
can somebody kick that markov bot please |
|
12:25
🔗
|
Ctrl-S |
markov bot? |
|
12:25
🔗
|
joepie91 |
balrog: closure: DFJustin: ersi: Famicoman: Kenshin: SadDM: SketchCow: swebb: underscor: yipdw: sorry for the mass highlight, but we have a markov bot misbehaving (snuffy) |
|
12:25
🔗
|
joepie91 |
see above |
|
12:25
🔗
|
joepie91 |
I don't have +o |
|
12:26
🔗
|
Ctrl-S |
I'll look at libraries for WARC now |
|
12:26
🔗
|
joepie91 |
Ctrl-S: pseudo-AI bot, absorbs what people say then starts randomly outputting vaguely related-seeming sentences |
|
12:26
🔗
|
joepie91 |
can be amusing, but not in discussions... |
|
12:26
🔗
|
Ctrl-S |
you could tell that from one message? |
|
12:26
🔗
|
joepie91 |
yes |
|
12:26
🔗
|
joepie91 |
they have fairly predictable patterns |
|
12:26
🔗
|
joepie91 |
look carefully |
|
12:26
🔗
|
joepie91 |
[13:23] <joepie91> you are *already* extracting the content and storing it locally |
|
12:26
🔗
|
joepie91 |
[13:23] <snuffy> extracting the butane from it all into a world class mma fighter how is that bullshit |
|
12:26
🔗
|
Ctrl-S |
oh |
|
12:26
🔗
|
Ctrl-S |
yeah |
|
12:26
🔗
|
Ctrl-S |
i see |
|
12:26
🔗
|
joepie91 |
nonsensical sentence, valid grammar, copying an unusual word |
|
12:27
🔗
|
joepie91 |
very typical markov bot pattern :P |
|
12:27
🔗
|
joepie91 |
it uses word associations, basically |
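The pattern described above (valid grammar, nonsensical meaning, one unusual word copied from the channel) is what a word-level Markov chain tends to produce. How snuffy actually works is not known from this log; the following is only a minimal sketch of the general technique, with `build_chain` and `babble` as hypothetical names.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words seen directly after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def babble(chain, seed_word, max_words=12, rng=None):
    """Walk the chain from seed_word, picking a random follower each step."""
    rng = rng or random.Random()
    words = [seed_word]
    # Stop when the length limit is hit or the last word has no known follower.
    while len(words) < max_words and chain.get(words[-1]):
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Illustrative input only: one line from this log plus one made-up line.
heard = [
    "you are already extracting the content and storing it locally",
    "extracting the butane from the lighter",
]
chain = build_chain(heard)
print(babble(chain, "extracting", rng=random.Random(0)))
```

Seeding the walk with an unusual word from the most recent message ("extracting" here) is what makes the output look vaguely on-topic while the rest comes from unrelated text the bot absorbed earlier.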
|
12:27
🔗
|
joepie91 |
anyway |
|
12:27
🔗
|
joepie91 |
back to the topic |
|
12:27
🔗
|
Ctrl-S |
so to make a WARC, what data do I need? |
|
12:27
🔗
|
joepie91 |
Ctrl-S: extracting into a DB is fine for personal copies, but it's probably a good idea to just remotely store a copy of the WARC.. there's a python lib for it afaik |
|
12:27
🔗
|
Ctrl-S |
ATM I use mechanize for web requests |
|
12:27
🔗
|
snuffy |
my friend requests |
|
12:28
🔗
|
joepie91 |
the request headers and body (usually just headers), and the response headers and body |
|
12:28
🔗
|
joepie91 |
that's it really |
|
12:28
🔗
|
joepie91 |
warc lib should tell you the specific data needed |
|
12:28
🔗
|
joepie91 |
hopefully |
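As a rough illustration of the list above (request headers and body, response headers and body), here is a sketch that wraps raw HTTP messages in WARC/1.0 records using only the Python standard library. It is not the Python WARC library mentioned in the channel, and `build_warc_record` and `append_record` are hypothetical names. A real archive would normally also carry a `warcinfo` record and link each request to its response, which this sketch leaves out.

```python
import gzip
import io
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, http_payload, when=None):
    """Return one WARC/1.0 record (bytes) wrapping a raw HTTP message.

    record_type is "request" or "response"; http_payload is the raw bytes
    of that HTTP message (start line, headers, blank line, body).
    """
    if record_type not in ("request", "response"):
        raise ValueError("record_type must be 'request' or 'response'")
    when = when or datetime.now(timezone.utc)
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {when.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: application/http; msgtype={record_type}",
        f"Content-Length: {len(http_payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLFs after the content block.
    return head + http_payload + b"\r\n\r\n"


def append_record(fileobj, record):
    """Append a record as its own gzip member, the usual .warc.gz layout."""
    fileobj.write(gzip.compress(record))


# In-memory demo with made-up messages; a real run would open a .warc.gz file.
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
archive = io.BytesIO()
append_record(archive, build_warc_record("request", "http://example.com/", request))
append_record(archive, build_warc_record("response", "http://example.com/", response))
print(len(archive.getvalue()), "bytes written")
```

Because every record is compressed separately, the file can be appended to while the scraper runs, and the per-record overhead is a handful of header lines, which is in line with the earlier remark that WARC overhead is negligible.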
|
12:30
🔗
|
Ctrl-S |
is there an easy way to tell HTTrack to output to WARC? |
|
12:31
🔗
|
joepie91 |
httrack doesn't understand warc, as far as I am aware |
|
12:31
🔗
|
joepie91 |
that is why I recommend wget to people :P |
|
12:31
🔗
|
Ctrl-S |
windows |
|
12:31
🔗
|
joepie91 |
Ctrl-S: wget for windows is a thing |
|
12:32
🔗
|
Ctrl-S |
I think I had problems with the filename handling? |
|
12:32
🔗
|
joepie91 |
http://gnuwin32.sourceforge.net/packages/wget.htm |
|
12:32
🔗
|
joepie91 |
no idea |
|
12:33
🔗
|
Ctrl-S |
know of anything that uses both WARC and mechanize in python? |
|
12:33
🔗
|
Ctrl-S |
example code makes everything easier |
|
12:35
🔗
|
|
rejon has joined #archiveteam-bs |
|
12:36
🔗
|
joepie91 |
Ctrl-S: no clue |
|
12:36
🔗
|
Ctrl-S |
I would honestly rather get this working than search for information on linking the warc stuff to mechanize, but once it's done i'll consider doing it |
|
12:37
🔗
|
Ctrl-S |
everything goes through a single get() function for web requests, so i suppose i could slip something into that afterwards |
|
12:38
🔗
|
Ctrl-S |
something that works now, perfection later |
|
12:38
🔗
|
joepie91 |
mhmm |
|
12:39
🔗
|
SketchCow |
snuffy: Destination Drigible |
|
12:40
🔗
|
SketchCow |
snuffy: Last broken maid harvey clam, bring destination forgotten grass-fed. |
|
12:40
🔗
|
|
SketchCow sets mode: +b *!*bkr@*.mindhackers.org |
|
12:40
🔗
|
|
snuffy was kicked by SketchCow (snuffy) |
|
12:40
🔗
|
Ctrl-S |
WARC doesn't need context, just URL, metadata for both directions, and the response, right? |
|
12:41
🔗
|
Ctrl-S |
if that is true, I can just change one function afterwards to set it up |
|
12:43
🔗
|
joepie91 |
Ctrl-S: also request body, but if you're only doing GET requests that doesn't really matter |
|
12:43
🔗
|
joepie91 |
SketchCow: hehe, poisoning its word association cache? :P |
|
12:43
🔗
|
joepie91 |
also, thanks |
|
12:44
🔗
|
Ctrl-S |
the function is named get(), it takes a URL and returns the page/file |
|
12:44
🔗
|
Ctrl-S |
it hides the cookies etc from the rest of the code |
|
12:44
🔗
|
joepie91 |
yes, you'll need to capture the request headers also |
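The "change one function" idea can be sketched as a wrapper around the single `get()` entry point: callers still receive just the page body, while the full exchange is kept for later WARC writing. This is an assumption-heavy sketch. `Exchange`, `make_get`, and `fake_fetch` are hypothetical, and `fake_fetch` stands in for whatever the real mechanize-based code does, since its internals are not shown in the log.

```python
from typing import Callable, Dict, List, NamedTuple


class Exchange(NamedTuple):
    """Everything a WARC writer would need about one HTTP round trip."""
    url: str
    request_headers: Dict[str, str]
    status: int
    response_headers: Dict[str, str]
    body: bytes


def make_get(fetch: Callable[[str], Exchange],
             log: List[Exchange]) -> Callable[[str], bytes]:
    """Build a get(url) that records each exchange and returns only the body."""
    def get(url: str) -> bytes:
        exchange = fetch(url)
        log.append(exchange)  # kept so it can be dumped to WARC afterwards
        return exchange.body
    return get


def fake_fetch(url: str) -> Exchange:
    """Stand-in for the real HTTP client; performs no network access."""
    return Exchange(
        url=url,
        request_headers={"User-Agent": "example-scraper"},
        status=200,
        response_headers={"Content-Type": "text/html"},
        body=b"<html>post</html>",
    )


captured: List[Exchange] = []
get = make_get(fake_fetch, captured)
page = get("http://example.tumblr.com/post/1")
print(len(page), "bytes returned;", len(captured), "exchange recorded")
```

Keeping the recording inside `get()` means the rest of the scraper, including the DB extraction, does not have to change when WARC output is added.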
|
12:45
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
12:45
🔗
|
Ctrl-S |
sounds doable, once i learn how to work with the libs. |
|
12:46
🔗
|
Ctrl-S |
eurgh, I have to get the date of the post from the archive page, rather than the post itself |
|
12:47
🔗
|
Ctrl-S |
I was hoping to pass a single numerical string |
|
12:51
🔗
|
joepie91 |
is anybody grabbing the coverage from Paris? |
|
12:51
🔗
|
Ctrl-S |
what coverage? |
|
12:52
🔗
|
joepie91 |
Ctrl-S: http://www.theguardian.com/world/live/2015/jan/07/shooting-paris-satirical-magazine-charlie-hebdo |
|
12:52
🔗
|
joepie91 |
.t |
|
12:52
🔗
|
botpie91 |
Wed, 07 Jan 2015 12:52:09 GMT |
|
12:52
🔗
|
joepie91 |
... |
|
12:52
🔗
|
joepie91 |
.title http://www.theguardian.com/world/live/2015/jan/07/shooting-paris-satirical-magazine-charlie-hebdo |
|
12:52
🔗
|
botpie91 |
joepie91: Charlie Hebdo shooting: twelve dead at Paris offices of satirical magazine – live updates | World news | The Guardian |
|
12:53
🔗
|
Ctrl-S |
do we have archives of this satirical newspaper? |
|
12:53
🔗
|
joepie91 |
I don't know, but we should |
|
12:53
🔗
|
|
ersi sets mode: +o joepie91 |
|
12:53
🔗
|
joepie91 |
ivan`: ? |
|
12:53
🔗
|
joepie91 |
what's the status on that? |
|
12:53
🔗
|
joepie91 |
ersi: thanks |
|
12:54
🔗
|
joepie91 |
uh oh |
|
12:54
🔗
|
joepie91 |
Ctrl-S: https://t.co/bHl4vKTZUg |
|
12:54
🔗
|
joepie91 |
does this load for you |
|
12:54
🔗
|
Ctrl-S |
slowly |
|
12:54
🔗
|
Ctrl-S |
blank page so far |
|
12:54
🔗
|
Ctrl-S |
connected... |
|
12:55
🔗
|
Ctrl-S |
i'm in wa.au, btw |
|
12:55
🔗
|
Ctrl-S |
perth |
|
12:55
🔗
|
Ctrl-S |
might want to ask someone in france |
|
12:55
🔗
|
Ctrl-S |
504 |
|
12:56
🔗
|
joepie91 |
:/ |
|
12:56
🔗
|
joepie91 |
yeah, it's down I think... |
|
13:03
🔗
|
midas |
joepie91: works here |
|
13:03
🔗
|
midas |
via ovh proxy |
|
13:04
🔗
|
raylee |
works here, .uk |
|
13:07
🔗
|
joepie91 |
yeah, works here now as well, but slow |
|
13:09
🔗
|
midas |
yep |
|
13:11
🔗
|
|
primus104 has joined #archiveteam-bs |
|
13:14
🔗
|
|
Ravenloft has quit IRC (Ping timeout: 370 seconds) |
|
14:06
🔗
|
Kazzy |
I can't check this right now, apparently it's a video of the shooting.. http://www.liveleak.com/view?i=bc6_1420632668 |
|
14:06
🔗
|
Kazzy |
probably nsfw/l, don't click if you don't want to. |
|
14:13
🔗
|
joepie91 |
Kazzy: contains one person shot to death :( |
|
14:14
🔗
|
Kazzy |
sigh :( |
|
14:15
🔗
|
godane |
whats the name of the magazine? |
|
14:16
🔗
|
Kazzy |
charlie hebdo |
|
14:18
🔗
|
Ctrl-S |
is someone archiving the video? |
|
14:18
🔗
|
Kazzy |
liveleak video was grabbed through archivebot |
|
14:20
🔗
|
joepie91 |
the video? or just the page? |
|
14:20
🔗
|
|
APerti has joined #archiveteam-bs |
|
14:21
🔗
|
Kazzy |
i have no idea if it grabbed the video too, if someone has stuff on hand to grab it, please do. |
|
14:22
🔗
|
joepie91 |
Kazzy: youtube-dl'ing it |
|
14:22
🔗
|
joepie91 |
looks like youtube-dl groks liveleak, so that's good |
|
14:27
🔗
|
|
sankin has joined #archiveteam-bs |
|
14:28
🔗
|
|
garyrh has quit IRC (Read error: Operation timed out) |
|
14:57
🔗
|
|
norbert79 has quit IRC (Quit: leaving) |
|
15:00
🔗
|
balrog |
chfoo: how feasible would it be for wpull to feed youtube links into youtube-dl or something like that? |
|
15:06
🔗
|
|
bauruine has joined #archiveteam-bs |
|
15:08
🔗
|
Ctrl-S |
what is wpull? |
|
15:09
🔗
|
Ctrl-S |
this is possible: https://github.com/woodenphone/Youtube-dl-runner |
|
15:09
🔗
|
joepie91 |
Ctrl-S: it's a drop-in replacement (with some changes) for wget written in Python |
|
15:09
🔗
|
Ctrl-S |
no idea about the wpull side |
|
15:11
🔗
|
Kazzy |
Ctrl-S: https://github.com/chfoo/wpull if you're interested |
|
15:18
🔗
|
Kazzy |
if someone can grab a copy of this, please do soon.. it's liveupdating so probably not worth grabbing just yet http://www.bbc.com/news/live/world-europe-30710777 |
|
15:18
🔗
|
Ctrl-S |
Httrack with new output dir each run? |
|
15:19
🔗
|
Ctrl-S |
shell script run it at 5-10 min interval? |
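The "new output dir each run, every 5-10 minutes" idea could look roughly like the sketch below, written in Python rather than shell. The actual download step is passed in as `grab`, a hypothetical callable, because no specific HTTrack or wget invocation is given in the log. `snapshot_dir` and `run_snapshots` are likewise illustrative names.

```python
import time
from datetime import datetime, timezone
from pathlib import Path


def snapshot_dir(base, when):
    """Directory for one run, named by UTC timestamp so runs never overwrite."""
    return Path(base) / when.strftime("%Y%m%dT%H%M%SZ")


def run_snapshots(url, base, grab, interval_seconds=300, rounds=3,
                  sleep=time.sleep,
                  now=lambda: datetime.now(timezone.utc)):
    """Call grab(url, directory) `rounds` times, pausing between runs."""
    made = []
    for i in range(rounds):
        target = snapshot_dir(base, now())
        target.mkdir(parents=True, exist_ok=True)
        grab(url, target)
        made.append(target)
        if i < rounds - 1:  # no need to wait after the final run
            sleep(interval_seconds)
    return made
```

A caller would supply `grab` as a function that runs the chosen downloader into the given directory, for example through `subprocess.run`. The injectable `sleep` and `now` exist only so the loop can be exercised without waiting.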
|
15:21
🔗
|
Kazzy |
I'm stuck on a chromebook with 10% battery, can't do much from here :p |
|
15:24
🔗
|
Ctrl-S |
I have a linux box, you write a script to install and run the whatever it is to download the stuff, i'll run it |
|
15:25
🔗
|
Ctrl-S |
I thought that chmod -R 777 * was a good idea |
|
15:25
🔗
|
midas |
chmod -R 777 / |
|
15:25
🔗
|
Ctrl-S |
so i'm not the guy that should write it |
|
15:25
🔗
|
midas |
anddd run |
|
15:25
🔗
|
Ctrl-S |
it did help fix my problem |
|
15:25
🔗
|
Ctrl-S |
maybe |
|
15:34
🔗
|
|
mistym has joined #archiveteam-bs |
|
15:35
🔗
|
|
garyrh has joined #archiveteam-bs |
|
15:37
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
15:39
🔗
|
|
norbert79 has joined #archiveteam-bs |
|
15:46
🔗
|
midas |
can we grab this? https://www.youtube.com/watch?v=LeIy0zH77MM#t=1624 livestream on YT |
|
15:46
🔗
|
midas |
(dump the timemarker btw) |
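Dropping the time marker amounts to removing the URL fragment (`#t=1624`) before handing the link to a grabber. A small sketch using the standard library, with `drop_time_marker` as a hypothetical helper name:

```python
from urllib.parse import urldefrag


def drop_time_marker(url):
    """Return the URL without its #fragment (such as a #t=... time marker)."""
    clean, _fragment = urldefrag(url)
    return clean


print(drop_time_marker("https://www.youtube.com/watch?v=LeIy0zH77MM#t=1624"))
```

This only handles markers carried in the fragment; a time offset passed as a query parameter would need separate handling.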
|
15:51
🔗
|
|
aaaaaaaaa has joined #archiveteam-bs |
|
15:55
🔗
|
|
mistym has joined #archiveteam-bs |
|
16:16
🔗
|
|
bauruine has quit IRC (Ping timeout: 265 seconds) |
|
16:19
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
|
16:21
🔗
|
|
bauruine has joined #archiveteam-bs |
|
16:22
🔗
|
|
Start is now known as StartAway |
|
16:22
🔗
|
|
StartAway is now known as Start |
|
16:31
🔗
|
|
godane has joined #archiveteam-bs |
|
16:40
🔗
|
|
dashcloud has quit IRC (Remote host closed the connection) |
|
16:41
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
16:54
🔗
|
|
rejon has quit IRC (Ping timeout: 335 seconds) |
|
16:58
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
17:09
🔗
|
|
Kassia19 has joined #archiveteam-bs |
|
17:10
🔗
|
|
Kassia19 has quit IRC (Read error: Connection reset by peer) |
|
17:14
🔗
|
|
mistym has joined #archiveteam-bs |
|
17:19
🔗
|
yipdw |
Ctrl-S: fyi archivebot does tumblr archiving ok |
|
17:22
🔗
|
|
rejon has joined #archiveteam-bs |
|
17:34
🔗
|
schbirid |
woot, i found a bug on github |
|
17:34
🔗
|
schbirid |
too dumb to figure out if it is a vulnerability though |
|
17:36
🔗
|
joepie91 |
schbirid: it's Ruby, I think? so yes, probably |
|
17:36
🔗
|
joepie91 |
:P |
|
17:36
🔗
|
aaaaaaaaa |
do they have a bounty program? |
|
17:37
🔗
|
schbirid |
yeah |
|
17:45
🔗
|
schbirid |
hm, seems just to escape one element too many |
|
17:45
🔗
|
schbirid |
not one too few |
|
17:59
🔗
|
|
midas1 has joined #archiveteam-bs |
|
18:12
🔗
|
|
rejon has quit IRC (Read error: Operation timed out) |
|
18:18
🔗
|
|
Coderjoe_ has joined #archiveteam-bs |
|
18:21
🔗
|
|
primus104 has quit IRC (hub.se irc.efnet.pl) |
|
18:21
🔗
|
|
schbirid has quit IRC (hub.se irc.efnet.pl) |
|
18:21
🔗
|
|
primus has quit IRC (hub.se irc.efnet.pl) |
|
18:21
🔗
|
|
Coderjoe has quit IRC (hub.se irc.efnet.pl) |
|
18:22
🔗
|
|
primus_ has joined #archiveteam-bs |
|
18:27
🔗
|
|
schbirid2 has joined #archiveteam-bs |
|
19:15
🔗
|
|
rejon has joined #archiveteam-bs |
|
19:37
🔗
|
|
rejon has quit IRC (Ping timeout: 335 seconds) |
|
19:55
🔗
|
|
Ravenloft has joined #archiveteam-bs |
|
20:12
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
20:36
🔗
|
|
mistym has joined #archiveteam-bs |
|
20:42
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Operation timed out) |
|
21:04
🔗
|
|
mistym has quit IRC (Remote host closed the connection) |
|
21:07
🔗
|
|
aaaaaaaaa has joined #archiveteam-bs |
|
21:20
🔗
|
|
mistym has joined #archiveteam-bs |
|
21:27
🔗
|
|
bsmith093 has quit IRC (Read error: Connection reset by peer) |
|
21:34
🔗
|
|
abartov has quit IRC (Ping timeout: 258 seconds) |
|
21:39
🔗
|
|
bsmith093 has joined #archiveteam-bs |
|
21:43
🔗
|
|
yipdw has quit IRC (Quit: yipdw) |
|
21:43
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
21:43
🔗
|
|
yipdw has joined #archiveteam-bs |
|
21:45
🔗
|
|
schbirid2 has quit IRC (Quit: Leaving) |
|
21:47
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
21:49
🔗
|
|
abartov has joined #archiveteam-bs |
|
21:57
🔗
|
|
sankin has quit IRC (Leaving.) |
|
22:10
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
22:11
🔗
|
chfoo |
balrog: if it works using a http proxy, it should be doable |
|
22:11
🔗
|
balrog |
chfoo: it would involve detecting a supported URL and feeding it to the program I think |
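"Detecting a supported URL and feeding it to the program" might be sketched as below. This is not how wpull or archivebot work; it is a standalone illustration in which `is_video_url`, `youtube_dl_command`, and `dispatch` are hypothetical names, and the URL check only covers two YouTube link shapes. The `youtube-dl` command line uses its `-o` output-template option.

```python
import subprocess
from urllib.parse import parse_qs, urlparse


def is_video_url(url):
    """Very rough check for YouTube watch links; other sites are ignored."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host in ("youtube.com", "www.youtube.com"):
        return parts.path == "/watch" and "v" in parse_qs(parts.query)
    if host == "youtu.be":
        return len(parts.path) > 1  # something after the leading slash
    return False


def youtube_dl_command(url, out_dir):
    """Build the argument list for youtube-dl, naming files by video id."""
    return ["youtube-dl", "-o", f"{out_dir}/%(id)s.%(ext)s", url]


def dispatch(urls, out_dir, run=subprocess.run):
    """Hand every recognised video URL to youtube-dl; return those handled."""
    handled = []
    for url in urls:
        if is_video_url(url):
            run(youtube_dl_command(url, out_dir), check=False)
            handled.append(url)
    return handled
```

Passing `run` as a parameter keeps the sketch testable without the tool installed. A crawler would call `dispatch` with the links it discovers and still fetch the pages themselves as usual.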
|
22:11
🔗
|
balrog |
I'm a little worried that archivebot doesn't capture youtube videos themselves |
|
22:12
🔗
|
balrog |
oh, it's in python |
|
22:13
🔗
|
yipdw |
balrog: it could be done, I'd prefer to have a working replay solution first |
|
22:13
🔗
|
balrog |
replay? |
|
22:13
🔗
|
yipdw |
that's why I pointed out that pywb-webrecorder can do it |
|
22:14
🔗
|
balrog |
doesn't archive.org already have some method of grabbing some youtube stuff? |
|
22:14
🔗
|
yipdw |
maybe, but as far as I can tell it's not documented |
|
22:14
🔗
|
balrog |
ah :/ |
|
22:14
🔗
|
yipdw |
anyway, pywb seems to have Deep Magic From Before The Dawn of Time to do this, so I keep thinking it might be interesting to use its proxy + wpull |
|
22:15
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
22:15
🔗
|
yipdw |
another problem is making this not cause WARC size to blow up any more than they do in the default !a case |
|
22:22
🔗
|
balrog |
Deep Magic From Before The Dawn of Time where? |
|
22:22
🔗
|
balrog |
https://github.com/ikreymer/pywb/blob/4c08a6a06404388e673ed37a6969023712d91c18/pywb/static/vidrw.js |
|
22:22
🔗
|
balrog |
it's doing a bunch of transformation |
|
22:42
🔗
|
yipdw |
yeah |
|
22:42
🔗
|
yipdw |
also injecting flowplayer, etc. |
|
23:04
🔗
|
|
APerti has quit IRC (Read error: Operation timed out) |
|
23:13
🔗
|
|
APerti has joined #archiveteam-bs |
|
23:13
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
23:13
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
23:18
🔗
|
|
BlueMaxim has joined #archiveteam-bs |
|
23:22
🔗
|
|
APerti has quit IRC (Read error: Operation timed out) |
|
23:33
🔗
|
|
abartov has quit IRC (Ping timeout: 258 seconds) |
|
23:34
🔗
|
|
Ebony27 has joined #archiveteam-bs |
|
23:35
🔗
|
|
Ebony27 has quit IRC (Read error: Connection reset by peer) |
|
23:42
🔗
|
Start |
http://techcrunch.com/2015/01/07/is-youtube-the-yahoo-of-2015/ |
|
23:58
🔗
|
joepie91 |
Even BuzzFeed knows point No. 5, and they are the intellectual toilet of the Internet. |
|
23:58
🔗
|
joepie91 |
ouch |
|
23:58
🔗
|
BlueMaxim |
*flush* |