| Time |
Nickname |
Message |
|
00:04
🔗
|
|
nightpool has quit IRC (Ping timeout: 260 seconds) |
|
00:05
🔗
|
|
espes__ has joined #archiveteam-bs |
|
00:09
🔗
|
|
nightpool has joined #archiveteam-bs |
|
00:13
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
00:18
🔗
|
|
DoomTay has joined #archiveteam-bs |
|
00:34
🔗
|
godane |
so i found the San Francisco Bay Area Television Archive |
|
00:35
🔗
|
godane |
https://diva.sfsu.edu/ |
|
00:38
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 260 seconds) |
|
00:40
🔗
|
DoomTay |
Looks like a job for ArchiveBot |
|
00:41
🔗
|
DoomTay |
Oh, never mind. There's stuff behind an accountwall |
|
00:46
🔗
|
DoomTay |
Also, just to make sure the word gets out real soon, here's what will likely be a REAL fun situation: http://laurapinto.tripod.com/andykim/ |
|
00:46
🔗
|
DoomTay |
Wait, no |
|
00:46
🔗
|
DoomTay |
http://blog.bioware.com/2016/07/29/concerning-our-forums/ |
|
01:10
🔗
|
godane |
SketchCo2: i will be giving the CBS 1960-11-08 Election Coverage sometime next week |
|
01:10
🔗
|
godane |
its 4 hours of it on dvd |
|
01:10
🔗
|
|
SketchCo2 is now known as SketchCow |
|
01:11
🔗
|
arkiver |
godane: nice find on that archive! |
|
01:11
🔗
|
arkiver |
Are you planning on getting those videos into IA? |
|
01:11
🔗
|
arkiver |
also, I love the NASA uploads |
|
01:11
🔗
|
godane |
i upload them to FOS |
|
01:12
🔗
|
arkiver |
The videos from the archive you found? |
|
01:12
🔗
|
godane |
yes |
|
01:12
🔗
|
godane |
i upload the dvd videos i find to FOS |
|
01:12
🔗
|
arkiver |
awesome! |
|
01:13
🔗
|
godane |
i found this: http://www.cyclenews.com/cycle-news-archives/ |
|
01:13
🔗
|
godane |
but looks like it's paywalled |
|
01:14
🔗
|
arkiver |
that's really nice |
|
01:14
🔗
|
arkiver |
the paywall sucks though |
|
01:14
🔗
|
godane |
that archive has magazines that go back to the 1960s |
|
01:14
🔗
|
arkiver |
well |
|
01:14
🔗
|
* |
arkiver is afk for the night |
|
01:14
🔗
|
arkiver |
keep up the awesome work godane :D |
|
01:14
🔗
|
godane |
i will |
|
01:14
🔗
|
godane |
i'm up to 2008-09-05 with the funny or die archive |
|
01:15
🔗
|
arkiver |
for example http://magazine.cyclenews.com/i/84166-cycle-news-1972-issue-27-jul-18 doesn't seem paywalled |
|
01:15
🔗
|
godane |
ok then |
|
01:15
🔗
|
godane |
i was only looking at the 1960s ones |
|
01:15
🔗
|
arkiver |
alright |
|
01:15
🔗
|
arkiver |
I'm off anyway |
|
01:16
🔗
|
arkiver |
have a good day :) |
|
01:17
🔗
|
|
username1 has joined #archiveteam-bs |
|
01:21
🔗
|
|
schbirid2 has quit IRC (Ping timeout: 244 seconds) |
|
01:25
🔗
|
|
GLaDOS has joined #archiveteam-bs |
|
02:19
🔗
|
|
JesseW has joined #archiveteam-bs |
|
02:23
🔗
|
|
nightpool has quit IRC (Ping timeout: 260 seconds) |
|
03:31
🔗
|
|
nightpool has joined #archiveteam-bs |
|
03:36
🔗
|
|
nightpool has quit IRC (Ping timeout: 250 seconds) |
|
03:48
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
03:51
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 260 seconds) |
|
04:14
🔗
|
|
DoomTay has joined #archiveteam-bs |
|
04:26
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
04:29
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
04:39
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
|
04:45
🔗
|
|
Sk1d has joined #archiveteam-bs |
|
04:50
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
05:01
🔗
|
|
GLaDOS has joined #archiveteam-bs |
|
05:27
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
|
05:28
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
05:39
🔗
|
|
robink has quit IRC (Ping timeout: 246 seconds) |
|
05:43
🔗
|
|
robink has joined #archiveteam-bs |
|
05:52
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
06:00
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
06:14
🔗
|
godane |
i have 787k items now |
|
06:14
🔗
|
godane |
more like 788k if you include my godanefunnyordie account |
|
06:32
🔗
|
|
zgrant has left |
|
06:40
🔗
|
|
Honno has joined #archiveteam-bs |
|
06:43
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
06:47
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
06:54
🔗
|
hook54321 |
godane: what would happen if we tried to put it through archivebot? |
|
07:03
🔗
|
|
Honno has quit IRC (Ping timeout: 1208 seconds) |
|
07:21
🔗
|
godane |
hook54321: what site are you talking about? |
|
07:39
🔗
|
SketchCow |
For anyone who gives a total shit, I have been gearing a side project to turn the Apple II emulated software collection on the Internet Archive into a world-class collection |
|
07:39
🔗
|
SketchCow |
Currently, I'm doing a sweep of all redundant items. It's taking a while, because of the 10,000, there's 1,000 or so dupes. |
|
07:39
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
07:40
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
07:40
🔗
|
* |
JesseW isn't particularly interested in Apple II's, but is always pleased by pretty metadata |
|
07:41
🔗
|
SketchCow |
Once the redundants are removed, I will drill against the remaining collection and metadata it beyond belief. |
|
07:43
🔗
|
godane |
we need to find old scans of Scholastic Arrow Book Club News Paper: https://www.flickr.com/photos/annainca/5659359459/in/album-72157626587471194/ |
|
07:43
🔗
|
|
JesseW has quit IRC (Remote host closed the connection) |
|
07:44
🔗
|
godane |
something i loved to look at when i was in grades 1 to 5 |
|
07:44
🔗
|
|
Sanqui is now known as SanquiGON |
|
07:44
🔗
|
|
SanquiGON is now known as SanquiAFK |
|
07:49
🔗
|
SketchCow |
After that, I'll end up adding more stuff, but everything gets scanned and only new things are added |
|
08:00
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
08:05
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
08:21
🔗
|
HCross |
Anyone else having issues with livestream.com videos? |
|
08:24
🔗
|
midas |
what kind of issues HCross ? |
|
08:25
🔗
|
midas |
my first issue is that i need flash. |
|
08:26
🔗
|
HCross |
I hit play.. and nothing happens. youtube-dl also gets a content too short error |
|
08:27
🔗
|
HCross |
Trying to watch Jason's HOPE talk |
|
08:27
🔗
|
midas |
link? |
|
08:27
🔗
|
HCross |
http://livestream.com/internetsociety/hopeconf/videos/130749038 |
|
08:28
🔗
|
midas |
seems broken indeed |
|
08:28
🔗
|
midas |
firefox says it's processing - please come back later. |
|
08:29
🔗
|
midas |
the lockpicking one works |
|
08:30
🔗
|
midas |
so yeah, seems broken on their side |
|
08:33
🔗
|
HCross |
SketchCow, do you have another copy of the video please? |
|
08:35
🔗
|
|
mksplg has quit IRC (Quit: WeeChat 0.4.2) |
|
08:37
🔗
|
|
Yoshimura has joined #archiveteam-bs |
|
08:51
🔗
|
username1 |
iirc hope recordings get properly published at some point |
|
08:53
🔗
|
Yoshimura |
Hoping is useless. |
|
08:53
🔗
|
xmc |
hope is a conference |
|
08:55
🔗
|
Yoshimura |
Oh yeah, that one. |
|
08:55
🔗
|
Yoshimura |
Efnet does not +q? Instead of answering why is it offtopic using ban hammer. I am no longer surprised here. It is a hobby not a serious thing. |
|
08:56
🔗
|
xmc |
ugh, stop complaining. you're disruptive and irritating. |
|
08:57
🔗
|
xmc |
you have accomplished nothing with archiveteam, just complaining that we are doing things wrong |
|
08:57
🔗
|
xmc |
that is why you are not welcome here |
|
08:59
🔗
|
|
Yoshimura was kicked by xmc (out) |
|
09:08
🔗
|
|
Yoshimura has joined #archiveteam-bs |
|
09:08
🔗
|
Yoshimura |
Raising a concern and a question is not complaining. |
|
09:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
09:09
🔗
|
|
xmc sets mode: +b *!4f8dff3d@ag-255-61.sta.ji.cz |
|
09:09
🔗
|
|
Yoshimura was kicked by xmc (Yoshimura) |
|
09:12
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
09:16
🔗
|
midas |
why do the idiots always find these channels :p |
|
09:17
🔗
|
username1 |
i could suggest a reason but i would probably get banned as well |
|
09:17
🔗
|
username1 |
gah, xchat... |
|
09:17
🔗
|
|
username1 is now known as schbirid |
|
09:17
🔗
|
|
fie has joined #archiveteam-bs |
|
09:18
🔗
|
midas |
lol :p |
|
09:18
🔗
|
xmc |
ah, it's you :P |
|
09:18
🔗
|
xmc |
you have a history of not being a useless twat who gets in the way |
|
09:21
🔗
|
schbirid |
oh i get in my own way alright! |
|
09:22
🔗
|
|
wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES) |
|
09:46
🔗
|
SmileyG |
\o/ |
|
10:07
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
10:10
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
10:29
🔗
|
godane |
so in theory i'm at 2008-09-10 with funny or die videos |
|
10:42
🔗
|
|
mismatch has joined #archiveteam-bs |
|
10:44
🔗
|
mismatch |
would it be possible to download/backup ~200,000 ISP hosting sites already in the Wayback Machine to a warc? |
|
10:45
🔗
|
mismatch |
robots.txt keeps changing meaning they're sometimes inaccessible. |
|
10:45
🔗
|
mismatch |
I have a txt file with all the sites listed |
|
10:58
🔗
|
mismatch |
I guess my question should be, can archivebot download a list of websites from a txt file? |
|
11:00
🔗
|
HCross |
Is it just myVIP that is underway at the moment, or is there anything else that needs help? |
|
11:04
🔗
|
midas |
mismatch: yes, it can. but only in ao mode |
|
11:05
🔗
|
HCross |
but 200k individual sites might be a bit much |
|
11:07
🔗
|
mismatch |
midas: thanks. HCross: that's fair enough, I'll maybe try with just 50 to start with and see how it performs |
|
11:12
🔗
|
mismatch |
in !a/recursive mode, if a site links to an external url such as google.com - is that also downloaded? |
|
11:15
🔗
|
schbirid |
god https://www.youtube.com/watch?v=UqVYWP4wk3I "6 Years of Hard Work Erased in 5 Clicks" |
|
11:22
🔗
|
midas |
that's just wow... |
|
11:24
🔗
|
mismatch |
^ it'll be interesting to see if YouTube solve this |
|
11:25
🔗
|
midas |
if they even will try |
|
11:26
🔗
|
luckcolor |
HCross: i remember it worked with a too |
|
11:26
🔗
|
luckcolor |
mismatch |
|
11:26
🔗
|
luckcolor |
join #archivebot |
|
11:27
🔗
|
mismatch |
luckcolor: already there :) - thanks though, I need to experiment a bit |
|
11:29
🔗
|
midas |
in theory it might just work with 200k urls |
|
11:29
🔗
|
midas |
i think ao makes one per url (warc) |
|
11:29
🔗
|
midas |
also internet is slow |
|
11:30
🔗
|
midas |
and sketchy |
|
11:31
🔗
|
luckcolor |
mismatch: if you need voice or you want to schedule the job, feel free to message me |
|
11:31
🔗
|
mismatch |
true, all the urls are web.archive.org/web/[date]/[site] which I'm guessing might also cause recursion issues for archivebot |
|
11:31
🔗
|
mismatch |
luckcolor: <3 |
|
11:36
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
|
11:36
🔗
|
|
BartoCH has joined #archiveteam-bs |
|
11:45
🔗
|
|
joepie91 has quit IRC (Read error: Operation timed out) |
|
11:46
🔗
|
|
botpie91 has quit IRC (Read error: Operation timed out) |
|
11:48
🔗
|
|
arkiver has quit IRC (Ping timeout: 370 seconds) |
|
12:11
🔗
|
|
joepie91 has joined #archiveteam-bs |
|
12:11
🔗
|
|
arkiver has joined #archiveteam-bs |
|
12:19
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
|
12:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
12:31
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
12:59
🔗
|
|
metalcamp has joined #archiveteam-bs |
|
13:18
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
|
13:30
🔗
|
|
bzc6p has joined #archiveteam-bs |
|
13:30
🔗
|
|
swebb sets mode: +o bzc6p |
|
13:33
🔗
|
|
bzc6p sets mode: +oooo arkiver Atluxity chfoo closure |
|
13:34
🔗
|
|
bzc6p sets mode: +oooo Coderjoe dashcloud FalconK Fletcher |
|
13:34
🔗
|
|
bzc6p sets mode: +oooo GLaDOS godane HCross HCross2 |
|
13:34
🔗
|
|
bzc6p sets mode: +oooo joepie91 JW_work1 Kaz Kenshin |
|
13:34
🔗
|
|
bzc6p sets mode: +oooo luckcolor midas PurpleSym Start |
|
13:34
🔗
|
|
bzc6p sets mode: +o yipdw |
|
13:40
🔗
|
|
bzc6p has left |
|
13:44
🔗
|
|
fie_ has joined #archiveteam-bs |
|
13:44
🔗
|
|
fie has quit IRC (Read error: Connection reset by peer) |
|
13:51
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
13:55
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
14:11
🔗
|
|
nightpool has joined #archiveteam-bs |
|
14:15
🔗
|
|
DoomTay has joined #archiveteam-bs |
|
14:20
🔗
|
|
zino has quit IRC (Remote host closed the connection) |
|
14:23
🔗
|
|
bzc6p has joined #archiveteam-bs |
|
14:23
🔗
|
|
swebb sets mode: +o bzc6p |
|
14:24
🔗
|
bzc6p |
DoomTay: livejournal discovery items are usually 1,299 bytes in size, so it's fine. |
|
14:24
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
|
14:24
🔗
|
DoomTay |
Huh. |
|
14:24
🔗
|
DoomTay |
And I think I can guess why items usually aren't in KB |
|
14:25
🔗
|
DoomTay |
I mean DISPLAYED in KB |
|
14:25
🔗
|
bzc6p |
Feel free to improve the software. |
|
14:27
🔗
|
|
BartoCH has joined #archiveteam-bs |
|
14:32
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
|
14:41
🔗
|
|
Honno has joined #archiveteam-bs |
|
14:53
🔗
|
|
bzc6p has left |
|
14:54
🔗
|
|
SanquiAFK has quit IRC (Ping timeout: 260 seconds) |
|
14:55
🔗
|
|
nightpool has joined #archiveteam-bs |
|
15:07
🔗
|
|
MrRadar has quit IRC (Quit: Restarting) |
|
15:08
🔗
|
|
Honno has quit IRC (Ping timeout: 1208 seconds) |
|
15:34
🔗
|
Frogging |
lol Yoshimura was back I see |
|
15:35
🔗
|
Frogging |
that was amusing |
|
15:35
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
|
15:39
🔗
|
|
Coderjoe has joined #archiveteam-bs |
|
15:52
🔗
|
|
Rye has quit IRC (Ping timeout: 244 seconds) |
|
15:53
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
|
15:54
🔗
|
|
MrRadar has joined #archiveteam-bs |
|
15:54
🔗
|
|
Rye has joined #archiveteam-bs |
|
16:01
🔗
|
|
Sanqui has joined #archiveteam-bs |
|
16:10
🔗
|
|
kristian_ has joined #archiveteam-bs |
|
16:43
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
|
16:44
🔗
|
|
ndiddy has joined #archiveteam-bs |
|
17:15
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
|
17:18
🔗
|
|
dashcloud has joined #archiveteam-bs |
|
19:02
🔗
|
|
Coderjoe has quit IRC (Ping timeout: 260 seconds) |
|
19:03
🔗
|
|
Coderjoe has joined #archiveteam-bs |
|
19:04
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
|
19:09
🔗
|
|
BartoCH has joined #archiveteam-bs |
|
19:18
🔗
|
|
tomwsmf has joined #archiveteam-bs |
|
19:27
🔗
|
|
JesseW has joined #archiveteam-bs |
|
19:28
🔗
|
JesseW |
I just noticed that, in the year (!) since I put together a pile of metadata about sourceforge projects, six people have forked the repo I put it in. None have *done* anything with the forks -- they merely made them. One only forked one other repo. |
|
19:28
🔗
|
JesseW |
The world is strange. |
|
19:30
🔗
|
|
kristian_ has quit IRC (Leaving) |
|
19:41
🔗
|
JesseW |
joepie91: regarding newww, I found a copy of the code here: https://github.com/rafaeljesus/newww (albeit not particularly up to date) |
|
19:42
🔗
|
JesseW |
exactly *one* of the issues from the repo was saved in the wayback machine: https://web.archive.org/web/20150224214155/https://github.com/npm/newww/issues/190 |
|
19:48
🔗
|
hook54321 |
godane: I was talking about the cyclenews site |
|
19:50
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
|
19:51
🔗
|
JesseW |
joepie91: https://github.com/npm/www/issues/9 -- I just asked them to make the old issues available. We'll see what the response might be. |
|
19:52
🔗
|
JesseW |
I'm not sure whether having you support that request would be a positive or a negative. :-) |
|
19:54
🔗
|
JesseW |
It would probably be good to grab copies of the issues from all the other npm repos now, just in case. |
|
19:56
🔗
|
hook54321 |
Did you check Google cache for any of the old issues? |
|
19:56
🔗
|
JesseW |
hook54321: good idea. Please do so, and dump them into archive.is if you find any |
|
19:57
🔗
|
hook54321 |
There's a script that I think does stuff like that automatically, I still haven't gotten it to work though. Do you know the URL of the old repo? |
|
19:57
🔗
|
JesseW |
https://github.com/npm/newww |
|
19:58
🔗
|
JesseW |
there's an even older repo, https://github.com/npm/npm-www with over 900 issues, that it would be good to grab |
|
20:02
🔗
|
|
BartoCH has joined #archiveteam-bs |
|
20:07
🔗
|
hook54321 |
Have we put any of the second one through archivebot yet? |
|
20:08
🔗
|
Frogging |
you don't put GitHub repos in ArchiveBot, by the way |
|
20:13
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
|
20:13
🔗
|
hook54321 |
What if we just put in the issues page? |
|
20:14
🔗
|
hook54321 |
And used lots of ignores |
|
20:15
🔗
|
Frogging |
oh, issues |
|
20:16
🔗
|
Frogging |
yeah, probably |
|
20:17
🔗
|
Frogging |
there's also this. https://github.com/joeyh/github-backup |
|
20:17
🔗
|
Frogging |
that's what yipdw recommended be used for github stuff |
|
20:18
🔗
|
yipdw |
it produces more usable results, yes |
|
20:18
🔗
|
yipdw |
a WARC copy of a github repo is IMO pointless |
|
20:18
🔗
|
Frogging |
maybe for issues only it would be less useless? |
|
20:18
🔗
|
yipdw |
it's pointless |
|
20:18
🔗
|
yipdw |
if you want the issues github-backup gets it too |
|
20:18
🔗
|
Frogging |
kk |
|
20:20
🔗
|
yipdw |
a derivation of the issue data into static HTML probably has merit but the github interface has so many links that an automated crawl is a mess |
|
20:21
🔗
|
yipdw |
!ao tends to be okay if you're trying to be a jerk |
|
20:25
🔗
|
hook54321 |
My main concern is that there should be some sort of central place where people can browse and upload backups of repositories. |
|
20:25
🔗
|
yipdw |
people usually do that on github yes |
|
20:26
🔗
|
yipdw |
google code for example |
|
20:26
🔗
|
yipdw |
GitLab has a Github import function which works reasonably well, also |
|
20:27
🔗
|
yipdw |
I do wonder how someone will fund this central place |
|
20:28
🔗
|
hook54321 |
I mean we could upload backups of github repos to the Internet archive, but it would still be useless until the user downloaded the whole archive, which could potentially be huge. |
|
20:29
🔗
|
yipdw |
yep |
|
20:29
🔗
|
yipdw |
or you do a shallow clone |
|
20:31
🔗
|
yipdw |
if you're interested in saving copies of git repositories and their associated data, I think self-hosted gitlab is a good choice |
|
20:31
🔗
|
yipdw |
I run an instance that does more or less that |
|
20:31
🔗
|
yipdw |
then I have my backups of dependent libraries and I only need to worry about keeping my instance healthy |
|
20:34
🔗
|
dashcloud |
for github backups, codearchive.org is doing a backup of every repo with 10 stars or more (and less than 250 MB in size, unless whitelisted) and every change |
|
20:34
🔗
|
yipdw |
oh, actually, correction |
|
20:35
🔗
|
yipdw |
github->gitlab import requires your github OAuth tokens and so you can only do that backup for repositories you're authorized on |
|
20:35
🔗
|
yipdw |
so, yes, if you're a project member it's a good choice :P |
|
20:35
🔗
|
dashcloud |
here's the project site: https://the-code-archive.launchrock.com/ |
|
20:36
🔗
|
yipdw |
wow |
|
20:36
🔗
|
yipdw |
nice |
|
20:37
🔗
|
yipdw |
oh cool, Filippo Valsorda is involved in that |
|
20:37
🔗
|
yipdw |
(he also wrote the Warrior Dockerfile) |
|
20:38
🔗
|
hook54321 |
Interesting. About what percentage of repos are 10 stars or more and less than 250 MB? |
|
20:39
🔗
|
yipdw |
https://github.com/search?utf8=%E2%9C%93&q=stars%3A%22%3E+10%22+size%3A%22%3C%3D+262144000%22&type=Repositories&ref=advsearch&l=&l= |
|
20:40
🔗
|
yipdw |
there's 3,466,477 with at least one star |
|
20:41
🔗
|
yipdw |
oh, oops, that's supposed to be size in kB |
|
20:41
🔗
|
yipdw |
https://github.com/search?utf8=%E2%9C%93&q=stars%3A%22%3E+10%22+size%3A%22%3C%3D+262144%22&type=Repositories&ref=searchresults is fixed |
|
20:42
🔗
|
yipdw |
and it should be >= but no matter what you're looking at 10-11% |
|
20:42
🔗
|
|
robink has quit IRC (Ping timeout: 501 seconds) |
|
20:43
🔗
|
yipdw |
which rounds pretty nicely with Sturgeon's Law |
|
20:49
🔗
|
hook54321 |
Not to mention, there are probably tons of repositories that are just empty. |
|
20:49
🔗
|
dashcloud |
in case you're wondering, the repo itself is backed up because it got 10 stars by the end of the talk (which is here: http://livestream.com/internetsociety2/hopeconf/videos/130613964) |
|
20:50
🔗
|
hook54321 |
Which repo? |
|
21:02
🔗
|
|
Honno has joined #archiveteam-bs |
|
21:14
🔗
|
|
robink has joined #archiveteam-bs |
|
21:18
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
|
21:34
🔗
|
|
DoomTay has joined #archiveteam-bs |
|
21:52
🔗
|
|
whydomain has joined #archiveteam-bs |
|
21:59
🔗
|
whydomain |
Can anyone recommend a (preferably python) browser emulator? Basically all I want to do is navigate to a web page with a cookie preloaded, and then click on two div elements. |
|
22:13
🔗
|
MrRadar |
whydomain: PhantomJS |
|
22:13
🔗
|
MrRadar |
It's basically a scriptable headless WebKit browser |
|
22:15
🔗
|
|
DoomTay_ has joined #archiveteam-bs |
|
22:15
🔗
|
whydomain |
Are there any examples of usage for loading in pages an manipulating elements? I can't find any. |
|
22:16
🔗
|
whydomain |
^and manipulating |
|
22:16
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
|
22:16
🔗
|
MrRadar |
This is apparently an example for injecting JS into a page: https://github.com/ariya/phantomjs/blob/master/examples/injectme.js |
|
22:18
🔗
|
MrRadar |
It looks like this talks about manipulating the DOM: http://phantomjs.org/page-automation.html |
|
22:19
🔗
|
MrRadar |
I've never used PhantomJS myself so I'm not too knowledgeable about it |
|
22:22
🔗
|
whydomain |
Ah, thanks. That was just the example I was looking for. |
|
22:59
🔗
|
|
JesseW has joined #archiveteam-bs |
|
23:10
🔗
|
|
ItsYoda has quit IRC (Ping timeout: 260 seconds) |
|
23:12
🔗
|
hook54321 |
is archivebot kept more up to date than grab-site? |
|
23:13
🔗
|
JesseW |
ha, I doubt it. |
|
23:13
🔗
|
JesseW |
hook54321: what do you mean by "up to date"? |
|
23:14
🔗
|
hook54321 |
Developed more I guess? |
|
23:15
🔗
|
JesseW |
archivebot isn't developed at all currently. Effort is explicitly directed to grab-site (or the wrapper around it, I can't remember) instead. |
|
23:15
🔗
|
JesseW |
(I may be mistaken about this -- if so, yipdw should be able to correct it) |
|
23:19
🔗
|
|
ItsYoda has joined #archiveteam-bs |
|
23:43
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
|
23:45
🔗
|
|
DoomTay_ is now known as DoomTay |
|
23:50
🔗
|
|
Coderjoe has joined #archiveteam-bs |
|
23:54
🔗
|
hook54321 |
Think grab-site would run on a computer with an Atom N270 processor? |
|
23:56
🔗
|
MrRadar |
I don't see why it wouldn't |
|
23:56
🔗
|
MrRadar |
I've run wpull (the core component of grab-site/ArchiveBot that actually does the scraping) on a 1st gen Raspberry Pi |
|
23:56
🔗
|
MrRadar |
Which is almost certainly slower |
|
23:56
🔗
|
|
wp494 has joined #archiveteam-bs |
|
23:57
🔗
|
JesseW |
ah, right -- it's wpull that's the kernel, and grab-site/Archivebot that are the wrappers |
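
A note on the GitHub search links yipdw posted at 20:39 and 20:41: the correction there was that the `size` qualifier is counted in kB, so the first link's 262144000 was a byte count. The sketch below rebuilds the corrected query string. The function names and the 1024 kB per MB factor are assumptions for illustration; the log only gives the final number 262144, which corresponds to 256 MB under that factor, while the 250 MB limit dashcloud mentioned would come out as 256000.

```python
from urllib.parse import urlencode

# Assumption: 1 MB = 1024 kB, which is what reproduces the 262144 in the log.
KB_PER_MB = 1024


def mb_to_kb(megabytes: int) -> int:
    """Convert a size limit in MB to the kB unit used by the size qualifier."""
    return megabytes * KB_PER_MB


def build_search_query(min_stars: int, max_size_mb: int) -> str:
    """Return the encoded query string in the same shape as the 20:41 link."""
    query = f'stars:"> {min_stars}" size:"<= {mb_to_kb(max_size_mb)}"'
    return urlencode({"q": query, "type": "Repositories"})


if __name__ == "__main__":
    print(mb_to_kb(250))   # the 250 MB limit mentioned at 20:34
    print(mb_to_kb(256))   # the value actually used in the corrected link
    print("https://github.com/search?" + build_search_query(10, 256))
```

As yipdw pointed out, the star comparison should really be `>=` to match "10 stars or more"; the sketch keeps `>` only so the output matches the link as posted.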