Time |
Nickname |
Message |
00:04
🔗
|
|
nightpool has quit IRC (Ping timeout: 260 seconds) |
00:05
🔗
|
|
espes__ has joined #archiveteam-bs |
00:09
🔗
|
|
nightpool has joined #archiveteam-bs |
00:13
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
00:18
🔗
|
|
DoomTay has joined #archiveteam-bs |
00:34
🔗
|
godane |
so i found the San Francisco Bay Area Television Archive |
00:35
🔗
|
godane |
https://diva.sfsu.edu/ |
00:38
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 260 seconds) |
00:40
🔗
|
DoomTay |
Looks like a job for ArchiveBot |
00:41
🔗
|
DoomTay |
Oh, never mind. There's stuff behind an accountwall |
00:46
🔗
|
DoomTay |
Also, just to make sure the word gets out real soon, here's what will likely be a REAL fun situation: http://laurapinto.tripod.com/andykim/ |
00:46
🔗
|
DoomTay |
Wait, no |
00:46
🔗
|
DoomTay |
http://blog.bioware.com/2016/07/29/concerning-our-forums/ |
01:10
🔗
|
godane |
SketchCo2: i will be give the CBS 1960-11-08 Election Coverage sometime next week |
01:10
🔗
|
godane |
its 4 hours of it on dvd |
01:10
🔗
|
|
SketchCo2 is now known as SketchCow |
01:11
🔗
|
arkiver |
godane: nice find on that archive! |
01:11
🔗
|
arkiver |
Are you planning on getting those videos into IA? |
01:11
🔗
|
arkiver |
also, I love the NASA uploads |
01:11
🔗
|
godane |
i upload them to FOS |
01:12
🔗
|
arkiver |
The videos from the archive you found? |
01:12
🔗
|
godane |
yes |
01:12
🔗
|
godane |
i upload the dvd videos i find to FOS |
01:12
🔗
|
arkiver |
awesome! |
01:13
🔗
|
godane |
i found this: http://www.cyclenews.com/cycle-news-archives/ |
01:13
🔗
|
godane |
but looks like it's paywalled |
01:14
🔗
|
arkiver |
that's really nice |
01:14
🔗
|
arkiver |
the paywall sucks though |
01:14
🔗
|
godane |
that archive has magazines that go back to the 1960s |
01:14
🔗
|
arkiver |
well |
01:14
🔗
|
* |
arkiver is afk for the night |
01:14
🔗
|
arkiver |
keep up the awesome work godane :D |
01:14
🔗
|
godane |
i will |
01:14
🔗
|
godane |
i'm up to 2008-09-05 with funny or die archive |
01:15
🔗
|
arkiver |
for example http://magazine.cyclenews.com/i/84166-cycle-news-1972-issue-27-jul-18 doesn't seem paywalled |
01:15
🔗
|
godane |
ok then |
01:15
🔗
|
godane |
i was only looking at the 1960s ones |
01:15
🔗
|
arkiver |
alright |
01:15
🔗
|
arkiver |
I'm off anyway |
01:16
🔗
|
arkiver |
have a good day :) |
01:17
🔗
|
|
username1 has joined #archiveteam-bs |
01:21
🔗
|
|
schbirid2 has quit IRC (Ping timeout: 244 seconds) |
01:25
🔗
|
|
GLaDOS has joined #archiveteam-bs |
02:19
🔗
|
|
JesseW has joined #archiveteam-bs |
02:23
🔗
|
|
nightpool has quit IRC (Ping timeout: 260 seconds) |
03:31
🔗
|
|
nightpool has joined #archiveteam-bs |
03:36
🔗
|
|
nightpool has quit IRC (Ping timeout: 250 seconds) |
03:48
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
03:51
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 260 seconds) |
04:14
🔗
|
|
DoomTay has joined #archiveteam-bs |
04:26
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
04:29
🔗
|
|
dashcloud has joined #archiveteam-bs |
04:39
🔗
|
|
Sk1d has quit IRC (Ping timeout: 194 seconds) |
04:45
🔗
|
|
Sk1d has joined #archiveteam-bs |
04:50
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
05:01
🔗
|
|
GLaDOS has joined #archiveteam-bs |
05:27
🔗
|
|
dashcloud has quit IRC (Read error: Connection reset by peer) |
05:28
🔗
|
|
dashcloud has joined #archiveteam-bs |
05:39
🔗
|
|
robink has quit IRC (Ping timeout: 246 seconds) |
05:43
🔗
|
|
robink has joined #archiveteam-bs |
05:52
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:00
🔗
|
|
dashcloud has joined #archiveteam-bs |
06:14
🔗
|
godane |
i have 787k items now |
06:14
🔗
|
godane |
more like 788k if you include my godanefunnyordie account |
06:32
🔗
|
|
zgrant has left |
06:40
🔗
|
|
Honno has joined #archiveteam-bs |
06:43
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
06:47
🔗
|
|
dashcloud has joined #archiveteam-bs |
06:54
🔗
|
hook54321 |
godane: what would happen if we tried to put it through archivebot? |
07:03
🔗
|
|
Honno has quit IRC (Ping timeout: 1208 seconds) |
07:21
🔗
|
godane |
hook54321: what site are you talking about? |
07:39
🔗
|
SketchCow |
For anyone who gives a total shit, I have been gearing a side project to turn the Apple II emulated software collection on the Internet Archive into a world-class collection |
07:39
🔗
|
SketchCow |
Currently, I'm doing a sweep of all redundant items. It's taking a while, because of the 10,000, there's 1,000 or so dupes. |
07:39
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
07:40
🔗
|
|
dashcloud has joined #archiveteam-bs |
07:40
🔗
|
* |
JesseW isn't particularly interested in Apple II's, but is always pleased by pretty metadata |
07:41
🔗
|
SketchCow |
Once the redundants are removed, I will drill against the remaining collection and metadata it beyond belief. |
07:43
🔗
|
godane |
we need to find old scans of Scholastic Arrow Book Club News Paper: https://www.flickr.com/photos/annainca/5659359459/in/album-72157626587471194/ |
07:43
🔗
|
|
JesseW has quit IRC (Remote host closed the connection) |
07:44
🔗
|
godane |
something i loved to look at when i was in grades 1 to 5 |
07:44
🔗
|
|
Sanqui is now known as SanquiGON |
07:44
🔗
|
|
SanquiGON is now known as SanquiAFK |
07:49
🔗
|
SketchCow |
After that, I'll end up adding more stuff, but everything gets scanned and only new things are added |
08:00
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
08:05
🔗
|
|
dashcloud has joined #archiveteam-bs |
08:21
🔗
|
HCross |
Anyone else having issues with livestream.com videos? |
08:24
🔗
|
midas |
what kind of issues HCross ? |
08:25
🔗
|
midas |
my first issue is that i need flash. |
08:26
🔗
|
HCross |
I hit play.. and nothing happens. youtube-dl also gets a content too short error |
08:27
🔗
|
HCross |
Trying to watch Jason's HOPE talk |
08:27
🔗
|
midas |
link? |
08:27
🔗
|
HCross |
http://livestream.com/internetsociety/hopeconf/videos/130749038 |
08:28
🔗
|
midas |
seems broken indeed |
08:28
🔗
|
midas |
firefox says it's processing - please come back later. |
08:29
🔗
|
midas |
the lockpicking one works |
08:30
🔗
|
midas |
so yeah, seems broken on their side |
08:33
🔗
|
HCross |
SketchCow, do you have another copy of the video please? |
08:35
🔗
|
|
mksplg has quit IRC (Quit: WeeChat 0.4.2) |
08:37
🔗
|
|
Yoshimura has joined #archiveteam-bs |
08:51
🔗
|
username1 |
iirc hope recordings get properly published at some point |
08:53
🔗
|
Yoshimura |
Hoping is useless. |
08:53
🔗
|
xmc |
hope is a conference |
08:55
🔗
|
Yoshimura |
Oh yeah, that one. |
08:55
🔗
|
Yoshimura |
Efnet does not +q? Instead of answering why is it offtopic using ban hammer. I am no longer surprised here. It is a hobby not a serious thing. |
08:56
🔗
|
xmc |
ugh, stop complaining. you're disruptive and irritating. |
08:57
🔗
|
xmc |
you have accomplished nothing with archiveteam, just complaining that we are doing things wrong |
08:57
🔗
|
xmc |
that is why you are not welcome here |
08:59
🔗
|
|
Yoshimura was kicked by xmc (out) |
09:08
🔗
|
|
Yoshimura has joined #archiveteam-bs |
09:08
🔗
|
Yoshimura |
Raising a concern and a question is not complaining. |
09:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
09:09
🔗
|
|
xmc sets mode: +b *!4f8dff3d@ag-255-61.sta.ji.cz |
09:09
🔗
|
|
Yoshimura was kicked by xmc (Yoshimura) |
09:12
🔗
|
|
dashcloud has joined #archiveteam-bs |
09:16
🔗
|
midas |
why do the idiots always find these channels :p |
09:17
🔗
|
username1 |
i could suggest a reason but i would probably get banned as well |
09:17
🔗
|
username1 |
gah, xchat... |
09:17
🔗
|
|
username1 is now known as schbirid |
09:17
🔗
|
|
fie has joined #archiveteam-bs |
09:18
🔗
|
midas |
lol :p |
09:18
🔗
|
xmc |
ah, it's you :P |
09:18
🔗
|
xmc |
you have a history of not being a useless twat who gets in the way |
09:21
🔗
|
schbirid |
oh i get in my own way alright! |
09:22
🔗
|
|
wp494 has quit IRC (Quit: LOUD UNNECESSARY QUIT MESSAGES) |
09:46
🔗
|
SmileyG |
\o/ |
10:07
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
10:10
🔗
|
|
dashcloud has joined #archiveteam-bs |
10:29
🔗
|
godane |
so in theory i'm at 2008-09-10 with funny or die videos |
10:42
🔗
|
|
mismatch has joined #archiveteam-bs |
10:44
🔗
|
mismatch |
would it be possible to download/backup ~200,000 ISP hosting sites already in the Wayback Machine to a warc? |
10:45
🔗
|
mismatch |
robots.txt keeps changing meaning they're sometimes inaccessible. |
10:45
🔗
|
mismatch |
I have a txt file with all the sites listed |
10:58
🔗
|
mismatch |
I guess my question should be, can archivebot download a list of websites from a txt file? |
11:00
🔗
|
HCross |
Is it just myVIP that is underway at the moment, or is there anything else that needs help? |
11:04
🔗
|
midas |
mismatch: yes, it can. but only in ao mode |
11:05
🔗
|
HCross |
but 200k individual sites might be a bit much |
11:07
🔗
|
mismatch |
midas: thanks. HCross: that's fair enough, I'll maybe try with just 50 to start with and see how it performs |
11:12
🔗
|
mismatch |
in !a/recursive mode, if a site links to an external url such as google.com - is that also downloaded? |
11:15
🔗
|
schbirid |
god https://www.youtube.com/watch?v=UqVYWP4wk3I "6 Years of Hard Work Erased in 5 Clicks" |
11:22
🔗
|
midas |
that's just wow... |
11:24
🔗
|
mismatch |
^ it'll be interesting to see if YouTube solve this |
11:25
🔗
|
midas |
if they even will try |
11:26
🔗
|
luckcolor |
HCross: i remember it worked with a too |
11:26
🔗
|
luckcolor |
mismatch |
11:26
🔗
|
luckcolor |
join #archivebot |
11:27
🔗
|
mismatch |
luckcolor: already there :) - thanks though, I need to experiment a bit |
11:29
🔗
|
midas |
in theory it might just work with 200k urls |
11:29
🔗
|
midas |
i think ao makes one per url (warc) |
11:29
🔗
|
midas |
also internet is slow |
11:30
🔗
|
midas |
and sketchy |
11:31
🔗
|
luckcolor |
mismatch: if you need voiced or you want to schedule the job feel free to message |
11:31
🔗
|
mismatch |
true, all the urls are web.archive.org/web/[date]/[site] which I'm guessing might also cause recursion issues for archivebot |
11:31
🔗
|
mismatch |
luckcolor: <3 |
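A minimal sketch of the list-preparation step mismatch describes, assuming the txt file holds one site per line and that a single fixed 14-digit timestamp is good enough for every site. The helper names `wayback_url`, `load_sites`, and `batch` are hypothetical, and none of this is ArchiveBot's own code; it only builds the web.archive.org/web/[date]/[site] URLs that would then be handed to a crawler.

```python
from typing import Iterable, List

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(site: str, timestamp: str) -> str:
    """Build a web.archive.org/web/[date]/[site] URL for one site."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{site.strip()}"


def load_sites(lines: Iterable[str]) -> List[str]:
    """Return non-empty, de-duplicated site entries in their original order."""
    seen = set()
    sites = []
    for line in lines:
        site = line.strip()
        if site and site not in seen:
            seen.add(site)
            sites.append(site)
    return sites


def batch(items: List[str], size: int) -> List[List[str]]:
    """Split the list into chunks, e.g. 50 at a time for a first trial run."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Starting with `batch(urls, 50)[0]` matches the "try with just 50" plan above; whether a crawler then recurses sensibly through Wayback-rewritten links is a separate question this sketch does not address.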
11:36
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
11:36
🔗
|
|
BartoCH has joined #archiveteam-bs |
11:45
🔗
|
|
joepie91 has quit IRC (Read error: Operation timed out) |
11:46
🔗
|
|
botpie91 has quit IRC (Read error: Operation timed out) |
11:48
🔗
|
|
arkiver has quit IRC (Ping timeout: 370 seconds) |
12:11
🔗
|
|
joepie91 has joined #archiveteam-bs |
12:11
🔗
|
|
arkiver has joined #archiveteam-bs |
12:19
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
12:24
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
12:31
🔗
|
|
dashcloud has joined #archiveteam-bs |
12:59
🔗
|
|
metalcamp has joined #archiveteam-bs |
13:18
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
13:30
🔗
|
|
bzc6p has joined #archiveteam-bs |
13:30
🔗
|
|
swebb sets mode: +o bzc6p |
13:33
🔗
|
|
bzc6p sets mode: +oooo arkiver Atluxity chfoo closure |
13:34
🔗
|
|
bzc6p sets mode: +oooo Coderjoe dashcloud FalconK Fletcher |
13:34
🔗
|
|
bzc6p sets mode: +oooo GLaDOS godane HCross HCross2 |
13:34
🔗
|
|
bzc6p sets mode: +oooo joepie91 JW_work1 Kaz Kenshin |
13:34
🔗
|
|
bzc6p sets mode: +oooo luckcolor midas PurpleSym Start |
13:34
🔗
|
|
bzc6p sets mode: +o yipdw |
13:40
🔗
|
|
bzc6p has left |
13:44
🔗
|
|
fie_ has joined #archiveteam-bs |
13:44
🔗
|
|
fie has quit IRC (Read error: Connection reset by peer) |
13:51
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
13:55
🔗
|
|
dashcloud has joined #archiveteam-bs |
14:11
🔗
|
|
nightpool has joined #archiveteam-bs |
14:15
🔗
|
|
DoomTay has joined #archiveteam-bs |
14:20
🔗
|
|
zino has quit IRC (Remote host closed the connection) |
14:23
🔗
|
|
bzc6p has joined #archiveteam-bs |
14:23
🔗
|
|
swebb sets mode: +o bzc6p |
14:24
🔗
|
bzc6p |
DoomTay: livejournal discovery items are usually 1,299 bytes in size, so it's fine. |
14:24
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
14:24
🔗
|
DoomTay |
Huh. |
14:24
🔗
|
DoomTay |
And I think I can guess why items usually aren't in KB |
14:25
🔗
|
DoomTay |
I mean DISPLAYED in KB |
14:25
🔗
|
bzc6p |
Feel free to improve the software. |
14:27
🔗
|
|
BartoCH has joined #archiveteam-bs |
14:32
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
14:41
🔗
|
|
Honno has joined #archiveteam-bs |
14:53
🔗
|
|
bzc6p has left |
14:54
🔗
|
|
SanquiAFK has quit IRC (Ping timeout: 260 seconds) |
14:55
🔗
|
|
nightpool has joined #archiveteam-bs |
15:07
🔗
|
|
MrRadar has quit IRC (Quit: Restarting) |
15:08
🔗
|
|
Honno has quit IRC (Ping timeout: 1208 seconds) |
15:34
🔗
|
Frogging |
lol Yoshimura was back I see |
15:35
🔗
|
Frogging |
that was amusing |
15:35
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
15:39
🔗
|
|
Coderjoe has joined #archiveteam-bs |
15:52
🔗
|
|
Rye has quit IRC (Ping timeout: 244 seconds) |
15:53
🔗
|
|
DoomTay has quit IRC (Quit: Page closed) |
15:54
🔗
|
|
MrRadar has joined #archiveteam-bs |
15:54
🔗
|
|
Rye has joined #archiveteam-bs |
16:01
🔗
|
|
Sanqui has joined #archiveteam-bs |
16:10
🔗
|
|
kristian_ has joined #archiveteam-bs |
16:43
🔗
|
|
ndiddy has quit IRC (Read error: Connection reset by peer) |
16:44
🔗
|
|
ndiddy has joined #archiveteam-bs |
17:15
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
17:18
🔗
|
|
dashcloud has joined #archiveteam-bs |
19:02
🔗
|
|
Coderjoe has quit IRC (Ping timeout: 260 seconds) |
19:03
🔗
|
|
Coderjoe has joined #archiveteam-bs |
19:04
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
19:09
🔗
|
|
BartoCH has joined #archiveteam-bs |
19:18
🔗
|
|
tomwsmf has joined #archiveteam-bs |
19:27
🔗
|
|
JesseW has joined #archiveteam-bs |
19:28
🔗
|
JesseW |
I just noticed that, in the year (!) since I put together a pile of metadata about sourceforge projects, six people have forked the repo I put it in. None have *done* anything with the forks -- they merely made them. One only forked one other repo. |
19:28
🔗
|
JesseW |
The world is strange. |
19:30
🔗
|
|
kristian_ has quit IRC (Leaving) |
19:41
🔗
|
JesseW |
joepie91: regarding newww, I found a copy of the code here: https://github.com/rafaeljesus/newww (albeit not particularly up to date) |
19:42
🔗
|
JesseW |
exactly *one* of the issues from the repo was saved in the wayback machine: https://web.archive.org/web/20150224214155/https://github.com/npm/newww/issues/190 |
19:48
🔗
|
hook54321 |
godane: I was talking about the cyclenews site |
19:50
🔗
|
|
BartoCH has quit IRC (Ping timeout: 260 seconds) |
19:51
🔗
|
JesseW |
joepie91: https://github.com/npm/www/issues/9 -- I just asked them to make the old issues available. We'll see what the response might be. |
19:52
🔗
|
JesseW |
I'm not sure whether having you support that request would be a positive or a negative. :-) |
19:54
🔗
|
JesseW |
It would probably be good to grab copies of the issues from all the other npm repos now, just in case. |
19:56
🔗
|
hook54321 |
Did you check Google cache for any of the old issues? |
19:56
🔗
|
JesseW |
hook54321: good idea. Please do so, and dump them into archive.is if you find any |
19:57
🔗
|
hook54321 |
There's a script that I think does stuff like that automatically, I still haven't gotten it to work though. Do you know the URL of the old repo? |
19:57
🔗
|
JesseW |
https://github.com/npm/newww |
19:58
🔗
|
JesseW |
there's an even older repo, https://github.com/npm/npm-www with over 900 issues, that it would be good to grab |
20:02
🔗
|
|
BartoCH has joined #archiveteam-bs |
20:07
🔗
|
hook54321 |
Have we put any of the second one through archivebot yet? |
20:08
🔗
|
Frogging |
you don't put GitHub repos in ArchiveBot, by the way |
20:13
🔗
|
|
JesseW has quit IRC (Ping timeout: 370 seconds) |
20:13
🔗
|
hook54321 |
What if we just put in the issues page? |
20:14
🔗
|
hook54321 |
And used lots of ignores |
20:15
🔗
|
Frogging |
oh, issues |
20:16
🔗
|
Frogging |
yeah, probably |
20:17
🔗
|
Frogging |
there's also this. https://github.com/joeyh/github-backup |
20:17
🔗
|
Frogging |
that's what yipdw recommended be used for github stuff |
20:18
🔗
|
yipdw |
it produces more usable results, yes |
20:18
🔗
|
yipdw |
a WARC copy of a github repo is IMO pointless |
20:18
🔗
|
Frogging |
maybe for issues only it would be less useless? |
20:18
🔗
|
yipdw |
it's pointless |
20:18
🔗
|
yipdw |
if you want the issues github-backup gets it too |
20:18
🔗
|
Frogging |
kk |
20:20
🔗
|
yipdw |
a derivation of the issue data into static HTML probably has merit but the github interface has so many links that an automated crawl is a mess |
20:21
🔗
|
yipdw |
!ao tends to be okay if you're trying to be a jerk |
20:25
🔗
|
hook54321 |
My main concern is that there should be some sort of central place where people can browse and upload backups of repositories. |
20:25
🔗
|
yipdw |
people usually do that on github yes |
20:26
🔗
|
yipdw |
google code for example |
20:26
🔗
|
yipdw |
GitLab has a Github import function which works reasonably well, also |
20:27
🔗
|
yipdw |
I do wonder how someone will fund this central place, however |
20:28
🔗
|
hook54321 |
I mean we could upload backups of github repos to the Internet archive, but it would still be useless until the user downloaded the whole archive, which could potentially be huge. |
20:29
🔗
|
yipdw |
yep |
20:29
🔗
|
yipdw |
or you do a shallow clone |
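A rough sketch of the shallow-clone idea, assuming `git` is installed and on the PATH. The function names `clone_command` and `clone` are made up for illustration; only the `git clone --depth` option itself is standard git. A shallow clone fetches truncated history and no issue data, so it complements rather than replaces a tool like github-backup.

```python
import subprocess
from typing import List, Optional


def clone_command(repo_url: str, dest: str, depth: Optional[int] = 1) -> List[str]:
    """Build a git clone command; depth=None asks for the full history."""
    cmd = ["git", "clone"]
    if depth is not None:
        cmd += ["--depth", str(depth)]
    cmd += [repo_url, dest]
    return cmd


def clone(repo_url: str, dest: str, depth: Optional[int] = 1) -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(clone_command(repo_url, dest, depth), check=True)
```

For example, `clone("https://github.com/npm/npm-www", "npm-www")` would fetch only the most recent commit of that repository.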
20:31
🔗
|
yipdw |
if you're interested in saving copies of git repositories and their associated data, I think self-hosted gitlab is a good choice |
20:31
🔗
|
yipdw |
I run an instance that does more or less that |
20:31
🔗
|
yipdw |
then I have my backups of dependent libraries and I only need to worry about keeping my instance healthy |
20:34
🔗
|
dashcloud |
for github backups, codearchive.org is doing a backup of every repo with 10 stars or more (and less than 250 MB in size, unless whitelisted) and every change |
20:34
🔗
|
yipdw |
oh, actually, correction |
20:35
🔗
|
yipdw |
github->gitlab import requires your github OAuth tokens and so you can only do that backup for repositories you're authorized on |
20:35
🔗
|
yipdw |
so, yes, if you're a project member it's a good choice :P |
20:35
🔗
|
dashcloud |
here's the project site: https://the-code-archive.launchrock.com/ |
20:36
🔗
|
yipdw |
wow |
20:36
🔗
|
yipdw |
nice |
20:37
🔗
|
yipdw |
oh cool, Filippo Valsorda is involved in that |
20:37
🔗
|
yipdw |
(he also wrote the Warrior Dockerfile) |
20:38
🔗
|
hook54321 |
Interesting. About what percentage of repos are 10 stars or more and less than 250 MB? |
20:39
🔗
|
yipdw |
https://github.com/search?utf8=%E2%9C%93&q=stars%3A%22%3E+10%22+size%3A%22%3C%3D+262144000%22&type=Repositories&ref=advsearch&l=&l= |
20:40
🔗
|
yipdw |
there's 3,466,477 with at least one star |
20:41
🔗
|
yipdw |
oh, oops, that's supposed to be size in kB |
20:41
🔗
|
yipdw |
https://github.com/search?utf8=%E2%9C%93&q=stars%3A%22%3E+10%22+size%3A%22%3C%3D+262144%22&type=Repositories&ref=searchresults is fixed |
20:42
🔗
|
yipdw |
and it should be >= but no matter what you're looking at 10-11% |
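The corrected search can be sketched as a small URL builder, assuming GitHub's `stars:` and `size:` search qualifiers with `size:` measured in kilobytes, as yipdw notes. The helpers `repo_query`, `search_url`, and `mb_to_kb` are hypothetical names, and the thresholds are the ones quoted in the conversation rather than codearchive.org's actual configuration.

```python
from urllib.parse import urlencode


def mb_to_kb(megabytes: int) -> int:
    """Convert a size limit to the kilobyte unit the size: qualifier expects."""
    return megabytes * 1024


def repo_query(min_stars: int, max_size_kb: int) -> str:
    """Build the search string: at least min_stars, at most max_size_kb."""
    return f"stars:>={min_stars} size:<={max_size_kb}"


def search_url(min_stars: int, max_size_kb: int) -> str:
    """Return a github.com repository search URL for the given limits."""
    params = {"q": repo_query(min_stars, max_size_kb), "type": "Repositories"}
    return "https://github.com/search?" + urlencode(params)
```

Here `repo_query(10, mb_to_kb(256))` yields the 262144 kB limit used in the fixed link, with `>=` in place of `>` for the star count.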
20:42
🔗
|
|
robink has quit IRC (Ping timeout: 501 seconds) |
20:43
🔗
|
yipdw |
which rounds pretty nicely with Sturgeon's Law |
20:49
🔗
|
hook54321 |
Not to mention, there are probably tons of repositories that are just empty. |
20:49
🔗
|
dashcloud |
in case you're wondering, the repo itself is backed up because it got 10 stars by the end of the talk (which is here: http://livestream.com/internetsociety2/hopeconf/videos/130613964) |
20:50
🔗
|
hook54321 |
Which repo? |
21:02
🔗
|
|
Honno has joined #archiveteam-bs |
21:14
🔗
|
|
robink has joined #archiveteam-bs |
21:18
🔗
|
|
Honno has quit IRC (Read error: Operation timed out) |
21:34
🔗
|
|
DoomTay has joined #archiveteam-bs |
21:52
🔗
|
|
whydomain has joined #archiveteam-bs |
21:59
🔗
|
whydomain |
Can anyone recommend a (preferably python) browser emulator? Basically all I want to do is navigate to a web page with a cookie preloaded, and then click on two div elements. |
22:13
🔗
|
MrRadar |
whydomain: PhantomJS |
22:13
🔗
|
MrRadar |
It's basically a scriptable headless WebKit browser |
22:15
🔗
|
|
DoomTay_ has joined #archiveteam-bs |
22:15
🔗
|
whydomain |
Are there any examples of usage for loading in pages an manipulating elements? I can't find any. |
22:16
🔗
|
whydomain |
^and manipulating |
22:16
🔗
|
|
DoomTay has quit IRC (Ping timeout: 268 seconds) |
22:16
🔗
|
MrRadar |
This is apparently an example for injecting JS into a page: https://github.com/ariya/phantomjs/blob/master/examples/injectme.js |
22:18
🔗
|
MrRadar |
It looks like this talks about manipulating the DOM: http://phantomjs.org/page-automation.html |
22:19
🔗
|
MrRadar |
I've never used PhantomJS myself so I'm not too knowledgeable about it |
22:22
🔗
|
whydomain |
Ah, thanks. That was just the example I was looking for. |
22:59
🔗
|
|
JesseW has joined #archiveteam-bs |
23:10
🔗
|
|
ItsYoda has quit IRC (Ping timeout: 260 seconds) |
23:12
🔗
|
hook54321 |
is archivebot kept more up to date than grab-site? |
23:13
🔗
|
JesseW |
ha, I doubt it. |
23:13
🔗
|
JesseW |
hook54321: what do you mean by "up to date"? |
23:14
🔗
|
hook54321 |
Developed more I guess? |
23:15
🔗
|
JesseW |
archivebot isn't developed at all currently. Effort is explicitly directed to grab-site (or the wrapper around it, I can't remember) instead. |
23:15
🔗
|
JesseW |
(I may be mistaken about this -- if so, yipdw should be able to correct it) |
23:19
🔗
|
|
ItsYoda has joined #archiveteam-bs |
23:43
🔗
|
|
Coderjoe has quit IRC (Read error: Operation timed out) |
23:45
🔗
|
|
DoomTay_ is now known as DoomTay |
23:50
🔗
|
|
Coderjoe has joined #archiveteam-bs |
23:54
🔗
|
hook54321 |
Think grab-site would run on a computer with an Atom N270 processor? |
23:56
🔗
|
MrRadar |
I don't see why it wouldn't |
23:56
🔗
|
MrRadar |
I've run wpull (the core component of grab-site/ArchiveBot that actually does the scraping) on a 1st gen Raspberry Pi |
23:56
🔗
|
MrRadar |
Which is almost certainly slower |
23:56
🔗
|
|
wp494 has joined #archiveteam-bs |
23:57
🔗
|
JesseW |
ah, right -- it's wpull that's the kernel, and grab-site/Archivebot that are the wrappers |